# Accessing the TAL/LABEX EFL GPU server
The server gives access to **8 Nvidia GeForce RTX 2080 GPUs with 8 GB of RAM each** in one node. This server is reserved for external @LipnLab [LABEX EFL](https://
## 1. Connecting to the server
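A minimal connection sketch, assuming the same gateway host and port as the `scp` examples in the next section:

<code bash>
# gateway host and port are taken from the scp examples below (assumption)
$ ssh -p 60022 user_name@lipnssh.univ-paris13.fr
</code>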
## 2. Copying data

Use `scp` to copy your data to the server:

<code bash>
# copying a single file
$ scp -P 60022 my_file.txt user_name@lipnssh.univ-paris13.fr:
# copying folders recursively
$ scp -P 60022 -r local_folder user_name@tal.lipn.univ-paris13.fr:
</code>
Any data that you need to copy back from the server to your computer must be copied to your NFS home:
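A minimal sketch of the reverse copy, assuming a hypothetical `results` folder in your NFS home:

<code bash>
# pull a results folder from your NFS home back to your computer ("results" is hypothetical)
$ scp -P 60022 -r user_name@lipnssh.univ-paris13.fr:results local_results
</code>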
## 4. Slurm
[Slurm](https://slurm.schedmd.com/) is the job scheduler that manages all jobs running on the server.
### $ srun
This is the basic command for running jobs in Slurm. This example shows how to check the GPU models you are using and the CUDA version by running the `nvidia-smi` command with `srun`.
<code bash>
$ srun nvidia-smi
</code>
**You can use it to run Python code, but as you are working on a shared server, it is better to run your code with `sbatch`.**
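For instance, a sketch of an interactive run using the `gpu_test.py` script from Section 3:

<code bash>
# runs interactively and blocks your terminal until the job finishes
$ srun python gpu_test.py
</code>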
### $ sinfo / scontrol
These commands show how many nodes are available on the server.
<code bash>
# summary of partitions and node availability
$ sinfo
# detailed state of every node
$ scontrol show node
</code>
### $ squeue
If the server is full, Slurm will put your job in a waiting queue. You can check the queue state with `squeue`.
<code bash>
$ squeue
</code>
### $ sbatch
If you simply run your code with `srun`, your job will try to use all the available resources (like in the `gpu_test.py` example from Section 3 - Pytorch). The `sbatch` command is therefore useful to configure inputs, outputs and resource requirements for your job. The following example configures the `gpu_test.py` job to use only 3 GPUs and specifies output files for the job.
<code bash>
#!/bin/bash
# submission script matching the description above; exact directive values are assumptions
#SBATCH --job-name=gpu_test
#SBATCH --gres=gpu:3           # request only 3 GPUs
#SBATCH --output=gpu_test.out  # standard output file
#SBATCH --error=gpu_test.err   # standard error file

srun python gpu_test.py
</code>
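Assuming the script above is saved as `gpu_test.sh` (a hypothetical name), submit it with:

<code bash>
$ sbatch gpu_test.sh
</code>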
### $ scancel
From time to time you will need to kill a job. Use the `JOBID` number from the `squeue` command:
<code bash>
# replace JOBID with the job number shown by squeue
$ scancel JOBID
</code>
+ | |||
+ | ## Troubleshooting | ||
+ | |||
+ | Any questions about this doc, write to [Jorge Garcia Flores](mailto: |