# Accessing the TAL/LABEX EFL GPU server
The server gives access to **8 Nvidia GeForce RTX 2080 GPUs with 8 GB of RAM each** in one node. This server is reserved for external @LipnLab [LABEX EFL](https://
## 1. Connecting to the server
**For the moment, you need to manually source the `.bashrc` file of your NFS home every time you connect to the Labex server, in order to activate your *miniconda* GPU environment:**
<code bash>
$ source ~/.bashrc
</code>
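Once sourced, your prompt should start with `(base)`, the active *conda* environment. If you want to double-check, these two commands are a quick sanity check (not part of the original page):

<code bash>
# conda should resolve to the miniconda install in your NFS home
$ which conda
# list the conda environments available to you
$ conda env list
</code>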
<code bash>
# INSIDE the lab commands
# copying one file from your computer to the Labex server
$ scp my_file.txt user_name@ssh.tal.univ-paris13.fr:
# copying a whole folder
$ scp -r local_folder user_name@ssh.tal.univ-paris13.fr:
</code>
<code bash>
# OUTSIDE the lab commands
# copying files
$ scp -P 60022 my_file.txt user_name@tal.lipn.univ-paris13.fr:
# copying folders recursively
$ scp -P 60022 -r local_folder user_name@tal.lipn.univ-paris13.fr:
</code>
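To avoid retyping the non-standard port every time, you can define a small shell alias on your local machine (an illustrative convenience, not part of the original page):

<code bash>
# add to your local ~/.bashrc; 60022 is the external SSH port used above
alias talscp='scp -P 60022'
# then, from outside the lab:
$ talscp my_file.txt user_name@tal.lipn.univ-paris13.fr:
</code>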
Any data that you need to copy back from the server to your computer must be copied to your NFS home:
<code bash>
# OUTSIDE the lab commands
user_name@lipn-tal-labex:
user_name@lipn-tal-labex:
my_user@my_local_computer:
</code>
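From outside the lab, the same `scp -P 60022` commands work in the other direction to pull data from your NFS home. A sketch (the file name is illustrative):

<code bash>
# run on your local computer, OUTSIDE the lab
$ scp -P 60022 user_name@tal.lipn.univ-paris13.fr:~/results.txt .
</code>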
Now you will be asked if you want to add the *conda* base environment to your `.bashrc` file. Answer yes.
<code bash>
Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[no] >>> yes
</code>
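If you want to see exactly what the installer changed, you can inspect the end of your `.bashrc` (a quick check, not part of the original page):

<code bash>
# conda init appends a block delimited by '# >>> conda initialize >>>'
$ tail -n 15 ~/.bashrc
</code>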
Manually source your `.bashrc` file on your NFS home to activate the *miniconda* environment before installing Pytorch.
<code bash>
$ source ~/.bashrc
$ conda install pytorch -c pytorch
</code>
(Type `y` to proceed.) After a while, you need to test your Pytorch install. To test it, create the following `gpu_test.py` program with your favorite editor:
<code python>
# Python program to count GPU cards in the server using Pytorch
import torch
available_gpus = [torch.cuda.device(i) for i in range(torch.cuda.device_count())]
for gpu in available_gpus:
    print(gpu)
</code>
<code bash>
(base) user_name@lipn-tal-labex:~$ python gpu_test.py
<torch.cuda.device object at 0x7f29f0602d10>
<torch.cuda.device object at 0x7f29f0602d90>
<torch.cuda.device object at 0x7f29f0602e90>
<torch.cuda.device object at 0x7f29f0618cd0>
<torch.cuda.device object at 0x7f29f0618d10>
<torch.cuda.device object at 0x7f29f0618d90>
<torch.cuda.device object at 0x7f29f0618dd0>
<torch.cuda.device object at 0x7f29f0618e10>
</code>
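As a quicker sanity check, the same information is available from a one-liner, and `nvidia-smi` shows the cards from the driver side (both are standard commands, not from the original page):

<code bash>
$ python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
$ nvidia-smi
</code>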
### $ squeue
<code bash>
$ squeue
             JOBID PARTITION
              8795     labex QKVRegLA ghazi.fe  R 2-23:
              8796     labex QKVRegLA ghazi.fe  R 2-23:
              8812     labex MicrofEx gerardo.
</code>
### $ sbatch
If you simply run your code with `srun`, your job will try to use all the available resources (as in the `gpu_test.py` example from Section 3, Pytorch). The `sbatch` command is therefore useful to configure inputs, outputs and resource requirements for your job. The following example configures the `gpu_test.py` program to use only 3 GPUs and specifies output files for the job.
First, create a `myfirst_gpu_job.sh` file:
<code bash>
#!/bin/bash
#SBATCH --job-name=my_first_gpu_job
#SBATCH --gres=gpu:3
#SBATCH --qos=qos_gpu-t4
#SBATCH --cpus-per-task=5
#SBATCH --output=./MyFirstJob.out
#SBATCH --error=./MyFirstJob.err
#SBATCH --time=100:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
srun python gpu_test.py
</code>
These parameters specify a job to be run on 1 node, with 3 GPUs, in a maximum time of 100 hours. Normal output will be sent to `MyFirstJob.out` and errors to `MyFirstJob.err`.
Then you run the script with `sbatch`:
<code bash>
$ sbatch myfirst_gpu_job.sh
</code>
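On submission, `sbatch` prints the id of the new job; `scontrol show job` then gives a detailed view of the resources it requested (the output and job id below are illustrative; `scontrol` is standard Slurm, not shown in the original page):

<code bash>
Submitted batch job 4760
# detailed view of the job's requested resources
$ scontrol show job 4760
</code>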
<code bash>
$ squeue
              4760      lipn my_first garciafl PD
              4761      lipn SUPER_En
              4675      lipn    GGGS1 xudong.z  R 6-20:
              4715      lipn SUPER_En
              4752      lipn SUPER_En
</code>
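To stop one of your jobs, or to list only yours in a busy queue, the standard Slurm commands apply (the job id is illustrative):

<code bash>
# cancel a running or pending job by its id
$ scancel 4760
# show only your own jobs
$ squeue -u $USER
</code>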
| + | |||
| + | ## Troubleshooting | ||
| + | |||
| + | Any questions about this doc, write to [Jorge Garcia Flores](mailto: | ||