# Accessing the TAL/LABEX EFL GPU server
The server gives access to **8 Nvidia GeForce RTX 2080 GPUs with 8 GB of RAM each** in one node. This server is reserved for external @LipnLab [LABEX EFL](https://www.labex-efl.fr/) research partners. You need to [send us an email](mailto:jgflores@lipn.fr) to ask for a `tal-lipn` account in order to get access to this server.
  
## 1. Connecting to the server
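A minimal connection sketch, assuming the hostnames and ports used in the copy commands later on this page (`ssh.tal.univ-paris13.fr` from inside the lab network, `tal.lipn.univ-paris13.fr` on port 60022 from outside); replace `user_name` with your `tal-lipn` account:

<code bash>
# from inside the lab network
$ ssh user_name@ssh.tal.univ-paris13.fr
# from outside the lab network (non-standard SSH port)
$ ssh -p 60022 user_name@tal.lipn.univ-paris13.fr
</code>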
  
**For the moment, you need to manually source the `.bashrc` file of your NFS home every time you connect to the Labex server, in order to activate your *miniconda* GPU environment (see section 3). So, each time you log in, you need to type:**
  
<code bash>
# your NFS home is mounted under /users/
$ source /users/user_name/.bashrc
</code>
To copy data between your machine and the server, use `scp`. From inside the lab network:

<code bash>
# INSIDE the lab commands
# copying one file from your computer to the Labex server
$ scp my_file.txt user_name@ssh.tal.univ-paris13.fr:~/
# copying a whole folder
$ scp -r local_folder user_name@ssh.tal.univ-paris13.fr:~/remote_folder
</code>

From outside the lab network, you need to go through port 60022:

<code bash>
# OUTSIDE the lab commands
# copying files
$ scp -P 60022 my_file.txt user_name@tal.lipn.univ-paris13.fr:~/
# copying folders recursively
$ scp -P 60022 -r local_folder user_name@tal.lipn.univ-paris13.fr:~/remote_folder
</code>
Any data that you need to copy back from the server to your computer must first be copied to your NFS home:
<code bash>
# OUTSIDE the lab commands
user_name@lipn-tal-labex:~$ cp any_file.txt /users/username/my_folder/
user_name@lipn-tal-labex:~$ exit
my_user@my_local_computer:~$ scp -P 60022 user_name@tal.lipn.univ-paris13.fr:~/my_folder/any_file.txt .
</code>
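If you move large folders back and forth regularly, `rsync` over SSH can be more convenient than `scp`, since it only retransfers changed files. This is a sketch under the same assumptions as above (connecting from outside the lab on port 60022, with `rsync` installed on both ends):

<code bash>
# synchronize a local folder to your NFS home on the server
$ rsync -avz -e "ssh -p 60022" local_folder/ user_name@tal.lipn.univ-paris13.fr:~/remote_folder/
</code>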
  
At the end of the Miniconda installation, you will be asked whether you want to add the *conda* base environment to your `.bashrc` file. Answer yes.
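If you answer no by mistake, you should still be able to set this up afterwards with `conda init` (assuming the Miniconda installation finished normally and `conda` is on your PATH):

<code bash>
# writes the conda initialization snippet into your .bashrc
$ conda init bash
</code>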
  
  
Manually source the `.bashrc` file of your NFS home to activate the *miniconda* environment before installing Pytorch.
  
<code bash>
$ source /users/user_name/.bashrc
</code>
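The exact Pytorch install command depends on the CUDA version available on the node, so treat the following as an assumption and check the install selector on [pytorch.org](https://pytorch.org/) if it fails. A typical conda-based install looks like:

<code bash>
# example only: pick the cudatoolkit version matching the server's CUDA install
(base) user_name@lipn-tal-labex:~$ conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
</code>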
(Type `y` to proceed.) Once the installation finishes, test your Pytorch install.
  
To test it, create the following `gpu_test.py` program with your favorite editor:
  
<code python>
# Python program to count GPU cards in the server using Pytorch
import torch
available_gpus = [torch.cuda.device(i) for i in range(torch.cuda.device_count())]
for gpu in available_gpus:
    print(gpu)
</code>

Run it through SLURM with `srun`:
  
<code bash>
(base) user_name@lipn-tal-labex:~$ srun python3 gpu_test.py
<torch.cuda.device object at 0x7f29f0602d10>
<torch.cuda.device object at 0x7f29f0602d90>
<torch.cuda.device object at 0x7f29f0602e90>
<torch.cuda.device object at 0x7f29f0618cd0>
<torch.cuda.device object at 0x7f29f0618d10>
<torch.cuda.device object at 0x7f29f0618d90>
<torch.cuda.device object at 0x7f29f0618dd0>
<torch.cuda.device object at 0x7f29f0618e10>
</code>
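You can also look at the GPUs directly with `nvidia-smi`, run through `srun` so that it executes on the GPU node (assuming the usual Nvidia driver tools are installed there):

<code bash>
# shows the GPUs, their memory usage and the processes using them
(base) user_name@lipn-tal-labex:~$ srun nvidia-smi
</code>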
  
To see the jobs currently running or waiting on the cluster, use `squeue`:

<code bash>
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              8795     labex QKVRegLA ghazi.fe  R 2-23:50:30      1 tal-gpu-labex1
              8796     labex QKVRegLA ghazi.fe  R 2-23:41:19      1 tal-gpu-labex1
              8812     labex MicrofEx gerardo.  R      24:31      1 tal-gpu-labex1
</code>
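If you need to stop one of your own jobs (for instance to free GPUs for a corrected run), you can cancel it by its job ID as reported by `squeue`:

<code bash>
# cancel job 8812 (use the JOBID column from squeue)
$ scancel 8812
</code>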
  
### $ sbatch
If you simply run your code with `srun`, your job will try to use all the available resources (as in the `gpu_test.py` example from Section 3). The `sbatch` command lets you configure inputs, outputs and resource requirements for your job. The following example configures the `gpu_test.py` script to use only 3 GPUs and specifies output files for the job.
  
First, create a `myfirst_gpu_job.sh` file:
  
<code bash>
#!/bin/bash
#SBATCH --qos=qos_gpu-t4
#SBATCH --cpus-per-task=5
#SBATCH --output=./MyFirstJob.out
#SBATCH --error=./MyFirstJob.err
#SBATCH --time=100:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=5
#SBATCH --ntasks-per-node=1
srun python gpu_test.py
</code>
  
These parameters specify a job running on 1 node with 3 GPUs, for a maximum of 100 hours. Normal output will be written to `MyFirstJob.out` and errors to `MyFirstJob.err`.
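To request the GPUs themselves, SLURM scripts normally use the `--gres` directive; the exact value below (3 GPUs, matching the description above) is an assumption, so check with the admins if the scheduler rejects it:

<code bash>
#SBATCH --gres=gpu:3
</code>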
  
Then run the script with `sbatch`:
  
<code bash>
$ sbatch myfirst_gpu_job.sh
</code>
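Once the job starts, its standard output goes to the file named in `--output`, so you can follow the run from the login prompt (the file name matches the script above):

<code bash>
# watch the job's output as it is produced
$ tail -f MyFirstJob.out
</code>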
  
You can check the state of your submission with `squeue`; a pending job shows `PD` in the `ST` column until resources are free:

<code bash>
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              4760      lipn my_first garciafl PD       0:00      1 (Priority)
              4761      lipn SUPER_En   leroux PD       0:00      1 (Priority)
              4675      lipn    GGGS1 xudong.z  R 6-20:30:00      1 lipn-rtx1
              4715      lipn SUPER_En   leroux  R 5-00:03:11      1 lipn-rtx2
              4752      lipn SUPER_En   leroux  R 2-21:37:05      1 lipn-rtx2
</code>
## Troubleshooting
  
For any questions about this doc, write to [Jorge Garcia Flores](mailto:jgflores@lipn.fr).