# Accessing the TAL/LABEX EFL GPU server
The server gives access to **8 Nvidia GeForce RTX 2080 GPUs with 8 GB of RAM each** in one node. This server is reserved for external @LipnLab [LABEX EFL](https://
## 1. Connecting to the server
**For the moment, you need to manually source the `.bashrc` file of your NFS home every time you connect to the Labex server, in order to activate your *miniconda* GPU environment:**
<code bash>
$ source ~/.bashrc
</code>
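Once sourced, your prompt should start with `(base)`, the active *conda* environment. If you want to double-check, these two commands are a quick sanity check (not part of the original page):

<code bash>
# conda should resolve to the miniconda install in your NFS home
$ which conda
# list the conda environments available to you
$ conda env list
</code>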
<code bash>
# INSIDE the lab commands
# copying one file from your computer to the Labex server
$ scp my_file.txt user_name@ssh.tal.univ-paris13.fr:
# copying a whole folder
$ scp -r local_folder user_name@ssh.tal.univ-paris13.fr:
</code>
<code bash>
# OUTSIDE the lab commands
# copying files
$ scp -P 60022 my_file.txt user_name@tal.lipn.univ-paris13.fr:
# copying folders recursively
$ scp -P 60022 -r local_folder user_name@tal.lipn.univ-paris13.fr:
</code>
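To avoid retyping the non-standard port every time, you can define a small shell alias on your local machine (an illustrative convenience, not part of the original page):

<code bash>
# add to your local ~/.bashrc; 60022 is the external SSH port used above
alias talscp='scp -P 60022'
# then, from outside the lab:
$ talscp my_file.txt user_name@tal.lipn.univ-paris13.fr:
</code>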
Any data that you need to copy back from the server to your computer must be copied to your NFS home:
<code bash>
# OUTSIDE the lab commands
user_name@lipn-tal-labex:
user_name@lipn-tal-labex:
my_user@my_local_computer:
</code>
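From outside the lab, the same `scp -P 60022` commands work in the other direction to pull data from your NFS home. A sketch (the file name is illustrative):

<code bash>
# run on your local computer, OUTSIDE the lab
$ scp -P 60022 user_name@tal.lipn.univ-paris13.fr:~/results.txt .
</code>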
Now you will be asked if you want to add the *conda* base environment to your `.bashrc` file. Answer yes.
<code bash>
Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[no] >>> yes
</code>
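If you want to see exactly what the installer changed, you can inspect the end of your `.bashrc` (a quick check, not part of the original page):

<code bash>
# conda init appends a block delimited by '# >>> conda initialize >>>'
$ tail -n 15 ~/.bashrc
</code>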
Manually source your `.bashrc` file on your NFS home to activate the *miniconda* environment before installing Pytorch.
<code bash>
$ source ~/.bashrc
$ conda install pytorch -c pytorch
</code>
(Type `y` to proceed.) After a while, you need to test your Pytorch install. To test it, create the following `gpu_test.py` program with your favorite editor:
<code python>
# Python program to count GPU cards in the server using Pytorch
import torch
available_gpus = [torch.cuda.device(i) for i in range(torch.cuda.device_count())]
for gpu in available_gpus:
    print(gpu)
</code>
<code bash>
(base) user_name@lipn-tal-labex:~$ python gpu_test.py
<torch.cuda.device object at 0x7f29f0602d10>
<torch.cuda.device object at 0x7f29f0602d90>
<torch.cuda.device object at 0x7f29f0602e90>
<torch.cuda.device object at 0x7f29f0618cd0>
<torch.cuda.device object at 0x7f29f0618d10>
<torch.cuda.device object at 0x7f29f0618d90>
<torch.cuda.device object at 0x7f29f0618dd0>
<torch.cuda.device object at 0x7f29f0618e10>
</code>
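As a quicker sanity check, the same information is available from a one-liner, and `nvidia-smi` shows the cards from the driver side (both are standard commands, not from the original page):

<code bash>
$ python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
$ nvidia-smi
</code>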
### $ squeue
<code bash>
$ squeue
             JOBID PARTITION
              8795     labex QKVRegLA ghazi.fe  R 2-23:
              8796     labex QKVRegLA ghazi.fe  R 2-23:
              8812     labex MicrofEx gerardo.
</code>
### $ sbatch
If you simply run your code with `srun`, your job will try to use all the available resources (as in the `gpu_test.py` example from Section 3, Pytorch). The `sbatch` command is therefore useful to configure inputs, outputs and resource requirements for your job. The following example configures the `gpu_test.py` program to use only 3 GPUs and specifies output files for the job.
First, create a `myfirst_gpu_job.sh` file:
<code bash>
#!/bin/bash
#SBATCH --job-name=my_first_gpu_job
#SBATCH --gres=gpu:3
#SBATCH --qos=qos_gpu-t4
#SBATCH --cpus-per-task=5
#SBATCH --output=./MyFirstJob.out
#SBATCH --error=./MyFirstJob.err
#SBATCH --time=100:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
srun python gpu_test.py
</code>
These parameters specify a job to be run on 1 node, with 3 GPUs, in a maximum time of 100 hours. Normal output will be sent to `MyFirstJob.out` and errors to `MyFirstJob.err`.
Then you run the script with `sbatch`:
<code bash>
$ sbatch myfirst_gpu_job.sh
</code>
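On submission, `sbatch` prints the id of the new job; `scontrol show job` then gives a detailed view of the resources it requested (the output and job id below are illustrative; `scontrol` is standard Slurm, not shown in the original page):

<code bash>
Submitted batch job 4760
# detailed view of the job's requested resources
$ scontrol show job 4760
</code>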
<code bash>
$ squeue
              4760      lipn my_first garciafl PD
              4761      lipn SUPER_En
              4675      lipn    GGGS1 xudong.z  R 6-20:
              4715      lipn SUPER_En
              4752      lipn SUPER_En
</code>
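To stop one of your jobs, or to list only yours in a busy queue, the standard Slurm commands apply (the job id is illustrative):

<code bash>
# cancel a running or pending job by its id
$ scancel 4760
# show only your own jobs
$ squeue -u $USER
</code>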
| + | |||
| + | ## Troubleshooting | ||
| + | |||
| + | Any questions about this doc, write to [Jorge Garcia Flores](mailto: | ||