# Accessing the TAL/LABEX EFL GPU server
The server gives access to **8 GPUs Nvidia GEForce RTX 2080 with 8 GB of RAM each** in one node. This server is reserved for external @LipnLab [LABEX EFL](https://www.labex-efl.fr/) research partners. You need to [send us an email](mailto:jgflores@lipn.fr) to ask for a `tal-lipn` account in order to get access to this server.
  
## 1. Connecting to the server
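A minimal sketch, assuming the connection goes through the same SSH gateway and port used by the `scp` examples below (`lipnssh.univ-paris13.fr`, port `60022`); the exact procedure for your account may differ:

<code bash>
# hypothetical sketch: replace user_name with your tal-lipn account
# 1) connect to the LIPN SSH gateway
$ ssh -p 60022 user_name@lipnssh.univ-paris13.fr
# 2) from the gateway, hop to the GPU server
$ ssh tal.lipn.univ-paris13.fr
</code>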
## 2. Copying data

You can copy data to the server with `scp` (note the `-P 60022` port option):

<code bash>
$ scp -P 60022 my_file.txt user_name@lipnssh.univ-paris13.fr:~/
# copying folders recursively
$ scp -P 60022 -r local_folder user_name@tal.lipn.univ-paris13.fr:~/remote_folder
</code>
Any data that you need to copy back from the server to your computer must be copied to your NFS home:
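A minimal sketch of the copy-back step, assuming your results live in a file called `results.txt` (the file name and paths are placeholders):

<code bash>
# on the GPU server: move the results into your NFS home
$ cp results.txt ~/
# on your local machine: pull the file back through the SSH gateway
$ scp -P 60022 user_name@lipnssh.univ-paris13.fr:~/results.txt .
</code>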
[Slurm](https://slurm.schedmd.com/overview.html) is the Linux workload manager we use at LIPN to schedule and queue GPU jobs.
  
### srun
This is the basic command for running jobs in Slurm. The following example shows how to check the GPU models you are using and the CUDA version by running the `nvidia-smi` command with `srun`.
<code bash>
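# minimal sketch (not the original example): ask Slurm to run nvidia-smi on the node
$ srun nvidia-smi
# the output lists the GPU models allocated to you and the CUDA version reported by the driver
</code>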
**You can use it to run Python code, but as you are working on a shared server, it is better to run your code with `sbatch`.**
  
### sinfo and scontrol
These commands show how many nodes are available on the server.
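A minimal sketch of the two commands (the original output is not reproduced here):

<code bash>
# list the partitions and the state of each node
$ sinfo
# show the detailed configuration of every node (CPUs, memory, GPUs)
$ scontrol show node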
  
</code>
  
### squeue
  
If the server is full, Slurm will put your job in a wait queue. You can check the queue state with `squeue`.
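A minimal sketch (the original output is not reproduced here):

<code bash>
# show the whole queue, with the JOBID, state and owner of each job
$ squeue
# show only your own jobs
$ squeue -u $USER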
</code>
  
### sbatch
If you simply run your code with `srun`, your job will try to use all the available resources (as in the `gpu_test.py` example from Section 3 - Pytorch). The `sbatch` command is therefore useful to configure inputs, outputs and resource requirements for your job. The following example configures the `gpu_test.py` example to use only 3 GPUs and specifies output files for the job.
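A minimal sketch of such a submission script, assuming it is saved as `gpu_test.sh` (the file name and output paths are placeholders, not the original script):

<code bash>
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --gres=gpu:3             # request only 3 of the 8 GPUs
#SBATCH --output=gpu_test.out    # standard output of the job
#SBATCH --error=gpu_test.err     # standard error of the job

srun python gpu_test.py
# submit it with: sbatch gpu_test.sh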
  
</code>
  
### scancel
  
From time to time you need to kill a job. Use the `JOBID` number reported by the `squeue` command:
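A minimal sketch, assuming `squeue` reported a `JOBID` of `12345` (the number is a placeholder):

<code bash>
# kill the job with the given JOBID
$ scancel 12345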
</code>
  
## Troubleshooting

For any questions about this doc, write to [Jorge Garcia Flores](mailto:jgflores@lipn.fr).