# Accessing the TAL/LABEX EFL GPU server
The server gives access to **8 Nvidia GeForce RTX 2080 GPUs with 8 GB of RAM each** in one node. This server is reserved for external @LipnLab [LABEX EFL](https://
## 1. Connecting to the server
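A minimal connection sketch, assuming the same gateway host and port as the `scp` examples in the next section:

<code bash>
# gateway host and port are taken from the scp examples below (assumption)
$ ssh -p 60022 user_name@lipnssh.univ-paris13.fr
</code>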
## 2. Copying data

Use `scp` to copy your data to the server:

<code bash>
# copying a single file
$ scp -P 60022 my_file.txt user_name@lipnssh.univ-paris13.fr:
# copying folders recursively
$ scp -P 60022 -r local_folder user_name@tal.lipn.univ-paris13.fr:
</code>
Any data that you need to copy back from the server to your computer must be copied to your NFS home:
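A minimal sketch of the reverse copy, assuming a hypothetical `results` folder in your NFS home:

<code bash>
# pull a results folder from your NFS home back to your computer ("results" is hypothetical)
$ scp -P 60022 -r user_name@lipnssh.univ-paris13.fr:results local_results
</code>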
## 4. Slurm
[Slurm](https://slurm.schedmd.com/) is the job scheduler that manages all jobs running on the server.
### $ srun
This is the basic command for running jobs in Slurm. This example shows how to check the GPU models you are using and the CUDA version by running the `nvidia-smi` command with `srun`.
<code bash>
$ srun nvidia-smi
</code>
**You can use it to run Python code, but as you are working on a shared server, it is better to run your code with `sbatch`.**
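For instance, a sketch of an interactive run using the `gpu_test.py` script from Section 3:

<code bash>
# runs interactively and blocks your terminal until the job finishes
$ srun python gpu_test.py
</code>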
### $ sinfo / scontrol
These commands show how many nodes are available on the server.
<code bash>
# summary of partitions and node availability
$ sinfo
# detailed state of every node
$ scontrol show node
</code>
### $ squeue
If the server is full, Slurm will put your job in a waiting queue. You can check the queue state with `squeue`.
<code bash>
$ squeue
</code>
### $ sbatch
If you simply run your code with `srun`, your job will try to use all the available resources (like in the `gpu_test.py` example from Section 3 - Pytorch). The `sbatch` command is therefore useful to configure inputs, outputs and resource requirements for your job. The following example configures the `gpu_test.py` job to use only 3 GPUs and specifies output files for the job.
<code bash>
#!/bin/bash
# submission script matching the description above; exact directive values are assumptions
#SBATCH --job-name=gpu_test
#SBATCH --gres=gpu:3           # request only 3 GPUs
#SBATCH --output=gpu_test.out  # standard output file
#SBATCH --error=gpu_test.err   # standard error file

srun python gpu_test.py
</code>
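Assuming the script above is saved as `gpu_test.sh` (a hypothetical name), submit it with:

<code bash>
$ sbatch gpu_test.sh
</code>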
### $ scancel
From time to time you will need to kill a job. Use the `JOBID` number from the `squeue` command:
<code bash>
# replace JOBID with the job number shown by squeue
$ scancel JOBID
</code>
+ | |||
+ | ## Troubleshooting | ||
+ | |||
+ | Any questions about this doc, write to [Jorge Garcia Flores](mailto: |