How to use Spark on Grid5000

Welcome to Spark on Grid5000

1. Install hadoop_g5k: https://github.com/mliroz/hadoop_g5k/wiki
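If the install is done with a user-level pip (an assumption; the linked wiki has the authoritative steps), the hg5k and spark_g5k scripts end up under /home/yourUserName/.local/bin, which is why the PATH addition below is needed:

   # Hypothetical user-level install placing hg5k/spark_g5k in ~/.local/bin
   pip install --user hadoop_g5k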

Create the file /home/yourUserName/.bash_profile if it does not exist.

Add the following lines:

         PATH="/home/yourUserName/.local/bin:$PATH”
         export PATH
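To apply the change in the current session and confirm the tools are found, a quick sanity check:

   # Reload the profile in the current shell, then locate the hg5k script
   source ~/.bash_profile
   which hg5k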

Initialize cluster

Reserve nodes

https://www.grid5000.fr/mediawiki/index.php/Getting_Started

Some examples

   # Batch reservation: 10 nodes for 2 hours, starting at the given date
   oarsub -t allow_classic_ssh -l nodes=10,walltime=2 -r '2015-06-14 19:30:00'
   # Same, restricted to a specific cluster with the -p property filter
   oarsub -p "cluster='paranoia'" -t allow_classic_ssh -l nodes=8,walltime=12 -r '2015-07-09 21:14:01'
   # Interactive (-I) job on a specific cluster, starting as soon as possible
   oarsub -I -p "cluster='paranoia'" -t allow_classic_ssh -l nodes=8,walltime=12
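To see where a submission stands, OAR's oarstat command can be used, e.g. filtered to your own jobs (yourUserName is your Grid5000 login):

   # List your jobs and their current states (Waiting, Running, ...)
   oarstat -u yourUserName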

Connect to an existing reservation

   # Attach a shell to an already running job
   oarsub -C job_ID

Take nodes directly

   # Interactive job: get 6 nodes for 2 hours immediately
   oarsub -I -t allow_classic_ssh -l nodes=6,walltime=2

Cluster initialization

Prerequisite

 Whichever cluster you use, the compressed Hadoop and Spark archives must be available in one of your directories; here, public is used.
   # Describe a Hadoop cluster (major version 2) on the reserved nodes
   hg5k --create $OAR_NODEFILE --version 2
   # Deploy the Hadoop distribution to the nodes
   hg5k --bootstrap /home/yourUserName/public/hadoop-2.6.0.tar.gz
   # Initialize the configuration and start the Hadoop daemons
   hg5k --initialize feeling_lucky --start
   # Create a Spark cluster running on YARN, on top of Hadoop cluster 1
   spark_g5k --create YARN --hid 1
   # Deploy the Spark distribution to the nodes
   spark_g5k --bootstrap /home/yourUserName/public/spark-1.6.0-bin-hadoop2.6.tgz
   # Initialize the Spark configuration and start it
   spark_g5k --initialize feeling_lucky --start
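Once both layers are up, the Hadoop side can be checked before submitting anything. A minimal sketch; it assumes general is accepted by --state in the same way files is used further below:

   # Print a summary of the Hadoop cluster ("general" is an assumption)
   hg5k --state general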


Put files on HDFS

   # Copy a local file into the DFS (local path first, DFS path second)
   hg5k --putindfs myfile.csv /myfile.csv
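For a quick end-to-end test, a throwaway file can be generated, uploaded, and listed; the file name and content are purely illustrative (--state files is described below):

   # Generate a tiny CSV, push it to the DFS, then list DFS contents
   echo "a,b,c" > test.csv
   hg5k --putindfs test.csv /test.csv
   hg5k --state files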

Execute a jar file

   # Submit a Scala/Java jar to the Spark cluster
   spark_g5k --scala_job myprgm.jar
   # Same submission with explicit resource settings via exec_params
   spark_g5k --scala_job --exec_params executor-memory=1g driver-memory=1g num-executors=2 executor-cores=3 myprgm.jar
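The exec_params keys map onto Spark's usual resource settings (executor memory, driver memory, number of executors, cores per executor), so the same jar can be resized without touching the code; the values here are illustrative only:

   # Same flags with different sizing: more, smaller executors
   spark_g5k --scala_job --exec_params executor-memory=512m driver-memory=512m num-executors=4 executor-cores=1 myprgm.jar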

List files on HDFS

   # List the files currently stored in the DFS
   hg5k --state files

Get a result file named res

   # Copy res from the DFS into a local directory
   hg5k --getfromdfs res /home/yourUserName/reims
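Once copied, the result can be inspected with ordinary shell tools; note that Spark jobs often write a directory of part-* files rather than a single file:

   # Check what was fetched into the local directory
   ls -l /home/yourUserName/reims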


Properly destroy the cluster

   # Remove the Spark layer first, then the underlying Hadoop cluster
   spark_g5k --delete
   hg5k --delete
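If you finish before the walltime expires, the OAR reservation itself can also be released; job_ID is the id printed by oarsub:

   # Release the reservation so the nodes return to the free pool
   oardel job_ID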


Miscellaneous

  1. List the nodes of your reservation

   # $OAR_NODEFILE holds one line per reserved core; uniq gives one per node
   uniq $OAR_NODEFILE
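Because the jobs were submitted with allow_classic_ssh, the same list can drive simple per-node commands; a minimal sketch:

   # Count the reserved nodes
   uniq $OAR_NODEFILE | wc -l
   # Run a command on every node (allow_classic_ssh permits plain ssh)
   for node in $(uniq $OAR_NODEFILE); do ssh "$node" hostname; done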