How to use Spark on Grid5000

Welcome to Spark on Grid5000

1 : Install hadoop_g5k (see https://github.com/mliroz/hadoop_g5k/wiki)
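
hadoop_g5k is a Python package; a typical user-local installation (a sketch, assuming the package is published on PyPI, as the ~/.local/bin PATH entry below suggests) looks like:

   pip install --user hadoop_g5k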

Create the file /home/yourUserName/.bash_profile if it does not already exist.

Add the following lines:

   PATH="/home/yourUserName/.local/bin:$PATH"
   export PATH
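
Then reload the profile so the new PATH takes effect in the current session:

   source /home/yourUserName/.bash_profile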

Initialize the cluster

Reserve nodes

See the Grid5000 Getting Started guide: https://www.grid5000.fr/mediawiki/index.php/Getting_Started

Some examples:

   oarsub -t allow_classic_ssh -l nodes=10,walltime=2 -r '2015-06-14 19:30:00'
   oarsub -p "cluster='paranoia'" -t allow_classic_ssh -l nodes=8,walltime=12 -r '2015-07-09 21:14:01'
   oarsub -I -p "cluster='paranoia'" -t allow_classic_ssh -l nodes=8,walltime=12
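
In these commands, -t allow_classic_ssh lets you ssh directly to the reserved nodes, -l sets the number of nodes and the walltime (in hours), -r schedules the reservation for a given date and time, -p restricts it to a particular cluster, and -I makes the job interactive.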

Connect to an existing reservation

   oarsub -C job_ID

Take nodes directly (interactive job, immediate start)

   oarsub -I -t allow_classic_ssh -l nodes=6,walltime=2
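
Inside a job, OAR sets the $OAR_NODEFILE variable to the path of a file listing the reserved nodes; the hg5k command below relies on it.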

Cluster initialization

Prerequisite: whichever cluster you are on, the compressed Hadoop and Spark archives must be present in one of your directories, here public.

   # create a Hadoop 2 cluster from the reserved nodes
   hg5k --create $OAR_NODEFILE --version 2
   # deploy Hadoop on the nodes from the local archive
   hg5k --bootstrap /home/yourUserName/public/hadoop-2.6.0.tar.gz
   # configure and start Hadoop
   hg5k --initialize feeling_lucky --start
   # create a Spark cluster running on YARN, on top of the Hadoop cluster with id 1
   spark_g5k --create YARN --hid 1
   # deploy Spark on the nodes from the local archive
   spark_g5k --bootstrap /home/yourUserName/public/spark-1.6.0-bin-hadoop2.6.tgz
   # configure and start Spark
   spark_g5k --initialize feeling_lucky --start
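
The --hid option refers to the id of the Hadoop cluster created above, presumably 1 if it is the first cluster you have created.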


Put files on HDFS

   hg5k --putindfs myfile.csv /myfile.csv
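
To check that the file actually landed in HDFS, list the stored files (the same command as in the "Find files on HDFS" step below):

   hg5k --state files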

Execute a jar file

   # run the jar with default parameters
   spark_g5k --scala_job myprgm.jar
   # run the jar with explicit resource parameters
   spark_g5k --scala_job --exec_params executor-memory=1g driver-memory=1g num-executors=2 executor-cores=3 myprgm.jar
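
The exec_params names match the standard spark-submit options. For reference, an equivalent plain spark-submit invocation (a sketch, assuming the jar's manifest declares a main class; otherwise add --class) would be:

   spark-submit --master yarn --executor-memory 1g --driver-memory 1g --num-executors 2 --executor-cores 3 myprgm.jar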

Find files on HDFS

   hg5k --state files

Get the result file named res from HDFS into a local directory

   hg5k --getfromdfs res /home/yourUserName/reims


Destroy the cluster properly

   spark_g5k --delete
   hg5k --delete


Accessory

  1. List the resources of your reservation

   uniq $OAR_NODEFILE
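
$OAR_NODEFILE contains one line per reserved core, so uniq prints each host exactly once.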