Apache Spark: Difference between revisions

Revision as of 30 January 2016, 14:00

We use Spark, an open-source framework that is well suited to machine learning algorithms: it supports applications that reuse a working set of data across operations, while providing scalability and fault tolerance properties similar to those of MapReduce.

Spark's main strength is its ability to keep an RDD in RAM; the time saved is considerable for algorithms that iterate over the same data set.
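To make the benefit of in-memory reuse concrete, here is a minimal toy model in plain Python. It does not use Spark: `LazyDataset`, `iterate_mean` and `load` are hypothetical names invented for this illustration, and the "data set" is a four-element list. It only shows that an iterative computation rebuilds its input on every pass unless the data is kept in memory.

```python
# Toy model of why in-memory caching helps iterative algorithms.
# LazyDataset is a hypothetical stand-in for an RDD, not Spark's API.

class LazyDataset:
    def __init__(self, loader):
        self._loader = loader      # function that (re)builds the data
        self._cached = None
        self._use_cache = False
        self.loads = 0             # how many times the loader actually ran

    def cache(self):
        """Ask for the data to be kept in memory after the first computation."""
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.loads += 1
        data = self._loader()
        if self._use_cache:
            self._cached = data
        return data


def iterate_mean(dataset, iterations):
    """Trivial iterative computation that rereads the data on each pass."""
    estimate = 0.0
    for _ in range(iterations):
        data = dataset.collect()
        # Move the estimate halfway toward the data mean on every pass.
        estimate += 0.5 * (sum(data) / len(data) - estimate)
    return estimate


def load():
    # Stand-in for an expensive read from disk or a distributed file system.
    return [1.0, 2.0, 3.0, 4.0]


uncached = LazyDataset(load)
cached = LazyDataset(load).cache()
iterate_mean(uncached, 10)
iterate_mean(cached, 10)
print(uncached.loads, cached.loads)  # prints "10 1"
```

In Spark itself the corresponding request is made with `cache()` or `persist()` on the RDD; the toy class above only mimics that behaviour on a single machine.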

Use case 1 : Spark implementation of Nearest Neighbours Mean Shift using LSH (with Gael Beck)

We use Spark to implement a well-known clustering algorithm, mean shift. Results are encouraging: they show that tripling the number of nodes in the cluster halves the execution time.

https://github.com/Kybe67/Mean-Shift-LSH
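As a rough illustration of the nearest-neighbours mean shift idea, the sketch below moves each point to the mean of its k nearest data points and repeats this a fixed number of times. It is a single-machine sketch under stated assumptions, not the code from the repository above: the neighbour search is brute force rather than LSH, and the function names, the value of `k` and the sample data are all chosen for this example.

```python
import math

def k_nearest(point, points, k):
    """Return the k points closest to `point` (brute force, Euclidean).

    Assumption: this exhaustive search stands in for the approximate
    LSH-based neighbour search named in the use case title.
    """
    return sorted(points, key=lambda q: math.dist(point, q))[:k]

def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def nn_mean_shift(data, k, iterations):
    """Shift every point toward the mean of its k nearest original data points."""
    shifted = list(data)
    for _ in range(iterations):
        shifted = [mean(k_nearest(p, data, k)) for p in shifted]
    return shifted

# Hypothetical sample: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
modes = nn_mean_shift(data, k=3, iterations=5)
for point, mode in zip(data, modes):
    print(point, "->", mode)
```

Points from the same group end up at the same location, which can then be read as the cluster they belong to. The brute-force search compares every point with every other point on each pass, which is the cost an approximate neighbour search such as LSH is meant to reduce.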


Use case 2 : SOM-MapReduce/Spark (Self-Organizing Map using MapReduce, with Tugdual Sarazin)

Self-organizing maps are increasingly used as visualization tools, because they project data into low-dimensional spaces, generally two-dimensional ones. We have designed two scalable implementations of the SOM-MapReduce algorithm.
https://github.com/TugdualSarazin/spark-clustering
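One common way to express a self-organizing map in map and reduce steps is the batch formulation, sketched below in plain Python. This is an assumption made for illustration and not necessarily how the implementations in the repository above are organized: the function names, the Gaussian neighbourhood, the value of `sigma` and the tiny two-cell map are all hypothetical. Each data point emits, for every map cell, a weighted copy of itself and the weight; the reduce step sums these pairs, and each new prototype is the ratio of the two sums.

```python
import math
from functools import reduce

def bmu(x, prototypes):
    """Index of the best-matching unit (closest prototype) for x."""
    return min(range(len(prototypes)), key=lambda c: math.dist(x, prototypes[c]))

def neighbourhood(c, winner, grid, sigma):
    """Gaussian neighbourhood weight between two cells of the map grid."""
    d2 = sum((a - b) ** 2 for a, b in zip(grid[c], grid[winner]))
    return math.exp(-d2 / (2 * sigma ** 2))

def map_point(x, prototypes, grid, sigma):
    """Map step: per-cell (weighted observation, weight) for one data point."""
    winner = bmu(x, prototypes)
    hs = [neighbourhood(c, winner, grid, sigma) for c in range(len(prototypes))]
    return [([h * v for v in x], h) for h in hs]

def combine(a, b):
    """Reduce step: add the per-cell partial sums of two map outputs."""
    return [([u + v for u, v in zip(na, nb)], da + db)
            for (na, da), (nb, db) in zip(a, b)]

def som_iteration(data, prototypes, grid, sigma):
    """One batch update: new prototype = weighted sum / sum of weights."""
    totals = reduce(combine, (map_point(x, prototypes, grid, sigma) for x in data))
    return [tuple(v / den for v in num) for num, den in totals]

# Hypothetical 1 x 2 map and four two-dimensional observations.
grid = [(0, 0), (0, 1)]
prototypes = [(0.0, 0.0), (1.0, 1.0)]
data = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
new_prototypes = som_iteration(data, prototypes, grid, sigma=0.3)
print(new_prototypes)
```

Because `combine` is associative and commutative, the partial sums can be computed on separate partitions and merged in any order, which is what makes this formulation a natural fit for a distributed map and reduce.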


Use case 3 : BITM-MR/Spark (Biclustering using Self-Organizing Map and MapReduce, with Tugdual Sarazin)

https://github.com/TugdualSarazin/spark-clustering


Use case 4 : WADA, Web Application for Data Analysis (with students from Villetaneuse IUT)

https://github.com/CamilleGR/Wada

Use case 5 : G-Stream: Growing Neural Gas for Clustering Data Streams using Spark Streaming (with Mohammed Ghesmoune)

    to appear