Apache Spark
Version of 30 January 2016, 14:00
We use Spark, an emerging open-source cluster computing framework that is well suited to machine learning algorithms: it supports applications that reuse a working set of data across parallel operations, while providing scalability and fault-tolerance properties similar to those of MapReduce.
Spark's main strength is its ability to keep RDDs (Resilient Distributed Datasets) in RAM. For iterative algorithms that repeatedly process the same dataset, the time saved is considerable.
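Why in-memory reuse matters for iterative algorithms can be shown with a plain-Python analogy (not Spark code): a toy iterative estimator scans the whole dataset on every pass, and a counter records how many input lines are parsed. With `cache=True` the parsed data is kept in memory after the first pass, which is roughly the role `RDD.cache()` or `RDD.persist()` plays in Spark; with `cache=False` every pass re-reads the input. The function names and the counter are hypothetical, chosen only for this illustration.

```python
def parse_records(lines, counter):
    """Parse text lines into floats, counting parsed lines (a stand-in for disk I/O)."""
    for line in lines:
        counter["parsed"] += 1
        yield float(line)


def iterative_mean_estimate(lines, n_iter, cache):
    """Toy iterative algorithm: every pass scans the full dataset to refine an estimate.

    cache=True keeps the parsed data in memory after the first read;
    cache=False re-parses the input on every pass.
    Returns (estimate, number of lines parsed).
    """
    counter = {"parsed": 0}
    cached = list(parse_records(lines, counter)) if cache else None
    estimate = 0.0
    for _ in range(n_iter):
        data = cached if cache else parse_records(lines, counter)
        # One gradient-descent step (learning rate 0.25) on the mean squared error.
        total, n = 0.0, 0
        for x in data:
            total += x - estimate
            n += 1
        estimate += 0.5 * total / n
    return estimate, counter["parsed"]


if __name__ == "__main__":
    lines = ["1", "2", "3", "4"]
    print(iterative_mean_estimate(lines, n_iter=10, cache=True))
    print(iterative_mean_estimate(lines, n_iter=10, cache=False))
```

Both calls produce the same estimate, but without caching the number of parsed lines grows linearly with the number of iterations.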
Use case 1: Spark implementation of Nearest Neighbours Mean Shift using LSH (with Gael Beck)
We use Spark to implement mean shift, a well-known clustering algorithm. The results are encouraging: tripling the number of nodes in the cluster halves the execution time.
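Taking the reported result as a factor-2 reduction in execution time when the number of nodes is multiplied by 3, the speedup $S$ and parallel efficiency $E$ work out as below. $T_n$ denotes the execution time on $n$ nodes; this notation is introduced here and the figures are derived only from the stated ratio, not from additional measurements.

```latex
S = \frac{T_n}{T_{3n}} = 2, \qquad E = \frac{S}{3} = \frac{2}{3} \approx 0.67
```

In other words, the scaling reaches about two thirds of the ideal linear speedup of 3.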
https://github.com/Kybe67/Mean-Shift-LSH
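The nearest-neighbours mean shift idea can be illustrated on a single machine. The sketch below is plain Python and is not the code from the Mean-Shift-LSH repository. It assumes random-projection (p-stable) LSH with hypothetical parameters `width`, `n_proj` and `k`. Each point is repeatedly moved to the mean of its k nearest neighbours, searched only within its LSH bucket, and points whose final positions (modes) lie within a hypothetical tolerance `tol` of each other are given the same cluster label. On Spark, the independent per-point loop would presumably be the unit that gets distributed.

```python
import math
import random
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_projections(dim, n_proj, width, seed=0):
    """Random projections for p-stable LSH: a Gaussian direction plus a uniform offset."""
    rng = random.Random(seed)
    return [(tuple(rng.gauss(0.0, 1.0) for _ in range(dim)), rng.uniform(0.0, width))
            for _ in range(n_proj)]


def lsh_key(point, projections, width):
    """Bucket key: one floor((a . x + b) / width) value per projection."""
    return tuple(math.floor((sum(a * x for a, x in zip(vec, point)) + offset) / width)
                 for vec, offset in projections)


def nn_mean_shift(data, k, n_iter, projections, width):
    """Shift each point to the mean of its k nearest neighbours, found in its LSH bucket."""
    buckets = defaultdict(list)
    for p in data:
        buckets[lsh_key(p, projections, width)].append(p)
    modes = []
    for p in data:
        y = p
        for _ in range(n_iter):
            candidates = buckets.get(lsh_key(y, projections, width), [])
            if len(candidates) < k:
                candidates = data  # fall back to an exact search when the bucket is too small
            neighbours = sorted(candidates, key=lambda q: dist2(q, y))[:k]
            y = tuple(sum(coord) / len(neighbours) for coord in zip(*neighbours))
        modes.append(y)
    return modes


def group_modes(modes, tol):
    """Label points: modes within tol of an existing centre join that centre's cluster."""
    centres, labels = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if dist2(m, c) <= tol * tol:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels, centres


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    proj = make_projections(dim=2, n_proj=2, width=4.0)
    modes = nn_mean_shift(data, k=2, n_iter=5, projections=proj, width=4.0)
    labels, centres = group_modes(modes, tol=1.0)
    print(labels, len(centres))
```

Restricting the neighbour search to one bucket is what makes the method scale: the cost of each shift depends on the bucket size rather than on the full dataset, at the price of occasionally missing a true neighbour that hashed elsewhere.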
Use case 2: SOM-MapReduce/Spark (Self-Organizing Map using MapReduce, with Tugdual Sarazin)
Self-organizing maps (SOMs) are increasingly used as visualization tools because they project data onto low-dimensional spaces, generally two-dimensional grids. We have designed two scalable implementations of the SOM-MapReduce algorithm.
https://github.com/TugdualSarazin/spark-clustering
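As a rough illustration of how a SOM fits the MapReduce model, the sketch below assumes the batch SOM update, in which each prototype becomes a neighbourhood-weighted mean of the data. This is an assumption: the implementations in the spark-clustering repository may differ. Each data partition computes per-neuron partial sums (map), the partial sums are added together (reduce), and the prototypes are then recomputed. The function names, the `grid` coordinates and the fixed neighbourhood width `sigma` are hypothetical, and plain Python stands in for Spark.

```python
import math
from functools import reduce


def bmu(x, weights):
    """Index of the best-matching unit (closest prototype) for sample x."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])))


def neighbourhood(grid, i, j, sigma):
    """Gaussian neighbourhood between cells i and j of the 2-D map grid."""
    d2 = (grid[i][0] - grid[j][0]) ** 2 + (grid[i][1] - grid[j][1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))


def map_partition(partition, weights, grid, sigma):
    """Map phase: per-neuron partial sums of h * x (numerators) and h (denominators)."""
    dim = len(weights[0])
    num = [[0.0] * dim for _ in weights]
    den = [0.0] * len(weights)
    for x in partition:
        c = bmu(x, weights)
        for j in range(len(weights)):
            h = neighbourhood(grid, c, j, sigma)
            den[j] += h
            for d in range(dim):
                num[j][d] += h * x[d]
    return num, den


def combine(a, b):
    """Reduce phase: add two partial results element-wise."""
    num = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    den = [u + v for u, v in zip(a[1], b[1])]
    return num, den


def batch_som_step(partitions, weights, grid, sigma):
    """One batch SOM epoch: map over partitions, reduce, then recompute the prototypes."""
    num, den = reduce(combine, (map_partition(p, weights, grid, sigma) for p in partitions))
    return [[n / d for n in row] if d > 0 else list(w)
            for row, d, w in zip(num, den, weights)]


if __name__ == "__main__":
    grid = [(0, 0), (0, 1)]  # a 1 x 2 map
    weights = [[0.0], [1.0]]
    partitions = [[(0.0,), (0.2,)], [(4.0,), (4.2,)]]
    for _ in range(3):
        weights = batch_som_step(partitions, weights, grid, sigma=0.3)
    print(weights)
```

Because the partial sums are additive, the result does not depend on how the data is split across partitions. On Spark, `map_partition` would roughly correspond to an `RDD.mapPartitions` call and `combine` to `RDD.reduce`; in practice `sigma` is usually decreased over the epochs, which is omitted here for brevity.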
Use case 3: BITM-MR/Spark (Biclustering using Self-Organizing Map and MapReduce, with Tugdual Sarazin)
https://github.com/TugdualSarazin/spark-clustering
Use case 4: WADA, Web Application for Data Analysis (with students from Villetaneuse IUT)
https://github.com/CamilleGR/Wada
Use case 5: G-Stream: Growing Neural Gas for Clustering Data Streams using Spark Streaming (with Mohammed Ghesmoune)
To appear.