Apache Spark: Difference between revisions
Revision as of 30 January 2016, 14:00
We use Spark, an open-source cluster computing framework that is well suited to machine learning algorithms: it supports applications that reuse a working set of data across operations, while providing scalability and fault tolerance similar to MapReduce.
Spark's main strength is its ability to keep an RDD (resilient distributed dataset) in RAM; the time saved is considerable for algorithms that iterate over the same data set.
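To illustrate why keeping a data set in memory pays off for iterative algorithms, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the `LazyDataset` class, its `cache()` and `collect()` methods, and the `load` function are hypothetical names chosen for this illustration. It only counts how often the data has to be rebuilt when an algorithm makes ten passes over it.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (NOT Spark's RDD API)."""

    def __init__(self, compute):
        self._compute = compute    # function that rebuilds the data from scratch
        self._cached = None        # in-memory copy, filled on first use after cache()
        self._use_cache = False
        self.computations = 0      # how many times the data was rebuilt

    def cache(self):
        """Ask for the data to be kept in memory after its first computation."""
        self._use_cache = True
        return self

    def collect(self):
        """Return the data, rebuilding it unless a cached copy is available."""
        if self._use_cache and self._cached is not None:
            return self._cached
        self.computations += 1
        data = self._compute()
        if self._use_cache:
            self._cached = data
        return data


def iterate_mean(dataset, iterations):
    """Iterative pass over the same data: move an estimate halfway to the mean."""
    estimate = 0.0
    for _ in range(iterations):
        points = dataset.collect()
        estimate += 0.5 * (sum(points) / len(points) - estimate)
    return estimate


def load():
    # Hypothetical expensive step (for example reading and parsing input).
    return [float(x) for x in range(1, 101)]


uncached = LazyDataset(load)
cached = LazyDataset(load).cache()
result_uncached = iterate_mean(uncached, 10)
result_cached = iterate_mean(cached, 10)
print(uncached.computations, cached.computations)
```

Both runs produce the same estimate; the difference is only in how many times `load` is executed, which is the cost that in-memory caching removes.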
Use case 1: Spark implementation of Nearest Neighbours Mean Shift using LSH (with Gael Beck)
We use Spark to implement a well-known clustering algorithm, mean shift. The results are encouraging: tripling the number of nodes in the cluster halves the execution time.
https://github.com/Kybe67/Mean-Shift-LSH
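As a rough single-machine illustration of the idea, the sketch below shifts every point to the mean of its k nearest neighbours for a fixed number of iterations. It is not the code from the repository above: it assumes that "nearest neighbours mean shift" means this k-nearest-neighbour variant, it replaces LSH with an exhaustive neighbour search, and it does not use Spark. The function name `knn_mean_shift` and the sample data are hypothetical.

```python
import math


def knn_mean_shift(points, k, iterations):
    """Shift every point to the mean of its k nearest neighbours, repeatedly.

    Brute-force neighbour search, O(n^2 log n) per iteration; LSH is not used here.
    """
    current = [tuple(p) for p in points]
    for _ in range(iterations):
        shifted = []
        for p in current:
            # The k points closest to p (p itself included, at distance 0).
            neighbours = sorted(current, key=lambda q: math.dist(p, q))[:k]
            dim = len(p)
            shifted.append(
                tuple(sum(q[d] for q in neighbours) / k for d in range(dim))
            )
        current = shifted
    return current


# Two small, well-separated groups of 2-D points (made-up data).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
modes = knn_mean_shift(data, k=3, iterations=5)
for original, mode in zip(data, modes):
    print(original, "->", mode)
```

Points that end up at (nearly) the same location can then be treated as one cluster. The exhaustive search is the part that an approximate scheme such as LSH would replace on large data sets.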
Use case 2: SOM-MapReduce/Spark (Self-Organizing Map using MapReduce, with Tugdual Sarazin)
Self-organizing maps are increasingly used as visualization tools, since they project data onto low-dimensional spaces, generally two-dimensional ones. We have designed two scalable implementations of the SOM-MapReduce algorithm.
https://github.com/TugdualSarazin/spark-clustering
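One common way to phrase a batch SOM update as a map phase followed by a reduce phase is sketched below in plain Python. This is an assumption about the general shape of such an algorithm, not the implementation in the repository above: the function names, the Gaussian neighbourhood with width `sigma`, the one-dimensional grid, and the sample data are all hypothetical.

```python
import math


def best_matching_unit(point, prototypes):
    """Index of the prototype closest to the point."""
    return min(range(len(prototypes)), key=lambda i: math.dist(point, prototypes[i]))


def som_batch_step(points, prototypes, grid, sigma):
    """One batch SOM update written as a map phase followed by a reduce phase."""
    # Map: each point emits, for every unit j, a weighted copy of itself and the
    # weight, where the weight depends on the grid distance between unit j and
    # the point's best matching unit (BMU).
    emitted = []
    for x in points:
        bmu = best_matching_unit(x, prototypes)
        for j, pos in enumerate(grid):
            h = math.exp(-math.dist(grid[bmu], pos) ** 2 / (2 * sigma ** 2))
            emitted.append((j, ([h * v for v in x], h)))

    # Reduce: sum the weighted points (numerator) and weights (denominator) per unit.
    sums = {}
    for j, (num, den) in emitted:
        if j not in sums:
            sums[j] = (list(num), den)
        else:
            acc_num, acc_den = sums[j]
            sums[j] = ([a + b for a, b in zip(acc_num, num)], acc_den + den)

    # New prototype of each unit = neighbourhood-weighted mean of all points.
    return [tuple(v / sums[j][1] for v in sums[j][0]) for j in range(len(prototypes))]


# Made-up example: a map with two units on a 1-D grid, data in two groups.
grid = [(0.0,), (1.0,)]
prototypes = [(0.0, 0.0), (1.0, 1.0)]
points = [(0.0, 0.0), (0.2, 0.0), (4.0, 4.0), (4.2, 4.0)]
updated = som_batch_step(points, prototypes, grid, sigma=0.5)
print(updated)
```

Because the reduce phase only adds partial sums, the emitted pairs can be combined in any order, which is what makes this formulation a natural fit for MapReduce-style frameworks.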
Use case 3: BITM-MR/Spark (Biclustering using Self-Organizing Map and MapReduce, with Tugdual Sarazin)
https://github.com/TugdualSarazin/spark-clustering
Use case 4: WADA, Web Application for Data Analysis (with students from Villetaneuse IUT)
https://github.com/CamilleGR/Wada
Use case 5: G-Stream: Growing Neural Gas for Clustering Data Streams using Spark Streaming (with Mohammed Ghesmoune)
to appear