Apache Spark

Revision as of 30 January 2016, 14:00

We use Spark, an emerging open-source framework well suited to machine learning algorithms. It supports applications that reuse a working set of data across parallel operations, while providing scalability and fault-tolerance properties similar to those of MapReduce.

Spark's main strength is its ability to keep RDDs (Resilient Distributed Datasets) in RAM. The time saved is considerable for iterative algorithms that repeatedly use the same data set.
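A simplified cost model makes the gain concrete. This model is our own simplification and ignores shuffles, scheduling overhead and memory limits. Consider an algorithm that runs k iterations over the same data set, where loading the data from disk costs L per pass and one iteration of computation costs C. Re-reading the data at every iteration, as a chain of MapReduce jobs does, can be compared with loading it once and keeping it in RAM:

```latex
T_{\text{reload}} \approx k\,(L + C), \qquad
T_{\text{cached}} \approx L + k\,C, \qquad
T_{\text{reload}} - T_{\text{cached}} \approx (k - 1)\,L
```

For example, with k = 10 and L = C, the cached version needs about 11 units of time instead of 20, roughly 1.8 times faster. The more iterations and the heavier the load step, the larger the gain.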

Use case 1: Spark implementation of Nearest Neighbours Mean Shift using LSH (with Gael Beck)

We use Spark to implement a well-known clustering algorithm, mean shift. The results are encouraging: tripling the number of nodes in the cluster halves the execution time.
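This result can be expressed in the usual terms of parallel performance. Take speedup S as the ratio of execution times, and efficiency E as the speedup divided by the increase in resources. These are standard definitions, not figures reported by the project. Tripling the number of nodes n while halving the execution time T then gives:

```latex
S = \frac{T(n)}{T(3n)} = 2, \qquad E = \frac{S}{3} = \frac{2}{3} \approx 0.67
```

Perfectly linear scaling would give S = 3 and E = 1.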

https://github.com/Kybe67/Mean-Shift-LSH
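The idea of nearest-neighbour mean shift with LSH can be sketched in plain Python without Spark. This is a minimal, single-machine sketch under our own assumptions, not the repository's code. Points are bucketed once by a Gaussian (p-stable) random-projection hash. Within its bucket, each point is repeatedly moved to the mean of its k nearest neighbours. Finally, points whose shifted positions chain together within a distance eps receive the same label. The function names and parameters (knn_mean_shift_lsh, k, n_iter, n_hashes, w, eps, seed) are hypothetical.

```python
import math
import random
from collections import defaultdict


def lsh_key(point, projections, w):
    """p-stable (Gaussian) LSH key: one integer per projection, floor((a . x + b) / w)."""
    return tuple(
        math.floor((sum(a_d * x_d for a_d, x_d in zip(a, point)) + b) / w)
        for a, b in projections
    )


def knn_mean_shift_lsh(points, k=3, n_iter=5, n_hashes=2, w=4.0, eps=1.0, seed=0):
    """Nearest-neighbour mean shift with k-NN search restricted to LSH buckets.

    Hypothetical parameters: k (neighbours), n_iter (shift rounds), n_hashes and w
    (LSH settings), eps (merge distance for the final grouping). Returns one label per point.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
        for _ in range(n_hashes)
    ]

    # Bucket the original points once; on Spark this key could drive the partitioning.
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        buckets[lsh_key(p, projections, w)].append(idx)

    shifted = [list(p) for p in points]
    for _ in range(n_iter):
        new_shifted = [None] * len(points)
        for members in buckets.values():
            for i in members:
                # Move point i to the mean of its k nearest neighbours in its bucket (itself included).
                nearest = sorted(members, key=lambda j: math.dist(shifted[i], shifted[j]))[:k]
                new_shifted[i] = [
                    sum(shifted[j][d] for j in nearest) / len(nearest) for d in range(dim)
                ]
        shifted = new_shifted

    # Final grouping: points whose shifted positions are chained within eps share a label.
    labels = [-1] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(shifted[i], shifted[j]) <= eps:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

Restricting the k-NN search to a bucket is what makes the approach parallel: on a cluster, each bucket could be processed independently by one executor. The cost is that a point near a bucket boundary may miss some of its true neighbours.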

Use case 2: SOM-MapReduce/Spark (Self-Organizing Map using MapReduce, with Tugdual Sarazin)

Self-organizing maps (SOMs) are increasingly used as visualization tools, since they project data onto low-dimensional spaces, generally two-dimensional ones. We have designed two scalable implementations of the SOM-MapReduce algorithm.

https://github.com/TugdualSarazin/spark-clustering
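One common way to express SOM training in map/reduce form is the batch SOM update. We assume it here, but the source does not say it is the exact formulation used in spark-clustering. In this scheme, each new prototype is the neighbourhood-weighted mean of all points. The mapper finds a point's best matching unit and emits a weighted contribution for every neuron. The reducer sums those contributions per neuron. The function names and the Gaussian neighbourhood with a temperature parameter are illustrative choices.

```python
import math


def neighbourhood(pos_a, pos_b, temperature):
    """Gaussian neighbourhood between two neurons, from their 2-D grid positions."""
    d2 = (pos_a[0] - pos_b[0]) ** 2 + (pos_a[1] - pos_b[1]) ** 2
    return math.exp(-d2 / (2.0 * temperature ** 2))


def best_matching_unit(x, prototypes):
    """Grid position of the prototype closest to x (Euclidean distance)."""
    return min(prototypes, key=lambda pos: math.dist(x, prototypes[pos]))


def map_point(x, prototypes, temperature):
    """Mapper: for one point, emit (neuron, (h * x, h)) for every neuron of the map."""
    bmu = best_matching_unit(x, prototypes)
    for pos in prototypes:
        h = neighbourhood(pos, bmu, temperature)
        yield pos, ([h * v for v in x], h)


def reduce_contributions(pairs):
    """Reducer: per neuron, sum the weighted vectors and the weights."""
    totals = {}
    for pos, (weighted, h) in pairs:
        if pos in totals:
            num, den = totals[pos]
            totals[pos] = ([a + b for a, b in zip(num, weighted)], den + h)
        else:
            totals[pos] = (weighted, h)
    return totals


def batch_som_epoch(data, prototypes, temperature):
    """One batch-SOM epoch: map every point, reduce by neuron, then recompute prototypes."""
    pairs = (pair for x in data for pair in map_point(x, prototypes, temperature))
    totals = reduce_contributions(pairs)
    # New prototype = neighbourhood-weighted mean; keep the old one if every weight underflowed to 0.
    return {
        pos: [v / den for v in num] if den > 0 else list(prototypes[pos])
        for pos, (num, den) in totals.items()
    }
```

On Spark, the mapper would correspond to a flatMap over the RDD of points and the reducer to a reduceByKey on the neuron position. Training would repeat the epoch while decreasing the temperature.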

Use case 3: BITM-MR/Spark (Biclustering using Self-Organizing Map and MapReduce, with Tugdual Sarazin)

https://github.com/TugdualSarazin/spark-clustering

Use case 4: WADA, Web Application for Data Analysis (with students from Villetaneuse IUT)

https://github.com/CamilleGR/Wada

Use case 5: G-Stream, Growing Neural Gas for Clustering Data Streams using Spark Streaming (with Mohammed Ghesmoune)

To appear.
