Data Science or not Data Science?

Revision as of 10 February 2016, 14:07

Welcome to the LIPN Wiki on Big Data and Cloud Computing

With more and more data produced every day, we need to pay special attention to the technologies used to analyze large amounts of data. Big Data is often characterized by the four Vs (Volume, Variety, Velocity and Veracity), each of which poses challenges for the required tools.

Machine learning aims to extract knowledge from data. In short, it is a family of algorithms that transform data into a model or description in order to predict or categorize data. In this field we also use analytics tools that present information in a more readable way, as in the Square Predict project. Other projects related to big data and cloud computing are the Wendelin and Resilience projects.

This wiki draws on our experience with the Grid5000, Teralab and CIRRUS testbeds, where we study the Software, Platform, Infrastructure and Network layers that push the Data Science field forward, following an experimental scientific method. We must not confuse the 'scientific problem' with the 'system scientific problem', which exists to serve the scientific problem.
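As a minimal sketch of what "transforming data into a model in order to categorize data" can mean, the following example learns one centroid per label and assigns a new point to the closest one. The function names (fit_centroids, predict) and the toy data are hypothetical and chosen only for illustration; this is not the method used in any of the projects mentioned here.

```python
from collections import defaultdict
from math import dist  # available since Python 3.8


def fit_centroids(points, labels):
    """Learn a model: one mean point (centroid) per label."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in grouped.items()
    }


def predict(model, point):
    """Categorize a new point by the label of its closest centroid."""
    return min(model, key=lambda label: dist(model[label], point))


# Hypothetical toy data: two groups of 2-D points.
train_points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
train_labels = ["low", "low", "high", "high"]

model = fit_centroids(train_points, train_labels)
print(predict(model, (2.0, 1.0)))
print(predict(model, (7.0, 9.0)))
```

The same fit-then-predict split appears, at a much larger scale, in frameworks such as Apache Spark and scikit-learn.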

General discussion on Systems for Big-Data

  Infrastructure, programming models, frameworks
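  Tutorial on moving data around Grid'5000: https://www.grid5000.fr/mediawiki/index.php/Moving_Data_around_Grid%275000
  Clustering geolocated data using Spark and DBSCAN (illustrative example): https://www.oreilly.com/ideas/clustering-geolocated-data-using-spark-and-dbscan?imm_mid=0dfe1f&cmp=em-data-na-na-newsltr_20160203&utm_content=bufferaa451&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer
  Spark and Scikit-Learn (illustrative example): https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html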

Our experience is with the following tools:

   Apache Spark
   Apache Flink
   TensorFlow
   Wendelin (see also this link)
   SlapOS
   Spark-notebook

Testbeds we use in conjunction with our experimental method:

   Grid5000
   CIRRUS @ Université Sorbonne Paris Cité
   Teralab
   Amazon

Apache Spark

  Some Apache Spark implementations (since 2011/2012)
  How to use Spark on Grid5000

SlapOS cloud

  General information on SlapOS
  BOINC as a Service for the SlapOS Cloud: Tools and Methods
  Deploying the SlapOS platform in the Grid'5000 environment
  Synthesis of the LIPN work for the FUI Resilience project

TeraLab

  General information on TeraLab
  How to use TeraLab
  TeraLab and SlapOS and VFIB fees for managing your infrastructure

Thesis corner

  To be augmented
  Leila Abidi: Revisiter les grilles de PCs avec des technologies du Web et le cloud computing (2015)
  Walid Saad: Gestion de données pour le calcul scientifique dans les environnements grilles et cloud (2016)
  Tugdual Sarazin: Apprentissage massivement distribué dans un environnement "Big data" (2016)
  Mohammed Ghesmoune: Fouille de flux de données massives. Application aux “BigData” d’assurance (2016)
  Hippolyte Léger: Apprentissage Relationnel Massif (2018)



