Data Science or not Data Science?
Welcome to the LIPN Wiki on Big Data
As more and more data is produced every day, we need to pay special attention to the technologies used to analyze large amounts of data. Big Data is often characterized by the four Vs (Volume, Variety, Velocity, and Veracity), each of which poses a challenge for the required tools.
Machine learning aims to extract knowledge from data. In short, it is a family of algorithms that transform data into a model or description in order to predict or categorize data. In this field we also use analytics tools, which present information in a more readable form.
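To make the idea of "transforming data into a model" concrete, here is a minimal sketch in plain Python. It assumes a nearest-centroid classifier, which is only one simple illustration and not a method this wiki prescribes; the function names `fit_centroids` and `predict`, the labels, and the sample points are all hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available since Python 3.8


def fit_centroids(points, labels):
    """Turn labelled data into a model: one mean point (centroid) per label."""
    sums = {}
    counts = defaultdict(int)
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
        for i, value in enumerate(point):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in sums[label])
        for label in sums
    }


def predict(model, point):
    """Categorize a new point by the label of its closest centroid."""
    return min(model, key=lambda label: dist(model[label], point))


# Hypothetical toy data: two groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
labels = ["small", "small", "large", "large"]

model = fit_centroids(points, labels)
print(predict(model, (2.0, 1.0)))
```

The model here is just a dictionary of centroids, which is enough to show the two phases (learning from data, then predicting). Frameworks such as Spark or TensorFlow follow the same fit-then-predict pattern at a much larger scale.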
General discussions on Systems for Big Data
Infrastructure, programming models, frameworks
Tools we use
Apache Spark: http://spark.apache.org/
Apache Flink: https://flink.apache.org/
TensorFlow: https://www.tensorflow.org/
Wendelin: http://www.nexedi.com/NXD-Document.Blog.Wendelin.Release.0.4.alpha
Testbeds we use
Grid5000: https://www.grid5000.fr/mediawiki/index.php/Grid5000:Home
Cirrus: http://cirrus.uspc.fr
Apache Spark
General information on Apache Spark
How to use Spark on Grid5000