'''1- Introduction'''
 
In the context of Data-Science (https://en.wikipedia.org/wiki/Data_science, http://statweb.stanford.edu/~tibs/ElemStatLearn/), large-scale systems must address numerous issues, among them provenance, analysis, and preservation. Computing models are also numerous: the cloud family (private, public, hybrid), the cluster family (including federations of clusters, which we call grids), and, soon, computing models for Internet of Things (IoT) platforms.
 
'''2- A service-oriented view for Data-Science'''
    
To introduce the different issues of Data-Science from a systems perspective, we can draw an analogy with the following categories of cloud computing services:
 
* SaaS (Software as a Service): sometimes referred to as "on-demand software". In the context of Data-Science and data analysis software, it may mean providing end users with data mining tools, algorithms, analytics suites… All these tools are available through a Web browser.
* PaaS (Platform as a Service): it allows users to develop, run, and manage applications through a Web browser, without the complexity of building and maintaining the infrastructure. In the context of Data-Science, it provides end users with platforms to build their own data analytics applications, or to extend an existing suite, without any knowledge of the underlying physical architecture.
* IaaS (Infrastructure as a Service): in the context of Data-Science, but not only there, it provides a set of virtualized resources (services, processors…) that developers can assemble to run analytics applications or to store data.
* NaaS (Network as a Service): it describes services for network transport connectivity. In the context of Data-Science, it may concern the provision of Virtual Private Networks that enable a host computer to send and receive data across shared or public networks with the functionality and policies of a private network.
    
Typical examples of frameworks that can be offered as a service (with some effort to cloudify them) are Apache Hadoop and Mahout, SciDB, CloudFlows, Spark, Flink, TensorFlow, BigML, Splunk Hunk… (add hyperlinks for each of these terms)
 
In the different disciplines of e-Science we find some generic vocabulary to talk about tasks (an elementary unit of work) and jobs (the composition of many tasks):
 
* Tasks:
** Single-task application. For Data-Science, this may be supervised or unsupervised classification, clustering, association rule discovery…
** Parameter-sweeping application. For Data-Science, this may mean analyzing a dataset with multiple instances of the same classification algorithm, each run with different parameter values.
** Workflow-based application. For Data-Science, this may be a knowledge discovery process specified as a graph linking data sources, discovery tools, and data outputs.
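
The three kinds of application listed above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the <code>classify</code> function, its threshold parameter, and the step names <code>clean</code>, <code>classify</code>, and <code>report</code> are hypothetical and do not come from any of the frameworks cited on this page. A single call to <code>classify</code> stands for a single task, <code>parameter_sweep</code> runs the same task once per parameter value, and <code>run_workflow</code> executes steps in the order imposed by a dependency graph.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def classify(threshold, dataset):
    """Hypothetical single task: label a value 1 if it reaches the threshold, else 0."""
    return [1 if x >= threshold else 0 for x in dataset]


def parameter_sweep(dataset, thresholds):
    """Run the same task once per parameter value; together the runs form one job."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda t: classify(t, dataset), thresholds)
        return dict(zip(thresholds, results))


def run_workflow(graph, steps, source):
    """Run a workflow given as a graph mapping each step to the steps it depends on.

    Steps without dependencies receive the raw data source as input;
    other steps receive the outputs of their dependencies.
    """
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        deps = graph.get(name, ())
        inputs = [outputs[d] for d in deps] or [source]
        outputs[name] = steps[name](*inputs)
    return outputs


if __name__ == "__main__":
    raw = [0.2, None, 0.7, 0.9]

    # Workflow: data source -> clean -> classify -> report
    graph = {"clean": (), "classify": ("clean",), "report": ("classify",)}
    steps = {
        "clean": lambda data: [x for x in data if x is not None],
        "classify": lambda data: classify(0.5, data),
        "report": lambda labels: sum(labels),
    }
    print(run_workflow(graph, steps, raw))

    # Parameter sweep: same algorithm, several threshold values
    print(parameter_sweep([0.2, 0.7, 0.9], [0.5, 0.8]))
```

In this sketch the sweep uses threads only for brevity; on a cluster, grid, or cloud each task would typically be dispatched to a separate node by the underlying framework.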