To introduce the different issues of Data Science from a system perspective, we can draw an analogy with the following categories of cloud computing services:
* SaaS (Software as a Service): sometimes referred to as "on-demand software". In the context of Data Science and data analysis software, it may consist of providing end users with data mining tools, algorithms, analytics suites… All these tools are available through a Web browser.
* PaaS (Platform as a Service): it allows users to develop, run, and manage applications through a Web browser, without the complexity of building and maintaining the infrastructure. In the context of Data Science, it provides end users with platforms to build their own data analytics applications or to extend an existing suite without any knowledge of the underlying physical architecture.
* IaaS (Infrastructure as a Service): in the context of Data Science, but not only there, it provides a set of virtualized resources (services, processors…) that developers can assemble to run analytics applications or to store data.
* NaaS (Network as a Service): it describes services for network transport connectivity. In the context of Data Science, it may concern the provision of Virtual Private Networks that enable a host computer to send and receive data across shared or public networks with the functionalities and policies of a private network.
Typical examples of frameworks that can be offered as a service (with some effort to cloudify them) are Apache Hadoop and Mahout, SciDB, CloudFlows, Spark, Flink, TensorFlow, BigML, Splunk Hunk… (add hyperlinks for each of these terms)

In the different disciplines of e-Science, we find some generic vocabulary to talk about tasks (an elementary unit of work) and jobs (the composition of many tasks):
* Tasks:
** Single-task application. For Data Science, it may concern supervised or unsupervised classification, clustering, association rule discovery…
** Parameter-sweeping application. For Data Science, it may concern the analysis of a dataset over multiple instances of the same classification algorithm, each run with different parameter values.
** Workflow-based application. For Data Science, it may concern the discovery of knowledge, where the discovery application is specified as a graph linking data sources, discovery tools, and data outputs.
* Jobs:
** HPC (high-performance computing): using many computing resources over short periods of time.
** HTC (high-throughput computing, https://en.wikipedia.org/wiki/High-throughput_computing): using many computing resources over long periods of time to complete a large number of loosely coupled tasks.
** MTC (many-task computing, https://en.wikipedia.org/wiki/Many-task_computing): bridging HPC and HTC by running a very large number of tasks, dependent or independent, over short periods of time.
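The parameter-sweeping task type listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dataset, the threshold classifier, and the names `evaluate` and `parameter_sweep` are hypothetical and do not come from any of the frameworks cited here. The point is only that each parameter setting is an independent task, so the sweep can be distributed over whatever workers are available.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dataset: (feature value, class label) pairs.
DATASET = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]


def evaluate(threshold):
    """One task: accuracy of a threshold classifier for a single parameter value."""
    correct = sum(int(x > threshold) == label for x, label in DATASET)
    return threshold, correct / len(DATASET)


def parameter_sweep(thresholds, workers=4):
    """Run the same algorithm once per parameter value, as independent tasks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input parameters.
        return list(pool.map(evaluate, thresholds))


if __name__ == "__main__":
    results = parameter_sweep([1.0, 2.0, 3.0])
    best_threshold, best_accuracy = max(results, key=lambda r: r[1])
    print(best_threshold, best_accuracy)
```

A thread pool is used here only to keep the sketch self-contained. On a real infrastructure the same pattern would typically be handed to a cluster scheduler or one of the frameworks mentioned earlier.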
According to the type of tasks and jobs, computer scientists design programming models.
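As one hedged illustration of how a task type shapes the programming model, a workflow-based application can be sketched as a directed acyclic graph whose nodes are steps and whose edges are data dependencies. The step names (`load`, `clean`, `discover`, `report`), the sample data, and the `run_workflow` helper below are hypothetical. The sketch relies only on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) and is not the API of any workflow system named in this page.

```python
from graphlib import TopologicalSorter


def load_source(_inputs):
    # Hypothetical data source: a small list of raw measurements.
    return [3, 1, 4, 1, 5, 9, 2, 6]


def clean(inputs):
    # Remove duplicates and sort the raw values.
    return sorted(set(inputs["load"]))


def discover(inputs):
    # Stand-in for a discovery tool: summarize the cleaned data.
    data = inputs["clean"]
    return {"min": data[0], "max": data[-1], "count": len(data)}


def report(inputs):
    # Data output step: format the discovered summary.
    s = inputs["discover"]
    return f"{s['count']} distinct values in [{s['min']}, {s['max']}]"


# Each node maps to (function, list of upstream steps it depends on).
WORKFLOW = {
    "load": (load_source, []),
    "clean": (clean, ["load"]),
    "discover": (discover, ["clean"]),
    "report": (report, ["discover"]),
}


def run_workflow(workflow):
    """Execute every step after all of its dependencies have produced output."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        step, deps = workflow[name]
        outputs[name] = step({d: outputs[d] for d in deps})
    return outputs


if __name__ == "__main__":
    print(run_workflow(WORKFLOW)["report"])
```

Describing the workflow as data (the `WORKFLOW` dictionary) rather than as hard-coded calls is what would let an execution engine decide where and when each step runs.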

New abstract programming models need to be considered for the Data Science field. A research effort is needed to develop scalable, adaptive, and general-purpose models, as well as models for the coordination of codes and data integration. Standardized formats for data and data exchange are also required. APIs are also needed to support cooperation between data producers. A scalable programming model must include mechanisms for: