International Workshop on High Performance Data Intensive Computing
(HPDIC2014)

In Conjunction with IEEE IPDPS 2014 May 19-23, 2014, Arizona Grand Resort, PHOENIX (Arizona) USA.

Important dates

Workshop Paper Due: ~~January 15th, 2014~~
↝ January 30th, 2014 (firm deadline)
Author Notification: February 22th, 2014
Camera-ready Paper Due: March 14th, 2014

Highlight of the 2014 edition

All the accepted papers are invited to submit extended version to the IJBDI and IJCSE journal. A first Call for Papers is now on the Inderscience website, under the IJBDI home page Calls for Papers (http://www.inderscience.com/ijbdi, specifically http://www.inderscience.com/info/ingeneral/cfp.php?id=2339). Please consider these CFPs as an opportunity.

NEW: if you participated in the workshop as a speaker or listener, please find some photos HERE.

Program at a Glance
FRIDAY, 23 May 2014
Arizona Grand Resort, PHOENIX (Arizona) USA

Note: each paper has 25 min (20 min for presentation and 5 min for Q&A).

09:00 - 09:40
Keynote Speech

Title: pMem – Persistent Memory for Data-intensive Applications
Speaker: Karsten Schwan (joint work with Sudarsun Kannan and Ada Gavrilovska), Center for Experimental Research in Computer Systems (CERCS) at Georgia Tech, Atlanta, USA.
Summary: This talk presents the opportunities and challenges presented by future memory technologies like non-volatile RAM (NVM) that offer increased memory capacity as well as fast persistent storage. Prior research has focused either on improving memory scalability by replacing DRAMs with PCM or improving persistent storage by using PCM as nonvolatile heap. In the resource-constrained future exascale nodes, however, it is desirable to leverage PCM for both its capacity and persistence properties. Our research, therefore, is exploring how to obtain these 'dual benefits' of PCM. Specifically, we investigate and evaluate the impact of using PCM for its persistence properties on the performance of applications that are using PCM for capacity. We show that current shared last level cache architectures will cause severe impacts on applications requiring increased memory capacity when there are co-runners using PCM for persistence, via increased cache miss rates experienced by `capacity' applications. In response, we propose novel methods that e.g., use application page contiguity metrics to reduce such misses. We also investigate other software overheads like those relating to memory allocation allocator, then develop methods to reduce them by redesigning allocator data structures. Current results obtained for end devices are now being extended to also consider server systems and applications.
PDF of the keynote

09:40--10:00
Coffee Break

10:00-11:15
Session 1: Memory, I/O and Performance Enhancement

HPDIC01
Compactor : Optimization Framework at Staging I/O nodes
Vishwanath Venkatesan, Mohamad Chaarawi, Quincey Koziol and Edgar Gabriel
University of Houston, USA
The HDF Group,USA

HPDIC04
Hybrid BFS Approach Using Semi-External Memory
Keita Iwabuchi, Hitoshi Sato, Ryo Mizote, Yuichiro Yasui, Katsuki Fujisawa and Satoshi Matsuoka
Tokyo Institute of Technology, Japan
Chuo University, Japan
Japan Science and Technology Agency, Japan

HPDIC05
Model-driven Data Layout Selection for Improving Read Performance
Jialin Liu,Surendra Byna, Bin Dong, Kesheng Wu,Yong Chen
Texas Tech University, USA
Lawrence Berkeley Laboratory, USA

11:15--12:55
Session 2: Clustering, Data Management, and Applications

HPDIC02
Scalable and Reliable Data Broadcast with Kascade
Stephane Martin, Tomasz Buchert, Pierric Willemet, Olivier Richardy, Emmanuel Jeanvoine, Lucas Nussbaum
Universite de Lorraine, France
Universite de Grenoble, France

HPDIC03
SOM Clustering using Spark-MapReduce
Tugdual Sarazin, Mustapha Lebbah and Hanane Azzag
University of Paris 13, CNRS UMR 7030, France,

HPDIC06
Optimizing The Join Operation on Hive to Accelerate Cross-Matching in Astronomy
Liang Li, Dixin Tang, Taoying Liu, Hong Liu, Wei Li,Chenzhou Cui
Institute of Computing Technology, CAS, China
National Astronomical Observatories, CAS, China

Closing Remarks

12:55--14:00:     Lunch

Description

Over the recent years, data generated by humanities, scientific activities, as well as commercial applications from a diverse range of fields have been increasing exponentially which is typically referred to as Big Data. Data volumes of applications in the fields of sciences and engineering, finance, media, online information resources, etc. are expected to double every two years over the next decade and further. With this continuing data explosion, it is necessary to store and process data efficiently by utilizing enormous computing power that is available in the form of multi/manycore platforms. This increase in the demand for high performance large-scale data processing has necessitated collaboration and sharing of data collections among the world's leading education, research, and industrial institutions and use of distributed resources owned by collaborating parties. This kind of data intensive computing is posing many challenges in exploiting parallelism of current and upcoming computer architectures, such as automated data collection and provisioning, system monitoring and management, programming models, etc. Performance related aspects are becoming the bottlenecks for implementation, deployment and commercial application and its operation in data intensive computing system. The high performance data intensive computing paradigm also comes up with algorithmic and engineering issues such as performance aspects not yet eminent but expected to grow with their scaling of the large scale systems, and the dynamics of management. These new challenges may comprise, sometimes even deteriorate the performance, efficiency, and scalability of the dedicated data intensive computing systems.

There is no doubt in the industry and research community that the importance of data intensive computing has been raising and will continue to be the foremost fields of research. This raise brings up many research issues, in forms of capturing and accessing data effectively and fast, processing it while still achieving high performance and high throughput, and storing it efficiently for future use. Programming for high performance yielding data intensive computing is an important challenging issue. Expressing data access requirements of applications and designing programming language abstractions to exploit parallelism are at immediate need. Application and domain specific optimizations are also parts of a viable solution in data intensive computing. While these are a few examples of issues, research in data intensive computing has become quite intense during the last few years yielding strong results.

Moreover, in a widely distributed environment, data is often not locally accessible and has thus to be remotely retrieved and stored. While traditional distributed systems work well for computation that requires limited data handling, they may fail in unexpected ways when the computation accesses, creates, and moves large amounts of data especially over wide-area networks. Further, data accessed and created is often poorly described, lacking both metadata and provenance. Scientists, researchers, and application developers are often forced to solve basic data-handling issues, such as physically locating data, how to access it, and/or how to move it to visualization and/or compute resources for further analysis.

This workshop focuses on the challenges imposed by high performance data-intensive applications on distributed systems, and on the different state-of-the-art solutions proposed to overcome these challenges. It brought together the collaborative and distributed computing community and the data management community in an effort to generate productive conversations on the planning, management, and scheduling of data handling tasks and data storage resources.

It is evident that data-intensive research is transforming computing landscape. We are facing the challenge of handling the deluge of data generated by sensors and modern instruments that are widely used in all domains. The number of sources of data is increasing, while, at the same time, the diversity, complexity and scale of these data resources are also growing dramatically.

After the success of HPDIC 2012 and 2013, the 2014 edition (HPDIC2014) is a forum for professionals involved in data intensive computing and high performance computing. The goal of this workshop is to bridge the gap between theory and practice in the field of high performance data intensive computing and bring together researchers and practitioners from academia and industry working on high performance data intensive computing technologies. We believe that high performance data intensive computing will benefit from close interaction between researchers and industry practitioners, so that the research can inform current deployments and deployment challenges can inform new research. In support of this, HPDIC2014 will provide a forum for both academics and industry practitioners to share their ideas and experiences, discuss challenges and recent advances, introduce developments and tools, identify open issues, present applications and enhancements for data intensive computing systems and report state-of-the-art and in-progress research, leverage each other's perspectives, and identify new/emerging trends in this important area.

We therefore cordially invite contributions that investigate these issues, introduce new execution environments, apply performance evaluations and show the applicability to science and enterprise applications. We welcome various different kinds of papers that could formalize, simplify and optimize all the aspects of existing data intensive applications in science, engineering and business. We particularly encourage the submission of position papers that describe novel research directions and work that is in its formative stages, and papers about practical experiences and lessons learned from production systems.

Papers of applied research, industrial experience reports, work-in-progress and vision papers with different criteria for each category that describe recent advances and efforts in the design and development of data intensive computing, functionalities and capabilities that will benefit many applications are also solicited.

List of topics

Topics of interests include, but are not limited to:

High performance distributed cache and optimization
High performance data transfer and ingestion
NoSQL data store
Machine Learning Algorithms for Big Data
Data intensive computing in science, commerce, entertainment and medicine
Data-intensive applications and their challenges
Data Clouds, Data Grids, and Data Centers
Data-aware toolkits and middleware
Network support for data-intensive computing
Remote and distributed visualization of large scale data
Data archives, digital libraries, and preservation
Service oriented architectures for data-intensive computing
High performance data access toolkits
Power and energy efficiency
Accountability, QoS, and SLAs
Data privacy and protection in a public cloud environment
Programming models, abstractions for data intensive computing
Data capturing, management, and scheduling techniques
Future research challenges of data intensive computing
Security and protection of sensitive data in collaborative environments
MapReduce implementation issues and improvements
MapReduce,Hadoop,Spark and their applications in data intensive computing
Large-scale MapReduce (Grid and Desktop Grid)
Scientific data-sets analysis
Monitoring, troubleshooting, and failure recovery
Distributed I/O (wide-area, grid, peer-to-peer)
Search and data retrieval
Storage and file systems
Performance measurement, analytic modeling, simulation
Distributed Ensemble Classifier

Submission Instructions

Please submit full papers in PDF or doc format via the submission system. Do not email submissions. Papers must be written in English.

The complete submission must be no longer than ten (10) pages. It should be typeset in two-column format in 10 point type on 12 point (single-spaced) leading. References should not be set in a smaller font. Submissions that violate any of these restrictions may not be reviewed. The limits will be interpreted fairly strictly, and no extensions will be given for reformatting. Final author manuscripts will be 8.5" x 11" (two columns IEEE format), not exceeding 10 pages; max 2 extra pages allowed at additional cost.

The names of authors and their affiliations should be included on the first page of the submission.

Simultaneous submission of the same work to multiple venues, submission of previously published work, or plagiarism constitutes dishonesty or fraud.

Reviewing of full papers will be done by the program committee, assisted by outside referees. Accepted papers will be shepherded through an editorial review process by a member of the program committee.

By submitting a paper, you agree that at least one of the authors will attend the workshop to present it. Otherwise, the paper will be excluded from the digital library of IEEE.

Please submit papers via EasyChair for HPDIC2014 (in case of problems, please send emails to the workshop chairs)

General Chairs

Christophe CERIN, Professor, University of Paris XIII, France

E-mail: christophe.cerin@lipn.univ-paris13.fr

Cong-Feng JIANG, PhD,Hangzhou Dianzi University, China

E-mail: cjiang@hdu.edu.cn

Program Chairs

Yuqing GAO, IEEE Fellow, IBM T. J. Watson Research, USA

E-mail: yuqing@us.ibm.com

Jilin ZHANG, PhD,Hangzhou Dianzi University, China

E-mail: jilin.zhang@hdu.edu.cn

Program Committee Members

Walter Binder, University of Lugano, Switzerland
Guoray Cai, Pennsylvania State University, USA
Jiannong Cao, Hong Kong Polytechnic University, Hong Kong
Jean-Paul Smets, Nexedi Inc., France
Weihui Dai, Fudan University, China
Qiang Duan, Pennsylvania State University, USA
Amit Dvir, Budapest University of Technology and Economics, Hungary
Gilles Fedak,INRIA Rhone-Alpes, Lyon, France
Vatche Ishakian, Boston University, USA
Woosung Jung, Chungbuk National University, Korea
Mustapha Lebbah, Paris University XIII, France
Yong Woo Lee, University of Seoul, Korea
Chunlei Liu, Valdosta State University, USA
Lei Liu, Karlsruhe Institute of Technology, Germany
Shiyong Lu, Wayne State University, USA
Audun Nordal, University of Tromsoe, Norway
Weisong Shi, Wayne State University, USA
R.K. Shyamasundar, Tata Institute of Fundamental Research, India
Kumiko Tadano，NEC Corporation，Japan
Peng Di, University of New South Wales，Australia
Jian Tan, IBM Research, USA
Hong-Linh Truong, Vienna University of Technology, Austria
Xiaofei Zhang, Hong Kong University of Science and Technology, Hong Kong
Brian Vinter , Copenhagen University, Denmark
Vincent Lemaire, Orange, France
Kevin Wang,University of Auckland, New Zealand
Stephen Wang, Toshiba Telecommunications Research Laboratory Europe, UK
Jie Wu, Yale University, USA
Bin Xiao, Hong Kong Polytechnic University, Hong Kong
Jue Wang, supercomputing center of CAS, China
Jian Zhao, Institute for Infocomm Research, Singapore
Tingwei Chen, Liaoning University, China
Hui Ma, Victoria University of Wellington, New Zealand

International Workshop on High Performance Data Intensive Computing (HPDIC2014)