(HPDIC2014)
In conjunction with IEEE IPDPS 2014, May 19-23, 2014, Arizona Grand Resort, Phoenix, Arizona, USA.
Authors of all accepted papers are invited to submit extended versions to the IJBDI and IJCSE journals. A first Call for Papers is now on the Inderscience website, under the Calls for Papers section of the IJBDI home page (http://www.inderscience.com/ijbdi, specifically http://www.inderscience.com/info/ingeneral/cfp.php?id=2339). Please consider these CFPs as an opportunity.
NEW: if you participated in the workshop as a speaker or listener, please find some photos HERE.
09:00--09:40  Keynote Speech
Title: pMem – Persistent Memory for Data-intensive Applications
        
Speaker: Karsten Schwan (joint work with Sudarsun Kannan and Ada Gavrilovska), Center for Experimental Research in Computer Systems (CERCS), Georgia Tech, Atlanta, USA.
Summary: This talk presents the opportunities and challenges of future memory technologies such as non-volatile RAM (NVM), which offer increased memory capacity as well as fast persistent storage. Prior research has focused either on improving memory scalability by replacing DRAM with phase-change memory (PCM) or on improving persistent storage by using PCM as a nonvolatile heap. In the resource-constrained exascale nodes of the future, however, it is desirable to leverage PCM for both its capacity and its persistence properties. Our research, therefore, is exploring how to obtain these 'dual benefits' of PCM. Specifically, we investigate and evaluate how using PCM for persistence affects the performance of applications that use PCM for capacity. We show that, with current shared last-level cache architectures, co-runners using PCM for persistence severely affect applications that require increased memory capacity, by raising the cache miss rates these 'capacity' applications experience. In response, we propose novel methods that, for example, use application page contiguity metrics to reduce such misses. We also investigate other software overheads, such as those of the memory allocator, and develop methods to reduce them by redesigning allocator data structures. Current results, obtained for end devices, are now being extended to server systems and applications.
PDF of the keynote
        
        09:40--10:00  Coffee Break
        
        10:00-11:15     
        Session 1: Memory, I/O and Performance Enhancement 
        
        HPDIC01    
        Compactor: Optimization Framework at Staging I/O Nodes
        Vishwanath Venkatesan, Mohamad Chaarawi, Quincey Koziol and Edgar Gabriel
        University of Houston, USA
        The HDF Group, USA
        
        HPDIC04    
        Hybrid BFS Approach Using Semi-External Memory
        Keita Iwabuchi, Hitoshi Sato, Ryo Mizote, Yuichiro Yasui, Katsuki Fujisawa
        and Satoshi Matsuoka
        Tokyo Institute of Technology, Japan
        Chuo University, Japan
        Japan Science and Technology Agency, Japan
        
        HPDIC05    
        Model-driven Data Layout Selection for Improving Read Performance
        Jialin Liu, Surendra Byna, Bin Dong, Kesheng Wu, Yong Chen
        Texas Tech University, USA
        Lawrence Berkeley Laboratory, USA
        
        11:15--12:55
        Session 2: Clustering, Data Management, and Applications
        
        HPDIC02    
        Scalable and Reliable Data Broadcast with Kascade
        Stephane Martin, Tomasz Buchert, Pierric Willemet, Olivier Richard,
        Emmanuel Jeanvoine, Lucas Nussbaum    
        Universite de Lorraine, France
        Universite de Grenoble, France
        
        HPDIC03    
        SOM Clustering using Spark-MapReduce
        Tugdual Sarazin, Mustapha Lebbah and Hanane Azzag    
        University of Paris 13, CNRS UMR 7030, France
        
        HPDIC06    
        Optimizing The Join Operation on Hive to Accelerate Cross-Matching in Astronomy
        Liang Li, Dixin Tang, Taoying Liu, Hong Liu, Wei Li, Chenzhou Cui
        Institute of Computing Technology, CAS, China
        National Astronomical Observatories, CAS, China
        
 
       Closing Remarks
        
        12:55--14:00  Lunch
        
        Over recent years, data generated by the humanities, scientific
        activities, and commercial applications from a diverse range of
        fields has been increasing exponentially, a phenomenon typically
        referred to as Big Data. Data volumes of applications in science and
        engineering, finance, media, online information resources, etc. are
        expected to double every two years over the next decade and beyond.
        With this continuing data explosion, it is necessary to store and
        process data efficiently by utilizing the enormous computing power
        available in the form of multi/manycore platforms. This increase in
        demand for high performance large-scale data processing has
        necessitated collaboration and sharing of data collections among the
        world's leading education, research, and industrial institutions, and
        the use of distributed resources owned by collaborating parties. This
        kind of data intensive computing poses many challenges in exploiting
        the parallelism of current and upcoming computer architectures, such
        as automated data collection and provisioning, system monitoring and
        management, and programming models. Performance-related aspects are
        becoming the bottlenecks for the implementation, deployment,
        commercial application, and operation of data intensive computing
        systems. The high performance data intensive computing paradigm also
        raises algorithmic and engineering issues, such as performance
        aspects that are not yet prominent but are expected to grow as
        large-scale systems scale up, and the dynamics of management. These
        new challenges may compromise, and sometimes even degrade, the
        performance, efficiency, and scalability of dedicated data intensive
        computing systems. 
        
        There is no doubt in the industry and research community that the
        importance of data intensive computing has been rising and that it
        will continue to be one of the foremost fields of research. This rise
        brings up many research issues, in the form of capturing and
        accessing data effectively and quickly, processing it while still
        achieving high performance and high throughput, and storing it
        efficiently for future use. Programming for high performance in data
        intensive computing is an important and challenging issue. Expressing
        the data access requirements of applications and designing
        programming language abstractions to exploit parallelism are
        immediate needs. Application- and domain-specific optimizations are
        also part of a viable solution in data intensive computing. While
        these are only a few examples of issues, research in data intensive
        computing has become quite intense during the last few years,
        yielding strong results. 
        
        Moreover, in a widely distributed environment, data is often not
        locally accessible and must therefore be retrieved and stored
        remotely. While traditional distributed systems work well for
        computation that requires limited data handling, they may fail in
        unexpected ways when the computation accesses, creates, and moves
        large amounts of data, especially over wide-area networks. Further,
        data accessed and created is often poorly described, lacking both
        metadata and provenance. Scientists, researchers, and application
        developers are often forced to solve basic data-handling issues, such
        as physically locating data, accessing it, and moving it to
        visualization and/or compute resources for further analysis.
        
        This workshop focuses on the challenges imposed by high performance
        data-intensive applications on distributed systems, and on the different
        state-of-the-art solutions proposed to overcome these challenges. It
        brings together the collaborative and distributed computing community
        and the data management community in an effort to generate productive
        conversations on the planning, management, and scheduling of data
        handling tasks and data storage resources.
        
        It is evident that data-intensive research is transforming the computing
        landscape. We are facing the challenge of handling the deluge of data
        generated by sensors and modern instruments that are widely used in all
        domains. The number of sources of data is increasing, while, at the same
        time, the diversity, complexity and scale of these data resources are
        also growing dramatically. 
        
        After the success of HPDIC 2012 and 2013, the 2014 edition (HPDIC2014)
        is a forum for professionals involved in data intensive computing and
        high performance computing. The goal of this workshop is to bridge the
        gap between theory and practice in the field of high performance data
        intensive computing and bring together researchers and practitioners
        from academia and industry working on high performance data intensive
        computing technologies. We believe that high performance data intensive
        computing will benefit from close interaction between researchers and
        industry practitioners, so that the research can inform current
        deployments and deployment challenges can inform new research. In
        support of this, HPDIC2014 will provide a forum for both academics and
        industry practitioners to share their ideas and experiences, discuss
        challenges and recent advances, introduce developments and tools,
        identify open issues, present applications and enhancements for data
        intensive computing systems, report state-of-the-art and in-progress
        research, leverage each other's perspectives, and identify new/emerging
        trends in this important area. 
        
        We therefore cordially invite contributions that investigate these
        issues, introduce new execution environments, apply performance
        evaluations and show the applicability to science and enterprise
        applications. We welcome a variety of papers that could
        formalize, simplify and optimize all the aspects of existing data
        intensive applications in science, engineering and business. We
        particularly encourage the submission of position papers that describe
        novel research directions and work that is in its formative stages, and
        papers about practical experiences and lessons learned from production
        systems. 
        
        Also solicited are applied research papers, industrial experience
        reports, and work-in-progress and vision papers (with different
        criteria for each category) that describe recent advances and efforts
        in the design and development of data intensive computing, as well as
        functionalities and capabilities that will benefit many applications. 
      
Topics of interest include, but are not limited to:
 Please submit full papers in PDF or doc format via the submission
        system. Do not email submissions. Papers must be written in English. 
        
        The complete submission must be no longer than ten (10) pages. It should
        be typeset in two-column format in 10 point type on 12 point
        (single-spaced) leading. References should not be set in a smaller font.
        Submissions that violate any of these restrictions may not be reviewed.
        The limits will be interpreted fairly strictly, and no extensions will
        be given for reformatting. Final author manuscripts will be 8.5" x 11"
        (two-column IEEE format), not exceeding 10 pages; a maximum of 2 extra
        pages is allowed at additional cost.
        
        The names of authors and their affiliations should be included on the
        first page of the submission.
        
        Simultaneous submission of the same work to multiple venues, submission
        of previously published work, or plagiarism constitutes dishonesty or
        fraud. 
        
        Reviewing of full papers will be done by the program committee, assisted
        by outside referees. Accepted papers will be shepherded through an
        editorial review process by a member of the program committee.
        
        By submitting a paper, you agree that at least one of the authors will
        attend the workshop to present it. Otherwise, the paper will be excluded
        from the digital library of IEEE.
        
        Please submit papers via EasyChair for HPDIC2014 (in case of problems,
        please email the workshop chairs).
E-mail: christophe.cerin@lipn.univ-paris13.fr
E-mail: cjiang@hdu.edu.cn
E-mail: yuqing@us.ibm.com
E-mail: jilin.zhang@hdu.edu.cn