Exploitation of Locality and Parallelism in Pointer-based Programs
Speakers and Affiliation:
Oscar Plata and Rafael Asenjo, University of Malaga, Spain
While powerful optimization techniques are currently available for scientific and engineering
numerical codes, a similar level of success has eluded general-purpose programs, especially symbolic
and pointer-based codes such as those written in C, C++ or Java.
Exploiting locality is one of the most important performance issues in modern computers.
However, contemporary compilers cannot successfully exploit the locality exhibited by pointer-based
programs. The locality problem has several aspects. In this part of the tutorial we will address
some of the main ones, such as data locality in the cache hierarchy and hiding the processor-memory
latency gap.
Parallelism is becoming a common tool for solving all kinds of applications in all kinds of
environments. As with locality, current compilers are also unable to deal successfully with
parallelism in pointer-based codes. This part of the tutorial will describe different methods to
extract parallelism from these codes at different levels, ranging from analysis techniques that detect
data dependences to techniques, built on those analyses, that detect parallelism.
The tutorial will be organized as a survey, presented in part from the personal point of view of the
speakers, whose background is in compilers for high-performance computing and who have been working
in this field since the beginning of the nineties. This is a main research topic in our group, so
up-to-date results from our project will also be presented.
About the Speakers:
Oscar Plata received his M.Sc. and PhD in Physics from the University of Santiago de Compostela,
Spain, in 1985 and 1989, respectively. He is currently a Full Professor in the Department of
Computer Architecture at the University of Malaga, Spain, where he is a co-leader of the automatic
parallelization and compiler group. Prior to that, he was an Associate Professor in the
Department of Electronics and Systems at the University of Santiago de Compostela, Spain. Prof.
Plata's main research interests are in the design of novel techniques to optimize irregular and
pointer-based programs, specifically approaches for exploiting all aspects of locality and parallelism.
He has also made contributions to the field of parallel languages and algorithms. He is a member of
the IEEE Computer Society and the ACM.
Rafael Asenjo received the engineering degree in telecommunications in 1993 and the PhD degree in
telecommunication engineering in 1997, both from the University of Malaga, Spain. From 1994 to
2001, he was an assistant professor in the Computer Architecture Department at the University of
Malaga, and he has been an associate professor in the same department since 2001. His research
interests are in parallelizing compilers and multiprocessor architectures.
Itanium: Architecture, compilation techniques and multi-processor systems
Speakers and Affiliation:
J.F. Collard, Hewlett-Packard Laboratories
This tutorial presents the Itanium family of processors, some compilation techniques for this
architecture, and Itanium-based multiprocessor architecture.
The first part will introduce the instruction set of the Itanium architecture, including instruction
predication, groups of parallel instructions, counted and pipelined loops, data and control
speculation, prefetching, rotating registers, and more. The goal of this part is to make the attendees
comfortable reading and writing simple Itanium assembly code. Advanced topics will also be addressed,
including software pipelining using rotating registers.
In the second part, some micro-architectural issues will be discussed, including cache issues and
the dispersal of instructions in parallel groups to functional units. Performance monitors, which report
how cycles are spent and on which instructions, will be presented. Recent developments, such as HP's
dual-core Hondo modules, will also be described in detail.
In the third part, compiler techniques specific to Itanium will be presented. These techniques typically
strive to make the best combined use of the novel features mentioned earlier.
In the fourth part, multiprocessor architecture will be discussed, including memory consistency and
synchronization mechanisms on Itanium processors. The architecture of commercial systems will be
detailed, with an emphasis on HP's Superdome systems, which, as of mid-June 2004, scale up to 128
About the Speaker:
Jean-Francois Collard received his Ph.D. in 1995 from the University of Paris 6. He was a researcher at the
French National Center for Scientific Research from 1995 to 2000, when he joined the Intel Itanium
compiler team in Santa Clara, California. In 2003, he joined the Hewlett-Packard Laboratories in
Palo Alto. He is part of the Advanced System Architecture Research Department led by Norm