WORKSHOP SESSIONS
Session papers are available to workshop registrants immediately and to everyone beginning June 23, 2008.
Monday, June 23, 2008
|
8:30 a.m.–10:30 a.m.
Invited Talks
Performance and Forgiveness
Margo Seltzer, Harvard University
View the presentation slides
Historically, we have designed and built systems to produce correct answers—always. Unfortunately, as systems grow large and distributed, consistency comes at a performance penalty. As applications continue to grow and become increasingly distributed, we're going to have to rethink our consistency and performance trade-offs. In many applications, insisting on consistency is simply a knee-jerk reaction—we can give it up with little harm, especially if we learn to periodically apologize. In this talk, I'll examine several large-scale applications and identify categories of applications with respect to consistency, performance, and apologies.
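To make the "give up consistency and apologize later" idea concrete, here is a minimal, hypothetical Python sketch (not taken from the talk; all names and numbers are illustrative): replicas accept orders against possibly stale local state, and a periodic reconciliation step detects oversell and issues compensations rather than paying for strong consistency on every write.

    # Hypothetical sketch of "accept now, apologize later" (illustrative only).
    class Replica:
        def __init__(self, name, stock_view):
            self.name = name
            self.stock_view = stock_view   # possibly stale local view of stock
            self.accepted = []             # orders accepted without coordination

        def place_order(self, order_id, qty):
            # Accept immediately using only local state: fast, no global agreement.
            if self.stock_view >= qty:
                self.stock_view -= qty
                self.accepted.append((order_id, qty))
                return True
            return False

    def reconcile(replicas, true_stock):
        # Periodically merge replica state; apologize for any oversell.
        apologies = []
        for replica in replicas:
            for order_id, qty in replica.accepted:
                if true_stock >= qty:
                    true_stock -= qty
                else:
                    # We already said "yes"; compensate instead of blocking earlier.
                    apologies.append("Sorry, order %s cannot be filled." % order_id)
        return true_stock, apologies

    # Two replicas each believe 5 items remain, but only 5 exist in total.
    a, b = Replica("A", 5), Replica("B", 5)
    a.place_order("a1", 3)
    b.place_order("b1", 4)
    print(reconcile([a, b], true_stock=5))   # (2, ['Sorry, order b1 cannot be filled.'])

The trade-off is the one the abstract names: each replica answers quickly from local state, and the cost is an occasional apology rather than a coordination round on every request.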
Margo I. Seltzer is the Herchel Smith Professor of Computer Science and a Harvard College Professor in the Division of Engineering and Applied Sciences at Harvard University. She is the author of several widely used software packages, including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer is also a founder and CTO of Sleepycat Software, the makers of Berkeley DB. She is recognized as an outstanding teacher and won the Phi Beta Kappa teaching award in 1996 and the Abramson Teaching Award in 1999.
Professor Seltzer's research focuses on how to make computer systems better for users. Her research activities range from designing and building new storage systems to exploring how to construct truly large-scale distributed systems.
XtreemOS: A Linux-based Operating System for Large Scale Dynamic Grids
Christine Morin, INRIA
View the presentation slides
Despite the availability of various middleware, grid environments are still complex to manage, use, and program. In this talk, we present a novel grid operating system approach, promoted by the XtreemOS European project funded under the FP6 program. XtreemOS targets the management of large and very dynamic grid systems: users logged into an XtreemOS box will transparently exploit VO-managed resources through the standard POSIX interface. While much work has been done to build grid middleware on top of existing operating systems, little has been done to extend the underlying operating systems to enable and facilitate grid computing. In this light, XtreemOS aims to be the first European step towards the creation of a true open source operating system for grid platforms.
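As a rough illustration of the "standard POSIX interface" claim (a hypothetical sketch, not XtreemOS code), the point is that an application keeps using ordinary file and process operations and never calls a grid-specific API; the operating system is what maps those calls onto VO-managed resources. The path and command below are arbitrary examples.

    # Hypothetical sketch: only standard POSIX-style calls, no grid middleware API.
    import os
    import subprocess

    # Ordinary file I/O; the file could live on a grid file system without the
    # program knowing or caring.
    fd = os.open("/tmp/results.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.write(fd, b"job output\n")
    os.close(fd)

    # Ordinary process creation; the operating system, not the application,
    # decides which resource actually runs the work.
    subprocess.run(["wc", "-l", "/tmp/results.txt"], check=True)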
Christine Morin is a senior researcher at INRIA in the INRIA PARIS project-team, contributing to the programming of large-scale parallel and distributed systems. She has led research activities on single system image OS for high performance computing in clusters, resulting in the Kerrighed cluster OS, which has been developed in open source. Since 2006, she has been the scientific coordinator of the XtreemOS project, a four-year European integrated project aiming at developing a grid operating system. Her research interests are in operating systems, distributed systems, fault tolerance, and cluster and grid computing. She is a co-founder of the Kerlabs start-up, created in 2006 to exploit Kerrighed technology.
10:30 a.m.–11:00 a.m. Break
11:00 a.m.–12:15 p.m.
Papers
CANCELLED:
XOS-SSH: A Lightweight User-Centric Tool to Support Remote Execution in Virtual Organizations
An Qin, Haiyan Yu, Chengchun Shu, and Bing Xu, Institute of Computing Technology, Chinese Academy of Sciences
Paper in HTML | PDF
Presentation of this paper has been cancelled due to unforeseen circumstances.
Improving Scalability and Fault Tolerance in an Application Management Infrastructure
Nikolay Topilski, University of California, San Diego; Jeannie Albrecht, Williams College; Amin Vahdat, University of California, San Diego
View the presentation slides
Paper in HTML | PDF
The XtreemOS JScheduler: Using Self-Scheduling Techniques in Large Computing Architectures
F. Guim, I. Rodero, M. Garcia, and J. Corbalan, Barcelona Supercomputing Center
View the presentation slides
Paper in HTML | PDF
12:15 p.m.–1:45 p.m. Workshop Luncheon
1:45 p.m.–3:00 p.m.
Papers
A Multi-Site Virtual Cluster System for Wide Area Networks
Takahiro Hirofuchi and Takeshi Yokoi, National Institute of Advanced Industrial Science and Technology (AIST); Tadashi Ebara, National Institute of Advanced Industrial Science and Technology (AIST) and Mathematical Science Advanced Technology Laboratory Co., Ltd.; Yusuke Tanimura, Hirotaka Ogawa, Hidemoto Nakada, Yoshio Tanaka, and Satoshi Sekiguchi, National Institute of Advanced Industrial Science and Technology (AIST)
Paper in HTML | PDF
A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing
Zoe Sebepou, Kostas Magoutis, Manolis Marazakis, and Angelos Bilas, Institute of Computer Science (ICS), Foundation for Research and Technology—Hellas (FORTH)
View the presentation slides
Paper in HTML | PDF
Striping without Sacrifices: Maintaining POSIX Semantics in a Parallel File System
Jan Stender, Björn Kolbeck, and Felix Hupfeld, Zuse Institute Berlin (ZIB); Eugenio Cesario, Institute for High Performance Computing and Networking of the National Research Council of Italy (ICAR-CNR); Erich Focht and Matthias Hess, NEC HPC Europe GmbH; Jesús Malo and Jonathan Martí, Barcelona Supercomputing Center (BSC)
View the presentation slides
Paper in HTML | PDF
3:00 p.m.–3:30 p.m. Break
3:30 p.m.–5:30 p.m.
Invited Talks
Large Scale in What Dimension?
Miron Livny, University of Wisconsin–Madison
View the presentation slides
Computing systems are complex entities with a diverse set of dimensions. The scale of such a system in each of these dimensions is likely to have a profound impact on how the system is designed, developed, deployed, maintained, supported, and evolved. Today, we find systems that support a large number of users deployed at a large number of sites. These systems support a large suite of applications, consist of a large number of software components, are developed by a large community, manage a large number of computer and storage elements, operate over a large range of physical distances, and/or evolve over a large number of versions and releases. In the more than two decades that we have been working on the Condor distributed resource management system, we have experienced a dramatic change in its scale in all of these dimensions. We will discuss what we learned from dealing with these changes, and what we do to prepare our project for the never-ending stream of scale changes.
Miron Livny received a BSc degree in Physics and Mathematics in 1975 from the Hebrew University and MSc and PhD degrees in Computer Science from the Weizmann Institute of Science in 1978 and 1984, respectively. Since 1983 he has been on the Computer Sciences Department faculty at the University of Wisconsin–Madison, where he is currently a Professor of Computer Sciences, the director of the Center for High Throughput Computing, and leader of the Condor project.
Dr. Livny's research focuses on distributed processing and data management systems and data visualization environments. His recent work includes the Condor distributed resource management system, the DEVise data visualization and exploration environment, and the BMRB repository for data from NMR spectroscopy.
Experiences in Developing Lightweight Systems Software for Massively Parallel Systems
Arthur Maccabe, University of New Mexico
View the presentation slides
The goal of lightweight system software is to get out of the way of the application, while providing isolation between applications. Resource management is, for the most part, left to the application. Lightweight approaches have been used successfully in some of the largest systems, including the first Teraflop system (ASCI Red at Sandia National Laboratories), the Cray XT3, and IBM's Blue Gene system. This talk considers our experience developing lightweight systems software for massively parallel systems and contrasts the lightweight approach with other approaches, including Linux, hypervisors, and microkernels.
Barney Maccabe received his BS in Mathematics from the University of Arizona and his MS and PhD degrees from the Georgia Institute of Technology in Information and Computer Sciences. He currently serves as the Interim CIO for the University of New Mexico. Professor Maccabe has held a faculty appointment in the Computer Science department at UNM since 1982. From 2003 to 2007, he served as director of UNM's Center for High Performance Computing. Professor Maccabe's research focuses on scalable systems software. He was a principal architect of a series of "lightweight" operating systems: SUNMOS for the Intel Paragon, Puma/Cougar for the Intel Tflop, and most recently Catamount for the Cray XT3. In addition to developing system software for MPP systems, Professor Maccabe has projects to apply lightweight approaches to large-scale sensor networks, high performance I/O systems, and virtual machine monitors.