Abstracts of the Refereed Papers to be Presented
Abstract Titan is a freely available host-based security tool that can be used to improve or audit the security of a UNIX system. It was written almost completely in Bourne shell, with a master script controlling the execution of many smaller programs. Each of the programs either fixes or detects a potential security problem, and the tool's simple and extremely modular design also makes it useful for checking or enforcing a system's adherence to its security policy. Finally, anyone who can write a shell script or program can easily create their own Titan modules. Titan does not replace other security tools, nor does it fix or patch security bugs; its primary purpose is to improve the security of the system it runs on by codifying as many OS-hardening techniques as the authors could think of. When used in combination with other security tools, it can make transforming an "out of the box" system into a firewall or security-conscious system significantly easier. NOTE: Due to time, resource, and expertise limitations, the first release of Titan is only known to run on Solaris, versions 1.x and 2.x. However, many of the small sub-programs within Titan work well on other UNIX variants, and apart from the time needed to create Titan modules for them, there is nothing Sun-specific about Titan that would prevent it from working on other UNIX systems.
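The master-script-plus-modules structure described above lends itself to a very small dispatcher. The sketch below is not Titan's actual code (Titan is written in Bourne shell, this is Python); it only illustrates the pattern, and the module directory and fix/verify flag names are assumptions.

    #!/usr/bin/env python3
    """Illustrative master script in the Titan style (Titan itself is Bourne
    shell): run every executable module in MODULE_DIR with a fix or verify
    flag.  The directory name and flag names are assumptions, not Titan's."""
    import os
    import subprocess
    import sys

    MODULE_DIR = "modules.d"        # hypothetical directory of module scripts

    def run_modules(mode):
        """Run each executable module with the chosen mode; collect failures."""
        failures = []
        for name in sorted(os.listdir(MODULE_DIR)):
            path = os.path.join(MODULE_DIR, name)
            if not os.access(path, os.X_OK):
                continue            # skip anything that is not executable
            if subprocess.run([path, "--" + mode]).returncode != 0:
                failures.append(name)
        return failures

    if __name__ == "__main__":
        mode = sys.argv[1] if len(sys.argv) > 1 else "verify"
        if not os.path.isdir(MODULE_DIR):
            sys.exit("no module directory: " + MODULE_DIR)
        failed = run_modules(mode)
        print("%d module(s) reported problems: %s" % (len(failed), ", ".join(failed)))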
Abstract The CERT Coordination Center is building an experimental information infrastructure management system, SAFARI, capable of supporting a variety of operating systems and applications. The motivation behind this prototype is to demonstrate the security benefits of a systematically managed infrastructure. SAFARI is an attempt to improve the scalability of managing an infrastructure composed of many hosts, where there are many more hosts than host types. SAFARI is designed with one overarching principle: it should impact user, developer, and administrator activities as little as possible. The CERT Coordination Center is actively seeking partners to further this or alternative approaches to improving the infrastructural fabric on which Internet sites operate. SAFARI is currently being used by the CERT/CC to manage over 900 collections of software on three different versions of UNIX on three hardware platforms in a repository (/afs/cert.org/software) that is over 20 GB in size.
SSU: Extending SSH for Secure Root Administration
Christopher Thorpe - Yahoo!, Inc.
Abstract SSU, "Secure su," is a mechanism that uses SSH [Ylonen] to provide the security for distributing access to privileged operations. Its features include shell and per-command access, a password for each user that is distinct from the login password and easily changed, and high portability. By installing SSU, administrators build a solid infrastructure for using SSH to improve security in other areas, such as file distribution and revision control.
System Management With NetScript
Apratim Purakayastha and Ajay Mohindra - IBM T. J. Watson Research Center
Abstract Cost and complexity of managing client machines is a major concern
for enterprises. This concern is compounded by emerging client
machines that are mobile and diverse. To address this concern,
management systems must be easy to configure and deploy, must handle
asynchrony and disconnection for mobile clients, and must be
customizable for diverse clients. In this paper, we first present
NetScript, an environment for scripting with network components. We
then propose a management system built with NetScript, where mobile
scripts invoke components to perform management operations. We
demonstrate that our approach results in a flexible, scalable
management system that can support mobile and diverse client machines.
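As a rough illustration of the idea of mobile management scripts invoking components, the sketch below (not NetScript itself, and written in Python rather than a NetScript language) shows a script that queues management operations while a mobile client is disconnected and invokes registered components once it reconnects; all component and operation names are hypothetical.

    """Illustrative sketch (not NetScript) of a management script that invokes
    named components and queues work for disconnected mobile clients."""
    import queue

    class ComponentRegistry:
        """Maps component names to callables that perform management work."""
        def __init__(self):
            self._components = {}

        def register(self, name, func):
            self._components[name] = func

        def invoke(self, name, **kwargs):
            return self._components[name](**kwargs)

    class ManagementScript:
        """Queues operations so they can run later if the client is offline."""
        def __init__(self, registry):
            self.registry = registry
            self.pending = queue.Queue()

        def schedule(self, component, **kwargs):
            self.pending.put((component, kwargs))

        def run_when_connected(self):
            while not self.pending.empty():
                component, kwargs = self.pending.get()
                self.registry.invoke(component, **kwargs)

    if __name__ == "__main__":
        registry = ComponentRegistry()
        registry.register("install_package", lambda pkg: print("installing", pkg))
        script = ManagementScript(registry)
        script.schedule("install_package", pkg="antivirus-1.2")   # while offline
        script.run_when_connected()                               # later, online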
Abstract Accountworks is a system which allows any employee at Sybase, Inc.
to use a web form to create accounts for new employees. Every new hire
gets a personal account in SQL, Notes, NT, and UNIX administrative
domains. Accountworks also creates initial stub entries in our SQL
personnel database. It allows the user to make a number of initial
choices for their new employee, including access to popular
applications and whether to use Notes or UNIX email. Typically all new
accounts are available within four hours after the web form is
submitted. The system operates 24 by 365 to support our worldwide
infrastructure. When the accounts are created, it guarantees a
consistent, unique login, UID (for UNIX), Firstname.Lastname record,
and password across all domains. It went into full production in July
1997, and has been used to create 1900 new accounts since then.
Because this paper is intended to help anyone tackling cross-domain
account management problems, it describes the architecture of
Accountworks, the process of building it, numerous design decisions,
and future directions of the project.
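A minimal sketch of how cross-domain consistency of logins and UIDs might be enforced is shown below; the domain names, UID range, and collision rules are assumptions for illustration, not the mechanism Accountworks actually uses.

    """Sketch of enforcing one consistent login and UID across several account
    domains, in the spirit of Accountworks.  Domains, UID range, and collision
    handling are assumptions, not the paper's implementation."""

    DOMAINS = ("unix", "nt", "notes", "sql")      # hypothetical domain names

    def propose_login(first, last, taken):
        """Derive a unique login from the employee's name."""
        base = (first[0] + last).lower()[:8]
        candidate, n = base, 1
        while candidate in taken:                 # append a digit on collision
            candidate = "%s%d" % (base[:7], n)
            n += 1
        return candidate

    def allocate_uid(used_uids, start=1000):
        """Pick the lowest free UID at or above `start`."""
        uid = start
        while uid in used_uids:
            uid += 1
        return uid

    def create_accounts(first, last, taken_logins, used_uids):
        """Return one record that every domain receives unchanged."""
        login = propose_login(first, last, taken_logins)
        uid = allocate_uid(used_uids)
        record = {"login": login, "uid": uid, "fullname": "%s.%s" % (first, last)}
        return {domain: record for domain in DOMAINS}

    if __name__ == "__main__":
        print(create_accounts("Ada", "Lovelace", {"alovelac"}, {1000, 1001}))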
Abstract Large organizations are increasingly shifting critical computing
operations from traditional host-based application platforms to
network-distributed, client-server platforms. The resulting
proliferation of disparate systems poses problems for end-users, who
must frequently track multiple electronic identities across different
systems, as well as for system administrators, who must manage
security and access for those systems. Single sign-on mechanisms have
become increasingly important in solving these problems. System
administrators who are not already being pressured to provide single
sign-on solutions can expect to be in the near future. Duke University
has recently embarked on an enterprise-wide single sign-on project.
This paper discusses the various factors involved in the decision to
deploy a single sign-on solution, reviews a variety of available
approaches to the problem of electronic identity proliferation, and
documents Duke's research and findings to date.
Abstract Imagine having to prove everything you believe at one time. That
is exactly what happened when I was asked to change the design of an
Enterprise Backup System to accommodate the backup and restore needs
of a new, very large system. To meet the challenge, I'd have to use
three brand-new pieces of technology and push my chosen backup
software to its limits. Would I be able to send that much data over
the network to a central location? Would I be forced to change my
design? This paper is the story of the Proof of Concept Test that
answered these questions.
Abstract This paper provides the system administrator with a fundamental
understanding of database architecture internals so that he can better
configure relational database systems. The topics of discussion
include buffer management, access methods, and lock management. To
illustrate the concepts in practice and to contrast the architectures of
the two market leaders, Oracle and Sybase implementations are
referenced throughout the paper. The paper describes different backup
strategies and when each strategy is appropriate. In conclusion, the
paper describes special hardware considerations for high availability
and performance of database systems.
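To make the buffer-management topic concrete, here is a toy buffer pool; the LRU eviction policy and fixed page capacity are illustrative assumptions, and real engines such as Oracle and Sybase use their own, more elaborate schemes.

    """Toy buffer pool illustrating buffer management.  LRU eviction and the
    fixed capacity are illustrative assumptions only."""
    from collections import OrderedDict

    class BufferPool:
        def __init__(self, capacity, read_page):
            self.capacity = capacity          # number of pages kept in memory
            self.read_page = read_page        # callback that fetches a page from disk
            self.pages = OrderedDict()        # page_id -> page data, in LRU order
            self.hits = self.misses = 0

        def get(self, page_id):
            if page_id in self.pages:
                self.pages.move_to_end(page_id)      # mark as most recently used
                self.hits += 1
            else:
                self.misses += 1
                if len(self.pages) >= self.capacity:
                    self.pages.popitem(last=False)   # evict least recently used page
                self.pages[page_id] = self.read_page(page_id)
            return self.pages[page_id]

    if __name__ == "__main__":
        pool = BufferPool(capacity=2, read_page=lambda pid: "data-for-%d" % pid)
        for pid in (1, 2, 1, 3, 1):
            pool.get(pid)
        print("hits=%d misses=%d" % (pool.hits, pool.misses))   # hits=2 misses=3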
Abstract This article presents a configuration distribution system that assists system administrators with the tasks of host and service installation, configuration, and crash recovery on large and heterogeneous networks. The objective of this article is twofold: first, to introduce the system's modular architecture; second, to describe the platform-independent protocol designed to support fast and reliable configuration propagation.
An NFS Configuration Management System and its Underlying Object-Oriented Model
Fabio Q. B. da Silva, Juliana Silva da Cunha, Danielle M. Franklin, Luciana S. Varejão, and Rosalie Belian - Federal University of Pernambuco
Abstract This paper describes an NFS configuration and management system
for large and heterogeneous computer environments. It also shows how
this system can be extended to address other services in the network.
The solution is composed of a process that describes the service
configuration and management life-cycle, a modular architecture, and an
object-oriented model. The system supports multiple features,
including automatic host and service installation, service dependency
inference and analysis, performance analysis, configuration
optimization, and service monitoring with problem correction.
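As an illustration of what service dependency inference and analysis can look like in code, the sketch below computes a start order and the set of services affected by a failure from a declared dependency graph; the service names and dependencies are hypothetical, and the paper's object-oriented model is not reproduced.

    """Sketch of dependency inference: compute a safe start order and the
    services impacted by a failure.  The graph below is invented."""
    from graphlib import TopologicalSorter   # Python 3.9+

    # service -> set of services it depends on (assumed, not from the paper)
    DEPENDS_ON = {
        "nfs-server": {"portmap"},
        "automounter": {"nfs-server", "nis"},
        "nis": {"portmap"},
        "portmap": set(),
    }

    def start_order(graph):
        """Order in which services can be started so dependencies come first."""
        return list(TopologicalSorter(graph).static_order())

    def affected_by(failed, graph):
        """All services that directly or transitively depend on `failed`."""
        affected, changed = {failed}, True
        while changed:
            changed = False
            for svc, deps in graph.items():
                if svc not in affected and deps & affected:
                    affected.add(svc)
                    changed = True
        return affected - {failed}

    if __name__ == "__main__":
        print("start order:", start_order(DEPENDS_ON))
        print("impacted if nfs-server fails:", affected_by("nfs-server", DEPENDS_ON))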
Abstract The explosive growth of the World Wide Web has raised great
concerns regarding many challenges - performance, scalability and
availability of the Web system. Consequently, Web site builders are
increasingly constructing their Web servers as distributed systems to
solve these problems, and this trend is likely to accelerate. In
such systems, a group of loosely-coupled hosts will work together to
serve as a single virtual server. Although the distributed server can
provide compelling performance and accommodate the growth of web
traffic, it inevitably increases the complexity of system
administration. In this paper, we exploit the advantages of Java to
design and implement an administration system for addressing this
challenging problem.
Abstract This paper describes the history and operation of the current
version of MRTG as well as the Round Robin Database Tool. The Round
Robin Database Tool is a program which logs and visualizes numerical
data in an efficient manner. The RRD Tool is a key component of the
next major release of the Multi Router Traffic Grapher (MRTG). It is
already fully implemented and working. Because of the massive
performance gain it makes possible, some sites have already
started to use the RRD Tool in production.
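The round-robin idea at the heart of the RRD Tool can be illustrated in a few lines: a fixed number of slots overwritten in rotation, so storage never grows. The sketch below is only that illustration; the slot count is arbitrary, and RRDtool's actual on-disk format and consolidation functions are not shown.

    """Toy round-robin archive: fixed slots overwritten in rotation."""

    class RoundRobinArchive:
        def __init__(self, slots):
            self.values = [None] * slots      # fixed-size storage, never grows
            self.index = 0                    # next slot to overwrite

        def update(self, value):
            self.values[self.index] = value
            self.index = (self.index + 1) % len(self.values)

        def series(self):
            """Return the stored values oldest-first."""
            return self.values[self.index:] + self.values[:self.index]

    if __name__ == "__main__":
        archive = RoundRobinArchive(slots=5)
        for sample in [10, 12, 11, 15, 14, 13, 16]:   # more samples than slots
            archive.update(sample)
        print(archive.series())   # oldest samples have been overwritten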
Abstract In an ideal world the need to provide data communications between
facilities separated by a large ocean would be filled simply. One
would estimate the bandwidth requirement, place an order with a global
telecommunications company, then just hook up routers on each end and
start using the link. Our experience was considerably more painful,
primarily due to three factors: 1) The behavior of some of our
applications, 2) problems with various WAN carrier networks, and 3)
increasing Internet traffic. "Network Ecology" describes the
management of these factors and others that affect network
performance.
Abstract The content of many popular ftp and web sites on the Internet is replicated at other sites, called "mirrors," typically to decrease the network load at the original site, to make information available closer to its users for higher availability, and to decrease the bandwidth requirements these sites place on long-haul network connections, such as international and backbone links. Even though the success of mirroring depends heavily on the selection of a good mirror, there are very few methods to pick one: i.e., a mirror "close" to its user based on network topology. This paper describes a method and two tools developed to locate a "close" mirror among replicated copies of a network service such as ftp, www, irc, or streaming audio by utilizing network topology information based on autonomous systems. Routing information from the Internet Routing Registry is combined with information about the location of mirrors to generate mirroring tables, similar to routing tables, which are used to identify a "close" mirror, where "close" is defined as traversing the minimum number of autonomous systems. The tools
are available via anonymous ftp from ftp.coubros.com.
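A minimal sketch of the "fewest autonomous systems traversed" selection idea follows; the AS adjacency data and the mirror-to-AS mapping are invented for illustration, whereas the tools described above derive them from the Internet Routing Registry.

    """Sketch of picking a "close" mirror by minimizing AS hops."""
    from collections import deque

    # hypothetical AS-level adjacency (who peers with whom)
    AS_NEIGHBORS = {
        64512: [64513, 64514],
        64513: [64512, 64515],
        64514: [64512, 64515],
        64515: [64513, 64514, 64516],
        64516: [64515],
    }
    # hypothetical mirrors and the AS each one sits in
    MIRRORS = {"ftp.eu.example.org": 64516, "ftp.us.example.org": 64514}

    def as_hops(src_as, dst_as):
        """Breadth-first search: number of AS-to-AS hops between two ASes."""
        seen, frontier = {src_as}, deque([(src_as, 0)])
        while frontier:
            asn, dist = frontier.popleft()
            if asn == dst_as:
                return dist
            for nbr in AS_NEIGHBORS.get(asn, []):
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append((nbr, dist + 1))
        return float("inf")

    def closest_mirror(client_as):
        return min(MIRRORS, key=lambda host: as_hops(client_as, MIRRORS[host]))

    if __name__ == "__main__":
        # the US mirror is 1 AS hop away, the EU mirror 3, so the US mirror wins
        print(closest_mirror(64512))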
Abstract Moving a division of approximately 200 employees from one building to another across town can be a daunting task. It involves coordination among teams from systems administration, networking, facilities, and security as well as support from management and cooperation of the employees being relocated. Contractors and subcontractors are frequently hired to handle physical relocation of goods from one location to another, construction of new server rooms, electrical rewiring, installation of new cooling systems, etc. This paper is the story of how we handled the move and reconfiguration of a network of approximately 1000 nodes over a long weekend in May 1998. Previously published work has discussed some of the issues that challenged us here. The reconfiguration of large numbers of machines has been discussed in [Manning93, Riddle94, Shaddock95]. "Forklift" upgrades of new hardware [Harrison92] share some but not all of the problems we faced in our move. Implementation of new networking topology without the problems or schedules imposed by physical relocation has been discussed in [Limoncelli97]. We believe our
work is unique in requiring all these tasks to happen on a large scale
in a relatively short time. We were allocated only one workday in
addition to a weekend to shut down and relocate our computing
environment. We were expected to have a fully functioning network at
our new location the following Monday. Ordinarily the complete
reconfiguration of a network this size would be a challenge in itself.
For our project, we had to account for the time required to disconnect
and pack machines, load them into trucks, transport them across town,
unload, and reconnect them at the new building. As we will detail, the
resulting window of time available to handle the reconfiguration of
all these machines was very small.
Abstract This paper presents work by many developers of the Athena Computing Environment, done over many years. We aim to show how the various components of the Athena system interact with reference to an individual workstation and its particular needs. We describe Hesiod, Kerberos, the locker system, electronic mail, and the software release process for workstation software, and show how they all interrelate to provide high-reliability computing in a very large network with fairly low staffing demands on programmers and systems administrators.
Bootstrapping an Infrastructure
Steve Traugott - Sterling Software and NASA Ames Research Center
Joel Huddleston - Level 3 Communications
Abstract When deploying and administering systems infrastructures, it is still common to think in terms of individual machines rather than viewing an entire infrastructure as a combined whole. This standard practice creates many problems, including labor-intensive administration, high cost of ownership, and limited generally available knowledge or code usable for administering large infrastructures. The model we describe treats an infrastructure as a single large distributed virtual machine. We found that this model allowed us to approach the problems of large infrastructures more effectively. This model was developed during the course of four years of mission-critical rollouts and administration of global financial trading floors. The typical infrastructure size was 300-1000 machines, but the principles apply equally well to much smaller environments. Added together, these infrastructures totaled about 15,000 hosts. Further refinements have been added since then, based on experiences at NASA Ames. The methodologies described here use UNIX and its variants as the example operating system. We have found that the principles apply equally well, and are as sorely needed, in managing infrastructures based on other operating systems. This paper is a living document:
Revisions and additions are expected and are available at
https://www.infrastructures.org. We also maintain a mailing list for
discussion of infrastructure design and implementation issues -
details are available on the web site.
Abstract In the fall of 1994, Applied Research Laboratories, The University of Texas at Austin (ARL:UT) presented a paper [1] at LISA VIII, describing work that we had performed designing and implementing a management framework for NIS and DNS, called GASH. In the years since that paper was presented, it has become clear that the design of GASH was insufficient to meet the complex, idiosyncratic, and rapidly changing needs of modern networking. GASH suffered from being too inflexible to be rapidly retooled for a changing network environment, from being limited to a single user at a time, and from being unable to provide management services to custom clients. In the face of
these issues, the Computer Science Division at ARL:UT went back to the
drawing board and developed a Java-based directory management
framework based on the design principles presented in our GASH
paper. Ganymede (which stands for the "GAsh Network Manager, Deluxe
Edition," of course) uses a distributed object design built on the
Java Remote Method Invocation [2] protocol and features a multi-threaded, multi-user
server and a graphical, explorer-style client. By supporting
customization through a graphical schema editor, plug-in Java classes,
and external build scripts, Ganymede is able to support a variety of
directory services, including NIS, DNS, LDAP, and even NT user and
group management.
Abstract Cisco Systems has chosen to internally develop an enterprise-wide print system that provides access to more than 2000 printers for both Unix hosts and PCs. The requirements for this print system were that it had to be very cheap to construct, highly scalable, easily maintained by a very small staff, fault tolerant, and reliable enough for mission-critical use. In other words, management essentially wanted everything for practically nothing. To meet our objectives we built our print system out of interchangeable low-cost PCs running Linux, LPD, and Samba, as well as other standard Unix applications. The low cost of PC hardware and the lack of licensing fees for Linux allowed us to deploy the print system very widely without having to go through all the managerial justifications necessary to authorize larger-scale purchases. By making each print server interchangeable we achieved scalability as well as a certain degree of fault tolerance. The flexibility of running a Unix-like operating system such as Linux, as opposed to another more restrictive operating system, allowed us to develop a worldwide printing application that can be managed very easily by only two or three people. And finally, the robustness of Linux made it possible for us to use our print system in mission-critical environments such as manufacturing production floors. This paper discusses the process by which the print system was implemented and the wisdom learned along the way. It covers topics such as how to gain and maintain control of the printing process, why and how to keep printers completely network-managed devices, how we learned to deal with large numbers of servers, the advantages and problems we ran into as the number of servers grew, and the many advantages and few disadvantages of basing the system entirely on free software. It also highlights some of the major processes that we automated and the success we had devolving power first to the local technical support people and then ultimately to the users. Finally, it discusses many of the problems that we are running into now that the print system is a few years old and the steps that we are taking to ensure that we do not become victims of our own success and that the whole system does not collapse due to data rot. Since the real key to managing thousands of printers effectively
is figuring out how to save time, the real-world experience we gained
and the time-saving tips we discovered while learning how to manage
thousands of printers should be valuable even to sysadmins who have
only a few printers to manage.
Abstract The paper describes a project to enhance the print service for CERN. The printer infrastructure consists of over 1000 printers serving more than 5000 Unix users running on workstations of various brands as well as PCs running Linux. In addition, the infrastructure must serve more than 3000 PCs running Windows/95 and NT 4. We support a large number of printer manufacturers, including HP, QMS, Tektronix, Xerox and Apple. Lightweight print clients are provided for all the supported platforms and transparently distributed using the ASIS software repository and the NICE application architecture. They may be used as "drop-in" replacements of the standard vendor clients. Compatibility with older CERN lightweight print clients is provided. Printing with standard vendor clients is also possible. Administrative tools are provided for the general management of print servers and in particular for replicating server configurations and monitoring spool file systems. The service
offers a high level of scalability and fault tolerance, since it has
no single point of failure in the server back-end.
Abstract mkpkg is a tool that helps software publishers create installation packages. Given software that is ready for distribution, mkpkg helps the publisher develop a description of the software package, including manifests, dependencies, and post-install customizations. mkpkg automates many of the painstaking tasks required of the publisher, such as determining the complete package manifest and the dependencies of the executables on shared libraries. Using mkpkg, a publisher can generate software packages for complex software such as TeX with only a few minutes' effort.
mkpkg has been implemented on HP-UX using Tcl/Tk and provides
both graphical and command line interfaces. It builds product-level
packages for Software Distributor (SD-UX).
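A rough sketch of two of the chores mentioned above, building a manifest from a staging tree and recording an executable's shared-library dependencies, appears below; it is not mkpkg itself, the staging path is hypothetical, and the ldd-based dependency check is an approximation rather than mkpkg's SD-UX-aware logic.

    """Sketch (not mkpkg) of manifest building and shared-library detection.
    The staging path is hypothetical and the `ldd` parsing is approximate."""
    import os
    import stat
    import subprocess

    def build_manifest(root):
        """List every file under `root` with its size and permission bits."""
        manifest = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                manifest.append((os.path.relpath(path, root),
                                 st.st_size, stat.filemode(st.st_mode)))
        return manifest

    def shared_libraries(executable):
        """Approximate dependency list by parsing `ldd` output (ELF systems)."""
        out = subprocess.run(["ldd", executable], capture_output=True, text=True)
        libs = set()
        for line in out.stdout.splitlines():
            parts = line.split()
            if parts and parts[0].startswith("lib"):
                libs.add(parts[0])
        return sorted(libs)

    if __name__ == "__main__":
        for entry in build_manifest("./staging"):       # hypothetical staging tree
            print(entry)
        print(shared_libraries("/bin/ls"))              # example executable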
Abstract SEPP is an application installation, sharing and packaging
solution for large, decentrally managed Unix environments. SEPP can be
used without making modifications to the organizational structure of
the participants' servers. It provides consistent application setup,
documentation, wrapper scripts and usage logging as well as version
concurrency and clean software removal. This paper first gives an
overview of products already available in this field and then goes on
to describe SEPP.
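The wrapper-script idea can be sketched briefly; the example below uses Python purely for illustration (SEPP's real wrappers are shell scripts), and the application path and log location are assumptions.

    """Illustrative wrapper in the spirit of SEPP's wrapper scripts: log who
    ran which application version, then hand control to the real binary."""
    import getpass
    import os
    import sys
    import time

    REAL_BINARY = "/usr/pack/gimp-1.0/bin/gimp"     # hypothetical installed version
    USAGE_LOG = "/var/log/app-usage.log"            # hypothetical log file

    def log_usage():
        try:
            with open(USAGE_LOG, "a") as log:
                log.write("%s %s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"),
                                          getpass.getuser(), REAL_BINARY))
        except OSError:
            pass   # never block the user just because logging failed

    if __name__ == "__main__":
        log_usage()
        # replace this process with the real application, preserving arguments
        os.execv(REAL_BINARY, [REAL_BINARY] + sys.argv[1:])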
Abstract The combination of large networks, frequent operating system
security patches, and software updates can create a daunting task for
a systems administration team. This paper presents a system created to
address these challenges with system security and "uptime" as the
primary concerns. By using a file-form "database," the Synctree
system holds a full network's configuration in an understandable,
secure location. This paper also compares this system with
previously published works.
Abstract Rapid growth of a computing environment presents a recurring theme of running out of resources. Meeting the challenges of building and maintaining such a system requires adapting to the ever-changing needs brought on by rampant expansion. This paper discusses the evolution of our computer network from its origins in the startup company NexGen, Inc. to the current AMD California Microprocessor Division (CMD) network that we support today. We provide highlights of some of the problems we have encountered along the way, some of which were solved efficiently and others of which provided lessons to be learned. The
reengineering of computer networks and system environments has been
the subject of numerous papers including [Harrison92, Evard94b,
Limoncelli97]. Like the others, we discuss topics related to
modernization of our systems and the implementation of new
technologies. However, our focus here is on the problems caused by
rapid growth. With increasing requirements for more compute power and
the availability of less expensive and more powerful computers, we
believe that other environments are poised for rapid growth such as
ours. We hope that lessons learned from our experience will better
prepare other system administrators in similar situations.
Abstract Present-day computer systems are fragile and unreliable. Human
beings are involved in the care and repair of computer systems at
every stage in their operation. This level of human involvement will
be impossible to maintain in future. Biological and social systems of
comparable and greater complexity have self-healing processes which
are crucial to their survival. It will be necessary to mimic such
systems if our future computer systems are to prosper in a complex and
hostile environment. This paper describes strategies for future
research and summarizes concrete measures for the present, building
upon existing software systems.
Abstract Analyzing and monitoring logs that portray system, user, and network activity is essential to meet the requirements of high security and optimal resource availability. While most systems now possess satisfactory logging facilities, the tools to monitor and interpret such event logs are still in their infancy. This paper describes an approach to relieve system and network administrators from manually scanning sequences of log entries. An experimental system based on unsupervised neural networks and spring layouts to automatically classify events contained in logs is explained, and the use of complementary information visualization techniques to visually present and interactively analyze the results is then discussed.
The system we present can be used to analyze past activity as well as
to monitor real-time events. We illustrate the system's use for event
logs generated by a firewall; however, it can easily be coupled to any
source of sequential and structured event logs.
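The paper's classifier is an unsupervised neural network; as a much simpler stand-in, the sketch below only illustrates the underlying goal of collapsing many similar firewall log lines into a handful of event classes. The sample log lines and masking rules are invented.

    """Far simpler stand-in for the paper's neural classifier: group log lines
    by masking their variable parts, so similar events collapse together."""
    import re
    from collections import Counter

    def template(line):
        """Mask the variable parts (addresses, ports, counts) of a log line."""
        line = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}\b", "<ip>", line)
        line = re.sub(r"\b\d+\b", "<n>", line)
        return line

    def classify(lines):
        """Count how many raw events fall into each masked template."""
        return Counter(template(line) for line in lines)

    if __name__ == "__main__":
        sample = [
            "deny tcp 10.0.0.5:4312 -> 192.168.1.1:23",
            "deny tcp 10.0.0.9:4313 -> 192.168.1.1:23",
            "allow udp 10.0.0.5:53 -> 192.168.1.7:53",
        ]
        for tmpl, count in classify(sample).most_common():
            print("%4d  %s" % (count, tmpl))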
Abstract Electronic mailing lists are ubiquitous community-forging tools that serve the important needs of Internet users, both experienced and novice. The most popular mailing list managers generally use textual mail-based interfaces for all list operations, from subscription management to list administration. Unfortunately, anecdotal evidence suggests that most mailing list users, and many list administrators and moderators, are novice to intermediate computer users; textual interfaces are often difficult to use effectively. This paper describes Mailman, the GNU mailing list manager, which offers a dramatic step forward in usability and integration over other mailing list management systems. Mailman brings to list management an integrated Web interface for nearly all aspects of mailing list interaction, including subscription requests and option settings by members, list configuration and Web page editing by list administrators, and post approvals by list moderators. Mailman offers a mix of robustness, functionality and ease of installation and use that is unsurpassed by other freely available mailing list managers. Thus, it offers great benefits to site administrators, list administrators and end users alike. Mailman is primarily implemented in Python, a free, object-oriented scripting language; there are a few C wrapper programs for security. Mailman's architecture is based on a centralized list-oriented database that contains configuration options for each list. This allows for several unique and flexible administrative mechanisms. In addition to Web access, traditional email-command based control and interactive manipulation via the Python interpreter are supported. Mailman also contains extensive bounce and anti-spam devices. While many of the features
discussed in this paper are generally improvements over other mailing
list packages, we will focus our comparisons on Majordomo, which is
almost certainly the most widely used freely available mailing list
manager at present.
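The centralized, list-oriented database can be pictured as one record of options per list, shared by every interface; the sketch below illustrates that idea only, and the option names and pickle-per-list layout are assumptions rather than Mailman's actual schema.

    """Sketch of a list-oriented configuration store: one record of options per
    list, read and modified by web, email, and interactive interfaces alike."""
    import os
    import pickle

    DB_DIR = "lists"   # hypothetical directory holding one file per list

    DEFAULTS = {
        "moderated": False,
        "max_message_size_kb": 40,
        "subscribe_policy": "confirm",
    }

    def load_list(name):
        path = os.path.join(DB_DIR, name + ".pck")
        if os.path.exists(path):
            with open(path, "rb") as fh:
                return pickle.load(fh)
        return dict(DEFAULTS)

    def save_list(name, config):
        os.makedirs(DB_DIR, exist_ok=True)
        with open(os.path.join(DB_DIR, name + ".pck"), "wb") as fh:
            pickle.dump(config, fh)

    if __name__ == "__main__":
        cfg = load_list("announce")
        cfg["moderated"] = True          # e.g., changed from the web admin page
        save_list("announce", cfg)
        print(load_list("announce"))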
Abstract This paper describes a set of tools and procedures which allow very large mailing lists to be managed with the freeware tool of the administrator's choice. With the right approach, scaling technology can be applied to a list management tool transparently. In recent years, many ingenious methods have been proposed for handling email deliveries to mailing lists of several thousand subscribers. Administration of a mailing list is not limited to message delivery, however. Tasks such as managing subscribers, dealing with mail bounces, and preventing list spamming also become more difficult when applied to very large lists. As a case study, this paper
describes the process of moving the well-known "Firewalls" mailing
list from its original home at GreatCircle Associates to a new
infrastructure at GNAC. The process was thought to be straightforward
and obvious, but it soon became apparent that it was neither. We trust
that our discoveries will benefit other systems administrators
undertaking similar projects, either concerning large mailing lists or
moving complex "legacy systems."
Abstract Tracking tasks remains one of the most difficult issues facing any working team of administrators. Even with the implementation of commercial tools available today, e-mail and hallway conversations remain the standard for task management in many organizations; however, these make it difficult and time-consuming to remain current on issues, and do nothing to summarize the long-term history of tasks and their completion. Many commercial tools are available to handle task management, and most work quite well for stereotypical models of their intended environments - development teams, help desks, etc. Unfortunately, these systems often have limitations which prevent their use (or simple deployment) in a pre-existing, working environment. Other systems are difficult or time-consuming to use, and remain ignored in favor of task accomplishment. Few freely available systems provide the facilities to analyze productivity, generate statistics, and otherwise please management. Request v3 was designed to provide the necessary essentials for modern task management: a selection of user interfaces, support for multiple database backends, flexible security controls, and extensive reporting capabilities. It runs cleanly in heterogeneous environments, including those that have a large installed base of Windows users. It includes command line, e-mail, and web interfaces, in addition to an Extension Interface which provides a simple way to access the Request system from other programs, scripts, or any custom interface one may create. The authentication, notification, data storage, and logging functions are processed within separate modules, allowing a variety of backend databases to be supported.