LISA 2001 Paper   
A Management System for Network-Sharable Locally Installed Software: Merging RPM and the Depot Scheme Under Solaris

Abstract

Efficient management of locally installed software is a recurring central theme of system administration. We report here on an experimental merger of two previously independent systems: Redhat's RPM Package Manager (RPM), an open-source, database-driven system developed by a major Linux vendor to manage software on a single host; and an enhanced version of depot, a well-established set of conventions used to manage software that is installed on a server and shared over a network with multiple (possibly heterogeneous) clients. The combination remedies shortcomings in both systems, but to be fully effective, extensions to RPM are required, particularly to its database system. The results of this study point the way toward a second-generation, network-distributed version of RPM.

Introduction

Management of the operating system (OS) software on a computer can be time consuming; at many sites, though, the amount of OS software is dwarfed by the amount of additional, locally installed software arising from a variety of sources. Such software generally provides the services which justify the very existence of the computing facility. The proper installation and maintenance of software is at the heart of system administration, and has a major influence on the utility, reliability, and security of a facility. The work presented here attempts to merge the complementary features of two open-source software installation and management systems: one, known as depot, is designed for managing network-shared software; the other, RPM, is designed to manage software on a single host.

Prior Work

One of the earliest attempts to attack the problem of creating and maintaining a network-shared local software repository was the NIST depot scheme [1].
The depot conventions were widely perceived as too complex for smaller facilities run by non-professional administrators, leading to simplified derivatives such as depot-lite [2] and GNU Stow [3]. Colyer, et al. of the Andrew project at CMU offered extensions to the original NIST scheme, including the notion of a software ``collection'' [4, 5]. Abbey and colleagues at the Advanced Research Laboratory at the University of Texas, Austin (ARL:UT), created a set of perl scripts, opt_depot, which facilitated use of the depot conventions [6, 7]. Other software management schemes that have been developed include STORE [8], the Application Software Installation Server (ASIS) [9], and the /packages scheme employed at Los Alamos National Laboratory (web-based documents for which have been withdrawn from public access). Most of these publicly available systems were developed on UNIX platforms, and attempted to support the sharing of installed software across multiple systems via filesystem-sharing schemes such as the Network File System (NFS), where clients might be using hardware and OS software different from that of the server. It is difficult to measure objectively the relative costs and benefits of these different approaches; however, STORE and ASIS are much more complex than depot, and none of these systems has been taken up widely outside its original site of invention. Commercially derived systems exist as well. Sun introduced a software management system (pkgadd, pkgrm, pkginfo, ...) as part of its Solaris 2 operating system, using it to install Solaris itself. Functional components that can be installed independently of other components are carved off into their own named ``packages,'' and a list is maintained of where a package's files are installed. Major Linux vendors have all developed software packaging methods with similar intent.
These include: Redhat's RPM Package Manager (RPM) [10], which is also used by the French-based MandrakeSoft [11]; Ximian's Red Carpet [12] (and Redhat's equivalent, the up2date client); and Debian's Package Management System [13]. Suse's YaST interface appears to be more concerned with OS installation than ongoing local software installation. Caldera International's Volution [14] product is claimed to manage software and other resources over a network of (multi-vendor) Linux hosts. Of these systems, RPM is likely the most widely used, due to Redhat's substantial share of the Linux market as well as to RPM being available as open source for multiple hardware platforms using different UNIX variants. Numerous open source software applications are distributed as RPM packages. With the exception of Volution, these systems are concerned with management of software on a single host. Finally, some software applications are bundled with their own installation systems; an example is the XPInstall system employed by the Mozilla web client.

How RPM Works

RPM packages exist in two forms: binary-type packages contain executable code for a specific hardware/OS combination, whereas source-type packages contain the original source code used to generate the executable binary files. The RPM binary package format is well defined and consists of four sections: the lead (a largely abandoned file structure now used to identify the package), the signature (the PGP and MD5 data used to validate and authenticate a package), the header (tag-demarcated information about the package), and the archive (the files constituting the package, compressed with GNU gzip).
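The lead's surviving role as a file-type identifier rests on a fixed magic number at the start of every package file. The following sketch is ours, not the paper's: it fabricates only those four bytes (the well-known value 0xED 0xAB 0xEE 0xDB) and omits the rest of the lead structure.

```shell
# Illustrative sketch: the first four bytes of an RPM package's lead are a
# fixed magic value, which is how tools recognize an RPM file. We fabricate
# just those bytes (octal escapes for portability) and display them in hex.
magic=$(printf '\355\253\356\333' | od -An -tx1 | tr -d ' \n')
echo "$magic"    # edabeedb
```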
For a binary-type RPM package file, the command rpm -i does the following: it checks for the presence of any other required packages (dependency checking) and for potential conflicts (the overwriting of existing files, or the installation of the current or an older version of an already-installed package); it performs any required pre-installation commands; it installs the files associated with the current package, attempting to preserve local modifications made to configuration files; it performs any required post-installation commands; and it logs all of the file locations and other package information into the RPM database, which is based on Berkeley DB [15]. For a source-type RPM package file, the rpm -i command does much less: it unbundles the source code files and the specification file (see below), putting the latter in the SPECS subdirectory. One then uses the rpm -ba command to build both binary- and source-type packages. Both binary- and source-type RPM packages must be created manually. This process employs several directories that are created when RPM is installed, named BUILD, RPMS, SOURCES, SPECS, and SRPMS: the pristine sources (and any patches) are placed in SOURCES; a specification (spec) file describing how to build and package the software is placed in SPECS; and rpm -ba then compiles the code under BUILD, depositing the finished binary package in RPMS and the source package in SRPMS.
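The build workflow can be sketched as follows. The package name and spec contents are illustrative, the build tree is rooted in a scratch directory rather than RPM's installed location, and the rpm -ba invocation appears only as a comment since it requires a full build environment.

```shell
# Sketch of the manual package-building layout (illustrative names).
# Create the standard RPM build tree:
top=$(mktemp -d)
mkdir -p "$top"/BUILD "$top"/RPMS "$top"/SOURCES "$top"/SPECS "$top"/SRPMS

# 1. Place the pristine source tarball (and any patches) in SOURCES:
touch "$top"/SOURCES/hello-1.0.tar.gz          # placeholder tarball

# 2. Write a specification file in SPECS (skeleton only):
cat > "$top"/SPECS/hello.spec <<'EOF'
Name: hello
Version: 1.0
Release: 1
Summary: Illustrative package
License: GPL
%description
A placeholder spec used to illustrate the build layout.
%prep
%build
%install
%files
EOF

# 3. Build both package types; sources are unpacked and compiled under
#    BUILD, the binary package lands in RPMS/<arch>/, the source package
#    in SRPMS/:
#      rpm -ba hello.spec        (syntax of the RPM 4.0.x used in the paper)
ls "$top"/SPECS
```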
As packages are built, the BUILD subdirectory is cleaned out by RPM, but files and packages accumulate in the SPECS, RPMS, and SRPMS directories. Several locations require manual cleaning: the SOURCES subdirectory, and a temporary directory in which log files accumulate in association with failed rpm -ba commands. RPM makes use of MD5 checksums to validate both entire packages and individual files within a package (prior to and after installation), and to guide the treatment of an application's configuration files. It also (optionally) employs PGP [16] to create and authenticate digital signatures for packages. RPM provides utilities to search the RPM database to recover information about installed packages, and to update and remove them easily. The behavior of RPM can be tailored by system-wide and user-specific initialization files. RPM is built on the rpmlib library, which has an Application Programming Interface (API) comprising over 60 different functions.

How Depot Works

Henceforth in this paper, depot refers to the conventions for software management employed at the U. S. National Library of Medicine (NLM), relying upon modified versions of the ARL:UT perl scripts (originally known as opt_depot, and building upon the earlier work at NIST [1] and CMU [4, 5]). The method employs the following directory tree on a server: /depot_server/<hardware-type>/<OS-type>/package, which allows the server to provide files for multiple arbitrary hardware/OS combinations. An individual package exists within its own subdirectory within the above path, and is named for the package and its version number (for example, /depot_server/sparc/SunOS5.8/package/gcc_3.0).
Within such a package directory, individual files must be installed within the following subdirectories (this list is locally configurable): app-defaults (X windows app-defaults files), bin (binaries), html (HTML documentation), include (include files), info (TeXinfo files), javaclass (Java class files), lib (library files), man (UNIX manual pages), pdf (PDF documentation), and sbin (administrative binaries); UNIX manual pages are organized within a package's man subdirectory in subdirectories (man1, man1m, man3, ...) in accord with System V UNIX manual section numbering conventions. If a package has requirements which preclude following this convention (as, for example, with some commercial software), it is installed within a subdirectory named vendor, and links are made from files or subdirectories within the vendor subdirectory to the appropriate app-defaults ... sbin subdirectories. On a depot client, a directory with a name of the form /depot_mount/<server-name>/package is present, where <server-name> represents a particular depot server. One such directory is present for each depot server that is providing software to this client. Package subdirectories from each server's /depot_server/<hardware-type>/<OS-type>/package/ directory are mounted into the client's corresponding /depot_mount/<server-name>/package directory, using a network file-sharing scheme such as NFS. The client also has a /depot directory, which contains subdirectories as listed above for the server package directories (app-defaults ... sbin, excluding vendor). Entire packages from /depot_mount/... are symbolically linked into /depot/package (for example, /depot/package/gcc_3.0). In addition, the files appearing within a package are linked into the corresponding directories of /depot (for example, /depot/package/gcc_3.0/bin/gcc is linked to /depot/bin/gcc).
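The server/client layout just described can be mimicked in miniature with plain directories and symbolic links. This is a scratch-directory mock-up of ours, not a deployment recipe: a real installation mounts the server tree over NFS and lets the opt_depot scripts maintain the links.

```shell
# Miniature mock-up of the depot link structure (scratch paths, not /depot).
root=$(mktemp -d)

# Server side: one directory per package_version.
mkdir -p "$root/depot_server/sparc/SunOS5.8/package/gcc_3.0/bin"
touch    "$root/depot_server/sparc/SunOS5.8/package/gcc_3.0/bin/gcc"

# Client side: the server's package tree appears under depot_mount
# (a symlink stands in here for an NFS mount).
mkdir -p "$root/depot_mount/server1"
ln -s "$root/depot_server/sparc/SunOS5.8/package" "$root/depot_mount/server1/package"

# Whole packages are linked into depot/package ...
mkdir -p "$root/depot/package" "$root/depot/bin"
ln -s "$root/depot_mount/server1/package/gcc_3.0" "$root/depot/package/gcc_3.0"

# ... and individual files into the corresponding depot subdirectories.
ln -s "$root/depot/package/gcc_3.0/bin/gcc" "$root/depot/bin/gcc"

ls -l "$root/depot/bin/gcc"    # resolves through the chain to the server copy
```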
Finally, if a package must write into host-specific files (for example, log or database files), these are placed in the directory /var/depot/<package-name>. A server (or standalone host) may act as a client to itself, and possess /depot_mount, /depot, and /var/depot directories as well, although we employ symbolic links rather than NFS to provide files to the /depot_mount tree in that case. The behavior of depot on a client is controlled by configuration files: /depot/site controls the mounting of files from multiple depot servers; /depot/.exclude prevents specified packages from being linked into /depot/package; and /depot/.priority controls which packages have priority for linking into /depot/{bin, html, ..., sbin} when there are name conflicts between individual files coming from different packages. Thus far we have described shared depot packages, which a server provides to one or more depot client hosts. Packages that are specific to a client (for example, node-locked commercial products, or software that requires hardware specific to the client) can be installed directly into /depot/package as a local package; its files are linked into /depot/{bin, html, ..., sbin} along with the shared packages, subject to the same configuration files. Although this description may make depot seem complicated, in practice it is not. The main labor is learning how to make a new software package conform to depot's package directory structuring conventions. We often employ script wrappers to encapsulate the actual binaries, which allows us to set up various environment variables for a given application, freeing the user from having to do so. The perl scripts of the ARL:UT depot system automate maintenance of the underlying system of links.

Shared and Complementary Features of Depot and RPM

Both RPM and depot provide structured, disciplined means of managing software installations. Not surprisingly, they address many issues in common, but with varying degrees of rigor:
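The script-wrapper technique mentioned above can be sketched like this. The application name, variable name, and internal layout are all hypothetical; real wrappers vary per package.

```shell
# Hedged sketch of a depot wrapper script: the file the user invokes is a
# small shell script that sets package-specific environment variables, then
# hands off to the real binary inside the package directory. All names here
# are illustrative, and a scratch directory stands in for /depot/package/...
pkg=$(mktemp -d)
mkdir -p "$pkg/libexec" "$pkg/bin"

# The "real" application (a stand-in that just reports its environment):
cat > "$pkg/libexec/someapp.real" <<'EOF'
#!/bin/sh
echo "SOMEAPP_HOME=$SOMEAPP_HOME"
EOF
chmod +x "$pkg/libexec/someapp.real"

# The wrapper the user actually runs ($pkg is expanded when written):
cat > "$pkg/bin/someapp" <<EOF
#!/bin/sh
SOMEAPP_HOME="$pkg"; export SOMEAPP_HOME   # spare the user this setup
exec "$pkg/libexec/someapp.real" "\$@"
EOF
chmod +x "$pkg/bin/someapp"

"$pkg/bin/someapp"    # prints SOMEAPP_HOME=<package directory>
```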
The two systems complement one another in a number of important respects:
The Experimental Environment

The study was done using an UltraSPARC 2 as the depot server, and an UltraSPARC 2 and two UltraSPARC 60 machines as depot clients. The machines were operating under Solaris 2.[5-8]. We standardized the naming and NFS automounting schemes used by depot as described above. At our request, Abbey and colleagues added new features to the original ARL:UT opt_depot scripts: the ability to mount software packages on a client from multiple depot servers, and improved configurability of the perl scripts. We installed and used depot routinely for a period of four years, successfully supporting as many as two different versions of Solaris concurrently. At the time of writing, the depot server contained 321 shared packages and 21 local packages for Solaris 2.8. Source for the SPARC-compatible version of RPM 4.0.2 was obtained from the web site https://www.rpm.org. Eighteen lines of the code had to be modified to get it to compile under Solaris 2.8 using gcc 3.0. RPM was installed as a shared depot package on the depot server, with the RPM database files placed in /depot/package/rpm_4.0.2/vendor/var/lib/rpm. Information about all shared packages is logged into this database. On each client, the RPM database is installed in /var/depot/rpm_4.0.2/local_db, and information about local packages is placed there. The need for, and limitations of, using two databases is discussed below. To allow RPM dependency checking to operate, we employed a script, vpkg-provides2.sh, provided with the Solaris version of RPM, which uses the information in Sun's proprietary package database to create entries for ``virtual'' RPM packages (there were 564 such packages on our depot server). We then installed a number of packages using RPM, while following the depot conventions: a library (libpcap 0.4); an application depending upon that library (snort 1.7); a self-standing source application (wget 1.6); and a commercial pre-built binary application (netscape 4.77).
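The two-database arrangement can be summarized as follows. The database paths are taken from the text; rpm's --dbpath option (which points a command at an alternate database) is real, but the queries are shown only as comments since they require an installed RPM, and the package path queried is illustrative.

```shell
# The two RPM databases used in the experiment (paths from the text).
SHARED_DB=/depot/package/rpm_4.0.2/vendor/var/lib/rpm   # shared packages, on the server
LOCAL_DB=/var/depot/rpm_4.0.2/local_db                  # local packages, per client

# List every shared package recorded on the depot server:
#   rpm --dbpath "$SHARED_DB" -qa
# Query a local package on this client (package name illustrative):
#   rpm --dbpath "$LOCAL_DB" -q netscape
echo "$SHARED_DB"
```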
Results/Discussion

The RPM/depot merger experiment suggested a number of technical directions for future work:
Conclusion

Automation and standardization are two means of reducing the considerable costs of administering software on multiple hosts. Depot is a highly efficient means of managing local software, even on stand-alone systems; its benefits are compounded when used with multiple networked hosts, an environment in which one can use both network-shared and local (client-specific) depot packages. RPM is better at certain things, such as documenting packages by database and dependency checking, but is currently designed for use on a single machine. Our experiment in merging RPM and depot is a qualified success, in that most of our software can be installed and used with RPM/depot without modifying RPM or depot. A fully functioning RPM/depot system, however, requires slight modifications to the RPM source code: most importantly, to allow the searching of multiple RPM databases to support dependency checking for local depot packages; less importantly, to couple execution of the depot scripts to the execution of RPM commands, and to support the automatic reconfiguration of paths required when installing as a local depot package one that was originally designed to be shared (or vice versa). Such modifications seem a slight price to pay in order to turn RPM into a network-based software management tool. They would also make the system more attractive as a packaging system for use by other UNIX/Linux vendors. The existence of a single widely shared system could save time for administrators, by allowing the creation of ftp- and Web-accessible archives of RPM/depot packages for Solaris (and other) platforms, greatly reducing installation effort. Modifications to RPM to support non-UNIX clients would be more complex than the ones just described, and are harder to justify.

Code Availability

The code and documentation for RPM/depot for Solaris will be available at the time of the conference, from: https://www.etg.nlm.nih.gov.
Acknowledgements

We wish to thank Jeff Johnson of Redhat for invaluable assistance with installing and using RPM under Solaris. Jonathan Abbey of ARL:UT assisted us in understanding and using his opt_depot scripts, and made helpful extensions to them at our request. Both provided helpful remarks about the manuscript, along with Jules Aronson of NLM and Nelson Beebe of the University of Utah.

Funding Sources & Copyright

Both authors are paid employees of a U. S. government research laboratory and produced this work as part of their routine duties. No additional funding was involved. As a work produced at government expense, this text is placed in the public domain and cannot be copyrighted.

Biographical Notes

R. P. C. Rodgers (rodgers@nlm.nih.gov) works in biomedical informatics at the Lister Hill National Center for Biomedical Communications (LHNCBC), where he heads the Emerging Technologies Group. He received a B.A. from Harvard College in 1972, an M.D. from the University of Utah College of Medicine in 1976, and postdoctoral training from the University of London, the University of Louvain, the National Cancer Institute, and the University of California, San Francisco (UCSF). He served on the faculty at UCSF prior to joining LHNCBC, a research arm of the U. S. National Library of Medicine (NLM). At NLM he became an early and active exponent of the World Wide Web, creating and running NLM's web services for the first two years of their existence. He has participated in a number of IETF working groups, and served as a founding member of the International World Wide Web Conference Committee and founding chair of the NSF/NCSA World Wide Web Federal Consortium. Ziying Sherwin (sherwin@nlm.nih.gov) received a B.S. in Computing & Engineering from Zhejiang University in 1996, and an M.S. in Computer & Information Science from the University of Delaware in 1999. She worked for Bell Atlantic, and joined the Emerging Technologies Group at LHNCBC in 2000.
References

[1] Manheimer, K., B. Warsaw, S. N. Clark, and W. Rowe, ``The Depot: A Framework for Sharing Software Installation Across Organizational and UNIX Platform Boundaries,'' LISA IV, https://www.forwiss.uni-passau.de/archive/marchiv/systemverwaltung.html, 17-19 October, 1990.
[2] Rouillard, J. P., and R. B. Martin, ``Depot-Lite: A Mechanism for Managing Software,'' LISA VIII, https://www.usenix.org/publications/library/proceedings/lisa94/martin.html, 1994.
[3] Glickstein, B., ``GNU Stow,'' https://www.gnu.ai.mit.edu/software/stow/, https://www.gnu.ai.mit.edu/software/stow/manual.html.
[4] Colyer, W., and W. Wong, ``Depot: A Tool for Managing Software Environments,'' LISA VI, https://andrew2.andrew.cmu.edu/depot/depot-lisaVI-paper.html, 1992.
[5] ``The Depot Configuration Management Project,'' Carnegie Mellon University, https://andrew2.andrew.cmu.edu/ANDREWII/depot.html, https://asg.web.cmu.edu/depot/depot.html.
[6] ``opt_depot,'' ARL, University of Texas at Austin, https://www.arlut.utexas.edu/csd/opt_depot/opt_depot.html.
[7] Abbey, J., ``The Group Administration Shell and the GASH Network Computing Environment,'' LISA VIII, https://www.arlut.utexas.edu/csd/gash_docs/lisa_paper/paper.html, September, 1994.
[8] Bakken, S. S., A. Christensen, T. Egge, and A. H. Juul, ``STORE,'' Norwegian University of Science and Technology, https://www.pvv.unit.no/~arnej/store/storedoc.html.
[9] Defert, P., S. Gouache, A. Peyrat, and I. Reguero, ``ASIS User's and Reference Guide, Version 3.95,'' CERN, https://consult.cern.ch/writeups/asis/node1.html, 1997.
[10] Bailey, E. C., Maximum RPM, SAMS Publishing, https://www.rpm.org/max-rpm/index.html, https://www.rpmdp.org/rpmbook, 1997.
[11] Bégnis, C., G. Cottenceau, G. Lee, and T. Vignaud, Mandrake RPM HOWTO, vol. 1.1, https://www.linux-mandrake.com/en/howtos/mdk-rpm/.
[12] Ximian, ``Red Carpet,'' https://www.ximian.com/products/ximian_red_carpet/.
[13] Debian, ``Package Management System,'' https://www.debian.org/doc/FAQ/ch-pkg_basics.html, https://www.debian.org/doc/packaging-manuals/developers-reference/.
[14] Caldera International, ``Volution,'' https://www.caldera.com/products/volution/.
[15] Sleepycat Software Inc., Berkeley DB, New Riders Publishing, Indianapolis, 2001.
[16] Garfinkel, S., PGP: Pretty Good Privacy, First Edition, December, 1994.
This paper was originally published in the
Proceedings of the LISA 2001 15th System Administration Conference, December 2-7, 2001, San Diego, California, USA.