Perl Practicum: The Devil in the Details
by Hal Pomeranz
A Modest Proposal
We usually think of UNIX tools as using command-line switches rather than
configuration files. Sometimes, however, the number of configuration
options is so large or the amount of configuration information is so
daunting (e.g., sendmail or inetd) that a
configuration file is required. Configuration files also give users a
relatively easy interface for customization and enable the program to
adapt over time without destabilizing the application's actual code.
Lately, I have been writing a lot of applications that cry out for the
use of configuration files, and I have found that a particularly
flexible paradigm is to write config files in Perl syntax and then
eval() them. There is no need to write a new
file-parsing routine for each application, and the configuration files
have direct access to data in the program's environment. The downside,
of course, is that it is harder for users (particularly nontechnical
users) to configure the application. Still, if you write tools for
developers (as I generally do), this is an extremely powerful idea.
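For instance, such a config file might be nothing more than a handful of
Perl assignments (the variable names here are invented for illustration):
|
# a config file in this style is just Perl; the names are illustrative
$LOGDIR  = "/var/log/mytool";
$VERBOSE = 1;
@ADMINS  = ("root", "operator");
|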
Pulling in Files
Generally, there are three ways to bring such configuration files into
a program. First, there's the "open() the file and slurp" method:
|
open(FILE, $file) || die "Failed to open $file\n";
@lines = <FILE>;
close(FILE);
eval("@lines");
die "Failed to eval() file $file:\n$@\n" if ($@);
|
The variable $@ is set to a nonempty string only if the preceding
eval() detected an error, such as a syntax error in the file. Configuration files
tend to be small and RAM tends to be large, so there is not much worry
about using too much memory for @lines. Note that
$file must be the configuration file's full pathname, or else the file
must live in the current working directory of the program.
A second option, then, is to use do $file. The do
construct searches the standard Perl "include" path stored in
@INC. The problem is that do won't trap
syntax errors in the configuration file. Instead, write:
|
$result = do $file;
die "Probable syntax error $file\n" unless ($result);
|
and make sure to end $file with a statement that evaluates to a true
value, as is typical for Perl library files (usually the last line of
such files is simply "1;"). If do hits a syntax error, it stops
evaluating, leaving $result undef.
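For example, a config file intended to be read this way might be nothing
more than a few settings followed by a true value (the names and values
here are purely illustrative):
|
# sample settings; the names and values are made up
$TMPDIR  = "/var/tmp";
$RETRIES = 3;

1;    # end with a true value so the do check above doesn't report an error
|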
Another option is to use
require, which searches through @INC just
like do but raises a fatal error if the file
contains a syntax error (note that require, like our do
check, still expects the file to end with a true value). These errors
can be trapped with eval() like this:
|
eval('require("$file")');
die "*** Failed to eval() file $file:\n$@\n" if ($@);
|
The only difficulty with require is that it will include a given file
only once. This may not seem like an issue at first, but suppose the
application is a daemon that is supposed to re-read its configuration
file when it receives a HUP signal.
It turns out that require keeps track of which files the
program has read with the %INC hash. The keys to the hash
are the arguments given to require, and the values are
the full pathnames of the files as found by searching
@INC. Using this information, we can write this simple
function:
|
sub acquire {
    my($file) = @_;
    delete($INC{$file});    # forget we ever read this file
    eval('require("$file")');
    die "*** Failed to eval() file $file:\n$@\n" if ($@);
}
|
This gives us all the benefits of require and still enables us to reread the same configuration file.
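To sketch the daemon scenario mentioned above, the program can call
acquire() once at startup and again from a HUP handler (the
configuration file name here is invented):
|
$ConfigFile = "/etc/mydaemon.conf";    # hypothetical path
acquire($ConfigFile);                  # initial read at startup

# re-read the configuration whenever the daemon receives a HUP
$SIG{'HUP'} = sub { acquire($ConfigFile) };
|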
But What Good Is It?
All right, we can safely read in configuration files written as Perl
code, but what exactly does this buy us? In one of my early columns, I
talked about writing portable Perl scripts that read in a
configuration file that contained machine-specific configuration
information. For example, consider an /etc/OSinfo file on
all machines that contained information like:
|
$VENDOR = "Sun";
$HARDWARE = "Sparc";
$OS = "Solaris";
$VERSION = "2.5";
$HOSTNAME = `/usr/bin/uname -n`;
$PSCMD = "/usr/bin/ps -ef";
$MAILER = "/usr/bin/mailx -s";
|
On Berkeley-based systems, $PSCMD might be ps
-aux, and $HOSTNAME might be set by calling
hostname instead of uname. All of a site's
administration scripts could simply use the acquire()
function to suck in all this configuration information. Assuming the
scripts used the variables set in the configuration file, they would
be completely portable across every machine on the network.
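A portable administration script built on this idea might look something
like the following sketch (the process report itself is invented for
illustration):
|
acquire("/etc/OSinfo");
chomp($HOSTNAME);    # strip the newline left by the backticks in /etc/OSinfo

# use the machine-specific ps command and mailer from the config file
open(PS, "$PSCMD |") || die "Failed to run $PSCMD\n";
@procs = <PS>;
close(PS);

open(MAIL, "| $MAILER 'process report' root") || die "Failed to run $MAILER\n";
print MAIL scalar(@procs), " processes running on $HOSTNAME\n";
close(MAIL);
|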
Configuration files can, of course, contain more than simple scalar
variables. For example, I wrote myself a little program that splits my
mailbox up into smaller files based on who the email comes from. The
program reads a configuration file which defines a hash like:
|
%File = ("firewalls-owner"        => "firewalls",
         "owner-namedroppers"     => "dns",
         "bind-"                  => "dns",
         "socks-owner"            => "socks",
         "owner-www-security"     => "wwwsec",
         "owner-best-of-security" => "bos",
         "bosslug-owner"          => "bosslug",
         "owner-solaris-x86"      => "x86");
|
The keys of the above hash are all From addresses of
various mailing lists I subscribe to. If the From address
of a given message matches one of the keys in the hash, then the
message is deposited in the file whose name is
$File{$key} (if the From address doesn't
match any key, then the message goes into a default file). This
program is very useful when I've been out of the office for several
days and I want to ignore all the mailing list traffic I usually get
and just concentrate on mail sent by individuals.
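The matching itself is just a loop over the keys of %File; a sketch of it
might look like this (here $from and @message are assumed to have been
filled in by the code that parses the mailbox, and "misc" is a stand-in
for the default file):
|
$dest = "misc";                          # default file for unmatched mail
foreach $addr (keys(%File)) {
    if ($from =~ /\Q$addr\E/) {          # does the From address mention this list?
        $dest = $File{$addr};
        last;
    }
}
open(OUT, ">> $dest") || die "Failed to append to $dest\n";
print OUT @message;
close(OUT);
|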
It is even possible to define subroutines in the configuration
files. Because the eval() statements happen at runtime,
function definitions in the config file will always override
declarations with the same name in your program. For example, a
program like this:
|
eval('require("$file")');
die "*** Failed to eval() file $file:\n$@\n" if ($@);

sub printer {
    print "In Perl prog\n";
}

printer();
|
where $file contains this:
|
sub printer {
    print "In required file\n";
}
|
will print In required file when printer() is invoked: the definition
pulled in at runtime replaces the one compiled into the program. My PLOD
program uses this idea to enable
users to replace a standard (but weak) encryption routine with a
stronger routine of their own devising.
Generally avoid using symbols in a configuration file that will
clash with variable and function names in the program. One simple
solution is to use all uppercase symbols in configuration files and
all lowercase symbols in programs. Symbols in your configuration
file could also be prefixed with some standardized string (such as the
name of the configuration file itself). Alternatively, use package in
the configuration file to push all symbols into a protected namespace.
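For instance, if the configuration file opened with a package
declaration, the program would refer to the fully qualified names (the
package name CFG is arbitrary):
|
# ---- at the top of the configuration file ----
package CFG;                  # push all config symbols into their own namespace
$VENDOR = "Sun";
$PSCMD  = "/usr/bin/ps -ef";
1;

# ---- in the program ----
acquire($file);
print "Using $CFG::PSCMD on a $CFG::VENDOR machine\n";
|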
Hybrid Files
Sometimes it is undesirable to force users to write a full-blown Perl
script just to configure the latest tool. Instead, consider using a
hybrid-type file that has easy-to-parse fields, some of which might be
Perl expressions. For example, we could rewrite our
/etc/OSinfo file as
|
# Sun specific:
VENDOR   Sun
HARDWARE Sparc
OS       Solaris
VERSION  2.5
HOSTNAME `/usr/bin/uname -n`
PSCMD    "/usr/bin/ps -ef"
MAILER   "/usr/bin/mailx -s"
|
and then read in the file with
|
open(FILE, "/etc/OSinfo") || die "Failed to read config\n";
while (<FILE>) {
next if (/^(#.*|\s*)$/);
($key, $val) = split(/\s+/, $_, 2);
$Config{$key} = eval($val);
die "Error on line $.:\n$@\n" if ($@);
}
close(FILE);
|
Note that this code skips comments (lines beginning with a "#") and
blank lines (lines that contain only white space). It uses the
three-argument form of split() so that the line is broken
into two pieces. This kind of hybrid file can give the best of both
worlds: easy, yet extremely flexible and powerful configuration.
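Once the loop has run, the program pulls what it needs out of
%Config wherever the earlier examples used bare globals; a trivial
sketch:
|
open(PS, "$Config{'PSCMD'} |") || die "Failed to run $Config{'PSCMD'}\n";
print while (<PS>);
close(PS);
|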
Wrapping Up
Although these kinds of configuration files can be extremely powerful,
they can also be a living nightmare for users. Require users to
configure only the parameters that matter, and don't bury them under a huge
number of options. Choose sensible defaults that will work properly in
the normal case. Provide users with a variety of well-documented,
preconfigured files that they can copy and modify to suit their
particular needs.
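One way to arrive at sensible defaults is to set them in the program
before pulling in the configuration file, so the file only has to
override what differs; for example (the file name and settings here are
invented):
|
# defaults that work in the common case
$TIMEOUT = 30;
$LOGDIR  = "/var/log/mytool";

# the configuration file, if present, overrides any of the above
acquire("/etc/mytool.conf") if (-f "/etc/mytool.conf");
|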
Reproduced from ;login: Vol. 21 No. 3, June 1996.