Perl Practicum: The Swiss Army Chainsaw
by Hal Pomeranz
A Dab of Philosophy
I was expounding upon the glories of Perl to one of my colleagues the other day and he remarked that Perl seemed rather contrary to the UNIX design philosophy. Everybody has a different take on this issue, but combining simple tools to form more complex ones does appear to be a UNIX fundamental. Perl, on the other hand, intentionally provides a rich set of features which can, in turn, emulate a wide variety of different UNIX utilities. Of course, Perl makes it easy to invoke various UNIX tools - it wouldn't be nearly as useful a language otherwise. Often, it will enhance the readability of your program to use different programs via open() or with backticks
rather than trying to write the same function in Perl. There are real
reasons, however, why this may be suboptimal.
First, there's the efficiency bogeyman. Certainly there is a great deal of overhead in setting up another process for execution. Of course, as the size of the data set you are processing grows, this overhead may become insignificant. As with all optimizations, you should experiment: try different solutions on real data sets and see which approach is most efficient.
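One way to run that experiment is to time both approaches over the same number of iterations. The following is only a sketch: the elapsed() helper is a name invented for this illustration, the date command stands in for whatever external tool you are considering, and it assumes a Perl recent enough to ship the Time::HiRes module.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Run a code reference $count times; return elapsed wall-clock seconds.
# (elapsed() is a hypothetical helper, not part of any library.)
sub elapsed {
    my ($count, $code) = @_;
    my $start = Time::HiRes::time();
    $code->() for 1 .. $count;
    return Time::HiRes::time() - $start;
}

# Approach 1: set up an external process on every iteration.
my $external = elapsed(20, sub { my $stamp = `date`; });

# Approach 2: do the equivalent work inside the Perl process.
my $builtin = elapsed(20, sub { my $stamp = localtime; });

printf "external: %.4fs   built-in: %.4fs\n", $external, $builtin;
```

The numbers only mean something when the two code references do comparable work on data that resembles your real input, so treat a toy comparison like this one as a template rather than a verdict.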
A more telling argument in favor of avoiding vendor-provided tools is
portability. If you have ever maintained software across multiple
platforms, then you know how difficult it is to find the utility you
need sometimes. Where does the
Cognates
Certain UNIX utilities translate directly to built-in Perl functions. Perl has built-in chown(),
chmod(), mkdir(), and rmdir()
functions. There's also link() and symlink()
for creating hard and symbolic links respectively, as well as
unlink() for removing files, and even a
rename() function to partially emulate mv.
The |
for (@files) {
    chmod(0644, $_) || warn "Can't change permissions on $_\n";
}
A similar strategy can be used for those file operations which do not
operate on lists.
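For the single-file operations, the same loop-and-warn pattern might look like the sketch below, which uses mkdir() as the example. The make_dirs() helper and the scratch directory names are made up for illustration, and File::Temp is assumed to be available to provide a throwaway working area.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create each directory in turn; return the names that could not be created.
# (make_dirs() is a hypothetical helper written for this example.)
sub make_dirs {
    my ($mode, @dirs) = @_;
    my @failed;
    for my $dir (@dirs) {
        unless (mkdir($dir, $mode)) {
            warn "Can't create $dir: $!\n";
            push @failed, $dir;
        }
    }
    return @failed;
}

# Scratch area so the example cleans up after itself.
my $scratch = tempdir(CLEANUP => 1);

# The last path should fail because its parent directory does not exist.
my @failed = make_dirs(0755, "$scratch/a", "$scratch/b", "$scratch/missing/c");
print "failed: @failed\n";
```

Including $! in the warning tells you why each call failed, which the list-oriented calls cannot report per file.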
Note that if your operating system does not support one of the above
function calls, you will encounter various failure modes (some more
graceful than others). For example, if symbolic links aren't supported
on your system, then the |
eval 'symlink($old, $new);';
warn "Symlink not supported\n" if ($@);
The $@ variable is guaranteed to be the null string if
the eval() succeeds, so this is a reliable test.
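If you would rather probe for symlink support once, before touching any real files, one common idiom is to eval a harmless call. This is a sketch only; symlink_supported() is a name chosen for the illustration.

```perl
use strict;
use warnings;

# Return 1 if symlink() is implemented on this platform, 0 otherwise.
# Where the call exists, symlink("", "") simply fails and returns false,
# so the eval goes on to yield 1. Where it is not implemented, Perl dies
# inside the eval, which then yields undef and sets $@.
sub symlink_supported {
    my $ok = eval 'symlink("", ""); 1';
    return $ok ? 1 : 0;
}

print "symlink ", (symlink_supported() ? "is" : "is not"), " available\n";
```

The single quotes matter: they keep Perl from interpolating anything into the string before eval() gets to compile it.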
Sometimes Perl will simply invoke the appropriate operating system
tool if a function is not provided as a library call: the
Subtlety
Certain Perl functions are closely related to UNIX filters. For example, split() and substr() emulate
cut very closely. Perl has a built-in sort()
function that is much more powerful than the UNIX sort
utility, but you have to define your own comparison routines to do
really tricky sorts. (See the first Perl
Practicum for more information on devious
sorting.) Sometimes, though, you have to change your thinking a bit to
get Perl to do what you want.
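As a small illustration of both points, the sketch below pulls one field out of each record with split(), much as cut would, and orders the records with a user-defined comparison routine. The passwd-style records and the names first_field(), by_uid(), and sorted_names() are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical passwd-style records: name:password:uid
my @records = ("daemon:x:2", "root:x:0", "bin:x:1");

# Roughly `cut -d: -f1`: return the first colon-separated field.
sub first_field {
    my ($line) = @_;
    return (split /:/, $line)[0];
}

# Comparison routine for sort(): numeric order on the third field.
sub by_uid { (split /:/, $a)[2] <=> (split /:/, $b)[2] }

# Sort the records by uid, then keep only the name from each.
sub sorted_names {
    return map { first_field($_) } sort by_uid @_;
}

print join(" ", sorted_names(@records)), "\n";
```

Splitting inside the comparison routine is simple but repeats work on every comparison, so for large lists you may want to extract the keys once beforehand.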
For example, programmers often like to use |
($basename = $0) =~ s%.*/%%;
($dirname = $0) =~ s%/[^/]*$%%;
The first substitution takes advantage of Perl's greedy pattern matching algorithm to eat up everything up to the last `/' in the pathname and throw it away. If you're interested in both the directory and the file name, you can use the following one-liner:
($dirname, $basename) = $0 =~ /(.*)\/(.*)/;
Again, we're making use of the greedy pattern match as well as the
fact that a pattern match returns its subexpressions as a list when
evaluated in a list context. The statement looks a little strange, but
the precedence is correct, since =~ binds more tightly than =.
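One thing to watch for: a match against a name with no `/' in it returns an empty list, so both variables end up undefined. A slightly more defensive version might look like this sketch, where split_path() is a hypothetical helper and the "." and "/" defaults are assumptions modeled on what the dirname utility usually prints.

```perl
use strict;
use warnings;

# Split a pathname into (directory, file name).
# (split_path() is a hypothetical helper written for this example.)
sub split_path {
    my ($path) = @_;
    if ($path =~ m{^(.*)/([^/]*)$}) {
        my ($dir, $base) = ($1, $2);
        $dir = '/' if $dir eq '';     # "/perl" lives in the root directory
        return ($dir, $base);
    }
    return ('.', $path);              # no slash at all: current directory
}

my ($dirname, $basename) = split_path($0);
print "$dirname $basename\n";
```

Called as split_path("/usr/local/bin/perl"), this returns ("/usr/local/bin", "perl"), while a bare "perl" comes back as (".", "perl").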
Another common UNIX filter is |
open(FILE, "< myfile") || die "Can't open myfile\n";
while (<FILE>) {
    next if $seen{$_}++;
    ...do some processing here...
}
close(FILE);
Note that memory usage can get quite high if the file is large and
doesn't have a great deal of repetition. On the positive side, the
%seen array ends up having a count of the number of
repetitions of each line, in case you care to emulate uniq -c.
You can always run sort() on the unique lines
in the file if you really want the lines to be sorted.
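To make the counting and sorting concrete, here is a sketch that keeps the unique lines in first-seen order and then prints them sorted with their counts, roughly what sort | uniq -c would give you. The unique_lines() helper and the sample lines are invented for the illustration.

```perl
use strict;
use warnings;

# Return the distinct lines in first-seen order, plus a hash of line => count.
# (unique_lines() is a hypothetical helper written for this example.)
sub unique_lines {
    my @lines = @_;
    my (%seen, @unique);
    for my $line (@lines) {
        # Post-increment: false the first time a line appears, true afterwards.
        push @unique, $line unless $seen{$line}++;
    }
    return (\@unique, \%seen);
}

my ($unique, $count) = unique_lines("b\n", "a\n", "b\n", "c\n", "b\n");

# Print each distinct line, sorted, preceded by its repetition count.
printf "%4d %s", $count->{$_}, $_ for sort @$unique;
```

The memory caveat still applies, since every distinct line is stored as a hash key.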
The |
open(FILE, "< myfile") || die "Can't open myfile\n";
@lines = <FILE>;
close(FILE);
@found = grep(/$pattern/, @lines);
This, however, can be rather memory intensive for large files. Instead, simply operate sequentially:
open(FILE, "< myfile") || die "Can't open myfile\n";
while (<FILE>) {
    next unless (/$pattern/);
    ...process here...
}
close(FILE);
If you want a list of matching lines, rather than operating
sequentially, just push() the matching lines into a list
in the processing section. At least you save having to slurp the
entire file into memory.
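The push() variant might be sketched as follows. The matching_lines() helper is a name made up for this example, and an in-memory filehandle (which needs a reasonably modern Perl) stands in for "myfile" so the sketch runs on its own.

```perl
use strict;
use warnings;

# Read a filehandle one line at a time, keeping only lines matching $pattern.
# (matching_lines() is a hypothetical helper written for this example.)
sub matching_lines {
    my ($fh, $pattern) = @_;
    my @found;
    while (my $line = <$fh>) {
        next unless $line =~ /$pattern/;
        push @found, $line;
    }
    return @found;
}

# Sample data in place of a real file.
my $text = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$text) || die "Can't open in-memory file: $!\n";
print matching_lines($fh, '^[ab]');
close($fh);
```

Only the matching lines are ever held in memory, which is the whole point of reading sequentially.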
The Perl Library
As distributed with Perl pl36, the Perl library contains several packages which emulate useful UNIX utilities. Additional packages are available in the Perl archives on coombs.anu.edu.au (150.203.76.2). Be sure to check there before reinventing the wheel.
You use a package by first "requiring" it and then calling the
functions it contains as you would any user-defined function. For
example, the |
require "ctime.pl";
$date_str = &ctime(time);
Of course, you don't get the formatting string capabilities that some
date commands provide, but you can always use
localtime() and printf() to emulate this
behavior.
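For instance, a fixed "YYYY-MM-DD HH:MM:SS" stamp could be built along these lines. This is a sketch; format_stamp() is a hypothetical name and the layout is just one arbitrary choice of format.

```perl
use strict;
use warnings;

# Format a time value as "YYYY-MM-DD HH:MM:SS" in the local time zone.
# (format_stamp() is a hypothetical helper written for this example.)
sub format_stamp {
    my ($time) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime($time);
    # localtime() returns a zero-based month and a year counted from 1900.
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                   $year + 1900, $mon + 1, $mday, $hour, $min, $sec);
}

print format_stamp(time), "\n";
```

Using sprintf() rather than printf() lets you keep the string around instead of printing it immediately.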
Also in the easy to use category are the
Far and away the most useful volume in the Perl library, though, is
The
Beyond that, the processing done in
Plenty of other tools are available in the Perl library. In
Enough Already
Hopefully by this point I've convinced you that Perl is more than capable of emulating most simple (and some complex, e.g., the s2p and the a2p translators that come with
the Perl distribution) UNIX tools. Sometimes, though, you really need
to call some tool outside of Perl. For example, I have yet to find
anything better than:
chop($hostname = `/bin/hostname`);
So, next time we'll be talking about strategies for writing portable
Perl scripts across a bewildering variety of UNIX implementations with
subtly different pathnames and command behaviors. Tentative title to
be, "The Thing I Love About Standards Is That There Are So Many."
Reproduced from ;login: Vol. 18 No. 6, December 1993.