Perl Practicum: Fun With Formats
by Hal Pomeranz
Before Perl became a general purpose programming language, it was
PERL: the Practical Extraction and Report Language. You can find the
evolutionary remains of Perl's humble beginnings hidden away in dark
corners of the language. Formats, for example, are a Perl language
construct with a syntax unlike any other Perl construct and which
generally have functionality that can be emulated with other routines
(notably

Simple Reporting

One of the first useful Perl applications I wrote was a little program to balance my checkbook: the application reads in a file of data containing all of the transactions I have made to date, and prints a nicely formatted statement with a running balance. I originally wrote the output portion using printf() statements, but when I
gave the code to Tom Limoncelli, he sent it back to me with all of the
printf() statements replaced with format code. Darn it,
his version was nicer (but my checkbook was balanced first).
I wanted to make the data file as easy to type as possible, so the format is very simple. The first line of the input file is the starting balance, in pennies (no need to type a decimal point and no floating point arithmetic). Each of the following lines represents a transaction: four tab-separated fields giving the check number or transaction code, the date, a description, and the amount (again in pennies). Deposits and other credits to the account are represented as negative values (I seem to put money into my accounts much less frequently than I take it out). Here is a simple program to read this input file and generate a statement of the account:

    format STDOUT =
    @<<<<< @>>>> @<<<<<<<<<<<<<<<<<<< $@######.## $@######.##
    $code, $date,$descript, $amt, $balance
    .

    open(INP, "transactions") || die "Can't read transactions file\n";
    chop($penny_balance = <INP>);       # first line: starting balance in pennies
    while (<INP>) {
        chop;
        ($code, $date, $descript, $penny_amt) = split(/\t/);
        $penny_balance -= $penny_amt;   # credits are negative, so they add
        $amt = $penny_amt / 100;        # convert to dollars only for display
        $balance = $penny_balance / 100;
        write;
    }
    close(INP);

    format top =
    Trans: Date: Description:         Amount:     Balance:
    ====== ===== ============         =======     ========
    .

The first four lines in the example are a format declaration. The
first line defines the format's name. When the write() function is
called to print a line of formatted data, it uses the format named for
the currently selected file handle. In our example, the program is
sending the report to the standard output. Note that if no format name
is specified, STDOUT is assumed, but it is always better to name
formats explicitly, even when you are using STDOUT.
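The naming rule can be sketched with a filehandle other than STDOUT. This is only an illustration: the REPORT filehandle, the $name and $count variables, and the in-memory buffer are assumptions made for the example, not part of the checkbook program.

```perl
use strict;
use warnings;

our ($name, $count);    # the format reads these package variables

# The format has the same name as the filehandle it serves.
format REPORT =
@<<<<<<<<< @>>>>
$name,     $count
.

# Hypothetical destination: an in-memory buffer instead of a real file.
my $buffer = '';
open(REPORT, '>', \$buffer) || die "Can't open in-memory file: $!\n";

for my $row (['apples', 12], ['pears', 7]) {
    ($name, $count) = @$row;
    write(REPORT);      # uses the REPORT format because the names match
}
close(REPORT);

print $buffer;
```

Because the format is named for the filehandle, write(REPORT) needs no further setup.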
The second line is a picture of how each output line will look. Each
group of characters beginning with an
The picture's third line associates a variable with each field. When
the

The last line of a format declaration is always a dot on a line by itself. This terminates the format declaration. Format declarations can appear anywhere in the program. The example above contains two format declarations: one before the code and one after. This was done to make the point; in your own code, I recommend you group all formats together near the top of the script. If there are multiple formats with the same name in the program, the one defined last will be the one that gets used.
If a format with the special name top is defined in the program, this
format will be printed at the beginning of each page of formatted
output. The special variable

Dirty Tricks

While you can define a special top format for page headers, there is no way to define a format for page footers. There is, however, a trick for dealing with this situation. While write() usually uses the format
named for the file handle that the output is going to, you can use a
different format by assigning the alternate format's name to the
special $~ variable. The trick, then, is to keep track of the number of
lines left on the page and emit a special footer format at the bottom
of the page. Here is the program logic for doing this:

    format top =
    Trans: Date: Description:         Amount:     Balance:
    ====== ===== ============         =======     ========
    .

    format STDOUT =
    @<<<<< @>>>> @<<<<<<<<<<<<<<<<<<< $@######.## $@######.##
    $code, $date,$descript, $amt, $balance
    .

    format footer =

    Page @###
    $%
    .

    $footer_depth = 2;                  # footer is two lines: one blank, one page number

    open(INP, "transactions") || die "Can't read transactions file\n";
    chop($penny_balance = <INP>);
    while (<INP>) {
        chop;
        ($code, $date, $descript, $penny_amt) = split(/\t/);
        $penny_balance -= $penny_amt;
        $amt = $penny_amt / 100;
        $balance = $penny_balance / 100;
        write;
        if ($- == $footer_depth) {      # only room for the footer remains
            $~ = "footer";
            write;
            $~ = "STDOUT";              # switch back to the body format
        }
    }
    close(INP);

First we introduce a new footer format and a new global constant,
$footer_depth, which is the number of lines that the
footer occupies on the page. The footer format in our example uses yet
another special variable, $%, which gives the current
page number (numbered starting with 1).
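Page numbering is easiest to see with a deliberately short page. The following is a minimal sketch, not part of the checkbook program: the PAGED filehandle, its PAGED_TOP header format, and the $item variable are made up for the example, and the page length is forced to four lines through $=, Perl's lines-per-page variable.

```perl
use strict;
use warnings;

our $item;

# Header format for the hypothetical PAGED handle; $% is the page number.
format PAGED_TOP =
Page @<
$%
-------
.

format PAGED =
@<<<<<<
$item
.

my $buffer = '';
open(PAGED, '>', \$buffer) || die "Can't open in-memory file: $!\n";

# $= applies to the currently selected handle, so select PAGED
# briefly to shrink its page to four lines (two header, two body).
my $previous = select(PAGED);
$= = 4;
select($previous);

for my $word (qw(one two three four five)) {
    $item = $word;
    write(PAGED);       # a new header appears whenever the page fills up
}
close(PAGED);

print $buffer;
```

With five body lines and room for only two per page, the header should appear three times, with $% counting up from 1.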
Each time we emit a line with
While this method works very cleanly, when each
If you ever want to change header formats for any reason - for example
if you wanted a large header on the first page, but only minimal
headers on the other pages - you can use the special

Multi-Line Formats

Consider a couple of important facts about the top format in the two examples. First, there are no field definitions anywhere in the format declaration. It is perfectly legal to have a format with no field declarations, though in practice you will probably only do this for header formats.
Second, the format declaration defines multiple lines of output. This also is perfectly legal and each line can have zero, one, or more field declarations in it. The general pattern for multi-line format declarations is one line of field descriptions, followed by a line containing the variables associated with those fields, followed by another line of field descriptions, etc.
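The alternating pattern of picture lines and variable lines can be sketched with a small address label. Everything here is illustrative: the LABEL filehandle, the three variables, and the sample address are assumptions chosen for the example.

```perl
use strict;
use warnings;

our ($name, $street, $city);

# Each picture line is followed by the line of variables that fills it.
format LABEL =
Name:   @<<<<<<<<<<<<<<<<<<<
$name
Street: @<<<<<<<<<<<<<<<<<<<
$street
City:   @<<<<<<<<<<<<<<<<<<<
$city
.

my $buffer = '';
open(LABEL, '>', \$buffer) || die "Can't open in-memory file: $!\n";

($name, $street, $city) = ('Ada Lovelace', '12 Example Road', 'London');
write(LABEL);           # one write() emits all three lines
close(LABEL);

print $buffer;
```

A single call to write() produces the whole three-line record, which is what makes multi-line formats convenient for labels and record dumps.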
The next example shows an interesting use of multi-line formats. For
purposes of this example program, we are assuming a function called

    format message =
    Date: @<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $header{'Date'}, $body
    From: @<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $header{'From'}, $body
    To  : @<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $header{'To'}, $body
    Subj: ^<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $header{'Subject'}, $body
    ~~    ^<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    $header{'Subject'}, $body
    .

    $~ = "message";
    while (<STDIN>) {
        &mailparse();
        write;
    }

Output:

    Date: Tue, 11 Apr 1995 16:39:06  Hal-- What's the status of your Perl
    From: tmd@iwi.com (Tina M. Darmo article for the upcoming issue of ;login;?
    To  : hal@netmarket.com (Hal Pom Rob needs to review the article before
    Subj: Your ;login: article is    giving it to Carolyn for typesetting.
          *OVERDUE*                  Please send email soon-- the fate of the
                                     universe is at stake. --Tina

There are a number of new constructs in the message format in this example. First are the fields that begin with ^ instead of @. For these
fields, Perl outputs as much text as will fit in the field and then
removes that text from the string variable. By stacking several ^
fields together using the same long string, you can output that string
as a block of text with a ragged right margin, as shown in the output,
with both the body of the message and the Subj: line. The special $:
variable (last special variable in this column, I promise) is the set
of characters on which Perl can legally break the line; the
default value for $: is \n - (newline, space, or hyphen).
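The way ^ fields consume their variable can be seen in isolation with a short sketch. The WRAP filehandle, the $text variable, and the sample sentence are assumptions made for the example; the break characters are left at the default value of $:.

```perl
use strict;
use warnings;

our $text;

# Three stacked ^ fields, each 20 characters wide, fed by the same string.
format WRAP =
^<<<<<<<<<<<<<<<<<<<
$text
^<<<<<<<<<<<<<<<<<<<
$text
^<<<<<<<<<<<<<<<<<<<
$text
.

my $buffer = '';
open(WRAP, '>', \$buffer) || die "Can't open in-memory file: $!\n";

$text = "Formats wrap long strings across several stacked fields";
write(WRAP);            # each field takes what fits and removes it from $text
close(WRAP);

print $buffer;
print "Left over: $text\n";
```

Three fields are not enough for the whole sentence here, so whatever did not fit is still sitting in $text after the write(). Note that the original string is destroyed in the process; work on a copy if you need the full text later.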
The special

Conclusion

I have run across many Perl programs with complex printf() blocks that
would have been much easier to write and much more readable if the
developer had used formats instead. If you need to quickly produce
reports, or output large amounts of tabulated data, formats are an
extremely effective tool.
Reproduced from ;login: Vol. 20 No. 3, June 1995.
Last changed: May 24, 1997 pc