by Dave Taylor

Dave Taylor has been hacking on the Net since 1980 and has created thousands of Web pages, most of which format correctly. He's also the author of Creating Cool Web Pages with HTML and Teach Yourself Unix In a Week.
Graft a Smart Error Page System to your Web Site

I usually talk about standalone CGI programs in this column. But I just set up a new Web server (Red Hat Linux 5.0 on a 300MHz Pentium II box, if you're curious) and I decided that, instead of the ugly generic error messages given to people when they encounter an error on my Web site, I'd like to offer something more useful. My error page would not just have my company logo (which is a great first step, of course), but would actually help people find what they seek on the site itself. The process of adding this error page to the Web server, writing the page, and then writing a simple underlying search engine (using grep) is what I talk about in this column.

Hooking in Your Own Error Page

The first step is to delve into the Apache Web server configuration file. (If you're running a Web server other than Apache, which I think is fabulous, then you'll probably have to do something slightly different in this spot.) The file, usually named /etc/httpd/conf/httpd.conf, contains quite a few lines of different configuration elements, the vast majority of which you should definitely not touch until you're an Apache configuration expert.

Fortunately, what we want to do is straightforward. Apache Web servers can serve up lots of different domains on the same IP address and Web server. Indeed, my system is host for about 15 different Web sites. The error page I'm adding here is only for the .intuitive.com domain, so the trick is to find the .intuitive.com virtual host configuration section in the file and then add a specific line. Before my changes, the file looked like:
<VirtualHost www.intuitive.com>

This defined the actual filesystem location of the root directory of this domain (/web/intuitive/) and the location of all the log files (/log/intuitive/). To hook in the new error page, I simply added:

ErrorDocument 404 /error-page.html

somewhere in this configuration section. Where you place it doesn't matter.

Creating the Error Page

Though the previous configuration appears to place the error page in the topmost directory of the filesystem, Apache already knows the root you've specified for the domain, so in fact this file needs to be located at DocumentRoot/ErrorDocument or, a bit more clearly, /web/intuitive/error-page.html.

Part of the goal of the error page is to offer visitors the ability to enter a keyword or two and search through all documents on the site to try to find what they were originally seeking. That's reflected in the middle of the simple HTML document created as the error page:
<HTML>

I won't belabor the HTML here (it's all pretty straightforward) other than to note that the error page includes a form that prompts for a few keywords and then feeds the user's entry to the CGI script /apps/search-everything.cgi on the server. One trick worth mentioning: explicit tables can be a nice way to box an important message on the page! Figure 1 shows you how this page looks. You can, of course, request some gobbledygook URL on my site and see the page for yourself quite easily!
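
The path rule described above (DocumentRoot plus the ErrorDocument URL) is easy to get wrong, so here is a minimal shell sketch for checking that the error page sits where Apache will look for it. The function names error_page_path and check_error_page are made up for this illustration and are not part of Apache or of my script.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Map an ErrorDocument URL to the file Apache would serve for it,
# given the virtual host's DocumentRoot.
#   $1 = DocumentRoot (with or without a trailing slash)
#   $2 = ErrorDocument URL path, starting with "/"
error_page_path() {
    docroot=${1%/}          # drop one trailing slash, if present
    printf '%s%s\n' "$docroot" "$2"
}

# Report whether the error page exists and is readable at that location.
check_error_page() {
    page=$(error_page_path "$1" "$2")
    if [ -r "$page" ]; then
        echo "ok: $page"
    else
        echo "missing: $page"
        return 1
    fi
}
```

With the values from this column, you would call it as check_error_page /web/intuitive /error-page.html.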
Figure 1: The new improved Intuitive.com error page

Without the search capability, we'd be done. New error page, much cooler than the default "404 File not found." However . . .

Building a Search Engine

The good news about building a search system for your Web site is that you've already made the smart move: you're running a UNIX-based operating system. This means that you can let the grep command do all the work. Because we're using METHOD=GET in the form itself, the pattern entered is held in the environment variable QUERY_STRING. It's sent as name=value, so a quick invocation of sed strips it to its basics:

pattern="`echo $QUERY_STRING | sed 's/p=//g'`"

Armed with the search pattern, you then use the find command to look through all the HTML files on the site:

find /web/intuitive -name '*tml' -print | xargs grep -il "$pattern"

This gives us the ability to display matching files, and we can easily add clickable links to them all by first stripping out the actual file root (remember that /web/intuitive/index.html is the URL /index.html) with two lines buried in a for loop that steps through the output of the above find command:
for filename in `cat $outputfile` ; do

But we can do better than this and have output that's considerably more attractive and interesting. The solution is to extract the TITLE of each document by again using grep, stripping the HTML tags therein, then using that as the text of the link:
for filename in `cat $outputfile` ; do

There's one problem with this. Going through my Web site for a quick analysis reveals that several documents have remarkably similar titles (some of which aren't even useful, if you can believe it!). As a result, the output really needs to list both the filename and the TITLE of the document, as available.

Now all that's left is to do some error checking (What if they skipped entering a pattern? What if there are no matches to the pattern?) and wrap it in some nice HTML formatting. Again, I opt for a TABLE to have it look nice on the screen, as you can see in Figure 2.
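
To pull the pieces of this section together, here is a rough, self-contained sketch of how they might fit: extracting the pattern, mapping a file back to its URL, grabbing the TITLE, and handling the two error cases. It is not the script from this column. The function names are invented for the example, it uses find with -exec rather than xargs, it assumes the TITLE tags sit on a single line, and it prints a plain list instead of a TABLE.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical.

# Strip the leading "p=" from the query string and turn the "+" that
# browsers send for spaces back into spaces. %XX escapes are left as-is.
extract_pattern() {
    printf '%s\n' "$1" | sed -e 's/^p=//' -e 's/+/ /g'
}

# Turn a file path under the document root into a URL path.
#   $1 = document root, $2 = file path
path_to_url() {
    root=${1%/}
    rel=${2#"$root"/}
    printf '/%s\n' "$rel"
}

# Print the text of the first line containing a TITLE tag, with all
# HTML tags and surrounding whitespace removed. Prints nothing if the
# file has no TITLE.
page_title() {
    grep -i '<title' "$1" | head -n 1 |
        sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Search every *tml file under the root and print one link per match,
# showing both the URL and the TITLE when there is one.
#   $1 = document root, $2 = search pattern (treated as a grep pattern)
search_site() {
    root=${1%/}
    pattern=$2
    if [ -z "$pattern" ]; then
        echo "<P>Please enter a search pattern.</P>"
        return 0
    fi
    matches=$(find "$root" -type f -name '*tml' \
                  -exec grep -il -- "$pattern" {} + | sort)
    if [ -z "$matches" ]; then
        echo "<P>No matches for that pattern.</P>"
        return 0
    fi
    echo "<UL>"
    printf '%s\n' "$matches" | while IFS= read -r filename; do
        url=$(path_to_url "$root" "$filename")
        title=$(page_title "$filename")
        printf '<LI><A HREF="%s">%s</A>%s\n' "$url" "$url" "${title:+: $title}"
    done
    echo "</UL>"
}
```

In a CGI context you would call something like search_site /web/intuitive "$(extract_pattern "$QUERY_STRING")" after printing the Content-type header. The sketch does not escape HTML special characters in titles, which a production script should do.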
Figure 2: The results of a search for "Linux"

The final CGI script, written as a Bourne shell script, is shown here:
#!/bin/sh -f
ALT=\"INTUITIVE SYSTEMS\" WIDTH=485 HEIGHT=62"

Conclusions

I encourage you to jump onto my Web site and enter a URL that you are sure won't work correctly. Try <http://www.intuitive.com/missing-page.html>. Once you're there, type in a word or two as a search pattern to see what kind of results you get.

It'd be nice to refine this further so that you could put an HTML tag in specific pages that prevents them from showing up as matches to a sitewide search, and so that the search results are smart enough to show you a META DESCRIPTION value, if one is present in the file, as further information.
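
As a starting point for those two refinements, here is a small hedged sketch. The opt-out marker <META NAME="NOSEARCH"> is something I've invented for this example (it is not a standard tag), the function names are hypothetical, and the description extractor assumes a one-line tag with NAME before CONTENT and double-quoted values.

```shell
#!/bin/sh
# Sketch only: the NOSEARCH marker and both function names are assumptions.

# Succeeds unless the page carries the made-up opt-out marker
#   <META NAME="NOSEARCH">
is_searchable() {
    if grep -iq '<meta name="nosearch"' "$1"; then
        return 1
    fi
    return 0
}

# Print the CONTENT value of the first
#   <META NAME="description" CONTENT="...">
# tag in the file, or nothing if there is none.
page_description() {
    grep -i '<meta name="description"' "$1" | head -n 1 |
        sed -e 's/.*[Cc][Oo][Nn][Tt][Ee][Nn][Tt]="//' -e 's/".*//'
}
```

A results loop could then skip any file for which is_searchable fails and print the output of page_description beneath the link when it is non-empty.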
First posted: 8th July 1998 efc
Last changed: 8th July 1998 efc