Chapter 3 Advanced Form Processing and Data Storage

by David Harlan

CONTENTS

Using the POST Method
- Comparing GET and POST
- Processing the Information from a POST-Method Form
Making Your Perl Code Modular
- Defining and Calling Subroutines
- Using Variable Aliases
Using DBM Files for Data Storage
Using Complex Forms and Storing Related Data
- Processing the Form Data and Checking the Password
- Working Around the Limitations of DBM Files
From Here

In Chapter 2, "Introduction to CGI," you created a fairly simple (but usefully extensible) Guestbook application for your Web site. This chapter discusses a more advanced user-input example: an online Internet-use survey. You probably have seen similar surveys, but this chapter takes the concept a step further. In your survey, the data will be saved so that you can refer to it later, online. You also add to each user's data from a second form.

Using the POST Method

The CGI specification provides two methods of calling a script: the GET method, which you used in Chapter 2, and the POST method, which is discussed in this section. Look at the form shown in figure 3.1.

Figure 3.1 : This figure shows the first form from the online experiment.

The first thing that you should notice about this form is that it's much bigger and more complex than the Guestbook form in Chapter 2. You're asking for significantly more data this time. Why is this fact significant? Recall that when you submitted the form in Chapter 2, the data from the form appeared in the Location box, as part of the URL of the resulting screen (refer to fig. 2.2 in the preceding chapter).

Figure 3.2 : This screen shows a listing of files on the author's server.

Some people may complain that this huge URL clutters the screen and makes things too messy for their taste. This may be true, but as you'll soon see, aesthetics are not the only reasons why you won't always use this method of forms processing.

Comparing GET and POST

What can you do about this ugly URL? Fortunately, you can use either of two methods for submitting HTML forms. If you look at line 2 of Listing 2.1 (refer to Chapter 2), you notice that the form element of this page is opened with the following tag:


<form method=get action="/cgi-bin/harlan/guestbook">

The key portion of this tag in the current discussion is method=get, which tells the browser that the /cgi-bin/harlan/guestbook script is expecting an HTTP request of type GET from this form. The browser must know the request method so that it can send the information back to the server properly. The most important thing that the choice of method affects is where the user-submitted data appears when it gets to the CGI script.

The GET method places all the data in the URL portion of the request. Specifically, everything after the question mark in the URL portion of a GET request is user-submitted data. The server software puts this information in the QUERY_STRING environment variable for use in the script that will process the form. Aside from the obvious aesthetic difficulties, this method also creates a significant functional roadblock.

Some Web servers limit the length of the URL portion of a request (check the documentation for your server). So you might not be able to submit larger forms by using the GET method on some servers. Fortunately, you have the POST method to handle larger forms. Listing 3.1 shows the HTML code for the form shown in figure 3.1.

Listing 3.1 The First Part of the Experiment Registration Form (USERFORM.HTML)


<body bgcolor="#FFFFFF">
<title>User Information Form</title>
<center>
<h2>User Information</h2>
<table width=650>
<tr><td colspan=4>
Thank you for your interest in our experiment.
The information below is needed to correlate Internet use to demographic data.
Please provide information in <b>all</b> fields below.
This information will only be used in this study.
No information about you specifically will ever be used without your permission.
<form method=post action="/cgi-bin/harlan/postuser">
<tr><td colspan=4 align=center>
<h3>Identity</h3>
<tr><td align=right>E-mail Address:
<td colspan=3><input type=text size=40 name=email>
<tr><td align=right>First Name:
<td><input type=text size=20 name=firstname>
<td align=right>Last Name:
<td><input type=text size=20 name=lastname>
<tr><td colspan=4 align=center>

The most obvious difference is in the <form> tag, in which the page opens the form definition with the POST method specified.

Processing the Information from a POST-Method Form

How do you get the information from this form? See Listing 3.2. This script processes and saves the information, and prints a simple thank-you page in response.

Listing 3.2 The postuser Script (POSTUSER1.PL)


#!/usr/bin/perl

read(STDIN,$temp,$ENV{'CONTENT_LENGTH'});   # POST data arrives on STDIN
@pairs=split(/&/,$temp);
foreach $item(@pairs) {
   ($key,$content)=split (/=/,$item,2);
   $content=~tr/+/ /;                                    # "+" encodes a space
   $content=~ s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;  # decode %XX escapes
   $fields{$key}=$content;
}
dbmopen(%users,"users",0666);
print "Content-type: text/html\n\n";
if (!defined($users{$fields{'email'}})) {
   # Store the answers as one "::"-delimited string, keyed by e-mail address.
   $users{$fields{'email'}}=join("::",
      $fields{'firstname'},$fields{'lastname'},$fields{'cont'},
      $fields{'country'},$fields{'gender'},$fields{'age'},$fields{'income'},
      $fields{'employment'},$fields{'netexp'},$fields{'netconn'},
      $fields{'workuse'});
   print "Thanks for registering for our survey. ",
      "Please remember to come back weekly to record your net use.";
}
else {
   print "Someone has already registered from that e-mail address. Sorry.";
}

Except for the read() line, much of the beginning of this script should look familiar. If you look at Listing 2.2 in Chapter 2, you see that the text-processing lines are almost identical to those in Listing 3.2. Both listings take the text in the $temp variable; split it into an array of key/value pairs; and then place each pair in an associative array, associating each key with its value.

The unfamiliar line is fairly simple. The read() function takes a file handle, a variable, and an integer for arguments. So the script in Listing 3.2 is reading $ENV{'CONTENT_LENGTH'} bytes from the file handle STDIN and putting the information that it finds in the variable $temp. Recall that in the guestbook script, you copied $temp from $ENV{'QUERY_STRING'}. So you see the major difference between processing GET-method forms and POST-method forms.
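To see the decoding steps in isolation, here is a minimal sketch that runs from the command line with no Web server involved. The parse_pairs name and the sample data are my own inventions for illustration; under CGI, the raw string would come from STDIN (POST) or from $ENV{'QUERY_STRING'} (GET). The sketch assumes Perl 5, because it uses my declarations.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_pairs is a hypothetical helper, not part of the chapter's listings.
# It turns a URL-encoded string into a hash of field names and values.
sub parse_pairs {
   my ($raw) = @_;
   my %fields;
   foreach my $item (split(/&/, $raw)) {
      my ($key, $content) = split(/=/, $item, 2);
      $content = "" unless defined $content;   # a field sent with no value
      $content =~ tr/+/ /;                     # "+" encodes a space
      $content =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/ge;   # %XX escapes
      $fields{$key} = $content;
   }
   return %fields;
}

# Made-up sample data in the same format a browser would send.
my %fields = parse_pairs('email=dave%40example.com&comment=Hi+there%21');
foreach my $key (sort keys %fields) {
   print "$key = $fields{$key}\n";
}
```

Run this way, the sketch prints the comment field as "Hi there!" and the email field as "dave@example.com".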

Built-in File Handles

Perl uses normal UNIX names for its built-in file handles. STDOUT, which is the standard file handle for output, is where you normally print from a Perl script. If you are running a script from the command line, output directed to STDOUT appears on the console. In CGI applications, STDOUT goes back to the Web server to be sent to the browser.

STDERR is a file handle that is used for error messages. Operating systems do different things with STDERR, but text printed to this special file handle generally ends up in some sort of log. In CGI applications, most Web servers print messages that are directed to STDERR to the server's error log, a useful fact that you can use to debug your CGI scripts.

STDIN is a file handle in which you can often find input for your Perl script. Running scripts from the command line, you would read from STDIN if input were piped to your script from another command. (This syntax might look something like cat filename | myperlscript.) In CGI, the Web server puts POST-method form input in STDIN.

As I said earlier, the rest of the processing is exactly the same. So, you might now be asking yourself, if you're going to do a great deal of form processing, wouldn't it be much easier to write some of this code once and reuse it? The answer, of course, is yes.

Making Your Perl Code Modular

One thing that almost all good programmers have in common is an abhorrence for redoing work. We all want to do things as efficiently as possible. One of the best ways to make programming more efficient is to put code for common tasks in a place where other programs can easily use it.

Perl provides effective features for this purpose. The simplest method, which works for both versions 4 and 5 of Perl, is to create a Perl library. Perl 5 adds a new entity called a module to the mix. Modules are similar to libraries, but they allow the advanced Perl programmer to use object-oriented programming syntax.

TIP

To find out what version of Perl you are using, type perl -v at your command line. My machine shows the following display when I type this command:

portland:~/# perl -v

This is perl, version 5.002

Copyright 1987-1996, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5.0 source kit.

You can see that I'm using Perl 5.002. If you're using Perl 4, this command tells you that you're using Perl 4 and then gives you a patch level. If the display shows anything other than patch level 036, you'll want to upgrade. You probably should upgrade to version 5 anyway; it's available for most platforms on the CD-ROM that accompanies this book.

Listing 3.3 shows what the postuser script might look like with a more modular approach. You can see that the code is much shorter and easier to read.

Listing 3.3 The Modular postuser Script (POSTUSER2.PL)


#!/usr/bin/perl
require ("process_cgi.pl");

&parse_input(*fields);
&print_header;
dbmopen(%users,"users",0666) || die "Can't open users DBM file\n";
if (!defined($users{$fields{'email'}})) {
   # Store the answers as one "::"-delimited string, keyed by e-mail address.
   $users{$fields{'email'}}=join("::",
      $fields{'firstname'},$fields{'lastname'},$fields{'cont'},
      $fields{'country'},$fields{'gender'},$fields{'age'},$fields{'income'},
      $fields{'employment'},$fields{'netexp'},$fields{'netconn'},
      $fields{'workuse'});
   print "Thanks for registering for our survey. ",
      "Please remember to come back weekly to record your net use.";
}
else {
   print "Someone has already registered from that e-mail address. Sorry.";
}

What's going on in this new script? The first new command that you see is require(). This command tells the Perl interpreter to find the file specified in the argument (process_cgi.pl, in this case) and use it as though it were part of the script. You then see the two lines &parse_input(*fields) and &print_header. The first line parses the form input; the second line prints the page header.

These two lines demonstrate one of the major advantages of modular programming: it makes code much more readable. You may not know exactly how these lines do their thing, but you can read this program and understand it. In this case, however, you really do want to know what's happening. The following sections look at these lines a little more closely.

Defining and Calling Subroutines

The ampersand tells the interpreter that each line is a call to a subroutine. Because no subroutines are defined in this script, they must be defined in the file loaded by the require("process_cgi.pl") statement near the top of Listing 3.3. Listing 3.4 shows process_cgi.pl.

Listing 3.4 The First Incarnation of process_cgi.pl (PROCESS_CGI1.PL)


#process_cgi.pl. A Perl library for CGI processing.

sub form_method {
   $method=$ENV{'REQUEST_METHOD'};
}

sub print_header {
   if (!@_) {
     print "Content-type: text/html\n\n";
   }
   else {
     print "Location: @_\n\n";
   }
}

sub parse_input {
   # Localize the alias here, not inside the if/else braces:
   # local() lasts only until the enclosing block ends.
   local(*input);
   if (@_) {
     *input=$_[0];
   }
   else {
     *input=*cgiinput;
   }
   local ($temp,@pairs);
   if (&form_method eq 'POST') {
     read(STDIN,$temp,$ENV{'CONTENT_LENGTH'});
   }
   else {
     $temp=$ENV{'QUERY_STRING'};
   }
   @pairs=split(/&/,$temp);
   foreach $item(@pairs) {
     ($key,$content)=split (/=/,$item,2);
     $content=~tr/+/ /;
     $content=~ s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;
     $input{$key}=$content;
   }
   return 1;
}

1;   # a library file must finish with a true value

This code introduces several concepts. First, this file defines three subroutines. The syntax for subroutine definition is simple; the keyword sub is followed by a word that becomes the subroutine name. The final portion of the subroutine definition is a block of statements enclosed in braces ({}). Unfortunately, the way in which these subroutines do their jobs is not immediately clear. The following paragraphs examine each subroutine.

TIP

Subroutines are not used only in Perl libraries; you can define a subroutine in any Perl script to reuse code within that script. Suppose that you have a script that reads a given file into a given array a certain number of times. Instead of reproducing that code each time, you could (and perhaps should) create a subroutine and then call that subroutine each time you want to read a file.

The form_method subroutine has a very simple function: it returns the contents of the REQUEST_METHOD environment variable. But how does the subroutine work? Notice that the parse_input subroutine checks the value of the &form_method subroutine call in its second if statement. If you have ever done any Pascal programming, you would expect $ENV{'REQUEST_METHOD'} to be assigned to the subroutine name somewhere in the form_method sub. This script clearly doesn't do that, however; Perl takes a different, behind-the-scenes approach. In any subroutine, the value of the last expression evaluated in the block is returned as the value of the subroutine; here, that expression is the assignment to $method. (The only exception to this rule arises when you use the return function, which returns the specified value.)

By assigning $ENV{'REQUEST_METHOD'} to $method in the form_method sub, then, you can check the value of &form_method, as in parse_input.
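The following sketch shows both ways of handing back a value, side by side. The double_it and sign_of subroutines are hypothetical names made up for this illustration, and the my declarations assume Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No return statement: the sub hands back the value of its last expression.
sub double_it {
   my ($n) = @_;
   $n * 2;
}

# An explicit return ends the sub early with the value you specify.
sub sign_of {
   my ($n) = @_;
   if ($n < 0) {
      return "negative";
   }
   "non-negative";
}

print double_it(21), "\n";   # prints 42
print sign_of(-5), "\n";     # prints negative
print sign_of(7), "\n";      # prints non-negative
```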

The second subroutine in Listing 3.4 prints the proper header for an HTML page. The first line of the procedure checks whether the Perl special variable @_ is empty. The @_ variable contains any arguments of the subroutine, and the exclamation point tells Perl to negate the test that follows it. So if the subroutine received no arguments, the conditional returns true and executes the first statement block: if you call this subroutine with &print_header;, the script prints the Content-type: text/html\n\n header. (Current Perl releases no longer accept defined() applied to an array, so write this test as !@_.)

If you call the subroutine with &print_header("http://192.0.0.1/cgi-bin/harlan/showguestbook2");, @_ is defined, and the script prints Location: http://192.0.0.1/cgi-bin/harlan/showguestbook2\n\n. This code should look familiar. Refer to Listing 2.6 in Chapter 2, where this precise string was used at the end of the guestbook script to send the browser to the guestbook display script.

You have created a very useful subroutine that not only prints the standard HTML page header but also prints the Location header when you want to redirect the browser to an existing page, rather than print HTML from the script.

Using Variable Aliases

The final subroutine in Listing 3.4 performs the all-important input processing. The first part of the sub is an if/else construct, similar to the one in the preceding subroutine. Here, the script checks to see whether there were any arguments when the sub was called; if so, it performs a little Perl magic on that argument.

The local(*input) statement is not an easy one to understand. The local() function makes the variables listed as arguments local to the program block from which local() was called. Look at the line local ($temp,@pairs);, which makes $temp and @pairs local variables. This syntax ensures that the script won't change the values of any other variables with the same names in the program that's calling this subroutine.

But what's happening when local() is applied to *input? You're not seeing a new variable type. This is type-globbing; a simpler name is aliasing. What you want to do in this subroutine is assign the form data to a user-specified associative array. To do so, you have to work with the caller's global copy of that array. You may think that you should just pass the %fields array as the argument to this subroutine call. That procedure wouldn't work, though, because Perl would flatten %fields into a list of its keys and values and hand the subroutine that list, not the array itself.

So the Perl developers came up with a method. If you assign one variable name, preceded by an asterisk (*), to a localized variable name, also preceded by an asterisk, Perl works with the local variable as an alias to the other. An important point is that all variables with that name are aliased. Although you only work with the associative array in this example, you could also play with the global copies of similarly named scalar variables and standard arrays in this routine, modifying their global values accordingly.
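A small experiment makes the aliasing easier to see. In this sketch, fill_in is a hypothetical routine invented for illustration, and %fields and $fields are ordinary global variables. The script deliberately leaves out use strict, because typeglob aliasing works on global (package) variables.

```perl
#!/usr/bin/perl

# fill_in is a hypothetical routine. It writes through an alias to
# whatever name the caller passes with a leading asterisk.
sub fill_in {
   local(*target) = @_;                      # *target now stands for the caller's name
   $target{'email'} = 'dave@example.com';    # lands in the caller's hash
   $target = 'set through the alias';        # the scalar of the same name is aliased too
   return 1;
}

&fill_in(*fields);
print "$fields{'email'}\n";   # prints dave@example.com
print "$fields\n";            # prints set through the alias
```

Notice that the subroutine never mentions %fields by name, yet the data ends up there, and the scalar $fields changes as well because every variable called fields shares the alias.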

Now, as you recall, the location of the input data depends on the method that the form uses to call the script. Because you really want to make parse_input universal, you want to be able to use it for either method; the second if/else construct in parse_input does just this. If the form method (as determined by a call to the form_method subroutine described in "Defining and Calling Subroutines" earlier in this chapter) is POST, the script gets the data from STDIN and places it in $temp; otherwise, the value of $temp comes from $ENV{'QUERY_STRING'}.

The final portion of the routine does the input processing exactly as described in Chapter 2.

Thus, in your parse_input routine, you're telling Perl to use input as an alias to the variable specified in @_. The script then parses the form input into that alias. When the script makes the &parse_input(*fields) call in Listing 3.3, parse_input works on %input as an alias to %fields. The result is that the data is placed in the %fields array exactly as it was in Listing 3.2.

One final note about parse_input: notice the first conditional. This conditional allows a user to call parse_input without arguments and have the routine assign the form input to a default variable (in this case, %cgiinput).

Using DBM Files for Data Storage

Now you see how you get the information from the form into the %fields array in Listing 3.3. But when you look farther down in the script (and also in Listing 3.2, the last few lines of which are identical to those in Listing 3.3), you see some new commands that I need to explain.

In the Guestbook program in Chapter 2, you stored the data in a text file. This method was acceptable because you didn't need to access the data in any way except to print it. Suppose, however, that you don't want any given person to sign in more than once. You would have to scan the file each time to see whether the name existed in the file. The necessary code would look something like Listing 3.5.

Listing 3.5 Example Code for Scanning the GUESTBOOK.TXT File


open (gbfile, "guestbook.txt");
while (<gbfile>) {
   ($date,$name,$comment)=split(/::/,$_);
   if ($name eq $fields{'name'}) {
     $nameused='y';
     last;
   }
}
close (gbfile);
if ($nameused eq 'y') {
   #Do stuff here to tell the user that she can't sign in more than once
}
else {
   #go on with the rest of the guestbook script as normal
}

Aside from the fact that you wouldn't want to depend on this method for preventing multiple sign-ins, it should be fairly obvious that scanning the entire text file for a name each time a user signs the guestbook could become time-consuming as the file got larger. So there must be a better way, right? The answer, of course, is yes (I wouldn't have posed the question otherwise). Perl provides the DBM file for just this situation.

A DBM file (DBM stands for database management) is a special type of file that is inherited from Perl's UNIX roots. DBM files perform very simple database functions. The following sections examine how DBM files work.

Opening a DBM File

The dbmopen line of POSTUSER2.PL reads as follows:


dbmopen(%users,"users",0666) || die "Can't open users DBM file\n";

As you might expect from its name, this line opens a DBM file. To be precise, the line opens the DBM file called users and links it to the associative array %users. 0666 stands for a file mode that tells Perl to provide read and write access for this file for everyone. (See "Initializing a DBM File" later in this chapter for details on file modes.) If the file does not exist, Perl attempts to create it with the specified file mode. If the file can't be opened or created, the || die... construct tells Perl to exit the program.

Take a closer look at the || die... syntax. The || symbol is a standard Perl operator: basically, a logical or. If the code before || does not return a value of true, the script performs the action specified after ||. In this case, you want to perform the die function, which tells Perl to exit the program immediately and print die's arguments to STDERR. (Remember that in CGI programming, output to STDERR usually ends up in the server's error log.)

This piece of code is a very useful part of your CGI programming arsenal. If you did not have die at the end of this line, Perl would go on with the program blithely, even if it couldn't open your DBM file. Because the user wouldn't be alerted to the problem, he wouldn't know that anything was amiss. You wouldn't be aware of the problem, either, until you started wondering why no data was showing up anywhere. By then, you would have lost the registration data of who knows how many users. With die (in this case, anyway), the user would get a malformed header from script error, and you would be alerted to the problem. Then you could go back to the error logs and track down the problem with little difficulty.

I use die liberally in my CGI programs. die is particularly valuable for dealing with external files, but you'll find many other uses for it as you get deeper into CGI programming.
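If you want to watch die at work without breaking a real script, you can try something like the following sketch from the command line. The first_line helper and the file path are hypothetical, chosen so that the open is expected to fail; the eval block is there only so that the demonstration can trap the die and report it instead of exiting. The sketch assumes Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# first_line is a hypothetical helper that dies if the file can't be opened.
sub first_line {
   my ($path) = @_;
   # If open() returns false, the code after || runs and the script dies.
   open(my $fh, "<", $path) || die "Can't open $path: $!\n";
   my $line = <$fh>;
   close($fh);
   return $line;
}

# eval traps the die; the message that die would have printed is left in $@.
my $line = eval { first_line("/no/such/directory/guestbook.txt") };
if (!defined($line)) {
   print STDERR "Trapped: $@";
}
```

In a CGI script, you normally would not wrap the call in eval; you would let die stop the script so that the message reaches the server's error log.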

Assigning Values to a DBM File

Now that you have the DBM file open, you need to assign values to it. The first thing that you need to decide is what value in your data set will serve as your key. Each key must be unique within any DBM file. In the postuser script, I chose an e-mail address as the key for this DBM file. This key makes sense, because an e-mail address should be a unique identifier for a user. (Sorry-those who share e-mail boxes need not apply here.)

After you decide on a key, you can assign your values to the file. To be safe, before you make any assignments to this file, check to make sure that the key doesn't exist in the file, using the following code:


if (!defined($users{$fields{'email'}})) {

This statement says, "If the address in $fields{'email'} does not already exist in the users DBM file, execute the lines that follow." Those lines simply assign the data from the form to a long string associated with the e-mail address in the users DBM file. The script then prints a one-line HTML page that thanks the user for registering.

If the e-mail address already exists in the database file, the script prints a one-line HTML page, telling the user that the address is already in use.

Essentially, when the DBM file is open, working with it is just like working with a standard associative array. You can assign to it, read from it, and iterate over it exactly the same way that you would a normal hash. Some things are different, however; you learn about them in the following section and in Chapter 4, "Advanced Page Output."
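Here is a minimal sketch of that idea, written so that you can run it from the command line. The register_user and lookup_user subroutines and the sample record are hypothetical, and the DBM file is created in a temporary directory (using the File::Temp module that comes with Perl 5) so that the experiment doesn't touch your real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: add a record unless the key is already taken.
# Returns 1 if the record was added, 0 if the key already existed.
sub register_user {
   my ($dbfile, $email, $record) = @_;
   my %users;
   dbmopen(%users, $dbfile, 0666) || die "Can't open $dbfile: $!\n";
   my $added = 0;
   if (!defined($users{$email})) {
      $users{$email} = $record;      # an ordinary hash assignment
      $added = 1;
   }
   dbmclose(%users);
   return $added;
}

# Hypothetical helper: read one record back from the file.
sub lookup_user {
   my ($dbfile, $email) = @_;
   my %users;
   dbmopen(%users, $dbfile, 0666) || die "Can't open $dbfile: $!\n";
   my $record = $users{$email};      # an ordinary hash lookup
   dbmclose(%users);
   return $record;
}

my $dir = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $db  = "$dir/users";
print register_user($db, 'dave@example.com', 'David::Harlan'), "\n";   # prints 1
print register_user($db, 'dave@example.com', 'Someone::Else'), "\n";   # prints 0
print lookup_user($db, 'dave@example.com'), "\n";   # prints David::Harlan
```

The second call returns 0 because the key already exists, which is the same check that the postuser script performs.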

Initializing a DBM File

If you take USERFORM.HTML and the postuser script off the CD-ROM and put them on your Web server, changing the directory references as necessary, you may be able to start collecting data right away, but probably not.

Most Web servers run CGI scripts under a nonprivileged user name (frequently, the user nobody) to avert security problems. This arrangement is a good thing, in general; keeping your computer safe from intruders is important. But when you want to write to files on the computer, this situation can become bothersome. You need some knowledge of UNIX file permissions to get around the problem. To explain this concept thoroughly, I have to digress a bit. (If you already know about UNIX file permissions, feel free to skip ahead.)

UNIX and its many cousins use a very flexible (and sometimes confusing) system for file permissions. To look at the files in your directory, use the ls command. Figure 3.2 shows a full listing of the directory that I've been working in.

Consider an example file from this listing. The third file listed is the GUESTBOOK.TXT file, which was used to store the data for the Guestbook example in Chapter 2. Starting from the left, the first string of characters defines who can do what to this file (I'll get back to that topic soon). The next two items define the user who owns the file and the group to which that file belongs. In this case, the file belongs to the user root and the group root. root is the most privileged user on a UNIX system. In this example, I own and administer the server that I'm working on, so I'm root. A more typical listing would show user harlan and group users, or something similar. The next entry in this line is the size of the file, in bytes, followed by the date and time of the last modification, and ending with the file name.

Now all that's left to explain is that first string of characters. The first character in this string is a dash except under very special circumstances that don't apply here. (This character tells you whether the listing is a directory, among other things.) The next nine characters indicate the permissions for this file. The first three characters refer to the owner of the file; the second three, to the group; and the final three, to everyone else. Each trio of characters indicates read, write, and execute permissions, in that order. If the group portion of the permissions reads rwx, any member of the group indicated on the line can read from, write to, and execute that file.

Sometimes, these permissions are designated numerically, as in the dbmopen command. For the GUESTBOOK.TXT file, for example, the permissions equal 0666. The leading zero tells Perl that the number is octal; it has nothing to do with the initial dash. Each successive digit sets the values of the rwx characters for the user, then for the group, and then for everyone else. The digits are calculated as follows:

- Read permission equals 4, write permission is 2, and execute permission is 1. Thus, the permissions for GUESTBOOK.TXT (-rw-rw-rw-) come out to 0666.
- If you want the user to have all permissions and everyone else to have no permissions, set the mode to 0700.
- As indicated in Chapter 2, you generally set the mode for scripts to 0755. This mode means that the user can read, write, and execute the file (4+2+1=7). Group members and others can read and execute the file (4+1=5).
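If you'd rather let Perl do the arithmetic, the following sketch adds up the digits for you. The mode_digits subroutine is a hypothetical helper written for this illustration (it assumes Perl 5); it takes the nine permission characters from a directory listing, without the initial dash, and returns the numeric mode as a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# mode_digits is a hypothetical helper: "rw-rw-rw-" in, "0666" out.
sub mode_digits {
   my ($perms) = @_;
   my %value = ('r' => 4, 'w' => 2, 'x' => 1, '-' => 0);
   my $mode = "0";                             # leading zero marks the mode as octal
   foreach my $trio ($perms =~ /(...)/g) {     # user, then group, then others
      my $digit = 0;
      foreach my $char (split(//, $trio)) {
         $digit += $value{$char};
      }
      $mode .= $digit;
   }
   return $mode;
}

print mode_digits("rw-rw-rw-"), "\n";   # prints 0666
print mode_digits("rwxr-xr-x"), "\n";   # prints 0755
print mode_digits("rwx------"), "\n";   # prints 0700
```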

In the case of the GUESTBOOK.TXT file, the permissions for the user, group, and others are all the same: rw-, which means that anyone on the server can read from and write to this file. The permissions must be set this way, because the CGI scripts that use this file are run by the user nobody (not root). The user nobody does not belong to the group root, either. So the scripts must abide by the permissions for the "everybody else" category.

Where does that leave you with the DBM file? As I said earlier in this section, the dbmopen() command tries to create the file if it does not already exist. If directory permissions are set to allow writing by the script, this will work. But the permissions are not likely to be set this way by default. Figure 3.3 shows the full directory listing one level up from the listing in figure 3.2. As you can see, in the "everybody else" category for the directory HARLAN, the permissions don't allow writing-which makes sense, because you don't want just anyone on the server to write to the directory.

Figure 3.3 : Here, you see a listing of files one level up from figure 3.2.

What can you do? The solution is to create your data files beforehand and set the permissions on those files to allow the CGI scripts to write to them. I wrote a simple script that does all these things for me whenever I want to use a new DBM file. Listing 3.6 shows the code.

Listing 3.6 A Script to Initialize DBM Files for CGI Use (INITIALIZEDBM.PL)


#!/usr/bin/perl
($filename,$mode)=@ARGV;
$mode="0666" if !defined($mode);
dbmopen (%temp, $filename, oct($mode))
   || die "Couldn't open $filename\n";
dbmclose (%temp);
system "chmod $mode $filename.*";

To make this code do its job, you run this script from the command line, with the name of the DBM file that you want to initialize as an argument. Before you run the postuser script for the first time, for example, you would issue the command initializedbm users from the command line in your CGI directory. This command creates the DBM file (on some systems, it creates two files) and sets the proper file permissions.

This code presents several new Perl concepts. Line 2 assigns the contents of the array @ARGV to $filename and $mode. (Remember that if you enclose a list of scalars in parentheses, they act like an array.) @ARGV is a special Perl variable that contains the command-line arguments for this script. The arguments are everything that follows the script when it is called. The arguments are separated by spaces on the command line, and each argument becomes an element of @ARGV.

Line 3 checks to see whether a mode was specified on the command line; if not, a default mode of 0666 is assigned. This line may seem to be strange at first. It is functionally identical to the following:


if (!defined($mode)) {$mode="0666"}

The syntax in the script is a shortcut. The if portion of the statement is a statement modifier, which does exactly what it sounds like in English ("Set $mode to 0666 if $mode is not yet defined"). When I first saw this syntax, I assumed that the first part of the statement was always executed; then I couldn't figure out what the if was doing. When I got it through my thick skull that the right portion of the line was checked before the action on the left was performed, everything became clear. I hope that this explanation gets you past that confusion.

Line 4 opens the DBM file just as it does in Listing 3.3. The only difference is that because I was not using a literal value for the file mode, I had to convert the octal string that we start with to the number that it represents. To do so, I used the Perl function oct(). Line 5 is just a continuation of the dbmopen line, which tells Perl that I want to die if the script can't open the specified DBM file for some reason.
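The default-value shortcut and the oct() conversion are easy to try on their own. In this sketch, pick_mode is a hypothetical subroutine that stands in for lines 2 through 4 of the script; it takes its values from @_ rather than @ARGV so that you can call it more than once. The my declaration assumes Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# pick_mode is a hypothetical stand-in for the top of the script.
sub pick_mode {
   my ($filename, $mode) = @_;          # same list assignment as ($filename,$mode)=@ARGV
   $mode = "0666" if !defined($mode);   # the test on the right is checked first
   return oct($mode);                   # octal string in, ordinary number out
}

printf "%o\n", pick_mode("users");            # prints 666
printf "%o\n", pick_mode("users", "0600");    # prints 600
print pick_mode("users"), "\n";               # prints 438, the same value in decimal
```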

Line 6 closes the DBM file. Finally, you come to line 7, which illustrates one of the nicest features of Perl (and also one of its most dangerous features). The system function tells Perl that I want to execute the text enclosed in quotes as an operating-system command on the server.

This specific line executes the system chmod command on the newly created DBM files to make sure that they are world-readable and -writable. This line is functionally identical to typing the command chmod 0666 users.* at the UNIX command line. Depending on your system setup, this command may not always be necessary. Sometimes, however, Perl won't be able to set the permissions properly in the dbmopen command (or even in its own chmod command), so you have to set them from the system level. Including this command ensures that this script will initialize the DBM files correctly for almost any UNIX system.

What's so powerful and dangerous about the system function? It allows you to automate some repetitive tasks, as well as to perform some functions with your data that you might not be able to perform with Perl alone.

But this power also means that you can easily cripple your system or compromise its security if you're not careful. A simple example of something that you don't want to do is system 'rm *';. This command removes all files in the current directory. You could do the same thing at the command line, but some systems would warn you about what you were doing. The Perl system function bypasses these warnings.

In CGI applications, use of the system function with user-entered data must be closely monitored. If you don't carefully check the data that users are passing to the system, an expert UNIX user can easily compromise the security of your system.
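One way to reduce the risk, sketched below under some assumptions of my own, is to check each value against a strict pattern and then give system a list of arguments instead of a single string. When system receives a list, Perl runs the command directly rather than handing the text to the shell, so characters such as semicolons lose their special meaning. The chmod_command subroutine and its patterns are hypothetical; adjust the patterns to whatever values are legitimate in your own application. The sketch assumes Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# chmod_command is a hypothetical helper. It returns the command as a list,
# or an empty list if either value looks suspicious.
sub chmod_command {
   my ($mode, $filename) = @_;
   return () unless $mode =~ /^[0-7]{3,4}\z/;          # octal digits only
   return () unless $filename =~ /^[A-Za-z0-9_]+\z/;   # no shell characters
   # Let Perl, not the shell, expand the wildcard.
   return ("chmod", $mode, glob("$filename.*"));
}

my @command = chmod_command("0666", "users");
# With a list, system runs chmod directly instead of going through the shell.
# The command is skipped if the check failed or no files matched.
system(@command) if @command > 2;
```

A file name such as "users; rm *" fails the pattern check, so the subroutine returns an empty list and nothing is executed.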

Adding Data to the DBM File

After that extensive digression from the task at hand (the online experiment), consider one other major advantage that DBM files have over text files for data storage: appending data to a record. If you wanted to change one of the records in the GUESTBOOK.TXT file, you would have to scan the entire file to find the correct record; save all the data (except the record to be changed) to memory; and then write the data back to the file, with the new data appended to the appropriate record. This process is much easier with DBM files.
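To make the comparison concrete, the following sketch appends to a single record in a DBM file without touching any other record. It assumes that your Perl installation has at least one DBM library available, and it works in a temporary directory so that it doesn't disturb the real USERS file; the e-mail address and record contents are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);

my %users;
dbmopen(%users, "$dir/users", 0666) || die "Can't open users DBM file: $!\n";

# Store a record, then append to it in place. No other record is read
# or rewritten.
$users{'jane@example.com'}  = "Jane::Doe::yes";
$users{'jane@example.com'} .= "::secret1";

my $stored = $users{'jane@example.com'};
dbmclose(%users);

print "$stored\n";
```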

You can change your current application to ask your users to give you a password for their data immediately after they register. First, you need to change the postuser script as shown in Listing 3.7. This change sends the user to the password-entry form after the user has been added to the database.

Listing 3.7 A New Version of postuser (POSTUSER.PL)


#!/usr/bin/perl
require ("process_cgi.pl");

&parse_input(*fields);

dbmopen(%users,"users",0666) || die "Can't open users DBM file\n";
if (!defined($users{$fields{'email'}})) {
   # Join the fields with :: so that no line breaks end up inside
   # the stored record.
   $users{$fields{'email'}}=join('::',
      $fields{'firstname'},$fields{'lastname'},$fields{'cont'},
      $fields{'country'},$fields{'gender'},$fields{'age'},$fields{'income'},
      $fields{'employment'},$fields{'netexp'},$fields{'netconn'},$fields{'workuse'});
   &print_header("http://192.0.0.1/userpassword.html");
}
else {
   &print_header;
   print "Someone has already registered from that e-mail address. Sorry.";
}

The form in figure 3.4 asks the user for his password. I use password-type input boxes in this example, so that the password doesn't appear on-screen as the user types it in the form. For this reason, I ask the user to type it twice, so that I can confirm that he knows what password he typed.

Figure 3.4 : This form requests a new password from the user.

When the user submits this form, the password is checked and then the data is added to the appropriate record of the DBM file. This task is accomplished with the addpassword script, shown in Listing 3.8.

Listing 3.8 Script for Adding a Password to the DBM File Data (ADDPASSWORD.PL)


#!/usr/bin/perl
require("process_cgi.pl");
&parse_input(*fields);
dbmopen (%users, "users", 0666);
&print_header;
if (!defined($users{$fields{'email'}})) {
   print "The email address you entered does not exist in our
     database. Please hit the back button on your browser,
     correct your entry and re-submit the form.";
}
elsif (!($users{$fields{'email'}} =~ /::yes$|::no$/)) {
   print "There is already a password registered for the provided
     email address. Please contact the survey administrator
     if you have forgotten your password.\n";
}
elsif (!($fields{'pass1'} =~ /^[a-zA-Z0-9]{5,10}$/)) {
   print "You entered an illegal password. Please try again.";
}
elsif ($fields{'pass1'} ne $fields{'pass2'}) {
   print "The passwords you typed did not match. Please
     return to the previous screen and try again.";
}
else {
   $users{$fields{'email'}} .= "::$fields{'pass1'}";
   print "Your password has been registered. Thank you.";
}

The addpassword script brings together most of the concepts that this chapter has discussed. First, you see that I'm using the PROCESS_CGI.PL library. I parse the input into the hash %fields, open the DBM file USERS, and then print the HTML header. The next line checks to see whether the e-mail address entered by the user exists in the database; if not, the script prints a brief error page and asks the user to check his work.

If the e-mail address does exist, the script moves on to ensure that a password hasn't already been entered for this user. If this test fails, the script again prints a brief error page.

Next, the script checks the password to see whether it is the right length and contains only alphanumeric characters. This line uses a regular expression, as discussed in Chapter 2. The pattern /^[a-zA-Z0-9]{5,10}$/ translates as follows:

The caret (^) indicates that you want to match the beginning of the string.
The characters in brackets define a range of characters, any one of which should match. Because this range is followed by {5,10}, Perl looks for no fewer than 5 and no more than 10 characters that match the given range.
Finally, the dollar sign ($) indicates that you want to match the end of the string.

What you want to match with this pattern is a string that contains only 5 to 10 alphanumeric characters. Because the entire expression is negated (with the leading exclamation point), the error text that follows prints whenever the pattern is not matched.
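The pattern can be tried on a few sample strings, as in the following sketch. The subroutine name legal_password and the sample passwords are made up for this illustration; the pattern itself is the one from Listing 3.8.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 when the candidate is 5 to 10 alphanumeric characters, else 0.
# The sub name is hypothetical; the pattern is the one from Listing 3.8.
sub legal_password {
    my ($candidate) = @_;
    return ($candidate =~ /^[a-zA-Z0-9]{5,10}$/) ? 1 : 0;
}

# Try one password that is too short, one that is legal, one that contains
# a space, and one that is too long.
foreach my $try ("abc", "secret1", "has space", "waytoolongpassword") {
    printf "%-20s %s\n", $try, legal_password($try) ? "legal" : "illegal";
}
```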

Next, the script checks to see whether both copies of the password are identical; if not, it again prints a brief error page, asking the user to check the information that he entered and try again.

Finally, if the address checks out, the password is legal, and the two versions of the password match, the script appends the password to the end of the data already entered for the given e-mail address.

The line that does the appending presents a new piece of Perl; it uses .=, which is Perl's append assignment operator. This little gem shortens the line that might have otherwise been written like this:


$users{$fields{'email'}} = $users{$fields{'email'}} . "::$fields{'pass1'}";

The functionality of this line is slightly more obvious. The dot (.) is the string-concatenation operator, so this line joins the two strings and assigns the result to $users{$fields{'email'}}. The line as I wrote it in Listing 3.8 functions identically. This is one more Perl shortcut: if you have a scalar variable holding a string to which you want to append text, the append assignment operator performs that task without requiring the repetitious typing of the longer (if equally correct) version of the command.
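The equivalence of the two forms can be checked with ordinary scalar variables. In this sketch, the record text and the variable names are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = "Jane::Doe::yes";   # made-up sample record

# Long form: concatenate, then assign back to the same variable.
my $long = $record;
$long = $long . "::secret1";

# Short form: the append assignment operator does both steps at once.
my $short = $record;
$short .= "::secret1";

print "long:  $long\n";
print "short: $short\n";
print "The two forms agree.\n" if $long eq $short;
```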

Now you see how much simpler it can be to work with data in DBM files. The following section takes all the Perl and CGI that you've learned so far and puts it to the test.

Using Complex Forms and Storing Related Data

The data that you processed in the first part of this chapter is intended to be the first part of an online survey of Internet use. You know how to register your users. Now you can take a crack at collecting some data.

First, consider some general assumptions. You are asking users to carefully track their Internet use on a weekly basis. You want to keep each week separate, so that you can chart any changes over time. You won't try to be comprehensive in your survey questions, but you'll try to hit some key areas.

Where do you start? Begin with the form that users will fill out when they enter their weekly data (see fig. 3.5). The form itself seems to be fairly simple, but it presents some interesting programming problems.

Figure 3.5 : This form is the main data-entry point for the survey.

Processing the Form Data and Checking the Password

The first order of business is to grab the data from the form and put it in your associative array. You should be able to do that just as you did before, right? Well, almost. This form contains an input method that you haven't seen before. Notice that the uses list shown in figure 3.5 has two items selected. The list has only one name, so if you leave PROCESS_CGI.PL as it is, each time a new uses entry is processed into the array, the old one is erased. So you need to change the parse_input routine in PROCESS_CGI.PL, as shown in Listing 3.9.

Listing 3.9 A New Parse_Input Routine for PROCESS_CGI.PL


sub parse_input {
   # Alias *input to the glob that was passed in, or to *cgiinput by
   # default. The local must sit at subroutine level; inside an if block,
   # the alias would end when that block ends.
   local(*input)=@_ ? $_[0] : *cgiinput;
   local ($temp,@pairs);
   if (&form_method eq 'POST') {
     read(STDIN,$temp,$ENV{'CONTENT_LENGTH'});
   }
   else {
     $temp=$ENV{'QUERY_STRING'};
   }
   @pairs=split(/&/,$temp);
   foreach $item(@pairs) {
     ($key,$content)=split (/=/,$item,2);
     $content=~tr/+/ /;
     # "C" packs an unsigned byte, so escapes above %7F decode cleanly.
     $content=~ s/%(..)/pack("C",hex($1))/ge;
     if (!defined($input{$key})) {
        $input{$key}=$content;
     }
     else {
        $input{$key} .= "\0$content";
     }
   }
   return 1;
}

The new code in this subroutine is the if/else construct at the end of the foreach loop. This code looks to see whether the key in the key/value pair that is being processed already exists in the %input hash; if not, the key is associated with the current value, just as in the first incarnation of this routine. If the key does exist, you don't want to erase the current value, but you do want to save the new one. So you tack the new value onto the end of the old value (which actually may be several old values), separating the two with the null character (\0). You use the null character because it is very unlikely to appear in data that a user types into a form; other separator characters might.
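The following stand-alone sketch shows the same technique applied to a hard-coded query string instead of real CGI input, so that you can run it from the command line. The query string and its field values are made up; the decoding steps mirror those in the parse_input routine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical query string with the "uses" field selected twice.
my $query = "email=jane%40example.com&uses=e-mail&uses=web+browsing";

my %input;
foreach my $item (split /&/, $query) {
    my ($key, $content) = split /=/, $item, 2;
    $content =~ tr/+/ /;                                    # plus signs mean spaces
    $content =~ s/%([0-9a-fA-F]{2})/pack("C", hex($1))/ge;  # decode %XX escapes
    if (!defined $input{$key}) {
        $input{$key} = $content;          # first value for this key
    }
    else {
        $input{$key} .= "\0$content";     # later values joined with a null
    }
}

# Recover the individual selections by splitting on the null character.
my @uses = split /\0/, $input{'uses'};
print "email: $input{'email'}\n";
print "use:   $_\n" foreach @uses;
```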

After getting the data, you want to make sure that the user exists in the database, that the user has a password, and that the password provided matches the one you have on file. If all these conditions are met, you want to add the data to your database. The code in Listing 3.10 accomplishes these tasks.

Listing 3.10 Script to Post Periodic Data to the Database (POSTPERIODDATA.PL)


#!/usr/bin/perl
require "process_cgi.pl";
&parse_input(*fields);
&print_header;
dbmopen (%users,"users",0666);
if (!defined($users{$fields{'email'}})) {
   print "The email address you entered does not exist in our
     database. Please hit the back button on your browser,
     correct your entry and re-submit the form.";
}
else {
   $temp=$users{$fields{'email'}};
   dbmclose(%users);
   $temp=~/([a-zA-Z0-9]{5,10})$/;
   $actualpass=$1;
   if ($actualpass eq '') {
     print "There is no password entered for this e-mail
        address. Please enter one
        <a href=../../../userpassword.html>here</a>
        before you enter your data.";
   }
   elsif ($fields{'pass'} ne $actualpass) {
     print "The password you entered is incorrect. Please
        return to the previous screen and try again.";
   }
   else {
     &post_data;
   }
}

The first if should look familiar; it's the same code that you used in listings 3.7 and 3.8 to see whether the user exists in the database. If the user does exist, the script ends up in the else portion of the construct. Here, you want to get the password from the user data, and you do so with another regular expression. Because you set up the data so that the password (if it exists) is always the last piece of data, all you need to do is check for 5 to 10 alphanumeric characters at the end of the string. Because this pattern is enclosed in parentheses, Perl saves any match in the special variable $1. So assign $1 to $actualpass and then test to make sure that $actualpass is not empty.

This test tells you whether the user has entered a password, because if the user hasn't entered a password, the data string ends with yes or no. Because neither of these elements matches your password pattern, $1 (and thus $actualpass) will be blank if no password exists in the USERS DBM file for this e-mail address.
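The following sketch applies the same pattern to two made-up records, one with a password and one without. The subroutine stored_password is hypothetical. Unlike Listing 3.10, it reads $1 only when the match succeeds, which is a slightly more defensive variation, because a failed match leaves $1 holding whatever an earlier successful match put there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the password stored at the end of a record,
# or an empty string when the record has no password yet.
sub stored_password {
    my ($record) = @_;
    # Use $1 only if the match succeeded.
    return ($record =~ /([a-zA-Z0-9]{5,10})$/) ? $1 : '';
}

# Made-up records: the first ends with a password, the second with "no".
my $with    = "Jane::Doe::US::yes::secret1";
my $without = "John::Roe::US::no";

printf "with password:    '%s'\n", stored_password($with);
printf "without password: '%s'\n", stored_password($without);
```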

Last, the script checks to see whether the password entered with the data actually matches the one that you have on file; if it does, you can finally enter the data in your database.

Working Around the Limitations of DBM Files

As powerful as DBM files are, in many implementations of Perl they suffer from one severe limitation: you can make only one call to dbmopen() per program. For the application that you're working on now, this limitation causes a problem, because you prefer to store the survey data in a DBM file.

TIP

Your implementation of Perl may allow multiple dbmopens per program. The limitation depends on the DBM library that your Perl interpreter was compiled with. According to Perl's own documentation, the one-call limit applies when the system supports only the older DBM functions; if your interpreter was built with the NDBM libraries, you can have multiple dbmopen commands. Otherwise, you may be stuck.

To figure out what version of DBM your system uses, you'll have to find the computer's libraries and look for a file such as LIBNDBM.A or LIBGDBM.A.
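If you are running Perl 5, you can also ask Perl itself, as in the following sketch. It assumes that dbmopen is implemented on top of the standard AnyDBM_File module, which tries the modules NDBM_File, DB_File, GDBM_File, SDBM_File, and ODBM_File in that order. The subroutine name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the names of the DBM modules that this
# Perl installation can actually load.
sub available_dbm_modules {
    my @found;
    foreach my $module (qw(NDBM_File DB_File GDBM_File SDBM_File ODBM_File)) {
        # require fails (and eval returns false) when the module is missing.
        push @found, $module if eval "require $module; 1";
    }
    return @found;
}

print "DBM modules available: ", join(', ', available_dbm_modules()), "\n";
```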

Fortunately, a way around this limitation exists: you can use the system command to tell a separate Perl program to deal with this data. First, save the data from the form in a temporary text file, as shown in Listing 3.11.

Listing 3.11 The post_data Subroutine from POSTPERIODDATA.PL


sub post_data {
   $filename=$$.time.".txt";
   open (f, ">scratch/$filename") || die "couldn't open file $!";
   print f "email:$fields{'email'}\n";
   print f "period:$fields{'period'}\n";
   print f "hours:$fields{'hours'}\n";
   print f "webhours:$fields{'webhours'}\n";
   print f "phonehours:$fields{'phonehours'}\n";
   print f "send:$fields{'send'}\n";
   print f "receive:$fields{'receive'}\n";
   print f "uses";
   foreach $use(split(/\0/,$fields{'uses'})) {
     print f ":$use";
   }
   print f "\n";
   close f;
   system "postdatasup scratch/$filename >> /dev/null";
   system "rm scratch/$filename";
   print "Thanks for sending in your data.";
}

The first line of this subroutine creates a string that will serve as the file name. This string looks odd but serves an important purpose. $$ is a special Perl variable that stands for the process ID, a number that identifies this script when it's running. Every process that is running has a process ID, and process IDs are unique at any given moment. After a process ends, the IDs can be reused. This last fact means that to ensure the uniqueness of the temporary file that you're creating, you need to put a time stamp on it. To do so, use the time function, which returns the number of seconds since January 1, 1970.
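You can see what such a file name looks like by running the following short sketch from the command line. The variable names are illustrative; the expression that builds the name is the same one used in the post_data subroutine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $$ is the ID of the current process; time() is the number of seconds
# since January 1, 1970. Together they make a name that another running
# process is very unlikely to choose.
my $pid      = $$;
my $now      = time();
my $filename = $pid . $now . ".txt";

print "process ID: $pid\n";
print "time stamp: $now\n";
print "file name:  $filename\n";
```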

With a unique file name ensured, you next need to open the file. Line 3 of Listing 3.11 performs this task, opening a new file with the name designated in the directory SCRATCH. I created this directory to be world-writable for just such a purpose: to write temporary files for immediate use. Having a SCRATCH directory around can be quite handy.

With the file open, the script prints the data to the file, one item per line. Because the uses item can have multiple values, it takes a little more processing, as shown in lines 11 through 15. The loop in lines 12 through 14 iterates over the list that results from the split(/\0/,$fields{'uses'}) command, printing each item from the null-character-separated list.

When all the data is printed to the temporary file, the file is closed, and the system function calls the program that will place the data in the DBM file. When that task is finished, the script deletes the temporary file and prints the "thank you" line back to the browser.

The final piece of this puzzle is the postdatasup script, shown in Listing 3.12. The script reads the appropriate file name from the command line. Notice that in Listing 3.11, I called this script (using the system function) with the temporary file name immediately following it. postdatasup opens that file and reads the data into an array.

Listing 3.12 The Support Script for Periodic Data Entry


#!/usr/bin/perl
open (f, $ARGV[0]) || die "Couldn't open file $ARGV[0]";
$i=0;
while (<f>) {
   chop;
   $file[$i]=$_;
   $i++;
}
close f;
($trash,$email)=split(/:/,$file[0]);
($trash,$period)=split(/:/,$file[1]);
($trash,$hours)=split(/:/,$file[2]);
($trash,$webhours)=split(/:/,$file[3]);
($trash,$phonehours)=split(/:/,$file[4]);
($trash,$send)=split(/:/,$file[5]);
($trash,$receive)=split(/:/,$file[6]);
($trash,@uses)=split(/:/,$file[7]);
$dbfile="period".$period;
dbmopen (%data, $dbfile, 0666);
$data{$email}=join ('::',$hours,$webhours,$phonehours,$send,$receive);
$data{$email}.= "::".join (',', @uses);

Notice that line 5 issues the command chop, a standard Perl function that chops one character off the end of a string. If it is called without any arguments, chop works on the Perl special variable $_. Recall that in a loop such as this that reads in a file, $_ is set to equal each successive line in the file.
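Because chop removes the last character no matter what it is, it can eat real data if a line happens to have no trailing newline. Perl 5 also provides chomp, which removes only a trailing newline. The following sketch compares the two on made-up sample lines; whether you switch to chomp in the support script is a judgment call that depends on your version of Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line_a = "period:3\n";   # a normal line read from a file
my $line_b = "period:3";     # e.g. a last line with no trailing newline

# Work on copies so that the originals stay intact.
my ($chop_a, $chop_b, $chomp_a, $chomp_b) = ($line_a, $line_b, $line_a, $line_b);

chop  $chop_a;    # removes the newline
chop  $chop_b;    # removes the "3", which is data
chomp $chomp_a;   # removes the newline
chomp $chomp_b;   # removes nothing

print "chop:  '$chop_a' and '$chop_b'\n";
print "chomp: '$chomp_a' and '$chomp_b'\n";
```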

Lines 10 through 17 put the appropriate data in named variables. Notice that when the script processes the uses data, the data is put into an array rather than a scalar variable, because uses can have multiple entries.

Finally, the script opens the appropriate DBM file. You can see from the code that the files are named PERIOD1, PERIOD2, and so on. You have to initialize these files, using the initializedbm script (shown in "Initializing a DBM File" earlier in this chapter), before they can be used. The script then associates the e-mail address with the rest of the data as a delimited string.
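To see the shape of the record that ends up in the DBM file, you can run the following sketch, which performs the same splitting and joining on in-memory sample lines instead of a temporary file. The sample data is made up, and the sketch collects the values in a hash rather than in separate scalar variables, which is a variation on Listing 3.12 rather than a copy of it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up lines in the format that post_data writes to the scratch file.
my @file = ('email:jane@example.com', 'period:2', 'hours:10', 'webhours:6',
            'phonehours:1', 'send:20', 'receive:35', 'uses:e-mail:web');

my %value;
foreach my $line (@file) {
    my ($name, $rest) = split /:/, $line, 2;   # split at the first colon only
    $value{$name} = $rest;
}

# The uses line holds several colon-separated entries.
my @uses   = split /:/, $value{'uses'};
my $dbfile = "period" . $value{'period'};

# Build the delimited string that would be stored under the e-mail address.
my $record = join('::', @value{qw(hours webhours phonehours send receive)})
           . "::" . join(',', @uses);

print "$dbfile: $value{'email'} => $record\n";
```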

From Here...

With the examples in this chapter, you should now have a solid knowledge of form processing and data storage in DBM files. You should be able to take any form and know what you need to do to save submitted data. You know how to write to text files and to DBM files. You also know how to append data to DBM records and how to check the data in those records.

You may want to branch out to the following areas:

Chapter 4, "Advanced Page Output." This chapter concentrates on processing existing data and page output.
Chapter 11, "Database Interaction." If you're interested in high-end data storage, head to this chapter for information about interacting with SQL databases.
Chapter 16, "Subroutine Definition." This chapter provides extensive information on creating subroutines, libraries and modules.