by Paul Doyle
Providing a secure but open Web service is a balancing act. You want to make your site as easily accessible as possible so that the maximum number of people can use it, but you also want to make sure that access is not so open that your service can be harmed, accidentally or deliberately, due to a lack of security.
Achieving the correct balance between security and openness is easier if you simply eliminate all write access on the server. Often, however, the nature of the service that you are providing dictates that users have write access to files on the server.
This chapter deals with one approach to providing a secure Web service that permits writing of data on the server. The chapter examines the security issues in detail and shows how you can accept data from users and store it on the server while protecting the integrity of your service. To illustrate the techniques, a sample Web-based ordering system using shopping-cart logic is developed in the course of the chapter.
Previous chapters (particularly Chapter 8, "Understanding Basic User Authentication") dealt with the general issues involved in verifying the identity of users who access your Web server and in restricting access to files on the server by using the HTTP daemon's configuration mechanism. Before you get started using CGI wrapper scripts, take a closer look at the specific security issues that arise when you decide to accept input from users.
The transparency of the Web's infrastructure is one thing that has contributed to the enormous growth in its popularity in recent years. The user clicks a button in his browser, and a server somewhere on the planet sends him whatever his little heart desires.
The Browser As Interface
Users feel as though they are interacting directly with the material that they read on the Internet, with only their browsers between them and the words or images that they see. A browser picks up files for a user and displays images. From time to time, the user may need to enter a user ID and a password to access a special service. To most users, their browsers appear to be logging them on to the server. If a user notices that a program on the server is being executed, the impression is that the browser is executing the program for the user.
Beneath the Surface
In fact, looking just below the surface of the action in a typical point-and-click operation, you can see that what actually happens is considerably more complex. The sequence of events can be summarized as follows:
This schema could be broken down into much finer levels of detail, but for now, the points of interest are:
In short, all the action takes place on the server, and all of
it is done by the Web server (httpd) process. The user sends a
request to the server and receives the result from the server;
everything that happens in between involves activity carried out
by the server process on behalf of the user.
NOTE
Browser extensions such as Java and SafeTcl are exceptions to this rule: the browser downloads the applet or script from a server and then executes it on the user's own machine.
The Server As Interface
This execution by the server on behalf of the user presents a serious security issue. If the user were logging in to the server interactively and running a program in a shell that was governed by the user's own private account privileges, matters would be simpler. It would be relatively easy to ensure that the user did not have privileges that endangered the integrity of the rest of the system. Many system administrators would protest that it is far from easy to restrict privileges in this way on even a moderately large system. The difficulty, however, certainly pales in comparison with the effort involved in maintaining the integrity of data in a directory to which everyone on the Internet has some form of write access.
The fact is, users do not log in to the Web server. The Web server responds to requests from anyone on the Internet, reading files on the server on the user's behalf or even executing programs on the server on the user's behalf. The same httpd process generally executes programs for all users who access the server, so the mechanism that works on interactive systems (containing each user's activities at the operating-system level by means of the user ID under which that user's processes run) will not work on Web servers.
Instead, the httpd process uses its own verification mechanisms to ensure that users are who they say they are, and it interprets its access restriction rules to determine who can do what (and where). From the point of view of the operating system, everyone who accesses a service runs under the same user ID (that of the httpd) and has the same privileges to all files. The following sections explain why this sharing is a problem.
All Web servers make files available for reading. If your server does nothing else, you can concentrate on making sure that the http daemon has no write access anywhere on your server, with the exception of its log files.
But even then, you need to take care. Some files may be intended for the eyes of a particular group of people only. In that case, creating Web server user groups and carefully planning authentication domains (containers for the files) can protect the data from unwelcome attention.
It is also important to avoid exporting files that should not be visible over the network. A classic example is exporting the server's password file so that anyone on the Internet can have a crack at it. Restricting the exported area on the Web server to a particular, specially designated directory tree can help you avoid security holes such as this, but you also must prevent users from placing links or aliases to sensitive files in the exported area. Chapter 8, "Understanding Basic User Authentication," explains how to guard against this kind of security breach.
The primary area of concern in this chapter is allowing users to write files, not just read them. When you decide that it is necessary to allow the httpd server to write to files on behalf of remote users, you must take great care to limit the circumstances under which the httpd writes data to disk.
You may need to allow the server to write files on behalf of users for several reasons:
This section takes a closer look at providing temporary storage. Suppose that you, the CGI programmer, want to implement a system that involves three screens. Screen A presents a form that contains the usual elements: fill-in boxes, drop-down menus, and such. When form A is submitted, assuming that no essential data is missing, screen B appears. This screen is also a form, but it picks up additional information from the user. When form B is submitted, screen C appears: another form, this time summarizing the user's entries in forms A and B.
When I fill in form A and click the submit button, the ACTION parameter of the FORM statement in form A contains the name of a CGI program that generates form B. This CGI program (call it MakeB.cgi) receives the values that I entered in form A through the process environment (or, for POST requests, through standard input), as described in earlier chapters. The program then generates form B with a FORM statement whose ACTION parameter specifies that another program (call it MakeC.cgi) should be invoked when form B is submitted. So I fill in form B and click the submit button. MakeC.cgi receives the values that I entered in form B and generates form C appropriately.
Just one thing is missing in form C: the data that I entered in form A! This data was held in two places: in the form where I entered it and in MakeB.cgi, which received it. That's as far as the data goes unless you take explicit steps to pass it along.
So you need to make sure that all values that I enter or select in both forms A and B arrive safely in MakeC.cgi.
Overloading the ACTION Parameter
One way to make sure that the data gets passed along is to add the CGI variable information to the ACTION parameter of the FORM statement in form B. The value of ACTION is a URL, so it can take CGI parameters in the usual way. This example invokes MakeC.cgi, with the variables Name, ColorChoice, and Horsepower equal to Kurt, Blue, and 0, respectively, when the user submits the form:
<FORM ACTION="MakeC.cgi?Name=Kurt&ColorChoice=Blue&Horsepower=0">
Notice that a statement such as this must be generated by a CGI program (MakeB.cgi, in this case). The values Kurt, Blue, and 0 are known only after the user submits form A, so these values cannot be hard-coded into an HTML file on disk.
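As a minimal sketch of what MakeB.cgi might do at this point, the hypothetical subroutines below (url_encode and action_url are names chosen for illustration, not routines from this chapter) build the ACTION URL from the values received from form A. Each name and value is percent-encoded so that spaces, ampersands, and other special characters survive the trip, and the ampersands are written as &amp; inside the HTML attribute.

```perl
use strict;
use warnings;

# Percent-encode every character outside the unreserved set (letters,
# digits, '-', '_', '.', '~') so that it can travel safely in a URL.
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical helper: append name=value pairs to a script URL.
# Names are sorted so that the output is predictable.
sub action_url {
    my ($script, %vals) = @_;
    my @pairs = map { url_encode($_) . '=' . url_encode($vals{$_}) } sort keys %vals;
    return @pairs ? "$script?" . join('&', @pairs) : $script;
}

# Values that MakeB.cgi received from form A (illustrative).
my %formA = (Name => 'Kurt', ColorChoice => 'Blue', Horsepower => 0);
my $url = action_url('MakeC.cgi', %formA);

# Inside an HTML attribute, '&' should be written as '&amp;'.
(my $attr = $url) =~ s/&/&amp;/g;
print qq(<FORM ACTION="$attr">\n);
```

Sorting the names is only a convenience for testing; the order of CGI parameters does not matter to the receiving script.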
This method gets the designated values to MakeC.cgi only under certain conditions. If form B uses the GET method, most browsers replace the query string in the ACTION URL with the contents of form B, so the imposed values are lost; with the POST method, both sets of values arrive, but MakeC.cgi must read both the query string and standard input. This method may be useful in limited circumstances, but beware: it can get out of hand quickly. Consider a case in which a group of HTML forms is used to build up a set of data over a series of transactions, with the user being allowed to go back and forth between forms at will. Consider the number of variables that you need to put on the ACTION parameter's URL, and consider the impossibility of keeping everything straight. Then read on.
Saving It to Disk
Another approach is to store the values of all such variables to a file as the filled-in forms are received. This type of method is easier to manage when you have multiple pages and when flow between pages is not strictly linear, that is, in virtually all Web services. From the point of view of the Web programmer, writing pages and scripts to use such a system is not terribly arduous; the main requirement is that you store to disk all data that you may want to see again later.
Although saving data to disk is the preferred option for all but the most trivial cases, it has some drawbacks:
These disadvantages are easily outweighed by the flexibility of a solid intermediate storage system, especially when you consider that most of the work has already been done. The system described in the rest of this chapter has all the essentials that you need to get up and running quickly. Just finish this chapter, copy some files from the CD-ROM that comes with this book, and prepare to amaze and astound your friends!
The type of write-to-file system described in the preceding sections is a session-based system. A solid understanding of the way in which such a system works is essential before you can start writing code, so this section examines the building blocks of a session-based Web service.
A URL is the basic unit of Web access. You want to allow users to access several such locations (CGI scripts and HTML pages) on your server in a linked sequence, in such a way that you can track users' actions and any data that they enter. Therefore, you need to identify the user when she makes contact initially, and if she follows a link within your service, you want to regard the new access as being a continuation of the initial one. That means identifying her when she requests the second and subsequent locations, and making the logical connection between these accesses and the initial one. This logical sequence of connected accesses is what I refer to as a session.
This section is concerned with tracking the user's access over a sequence of steps within the service, not with gathering historical data over an extended period. If the user attaches to your service and follows links between pages for a few minutes one day, and then does something similar the next day, those accesses count as two sessions, not as two parts of one session.
The End
The preceding section's definition of a session contains a loose end: when does a session finish?
In some cases, you may want to provide the user with an explicit menu option for logging off and terminating the session. This approach makes sense in the case of password-protected services, in which a dangling open session may represent a security risk.
In other cases, in which services are open to all users, an explicit disconnect or logout button may not be necessary. You simply follow the user's actions until she stops using the service, at which time any data stored on a temporary basis is deleted by a housekeeping process of some sort.
This open-ended approach can be messy. How do you know whether your user has really left your service and is not just reading what you displayed on her screen or handling some other business, with the intention of resuming the session later?
The answer lies in a time-out mechanism. You decide on a reasonable upper limit to the length of a pause between accesses to your service, and you regard as abandoned any sessions that pause for longer than that duration. A separate housekeeping process (a Perl script executed as a cron job at regular intervals, for example) deletes session files that have not been modified within the designated time.
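A minimal sketch of such a housekeeping script follows. It assumes that session files live in a LOG subdirectory and that the time-out is 30 minutes; both are assumptions to adjust for your own service. The subroutine name purge_stale_sessions is hypothetical. The script compares each file's modification time, as returned by stat, with the current time and deletes files that are too old.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed settings: adjust the directory and time-out for your service.
# Use an absolute path when running from cron, which does not start in
# the Web server's directory.
my $SESSION_DIR = 'LOG';
my $TIMEOUT     = 30 * 60;    # 30 minutes, in seconds

# Delete each regular file in $dir that has not been modified for more
# than $timeout seconds before $now. Returns the paths that were removed.
sub purge_stale_sessions {
    my ($dir, $timeout, $now) = @_;
    $now = time unless defined $now;
    my @removed;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!\n";
    foreach my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;            # skip '.', '..', and subdirectories
        my $mtime = (stat _)[9];         # reuse the -f stat; element 9 is mtime
        if ($now - $mtime > $timeout) {
            push @removed, $path if unlink $path;
        }
    }
    closedir $dh;
    return @removed;
}

# Run the purge only if the session directory exists here.
purge_stale_sessions($SESSION_DIR, $TIMEOUT) if -d $SESSION_DIR;
```

Because the wrapper rewrites a session file on every access, its modification time doubles as a "last seen" timestamp, so no separate bookkeeping is needed.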
A time-out system is also useful in systems that use an explicit logout option. If the connection goes down or the user forgets to log off, the session is not terminated properly and remains active until you remove it. You can use a time-out mechanism to clean up these abandoned sessions.
The Session Identifier
Tracking each user separately from one CGI script or HTML page to the next is essential. You may have dozens of users accessing your page simultaneously, and you don't want one person's data becoming confused with that of another.
The key to keeping track of users is the session identifier: a unique number or string that your service assigns to a user when she first connects to the service. The service includes this identifier in every page that it sends during the session, so that the identifier comes back with each subsequent request, allowing the service to determine which session file to use for the user when she reconnects.
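How identifiers are generated is not specified at this point, so the following is only a sketch under stated assumptions. Because the identifier effectively acts as a password for the session, it should be hard to guess; because it comes back from the browser, it should be checked before it is used to build a file name. The hypothetical routines below read random bytes from /dev/urandom (falling back to a weaker mix of time, process ID, and rand()) and accept only identifiers made of 16 to 64 hex digits.

```perl
use strict;
use warnings;

# Return a new session identifier as a string of hex digits.
# Prefer the operating system's random source; fall back to a weaker
# mixture of the time, the process ID, and rand() if it is unavailable.
sub new_session_id {
    my $bytes;
    if (open(my $fh, '<:raw', '/dev/urandom')) {
        my $got = read($fh, $bytes, 16);
        close $fh;
        return unpack('H*', $bytes) if defined $got && $got == 16;
    }
    $bytes = pack('N3', time, $$, int(rand(2**32)));
    return unpack('H*', $bytes);
}

# Accept only identifiers made of 16 to 64 hex digits, so that a value
# such as "../x" can never reach a file name. Under taint mode (-T),
# the captured copy is also untainted.
sub valid_session_id {
    my ($id) = @_;
    return (defined $id && $id =~ /^([0-9a-f]{16,64})$/) ? $1 : undef;
}
```

A caller would typically write `my $id = valid_session_id($cgivals{'sessionid'})` and treat an undefined result as "no session", forcing a new login.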
You can pass the key between the server and the client in several ways. If you can guarantee that all clients who access your server support cookies, you can set a cookie to the value of the key, as described in Chapter 7, "Dynamic and Interactive HTML Content in Perl and CGI." The simplest approach, and the one that you'll use for your sample application, is to store the key in an HTML form as a hidden value.
The following HTML statement, for example, results in a CGI variable called sessionkey, with a value of clef:
<input type="hidden" name="sessionkey" value="clef">
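The value is not displayed on the browser in any way, but when the user submits the form, sessionkey and clef are included in the list of CGI variables and values that the CGI script on the server receives. ("Hidden" means only that the browser does not display the field; anyone who views the page source can read the value.)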
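A wrapper that writes such fields will usually generate them from Perl rather than type them by hand. The sketch below uses hypothetical helpers, html_escape and hidden_field, to produce the line shown above; escaping the value matters whenever it may contain quotes or angle brackets, because an unescaped quote would end the attribute early.

```perl
use strict;
use warnings;

# Escape the characters that are special inside an HTML attribute value.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;     # must come first, before other entities are added
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build a hidden form field that carries a value back to the server.
sub hidden_field {
    my ($name, $value) = @_;
    return sprintf('<input type="hidden" name="%s" value="%s">',
                   html_escape($name), html_escape($value));
}

print hidden_field('sessionkey', 'clef'), "\n";
```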
To summarize, a session is a set of connected accesses of a service by one user. A session is terminated when the user explicitly sends a termination request or when a designated time-out period elapses. The service (your Perl program) tracks the user throughout the session by checking for the user's unique session identifier on each access request.
So far, so good. The user has a session ID, which she provides with each new request for a location within your service. Using this ID, she hops from one page, form, or CGI program to another, and you keep track of all her data for her.
This process should be managed by a single CGI program rather than by a series of interconnected scripts, for a few good reasons:
In short, a single program is easier to manage, and it makes the
HTML and associated hyperlinks easier to develop, too. This single
script is called a CGI wrapper, and it's how you'll write
your sample application.
CGIWrap
A public-domain utility called CGIWrap (included on the CD-ROM that comes with this book) uses a CGI wrapper script for a different purpose. The problem that CGIWrap seeks to address originates in the fact that HTTP daemons (httpds) execute CGI programs on behalf of the end user. The httpd process runs on the server under a user ID that has privileges that are not available to the ordinary user of the server, such as write access to database files. Accordingly, a user on your Web server can write a CGI program to perform tasks that he or she could not carry out directly. Examples include printing configuration or password files that you prefer to keep confidential, and overwriting data. It would be relatively easy for one user to attack another by overwriting the data in the other user's CGI directory, for example, but damage of this kind can occur accidentally, too. The best solution to this kind of risk is to have each CGI program execute under the user ID of the owner of the script, rather than under the user ID of the httpd process. To achieve this, a wrapper program runs as root (on a UNIX machine) and starts each CGI program in a separate process with the user ID of the script owner. CGIWrap, written by Nathan Neulinger, is a utility program that farms out CGI executions to a separate process in this way. The program also performs some other basic security checks on the CGI script before deciding whether it should allow the script to execute. Some HTTP daemons now have this type of functionality built in. If your HTTP daemon does not provide this feature, you may want to consider installing CGIWrap to enhance the security of your Web server.
The list in "The Wrapper" earlier in this chapter discussed substitutions in HTML files. This process is best explained by means of a simple example. Suppose that you want to greet your user by using her first name, which she has already entered in a form. The relevant line of the HTML would look something like this:
<h3>Welcome, Jean!</h3>
Assuming that you stored the user's name in the Perl variable $firstname, you can produce this HTML by using a Perl statement such as the following:
print "<h3>Welcome, $firstname!</h3>";
This statement could go in a special CGI script that prints out the welcome page, or it could appear in a special subroutine in your wrapper script. You don't want to adopt that approach, though: you would find yourself writing special scripts or subroutines for every bit of HTML that is not completely static.
A much more elegant solution would be to embed your variable name in the HTML file and to have it replaced automatically just before the page is sent to the user. You can't do that literally, of course; an HTML file is not a Perl program, so there's no point in sticking Perl variable names in there. The principle is sound, however, and you can achieve the same result by using a slightly different mechanism.
Instead of embedding Perl variable names in the HTML file, you can embed special placeholders that your script will translate for you as it sends the page. The placeholder needs to be identifiable as such to the wrapper script; you need to make sure that the wrapper replaces all placeholders without altering any of the surrounding HTML. You can identify your placeholders in several ways, by inventing a new HTML tag, for example. (That method is risky, though, because you never know what tag names will appear in the next version of the HTML standard.)
The method in this section uses simple syntax. Placeholders in your HTML files start and end with a backslash. The text between the backslashes is the name of the variable whose value is to go in the placeholder's position. The welcome line, for example, would appear in the HTML file as follows:
<h3>Welcome, \personalname\!</h3>
Your wrapper script will spot the backslashes, extract the personalname token, and look it up in a table in memory. Because you're using Perl, that table is implemented as an associative array. (The section called "Parsing an HTML File" later in this chapter explains exactly how that implementation is achieved.) The wrapper script then spits out the original line, minus the backslashes and the token name, which it replaces with the value in the associative array for that token.
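The core of that substitution can be expressed as a single Perl substitution with the /e modifier. The sketch below is not the chapter's own parsing code; substitute_placeholders is a hypothetical name, tokens are assumed to consist of word characters only, and placeholders with no entry in the table are deliberately left unchanged so that mistakes stay visible in the output.

```perl
use strict;
use warnings;

# Replace every \token\ placeholder in $line with $vals->{token}.
# Tokens are assumed to be word characters (letters, digits, '_');
# placeholders with no entry in the table are left unchanged.
sub substitute_placeholders {
    my ($line, $vals) = @_;
    $line =~ s/\\(\w+)\\/exists $vals->{$1} ? $vals->{$1} : "\\$1\\"/ge;
    return $line;
}

my %state = (personalname => 'Jean');
print substitute_placeholders('<h3>Welcome, \personalname\!</h3>', \%state), "\n";
```

If a value comes from user input, escape it for HTML before substituting it, or a user could inject markup into the page.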
You probably will want to develop your own simple syntax for your
application. The syntax used in this chapter is deliberately simple,
so as to keep the sample code easy to follow.
CAUTION
If you want to do anything more complicated than simply replace values, you probably should design proper syntax for your embedded commands before you start, because adding features as you go along will almost certainly result in obscure, confusing syntax. You may even want to add looping and other flow-control capabilities. If your needs really extend to features of that sort, you may want to consider server-side includes or Java for an out-of-the-box solution.
The essential components of your managed system are:
Before you start to develop this application, you need to know how you're going to manage program flow.
This type of system is state-based in the sense that the current state of the system (the aggregate of the values of all the system's variables) dictates the next action taken by the system. Looking at the system from the server side, a set of variables and values is provided by the browser, and the CGI script decides what to do based on these values. From the point of view of the browser, the CGI program is directed by a sort of remote-control mechanism, by which the browser sets CGI values to control the action on the server.
Pointing the Way
The most direct way to tell the wrapper script which location to display next is to state it in the CGI values. You can accomplish this task quite easily by inserting into the outgoing form a hidden value that contains the URL of the next location, such as this:
<input type="hidden" name="location" value="wrap.cgi">
Then the wrapper script can check the CGI values for a location setting when it tries to decide what to do next.
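Because the location value arrives from the browser, a user can change it to anything. One defensive approach, sketched below under the assumption that locations name pages from a fixed set, is to accept only values that appear on an explicit list. The page names and the checked_location routine are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical list of the pages that the wrapper is willing to display.
# Anything else, such as location=../../etc/passwd, is rejected.
my %ALLOWED_LOCATIONS = map { ($_ => 1) } qw(welcome.html menu.html review.html);

# Return the location if it is on the list; otherwise return nothing.
sub checked_location {
    my ($loc) = @_;
    return unless defined $loc && $ALLOWED_LOCATIONS{$loc};
    return $loc;
}
```

An allow-list is simpler to reason about than trying to strip dangerous characters such as "/" and ".." from arbitrary input.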
Directing the Action
Although a location setting is adequate in many cases, you may not always know which location will come next. You may need the server to do some processing of the session-state values before deciding which location to return next.
In some cases, the CGI program can determine the next action by examining particular CGI values. If there is no session-key value, for example, the only valid action is to force the user to log on. In most cases, however, the number of values to be checked and the possible combinations of values will get out of control quickly. Statements of the form "if (A=B and X=Y) but not ((C=B or Y=Z) and A=D)" will start to appear.
A neat, direct way to implement this type of remote control is to have a special CGI value-call it action-that specifies the next action to be taken. This value is not always required but is very useful in most cases.
Suppose that your welcome screen is to be followed by a product menu. The following line, placed inside the form on the welcome page, will tell the wrapper that the next action to be taken should be product_menu:
<input type="hidden" name="action" value="product_menu">
Notice that this mechanism merely indicates a state to the wrapper script; it does not dictate which Perl function should be invoked in the event that a particular state arises.
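Because the action value is only a name, the wrapper still needs to map it to code. One common Perl idiom for this is a dispatch table: a hash whose values are code references. The sketch below is illustrative only; the handler names and do_action are hypothetical and not necessarily how the chapter's &DoAction is written. Unknown actions are rejected rather than guessed at.

```perl
use strict;
use warnings;

# Hypothetical handlers; a real wrapper would build and print HTML here.
sub show_product_menu { my ($sessid) = @_; return "product menu for $sessid" }
sub review_order      { my ($sessid) = @_; return "order review for $sessid" }

# Map each action name that the forms may send to the routine that handles it.
my %ACTIONS = (
    product_menu => \&show_product_menu,
    review       => \&review_order,
);

# Look up the action and run its handler; refuse anything not in the table.
sub do_action {
    my ($sessid, $action) = @_;
    my $handler = $ACTIONS{$action}
        or die "Unknown action '$action'\n";
    return $handler->($sessid);
}
```

Adding a new state then means writing one handler and one table entry, rather than extending a chain of if statements.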
Walking Through the Wrapper
Figure 9.1 illustrates the chain of events that takes place during a typical session.
Figure 9.1 : The typical program flow through the wrapper.
The following list provides a detailed explanation:
Notice the overall program flow. The browser sends CGI values to the wrapper script; the wrapper script processes the values, updates stored values, and returns HTML to the browser. The process starts again when the user submits the form or follows a hyperlink that leads back into the system.
The example application in this chapter is a shopping-cart-style ordering system for Camels 'R Us, which sells three types of products: food, vacations, and accessories. You will develop an interface that allows authorized users to browse product menus; build up a list of purchases; review the order; and, finally, submit the order. At that point, your program simply writes the order details to a file. In real life, the order could be passed on to an ordering database.
Start creating the application by outlining the sequence of events that take place from the user's point of view. This outline is not the outline of the wrapper-script internals given in the walk-through section earlier in this chapter; it is a description of the functionality required of your application. The sequence of events from the user's point of view is as follows:
Figure 9.4 : The Vacations menu.
Figure 9.6 : The order-review screen.
The main menu acts as a sort of anchor for the application, offering three menu options and three submit buttons. A more hierarchical menu structure may be appropriate for a larger application, but this level of complexity is fine for a simple example such as this one.
Next, you need to consider how data is to be stored by the application, both internally (in Perl data structures) and externally (in session files and raw data files). There are two principal data elements. The first is the set of data representing the available products and prices; the second is the order list built up by the user in the course of a session.
Product Data
The price list consists of a list of product names and corresponding prices. You're writing this application in Perl, so the obvious candidate for storing this data is an associative array, with the product names being keys and the prices being values. An expression such as $Price{"Camelskin"} returns the unit price of the named product.
The data can be stored externally in several ways. On a UNIX system, a simple products-to-prices lookup list would be most efficiently stored in a DBM file, using a tied hash. You'll use this technique to store the session information. In the case of the product data, however, you will have two associative arrays indexed by product name: one of prices and one of product descriptions. The best approach for an application this simple is to store the data items in a flat text file, one product per line. More complex programs could interface with a relational database of product information, if necessary.
Each record contains three items of information about a single product: the product name, the product price, and a brief description. In this application, use the separator ::: between items so that Perl can easily split the lines as it reads them.
Listing 9.1 shows the sample data file, which is stored in the file products.dat on the CD-ROM that comes with this book.
Listing 9.1 The Contents of products.dat, the Data File
feed_desertfruit:::14.95:::Fruit of the Desert
feed_driedhusks:::9.95:::Sun Dried Husks
feed_dromspecial:::19.95:::Special Dromedary Supplement
travel_kalahari:::1250:::A 1 year round trip of the Kalahari with the Tuareg.
travel_dakota:::1990:::Discover the magic of the Occident with our 2 week whirlwind tour of Dakota.
travel_alranch:::250:::Break in gently with a weekend at your local branch of Al's Camel Ranch Inc.
extras_pancam:::69:::Hand-stitched leather panniers (camel)
extras_pandrom:::99:::Hand-stitched leather panniers (dromedary)
extras_covers:::150:::Genuine camel-skin hump covers
The first line in this file describes a product called feed_desertfruit, which is described as "Fruit of the Desert" and which has a unit price of $14.95. The product name is used as the key in the price and description associative arrays-%Price and %Desc, respectively. So $Price{'feed_desertfruit'} is "14.95", and $Desc{'feed_desertfruit'} is "Fruit of the Desert".
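Splitting a record of this form takes one call to split. The following is a hedged sketch of that parsing step, not the chapter's &ReadProductData: read_products is a hypothetical routine that reads from an already-open filehandle and fills two hashes passed by reference. The limit of 3 keeps the whole description intact even if it happened to contain ":::", and blank or malformed lines are skipped.

```perl
use strict;
use warnings;

# Read records of the form name:::price:::description from an open
# filehandle into the price and description hashes (passed by reference).
sub read_products {
    my ($fh, $price, $desc) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                  # skip blank lines
        my ($name, $cost, $text) = split /:::/, $line, 3;
        next unless defined $text;                 # skip malformed records
        $price->{$name} = $cost;
        $desc->{$name}  = $text;
    }
    return;
}

# Example: load the sample data file, if it is present.
my (%Price, %Desc);
if (open(my $fh, '<', 'products.dat')) {
    read_products($fh, \%Price, \%Desc);
    close $fh;
}
```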
The three product categories are denoted by the feed_, travel_, and extras_ prefixes. These prefixes make maintaining the product data file easier; they have no significance for the program.
Order Data
Orders are built up over the course of a session, with the partial order being saved in a session file until it is complete. You'll save the order as part of the session data, in the form of an associative array. The keys of order elements in the session data array are the product names, and the values are the numbers of the items ordered by the user.
Internally, the wrapper program stores all session values in an associative array called %state. If the user orders three extras_covers items, $state{'Order_extras_covers'} is set to 3. The Order_ prefix distinguishes ordered items from the other session values, so that you can pick them out when you scan the session data.
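The prefix makes picking out the order a simple pattern match over the keys of %state. The sketch below uses hypothetical routines (ordered_items and order_total) and illustrative data; it skips zero quantities and assumes that every ordered product also appears in %Price.

```perl
use strict;
use warnings;

# Collect ordered products from the session hash: keys of the form
# Order_<product> hold quantities. Zero or empty quantities are skipped.
sub ordered_items {
    my ($state) = @_;
    my %items;
    foreach my $key (keys %$state) {
        next unless $key =~ /^Order_(.+)$/;
        my $product = $1;
        my $qty     = $state->{$key};
        $items{$product} = $qty if $qty;
    }
    return %items;
}

# Add up quantity * unit price for every ordered product, assuming that
# every product also appears in the price hash.
sub order_total {
    my ($items, $price) = @_;
    my $total = 0;
    foreach my $product (keys %$items) {
        $total += $items->{$product} * $price->{$product};
    }
    return sprintf('%.2f', $total);
}

my %state = (sessionid => '99353', Order_extras_covers => 3,
             Order_feed_desertfruit => 2, Order_feed_driedhusks => 0);
my %Price = (extras_covers => 150, feed_desertfruit => 14.95, feed_driedhusks => 9.95);
my %items = ordered_items(\%state);
print "Total: ", order_total(\%items, \%Price), "\n";
```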
One advantage of storing this data internally in the form of an associative array is that it makes the job of storing the data externally very simple indeed. You'll use Perl's tied-hash functionality to create a link between the internal storage (associative array) and the external storage (DBM file), and leave it to Perl's innards to keep the two in sync. Session files are stored in the LOG subdirectory and called $Sessionid.DB. ($Sessionid is the user's session ID.)
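A minimal sketch of that tie follows, using the core SDBM_File module and the O_RDWR and O_CREAT constants exported by Fcntl. The routine name open_session, the 0640 file mode, and the rule that session IDs are alphanumeric are all assumptions. Note that SDBM_File actually creates two files, "$Sessionid.DB.pag" and "$Sessionid.DB.dir". Capturing the validated ID with a regular expression also untaints it when the script runs under -T.

```perl
use strict;
use warnings;
use Fcntl;          # exports O_RDWR, O_CREAT, and related constants
use SDBM_File;

# Tie the hash %$state to the session file for $sessid in directory $dir.
# Every store into the hash is written to disk by SDBM_File.
sub open_session {
    my ($dir, $sessid, $state) = @_;
    die "Bad session ID\n"
        unless defined $sessid && $sessid =~ /^([0-9A-Za-z]+)$/;
    my $id   = $1;                     # the captured copy is untainted under -T
    my $file = "$dir/$id.DB";
    tie(%$state, 'SDBM_File', $file, O_RDWR | O_CREAT, 0640)
        or die "Cannot open session file $file: $!\n";
    return $file;
}
```

After `open_session('LOG', $SessId, \%state)`, an assignment such as `$state{'Order_extras_covers'} = 3` is saved without any explicit write; call `untie %state` before the script exits.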
Now you're ready to implement your wrapper program. In this section, you develop a working wrapper system in Perl from scratch. This application is not highly sophisticated, but it is fully functional, and it is intended primarily to be an illustration of the techniques involved. The source code for the program is explained along the way; all source code for this basic wrapper program appears on the CD-ROM that comes with this book.
The main routine is the best place to start. This code dictates the overall program flow; it is also the only routine that is guaranteed to be executed every time. Listing 9.2 shows the code for the main routine.
Listing 9.2 The Wrapper Script's Main Routine
#!/usr/local/bin/perl -TI.
# Import methods for DBM files:
require SDBM_File;
require Fcntl;
# Global variables: %Price, %Desc, %cgivals
# Read any form values passed in (GetCGIVals fills the global %cgivals):
&GetCGIVals();
# Extract the settings which dictate program flow:
$SessId = $cgivals{'sessionid'};
$Loc    = $cgivals{'location'};
$Action = $cgivals{'action'};
# Read the product details:
&ReadProductData("products.dat");
# Now decide what to do. First check if session id supplied:
if ( $SessId ) {
    # Session Id supplied: perform action and/or show location.
    $Action && &DoAction($SessId, $Action);
    $Loc && &ShowLoc($SessId, $Loc);
} else {
    # No session id supplied: login is only valid action.
    # Error check: session id required if location or action requested.
    ( $Loc || $Action ) && &HTMLError("Location/action requested but no session ID provided.<br>",
        "Please <a href=\"./wrap.cgi\">log in</a> ",
        "and follow the instructions on screen.");
    # Default action: log in.
    &DoLogin;
}
The main routine performs these tasks:
The subroutines called by the main routine are described in the
following sections, in the sequence in which they appear.
Error Messages and HTML Headers
The &HTMLError function, which appears from time to time in the following code, is a utility function that displays an error message on the user's browser in HTML format. You could simply write error messages to STDOUT, knowing that the messages will get to the browser. If an error message is sent before the Content-type header line, however, the server reports an error to the browser (typically "500 Internal Server Error"), and the user does not get to see your error message. For this reason, send a Content-type: text/html line, followed by a blank line, first. Another problem arises if you send the HTML header twice: the user sees the second header line on-screen along with the error message. This result is a little untidy, so use a second utility function, &HTMLhead, to write the header for you, as follows:
# A utility routine to print the HTML header once only.
The variable $header_printed is 0, or false, initially. The first time that you call this function, the HTML header is printed, and $header_printed is set to true; thereafter, the if condition is false and the header is not printed.
Following is the &HTMLError function, which uses &HTMLhead:
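A minimal sketch of routines that behave as described above follows, assuming a file-scoped flag named $header_printed. The HTML that HTMLError prints is an illustrative choice, not necessarily the book's, and the routine ends the script with exit once the message has been sent.

```perl
use strict;
use warnings;

my $header_printed = 0;    # false until the header has been sent

# Print the CGI header, and the blank line that ends it, at most once.
sub HTMLhead {
    if (!$header_printed) {
        print "Content-type: text/html\n\n";
        $header_printed = 1;
    }
}

# Report an error to the browser as HTML, then stop the script.
sub HTMLError {
    HTMLhead();
    print "<html><head><title>Error</title></head><body>\n";
    print "<h3>Error</h3>\n<p>", @_, "</p>\n";
    print "</body></html>\n";
    exit 0;
}
```

Because HTMLhead is safe to call any number of times, every routine that produces output can call it first without checking whether another routine already has.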
The CGI values returned by the browser represent the sum of the wrapper's knowledge about the user when the program starts. Other information about the user and his or her previous actions is contained in the user's session file, but that information cannot be accessed without the session ID, which is itself a CGI value. Your first priority, then, must be to interpret the CGI information and save the values of all CGI variables.
The &GetCGIVals subroutine queries the httpd's environment values and saves the CGI values in the global %cgivals associative array. These values can arrive in two forms, depending on whether the form data was transmitted by means of the GET or POST method:
The &GetCGIVals routine checks for both the GET and POST methods and saves the CGI information in either case. Listing 9.3 shows the source code for &GetCGIVals.
Listing 9.3 The GetCGIVals Subroutine
```perl
# Get the CGI Values
sub GetCGIVals {
    my (@settings, $set, $name, $value, $formvalues, $postlength);

    # First decide if GET or POST used:
    $postlength = $ENV{'CONTENT_LENGTH'};
    if ( $postlength ) {
        read (STDIN, $formvalues, $postlength);
    } else {
        $formvalues = $ENV{'QUERY_STRING'};
    }

    # Store settings in an associative array:
    # First split into "A=B" parts:
    @settings = split('&', $formvalues);

    # Now store each name and value in the associative array:
    foreach $set ( @settings ) {
        ($name, $value) = split('=', $set);
        $cgivals{$name} = $value;
    }
}
```
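The code shown in Listing 9.3 carries out the following steps:

1. It checks the CONTENT_LENGTH environment variable. If it is set, the form was posted, and that many bytes are read from STDIN into $formvalues; otherwise, $formvalues is taken from QUERY_STRING.
2. It splits $formvalues on the & character into a list of name=value strings.
3. It splits each name=value string on the = character and stores the value in %cgivals, using the name as the key.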
This last step may seem unnecessarily complicated. Why not split $formvalues into %cgivals in one step, replacing the entire foreach loop in &GetCGIVals with a statement like the following?
%cgivals = map( split('='), split('&', $formvalues) );
The problem is that there may be "empty" CGI values, which would disrupt the mapping shown in the single statement. Suppose, for example, that the user ID was left blank. The $formvalues string might look like this:
sessionid=99353&action=Validate&userid=&pass=www
In this case, the first split function would break $formvalues into these substrings:
sessionid=99353 action=Validate userid= pass=www
The second split operation carried out within the map operation would break these substrings into the following list of substrings: sessionid, 99353, action, Validate, userid, pass, and www.
Finally, the assignment to %cgivals would pair the list elements off in order, stuffing the following key/value pairs into %cgivals (with www left over as a key whose value is undefined):
sessionid=99353 action=Validate userid=pass
Breaking the operation into two steps is marginally more complicated, but much safer.
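The following sketch makes the difference concrete. parse_loop mirrors the loop in &GetCGIVals, and parse_map is the one-step version; both names are hypothetical. Like the listing, neither decodes + or %XX escapes.

```perl
use strict;
use warnings;

# Two-step parse, as in &GetCGIVals: split on "&", then split each
# "name=value" pair on its own, so an empty value stays with its name.
sub parse_loop {
    my ($formvalues) = @_;
    my %vals;
    foreach my $set ( split('&', $formvalues) ) {
        my ($name, $value) = split('=', $set);
        $vals{$name} = $value;
    }
    return %vals;
}

# One-step parse: every name and value is flattened into one list, so a
# missing value shifts all later pairs by one position.
sub parse_map {
    my ($formvalues) = @_;
    no warnings 'misc';    # silence "Odd number of elements" for the demo
    my %vals = map( split('='), split('&', $formvalues) );
    return %vals;
}

# With 'sessionid=99353&action=Validate&userid=&pass=www':
#   parse_loop gives userid => undef and pass => 'www'
#   parse_map  gives userid => 'pass' and www => undef; the password is lost
```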
Having read the CGI values, you next read in the product data. This data is stored in the %Price and %Desc associative arrays by the &ReadProductData subroutine, which takes a single argument: the name of the product data file. Listing 9.4 shows the code for &ReadProductData.
Listing 9.4 The ReadProductData Routine
```perl
# Read in the product data:
sub ReadProductData {
    my ($infile) = @_;
    my ($product, $price, $desc, $line);

    # Check parameters:
    $infile || HTMLError("ReadProductData requires data file name.");

    # Open the data file:
    open (PRODUCTS, $infile)
        || HTMLError("Unable to open product data file $infile ($!).");

    # Read each line:
    while (<PRODUCTS>) {
        $line = $_;
        # drop the trailing newline:
        chomp($line);
        if ( $line =~ /:::/ ) {    # Ignore lines without separator
            # Split on ":::" separators:
            ($product, $price, $desc) = split(':::', $line);
            # Store price and description using product name as key:
            $Price{$product} = $price;
            $Desc{$product}  = $desc;
        }
    }

    # tidy up:
    close PRODUCTS;
}
```
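If the named product file exists and is opened successfully, it is read in one line at a time, and the following processing occurs for each line:

1. The trailing newline is removed.
2. Lines that do not contain the ::: separator are ignored.
3. The line is split on ::: into a product name, a price, and a description.
4. The price and description are stored in %Price and %Desc, using the product name as the key.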
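The format that &ReadProductData expects is therefore one product per line, with the name, price, and description separated by :::. The sketch below, a hypothetical parse_products helper, applies the same steps to any open filehandle so that the parsing can be tried on an in-memory string; the sample line in the comment is made up.

```perl
use strict;
use warnings;

our (%Price, %Desc);

# Parse lines of the form "name:::price:::description", for example
#   feed_driedhusks:::4.50:::Dried husks (10kg sack)
# from an open filehandle into %Price and %Desc.
sub parse_products {
    my ($fh) = @_;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /:::/;    # skip lines without a separator
        my ($product, $price, $desc) = split(/:::/, $line);
        $Price{$product} = $price;
        $Desc{$product}  = $desc;
    }
}
```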
The next subroutine that the main routine may call is &DoAction, a function that encapsulates all specific processing functions other than parsing and displaying an HTML file. &DoAction consists primarily of a list of if clauses, as you can see from the source code in Listing 9.5.
Listing 9.5 The DoAction Subroutine
```perl
# Subroutine to perform a named action for a given session Id.
# Branches to required subroutine.
sub DoAction {
    my ($SessId, $Action ) = @_;

    # Argument check:
    $Action || &HTMLError("DoAction called but no Action specified!");

    # Now a branch for each possible action -
    ( $Action eq "Validate" )            && &Validate($cgivals{'userid'}, $cgivals{'pass'});
    ( $Action eq "Add+to+Order" )        && &AddToOrder($SessId, %cgivals);
    ( $Action eq "Cancel+Order" )        && &ShowLoc($SessId, "mainmenu.htmw");
    ( $Action eq "Return+to+Main+Menu" ) && &ShowLoc($SessId, "mainmenu.htmw");
    ( $Action eq "Review+Order" )        && &ReviewOrder($SessId);
    ( $Action eq "Confirm+Order" )       && &ConfirmOrder($SessId);
    ( $Action eq "Log+Out" )             && &DoLogout($SessId);
}
```
The &DoAction subroutine takes two arguments: the user's session ID and the name of the action to be taken. After a quick check for valid arguments, the subroutine checks the action name against a list of possible actions and, if it finds a match, calls the appropriate subroutine. The available subroutines are described in their own context later in this chapter.
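The chain of if clauses in &DoAction tests every action name on every call. A common Perl alternative, shown here as a sketch rather than the chapter's code, is a dispatch table: a hash that maps each action name to a code reference. The handlers below are hypothetical stand-ins that only record their calls. Note one difference in behavior: this version rejects an unknown action, whereas the listing silently ignores it.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real handlers; each records its call.
my @calls;
my %dispatch = (
    'Validate'     => sub { push @calls, "Validate" },
    'Review+Order' => sub { push @calls, "ReviewOrder @_" },
    'Log+Out'      => sub { push @calls, "DoLogout @_" },
);

# Look the action up once instead of comparing it with every name.
sub DoAction {
    my ($SessId, $Action) = @_;
    my $handler = $dispatch{$Action}
        or die "Unknown action \"$Action\"\n";
    $handler->($SessId);
}
```

Adding an action then means adding one hash entry rather than another if clause.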
If the main routine finds that a location was specified (the location CGI value, saved in $Loc), it invokes the &ShowLoc subroutine to show the contents of that file on the browser. Any tokens found in the file (denoted by means of the syntax described in "Generic Substitutions" earlier in this chapter) are filled in from the session data and from the %Price and %Desc arrays.
This function is, in many ways, the core of the wrapper script. Listing 9.6 shows the code.
Listing 9.6 The ShowLoc Subroutine
```perl
# Show a HTML file, filling in values using the supplied session ID
sub ShowLoc {
    my ($ID, $URL) = @_;
    my (%SessionValues, @matches);

    # Accept only a plain file name such as "mainmenu.htmw" (no slashes,
    # so no "../"), so a hacked location value cannot open other files:
    $URL =~ /^(\w+\.\w+)\z/
        || &HTMLError("Invalid location requested.");
    $URL = $1;

    # Open the requested file for reading:
    open(RETURNFILE, "<", $URL)
        || &HTMLError("Unable to open file \"", $URL, "\" for reading.");

    # Send HTML header:
    &HTMLhead;

    # Load all session values for this ID:
    %SessionValues = &GetSessValues($ID);

    # Process each line of requested file:
    while (<RETURNFILE>) {
        # Store this line ($_ will be overwritten):
        $currentline = $_;

        # Check for prices, e.g. "\\Price\itemname\":
        if ( @matches = /\\\\Price\\(\w+)\\/g ) {
            # Interpolate each match on this line:
            foreach $match ( @matches ) {
                $currentline =~ s/\\\\Price\\$match\\/$Price{$match}/;
            }
        }

        # Check for descriptions, e.g. "\\Desc\itemname\":
        if ( @matches = /\\\\Desc\\(\w+)\\/g ) {
            # Interpolate each match on this line:
            foreach $match ( @matches ) {
                $currentline =~ s/\\\\Desc\\$match\\/$Desc{$match}/;
            }
        }

        # Check for tokens, e.g. "\tokenname\" => tokenvalue:
        if ( @matches = /\\(\w+)\\/g ) {
            # Interpolate each match on this line:
            foreach $match ( @matches ) {
                $currentline =~ s/\\$match\\/$SessionValues{$match}/;
            }
        }

        # Now print the line, including any substitutions:
        print $currentline;
    }

    # Tidy up:
    close RETURNFILE;
}
```
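The code is simpler than it looks. Step through the code to see how it works:

1. The requested file is opened for reading; if it cannot be opened, an error is reported.
2. The HTML header is sent by means of &HTMLhead.
3. All the session values for this session ID are loaded into %SessionValues by means of &GetSessValues.
4. Each line of the file is read and printed to standard output.
5. The file is closed.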
The fourth step is actually slightly more complicated. Each line is checked for substitution tokens before being printed to standard output. If any tokens are found, they are replaced by the appropriate session-specific values.
Each line is checked for description, price, and other tokens. The mechanism is very similar in each case. Start by looking at the substitution of simple tokens, which are denoted by a token name surrounded by single backslashes (\sessionid\, for example).
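The following steps are involved in replacing this value with the actual session ID value:

1. The pattern match /\\(\w+)\\/g is applied to the line. Because of the /g modifier, it returns a list of every token name found on the line, and this list is saved in @matches.
2. For each name in @matches, the substitution s/\\$match\\/$SessionValues{$match}/ replaces the first remaining occurrence of that token in $currentline with the corresponding session value, so \sessionid\ becomes the session ID itself.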
The steps for replacing price and description tokens are quite similar. In the case of price tokens, the pattern match is /\\\\Price\\(\w+)\\/g, which looks for an additional \\Price\ before the token name. The replacement operation is similar, too, but the %Price array is used instead of the %SessionValues array. The procedure for descriptions is identical, except that the pattern looks for \\Desc\ and the %Desc associative array is used.
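The same three substitutions can also be written more compactly with the /e modifier, which evaluates the replacement as Perl code. The sketch below uses that alternative technique, not the listing's collect-then-substitute loop; fill_tokens is a hypothetical name, and unknown tokens become empty strings, as they effectively do in &ShowLoc.

```perl
use strict;
use warnings;

# Fill in the three token types used by &ShowLoc in one line of text.
# The hash references stand in for %Price, %Desc and %SessionValues.
sub fill_tokens {
    my ($line, $price, $desc, $session) = @_;
    # Look a key up, treating a missing value as an empty string.
    my $lookup = sub { my ($h, $k) = @_; defined $h->{$k} ? $h->{$k} : '' };
    # Prices and descriptions first, so that their \\Price\ and \\Desc\
    # prefixes are gone before the generic \token\ pass runs.
    $line =~ s/\\\\Price\\(\w+)\\/$lookup->($price, $1)/ge;
    $line =~ s/\\\\Desc\\(\w+)\\/$lookup->($desc, $1)/ge;
    $line =~ s/\\(\w+)\\/$lookup->($session, $1)/ge;
    return $line;
}
```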
The final subroutine that may be invoked from the main routine is &DoLogin. This subroutine assigns a session ID and displays the login screen, which challenges the user to enter a valid user ID and password. Listing 9.7 shows the source code for &DoLogin.
Listing 9.7 The DoLogin Subroutine
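```perl
# Subroutine to perform login.
sub DoLogin {
    # Generate a session id from the time and the process ID.
    # ("time || $$" would always give just the time, since time is
    # never false.) The result is unique per process but predictable.
    $SessId = time . $$;

    # Store this id in its own session file:
    $sessvals{'sessionid'} = $SessId;
    &SetSessValues($SessId, %sessvals);

    # Show the login page
    &ShowLoc( $SessId, "login.htmw" );
}
```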
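The code carries out the following three simple steps:

1. It generates a pseudo-random session ID.
2. It stores the ID in a new session file by calling &SetSessValues.
3. It displays the login page, login.htmw, by calling &ShowLoc.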
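A session ID built from the time and the process ID is unique but easy to guess, and in this design anyone who knows a session ID can act as that session's user. The sketch below, assuming a Unix-like system that provides /dev/urandom, draws the ID from random bytes instead; new_session_id is a hypothetical name. The hex encoding keeps the result within \w, so the /(\w+)/ pattern used by &DoLogout still accepts it.

```perl
use strict;
use warnings;

# Build a session ID from 16 random bytes (assumes /dev/urandom exists),
# hex-encoded so that it contains only the characters 0-9 and a-f.
sub new_session_id {
    open(my $rand, '<', '/dev/urandom')
        or die "Cannot open /dev/urandom: $!\n";
    binmode $rand;
    my $got = read($rand, my $bytes, 16);
    close $rand;
    die "Short read from /dev/urandom\n" unless defined $got && $got == 16;
    return unpack('H*', $bytes);
}
```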
The mechanics of initiating and manually terminating a session are explained in the following section.
The first time that the user runs wrap.cgi, &DoLogin is invoked and displays the login screen on the user's browser. The user enters a user ID and password and then sends them to the server by submitting the form. Then the wrapper program calls &Validate to authenticate the details provided by the user.
Notice that &DoLogin does no more than initiate the login. After the user fills in the user ID and password and submits the form, the wrapper program is invoked again. At that point, the &Validate function is called to perform the actual authentication of the user.
Listing 9.8 shows the HTML file login.html.
Listing 9.8 The login.html File
```html
<html>
<head>
<title>Camels 'R Us Log in</title>
</head>
<body>
<h1>Camels 'R Us Log in</h1>
You must log in as a registered user before you can use the system.
<p>
<ul>
<li> Click <a href="http://www.camelsrus.com/register.html">here</a> to register as an on-line customer with Camels 'R Us.
<p>
<li> If you have already registered, enter your userid and password and click "Log on":
</ul>
<form method="post" action="wrap.cgi">
<input name="sessionid" type="hidden" value="\sessionid\">
<input name="action" type="hidden" value="Validate">
<table>
<tr> <td>User ID:</td> <td><input name="userid" type="text" size=20></td> </tr>
<tr> <td>Password:</td> <td><input name="pass" type="password" size=20></td> </tr>
<tr> <td></td> <td><input name="logon" value="Log on" type="submit"></td> </tr>
</table>
</form>
</body>
</html>
```
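Following are the critical lines of this file:

- The form's action attribute sends the submitted data back to wrap.cgi.
- A hidden field named sessionid carries the session ID. Its value, \sessionid\, is a token that &ShowLoc replaces with the real session ID when it displays the page.
- A hidden field named action has the value Validate, which tells the wrapper what to do with the submitted data.
- The userid and pass fields collect the user ID and password.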
When the user submits the login form, the resulting CGI data contains two items that are of interest to the wrapper program: the user's session ID and a CGI value called action, which has the value Validate. The &DoAction function sees this value and invokes the &Validate function, which is shown in Listing 9.9.
Listing 9.9 The Validate Subroutine
```perl
# Validate: Given a userid and password, check against
# a user database and if valid, show main menu.
sub Validate {
    my ($uid, $pwd) = @_;
    my %userdb;

    # Argument check: both userid and password are required.
    ( $uid && $pwd ) || return 0;

    # userid/password pairs are stored in the user db file:
    tie(%userdb, 'SDBM_File', ".userdb", Fcntl::O_RDONLY(), 0664)
        || HTMLError("Unable to open user database ($!).");

    # Success if password given matches password in file.
    # Note check that a password was actually given...
    if ( $pwd ne "" && $userdb{$uid} eq $pwd ) {
        # Add customer name to session data:
        %sessvals = &GetSessValues($SessId);
        $sessvals{'customerid'} = $uid;
        &SetSessValues($SessId, %sessvals);
        # Show the main menu:
        &ShowLoc($SessId, "mainmenu.htmw");
    } else {
        &ShowLoc($SessId, "failedlogin.htmw");
    }

    # tidy up:
    untie(%userdb);
}
```
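&Validate takes two arguments, the user ID and the password, and attempts to match them with the contents of a DBM file that contains user ID-password pairs by following these steps:

1. It returns immediately if the user ID or password is missing.
2. It ties the %userdb associative array to the DBM file .userdb, opened read-only.
3. If the password supplied matches the one stored for that user ID, it adds the user ID to the session data as customerid and shows the main menu; otherwise, it shows the failed-login page.
4. It unties %userdb.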
If this call to tie is successful, the %userdb array serves as an interface to the contents of the DBM file.
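&Validate only reads .userdb; how the file is populated is not shown here. As a sketch to fill that gap, the hypothetical add_user and check_user routines below write and check userid/password pairs with the same kind of SDBM_File tie call. Like the chapter's example, they store passwords in plain text, which is acceptable only for a demonstration.

```perl
use strict;
use warnings;
use Fcntl;          # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;

# Add or update one userid/password pair in an SDBM user database.
sub add_user {
    my ($dbfile, $uid, $pwd) = @_;
    my %userdb;
    tie(%userdb, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0640)
        or die "Cannot open $dbfile: $!\n";
    $userdb{$uid} = $pwd;
    untie %userdb;
}

# Return 1 if $pwd is non-empty and matches the stored password, else 0.
sub check_user {
    my ($dbfile, $uid, $pwd) = @_;
    return 0 unless defined $pwd && $pwd ne '';
    my %userdb;
    tie(%userdb, 'SDBM_File', $dbfile, O_RDONLY, 0640)
        or die "Cannot open $dbfile: $!\n";
    my $ok = defined $userdb{$uid} && $userdb{$uid} eq $pwd;
    untie %userdb;
    return $ok ? 1 : 0;
}
```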
Logging out is much simpler than logging in. If the user clicks a submit button called action, with a value of Log Out, the wrapper script's &DoLogout function is called by &DoAction. Listing 9.10 shows the code for &DoLogout.
Listing 9.10 The DoLogout Subroutine
```perl
# Perform a logout. Deletes session file and shows log off screen.
sub DoLogout {
    my ($sessionid) = @_;

    # zap the session file: two parts, *.pag and *.dir
    # taint checking => need to save file name via a pattern match.
    # Stop if the match fails; otherwise $1 could hold a stale value:
    $sessionid =~ /(\w+)/
        || &HTMLError("Invalid session ID.");
    unlink("./log/$1.DB.pag", "./log/$1.DB.dir");

    # show the farewell screen:
    print "Content-type: text/html\n\n",
          "<html><head>",
          "<title>End of session</title>",
          "</head>",
          "<body>",
          "<h1>Session Terminated</h1>",
          "You have logged out from the Camels 'R Us Web ordering system.<p>",
          "<a href=\"wrap.cgi\">Call again</a> soon!<p>",
          "</body></html>";
}
```
This subroutine performs two simple steps: it deletes the DBM files associated with the session and displays a farewell message. The latter task is simple, but the former is complicated somewhat by the fact that you have turned on Perl's taint checking by using the -T option on the #! line.
There are, in fact, two DBM files for each session: one with a .pag extension and one with a .dir extension. Given a session ID stored in the Perl $sessionid variable, the most direct way to delete these two files is to interpolate it into the two file names passed to the unlink function, as follows:
unlink("./log/$sessionid.DB.pag", "./log/$sessionid.DB.dir");
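This statement fails, however. With taint checking on, Perl knows that $sessionid came from outside the program (in this case, from the CGI data) and is therefore not to be trusted, so it stops with an "Insecure dependency in unlink" error instead of deleting the files. The caution is justified: a hacked session ID value containing ../ might otherwise result in the deletion of files outside the log directory.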
You need to extract the value contained in $sessionid to another variable that Perl does not regard as being tainted. Simply assigning $sessionid to a new variable does not work; Perl sees that the new variable is tainted by such close association with the old one.
Instead, perform a pattern match on $sessionid, looking for a run of alphanumeric (word) characters and saving the result, as follows:
$sessionid =~ /(\w+)/;
The expression /(\w+)/ tells Perl to match the first run of word characters (letters, digits, and underscores) in $sessionid and to store it. This stored value, $1, is then used in the arguments to the unlink command.
This method works because Perl treats the text captured by a pattern match as untainted: it assumes that you know what you are doing when you extract something from a tainted variable in such a specific way. It would be quite difficult for a suspect value to survive a pattern match of this sort.
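One caveat: the unanchored pattern /(\w+)/ untaints whatever run of word characters it finds first, so a value such as ../../etc/passwd yields etc rather than being rejected. A stricter variant, sketched below with a hypothetical untaint_session_id routine, anchors the pattern so that the whole value must consist of word characters; \z is used instead of $ because $ also matches before a trailing newline.

```perl
use strict;
use warnings;

# Return an untainted copy of a session ID, or undef if the value is
# anything other than one or more word characters from start to end.
sub untaint_session_id {
    my ($raw) = @_;
    return undef unless defined $raw;
    return $raw =~ /^(\w+)\z/ ? $1 : undef;
}
```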
After you come this far, the management of session data becomes relatively simple. You use associative arrays to store the session data internally, and you use tied hashes to associate these arrays with DBM files for external storage. You've already seen how to use DBM files for user ID-password pairs; the principle is identical for session data.
The current session data is stored by calling the &SetSessValues subroutine. Listing 9.11 shows the code for &SetSessValues.
Listing 9.11 The SetSessValues Subroutine
```perl
# Store values for a given session id
# Takes an associative array as argument, saves to session file
sub SetSessValues {
    my ($Sessionid, %DBMdb) = @_;
    my %tiedDB;

    # Open the session file and set values:
    tie(%tiedDB, 'SDBM_File', "./log/$Sessionid.DB",
        Fcntl::O_RDWR()|Fcntl::O_CREAT(), 0644)
        || HTMLError("Unable to open session file for sessionid ",
                     $Sessionid, " for writing ($!).");

    # Set the values in the DB to values passed as argument:
    %tiedDB = %DBMdb;

    # Store the new values:
    untie(%tiedDB);
}
```
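The code does the following things:

1. It ties %tiedDB to the session file for this session ID in the ./log directory, creating the file if it does not already exist.
2. It assigns the associative array passed as an argument to %tiedDB, replacing the previous contents of the session file.
3. It unties %tiedDB, which writes the new values to disk.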
That's the beauty of using tied hash arrays; they look after all the storage implementation details for you. Simply assign a normal associative array to a tied hash array, and you've stored the contents of the normal array.
The principle for retrieving session data that has already been stored to a DBM file is analogous. You can retrieve the session state for a given session ID from DBM storage by using the &GetSessValues function, the code for which appears in Listing 9.12.
Listing 9.12 The GetSessValues Subroutine
```perl
# Retrieve session values for a given session ID
# Return them as an associative array
sub GetSessValues {
    my ($Sessionid) = @_;
    my (%DBMdb, %returnvalue);

    # No session file, no values so just return.
    return unless -e "./log/$Sessionid.DB.pag";

    # Open the session file and get values:
    tie(%DBMdb, 'SDBM_File', "./log/$Sessionid.DB", Fcntl::O_RDONLY(), 0664)
        || HTMLError("Unable to open session file for sessionid ",
                     $Sessionid, " for reading ($!).");

    # Save the array before closing the file:
    %returnvalue = %DBMdb;
    untie %DBMdb;

    # Pass the associative array back to the calling routine:
    return %returnvalue;
}
```
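Almost all the work in this code is done by the tie statement and the copy of the tied hash; the rest is error checking. The following steps show how &GetSessValues works:

1. If no session file exists for the session ID, it returns an empty list.
2. It ties %DBMdb to the session file, opened read-only.
3. It copies the tied hash into %returnvalue and unties %DBMdb.
4. It returns %returnvalue to the caller.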
Again, the tied hash looks after the storage implementation details for you. These two functions allow you to store and retrieve an entire set of session data quite easily.
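The two routines can be tried outside the CGI environment with a version that takes the directory as a parameter instead of hard-coding ./log; set_sess_values and get_sess_values are hypothetical names. The sketch also illustrates a consequence of assigning to a tied hash: the assignment replaces the whole session file, so keys that are not in the new hash are removed.

```perl
use strict;
use warnings;
use Fcntl;          # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;

# Save a whole session hash. Assigning to the tied hash first clears
# the session file, then stores every key/value pair given.
sub set_sess_values {
    my ($dir, $id, %values) = @_;
    my %tied;
    tie(%tied, 'SDBM_File', "$dir/$id.DB", O_RDWR | O_CREAT, 0640)
        or die "Cannot open session $id for writing: $!\n";
    %tied = %values;
    untie %tied;
}

# Load a session hash; an unknown session yields an empty list.
sub get_sess_values {
    my ($dir, $id) = @_;
    return () unless -e "$dir/$id.DB.pag";
    my %tied;
    tie(%tied, 'SDBM_File', "$dir/$id.DB", O_RDONLY, 0640)
        or die "Cannot open session $id for reading: $!\n";
    my %copy = %tied;
    untie %tied;
    return %copy;
}
```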
You now have the necessary infrastructure to carry out the core business of this application, which is to give the user an interface to an ordering system. You need to allow the user to build an order in stages during the course of a session; review that order at any stage; cancel the entire order, if desired; and confirm the order, at which point the order will be written to permanent storage.
A user builds an order by using the three order forms shown in Figures 9.3, 9.4, and 9.5 (refer to "Program Flow" earlier in this chapter). These forms work in the same way, so this section focuses on only one: the Feeds form. The source for the form is stored in feeds.htm. The relevant lines for the first product are as follows, with the other products being set up in an identical fashion:
After filling in the desired quantity of each product, the user clicks the Add to Order submit button. A set of CGI data goes back to wrap.cgi, containing an action value that is caught by &DoAction and that in turn invokes the &AddToOrder function.
&AddToOrder takes two parameters: the user's session ID and the associative array of CGI values. Notice that these values are the CGI values, not the session values. You want to extract some of the CGI information and discard the rest; the data that you extract will be saved with the session data for later use.
Listing 9.13 shows the code for the &AddToOrder function.
Listing 9.13 The AddToOrder Subroutine
```perl
# Given the cgi values from a form, add fields starting
# with "Order_" to the order for the current session.
sub AddToOrder {
    my ($SessId, %cgivals) = @_;
    my %state;

    # Get current session state first:
    %state = &GetSessValues($SessId);

    # Add order items and quantities to state:
    foreach $item (keys %cgivals) {
        if ( $item =~ /^Order_/ && $cgivals{$item} ) {
            $state{$item} = $cgivals{$item};
        }
    }

    # Save state after adding order:
    &SetSessValues($SessId, %state);

    # Now drop back to main menu:
    &ShowLoc($SessId, "mainmenu.htmw");
}
```
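This code takes the following actions:

1. It retrieves the current session state with &GetSessValues.
2. It copies into the state every CGI value whose name begins with Order_ and whose value is neither empty nor zero.
3. It saves the updated state with &SetSessValues.
4. It displays the main menu.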
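The merging step can be written as a small function that is easy to test on its own. The hypothetical merge_order below follows &AddToOrder in ignoring empty or zero quantities and, as an assumption beyond the listing, also ignores any quantity that is not a whole number, since these values end up in price calculations and in the orders file.

```perl
use strict;
use warnings;

# Merge "Order_" fields from a form into a session-state hash.
# Empty, zero, and non-numeric quantities are skipped.
sub merge_order {
    my ($state, $cgivals) = @_;
    foreach my $item (keys %$cgivals) {
        next unless $item =~ /^Order_\w+\z/;
        my $qty = $cgivals->{$item};
        next unless defined $qty && $qty =~ /^\d+\z/ && $qty > 0;
        $state->{$item} = $qty;
    }
    return $state;
}
```

One consequence, shared with the listing, is that submitting a quantity of 0 cannot remove an item that is already in the order.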
It is reasonable to expect that the user may want to review the order before confirming it. She can do so by selecting Review Order from any of the menus. This option passes a CGI value of action=Review+Order to the wrapper script. This value is trapped by &DoAction, causing &ReviewOrder to be invoked.
Listing 9.14 shows the code for &ReviewOrder.
Listing 9.14 The ReviewOrder Subroutine
```perl
# Review the order for the current session
sub ReviewOrder {
    my ($sessionid) = @_;
    my %state = GetSessValues($sessionid);

    # Use &ShowLoc to display start and end parts of form:
    # We'll build the list manually in this subroutine.
    # Print the form up to the start of the list:
    &ShowLoc($sessionid, "review_head.htmw");

    # Show the current order in a table:
    print "<table border=2>",
          "<tr>",
          "<th>Item</th>",
          "<th>Unit price</th>",
          "<th>Number Ordered</th>",
          "<th>Total Price</th>",
          "</tr>";

    # Keep a running total of price as we go
    $grand_total = 0;
    foreach $item ( keys %state ) {
        # If it starts with "Order_", it's an order.
        if ( $item =~ /^Order_(\w+)/ ) {
            $thisprice = $state{$item} * $Price{$1};
            print "<tr>",
                  "<td align=left>$Desc{$1}</td>",
                  "<td align=right>\$$Price{$1}</td>",
                  "<td align=right>$state{$item}</td>",
                  "<td align=right>\$$thisprice</td>",
                  "</tr>\n";
            $grand_total += $thisprice;
        }
    }
    print "</table><p>";
    print "Total cost this order: \$$grand_total. ",
          "Residents of Ireland please add 21\% sales tax.";

    # Now show the rest of the form:
    &ShowLoc($sessionid, "review_tail.htmw");
}
```
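This code builds an HTML table that shows the current order details, one item at a time. To create this table, the code follows these steps:

1. It calls &ShowLoc to display review_head.htmw, the part of the page before the table.
2. It prints the table's opening tag and header row.
3. It sets $grand_total to 0 and loops over every key in the session state, printing a row for each key that begins with Order_.
4. It closes the table and prints the grand total.
5. It calls &ShowLoc again to display review_tail.htmw, the rest of the page.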
You need to look closely at the code that displays the order information for a given item. Notice first that the regular-expression match that determines whether the item is an order item captures the text after Order_. This backreference is available as $1 after the match takes place. If the item's key is Order_feed_driedhusks, for example, $1 will be feed_driedhusks. You use this backreference as the key for looking up the item's values in the %Price and %Desc arrays; it remains valid until the next successful pattern match.
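For each item, &ReviewOrder does the following:

1. It multiplies the quantity ordered, $state{$item}, by the unit price, $Price{$1}, to get $thisprice.
2. It prints a table row containing the description, unit price, quantity, and total price for the item.
3. It adds $thisprice to $grand_total.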
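The per-item arithmetic can also be separated from the HTML. In the hypothetical order_summary below, items are visited in sorted order rather than in hash order, and money is formatted with sprintf, so that, for example, 3 * 4.50 is shown as 13.50 instead of Perl's default 13.5. Both choices are changes from the listing, not part of it.

```perl
use strict;
use warnings;

# Compute line totals and a grand total for the "Order_" entries in a
# session hash, formatting money with two decimal places.
# Returns a reference to [product, quantity, line total] rows and the total.
sub order_summary {
    my ($state, $price) = @_;
    my @lines;
    my $grand_total = 0;
    foreach my $item (sort keys %$state) {
        next unless $item =~ /^Order_(\w+)/;
        my $product   = $1;
        my $thisprice = $state->{$item} * $price->{$product};
        push @lines, [ $product, $state->{$item}, sprintf('%.2f', $thisprice) ];
        $grand_total += $thisprice;
    }
    return (\@lines, sprintf('%.2f', $grand_total));
}
```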
Figure 9.6, earlier in this chapter, shows an example of the resulting table.
Finally, the order that you have so carefully built must be confirmed by the user and written to a file. Order confirmation is triggered when the user clicks one of the many Confirm Order buttons that you have helpfully scattered around the various forms. The CGI data that arrives back at wrap.cgi then contains the setting action=Confirm+Order, which is caught by &DoAction; then &ConfirmOrder is invoked.
Listing 9.15 shows the source code for &ConfirmOrder.
Listing 9.15 The ConfirmOrder Subroutine
```perl
# Confirm the order and write it to file.
sub ConfirmOrder {
    my( $sessionid ) = @_;
    my %state = GetSessValues($sessionid);

    # Write a record to the orders file:
    open(ORDFILE, ">>./orders.dat")
        || &HTMLError("Unable to open orders file for appending.");

    # Print a header line for this order:
    print ORDFILE "Order for customer $state{'customerid'} at ",
          scalar(localtime(time)), ":\n";

    # Each order item (loop over the keys only; "( %state )" would
    # also loop over the values):
    foreach $item ( keys %state ) {
        $item =~ /^Order_(\w+)/ && print ORDFILE "$1 ($state{$item});\n";
    }

    # Finish:
    print ORDFILE "End of order for customer $state{'customerid'}.\n";
    close ORDFILE;

    # Inform the user:
    &ShowLoc($sessionid, "confirm.htmw");
}
```
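&ConfirmOrder does the following things:

1. It retrieves the session state with GetSessValues.
2. It opens orders.dat for appending.
3. It writes a header line giving the customer ID and the current date and time.
4. It writes one line for each order item, giving the product name and the quantity.
5. It writes a closing line, closes the file, and displays confirm.htmw to the user.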
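Because several copies of wrap.cgi can run at once, two confirmations could write to orders.dat at the same moment and interleave their lines. The sketch below, using a hypothetical append_order routine, takes an exclusive flock on the file while it writes the record. Locking of this kind is advisory, so it helps only if every program that writes orders.dat also uses flock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);    # LOCK_EX

# Append one order record to $file while holding an exclusive lock,
# so that concurrent writers cannot interleave their lines.
sub append_order {
    my ($file, $state) = @_;
    open(my $fh, '>>', $file) or die "Cannot open $file: $!\n";
    flock($fh, LOCK_EX) or die "Cannot lock $file: $!\n";
    my $customer = defined $state->{customerid} ? $state->{customerid} : 'unknown';
    print $fh "Order for customer $customer at ", scalar(localtime), ":\n";
    foreach my $item (sort keys %$state) {    # keys only, not values
        print $fh "$1 ($state->{$item});\n" if $item =~ /^Order_(\w+)/;
    }
    print $fh "End of order for customer $customer.\n";
    close($fh) or die "Cannot close $file: $!\n";    # also releases the lock
}
```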
The example wrapper application shown in this chapter, while primitive, is functional. You could easily develop this application into a practical package. Among the issues that need to be addressed to make this application production-ready are:
You can learn more about the issues raised in this chapter by reading the following chapters: