Chapter 9 Understanding CGI Security

by Paul Doyle

CONTENTS

Understanding the Security Issues
Managing Sessions
Designing the Sample Application
- Program Flow
- Data Issues
Implementing the Sample Application
Logging In and Out
Managing Session Data
- Storing Session Data
- Retrieving Session Data
Managing the Orders
Wrapping Up
From Here

Providing a secure but open Web service is a balancing act. You want to make your site as easily accessible as possible so that the maximum number of people can use it, but you also want to make sure that access is not so open that your service can be harmed, accidentally or deliberately, due to a lack of security.

Achieving the correct balance between security and openness is easier if you simply eliminate all write access on the server. Often, however, the nature of the service that you are providing dictates that users have write access to files on the server.

This chapter deals with one approach to providing a secure Web service that permits writing of data on the server. The chapter examines the security issues in detail and shows how you can accept data from users and store it on the server while protecting the integrity of your service. To illustrate the techniques, a sample Web-based ordering system using shopping-cart logic is developed in the course of the chapter.

Understanding the Security Issues

Previous chapters (particularly Chapter 8, "Understanding Basic User Authentication") dealt with the general issues involved in verifying the identity of users who access your Web server and in restricting access to files on the server by using the HTTP daemon's configuration mechanism. Before you get started using CGI wrapper scripts, take a closer look at the specific security issues that arise when you decide to accept input from users.

Tracing the Chain of Command

The transparency of the Web's infrastructure is one thing that has contributed to the enormous growth in its popularity in recent years. The user clicks a button in his browser, and a server somewhere on the planet sends him whatever his little heart desires.

The Browser As Interface

Users feel as though they are interacting directly with the material that they read on the Internet, with only their browsers between them and the words or images that they see. A browser picks up files for a user and displays images. From time to time, the user may need to enter a user ID and a password to access a special service. To most users, their browsers appear to be logging them on to the server. If a user notices that a program on the server is being executed, the impression is that the browser is executing the program for the user.

Beneath the Surface

In fact, looking just below the surface of the action in a typical point-and-click operation, you can see that what actually happens is considerably more complex. The sequence of events can be summarized as follows:

The user selects a location on the Web, using a browser.
The browser locates the server that stores the requested item.
The browser sends a request to the server for the contents of the location.
The server examines the request and decides, based on its access restrictions, whether access to the location is allowed from the IP address at which the request originated.
If access is allowed from that IP address, the server decides whether its access restrictions mean that user authentication is required for the requested location.
If so, the server challenges the browser to provide a valid user ID and password for the authentication domain that contains the requested location.
If the user ID and password are verified, and if that user ID is entitled to access the authentication domain that contains the requested location, the server reads the contents of the location.
If the location is a CGI program, the server executes it and sends the results to the browser. If the location is not a CGI program, the contents of the location are sent as they are to the browser.
The browser displays the content sent to it by the server.

This schema could be broken down into much finer levels of detail, but for now, the points of interest are:

The server decides whether a user ID and password are required.
The server validates the user ID and password.
The server executes the CGI program.
The server transmits the contents of the requested location.

In short, all the action takes place on the server, and all of it is done by the Web server (httpd) process. The user sends a request to the server and receives the result from the server; everything that happens in between involves activity carried out by the server process on behalf of the user.

NOTE

Browser extensions such as Java and SafeTcl are exceptions to this rule. They are executed by the browser after the browser downloads script files from a server.

The Server As Interface

This execution by the server on behalf of the user presents a serious security issue. If the user were logging in to the server interactively and running a program in a shell that was governed by the user's own private account privileges, matters would be simpler. It would be relatively easy to ensure that the user did not have privileges that endangered the integrity of the rest of the system. Many system administrators would protest that it is far from easy to restrict privileges in this way on even a moderately large system. The difficulty, however, certainly pales in comparison with the effort involved in maintaining the integrity of data in a directory to which everyone on the Internet has some form of write access.

The fact is, users do not log in to the Web server. The Web server responds to requests from anyone on the Internet, reading files on the server on the user's behalf or even executing programs on the server on the user's behalf. The same httpd process generally executes programs for all users who access the server, so the approach that works on interactive systems (containing the activities of each user at the operating-system level, based on the user's own user ID) will not work on Web servers.

Instead, the httpd process uses its own verification mechanisms to ensure that users are who they say they are, and it interprets its access restriction rules to determine who can do what (and where). From the point of view of the operating system, everyone who accesses a service shares the same user ID and has the same privileges to all files. The following sections explain why sharing a single identity is a problem.
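You can see this shared identity for yourself with a tiny CGI program. The following is a minimal sketch, assuming a UNIX server; the subroutine name effective_user is invented for illustration. Whoever requests the script, it reports the same account: the one that the httpd process uses to run CGI programs.

```perl
#!/usr/bin/perl
# Sketch: report the operating-system identity a CGI program runs under.
use strict;
use warnings;

# Return the login name for the effective user ID of this process,
# or the numeric ID itself if the system has no name for it.
sub effective_user {
    my $uid  = $>;                 # effective user ID of this process
    my $name = getpwuid($uid);     # scalar context returns the login name
    return defined $name ? $name : $uid;
}

print "Content-type: text/plain\n\n";
print "This request is being served as user: ", effective_user(), "\n";
```

If two different visitors run this script, both see the same user name, which is exactly why operating-system permissions alone cannot separate one Web user from another.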

Reading Files

All Web servers make files available for reading. If your server does nothing else, you can concentrate on making sure that the HTTP daemon has no write access anywhere on your server, with the exception of its log files.

But even then, you need to take care. Some files may be intended for the eyes of a particular group of people only. In that case, creating Web server user groups and carefully planning authentication domains (containers for the files) can protect the data from unwelcome attention.

It is also important to avoid exporting files that should not be visible over the network. A classic example is exporting the server's password file so that anyone on the Internet can have a crack at it. Restricting the exported area on the Web server to a particular, specially designated directory tree can help you avoid security holes such as this, but you also must prevent users from placing links or aliases to sensitive files in the exported area. Chapter 8, "Understanding Basic User Authentication," explains how to secure against this kind of security breach.
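When a CGI program of your own opens files by name, it can apply the same rule itself. The following sketch is one possible check, not a feature of any particular HTTP daemon: it resolves symbolic links and ".." components with the standard Cwd module and accepts a path only if the result still lies inside the designated tree. The subroutine name safe_path and the directory in the usage comment are assumptions made for illustration.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);

# Return the fully resolved path if $requested lies inside $root after
# all symbolic links and ".." components are resolved; otherwise undef.
sub safe_path {
    my ($root, $requested) = @_;
    my $real_root = abs_path($root);
    my $real_file = abs_path("$root/$requested");
    return undef unless defined $real_root && defined $real_file;
    return $real_file if $real_file eq $real_root;
    # The trailing slash stops "/www/htdocs-old" from matching "/www/htdocs".
    return index($real_file, "$real_root/") == 0 ? $real_file : undef;
}

# Example (hypothetical directory):
# my $file = safe_path('/usr/local/www/htdocs', $requested_name)
#     or die "Request is outside the exported area\n";
```

A link placed inside the exported area that points at a sensitive file resolves to a path outside the tree, so the check rejects it.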

Writing to Files

The primary area of concern in this chapter is allowing users to write files, not just read them. When you decide that it is necessary to allow the httpd server to write to files on behalf of remote users, you must take great care to limit the circumstances under which httpd writes data to disk.

You may need to allow the server to write files on behalf of users for several reasons:

Accepting user input. The guestbook example in Chapter 2, "Introduction to CGI," stores data entered by users who visit a site. Other examples include name and address or customer feedback fields in Web forms.
Taking orders. The same concept can be extended to cover actual orders for goods. The Web server acts as a front end for an ordering database, with the order data being read from a form filled in by users on their browsers. Although it is similar in many technical respects to survey-type data collection (user feedback, for example), this type of input is conceptually quite different. The data received by the server in this way is not simply stored on disk; it directly initiates a sequence of events that leads to the delivery of a product and the generation of an invoice.
Providing temporary storage. The page-based nature of Web services raises a special difficulty for the Web programmer. Each screen that the user sees on his browser is a separate Web location. If a user is presented with a menu on one page and chooses one of the menu options, the menu choice generally leads to a separate location. In the case of CGI programs, a separate program is invoked for each menu choice. And that fact, to the dismay of programmers, unfortunately means that variables must be stored on disk between pages.

Storing Variables

This section takes a closer look at providing temporary storage. Suppose that you, the CGI programmer, want to implement a system that involves three screens. Screen A presents a form that contains the usual elements: fill-in boxes, drop-down menus, and such. When form A is submitted, assuming that no essential data is missing, screen B appears. This screen is also a form, but it picks up additional information from the user. When form B is submitted, screen C appears. Screen C is another form, this time summarizing the user's entries in forms A and B.

When I fill in form A and click the submit button, the ACTION parameter of the FORM statement in form A contains the name of a CGI program that generates form B. This CGI program (call it MakeB.cgi) receives the values that I entered in form A through the process environment, as described in earlier chapters. The program then generates form B with a FORM statement that has an ACTION parameter specifying that another program (call it MakeC.cgi) should be invoked when form B is submitted. So I fill in form B and click the submit button. MakeC.cgi receives the values that I entered in form B and generates form C appropriately.
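To make the hand-off concrete, here is a minimal sketch of how a program such as MakeB.cgi might read the values from form A. It assumes the standard CGI conventions (GET data in QUERY_STRING, POST data on standard input); the subroutine names are invented for this example, and a repeated field name simply keeps its last value.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split "a=1&b=2" into a hash of decoded names and values.
sub parse_form_data {
    my ($raw) = @_;
    my %form;
    for my $pair (split /[&;]/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $form{ url_decode($name) } = url_decode(defined $value ? $value : '');
    }
    return %form;
}

# Fetch the submitted data the way a CGI program receives it.
sub read_form_data {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    my $raw    = '';
    if ($method eq 'POST') {
        read(STDIN, $raw, $ENV{CONTENT_LENGTH} || 0);
    } else {
        $raw = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    }
    return parse_form_data($raw);
}
```

Notice that the resulting hash exists only for the lifetime of this one process. When MakeB.cgi exits, the values are gone.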

Losing the Data

Just one thing is missing in form C: the data that I entered in form A! This data was held in two places: in the form where I entered it and in MakeB.cgi, which received it. That's as far as the data goes unless you take explicit steps to pass it along.

So you need to make sure that all values that I enter or select in both forms A and B arrive safely in MakeC.cgi.

Overloading the ACTION Parameter

One way to make sure that the data gets passed along is to add the CGI variable information to the ACTION parameter of the FORM statement in form B. The value of ACTION is a hyperlink, so it can take CGI parameters in the usual way. This example invokes MakeC.cgi, with the variables Name, ColorChoice, and Horsepower equal to Kurt, Blue, and 0, respectively, when the user submits the form:


<FORM ACTION="MakeC.cgi?Name=Kurt&ColorChoice=Blue&Horsepower=0">

Notice that a statement such as this must be generated by a CGI program. The values Kurt, Blue, and 0 are known only after the user fills in form A, so these values cannot be hard-coded into an HTML file on disk.
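The following sketch shows one way that MakeB.cgi might build such a FORM statement from values it has just received. The subroutine names url_encode and action_url are invented for illustration, and the encoding assumes plain byte strings rather than wide characters.

```perl
use strict;
use warnings;

# Percent-encode everything except letters, digits, and "_", ".", "~", "-".
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build "script?name=value&..." from a hash reference, in sorted key order.
sub action_url {
    my ($script, $values) = @_;
    my $query = join '&',
        map { url_encode($_) . '=' . url_encode($values->{$_}) }
        sort keys %$values;
    return length $query ? "$script?$query" : $script;
}

my %from_form_a = (Name => 'Kurt', ColorChoice => 'Blue', Horsepower => 0);
my $url = action_url('MakeC.cgi', \%from_form_a);

# Inside an HTML attribute, "&" should be written as "&amp;".
(my $attr = $url) =~ s/&/&amp;/g;
print qq{<FORM ACTION="$attr" METHOD="POST">\n};
```

The sketch uses METHOD="POST" deliberately: when a form is submitted with GET, the browser builds a fresh query string from the form's own fields and discards any query string already present in ACTION.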

This method ensures that the designated values get to MakeC.cgi. Unfortunately, the contents of form B will be lost, replaced by these manually imposed values. This method may be useful in limited circumstances, but beware: it can get out of hand quickly. Consider a case in which a group of HTML forms are used to build up a set of data over a series of transactions, with the user being allowed to go back and forth between forms at will. Consider the number of variables that you need to put on the ACTION parameter's hyperlink, and consider the impossibility of keeping everything straight. Then read on.

Saving It to Disk

Another approach is to store the values of all such variables to a file as the filled-in forms are received. This type of method is easier to manage when you have multiple pages and when flow between pages is not strictly linear (that is, in virtually all Web services). From the point of view of the Web programmer, writing pages and scripts to use such a system is not terribly arduous; the main requirement is that you store to disk all data that you may want to see again later.
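As a rough illustration of the idea, the following sketch writes a set of name/value pairs to a file and reads them back. It is not the storage code developed later in this chapter; the file format (one name=value pair per line, with awkward characters percent-encoded) and the subroutine names are assumptions chosen to keep the example short.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Encode characters that would break the one-pair-per-line format.
sub encode_field {
    my ($text) = @_;
    $text =~ s/([%=\r\n])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

sub decode_field {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Replace the contents of $file with the pairs in %$values.
sub save_values {
    my ($file, $values) = @_;
    open(my $fh, '>>', $file) or die "Cannot open $file: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $file: $!";
    truncate($fh, 0)          or die "Cannot truncate $file: $!";
    for my $name (sort keys %$values) {
        print {$fh} encode_field($name), '=',
                    encode_field($values->{$name}), "\n";
    }
    close($fh) or die "Cannot close $file: $!";
}

# Read the pairs back into a hash; a missing file yields an empty hash.
sub load_values {
    my ($file) = @_;
    my %values;
    open(my $fh, '<', $file) or return %values;
    flock($fh, LOCK_SH) or die "Cannot lock $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($name, $value) = split /=/, $line, 2;
        next unless defined $value;
        $values{ decode_field($name) } = decode_field($value);
    }
    close($fh);
    return %values;
}
```

The file is opened in append mode and truncated only after the lock is held, so a second request arriving at the same moment cannot empty the file while the first is still reading it.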

Although saving data to disk is the preferred option for all but the most trivial cases, it has some drawbacks:

You need to allow write access in CGI directories. This situation is not a security risk in itself, but it does make security risks more likely, and it certainly precludes the possibility of a watertight, read-only Web server.
A good deal of programming overhead is involved in setting up a system of this sort. You need to write low-level storage management functions that other, higher-level scripts will invoke; you need to decide on a user authentication strategy; and you may have to write a primitive interpreter to allow stored values to reappear in HTML pages. (Fortunately, I've done all that work for you.)
Finally, you must do a certain amount of minor housekeeping work on a regular basis with a system of this type: deleting stale data from time to time, reviewing access restrictions, and so on.

These disadvantages are easily outweighed by the flexibility of a solid intermediate storage system, especially when you consider that most of the work has already been done. The system described in the rest of this chapter has all the essentials that you need to get up and running quickly. Just finish this chapter, copy some files from the CD-ROM that comes with this book, and prepare to amaze and astound your friends!

Managing Sessions

The type of write-to-file system described in the preceding sections is a session-based system. A solid understanding of the way in which such a system works is essential before you can start writing code, so this section examines the building blocks of a session-based Web service.

The Nature of a Session

A URL is the basic unit of Web access. You want to allow users to access several such locations (CGI scripts and HTML pages) on your server in a linked sequence, in such a way that you can track users' actions and any data that they enter. Therefore, you need to identify the user when she makes contact initially, and if she follows a link within your service, you want to regard the new access as being a continuation of the initial one. That means identifying her when she requests the second and subsequent locations, and making the logical connection between these accesses and the initial one. This logical sequence of connected accesses is what I refer to as a session.

This section is concerned with tracking the user's access over a sequence of steps within the service, not with gathering historical data over an extended period. If the user attaches to your service and follows links between pages for a few minutes one day, and then does something similar the next day, those accesses count as two sessions, not as two parts of one session.

The End

The preceding section's definition of a session contains a loose end: when does a session finish?

In some cases, you may want to provide the user an explicit menu option for logging off and terminating the session. This approach makes sense in the case of password-protected services, in which a dangling open connection may represent a security risk.

In other cases, in which services are open to all users, an explicit disconnect or logout button may not be necessary. You simply follow the user's actions until she stops using the service, at which time any data stored on a temporary basis is deleted by a housekeeping process of some sort.

This open-ended approach can be messy. How do you know whether your user has really left your service and is not just reading what you displayed on her screen or handling some other business, with the intention of resuming the session later?

The answer lies in a time-out mechanism. You decide on a reasonable upper limit to the length of a pause between accesses to your service, and you regard as abandoned any sessions that pause for longer than that duration. A separate housekeeping process (a Perl script executed as a cron job at regular intervals, for example) deletes session files that have not been modified within the designated time.
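A housekeeping script of this kind can be very short. The sketch below is one possible version; the .ses file suffix, the directory in the usage comment, and the 30-minute limit are all assumptions for illustration, not values fixed by this chapter.

```perl
use strict;
use warnings;

# Delete files in $dir whose names end in ".ses" and that were last
# modified more than $max_age seconds ago. Returns the number removed.
sub purge_stale_sessions {
    my ($dir, $max_age) = @_;
    my $now     = time;
    my $removed = 0;
    opendir(my $dh, $dir) or die "Cannot read $dir: $!";
    for my $name (readdir $dh) {
        next unless $name =~ /\.ses\z/;       # ignore anything else
        my $path  = "$dir/$name";
        my $mtime = (stat $path)[9];          # last-modified time
        next unless defined $mtime;           # file vanished meanwhile
        next if $now - $mtime <= $max_age;    # session is still active
        $removed++ if unlink $path;
    }
    closedir $dh;
    return $removed;
}

# Example cron usage: expire sessions idle for more than 30 minutes.
# purge_stale_sessions('/usr/local/www/sessions', 30 * 60);
```

Because the test is based on the file's modification time, the wrapper must rewrite the session file on every access for an active session to stay alive.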

A time-out system is also useful in systems that use an explicit logout option. If the connection goes down or the user forgets to log off, the session is not terminated properly and remains active until you kill it. You can use a time-out mechanism to put these suspended sessions to sleep.

The Session Identifier

Tracking each user separately from one CGI script or HTML page to the next is essential. You may have dozens of users accessing your page simultaneously, and you don't want one person's data becoming confused with that of another.

The key to keeping track of users is the session identifier: a unique number or string that your service assigns to a user when she first connects to the service. This identifier is automatically included in all subsequent requests made by the user during that session, allowing the service to determine which session file to use for the user when she reconnects.
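The identifier must be hard to guess as well as unique; otherwise, one user could take over another's session simply by trying likely values. The following sketch is one way to produce such a key, assuming a UNIX system that provides /dev/urandom. The subroutine name is invented for illustration, and the fallback branch is noticeably weaker than the random device.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Return a 32-character hexadecimal session key.
sub new_session_key {
    my $bytes = '';
    # Preferred source: 16 random bytes from the operating system.
    if (open(my $fh, '<:raw', '/dev/urandom')) {
        read($fh, $bytes, 16);
        close($fh);
    }
    # Fallback (weaker): a hash of the time, process ID, and rand().
    $bytes = md5(join '|', time, $$, rand()) unless length($bytes) == 16;
    return unpack('H*', $bytes);
}

print new_session_key(), "\n";
```

A fixed-length key made only of hexadecimal digits also has a practical advantage: it is easy to validate before it is used as part of a file name.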

You can pass the key between the server and the client in several ways. If you can guarantee that all clients who access your server are capable of supporting cookies, you can set a cookie to the value of the key, as described in Chapter 7, "Dynamic and Interactive HTML Content in Perl and CGI." The simplest approach, and the one that you'll use for your sample application, is to store the key in an HTML form as a hidden value.

The following HTML statement, for example, results in a CGI variable called sessionkey, with a value of clef:


<input type="hidden" name="sessionkey" value="clef">

The value is not displayed on the browser in any way, but when the user submits the form, sessionkey and clef are included in the list of CGI variables and values that the CGI script on the server receives.
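A hidden value is hidden only from casual view; a determined user can edit it before submitting the form. If the key is going to become part of a file name, check its format first. The sketch below assumes that keys are 32 lowercase hexadecimal digits (so the placeholder value clef shown above would be rejected) and that session files live in a hypothetical directory with a .ses suffix.

```perl
use strict;
use warnings;

# Hypothetical directory holding one file per session.
my $SESSION_DIR = '/usr/local/www/sessions';

# Map a session key to its file name, or return undef if the key
# does not have the expected format (32 lowercase hex digits).
sub session_file_for {
    my ($key) = @_;
    return undef unless defined $key && $key =~ /\A([0-9a-f]{32})\z/;
    return "$SESSION_DIR/$1.ses";
}

# A well-formed key maps to a file inside the session directory...
print session_file_for('0123456789abcdef0123456789abcdef'), "\n";

# ...whereas an attempt to climb out of the directory is refused.
print "rejected\n" unless defined session_file_for('../../etc/passwd');
```

Treat a rejected key exactly like a missing one: send the user back to the logon screen.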

To summarize, a session is a set of connected accesses of a service by one user. A session is terminated when the user explicitly sends a termination request or when a designated time-out period elapses. The service (your Perl program) tracks the user throughout the session by checking for the user's unique session identifier on each access request.

The Wrapper

So far, so good. The user has a session ID, which she provides with each new request for a location within your service. Using this ID, she hops from one page, form, or CGI program to another, and you keep track of all her data for her.

This process should be managed by a single CGI program rather than by a series of interconnected scripts, for a few good reasons:

If you use several CGI scripts-one to handle each distinct task carried out within your service-you must take great care to ensure that all scripts deal with session files in a coherent way.
Managing the hyperlinks between a series of interconnected scripts can be a real nightmare. Suppose that you have a link from script A to script B to script C and another link from script X to script B. Then you decide to edit script A to go directly to script C, and delete script B. You don't realize that you've broken the connection between X and B until a customer runs script X, tries to follow the link to B, and gets a nasty error message. If you use a single CGI program, there is only one link to follow.
If you want to send to the browser an HTML page that is mostly static, but that has a few simple text substitutions based on values from the user's session file, you have to write an entire CGI program to do the job. If you can get your single CGI program to perform those substitutions for you, you can write plain HTML where appropriate.
Using many scripts is too much work. Remember, laziness is officially a virtue in the Perl world.

In short, a single program is easier to manage, and it makes the HTML and associated hyperlinks easier to develop, too. This single script is called a CGI wrapper, and it's how you'll write your sample application.

CGIWrap

A public-domain utility called CGIWrap (included on the CD-ROM that comes with this book) uses a CGI wrapper script for a different purpose. The problem that CGIWrap seeks to address originates in the fact that HTTP daemons (httpds) execute CGI programs on behalf of the end user. The httpd process runs on the server under a user ID that has privileges that are not available to the ordinary user of the server, such as write access to database files. Accordingly, a user on your Web server can write a CGI program to perform tasks that the end user cannot carry out. Examples include printing configuration or password files that you prefer to keep confidential and overwriting data. It would be relatively easy for one user to attack another by overwriting the data in the other user's CGI directory, for example, but damage of this kind can occur accidentally, too.

The best solution to this kind of risk is to have each CGI program execute by using the user ID of the owner of the script, rather than using the user ID of the httpd process. To make that possible, the httpd process runs as root (on a UNIX machine), and the httpd runs each CGI program in a separate process, with the user ID of the script owner.

CGIWrap, written by Nathan Neulinger, is a utility program that farms out CGI executions to a separate process in this way. The program also performs some other basic security checks on the CGI script before deciding whether it should allow the script to execute.

Some HTTP daemons now have this type of functionality built in. If your HTTP daemon does not provide this feature, you may want to consider installing CGIWrap to enhance the security of your Web server.

Generic Substitutions

The list in "The Wrapper" earlier in this chapter discussed substitutions in HTML files. This process is best explained by means of a simple example. Suppose that you want to greet your user by using her first name, which she has already entered in a form. The relevant line of the HTML would look something like this:


<h3>Welcome, Jean!</h3>

Assuming that you stored the user's name in the Perl variable $firstname, you can produce this HTML by using a Perl statement such as the following:


print "<h3>Welcome, $firstname!</h3>";

This statement could go in a special CGI script that prints out the welcome page, or it could appear in a special subroutine in your wrapper script. You don't want to adopt that approach, though, because you would find yourself writing special scripts or subroutines for every bit of HTML that is not completely static.

A much more elegant solution would be to have your variable name embedded in the HTML file and to have the variable replaced automatically just before the page is sent to the user. You can't use that method, of course; an HTML file is not a Perl program, so there's no point in sticking Perl variable names in there. The principle is sound, however, and you can achieve the same result by using a slightly different mechanism.

Instead of embedding Perl variable names in the HTML file, you can embed special placeholders that your script will translate for you as it sends the page. The placeholder needs to be identifiable as such to the wrapper script; you need to make sure that the wrapper replaces all placeholders without altering any of the HTML. You can identify your placeholders in several ways, such as by inventing a new HTML tag. (That method is risky, though, because you never know what tag names will appear in the next version of the HTML standard.)

The method in this section uses simple syntax. Placeholders in your HTML files start and end with a backslash. The part between the backslashes indicates the name of the variable whose value is to go in the placeholder's position. The welcome line, for example, would appear in the HTML file as follows:


<h3>Welcome, \personalname\!</h3>

Your wrapper script will spot the backslashes, extract the personalname token, and look it up in a table in memory. Because you're using Perl, that table is implemented as an associative array. (The section called "Parsing an HTML File" later in this chapter explains exactly how that implementation is achieved.) The wrapper script then spits out the original line, minus the backslashes and the token name, which it replaces with the value in the associative array for that token.
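The substitution itself needs only one regular expression. The following is a minimal sketch of the idea rather than the parsing code presented later in this chapter; the subroutine names are invented, and the decisions to replace unknown tokens with an empty string and to HTML-escape the stored values are assumptions made for this example.

```perl
use strict;
use warnings;

# Make a stored value safe to display inside HTML text or attributes.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Replace each \name\ placeholder in $line with the value stored
# under that name in the associative array referenced by $values.
sub fill_placeholders {
    my ($line, $values) = @_;
    $line =~ s{\\(\w+)\\}{
        my $v = exists $values->{$1} ? $values->{$1} : '';
        escape_html($v)
    }ge;
    return $line;
}

my %session = (personalname => 'Jean');
print fill_placeholders('<h3>Welcome, \personalname\!</h3>', \%session), "\n";
```

Escaping matters because the values come from users. Without it, a visitor who types HTML markup into a name field would have that markup sent back to the browser as part of your page.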

You probably will want to develop your own simple syntax for your application. The syntax used in this chapter is deliberately simple, so as to keep the sample code easy to follow.

CAUTION

If you want to do anything more complicated than simply replace values, you probably should design proper syntax for your embedded commands before you start, because adding features as you go along will almost certainly result in obscure, confusing syntax. You may even want to add looping and other flow-control capabilities. If your needs really extend to features of that sort, you may want to consider server-side includes or Java for an out-of-the-box solution.

Flow Control

The essential components of your managed system are:

Sessions
Session keys
Session management functions
HTML files with placeholders for variable values
A substitution mechanism to replace those placeholders with real values
A wrapper script that manages all accesses within your system

Before you start to develop this application, you need to know how you're going to manage program flow.

This type of system is state-based in the sense that the current state of the system (the aggregate of the values of all the system's variables) dictates the next action taken by the system. Looking at the system from the server side, a set of variables and values are provided by the browser, and the CGI script decides what to do based on these values. From the point of view of the browser, the CGI program is directed by a sort of remote-control mechanism, by which the browser sets CGI values to control the action on the server.

Pointing the Way

The most direct way to tell the wrapper script which location to display next is to state it in the CGI values. You can accomplish this task quite easily by inserting into the outgoing form a hidden value that contains the URL of the next location, such as this:


<input type="hidden" name="location" value="wrap.cgi">

Then the wrapper script can check the CGI values for a location setting when it tries to decide what to do next.
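Because the location value arrives from the browser, the wrapper should not open whatever file it names. One cautious design, sketched below, is to treat the value as a name to be looked up in a table of pages that the wrapper is prepared to send. The table contents, the file paths, and the subroutine name are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical map from location names to files the wrapper may send.
my %PAGES = (
    main_menu => 'pages/main_menu.html',
    feeds     => 'pages/feeds.html',
    review    => 'pages/review.html',
);

# Return the file for a requested location, or the logon page if the
# name is missing or does not appear in the table.
sub page_for_location {
    my ($location) = @_;
    return $PAGES{$location}
        if defined $location && exists $PAGES{$location};
    return 'pages/logon.html';
}

print page_for_location('feeds'), "\n";
print page_for_location('../../etc/passwd'), "\n";
```

With this arrangement, adding a page to the service means adding one line to the table, and a tampered location value can never reach a file outside it.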

Directing the Action

Although a location setting is adequate in many cases, you may not always know which location will come next. You may need the server to do some processing of the session-state values before deciding which location to return next.

In some cases, the CGI program can determine the next action by examining particular CGI values. If there is no session-key value, for example, the only valid action is to force the user to log on. In most cases, however, the number of values to be checked and the possible combinations of values will get out of control quickly. Statements of the form "if (A=B and X=Y) but not ((C=B or Y=Z) and A=D)" will start to appear.

A neat, direct way to implement this type of remote control is to have a special CGI value (call it action) that specifies the next action to be taken. This value is not always required but is very useful in most cases.

Suppose that your welcome screen is to be followed by a product menu. The following line, placed inside the form on the welcome page, will tell the wrapper that the next action to be taken should be product_menu:


<input type="hidden" name="action" value="product_menu">

Notice that this mechanism merely indicates a state to the wrapper script; it does not dictate which Perl function should be invoked in the event that a particular state arises.
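In Perl, a natural way to connect a state to the code that handles it is a hash of subroutine references, often called a dispatch table. The sketch below is a minimal illustration under assumed names: the handler subroutines, the table, and the rule that a missing session key or an unrecognized action falls back to the logon state are all inventions for this example.

```perl
use strict;
use warnings;

# Hypothetical handlers. Each receives the CGI values and the session
# state (both hash references) and returns the name of the next page.
sub show_product_menu { my ($cgi, $session) = @_; return 'product_menu'; }
sub show_logon        { my ($cgi, $session) = @_; return 'logon'; }

# The action value names a state; this table decides which code runs.
my %HANDLER_FOR = (
    product_menu => \&show_product_menu,
    logon        => \&show_logon,
);

sub dispatch {
    my ($cgi, $session) = @_;
    my $action = defined $cgi->{action} ? $cgi->{action} : '';
    # Without a session key, or with an unknown action, force a logon.
    my $handler = (defined $cgi->{sessionkey} && $HANDLER_FOR{$action})
               || \&show_logon;
    return $handler->($cgi, $session);
}

print dispatch({ sessionkey => 'abc', action => 'product_menu' }, {}), "\n";
```

The HTML pages know only the state names. You remain free to rename or reorganize the Perl subroutines without touching a single form.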

Walking Through the Wrapper

Figure 9.1 illustrates the chain of events that takes place during a typical session.

Figure 9.1 : The typical program flow through the wrapper.

The following list provides a detailed explanation:

1. The user begins a new session by accessing the wrapper script without a session key.
2. The wrapper script sends back the logon screen (a simple HTML form).
3. The user enters a user ID and password, and submits the form. The ACTION parameter of the FORM statement ensures that the wrapper is invoked again when the form is submitted.
4. The wrapper script determines whether the user ID and password match; if they don't, the script sends a failure message back to the user.
5. If the authentication succeeds, a new session key is generated, and a welcome screen is displayed. This screen contains several links or submit buttons, each of which points to the wrapper script. The screen also contains a hidden field that stores the session key for this user.
6. The user enters data, if the screen is a form, and then selects a link or clicks a submit button.
7. The browser sends the user's CGI values to the wrapper script. These values always include the user's session key. Generally, there also is an action value to direct the wrapper script, as explained in "Directing the Action" earlier in this chapter.
8. The wrapper script reads the CGI values and extracts the session key from them.
9. The wrapper script reads the user's current session state from the user's session file, using the session-key value.
10. Depending on the value of the location and action variables (or on a combination of other CGI values, if these values are not defined), the wrapper invokes a subroutine to perform whatever processing is required. This subroutine usually involves making changes in the user's session-state values.
11. The wrapper script saves the new session state to the user's session file.
12. If a location was set, either by the location value in the incoming CGI data or internally by the wrapper script, the wrapper script starts to read the file that corresponds to that location. Alternatively, the script can generate HTML to go directly to the browser. In either case, the user's session key is included in the HTML.
13. If an HTML file is being sent by the wrapper, each line of the file is sent to the browser, with any placeholders being filled in based on the user's session values.
14. The new location is displayed on the browser, and the entire process starts again from step 6.

Notice the overall level of program flow. The browser sends CGI values to the wrapper script; the wrapper script processes the values, updates stored values, and returns HTML to the browser. The process starts again when the user submits the form or follows a hyperlink that leads back into the system.
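Stripped of detail, one pass through that cycle can be sketched in a few lines. The code below is an outline only, not the wrapper developed in this chapter: the subroutine name, the set_name action, and the three code references standing in for the storage and page-sending routines are all hypothetical.

```perl
use strict;
use warnings;

# One pass through the wrapper. $load, $save, and $render are code
# references standing in for the real storage and page routines.
sub handle_request {
    my ($cgi, $load, $save, $render) = @_;

    # No session key: the only valid response is the logon screen.
    my $key = $cgi->{sessionkey};
    return $render->('logon', {}) unless defined $key;

    # Read the stored state, process the request, save the new state.
    my $session = $load->($key);
    if (($cgi->{action} || '') eq 'set_name') {
        $session->{personalname} = $cgi->{personalname};
    }
    $save->($key, $session);

    # Return the next location, filled in from the session values.
    # (A real $render routine should validate the location name.)
    my $location = $cgi->{location} || 'main_menu';
    return $render->($location, $session);
}
```

Passing the storage routines in as parameters is purely a convenience for this sketch; it lets you try the flow with an in-memory hash before any files are involved.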

Designing the Sample Application

The example application in this chapter is a shopping-cart-style ordering system for Camels 'R Us, which sells three types of products: food, vacations, and accessories. You will develop an interface that allows authorized users to browse product menus; build up a list of purchases; review the order; and, finally, submit the order. At that point, your program simply writes the order details to a file. In real life, the order could be passed on to an ordering database.

Program Flow

Start creating the application by outlining the sequence of events that take place from the user's point of view. This outline is not the same as the outline of the wrapper-script internals in the walk-through section earlier in this chapter, but a description of the functionality required of your application. The sequence of events from the user's point of view is as follows:

The action starts when a user accesses the wrapper script for the first time, that is, without a currently valid session key. The wrapper script sends a logon menu to the user's browser.
If the user provides a valid user ID and password, the main menu is sent. This menu (see fig. 9.2) allows the user to choose one of the three product categories.
When the user makes a choice from the main menu, an order form for the chosen category appears. This form lists all the available products and allows the user to enter the numbers of all the products that she wants to order. A blank for any product means that the user does not want to purchase the product.
Figures 9.3, 9.4, and 9.5 show the order screens for the three product categories.
The user either submits or cancels the order form. If the user submits the form, the order details from the form are added to the user's session file, and the main menu reappears. If the user cancels the form, the session file is unmodified, and the main menu reappears.
The user can choose another product category to add to the order list for this session. If she wants, she can revisit a product category and amend the number of items ordered.
A submit button in the main menu allows the user to review her order for the session so far. When the user clicks this button, a form appears, showing the number of items of each type ordered and the total cost of the order. Figure 9.6 shows a sample order-review screen.
When the user finishes adding items to the order list, she can click a submit button in the main menu to confirm the order. Her order details for this session are appended to an order file.
Alternatively, the user can choose to cancel the entire order. This action clears the order list contained in her session file.
Finally, the user selects the log-off option from the main menu to terminate the session. Her session file is deleted at this point, and an appropriate end-of-session message is displayed.

Figure 9.2 : The main menu.

Figure 9.3 : The Feeds menu.

Figure 9.4 : The Vacations menu.

Figure 9.5 : The Extras menu.

Figure 9.6 : The order-review screen.

The main menu acts as a sort of anchor for the application, offering three menu options and three submit buttons. A more hierarchical system may be appropriate for a larger system, but this level of complexity is fine for a simple example such as this one.

Data Issues

Next, you need to consider how data is to be stored by the application, both internally (in Perl data structures) and externally (in session files and raw data files). There are two principal data elements. The first is the set of data representing the available products and prices; the second is the order list built up by the user in the course of a session.

Product Data The price list consists of a list of product names and corresponding prices. You're writing this application in Perl, so the obvious candidate for storing this data is an associative array, with the product names being keys and the prices being values. An expression such as $Price{"Camelskin"} returns the unit price of the named product.

The data can be stored externally in several ways. On a UNIX system, a simple products-to-prices lookup list would be most efficiently stored in a DBM file, using a tied hash. You'll use this technique to store the session information. In the case of the product data, however, you will have two arrays indexed by product name: an array of prices and an array of product descriptions. The best approach for an application this simple is to store the data items in a flat text file, one product per line. More complex programs could interface with a relational database of product information, if necessary.

Each record contains three items of information about a single product: the product name, the product price, and a brief description. In this application, use the separator ::: between items so that Perl can easily split the lines as it reads them.

Listing 9.1 shows the sample data file, which is stored in the file products.dat on the CD-ROM that comes with this book.

Listing 9.1 The Contents of products.dat, the Data File


feed_desertfruit:::14.95:::Fruit of the Desert

feed_driedhusks:::9.95:::Sun Dried Husks

feed_dromspecial:::19.95:::Special Dromedary Supplement



travel_kalahari:::1250:::A 1 year round trip of the Kalahari with the Tuareg.

travel_dakota:::1990:::Discover the magic of the Occident with our 2 week whirlwind tour of Dakota.

travel_alranch:::250:::Break in gently with a weekend at your local branch of Al's Camel Ranch Inc.



extras_pancam:::69:::Hand-stitched leather panniers (camel)

extras_pandrom:::99:::Hand-stitched leather panniers (dromedary)

extras_covers:::150:::Genuine camel-skin hump covers

The first line in this file describes a product called feed_desertfruit, which is described as "Fruit of the Desert" and which has a unit price of $14.95. The product name is used as the key in the price and description associative arrays (%Price and %Desc, respectively). So $Price{'feed_desertfruit'} is "14.95", and $Desc{'feed_desertfruit'} is "Fruit of the Desert".

The three product categories are denoted by the feed_, travel_, and extras_ prefixes. These prefixes make maintaining the product data file easier; they have no significance for the program.
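The record format described above can be tried out in isolation. The following is a minimal sketch, not the chapter's own routine: it assumes that a short in-memory list of records (two copied from Listing 9.1, plus one deliberately malformed line) stands in for products.dat, and it fills %Price and %Desc by splitting each record on the ::: separator.

```perl
use strict;
use warnings;

# Sample records copied from Listing 9.1; the in-memory list is an
# assumption that stands in for products.dat so the sketch runs alone.
my @records = (
    'feed_desertfruit:::14.95:::Fruit of the Desert',
    'extras_covers:::150:::Genuine camel-skin hump covers',
    'this line has no separator and is skipped',
);

my (%Price, %Desc);

foreach my $line (@records) {
    next unless $line =~ /:::/;    # ignore lines without the separator

    # Three fields per record: product name, unit price, description.
    my ($product, $price, $desc) = split(/:::/, $line);

    # The product name is the key into both associative arrays.
    $Price{$product} = $price;
    $Desc{$product}  = $desc;
}

printf "%s costs %s\n", $Desc{'feed_desertfruit'}, $Price{'feed_desertfruit'};
```

Because both arrays share the product name as their key, a price and its description can always be looked up together.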

Order Data Orders are built up over the course of a session, with the partial order being saved in a session file until it is complete. You'll save the order as part of the session data, in the form of an associative array. The keys of order elements in the session data array are the product names, and the values are the numbers of the items ordered by the user.

Internally, the wrapper program stores all session values in an associative array called %state. If the user orders three extras_covers items, $state{'Order_extras_covers'} is set to 3. The Order_ prefix here is used when you're scanning the session file to pick up ordered items.

One advantage of storing this data internally in the form of an associative array is that it makes the job of storing the data externally very simple indeed. You'll use Perl's tied-hash functionality to create a link between the internal storage (associative array) and the external storage (DBM file), and leave it to Perl's innards to keep the two in sync. Session files are stored in the log subdirectory and called $Sessionid.DB. ($Sessionid is the user's session ID.)
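To see how a tied hash keeps an order list between requests, consider the following sketch. It is an illustration under stated assumptions rather than the wrapper's actual code: a temporary directory stands in for the log subdirectory, the session ID is made up, and the two tie calls play the part of two separate invocations of the wrapper script.

```perl
use strict;
use warnings;
use Fcntl;                     # exports O_RDWR and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory stands in for the log subdirectory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/838604689.DB";      # hypothetical session ID

# First "request": record an order of three hump covers.
my %state;
tie(%state, 'SDBM_File', $file, O_RDWR | O_CREAT, 0644)
    or die "Unable to open session file: $!";
$state{'sessionid'}           = '838604689';
$state{'Order_extras_covers'} = 3;
untie(%state);

# Later "request": reopen the same file and pick out the ordered items.
my %again;
tie(%again, 'SDBM_File', $file, O_RDWR, 0644)
    or die "Unable to reopen session file: $!";
my %order;
foreach my $key (keys %again) {
    next unless $key =~ /^Order_(\w+)$/;   # only keys with the Order_ prefix
    $order{$1} = $again{$key};
}
untie(%again);

print "extras_covers: $order{'extras_covers'}\n";
```

The Order_ prefix is what lets the second pass separate ordered items from other session values, such as sessionid, that live in the same file.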

Implementing the Sample Application

Now you're ready to implement your wrapper program. In this section, you develop a working wrapper system in Perl from scratch. This application is not highly sophisticated, but it is fully functional, and it is intended primarily to be an illustration of the techniques involved. The source code for the program is explained along the way; all source code for this basic wrapper program appears on the CD-ROM that comes with this book.

Getting Started: The Main Routine

The main routine is the best place to start. This code dictates the overall program flow; it is also the only routine that is guaranteed to be executed every time. Listing 9.2 shows the code for the main routine.

Listing 9.2 The Wrapper Script's Main Routine


#!/usr/local/bin/perl -TI.



# Import methods for DBM files:



require SDBM_File;

require Fcntl;





# Global variables:



%Price, %Desc, %cgivals;





# Read any form values passed in.



&GetCGIVals(%cgivals);





# Extract the settings which dictate program flow:



$SessId = $cgivals{'sessionid'};

$Loc    = $cgivals{'location'};

$Action = $cgivals{'action'};





# Read the product details:



&ReadProductData("products.dat");





# Now decide what to do. First check if session id supplied:



if ( $SessId ) {



    # Session Id supplied: perform action and/or show location.



    $Action && &DoAction($SessId, $Action);

    $Loc && ShowLoc($SessId, $Loc);

}

else {



    # No session id supplied: login is only valid action.



    # Error check: session id required if location or action requested.



    ( $Loc || $Action ) &&

     HTMLError("Location/action requested but no session ID provided.<br>",

            "Please <a href=\"./wrap.cgi\">log in</a> ",

            "and follow the instructions on screen.");



    # Default action: log in.



    &DoLogin;

}

The main routine performs these tasks:

Reads the contents of SDBM_File.pm and Fcntl.pm, which are Perl modules that contain functions related to DBM files and file access modes, respectively.
Declares three global associative arrays to store prices, descriptions, and CGI values, respectively.
Calls &GetCGIVals to read the CGI values. (The &GetCGIVals function is described in "Getting the CGI Values" later in this chapter.)
Picks out the user's session ID, as well as the location to be loaded next or the action to be carried out.
Reads the product data from the data file by using the &ReadProductData function (described in "Reading the Product Data" later in this chapter).
If the session ID is defined and an action has been requested, the &DoAction function is called to carry out that action. (For details, see "Invoking a Specific Function" later in this chapter.)
If the session ID is defined and a location has been requested, the &ShowLoc function (described in "Parsing an HTML File" later in this chapter) is called to display that location.
If no session ID is defined, the only valid action is to display the logon screen.
The wrapper checks whether a location or action was requested. If either was requested without a session ID, the wrapper terminates with an error message. Otherwise, it calls the &DoLogin function to initiate a login. (The &DoLogin function is described in the "Initiating a Login" section later in this chapter.)

The subroutines called by the main routine are described in the following sections, in the sequence in which they appear.

Error Messages and HTML Headers

The &HTMLError function, which appears from time to time in the following code, is a utility function that displays an error message on the user's browser in HTML format. You could simply write error messages to STDOUT, knowing that the messages will get to the browser. If an error message is sent before the browser receives a Content-type header line, however, the browser reports a server error, and the user does not get to see your error message. For this reason, send a Content-type: text/html line first.

Another problem arises if you send the HTML header twice; the user sees the second header line on-screen along with the error message. This arrangement is a little untidy, so use a second utility function, &HTMLhead, to write the header for you, as follows:

# A utility routine to print the HTML header once only.
sub HTMLhead {
    if ( !$header_printed ) {
        print "Content-type: text/html\n\n";
        $header_printed = 1;
    }
}

The variable $header_printed is undefined, and therefore false, initially. The first time that you call this function, the HTML header is printed, and $header_printed is set to true; thereafter, the if statement is false and the header is not printed.

Following is the &HTMLError function, which uses &HTMLhead:
# Utility routine to show an HTML error message
sub HTMLError {
    my @ErrMsg = @_;
    &HTMLhead;
    print "<title>wrap.cgi Error</title>\n";
    print "<h1>Error</h1>\n";
    print @ErrMsg;
    print "<p>Execution aborted.";
    exit;
}

Getting the CGI Values

The CGI values returned by the browser represent the sum of the wrapper's knowledge about the user when the program starts. Other information about the user and his or her previous actions is contained in the user's session file, but that information cannot be accessed without the session ID, which is itself a CGI value. Your first priority, then, must be to interpret the CGI information and save the values of all CGI variables.

The &GetCGIVals subroutine queries the httpd's environment values and saves the CGI values in the global %cgivals associative array. These values can arrive in two forms, depending on whether the form data was transmitted by means of the GET or POST method:

If the GET method was used, the CGI values are contained in a single environment variable called QUERY_STRING.
If the POST method was used, the CGI values are available on standard input, and the length of the string that contains those values is provided by the environment variable CONTENT_LENGTH.

The &GetCGIVals routine checks for both the GET and POST methods and saves the CGI information in either case. Listing 9.3 shows the source code for &GetCGIVals.

Listing 9.3 The GetCGIVals Subroutine


# Get the CGI Values



sub GetCGIVals {



    my (@settings, $set, $name, $value, $formvalues, $postlength);



    # First decide if GET or POST used:



    $postlength = $ENV{'CONTENT_LENGTH'};



    if ( $postlength ) {

	read (STDIN, $formvalues, $postlength);

    }

    else {

	$formvalues = $ENV{'QUERY_STRING'};

    }





    # Store settings in an associative array:



    # First split into "A=B" parts:

    @settings = split('&', $formvalues);



    # Now store each name and value in the associative array:

    foreach $set ( @settings )  {

     ($name, $value) = split('=', $set);

     $cgivals{$name} = $value;

    }

}

The code shown in Listing 9.3 carries out the following steps:

Saves the number of characters of CGI information waiting on standard input in the $postlength Perl variable.
If $postlength is nonzero, reads exactly $postlength characters from standard input and stores them in the $formvalues Perl variable.
If $postlength is zero, sets $formvalues to the value of the QUERY_STRING environment variable.
The $formvalues variable contains one or more CGI settings and is of the form "A=X&B=Y&C=Z", a series of single settings joined by ampersands. The &GetCGIVals routine saves these individual settings in the @settings array by means of the split function.
Splits each element of the @settings array at the equal sign (=) into a key and a value, and stores the key and value in the $name and $value Perl variables, respectively.
Adds a new element to the %cgivals associative array, with $name as the index and $value as the value.

This last section may seem to be unnecessarily complicated. Why not split $formvalues into %cgivals in one step by using a statement like the following, which would replace the entire foreach loop in &GetCGIVals?


%cgivals = map( split('='), split('&', $formvalues) );

The problem is that there may be "empty" CGI values, which would disrupt the mapping shown in the single statement. Suppose that the user ID was missing, for example. The $formvalues string might look like this:


sessionid=99353&action=Validate&userid=&pass=www

In this case, the first split function would break $formvalues into these substrings:


sessionid=99353

action=Validate

userid=

pass=www

The second split operation carried out within the map operation would break these substrings into the following list of substrings: sessionid, 99353, action, Validate, userid, pass, and www.

Finally, the assignment to %cgivals would pair these substrings off in order, so the following key/value pairs would be stuffed into %cgivals (the leftover final key, www, would have no value):


sessionid=99353

action=Validate

userid=pass

Breaking the operation into two steps is marginally more complicated, but much safer.
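The difference between the two approaches is easy to reproduce. The following sketch uses the sample string from above; the variable names %broken and %safe are illustrative only and do not appear in wrap.cgi. It builds one hash with the single-statement map and another with the two-step loop, so that the results can be compared side by side.

```perl
use strict;
use warnings;

my $formvalues = 'sessionid=99353&action=Validate&userid=&pass=www';

# One-step version: split drops the empty field after "userid=",
# so the flattened list has an odd number of elements.
my @flat = map { split(/=/, $_) } split(/&/, $formvalues);
my %broken;
{
    no warnings 'misc';      # silence "Odd number of elements" for the demo
    %broken = @flat;
}

# Two-step version, as in &GetCGIVals: each pair is handled on its own.
my %safe;
foreach my $set (split(/&/, $formvalues)) {
    my ($name, $value) = split(/=/, $set);
    $safe{$name} = $value;   # undefined when no value was supplied
}

print "broken: userid => $broken{'userid'}\n";
print "safe:   pass   => $safe{'pass'}\n";
```

In the one-step hash, the password field name has become the value of userid and the password itself has become a key. In the two-step hash, userid exists with no value and pass still holds www.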

Reading the Product Data

Having read the CGI values, you next read in the product data. This data is stored in the %Price and %Desc associative arrays by the &ReadProductData subroutine, which takes a single argument: the name of the product data file. Listing 9.4 shows the code for &ReadProductData.

Listing 9.4 The ReadProductData Routine


# Read in the product data:



sub ReadProductData {



    my ($infile) = @_;

    my ($product, $price, $desc, $line);



    # Check parameters:



    $infile || HTMLError("ReadProductData requires data file name.");



    # Open the data file:



    open (PRODUCTS, $infile)

     || HTMLError("Unable to open product data file $infile ($!).");





    # Read each line:



    while (<PRODUCTS>) {



     $line = $_;



     # drop trailing newlines:



     chop($line);



     if ( $line =~ /:::/ ) {    # Ignore lines without separator



         # Split on ":::" separators:



         ($product, $price, $desc) = split(':::', $line);



         # Store price and description using product name as key:



         $Price{$product} = $price;

         $Desc{$product} = $desc;

     }

    }



    # tidy up:



    close PRODUCTS;

}

If the named product file exists and is opened successfully, it is read in one line at a time, and the following processing occurs for each line:

The new-line character at the end of the line is dropped by means of the chop function.
If the line does not contain the separator string, Perl skips to the next record in the file.
If the line contains the separator string, it is split into the product name, price, and description fields by means of Perl's split function.
The product's price and description fields are stored in the %Price and %Desc arrays, respectively, using the product name as the key.
Finally, the data file is closed.

Invoking a Specific Function

The next subroutine that the main routine may call is &DoAction, a function that encapsulates all specific processing functions other than parsing and displaying an HTML file. &DoAction consists primarily of a list of conditional branches, one per action, as you can see from the source code in Listing 9.5.

Listing 9.5 The DoAction Subroutine


# Subroutine to perform a named action for a given session Id.

# Branches to required subroutine.



sub DoAction {



    my ($SessId, $Action ) = @_;





    # Argument check:



    $Action || &HTMLError("DoAction called but no Action specified!");



    # Now a branch for each possible action -



    ( $Action eq "Validate" ) &&

     &Validate($cgivals{'userid'}, $cgivals{'pass'});



    ( $Action eq "Add+to+Order" ) &&

     &AddToOrder($SessId, %cgivals);



    ( $Action eq "Cancel+Order" ) &&

     &ShowLoc($SessId, "mainmenu.htmw");



    ( $Action eq "Return+to+Main+Menu" ) &&

     &ShowLoc($SessId, "mainmenu.htmw");



    ( $Action eq "Review+Order" ) &&

     &ReviewOrder($SessId);



    ( $Action eq "Confirm+Order" ) &&

     &ConfirmOrder($SessId);



    ( $Action eq "Log+Out" ) &&

     &DoLogout($SessId);



}

The &DoAction subroutine takes two arguments: the user's session ID and the name of the action to be taken. After a quick check for valid arguments, the subroutine checks the action name against a list of possible actions and, if it finds a match, calls the appropriate subroutine. The available subroutines are described in their own context later in this chapter.
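The action names in Listing 9.5 contain plus signs because &GetCGIVals stores each CGI value exactly as it arrives, and browsers encode spaces in submitted form values as plus signs. The following sketch shows how such values could be decoded. The UrlDecode helper is hypothetical and is not part of wrap.cgi; if it were applied to the CGI values, the comparisons in &DoAction would have to use spaces instead of plus signs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of wrap.cgi): undo URL encoding.
sub UrlDecode {
    my ($text) = @_;
    $text =~ tr/+/ /;                               # "+" encodes a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # "%XX" encodes one byte
    return $text;
}

print UrlDecode('Add+to+Order'), "\n";
print UrlDecode('Camels+%27R+Us'), "\n";
```

The plus signs are translated first so that a literal plus sign sent as %2B survives the second step unchanged.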

Parsing an HTML File

If the main routine finds that a location was specified (the location CGI value, held in $Loc), it invokes the &ShowLoc subroutine to show the contents of that file on the browser. Any tokens found in the file (denoted by means of the syntax described in "Generic Substitutions" earlier in this chapter) are filled in from the user's session values and the %Price and %Desc arrays.

This function is, in many ways, the core of the wrapper script. Listing 9.6 shows the code.

Listing 9.6 The ShowLoc Subroutine


# Show a HTML file, filling in values using the supplied session ID



sub ShowLoc  {



    my ($ID, $URL) = @_;

    my (%SessionValues, @matches);





    # Open the requested file for reading:



    open(RETURNFILE, $URL) ||

     &HTMLError("Unable to open file \"", $URL, "\" for reading.");





    # Send HTML header:



    &HTMLhead;





    # Load all session values for this ID:



    %SessionValues = &GetSessValues($ID);





    # Process each line of requested file:



    while(<RETURNFILE>) {



     # Store this line ($_ will be overwritten):

     $currentline = $_;



     # Check for prices, e.g. "\\Price\itemname\":

     if ( @matches = /\\\\Price\\(\w+)\\/g )  {

         # Interpolate each match on this line:

         foreach $match ( @matches )  {

          $currentline =~ s/\\\\Price\\$match\\/$Price{$match}/;

         }

     }



     # Check for descriptions, e.g. "\\Desc\itemname\":

     if ( @matches = /\\\\Desc\\(\w+)\\/g )  {

         # Interpolate each match on this line:

         foreach $match ( @matches )  {

          $currentline =~ s/\\\\Desc\\$match\\/$Desc{$match}/;

         }

     }



     # Check for tokens, e.g. "\tokenname\" => tokenvalue:

     if ( @matches = /\\(\w+)\\/g )  {

         # Interpolate each match on this line:

         foreach $match ( @matches )  {

          $currentline =~ s/\\$match\\/$SessionValues{$match}/;

         }

     }



     # Now print the line, including any substitutions:

     print $currentline;

    }





    # Tidy up:



    close RETURNFILE;

}

The code is simpler than it looks. Step through the code to see how it works:

The designated file is opened.
The HTML header line is sent by means of the &HTMLhead function (described earlier in this chapter).
All session state values for the supplied session ID are read in by means of the &GetSessValues subroutine (described in detail in "Retrieving Session Data" later in this chapter).
Each line of the input file is read in and printed to standard output.
The input file is closed.

The fourth step is actually slightly more complicated. Each line is checked for substitution tokens before being printed to standard output. If any tokens are found, they are replaced by the appropriate session-specific values.

Each line is checked for description, price, and other tokens. The mechanism is very similar in each case. Start with looking at the substitution of simple tokens, which are denoted by a token name surrounded by single backslashes (\sessionid\, for example).

The following steps are involved in replacing this value with the actual session ID value:

A regular-expression match is carried out. The regular expression is /\\(\w+)\\/ and has a trailing g to denote that all such patterns within the string are to be matched.
This pattern looks for a backslash, followed by at least one word character (a letter, digit, or underscore), followed by another backslash. The backslash has special meaning within regular expressions, so it must be escaped by means of a second backslash.
The run of word characters is the token name, which is saved because it is surrounded by parentheses.
All such tokens are saved in the @matches array, because the regular expression takes place in the context of an array assignment.
The foreach clause replaces all matched patterns on the current line with the actual value of the token in the session-state array. This replacement is made by making a regular-expression substitution; the matched token, surrounded by backslashes, is replaced by a value in the %SessionValues associative array. The index into this array is the current token name, $match.

The steps for replacing price and description tokens are quite similar. In the case of price tokens, the pattern match is /\\\\Price\\(\w+)\\/g, which looks for an additional \\Price\ before the token. The replacement operation is similar, too, but the %Price array is used instead of the %SessionValues array. The procedure for descriptions is identical, except for the fact that the %Desc associative array is used.
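The three kinds of substitution can be tried out on a single line of template text. The sketch below is a variation on Listing 9.6, not a copy of it: the FillTokens helper is hypothetical, each kind of token is replaced in one pass with the /ge modifiers instead of a match followed by a foreach loop, and an unknown token is replaced by an empty string (an assumption made here to keep the demonstration free of warnings).

```perl
use strict;
use warnings;

# Sample data; the values are taken from the chapter's own examples.
my %Price         = (feed_driedhusks => '9.95');
my %Desc          = (feed_driedhusks => 'Sun Dried Husks');
my %SessionValues = (sessionid       => '838604689');

# Hypothetical helper: fill in every token on one line of a template.
sub FillTokens {
    my ($line) = @_;

    # Price and description tokens go first; the plain-token pattern
    # would otherwise match the "\Price\" and "\Desc\" prefixes.
    $line =~ s/\\\\Price\\(\w+)\\/exists $Price{$1} ? $Price{$1} : ''/ge;
    $line =~ s/\\\\Desc\\(\w+)\\/exists $Desc{$1} ? $Desc{$1} : ''/ge;

    # Plain tokens such as \sessionid\ come from the session values.
    $line =~ s/\\(\w+)\\/exists $SessionValues{$1} ? $SessionValues{$1} : ''/ge;

    return $line;
}

# A single-quoted here-document keeps every backslash literal.
my $template = <<'END';
<li>\\Desc\feed_driedhusks\: $\\Price\feed_driedhusks\ (session \sessionid\)
END

print FillTokens($template);
```

The order of the three substitutions matters for the same reason that the price and description checks come first in &ShowLoc.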

Initiating a Login

The final subroutine that may be invoked from the main routine is &DoLogin. This subroutine assigns a session ID and displays the login screen, which challenges the user to enter a valid user ID and password. Listing 9.7 shows the source code for &DoLogin.

Listing 9.7 The DoLogin Subroutine


# Subroutine to perform login.



sub DoLogin {



    # Generate a pseudo-random session id:



    $SessId = time . $$;





    # Store this id in its own session file:



    $sessvals{'sessionid'} = $SessId;

    &SetSessValues($SessId, %sessvals);





    # Show the login page



    &ShowLoc( $SessId, "login.htmw" );

}

The code carries out the following three simple steps:

Generates a unique session ID for this session. This ID consists of the system time on the httpd server combined with the process ID of the process that is running the wrapper program.
Stores the session ID in the session values file. This ID can be used at a later stage as a cross-check on the validity of a session file, but this wrapper program does not use it in this way. The value is stored by means of the &SetSessValues function (described in "Storing Session Data" later in this chapter).
Displays the login menu, using the &ShowLoc function (described earlier in this chapter).

The mechanics of initiating and manually terminating a session are explained in the following section.

Logging In and Out

The first time that the user runs wrap.cgi, &DoLogin is invoked and displays the login screen on the user's browser. The user enters a user ID and password and then sends them to the server by submitting the form. Then the wrapper program calls &Validate to authenticate the details provided by the user.

Notice that &DoLogin does no more than initiate the login. After the user fills in the user ID and password and submits the form, the wrapper program is invoked again. At that point, the &Validate function is called to perform the actual authentication of the user.

Logging In

Listing 9.8 shows the HTML file login.htmw.

Listing 9.8 The login.htmw File


<html>

<head>

<title>Camels 'R Us Log in</title>

</head>



<body>

<h1>Camels 'R Us Log in</h1>



You must log in as a registered user before you can use the system.

<p>



<ul>



<li>

Click <a href="http://www.camelsrus.com/register.html">here</a> to register as an on-line customer with Camels 'R Us.

<p>



<li>

If you have already registered, enter your userid and password and click "Log on":



</ul>



<form method="post" action="wrap.cgi">



<input name="sessionid" type="hidden" value="\sessionid\">



<input name="action" type="hidden" value="Validate">



<table>



<tr>

<td>User ID:</td>

<td><input name="userid" type="text" size=20></td>

</tr>



<tr>

<td>Password:</td>

<td><input name="pass" type="password" size=20></td>

</tr>



<tr>

<td></td>

<td><input name="logon" value="Log on" type="submit"></td>

</tr>



</table>



</form>



</body>

</html>

Following are the critical lines of this file:

<form method="post" action="wrap.cgi"> This statement tells the browser what location to request when the form is submitted (wrap.cgi) and to submit its CGI data via the POST method.
<input name="sessionid" type="hidden" value="\sessionid\"> The session ID is inserted into this line by the wrapper program before the browser sees it, so to the browser, the line will look more like the following:
<input name="sessionid" type="hidden" value="838604689"> This statement tells the browser to store the CGI value sessionid=838604689 but not to display it. This value will be sent to the server with the other CGI values when the form is submitted, allowing you to identify the user.
<input name="action" type="hidden" value="Validate"> Another hidden value, action=Validate, is present in this form. The CGI data that goes back to the server instructs wrap.cgi what step to take next: validation of the user ID and password provided by the user.
Finally, the userid and pass fields create the text boxes where the user enters her authentication details.

Validating the User

When the user submits the login form, the resulting CGI data contains two items that are of interest to the wrapper program: the user's session ID and a CGI value called action, which has the value Validate. The &DoAction function sees this value and invokes the &Validate function, which is shown in Listing 9.9.

Listing 9.9 The Validate Subroutine


# Validate: Given a userid and password, check against

# a user database and if valid, show main menu.



sub Validate {



    my ($uid, $pwd) = @_;

    my %userdb;



    # Argument check: both userid and password are required.



    $pwd || return 0;



    # userid/password pairs are stored in the user db file:



    tie(%userdb, 'SDBM_File', ".userdb", Fcntl::O_RDONLY(), 0664) ||

     HTMLError("Unable to open user database ($!).");



    # Success if password given matches password in file.

    # Note check that a password was actually given...



    if ( $pwd ne "" && $userdb{$uid} eq $pwd ) {



     # Add customer name to session data:



     %sessvals = &GetSessValues($SessId);

     $sessvals{'customerid'} = $uid;

     &SetSessValues($SessId, %sessvals);



     # Show the main menu:



     &ShowLoc($SessId, "mainmenu.htmw");

    }

    else {

     &ShowLoc($SessId, "failedlogin.htmw");

    }



    # tidy up:



    untie(%userdb);

}

&Validate takes two arguments, the user ID and password, and attempts to match them with the contents of a DBM file that contains user ID-password pairs by following these steps:

&Validate first determines that a password has been provided.
The tie statement creates a link between an associative array (%userdb) and the DBM file that contains the valid user IDs and passwords. The arguments are:
- The name of the associative array (%userdb).
- The method to be used by Perl to associate external and internal storage. You'll use SDBM_File so that Perl will use the methods defined in SDBM_File.pm to connect the associative array with a DBM file.
- The name of the DBM database in which the user data is stored.
- The file-access mode. You'll use Fcntl::O_RDONLY(), which returns a read-only flag.
- The default file protection for the database.

If this call to tie is successful, the %userdb array serves as an interface to the contents of the DBM file.

If the password field is not empty, it is compared with the password for the specified user ID. If the user does not exist, the match fails. Likewise, if the user exists but the password is not the same as the one in the authentication DBM file, the match fails.
If the match succeeds, the authentication details are valid. The user ID is added to the session file, and the main menu is displayed; the user is logged in.
If the match fails, the file failedlogin.htmw is displayed. This file explains what happens and allows the user to try logging in again.
Finally, the untie command breaks the connection between the %userdb array and the DBM file.
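The lookup at the heart of &Validate can be exercised on its own. The following sketch rests on several assumptions: a temporary database with one made-up account stands in for the real .userdb file, and the CheckPassword helper is hypothetical, returning 1 or 0 where the wrapper would display a page. It shows that an unknown user and a wrong password both fail the match.

```perl
use strict;
use warnings;
use Fcntl;                     # exports O_RDONLY, O_RDWR, and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Assumption: a temporary database with one made-up account stands in
# for the real .userdb file.
my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/userdb";

my %seed;
tie(%seed, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0664)
    or die "Unable to create user database: $!";
$seed{'ali'} = 'sesame';
untie(%seed);

# Hypothetical helper: returns 1 if the pair matches, 0 otherwise.
sub CheckPassword {
    my ($file, $uid, $pwd) = @_;
    return 0 unless defined $uid && defined $pwd && $pwd ne '';

    my %userdb;
    tie(%userdb, 'SDBM_File', $file, O_RDONLY, 0664)
        or die "Unable to open user database: $!";
    my $stored = $userdb{$uid};      # undefined for an unknown user
    untie(%userdb);

    return (defined $stored && $stored eq $pwd) ? 1 : 0;
}

print CheckPassword($dbfile, 'ali',    'sesame'), "\n";
print CheckPassword($dbfile, 'ali',    'wrong'),  "\n";
print CheckPassword($dbfile, 'nobody', 'sesame'), "\n";
```

The explicit defined test covers the unknown-user case; without it, the comparison would be made against an undefined value.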

Logging Out

Logging out is much simpler than logging in. If the user clicks a submit button called action, with a value of Log Out, the wrapper script's &DoLogout function is called by &DoAction. Listing 9.10 shows the code for &DoLogout.

Listing 9.10 The DoLogout Subroutine


# Perform a logout. Deletes session file and shows log off screen.



sub DoLogout {



    my ($sessionid) = @_;





    # zap the session file: two parts, *.pag and *.dir

    # taint checking => need to save file name via a pattern match:



    $sessionid =~/(\w+)/;

    unlink("./log/$1.DB.pag", "./log/$1.DB.dir");





    # show the farewell screen:



    print "Content-type: text/html\n\n",

    "<html><head>",

    "<title>End of session</title>",

    "</head>",

    "<body>",

    "<h1>Session Terminated</h1>",

    "You have logged out from the Camels 'R Us Web ordering system.<p>",

    "<a href=\"wrap.cgi\">Call again</a> soon!<p>",

    "</body></html>";

}

This subroutine performs two simple steps: it deletes the DBM files associated with the session, and it displays a farewell message. The latter task is simple, but the former is complicated somewhat by the fact that you have turned on Perl's taint checking by using the -T option in the command line.

There are, in fact, two DBM files for each session: one with a .pag extension and one with a .dir extension. Given a session ID stored in the Perl $sessionid variable, the most direct way to delete these two files is to pass them as a literal string to the unlink function, as follows:


unlink("./log/$sessionid.DB.pag", "./log/$sessionid.DB.dir");

This statement fails, however. Perl can see that $sessionid came from outside the program (here, from the CGI data submitted by the browser) and is, therefore, not to be trusted. In this instance, a hacked session ID value might result in the deletion of arbitrary files.

You need to extract the value contained in $sessionid to another variable that Perl does not regard as being tainted. Simply assigning $sessionid to a new variable does not work; Perl will see that the new variable is tainted by such close association with the old one.

Instead, perform a pattern match on $sessionid, looking for all alphanumeric characters and saving the result, as follows:


$sessionid =~ /(\w+)/;

The expression /(\w+)/ tells Perl to match the first run of word characters (letters, digits, and underscores) in $sessionid and store them. This stored value, $1, is then used in the arguments to the unlink command.

This method works because Perl assumes that you know what you are doing when you save the results of a pattern match. The assumption is based on the fact that you got hold of the tainted variable and extracted something from it in a very specific way. Because \w cannot match characters such as / or ., a suspect value such as ../../etc/passwd cannot survive this pattern match intact.
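As a minimal sketch of a stricter variant (not the chapter's code), the pattern can be anchored so that a session ID containing anything other than word characters is rejected outright rather than partially extracted. The helper name untaint_session_id is hypothetical, introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an untainted copy of the session ID,
# or undef if the value contains anything other than word characters.
sub untaint_session_id {
    my ($raw) = @_;
    # \A and \z anchor the match to the whole string, so a value such as
    # "abc/../etc" is refused instead of being trimmed down to "abc".
    return unless defined $raw && $raw =~ /\A(\w+)\z/;
    return $1;
}

for my $candidate ('abc123', '../../etc/passwd', '') {
    my $clean = untaint_session_id($candidate);
    print defined $clean
        ? "accepted: $clean\n"
        : "rejected: '$candidate'\n";
}
```

Whether to extract (as the chapter does) or to reject (as here) is a design choice; rejecting makes it harder for a malformed ID to map silently onto some other session's files.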

Managing Session Data

Now that you have come this far, the management of session data becomes relatively simple. You use associative arrays to store the session data internally, and you use tied hashes to associate these arrays with DBM files for external storage. You've already seen how to use DBM files for user ID-password pairs; the principle is identical for session data.

Storing Session Data

The current session data is stored by calling the &SetSessValues subroutine. Listing 9.11 shows the code for &SetSessValues.

Listing 9.11 The SetSessValues Subroutine


# Store values for a given session id
# Takes an associative array as argument, saves to session file

sub SetSessValues {
    my ($Sessionid, %DBMdb) = @_;
    my %tiedDB;

    # Open the session file and set values:
    tie(%tiedDB, 'SDBM_File', "./log/$Sessionid.DB",
        Fcntl::O_RDWR()|Fcntl::O_CREAT(), 0644) ||
        HTMLError("Unable to open session file for sessionid ",
                  $Sessionid, " for writing ($!).");

    # Set the values in the DB to values passed as argument:
    %tiedDB = %DBMdb;

    # Store the new values:
    untie(%tiedDB);
}

The code does the following things:

Passes the user's session ID and the current session state as arguments to the function.
Using the tie statement, creates the relationship between an associative array (%tiedDB) and the session file. The arguments are:
- The name of the associative array (%tiedDB).
- The method to be used by Perl to associate external and internal storage. You use SDBM_File, just as you did for the user-authentication database.
- The name of the DBM database in which the session data is to be stored.
- The file access mode. You use a bitwise OR combination of Fcntl::O_RDWR() and Fcntl::O_CREAT(), which are functions that return file access flags. The flags used here indicate that the file is to be opened in read/write mode and created if it does not already exist.
- The default file protection for the database.
  If this call to tie is successful, the %tiedDB array serves as an interface to the contents of the DBM database. Making a change in %tiedDB has the same effect as making the same change directly in the DBM file.
Copies the entire contents of the session state, represented by %DBMdb, to the tied array (%tiedDB).
Closes the DBM database and breaks %tiedDB's connection with it by calling the untie function. The contents of %tiedDB are written in full to the DBM file at this point.

That's the beauty of using tied hash arrays; they look after all the storage implementation details for you. Simply assign a normal associative array to a tied hash array, and you've stored the contents of the normal array.
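One consequence of the whole-hash assignment is worth seeing in isolation: assigning a list to a tied hash first clears it, so keys that are absent from the new data disappear from the DBM file. The following is a minimal sketch, assuming a scratch directory created with File::Temp instead of the chapter's ./log directory; the customer ID and the Order_tack_saddle key are made up for the illustration.

```perl
use strict;
use warnings;
use Fcntl;                     # exports O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Scratch directory (an assumption for this sketch; removed on exit).
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/demo.DB";

tie(my %tied, 'SDBM_File', $base, O_RDWR | O_CREAT, 0644)
    or die "Cannot tie $base: $!";

# First save: two keys go into the DBM file.
%tied = (customerid => 'jbloggs', Order_feed_driedhusks => 3);

# Second save: the assignment clears the tied hash before storing,
# so Order_feed_driedhusks is no longer present afterwards.
%tied = (customerid => 'jbloggs', Order_tack_saddle => 1);

my @keys = sort keys %tied;
untie %tied;

print "keys after second assignment: @keys\n";
```

This is why a caller that wants to add to the session state has to read the existing values first, merge in the new ones, and then store the combined hash.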

Retrieving Session Data

The principle for retrieving session data that has already been stored to a DBM file is analogous. You can retrieve the session state for a given session ID from DBM storage by using the &GetSessValues function, the code for which appears in Listing 9.12.

Listing 9.12 The GetSessValues Subroutine


# Retrieve session values for a given session ID
# Return them as an associative array

sub GetSessValues {
    my ($Sessionid) = @_;
    # Parentheses are required to declare both hashes with my:
    my (%DBMdb, %returnvalue);

    # No session file, no values so just return.
    return unless -e "./log/$Sessionid.DB.pag";

    # Open the session file and get values:
    tie(%DBMdb, 'SDBM_File', "./log/$Sessionid.DB", Fcntl::O_RDONLY(), 0664) ||
        HTMLError("Unable to open session file for sessionid ",
                  $Sessionid, " for reading ($!).");

    # Save the array before closing the file:
    %returnvalue = %DBMdb;
    untie %DBMdb;

    # Pass the associative array back to the calling routine:
    return %returnvalue;
}

All the action in this code is contained in the tie and untie statements; the rest is error checking. The following steps show how &GetSessValues works:

The user's session ID is passed in as the sole argument to the function.
If no session file exists for the supplied session ID, the function simply returns control to the calling function.
The tie statement creates the relationship between an associative array (%DBMdb) and the session file. The arguments are:
- The name of the associative array (%DBMdb).
- The method to be used by Perl to associate external and internal storage. You use SDBM_File again.
- The name of the DBM database in which the session data is stored.
- The file access mode. You use Fcntl::O_RDONLY(), which is a function within the Fcntl package that returns a read-only flag.
- The default file protection for the database.
  If this call to tie is successful, the %DBMdb array behaves as though it contains all the values stored in the associated DBM file.
Next, the code copies the entire contents of %DBMdb into an array called %returnvalue, effectively making a local copy of the entire database.
The code closes the DBM database and breaks %DBMdb's connection with it by calling the untie function. The contents of %DBMdb are undefined after the code closes the DBM file by means of the untie function, which is why you needed to make the local copy of the database in %returnvalue before calling untie.
Finally, the code passes the contents of %returnvalue back to the calling function.

Again, the tied hash looks after the storage implementation details for you. These two functions allow you to store and retrieve an entire set of session data quite easily.
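To see the two halves working together, here is a minimal round-trip sketch. It is not the chapter's code: the names save_session and load_session are hypothetical, errors are reported with die rather than HTMLError, and the files live in a scratch directory created with File::Temp.

```perl
use strict;
use warnings;
use Fcntl;                     # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;
use File::Temp qw(tempdir);

# Scratch directory standing in for ./log (an assumption for this sketch).
my $dir = tempdir(CLEANUP => 1);

# Hypothetical counterpart of &SetSessValues.
sub save_session {
    my ($id, %values) = @_;
    tie(my %db, 'SDBM_File', "$dir/$id.DB", O_RDWR | O_CREAT, 0644)
        or die "Cannot open session $id for writing: $!";
    %db = %values;
    untie %db;
    return;
}

# Hypothetical counterpart of &GetSessValues.
sub load_session {
    my ($id) = @_;
    # No session file: return an empty list, which yields an empty hash.
    return unless -e "$dir/$id.DB.pag";
    tie(my %db, 'SDBM_File', "$dir/$id.DB", O_RDONLY, 0644)
        or die "Cannot open session $id for reading: $!";
    my %copy = %db;            # copy before untie, as in Listing 9.12
    untie %db;
    return %copy;
}

save_session('abc123', customerid => 'jbloggs', Order_feed_driedhusks => 3);

my %state = load_session('abc123');
print "$_ = $state{$_}\n" for sort keys %state;

my %missing = load_session('nosuchsession');
print "unknown session has ", scalar(keys %missing), " values\n";
```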

Managing the Orders

You now have the necessary infrastructure to carry out the core business of this application, which is to give the user an interface to an ordering system. You need to allow the user to build an order in stages during the course of a session; review that order at any stage; cancel the entire order, if desired; and confirm the order, at which point the order will be written to permanent storage.

Building an Order

A user builds an order by using the three order forms shown in figures 9.3, 9.4, and 9.5 (refer to "Program Flow" earlier in this chapter). These forms work in the same way, so this section focuses on only one: the Feeds form. The source for the form is stored in feeds.htm. The relevant lines for the first product are as follows, with the other products being set up in an identical fashion:

<form method="post" action="wrap.cgi">
The form statement tells the browser to send its CGI data to the server by using the POST method and to request the location wrap.cgi when the data is returned.

<input name="sessionid" type="hidden" value="\sessionid\">
The session ID is written to this line before the browser sees the form. Just as in the case of the login menu, this line ensures that the session ID is contained in the form as a CGI value, so that the browser can pass it back to the server with the rest of the CGI data.

<td>\\Desc\feed_driedhusks\</td>
The first data cell in the table contains the token \\Desc\feed_driedhusks\, which will be replaced in the &ShowLoc function by the current value of $Desc{'feed_driedhusks'}.

<td align=right>$\\Price\feed_driedhusks\</td>
Similarly, \\Price\feed_driedhusks\ is replaced by $Price{'feed_driedhusks'}.

<td><input name="Order_feed_driedhusks" type="text" value="\Order_feed_driedhusks\" size=10></td>
This line appears in the browser with the final token filled in. If the number of items of this type that have been ordered so far is 3, the line appears as follows:

<td><input name="Order_feed_driedhusks" type="text" value="3" size=10></td>
This line gives the text input field for this item an initial value of 3.

<input name="action" type="submit" value="Add to Order">
The submit button labeled Add to Order is called action. If the user clicks this button, a CGI value of action=Add+to+Order is sent to the server. This value is trapped by &DoAction, and the appropriate function is called.
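The token replacement described above can be pictured with a short sketch. This is not the chapter's &ShowLoc; it is a hypothetical fill_tokens function that assumes the token syntax is exactly as printed in the form lines (\name\ for a session value, \\Table\key\ for a lookup in a named hash). The description and price values are made up.

```perl
use strict;
use warnings;

# Made-up product data for this sketch.
my %Desc   = (feed_driedhusks => 'Dried husks');
my %Price  = (feed_driedhusks => '2.50');
my %tables = (Desc => \%Desc, Price => \%Price);

# Hypothetical helper: look up \\Table\key\ and return '' if unknown.
sub lookup_table {
    my ($table, $key) = @_;
    return '' unless exists $tables{$table};
    my $value = $tables{$table}{$key};
    return defined $value ? $value : '';
}

# Hypothetical stand-in for the substitution step of &ShowLoc.
sub fill_tokens {
    my ($line, $session) = @_;
    # Two-backslash tokens first, so they are not mistaken for \name\ tokens.
    $line =~ s{\\\\(\w+)\\(\w+)\\}{ lookup_table($1, $2) }ge;
    # Then single-backslash tokens, filled from the session values.
    $line =~ s{\\(\w+)\\}{ defined $session->{$1} ? $session->{$1} : '' }ge;
    return $line;
}

# A single-quoted heredoc keeps every backslash exactly as written.
my $template = <<'END_TEMPLATE';
<td>\\Desc\feed_driedhusks\</td>
<td align=right>$\\Price\feed_driedhusks\</td>
<td><input name="Order_feed_driedhusks" type="text" value="\Order_feed_driedhusks\" size=10></td>
END_TEMPLATE

my %session = (sessionid => 'abc123', Order_feed_driedhusks => 3);
print fill_tokens($_, \%session), "\n" for split /\n/, $template;
```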

After filling in the desired quantity of each product, the user clicks the Add to Order submit button. A set of CGI data goes back to wrap.cgi, containing an action value that is caught by &DoAction and that in turn invokes the &AddToOrder function.

&AddToOrder takes two parameters: the user's session ID and the associative array of CGI values. Notice that these values are the CGI values, not the session values. You want to extract some of the CGI information and discard the rest; the data that you extract will be saved with the session data for later use.

Listing 9.13 shows the code for the &AddToOrder function.

Listing 9.13 The AddToOrder Subroutine


# Given the cgi values from a form, add fields starting
# with "Order_" to the order for the current session.

sub AddToOrder {
    my ($SessId, %cgivals) = @_;
    my %state;

    # Get current session state first:
    %state = &GetSessValues($SessId);

    # Add order items and quantities to state:
    foreach my $item (keys %cgivals) {
        if ( $item =~ /^Order_/ && $cgivals{$item} ) {
            $state{$item} = $cgivals{$item};
        }
    }

    # Save state after adding order:
    &SetSessValues($SessId, %state);

    # Now drop back to main menu:
    &ShowLoc($SessId, "mainmenu.htmw");
}

This code takes the following actions:

The current session values are retrieved from the DBM file by means of &GetSessValues and stored in %state.
Each item in the %cgivals array is checked. If an item's name begins with Order_ and its value is neither empty nor zero, it is an order and is saved in the %state array. If one of the CGI values is Order_feed_driedhusks=4, for example, $state{'Order_feed_driedhusks'} is set to 4.
The updated %state array is saved back to the DBM file.
The main menu is displayed again, allowing the user to continue building the order, review it, or commit it.
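The filtering step can be tried on its own with plain hashes. The following is a minimal sketch, assuming a hypothetical merge_order_items function and a made-up second product (feed_oats); it leaves out the DBM storage and the call to &ShowLoc.

```perl
use strict;
use warnings;

# Hypothetical helper: copy non-empty Order_ fields from the CGI values
# into the session state, ignoring every other CGI field.
sub merge_order_items {
    my ($state, $cgivals) = @_;
    for my $item (keys %$cgivals) {
        next unless $item =~ /^Order_/ && $cgivals->{$item};
        $state->{$item} = $cgivals->{$item};
    }
    return $state;
}

# Existing session state (made-up values).
my %state = (customerid => 'jbloggs', Order_feed_oats => 2);

# CGI values as they might arrive from the Feeds form.
my %cgivals = (
    sessionid             => 'abc123',
    action                => 'Add to Order',
    Order_feed_driedhusks => 4,
    Order_feed_oats       => 0,
);

merge_order_items(\%state, \%cgivals);
print "$_ = $state{$_}\n" for sort keys %state;
```

Note that with this logic a quantity of 0 (or an empty field) is skipped rather than stored, so the earlier quantity of 2 for feed_oats stays in the session. A production version would probably also want to check that each quantity is a sensible number.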

Reviewing the Order

It is reasonable to expect that the user may want to review the order before confirming it. She can do so by selecting Review Order from any of the menus. This option passes a CGI value of action=Review+Order to the wrapper script. This value is trapped by &DoAction, causing &ReviewOrder to be invoked.

Listing 9.14 shows the code for &ReviewOrder.

Listing 9.14 The ReviewOrder Subroutine


# Review the order for the current session

sub ReviewOrder {
    my ($sessionid) = @_;
    my %state = GetSessValues($sessionid);

    # Use &ShowLoc to display start and end parts of form:
    # We'll build the list manually in this subroutine.

    # Print the form up to the start of the list:
    &ShowLoc($sessionid, "review_head.htmw");

    # Show the current order in a table:
    print "<table border=2>",
        "<tr>",
        "<th>Item</th>",
        "<th>Unit price</th>",
        "<th>Number Ordered</th>",
        "<th>Total Price</th>",
        "</tr>";

    # Keep a running total of price as we go
    my $grand_total = 0;
    foreach my $item ( keys %state ) {

        # If it starts with "Order_", it's an order.
        if ( $item =~ /^Order_(\w+)/ ) {
            my $thisprice = $state{$item} * $Price{$1};
            print "<tr>",
                "<td align=left>$Desc{$1}</td>",
                "<td align=right>\$$Price{$1}</td>",
                "<td align=right>$state{$item}</td>",
                "<td align=right>\$$thisprice</td>",
                "</tr>\n";
            $grand_total += $thisprice;
        }
    }

    print "</table><p>";

    print "Total cost this order: \$$grand_total. ",
        "Residents of Ireland please add 21\% sales tax.";

    # Now show the rest of the form:
    &ShowLoc($sessionid, "review_tail.htmw");
}

This code builds an HTML table that shows the current order details, one item at a time. To create this table, the code follows these steps:

Gets the current session state and stores it in %state.
Calls &ShowLoc to display the header part of this page. This header does not vary from one invocation to the next, so it is stored in an HTML template file.
Prints the table header. The columns are Item, Unit Price, Number Ordered, and Total Price.
Checks each item in the %state array and, if the key starts with Order_, prints the details for that item.
Finishes the table.
Calls &ShowLoc to display the standard footer for this page.

You need to look closely at the code that displays the order information for a given item. Notice first that the regular-expression match that determines whether the item is an order item stores the text after Order_. This backreference is available as $1 after the match takes place. If the item's key is Order_feed_driedhusks, for example, $1 will be feed_driedhusks. You need to store this backreference so that you can reference values in the %Price and %Desc arrays.

For each item, &ReviewOrder does the following:

Multiplies the number of items ordered ($state{$item}) by the unit price of this item ($Price{$1}). The result is stored in $thisprice.
Prints an HTML table cell that contains the product description: $Desc{$1}.
Prints the unit price of this product: $Price{$1}. The \$ before $Price produces a real dollar sign on-screen.
Prints the number of these items ordered: $state{$item}.
Prints the total price for this product, in dollars.
Keeps a running tally of the grand-total price for this order. This total is printed below the table.

Figure 9.6, earlier in this chapter, shows an example of the resulting table.
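The per-item arithmetic can be checked outside the CGI environment. The following is a minimal sketch, assuming made-up descriptions and prices and a hypothetical order_lines function; it produces plain text lines rather than HTML and, unlike Listing 9.14, formats amounts with sprintf so that they always show two decimal places.

```perl
use strict;
use warnings;

# Made-up product data for this sketch.
my %Desc  = (feed_driedhusks => 'Dried husks', feed_oats => 'Rolled oats');
my %Price = (feed_driedhusks => 2.50,          feed_oats => 4.00);

# Hypothetical helper: returns the grand total followed by one
# formatted line per order item found in the session state.
sub order_lines {
    my ($state) = @_;
    my $grand_total = 0;
    my @lines;
    for my $item (sort keys %$state) {
        # The capture after Order_ is the key into %Desc and %Price.
        next unless $item =~ /^Order_(\w+)/;
        my $product   = $1;
        my $thisprice = $state->{$item} * $Price{$product};
        push @lines, sprintf("%s: %d at \$%.2f = \$%.2f",
            $Desc{$product}, $state->{$item}, $Price{$product}, $thisprice);
        $grand_total += $thisprice;
    }
    return ($grand_total, @lines);
}

my %state = (
    customerid            => 'jbloggs',
    Order_feed_driedhusks => 3,
    Order_feed_oats       => 2,
);

my ($grand_total, @lines) = order_lines(\%state);
print "$_\n" for @lines;
printf "Total cost this order: \$%.2f\n", $grand_total;
```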

Placing the Order

Finally, the order that you have so carefully built must be confirmed by the user and written to a file. Order confirmation is triggered when the user clicks one of the many Confirm Order buttons that you have helpfully scattered around the various forms. The CGI data that arrives back at wrap.cgi then contains the setting action=Confirm+Order, which is caught by &DoAction; then &ConfirmOrder is invoked.

Listing 9.15 shows the source code for &ConfirmOrder.

Listing 9.15 The ConfirmOrder Subroutine


# Confirm the order and write it to file.

sub ConfirmOrder {
    my( $sessionid ) = @_;

    my %state = GetSessValues($sessionid);

    # Write a record to the orders file:
    open(ORDFILE, ">>./orders.dat") ||
        &HTMLError("Unable to open orders file for appending ($!).");

    # Print a header line for this order:
    print ORDFILE "Order for customer $state{'customerid'} at ",
                  scalar(localtime(time)), ":\n";

    # Each order item. Iterate over the keys only; looping over
    # %state itself would visit the values as well:
    foreach my $item ( keys %state ) {
        $item =~ /^Order_(\w+)/ &&
            print ORDFILE "$1 ($state{$item});\n";
    }

    # Finish:
    print ORDFILE "End of order for customer $state{'customerid'}.\n";
    close ORDFILE;

    # Inform the user:
    &ShowLoc($sessionid, "confirm.htmw");
}

&ConfirmOrder does the following things:

Retrieves the current session values from DBM storage into the %state associative array.
Opens the orders file (orders.dat) in append mode.
Prints a header line for this order in the orders file. This line contains the customer ID ($state{'customerid'}) and the current time.
Recognizes any state item that begins with Order_ as an order item. The product name and number of items ordered are recorded in the orders file for each item.
Closes the orders file.
Notifies the user that the order has been accepted.
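Because several CGI processes may confirm orders at the same moment, it can be worth locking the orders file while a record is appended. The following is a minimal sketch of the append step with that addition. It assumes a hypothetical append_order function, a lexical filehandle in place of ORDFILE, and a scratch file created with File::Temp; the flock call is not part of Listing 9.15.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);          # exports LOCK_EX
use File::Temp qw(tempdir);

# Hypothetical helper: append one order record to the file at $path.
sub append_order {
    my ($path, $state) = @_;
    open(my $fh, '>>', $path) or die "Cannot append to $path: $!";
    # Hold an exclusive lock so that concurrent records do not interleave.
    flock($fh, LOCK_EX) or die "Cannot lock $path: $!";

    print {$fh} "Order for customer $state->{customerid} at ",
                scalar(localtime), ":\n";
    for my $item (sort keys %$state) {
        print {$fh} "$1 ($state->{$item});\n" if $item =~ /^Order_(\w+)/;
    }
    print {$fh} "End of order for customer $state->{customerid}.\n";

    # Closing the handle flushes the data and releases the lock.
    close($fh) or die "Cannot close $path: $!";
    return;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/orders.dat";

append_order($path, {
    customerid            => 'jbloggs',
    Order_feed_driedhusks => 3,
    Order_feed_oats       => 2,
});

open(my $in, '<', $path) or die "Cannot read $path: $!";
print while <$in>;
close $in;
```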

Wrapping Up

The example wrapper application shown in this chapter, while primitive, is functional. You could easily develop this application into a practical package. Among the issues that need to be addressed to make this application production-ready are:

The user authentication used in this example is for illustrative purposes only. If you have authority to control httpd user authentication on your server, you probably should delegate the responsibility for user authentication to the httpd. You can create user databases, using standard tools, and know that you are benefiting from years of development of secure user authentication technology.
You need a regular procedure for clearing orphaned session files (files that remain on disk after a session is abandoned without the user's explicitly logging out). A simple Perl script run as a cron job should suffice.
The entire system, as it stands, will take orders but not process them. A real system will feed directly into an ordering database, so that orders are processed the same way as orders that are taken by telephone or any other medium.
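As an illustration of the second point, a cleanup script for orphaned session files might look something like the following. This is only a sketch under stated assumptions: the function name purge_stale_sessions is hypothetical, a session counts as abandoned when its files have not been modified for a given number of seconds, and the demonstration runs against a scratch directory rather than ./log.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: delete session files in $dir that were last
# modified more than $max_age seconds ago. Returns the names removed.
sub purge_stale_sessions {
    my ($dir, $max_age) = @_;
    my $cutoff = time - $max_age;
    my @removed;

    opendir(my $dh, $dir) or die "Cannot read $dir: $!";
    for my $name (readdir $dh) {
        # Only touch files that look like session DBM files.
        next unless $name =~ /\A\w+\.DB\.(?:pag|dir)\z/;
        my $path  = "$dir/$name";
        my $mtime = (stat $path)[9];
        next unless defined $mtime && $mtime < $cutoff;
        push @removed, $name if unlink $path;
    }
    closedir $dh;

    my @sorted = sort @removed;
    return @sorted;
}

# Demonstration: two stale files and one fresh file.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(old.DB.pag old.DB.dir fresh.DB.pag)) {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close $fh;
}
my $two_days_ago = time - 2 * 24 * 60 * 60;
utime($two_days_ago, $two_days_ago, "$dir/old.DB.pag", "$dir/old.DB.dir");

my @removed = purge_stale_sessions($dir, 24 * 60 * 60);
print "removed: @removed\n";
```

Run from cron, such a script would be pointed at the real session directory with whatever age limit suits the application.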

From Here...

You can learn more about the issues raised in this chapter by reading the following chapters:

Chapter 1 "Perl Overview," provides more information on some of the Perl syntax used in this chapter.
Chapter 2 "Introduction to CGI," provides more information about passing CGI values between browser and server.
Chapter 8 "Understanding Basic User Authentication," provides background material on user authentication and Web security.