Chapter 12 Database Application Using CGI

by Matthew D. Healy

CONTENTS

Fundamental Design and Security Issues
A Simple Working Example in Perl
Perl Tools for Web/DBMS Work
Problem-Solving
- Debugging
- Tuning Performance
The Future of Web/Database Interfaces
From Here...

Although an increasing proliferation of commercial tools for World Wide Web/database integration exist, there still is a place for custom-written Perl CGI scripts in database work. By using CGI scripts to access an existing internal SQL database, you can provide controlled access for outside users without having to redesign the database itself and without purchasing any proprietary software. As a bonus, your WWW front end will automatically be platform- and location-independent: All that your users need is a WWW browser. Building a good Web/database application is not easy, but the results can be well worth the effort.

Fundamental Design and Security Issues

Figures 12.1 and 12.2 show how information flows between the various programs that together constitute a complete Web/database application. Each layer hides many internal details from the other layers and presents the layers above and below it with standardized protocols. This information-hiding is the great advantage that a Web front end has over conventional client/server systems; the numerous, widely distributed users need nothing but standard Web browsers on their computers. The Web server and the database server could reside on one machine, or they could reside on different machines, in which case your CGI program talks over a network to the database server.

Figure 12.1 : This schematic of Web/database interaction shows how information flows between the programs that make up the application when the database server and Web server are on the same machine.

Figure 12.2 : This schematic shows how information flows between the programs that make up the application when the database server and the Web server are on different machines.

More complex possibilities exist. One application running at my site talks to several database servers, integrating information from a local database with several remote databases. Indeed, users may not even realize how many computers are cooperating to answer their queries; with a well-written database application, all a user needs to know is what information to request. Your job as a Web/database developer is to build tools that allow users to find information without their knowing or caring where the information is actually stored. Security access controls, although often important, should be as unobtrusive as possible, especially for users who are accustomed to the wide-open Internet.

Advantages and Disadvantages of a Web/RDBMS Interface

Building a Web interface to an existing relational database management system, or RDBMS, isn't simple. You need to be familiar with both your relational database and CGI scripting to plan their integration.

You may also need to do much more "roll-your-own" work than you would with a conventional client/server design, because you need to polish the rough edges off the available tools for Web gateways to relational databases. Many of the most powerful products are freeware or shareware, and all the commercial products are very new. Just about every database vendor now has some kind of Web offering, but most products are either in beta or just barely out of beta. In short, if you do Web/database development, you'll be on the cutting edge.

So why do it? For one thing, the Web is a hot field, so you may feel a compelling desire to be in the vanguard. You may also see the neat tricks that other people do at their Web sites and wonder how you can duplicate such effects-especially if the WWW site doing the neat tricks belongs to a competitor.

I believe that some very good technical reasons exist for putting databases on the Web. Your existing relational database serves as a robust repository for your data, with all the maintenance and security features that you have come to expect from your RDBMS, and the WWW front end provides the open access and user-interface flexibility that your users have come to expect from desktop applications.

I won't try to teach relational design in this chapter, because that's beyond the scope of this book. Many excellent books on relational database design are available; I recommend practically anything by C.J. Date, David McGoveran, Joe Celko, George Tillman, and Andrew Warden (pseudonym for Hugh Darwen; some of his works have appeared under both names at different times). Any large technical library or bookstore should have works by most of these authors. At a more advanced level, the writings of E.F. Codd, who invented the relational model, will reward prolonged study. If more RDBMS programmers had read Codd's works with care, the RDBMS implementations that are now on the market would be far better.

After reading what these people have to say, I hope that you'll share my passionate conviction that the relational model is both an elegant mathematical formalism and a powerful tool for solving real-world database problems in a manner that no other approach can match. As is true of any type of front end, no CGI application can replace a good database design, but you probably know that already. End of sermon.

SQL and Its Dialects

SQL has become the lingua franca of relational database work. Starting at IBM in the mid-1970s as a research project called SEQUEL, SQL has grown in popularity over the intervening decades. Today, SQL is by far the most important database query language, with probably hundreds of implementations in use. ANSI has published several official standards (in 1986, 1989, and 1992) and is now working on a new version, informally known as SQL3.

Though official SQL standards exist, few implementations follow them strictly. Every implementation that derives from one or another ANSI-standard dialect of SQL has its own extra features that are deemed useful by those who built it-a situation that creates a big mess. Fortunately, in a typical CGI application, you'll mostly use a rather small subset of SQL that appears (in nearly the same form) in all commonly used dialects of the language.

Limitations of HTTP in a Database Context

With a CGI front end to an RDBMS, you have multiple programs cooperating to accomplish a task. The remote browser is a client to your HTTP server, but the CGI script launched by the HTTP server is itself a client to the RDBMS server. Furthermore, the HTTP protocols are stateless-the browser connects to the server, sends one query, and waits for the reply. After the server sends its reply, it closes the connection. There's no concept of a current connection, so there's no simple mechanism for keeping client-state information around on the server.

For most searches, you can get around the statelessness of the server by keeping all client-state information on the client. Suppose that a user has just performed a search that returned the list of employee records shown in figure 12.3.

Figure 12.3 : In this sample query-results screen, each item in the results list is a hot link that performs another search.

Encoded in the URL behind each hot link in a list like that shown in figure 12.3 is sufficient information for the CGI script to perform a database lookup that will return more details to the user. This technique-sending program-generated HTML that contains enough state information to perform the next search-is known as dynamic HTML. The URLs listed in a search-results screen, such as the one in figure 12.3, typically look something like the following:


<a href=../../../cgi-bin/EmpSrch_id=503">Tom.html     Anderson</a>

<a href=../../../cgi-bin/EmpSrch_id=229">Mike.html    Johnson</a>

<a href=../../../cgi-bin/EmpSrch_id=507">Steve.html   Jones</a>

<a href=../../../cgi-bin/EmpSrch_id=917">Sarah.html   King</a>

<a href=../../../cgi-bin/EmpSrch_id=467">Susan.html   Moore</a>

<a href=../../../cgi-bin/EmpSrch_id=327">John.html    Wang</a>

id is the primary key in the employees table that is to be searched. (By primary key, I mean that the database designer has set things up so that a given key value uniquely identifies a single record.)

How does this allow the database server to display a detail view without any information beyond that in the URL? You may recall from Chapter 3 "Advanced Form Processing and Data Storage," that when an URL contains a question mark, the part that follows the question mark is passed to the CGI program as the environment variable QUERY_STRING. The SQL code generated in response to a QUERY_STRING that contains a primary key value of 503, as in the first sample URL in this section, might look something like Listing 12.1.

Listing 12.1 Sample SQL Query Generated by Clicking a Hot Link


select

     employees.first_name,

     employees.last_name

     employees.salary,

     employees.startdate

     depts.name,

     depts.locid

     emp_dept.empid,

     emp_dept.deptid

from

     employees,depts,emp_dept

where

     employees.id = 503

     and depts.id = emp_dept.deptid

     and employees.id = emp_dept.empid

TIP

Look at the bottom of any Yahoo (http://www.yahoo.com) search-results screen for an elegant example of how URLs can pass information from query to query. The links below the heading "Other Search Engines" automatically search other databases.

When you're updating a database, the use of a CGI interface presents some serious difficulties if more than one user is allowed to make changes to the database. If you aren't careful, the following sequence of events may occur:

User 1 downloads the form to edit a record.
User 2 downloads the same record for editing.
User 1 submits an edited version of the record.
User 2 submits an edited version of the same record.

In a conventional client/server database system, each user maintains an active connection with session-specific state information. If a user downloads a record for editing, the server can keep that record locked until the user who downloaded that record for editing either submits the changes or cancels the editing. Any other user who tries to change the locked record will be told that it's locked by the user who is editing it. With a stateless HTTP server, there's no concept of a current connection; thus, there is no simple mechanism for locking a record.

TIP

One way to handle updates in a CGI program is to place a time-stamp field in each record. You must update this field every time the record changes. In your editing forms, include hidden time-stamp fields so that the CGI program can detect conflicting updates.

Another significant limitation of HTML forms for database work is the lack of field-level validation. A conventional client/server database typically allows the designer to specify various constraints for each field-this one must be an integer between 5 and 99, that one must be in telephone-number format, and so on. When a user tries to type a name in a phone-number field, for example, the program immediately beeps and displays an error message. But an HTML form is sent to the server all at once when the user clicks the submit button, so your script must handle the errors all at once. As you'll see near the end of this chapter, new technologies such as Java and JavaScript can remove this limitation.

Security Issues

Remember that any CGI script is being executed on the same machine as your HTTP server to fulfill a request from an untrusted client browser. Although this situation is true of any CGI script, with a CGI RDBMS application, your script is itself a trusted client to your RDBMS server. Accordingly, you must be even more careful about what you allow users to do through your CGI interface than you do when you write other types of CGI programs.

I said that your CGI program is a trusted client to your RDBMS server. How does the RDBMS server know that your script is to be trusted? Database servers commonly use two mechanisms to authenticate client programs. Both of these mechanisms have important security implications.

One approach is for the database server to implement its own user name and password system, independent of your operating system's or Web server's user names and passwords. In this case, your program source code must contain a database password, which it transmits every time that it tries to connect. When you use this approach, you must be careful to prevent strangers from seeing your actual program code.

Also, to limit the damage in case someone does manage to see the password contained in your program, you should create a user account on your database server, with only the access rights needed for your CGI program to function, and use that account in all your CGI programs. If most of your CGI programs perform only searching, and if only one updates the database, only that program should have update rights in the database.

CAUTION

A particular trap in the CGI-database context is the use of file extensions to tell your HTTP server which files are CGI executables and which are documents. If your text editor makes a backup copy in the same directory with a different extension (such as NAME.BAK or NAME.CGI÷), a wily cracker might be able to download your actual code. For this reason, I strongly advise anyone who uses CGI scripts with hard-coded passwords to configure the HTTP server so that certain directories are defined as CGI directories. Files in those directories can be executed but never displayed. In my experience, any time you mix documents and programs in the same directory, you're asking for trouble.

The other common mechanism for the database server to decide whether a client can be trusted is to define database access rights for specified operating-system user names. In this approach, the database server must trust the operating system to authenticate users. In the CGI context, the use of operating-system user names for authentication presents an especially tricky issue, because most HTTP servers run all CGI scripts under a special, low-privilege user name. (Most UNIX HTTP servers, for example, run all CGI scripts under the name nobody or the name www.) Therefore, you must trust every person who writes CGI scripts on your HTTP server.

One alternative provided in some operating systems is the capability to have a CGI program run as the user name of its owner, with all rights that the owner would have. This method eliminates the need to have your database server trust every CGI script, but it creates its own security problems-now your CGI program has significantly more access rights to the rest of your system than does a CGI running under a special, low-privilege ID. If your HTTP server is running under a single-user operating system that lacks the concept of user names, of course, you must trust every CGI program in any case.

TIP

If you use a UNIX server, the program cgiwrap (available at ftp://ftp.cc.umr.edu/pub/cgi/cgiwrap/) provides an excellent way to run a script as a given user name. This program is better than others for this purpose because it was designed to plug some CGI-specific security holes.

TIP

To handle multiple classes of users with one script, put symbolic links (aliases) to your script in multiple directories with different access policies. Your script checks the SCRIPT_NAME environment variable to see which version was called.

A Simple Working Example in Perl

I often find vague generalities and isolated code snippets to be frustrating, because they don't tell me how the various pieces fit together to form a complete application. To show you how a Web/database gateway works, in this section I'll build a small working application that maintains a hotlist of Web sites, which remote users can search.

First, you see how a Perl script talks to the database. I'll demonstrate the specific calling conventions of two database engines; then I'll show you how to make the code much more portable. The sample application is written with this portable interface.

Accessing a DBMS from Perl

As I noted in the sidebar "SQL and Its Dialects" earlier in this chapter, the subset of SQL that is needed for most CGI/database programming is nearly universal; the same SQL code can often be ported from one database engine to another with little or no change. The details of sending that SQL to the database server and getting the results back from the server, however, are much more varied. Most SQL database servers provide some form of C API for this purpose-typically, a set of functions that can be linked into a C client program. Because Perl, with its strong string manipulation and I/O facilities, lends itself so well to database manipulation, Perl wrappers have been written for most of the common database-server APIs. The use of such a Perl wrapper permits database access from within Perl programs but limits portability, because each database server API is unique; therefore, its Perl wrapper is also unique.

Two DBMS APIs To illustrate the variation in database APIs, listings 12.2 and 12.3 show two short sample programs written in the Perl wrappers for Sybase and mSQL (mini-SQL) servers. The Perl wrapper for Sybase is called SybPerl, and the Perl wrapper for mSQL is called msqlPerl; Perl wrappers for other database engines typically follow the same naming convention. Each of the following two programs connects to the server, asks for a database called test, sends a single SQL statement, and prints the results.

Listing 12.2 An Example of a Simple SQL Query with SybPerl


#!/usr/local/bin/perl

require sybperl;

#

#This code tested with Sybase 4.9.1 and Sybase 10.0.1 under SunOS 4.1.2

#

#NOTE: for Perl4, or for Statically loaded Perl5 versions

#of sybperl, you must edit the first line to replace

#the name 'perl' with the name of your sybperl version



#raw_syb_perl_demo.p

#A simple demonstration of Sybperl in action

#

#Must define $USER,$PWD,$SERVER here!

    $dbproc = &dblogin( $USER,$PWD,$SERVER);

    $dbproc != -1 || die "Can't connect to $server ...\n";

    &dbuse( "test" ) || die "Can't use $database ...\n";



#Create the SQL statement & send to the server

$SQL = "select last_name,salary,id from employees";

&dbcmd( $SQL ) || die "Error in dbcmd.\n" ;

&dbsqlexec || die "Error in dbsqlexec.\n" ;

$result = &dbresults($dbproc);



#and get the resulting rows

%row = &dbnextrow($dbproc, 1); #get first row

while (%row = &dbnextrow($dbproc, 1))

     {

          print "last_name:$row{'last_name'}\t";

          print "salary:$row{'salary'}\t";

          print "id:$row{'id'}\t";

          print "\n";

     }

Listing 12.3 The Same Query as Listing 12.2, but Using mSQL


#!/usr/bin/perl

#raw_msqlperl_demo.p

#

#This code has been tested with Msql 1.0.6 under SunOS 4.1.4

#

#A simple demonstration of Msqlperl in action

require "Msql.pm";$host = shift || "";

package main;

#Connect in two steps: (1) Connect and (2) SelectDB...

if ($dbh = Msql->Connect($host))

    {print "Connected\n";} else {die "failed to connect\n";}

if ($dbh->SelectDB("test"))

    {print("Test db\n");} else {die "Select db failed\n";}



$SQL = "select last_name,salary,id from employees";

$sth = $dbh->Query($SQL) or die $Msql::db_errstr;

#get the hash associating fieldnames and numbers:

@fieldnum{@{$sth->name}} = 0..@{$sth->name}-1;

# %fieldnum is now a list of fieldnums, indexed on names

#and get the rows

while (@row = $sth->FetchRow())

     {

          print "last_name:$row[$fieldnum{'last_name'}]\t";

          print "salary:$row[$fieldnum{'salary'}]\t";

          print "id:$row[$fieldnum{'id'}]\t";

     print "\n";

     }

The output of either program looks something like the following:


last_name:Smith     salary:21000     id:123

last_name:Huskins     salary:19500     id:124

last_name:Williams     salary:51075     id:125

last_name:Jones     salary:27000     id:126

last_name:Hill     salary:17500     id:127

Notice in listings 12.2 and 12.3 that the SQL code string is exactly the same for either database server and that the output is also the same (assuming identical data in the table, of course). Also, the structure of the two programs is similar: connect to the database, send the query, and get the rows. But the details of how the client communicates with the server are different. If your code is to be portable, you need some kind of abstraction layer that insulates the programmer from most database-specific details. Fortunately, such a layer has been written.

A Simple DBMS Abstraction Layer Bo Frese Rasmussen, the author of the excellent WDB database forms-generation package (discussed in detail in the section "WDB" later in this chapter) has written a simple database interface, or dbi layer. By isolating most of the database-specific details to one Perl function library, he made the entire package easy to port, and various database programmers have written versions of the dbi library. As of late August 1996, versions of WDB (and, therefore, of the dbi library) were available for Sybase, Informix, mSQL, Oracle, and Postgres95.

Listing 12.4 shows a dbi version of the simple Sybase and mSQL clients from listings 12.2 and 12.3.

Listing 12.4 The Same Query as Listings 12.2 and 12.3, Using the dbi Layer


#!/usr/local/bin/perl

#Either_dbi_demo.p

#

#This works with either Sybperl or Msqlperl



#AS SHOWN HERE, this works with Msqlperl.

#To make it work with Sybperl, change the

#    $dbFlag line below.

#

#Also, if you are using the Perl4 version of sybperl

#then you must change the first line of this program



$dBFlag = 'MSQL';  ## OR $DbFlag = 'SYBASE'

#this is the msql version!



if ($DbFlag eq 'MSQL') {require 'msql_dbi.pl';}

elsif ($DbFlag eq 'SYBASE') {require 'syb_dbi.pl';}

else {die "unsupported database\n";}



$database = "test"; #define $User, etc here!

&dbi_connect( $user, $pswd, $server, $database );



$Query = "select last_name,salary,id from employees";

&dbi_dosql($Query);



if ($DbFlag eq 'MSQL') #one extra thing needed for Msql

{&dbi_fieldnames( 'last_name', 'salary','id');}



while( %row = &dbi_nextrow  ) {

          print "last_name:$row{'last_name'}\t";

          print "salary:$row{'salary'}\t";

          print "id:$row{'id'}\t";

          print "\n";



       }

If you have either Sybase and Sybperl or mSQL and MsqlPerl installed on your system, you can run the code in Listing 12.4 on either platform by editing it as indicated by the comments in the program. Revising the code to work with the other versions of the dbi library shouldn't be much more difficult. All Perl examples in the rest of this chapter use the msql_dbi.pl interface, so they can easily be ported to any other database for which WDB has been ported.

Defining the Database Schema for the Working Example

This example, as I said in "A Simple Working Example in Perl" earlier in this chapter, is a simple interactive hotlist of Web sites, providing the URL and a description for each site.Remote users can search the hotlist and submit new entries for potential inclusion. The administrator (who knows the appropriate password) can review submissions, adding approved submissions to the hotlist for public viewing. Think of the example as being a rudimentary equivalent of Yahoo (just as the Wright brothers' flying machine of 1903 was a rudimentary equivalent of an airliner).

Database Tables and Fields This database has three tables. The UIDs table shown in Table 12.1 is used for generating UIDs, so that each record in the other tables has a unique identifier that can be used as a primary key.

Table 12.1 The UIDs Table

Column	Type	Length	Not Null	Key
`TableName`	char	40	Y	Y
`MaxUID`	longint	4	Y	N

This technique is commonly used by database designers. You create one row in the UIDs table for each table that needs to have UIDs generated. MaxUID then records the highest UID yet assigned. Each time you create a new row for a data table, you increment the MaxUID value for that table and use this value for the new row of data.

The Hotlist table, shown in Table 12.2, contains data for all approved submissions to the database.

Table 12.2 The Hotlist Table

Column	Type	Length	Not Null	Key
`UID`	longint	4	Y	Y
`URL`	char	100	Y	N
`SHORTDESC`	char	50	N	N
`DESCRIPTION`	char	200	N	N

New submissions are stored in the Submissions table until they have been approved by the database administrator, as shown in Table 12.3. The Hotlist table and the Submissions table are otherwise identical.

Table 12.3 The Submissions Table

Column	Type	Length	Not Null	Key
`UID`	longint	4	Y	Y
`URL`	char	100	Y	N
`SHORTDESC`	char	50	N	N
`DESCRIPTION`	char	200	N	N

Directory Layout The data tables are stored by the database server (mSQL was used for this example). Although any database server almost certainly stores the actual data as disk files, the database server manages those files internally; indeed, that's the fundamental reason for using a database server. In addition to the database tables described in the preceding section, my sample application consists of three HTML documents and three Perl scripts placed in three directories, as follows:

~hhealy/public_html: (documents)
DemoHome.html Search.html Submission.html

The three files are the top-level main screen and the two forms for searching and submitting data. Because the files reside in my public HTML directory, they can be viewed by any user on the Web.

The following two programs reside in an unprotected directory within the CGI-BIN hierarchy on this Web server, so they can be run as CGI scripts by anyone on the Web:

.../cgi-bin/healy/public: (public CGI programs)
SearchHotlist.p ShowDetails.p

The following directory is password-protected by means of the .HTPASSWD and .HTACCESS files, so you must type a name and password to run the program in this directory as a CGI script:

.../cgi-bin/healy/private: (private program)
.htpasswd .htaccess ListSubmissions.p

Searching the Hotlist

The user of this application typically begins with a simple opening screen that lists the available options. I've intentionally kept the screen in figure 12.4 as simple as possible. Most of the hot links in this opening screen point to scripts that perform the actual work of providing database access.

Figure 12.4 : The opening screen for the Hotlist database has hot links to the available programs for database access.

Listing 12.5 shows the HTML for the opening screen.

Listing 12.5 The Opening Screen


<HTML>

<HEAD><TITLE>Hotlist Demo HomePage</TITLE></HEAD><BODY>

<H1>Hotlist Demo HomePage</H1>



<A HREF="Search.html">Search the Hotlist</A><p>



<A HREF="Submission.html">Submit an Item for the Hotlist</A><p>



<a HREF="/cgi-bin/healy/SearchHotlist.p">See All Records in Hotlist</a><p>



<a HREF="/cgi-bin/healy/ListSubmissions.p">Transfer Submitted Data</a>

to the Public portion of the database (password required).<p>

</PRE></BODY>

</HTML>

Clicking Search the Hotlist calls up the search form, which I've likewise kept as simple as possible (see fig. 12.5).

Listing 12.6 shows the HTML for the search form shown in figure 12.5.

Figure 12.5 : The user enters search criteria in this form, which then posts the criteria to the searching script.

Listing 12.6 The Search Form


<HTML>

<HEAD>

<TITLE>Hotlist Searching Form</TITLE>

</HEAD>

<BODY>

<H1>Hotlist Searching Form</h1>

Please enter a keyword and click the <b>Send</b> button at the bottom.

The <b>Reset</b> button will clear everything you've typed.<P>

<FORM ACTION="http://server.wherever/cgi-bin/healy/SearchHotlist.p" METHOD="POST"><hr>

<p>

<INPUT name="SHORTDESC" size=20 value=""><b>SHORT description</b><BR>

<hr>

<INPUT TYPE="submit" VALUE="Send"> <INPUT TYPE="reset" VALUE="Reset"><P>

<hr>

</FORM>

</BODY>

</HTML>

Submitting a search request (by entering a key to search and clicking the submit button) POSTs the search key to a simple searching script. This script generates a SQL query and submits it to the server. The script works just like the three sample database query scripts shown in listings 12.2 through 12.4, except that the SQL string is built up from the form data. SQL that has been generated on the fly is known as dynamic SQL.

NOTE

I do no error checking on input; I just wrap it in an SQL like clause in the program, which is shown in Listing 12.7 later in this section.

To keep this example as simple as possible, I provided only one search field, and I converted that field to all uppercase in the database. If your search form provides multiple lookup fields, you must generate a complex where clause based on which fields contain search strings. The WDB package (discussed in "WDB" later in this chapter) can build up such a where clause based on form contents.

In the program shown in Listing 12.7 (and in all my other form-handling CGI programs), to avoid all the messy details of reading and parsing the form information, I call Steven Brenner's cgi-lib.pl routines. You can find this library at many FTP sites or on the CD-ROM that is included with this book. One particular advantage of using cgi-lib.pl for database work is the fact that it handles GET or POST identically. Name-value pairs can be appended to the URL as ?name1=value1&name2=value2... or sent as a POST data block.

Listing 12.7 Perl Code to Perform the Search


#!/usr/local/bin/perl

#

#This program tested with Msql 1.0.6 under SunOS 4.1.4 and

#NCSA httpd 1.5 with Perl 5.001m



#do this as soon as possible!

print "Content-type:text/html\n\n";



#Define two little subroutines for urlencode/decode

#

#replace funny characters with %xx hex for urls

sub escape

{

    ($_)=@_;

    s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/eg;

    $_;

}



#replace + with space and %xx with that ASCII character

sub unescape {

    ($_)=@_;

    tr/+/ /;

    s/%(..)/pack("c",hex($1))/ge;

    $_;

}



#load the cgi library

require "cgi-lib.pl";

#load the Msql database interface library

require 'msql_dbi.pl';



# Start output



#read in the form contents:



&ReadParse(); #handles GET or POST forms w/identical results

#now @in has key=value pairs, and %in{key} = value

#Main Program Begins Here



$SHORTDESC = $in{'SHORTDESC'};

$SHORTDESC =÷h tr/a-z/A-Z/;  #convert to uppercase

$SCRIPT_NAME = $ENV{'SCRIPT_NAME'};



#connect to database server

$user = "healy";

$server = "server.wherever";

$passwd = "dummy";  #not used, for msql goes by Unix UID;

$database = "test";

&dbi_connect( $user, $pswd, $server, $database );



$Query = "select UID,URL,SHORTDESC from HOTLIST";

$Query = $Query . " where SHORTDESC like '%";

$Query = $Query . $SHORTDESC . "%'";



&dbi_dosql($Query);

#the next line is msql-specific; comment-out for other ver

&dbi_fieldnames('UID','URL','SHORTDESC','DESCRIPTION');



print "<h1>Search Results</h1>\n";



while( %row = &dbi_nextrow  )

     {

      print '<a href="';

      print "$row{'URL'}";

      print '">';

      print &unescape($row{'SHORTDESC'});

      print "</a> ";

      print '<a href="';

      print '/cgi-bin/healy/ShowDetails.p?';

      print 'UID=';

      print $row{'UID'};

      print '">';

      print "Details</a><p>\n";

      }



print "Click on a link to go there, or click on

<b>details</b> for a more-detailed description of the link\n";

After a search is performed, the output looks something like the sample screen shown in figure 12.6.

Figure 12.6 : After searching the hotlist database, the user sees a list of hot links like this one. Each hot link calls a Perl script that shows the details of that record.

Listing 12.8 is the HTML generated for the search results shown in figure 12.6.

Listing 12.8 Typical Output from Search of Hotlist Database, With URLs to Detailed Views


<h1>Search Results</h1>



<a href="http://arch-http.hq.eso.org/bfrasmus/wdb/install.html">

WDB INSTALLATION INSTRUCTIONS</a>

<a href="/cgi-bin/healy/public/ShowDetails.p_UID=2">Details</a><p>

<a.html href="http://arch-http.hq.eso.org/bfrasmus/wdb/">WDB HOMEPAGE</a>

<a href="/cgi-bin/healy/public/ShowDetails.p_UID=3">Details</a><p>

<a.html href="http://cscsun1.larc.nasa.gov/÷bbeowulf/db/

existing_products.html">LIST OF WWW DBMS GATEWAYS</a>

<a href="/cgi-bin/healy/public/ShowDetails.p_UID=7">Details</a><p>

<a.html href="http://server.wherever/÷hhealy/Submission.html">DB GATEWAY</a>

<a href="/cgi-bin/healy/public/ShowDetails.p_UID=13">Details</a><p>

Click.html on a link to go there, or click on <b>details</b>

for a more-detailed description of the link

Viewing the Detail Record

Notice that the Details links in the search-results screen shown in figure 12.6 (with the HTML given in Listing 12.8) point to a second CGI script and that each URL has ?UID=nn appended. This simple example shows how state is maintained on the client browser side; no history is maintained on the stateless server. Listing 12.9 shows the code for ShowDetails.p, the CGI program that generates the detail record. This program is similar to the preceding example.

Listing 12.9 Perl Code to Return the Detail View


#Up to here code is identical with SearchHotlist.p above

#

#now @in has key=value pairs, and %in{key} = value

#Main Program Begins Here

#

$UID = $in{'UID'};



#connect to database server

$user = "healy";

$server = "server.wherever";

$passwd = "dummy";  #not used, for msql goes by Unix UID;

$database = "test";

&dbi_connect( $user, $pswd, $server, $database );



$Query = "select UID,URL,SHORTDESC,DESCRIPTION from HOTLIST where UID = $UID";



&dbi_dosql($Query);

#the next line is msql-specific; comment-out for other ver

&dbi_fieldnames('UID','URL','SHORTDESC','DESCRIPTION');



print "<h1>Detail View</h1>\n";



while( %row = &dbi_nextrow  )

     {

      print "Hot link to this item: ";

      print '<a href="';

      print "$row{'URL'}";

      print '">';

      print &unescape($row{'SHORTDESC'});

      print "</a><br>";

      print "Detailed description: ";

      print &unescape($row{'DESCRIPTION'});

      print "<p>\n";

      }

Figure 12.7 shows an example of the detail screen, and Listing 12.10 shows the HTML generated for this screen.

Figure 12.7 : Clicking an URL in the search-results screen of figure 12.6 generates a detail view such as this one.

Listing 12.10 The Detail View of Figure 12.7


<h1>Detail View</h1>

Hot link to this item:

<a href="http://cscsun1.larc.nasa.gov/÷bbeowulf/db/existing_products.html">

LIST OF WWW DBMS GATEWAYS</a><br>

Detailed description: Comprehensive List of Tools for Building RDBMS CGI Gateways<p>

This simple example has only one hot link-to the URL that is being described. In a real application, you can (and should) have multiple hot links in your detail screens-hot links that perform lookups on this database or other databases. The HTML snippet in Listing 12.11, taken from a hypothetical Employee Detail screen, shows what I mean.

Listing 12.11 A Hypothetical Employee Detail Screen


<h1>Tom Anderson</h1>

Department:

<a href="http://server.wherever/cgi-bin/DeptSrch_Deptid=17">Engineering</a><p>

Location:

<a.html href="http://server.wherever/cgi-bin/LocSrch_Locid=29">Podunk</a><p>

Position:

<a.html href="http://server.wherever/cgi-bin/PosSrch_Posid=17">CAD.html Technician</a><p>

Mail Stop:

<a href="http://server.wherever/cgi-bin/EmpSrch_Mailid=97">POD-43</a><p>

Clicking any field in the detail record performs a lookup of related records in that category. The list would contain the names of employees, and the URL behind each name would perform a lookup on the employee ID. This use of URLs to generate further searches effectively converts a relational database to a giant hypertext document.

Such hyperlinks can just as easily link one database with another. Several large international databases that are widely used in molecular biology, for example, have Web interfaces. A local database used by one research group for its own internal data can include hot links to related information from one or more of the international databases.

Some Major Scientific Databases on the WWW

If you want to check out some major biological and chemical databases, use the following URLs:

http://expasy.hcuge.ch/-the molecular-biology server of Geneva University Hospital
http://dot.imgen.bcm.tmc.edu:9331/seq-annot/home.html-the Sequence Annotation server
http://www.ncbi.nlm.nih.gov/-the National Center for Biotechnology Information home page
http://www3.ncbi.nlm.nih.gov/Omim/-the Online Mendelian Inheritance in Man Genetic Database
http://www.pdb.bnl.gov/-the Brookhaven Protein Data Bank
http://www.unige.ch/crystal/w3vlc/data.index.html-a list of crystallography databases
Most of these URLs have links to other sites that have related information.

Consider a hypothetical business-related example. The marketing people in your company have created a Web-accessible database of product information, including which company locations build given parts of each product. Now suppose that you're a programmer in the personnel department and that you've been given the job of putting your company directory on the Web.

You could use hyperlinks in your directory database to link product information with directory information. The location field in a directory detail screen could show the list of products made at that location, for example.

With the cooperation of the people who maintain the product-information database, you could also work the other way. When you see a record that lists the various locations that make each part of the product, you could click links that locate people who work at each location.

The possibilities for using the Web to integrate multiple databases in different locations are limited only by the programmer's imagination-and by the quality of available databases. The task may also require a much higher degree of connection among the parts of your MIS organization than was the case before. After all, Internet and intranet applications are basically about pulling scattered information together.

Submitting Data to the Hotlist

A database is scarcely complete without providing a means for entering data. For some databases, the Web front end allows only searching. For this example, however, I also include a simple data-submission form, so that any user can submit proposed records for possible inclusion in the publicly searchable database. Figure 12.8 shows the submission screen.

Figure 12.8 : This form is used for the remote submission, via the Web, of records to be added to the database.

Listing 12.12 shows the HTML for the submission screen.

Listing 12.12 A Simple HTML Form to Submit New Data via the Web


<HTML>

<HEAD>

<TITLE>Url Submission Form</TITLE>

</HEAD>

<BODY>

<H1>Url Submission Form</h1>

Please enter data in the fields below and click the <b>Send</b> button

at the bottom. The <b>Reset</b> button will clear everything you've

typed.<P>

<FORM ACTION="http://server.wherever/cgi-bin/healy/public/Submit.p"

METHOD="POST">

<hr>

<p>

<INPUT name="URL" size=60 value="" ><b>The URL</b><BR>

<INPUT name="SHORTDESC" size=20 value=""><b>SHORT description</b><BR>

<TEXTAREA name="DESCRIPTION" ROWS=2 COLS=40></TEXTAREA>

Longer Description (up to 3 lines)<BR>

<hr>

<INPUT TYPE="submit" VALUE="Send"> <INPUT TYPE="reset" VALUE="Reset"><P>

<hr>

</FORM>

Note:<p>

The data entered will be reviewed by the database administrators before being added to

the database; submitted records will usually be available for viewing in one or two

working days. Please check back in a few days to confirm that your submission has been

added.<p>

</BODY>

</HTML>

Submitted data is posted to the script shown in Listing 12.13.

Listing 12.13 Perl Script to Handle Data-Submission Form


#Up to here code is identical with SearchHotlist.p above

#

#now @in has key=value pairs, and %in{key} = value

#Main Program Begins Here

#connect to database server

$user = "healy";

$server = "server.wherever";

$passwd = "dummy";  #not used, for msql goes by Unix UID;

$database = "test";

&dbi_connect( $user, $pswd, $server, $database );



$UID = $in{'UID'};

$URL = $in{'URL'};

$SHORTDESC = &escape($in{'SHORTDESC'});

$SHORTDESC =÷ tr/a-z/A-Z/;  #convert to uppercase

$DESCRIPTION = &escape($in{'DESCRIPTION'});

$Query = "select MaxUID from UIDs where TableName = 'SUBMISSIONS'";

&dbi_dosql($Query);

#the next line is msql-specific; comment-out for other ver

&dbi_fieldnames('MaxUID');

%row = &dbi_nextrow;

$MaxUID = $row{'MaxUID'} + 1;

$Query = "Update UIDs Set MaxUID = $MaxUID where TableName =

   å'SUBMISSIONS'";



&dbi_dosql($Query);

$Query = "Insert into SUBMISSIONS values(";



$Query = $Query . $MaxUID . ",'";



$Query = $Query . $URL . "','";

$Query = $Query . $SHORTDESC . "','";

$Query = $Query . $DESCRIPTION . "')";



&dbi_dosql($Query);



print "<h1>Submission Accepted</h1>\n";

print "Thank you for your submission. \n";

print "It will be reviewed by the database administrator \n";

print "for possible inclusion in our hotlist \n";

A couple of interesting wrinkles to this script don't appear in the other programs:

All data is inserted-in URL-encoded form-into the Submissions table, not into the publicly searchable Hotlist table. This feature gives the administrator a chance to review all submissions.
The UIDs table is used to generate a unique ID for each submitted record.

The possibility exists that two people might submit new data at precisely the same moment, so between the instant of getting the current value of MaxUID and the instant of updating UIDs, another user could get the same UID value. Although that scenario is not very likely for a simple application such as this one, it's a real concern for active databases. Most high-end database engines have a feature called transactions (which mSQL doesn't support). The programmer declares the actions of getting the UID and updating the UID to be one transaction that must be run as a unit or not at all.

NOTE

For simplicity, the script shown in Listing 12.13 doesn't validate user input-it simply sticks whatever the user entered into a table. Your real applications should perform appropriate validation on the input data. Your script also could try to GET the URL entered by the user, to verify that it's a valid URL.

Generating SQL Code to Transfer Submitted Data

The last piece of this package is a mechanism by which an administrator can transfer data from the Submissions table to the Hotlist table. To sidestep the complexities of updating via the stateless Web server, I use a different approach: a CGI script that doesn't perform any updating itself but that generates a SQL script to perform the required actions. Listing 12.14 shows the Perl code.

Listing 12.14 Generating Transfer SQL to Move Data to Public Table


#Up to here code is identical with SearchHotlist.p above

#

#now @in has key=value pairs, and %in{key} = value

#Main Program Begins Here

#connect to database server

$user = "healy";

$server = "server.wherever";

$passwd = "dummy";  #not used, for msql goes by Unix UID;

$database = "test";

&dbi_connect( $user, $pswd, $server, $database );



$Query = "select UID,URL,SHORTDESC,DESCRIPTION from SUBMISSIONS";

    print "#SQL Query: $Query\n\n";

    print "#\n#\n#\n";

    print "#Review this SQL script with care, then ";

    print "pipe it through msql\n#\n#\n";



&dbi_dosql($Query);

#the next line is msql-specific; comment-out for other ver

&dbi_fieldnames('UID','URL','SHORTDESC','DESCRIPTION');



print "#Inserting into HOTLIST\n\n";

while( %row = &dbi_nextrow  )

     {

      print "Insert into HOTLIST values(\n";



      print "$row{'UID'}'\n,";



      print "$row{'URL'}'\n,'";

      print "$row{'SHORTDESC'}'\n,'";

      print "$row{'DESCRIPTION'}'";

      print ')\g';

      print "\n";

      }



$Query = "select MaxUID from UIDs where TableName = 'SUBMISSIONS'";

&dbi_dosql($Query);

#the next line is msql-specific; comment-out for other ver

&dbi_fieldnames('MaxUID');

$MaxUID=0;

$Query = "select MaxUID from UIDs where TableName = 'SUBMISSIONS'";

&dbi_dosql($Query);

#the next line is msql-specific; comment-out for other ver

&dbi_fieldnames('MaxUID');

$MaxUID=0;

%row = &dbi_nextrow;

$MaxUID = $row{'MaxUID'};

print "\n\n#Updating UIDs\n\n";

print "Update UIDs Set MaxUID = $MaxUID where"

print " TableName = 'HOTLIST'" . '\g' . "\n\n";



print "\n\n#Deleting from SUBMISSIONS\n\n";

print 'delete from SUBMISSIONS where UID <= ' . $MaxUID . '\g';

Running this script via the Web generates SQL similar to that shown in Listing 12.15.

Listing 12.15 Typical Transfer SQL Generated by the Program in Listing 12.14


#SQL Query: select UID,URL,SHORTDESC,DESCRIPTION from SUBMISSIONS

#

#

#Review this SQL script with care, then pipe it through msql

#

#

#Inserting into HOTLIST



Insert into HOTLIST values(

18

,'http://gasnet.med.yale.edu/'

,'GASNET'

,'The%20Gasnet%20server%20has%20various%20resources%0D%0Afor

å%20Anesthesiology...')\g

Insert into HOTLIST values(

17

,'http://www.ncbi.nlm.nih.gov/BLAST/'

,'BLAST'

,'BLAST%20Homepage%20at%20the%20National%20Center%0D%0Afor

å%20Biotechnology%20Information')\g

Insert into HOTLIST values(

16

,'http://www.eol.ists.ca/÷ddunlop/wdb-p95/'

,'WDB%20POSTGRES'

,'WDB%20Port%20to%20Postgres')\g

Insert into HOTLIST values(

15

,'http://www.comvista.com/net/www/cgidata.html'

,'MAC%2FWWW%2FDB'

,'List%20of%20URLs%20with%20info%20on%20Mac%20WWW%2FDBMS%0D

å%0Ascripting')\g



#Deleting from SUBMISSIONS



delete from SUBMISSIONS where UID <= 18\g

#Updating UIDs



Update UIDs Set MaxUID = 18 where TableName = 'HOTLIST'\g

The database administrator edits the SQL that is generated to delete records that shouldn't be added to the Hotlist table and then feeds the script through the mSQL command line. Alternatively, if you want a purely Web solution, you can modify this script to generate a form that contains all the SQL in a scrollable text area. That form would submit the edited SQL to another script that pipes the SQL through the mSQL command line.

Probably the cleanest approach is to generate updating forms that contain database fields instead of SQL code. You then would need to address the issue of conflicting updates, however, probably by using time stamps. The sequence would be something like the following:

Generate an updating form with all data in editable fields and the time stamp included in the form as a hidden field.
On submission of the edited form, first check the time stamp in the database record against the time stamp in the form. If the time stamps vary, generate an error message and quit without updating the record.
When you update the record, also update the time-stamp value.

With mSQL, you need to update the time stamps in the CGI script and in every program that updates the database. With many higher-end servers, you can define a time stamp that the database engine maintains automatically, eliminating the possibility that a programmer will forget to change time stamps whenever data changes.

NOTE

In Sybase, the term time stamp is a misnomer; the time-stamp value has no relationship to clock time at all. Sybase simply takes responsibility for guaranteeing that the value of this field will change every time that any other field in that record changes. Thus, the sole value of Sybase time-stamp fields is to check for conflicting updates.

Perl Tools for Web/DBMS Work

Database access via the Web is such a hot area that listing every available or promised product would be impossible. Every major database vendor either has a Web-related product available now or promises to have one soon, and many programmers are working on their own database gateways, running with a wide range of database engines. You can safely assume that a mechanism exists for linking any reasonable combination of database server and HTTP server, but the quality of the implementations varies wildly.

If you're thinking about putting a database on the Web, I suggest that you start by reading Usenet postings in newsgroups that are related to your favorite database packages. You also should look at the Web site of your database vendor.

The following sections list a few tools that are available on the Web and some Web sites that contain pointers to information about Web/database tools.

Installing Web/Database Tools

Some of the tools mentioned in the following sections are commercial programs, which generally come with installation instructions; others are shareware or freeware. With all shareware and freeware programs, part of the price that you pay for low-cost software is less hand-holding when it comes to installation. Generally, you install these programs in the same manner that you would install any shareware program for your OS.

Many of the tools are for UNIX servers and typically come as source code in tar form. If you've never installed such a package, you should seek out your local UNIX guru for assistance. UNIX comes in many flavors, and every system that I've used has had local customizations added to the standard system, so giving detailed directions here is impossible. Most of the UNIX packages mentioned in this chapter come with makefiles or other installation scripts, but you probably need to check and edit these scripts as required by your local configuration. Pay particular attention to where various directories are located on your system.

Installation and configuration information for most of these tools is available on their authors' Web sites, so visit those sites for details. The directions for many database interfaces assume that you have the database and its API library already installed, so you should do that part first.

Most tools for microcomputer operating systems (such as Windows and the Macintosh System software) are distributed as executable binaries, rather than source code, so installation typically is somewhat simpler than for UNIX.

WDB

WDB is a freeware package, written entirely in Perl, that greatly simplifies the task of building a Web front end to a relational database under UNIX. (Source code and installation directions for WDB, in UNIX tar format, are included on the CD-ROM that comes with this book.) Because all the database-specific aspects of WDB are confined to one module of the program, and because the entire package is written in Perl, porting WDB to various databases has proved to be relatively easy. WDB has been successfully ported to several database servers and to many versions of UNIX, including Linux. If you're UNIX- and Perl-oriented, WDB should be high on your short list of preferred free solutions, especially because the excellent shareware database mSQL is one of the supported SQL engines.

The heart of WDB is what its author, Bo Frese Rasmussen, calls a form definition file (FDF). For each searching form, you write a description in a relatively high-level form definition language to list the tables and fields to be used for this form, the search constraints, how you want the output to be formatted, and so on. WDB comes with a utility that generates an FDF template from a database table or view by querying the metadata provided; you then edit its output. Although the program can't do your thinking for you, many details that otherwise would require tedious hand-hacking are filled in by the FDF-making utility.

Given an FDF, WDB generates an HTML search form on the fly. As with the program-generated FDFs, you may want to edit the output. When the user submits a completed search form, WDB generates the SQL query; performs the search; and returns a list of matching records, each item of which has a hot link to the full screen view of that record.

The upshot is that WDB does much of the donkey work (especially for ad hoc searches) for you. WDB's feature set is somewhat limited, but if you know Perl, you probably can modify it to your heart's content; the code is well structured and well commented. You can handle simpler customizations by adding snippets of Perl code to the FDF, because several "hooks" enable you to call your own code at various stages in the query process.

A particularly nice feature is the capability to define input- and output-conversion functions that apply to specified fields. WDB automatically invokes these functions at the correct time in its query-generation and formatting cycle. WDB also has several features that help you generate URLs from database lookups, which can perform other lookups of related data when they are clicked. Clever use of this elegant feature of WDB can effectively turn your relational database into a gigantic hypertext document.

ON THE WEB

You can find information about WDB, installation instructions, and the downloadable source code at http://venus.dtv.dk/~bfr/.

Web/Genera

Web/Genera (by Stanley Letovsky, Mary B. Berlyn, and others) is another public-domain software tool set that simplifies the integration of Sybase databases into the Web. You can use the tool to retrofit a Web front end to an existing Sybase database or to create a new database. Like WDB, Web/Genera requires you to use a high-level schema notation to write a specification of the Sybase database (and of the desired appearance of its contents on the Web). The Web/Genera programs process this description to generate SQL commands and formatting instructions that together extract objects from a database and format them in HTML.

Web/Genera also supports form-based relational querying and whole-database formatting into text and HTML formats. The notation of Web/Genera seems to be richer than that of WDB, because the former includes notions (such as set and group) for field values, enabling you to build pick lists on the fly. Unlike WDB, however, Web/Genera doesn't provide hooks that allow users to write customized extensions. To make any extensions to Web/Genera, a user must change its source code, which is written mainly in C and Perl.

ON THE WEB

You can find information about Web/Genera, installation instructions, and the downloadable source code at http://gdbdoc.gdb.org/letovsky/genera/genera.html.

MiniSQL (mSQL) and W3-mSQL

mSQL is a shareware SQL engine (free to academic users) that has been ported to most versions of UNIX. In combination with WDB, mSQL provides an excellent, low-cost way to build a simple relational database with a Web front end. An excellent Perl interface for mSQL is available, and the combination of WDB with mSQL is known to work well under Linux.

mSQL's author, David Hughes, has also written a new program called W3-mSQL, which was in alpha at the time when this chapter was written. Although both mSQL and W3-mSQL are included on the CD-ROM that comes with this book, I haven't yet tried W3-mSQL, so I don't know how well it works. According to Hughes, W3-mSQL works via HTML documents with embedded tags that perform database lookups, so you don't need to write any CGI scripts to use it.

mSQL supports only a relatively small subset of SQL, but what it does support is in strict conformance to ANSI standards. According to the mSQL WWW site, David Hughes is working on a major upgrade of mSQL that will support a much larger set of features, but he doesn't yet know when this upgrade will be released.

ON THE WEB

You can find information about mSQL and W3-mSQL, installation instructions, and the downloadable source code at http://Hughes.com.au/.

DBI

The long-term future of Perl-database interfaces in many applications, including Web interfaces, may well lie with the new DBI project. The long-term goal of this project is to create a generic Perl interface with versions for every SQL database, in the hope of providing greater portability for Perl database applications. DBI also takes advantage of the Perl 5.0 object-oriented extensions.

ON THE WEB

You can find information about the evolving DBI standard at http://www.hermetica.com/technologia/DBI/index.html.

Why I Chose the WDB Library Instead of This One

The DBI (uppercase) interface defined by this project isn't the same as the dbi (lowercase) interface used in WDB. I used the WDB version in my examples for three reasons:

The WDB interface is simpler.
The DBI version for mSQL was released recently; all my existing code for both Sybase and mSQL uses the WDB interface.
At the time when this chapter was written, the new DBI standard was something of a moving target. DBI's authors recently performed a complete rewrite based on ODBC standards.

ODBC Tools

In the Windows/Windows NT environment, ODBC has long been one of the most popular ways to access databases from other programs. As a result, most popular Windows databases, and most programming languages targeted for the Windows environment, have ODBC drivers included or available. If you come from a UNIX-oriented background, you may want to use one of the ODBC drivers for NTPerl, which you can find in various places on the Net. If you come from a Windows background, you'll probably be happier using Visual Basic or Visual C with any of the numerous ODBC drivers that are available for those languages.

NOTE

If you use ODBC under Windows or Windows NT for your database CGI programs, you have a choice of query languages: the SQL interface to ODBC, or such ODBC-specific features as dynasets. On one hand, dynasets can improve performance significantly, because the SQL parsers in many ODBC drivers are rather slow; and dynasets can be easier to code than SQL for simple lookups. On the other hand, the use of SQL is more portable to non-ODBC environments.

ON THE WEB

You can find information about ODBC access from Perl under NT at the following URLs:
http://info.hip.com/ntperl/PerlFaq.htm
http://www.bhs.com/
ftp://ftp.digex.net/pub/access/psii/public_html/home.html

Some Useful Hotlists

Many sites on the Web have pointers to various Web/database tools and projects. The following URLs are the most comprehensive listings of Web/database resources that I have found in an extensive search for such information:

ON THE WEB

http://cscsun1.larc.nasa.gov/~beowulf/db/[em]is the most comprehensive list of Web/database resources that I have found. The hotlist includes freeware, shareware, and commercial products, as well as a wealth of Perl-database tutorial materials.

http://cuiwww.unige.ch/~scg/FreeDB/FreeDB.list.html is an extensive list of free or low-cost database tools, not all of which are specifically Web-oriented.

Problem-Solving

Because a Web/database gateway involves multiple programs, possibly running on multiple machines, various things can go wrong. Your application may not work at all, or it may run far too slowly. For database-specific debugging and performance tuning, consult the documentation for your database package. The following sections provide a few general hints on debugging and tuning a Web/database gateway-hints that should apply to any platform.

NOTE

Bear in mind that no amount of after-the-fact tweaking can replace careful planning at the outset of your project.

Debugging

Debugging a Web/database gateway isn't simple, because problems can occur at multiple levels. I suggest that the first version of any database CGI script not try to access the database, but instead display the generated SQL code in a document that begins with the header Content-type: text/plain, followed by a blank line. When you think that the generated SQL looks correct, pipe it through the command-line interface of your database. After this step confirms that you can generate correct SQL, the next step is to write your own client program to send the SQL to the server and execute it from the command line.

If you can't get your CGI script to generate any SQL, try sending your form input to a CGI script that does nothing more than list all the information that it got from the Web server. Given that list, add hard-coded assignment statements to your code, so that you can run it from the command line and watch how it behaves in a simulated CGI environment. Only then should you try to combine the CGI and database interfaces in one program and run that program as a CGI script.

If that procedure doesn't work, look at the error log of your server for any messages that your program may have sent to standard error. If that log tells you little or nothing, you need to think about the differences between a command-line environment and the CGI environment. Are you assuming something about the PATH environment variable, which has far fewer directories in a CGI context? Do you need other environment variables for database work? Most Sybase installations, for example, define an environment variable (DSQUERY) that tells Sybase clients which server to use if none is specified. In the CGI environment, this variable isn't defined, so your script will fail unless you specify the server in your script.

If you have installed cgiwrap (mentioned in "Security Issues" earlier in this chapter), it includes a very handy debugging flag; see the documentation that comes with the cgiwrap distribution. This feature can be a tremendous time saver, and it alone can repay the modest effort of installing cgiwrap.

Finally, remember that CGI scripts usually run at a lower-privilege user ID than regular programs do. If your database server uses OS-level user IDs for authentication, you may have client programs that work from the command line but not as CGI scripts.

The general strategy is to divide and conquer, so that you can isolate the problem to a particular part of your system. Start simply, and build up to the full functionality that you want one step at a time.

Tuning Performance

From the hardware perspective, database and HTTP servers tend to be I/O-bound, not CPU-intensive, which means that the performance of your Web/database application depends mainly on disk access and process-launching overhead. A fast hard drive and ample memory are more important than a fast processor. A badly designed Web/database application, however, can bring even the most powerful server platform to its knees.

Remember that your application is a client/client/server application in which your CGI script connects as a client to the database engine, sending SQL to the engine and accepting data returned from the engine. Considerable overhead is associated with setting up and tearing down the connection between the CGI script and the database server. Even a well-designed CGI application has to incur this overhead every time a request comes to the HTTP server; from the viewpoint of the remote user, a single session involves multiple database logons. This situation is unavoidable, but a badly designed CGI program can make matters worse if it opens and closes multiple connections to the database engine in a single HTTP request.

Even if you avoid opening an excessive number of connections to the database server, you can still hurt performance by sending too much data back and forth between the CGI script and the database server. It's a nearly universal rule in client/server database programming to do as much work as possible inside the database server, to minimize the overhead associated with transferring data between the database client and the database server. In CGI work, in which you're already incurring some extra overhead because each HTTP request requires a new connection to the database, this principle applies with particular force.

For server efficiency, observe the following rules:

Do as much work as possible per connection to the database server.
Do as much work as possible per SQL statement.
Filter the results inside the database server as much as possible.

These rules apply especially if your database engine and your HTTP server reside on different machines, because all data transfer between your CGI program and the database server incurs network overhead, as well as the overhead inherent in any connection to the database server.

Recently, a colleague asked me for help with a CGI interface to an Illustra database that was taking more than three minutes to respond to some queries. By applying these principles and taking advantage of Illustra's unusually powerful version of SQL, he could get the response time to less than 10 seconds. Most database servers can perform only internal manipulation of fixed-size data types; large text fields can be copied in or out but must be manipulated outside the database program. Illustra's string-manipulation commands can be applied to any amount of text. We used this feature of Illustra to build complex SQL queries that would perform extensive string manipulation inside Illustra and return the results. Most Web queries can now be handled by one or two monster SQL statements instead of by many small SQL statements.

The Future of Web/Database Interfaces

This chapter presented the fundamentals of building a Web gateway to a relational database engine, explaining the unique capabilities that this combination makes possible and pointing out some of the limitations inherent in doing this kind of thing over the Web.

As you've seen, current Web technology has some serious limitations for database work: no client-side validation, no facility for extending the user interface toolkit provided by HTML, and no mechanism for the graceful handling of concurrent updates. Equally serious for multimedia database work-something for which the Web, by its nature, seems to be so well-suited-are the limitations of conventional database technology, which supports a limited set of data types. All other types of data must be stored as text or binary fields that can't be manipulated within the database itself.

On the other hand, object-oriented databases are extensible, but they lack the data integrity and flexible querying features of relational databases. The new object-relational paradigm exemplified by Illustra has enormous promise, because it addresses the limitations of both relational and object-oriented databases. On the one hand, like most relational databases but unlike most object-oriented databases, Illustra has a full SQL implementation and facilities for defining database integrity constraints. On the other hand, unlike standard relational databases, Illustra provides the capability to define new data types and associated operations. And the recent merger of Illustra with Informix means that the object-relational technology of Illustra should have a very bright future indeed-joining the powerful new technology of Illustra with the industrial-strength server technology for which Informix is known.

Object-oriented database technology-and especially object-relational database technology-may also be an excellent server-side counterpart to such client-side extensions as Java and JavaScript. For database work, client-side scripting promises three major advantages:

An active form written in a client-side scripting language can provide field-level validation of entered data.
Because Java applets can talk to the network, an active form that's used to edit an existing record could log on to the server and maintain an active connection, thereby averting the problems caused by the stateless nature of the Web.
Client-side scripting promises to make Web browsers extensible. Just as you can define a new type of data in a object-oriented database, you can define a new type of data object for use on the Web and write active forms to display it. This method averts the difficulties of the current helper-application approach. With a conventional Web browser, each new type needs a new helper application. Versions of this helper application must be written for every platform that your users have, and you must help users obtain and install your helper application. With a platform-independent language for creating scripts that a browser automatically-and transparently-downloads and executes, creating and using new data types should be much easier.

In my view, these two new technologies have the potential to revolutionize the Web-after the bugs are worked out and robust implementations are made available for every common platform. And I don't say this because of all the media hype about Java; I say it because I'm all too familiar with the limitations of the current Web and database technologies and with the various kludges that developers use to circumvent them. Some real substance exists behind the media excitement; in the fullness of time, we'll all find out whether the implementations of these new ideas live up to their promise. Database-specific Java and JavaScript tools are beginning to appear, but they have yet to be integrated with the database-oriented CGI tools discussed in this chapter. So if you combine client-side scripting with CGI/DBMS programming right now, you truly are on the bleeding edge.

NOTE

This work was supported in part by the Human Brain Project (NIMH, NASA, and NIDCD) grant R01 DC02307 and by NIH Grant G08LM05583 from the National Library of Medicine's IAIMS Program. The section on Web/GENERA was written by Kei Cheung, one of the authors of that program.

From Here...

In a single chapter, I have only scratched the surface of what you can do with a relational database engine and some CGI scripts. A whole world of possibilities awaits you; the field is exploding at a truly amazing pace. Above all, remember that the Internet itself is your single most useful resource. I've tried to get you started by listing URLs to every tool that I mention in this chapter, but new things appear every day.

The rest of this book is full of information that you will need as you build your WWW/database application. You especially may want to read the following chapters:

Chapter 3 "Advanced Form Processing and Data Storage." This chapter really is about one specific aspect of what Chapter 3 discusses in more general terms.
Chapter 7 "Dynamic and Interactive HTML Content in Perl and CGI." The section on using persistent cookies mentions a technique that can be very useful in database work.
Chapter 9 "Understanding CGI Security." The CGI security issues discussed in this chapter are especially important if you are hooking a production database to your WWW site.
Chapter 11, "Database Interaction." Whether or not you will be working with Oracle, you should at least take a quick look at the Oracle example in this chapter.