by Paul Doyle, translated by 劉智漢
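Before you use Perl on WWW servers, you need to spend a little time getting to know the Perl programming language.
This chapter gives an overview of Perl. It does not cover the entire core of the language, but it covers enough to start using Perl. As with any language, once you have programmed in Perl for a while, you will want to dig deeper.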
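This chapter does not go into much depth; the more advanced topics are covered in Part V of this book. By the end of the chapter, you should know where to look for answers to specific questions. If you already know some Perl, you can skip this chapter and go straight to the parts you want to learn. If you are not familiar with any programming language at all, this book is not a suitable introduction to programming for beginners.
According to its author, Perl came about as the result of someone being excessively lazy.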
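Back in 1986, a UNIX programmer named Larry Wall found himself dealing every day with a large number of complex reports, so he started using awk to process them. Before long he found that awk could not meet his needs, and because he could not find any other suitable tool, he decided to write some programs of his own to solve the problems that had been troubling him.
Larry wrote some utilities to handle the particular needs of his job. Before long the job changed again, and he had to write more tools to cope with the new requirements. To avoid wasting time, he invented a new programming language and wrote an interpreter for it. Inventing a language in order to save time may look like a paradox, but it is not.
The new language emphasized system management and text handling. After a few revisions, it could also handle regular expressions, symbols, and network protocols. This is the language now known as Perl.
Perl borrows heavily from other tools, especially sed and awk. Many things that Perl does can also be done with sed, awk, or UNIX shell scripts to get the same result, but Perl does them better.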
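Perl handles low-level work very well, especially Perl 5. Many of its ways of doing things are shared with C, but Perl handles data types and memory allocation far more automatically than C does.
Perl code also looks similar to C code. Perhaps this is because Perl is written in C, or perhaps Larry found that C-style syntax was easier to pick up. Even so, Perl's syntax is more concise than C's.
Perl is free. The complete source and documentation can be copied, compiled, and printed free of charge. You pay no royalties to anyone when you distribute the software, and no legal restrictions apply beyond the license terms described below.
Perl is still a copyrighted product. If the source code were simply released with no conditions, someone could make a few small changes to it, recompile it, and sell it for profit under another product name.
The GNU project offers one way to keep freely distributed software from being altered and sold for profit. Under its license, the source code can be used by anyone free of charge, but any program derived from that source must be distributed in the same way. In other words, if the source code you used to develop your program was obtained under the GNU license, you must also give your source code to anyone who wants it.
This approach gives people interested in software development ample protection, but it can produce too many similar programs, to the point where the variants overshadow the original. It can also cause version-control problems: a particular program may need to run on a particular version but end up running on a different one, which causes users considerable trouble.
That is why Perl is released under the Artistic License, which takes a different approach from the GNU license. The Artistic License says that anyone who uses the Perl source code to develop their own program must state clearly that the software they release is not Perl. All changes must be clearly identified, and the modified executables must not have the same names as the originals. If necessary, the original source code must be distributed along with the modified program. The result is that the original author is clearly recognized as the original author of the software.
New versions of Perl can be found on the network and on various FTP sites. Perl source code and documentation are also available for non-UNIX systems. UNIX binaries usually cannot be found on the network, because almost every UNIX system has a C compiler: you can use it to build Perl on your own machine, which ensures that Perl runs on your system and works much better than building Perl on someone else's machine and then moving it to yours.
The Perl distribution usually comes with a utility called Configure, which automatically sets up the source code and makefile for your particular machine. It detects the system configuration and fills in the information needed to build Perl; you can, of course, also set this information yourself.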
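Once Perl is installed, how do you use this wonderful tool to make your Web site more colorful? What is Perl programming? How do you use it?
We will leave the first two questions to the end of this chapter and go straight to the third. Using Perl is extremely simple, but the procedure varies slightly from system to system.
First, assume that Perl is correctly installed on your computer. The simplest way to run a Perl program is to start the Perl interpreter, like this:
perl sample.pl
In this example, sample.pl is the name of the Perl file, and perl is the Perl interpreter. The example assumes that perl is on your current execution path; if it is not, you must give the full path to perl, like this:
/usr/local/bin/perl sample.pl
This is the best form to use when running Perl, because it rules out accidentally invoking a different version of Perl from the one you intended. Because we will be running our programs on Web servers, it is best to use the full path to avoid mistakes.
Almost every system has a command-line interface. On Windows NT, the equivalent command looks like this: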
c:\NTperl\perl sample.pl
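Running Perl on UNIX: UNIX systems offer another way to run a Perl program, which is to make the script itself executable. Put the following statement on the first line of the Perl file so that the program invokes Perl automatically: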
#!/usr/local/bin/perl
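This line tells UNIX that the commands that follow are to be interpreted by the perl program at /usr/local/bin/perl. The script can then be run on its own, once you make the following change: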
chmod +x sample.pl
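The program can now be run directly; its first line tells the operating system where to find the perl program that runs it.
Running Perl Programs on Windows NT
What was described above works well on UNIX but will not necessarily work on Windows NT. You can use the NT File Manager to associate files that have the .pl extension with Perl, so that Perl scripts can be run on NT. NT then recognizes any file with the .pl extension as a Perl program and automatically calls Perl to run it.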
Perl provides switches for a variety of purposes (see Table 1.1, which lists most of them). The -T switch in particular is de rigueur in Web-based Perl scripts.
Table 1.1 Perl Command-Line Switches
Switch | Argument | Purpose | Notes |
-0 | Octal character code | Specify record separator | Default is newline (\n) |
-a | | Automatically split records | Used with -n or -p |
-c | | Check syntax only; do not execute | |
-d | | Run script, using Perl debugger | If Perl debugger is installed |
-D | Flags | Specify debugging behavior | Refer to the PERLDEBUG man page on the CD-ROM that comes with this book |
-e | Command | Pass a command to Perl from the command line | Useful for quick operations; see tip after this table for an example |
-F | Regular expression | Expression to split by if -a is used | Default is white space |
-i | Extension | Replace original file with result | Useful for modifying contents of files; see tip after this table for an example |
-I | Directory | Specify location of include files | |
-l | Octal character code | Drop newlines when used with -n and -p, and use designated character as line-termination character | |
-n | | Process the script, using each specified file as an argument | Used for performing the same set of actions on a set of files |
-p | | Same as -n, but each line is printed | |
-P | | Run the script through the C preprocessor before Perl compiles it | |
-s | | Enable passing of arbitrary switches to Perl | Use -s -what -ever to have the Perl variables $what and $ever defined within your script |
-S | | Tell Perl to look along the path for the script | |
-T | | Use taint checking; don't evaluate expressions supplied in the command line | Very important for Web use |
-u | | Make Perl dump core after compiling your script; intended to allow for generation of Perl executables | Very messy; wait for the Perl compiler |
-U | | Unsafe mode; overrides Perl's natural caution | Avoid using it if at all possible! |
-v | | Print the Perl version | |
-w | | Print warnings about script syntax | Extremely useful, especially during development |
You can supply Perl command-line switches on the interpreter line of a UNIX script, as in this example:
#!/usr/local/bin/perl -w -T
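A Perl program is a text file made up of various Perl commands. These commands look very much like a mixture of C, shell script, and English.
Perl programs can be quite free-form, because the syntax rules are relaxed.
Here is a Perl statement:
print "My name is Andy Liu\n";
When Perl runs this statement, it prints My name is Andy Liu on the screen. When it reaches the \n, it starts a new line (in other words, it moves to the beginning of the next line).
To print more lines, repeat the same kind of statement:
print "My name is Yon Yonson,\n";
print "I live in Wisconsin,\n",
      "I work in a lumbermill there.\n";
What does a complete Perl program look like? Here is a UNIX example that begins with the location of the perl program and includes a few comments:
#!/usr/local/bin/perl -w
# Show warnings
print "My name is Andy Liu,\n";    # Let's introduce ourselves
print "I live in Taiwan,\n",
      "I work in a lumbermill there.\n";    # Don't forget the closing semicolon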
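Perl has very few data types. If you are used to C, where even characters are divided into unsigned and signed, you will find Perl a pleasure to use. Basically, Perl has only two data structures: scalars and arrays. Perl also has associative arrays, a special kind of array that is almost a category of its own.
All numbers and strings are scalars. Every scalar variable name begins with a dollar sign ($).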
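Perl converts scalars between string and numeric form automatically, as needed.
$a = 2;
$b = 6;
$c = $a . $b;    # The "." operator joins two strings together.
$d = $c / 2;
print $d;
The result is:
13
This example converts two integers to strings, joins the two strings into a new string variable, converts the new string to an integer, divides it by 2, converts the result back to a string, and prints it on the screen.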
This situation might be a problem if Perl were regularly used for tasks in which explicit memory offsets were used, for example, and data types were critical. But for the type of task for which Perl is normally used (and certainly for the types of tasks that we'll be using it for in this book), these automatic conversions are smooth, intuitive, and generally a Good Thing.
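The conversion chain in the example above can be wrapped in a small subroutine and traced step by step. This is only a sketch: the subroutine name concat_and_halve and its variables are illustrative and do not come from the original text.

```perl
use strict;
use warnings;

# Hypothetical helper: join two numbers as strings, then halve the result.
sub concat_and_halve {
    my ($x, $y) = @_;
    my $joined = $x . $y;    # the numbers are used as strings: 2 . 6 gives "26"
    return $joined / 2;      # the string "26" is used as the number 26
}

my $half = concat_and_halve(2, 6);
print "$half\n";             # prints 13
```

The same helper works for any pair of digits, because Perl decides between string and numeric treatment from the operator rather than from the variable.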
We can develop the earlier example script with some string variables, as follows:
#!/usr/local/bin/perl -w
# Show warnings
$who = 'Yon Yonson';
$where = 'Wisconsin';
$what = 'in a lumbermill';
print "My name is $who,\n";    # Introduce ourselves
print "I live in $where,\n",
      "I work $what there.\n";
print "\nSigned: \t$who,\n\t\t$where.\n";
When run, this script produces the following output:
My name is Yon Yonson,
I live in Wisconsin,
I work in a lumbermill there.

Signed:         Yon Yonson,
                Wisconsin.
Don't worry; it gets better.
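A collection of scalars is called an array. An array variable name begins with an at sign (@). The elements of an array are separated by commas, as in the following example: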
@trees = ("Larch", "Hazel", "Oak");
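Array elements are referenced with square brackets; for example, $trees[0] is the first element of the @trees array. Notice that this is written $trees[0] rather than @trees[0]: an individual element is a scalar, so the $ prefix is used to get its value.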
Mixing scalar types in an array is not a problem. The code
@items = (15, '45.67', "case");
print "Take $items[0] $items[2]s at \$$items[1] each.\n";
prints the following:
Take 15 cases at $45.67 each.
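All arrays in Perl are dynamic, so you never need to worry about memory allocation and management; Perl does all the work for you. An array can also be built from other arrays, as in this example:
@A = (1, 2, 3);
@B = (4, 5, 6);
@C = (7, 8, 9);
@D = (@A, @B, @C);
After this code runs, the array @D contains the values 1 through 9. Here is a common use of this technique: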
@Annual = (@Spring, @Summer, @Fall, @Winter);
This example expresses the seasons of the year as an array in a concise, direct, and readable way. Each season array can in turn be built from month arrays, and each month array from day-by-day data. The @Annual array then would consist of a value for each day of the year. By defining your data in chunks such as this, you give yourself the option of handling it on a daily, monthly, or annual basis.
Many of Perl's built-in functions take arrays as arguments. One example is sort, which takes an array as an argument and returns the same array, sorted alphabetically. The code
print sort ( 'Beta', 'Gamma', 'Alpha' );
prints AlphaBetaGamma.
You can make this code neater by using another built-in function, called join. This function takes two arguments: a string to connect with, and an array of strings to connect. join returns a single string that consists of all elements in the array joined with the connecting string. The code
print join ( ' : ', 'Name', 'Address', 'Phone' );
returns the string Name : Address : Phone.
Because sort returns an array, you can feed its output straight into join. The code
print join( ', ', sort ( 'Beta', 'Gamma', 'Alpha' ) );
prints Alpha, Beta, Gamma.
Notice that this code doesn't separate the initial scalar argument of join from the array that follows it. The first argument is the string to join things with. The rest of the arguments are treated as a single argument: the array to be joined. This is true even if you use parentheses to separate groups of arguments. The code
print join( ': ', ('A', 'B', 'C'), ('D', 'E'), ('F', 'G', 'H', 'I'));
returns A: B: C: D: E: F: G: H: I.
You can use one array or multiple arrays in a context such as this because of the way
that Perl treats arrays; adding an array to an array gives you one larger array, not two
arrays. In this case, all three arrays are bundled into one.
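The flattening behavior described above can be checked with named arrays as well as literal lists. The following is a small sketch; the array names @early and @late are made up for the illustration.

```perl
use strict;
use warnings;

my @early = ('Beta', 'Alpha');
my @late  = ('Delta', 'Gamma');

# Both arrays are flattened into one list before sort and join see them.
my $line = join(', ', sort(@early, @late));
print "$line\n";    # prints Alpha, Beta, Delta, Gamma
```

Neither sort nor join can tell where @early ended and @late began, which is exactly why the two functions chain together so easily.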
Associative arrays have a certain elegance that makes experienced Perl programmers a little snobbish about their language of choice. Rightly so! Associative arrays give Perl a degree of database functionality at a very low, yet useful, level. Many tasks that would otherwise involve complex programming can be reduced to a handful of Perl statements by means of associative arrays.
Arrays of the type that you've already seen are lists of values indexed by subscripts. In other words, to get an individual element of an array, you supply a subscript as a reference, as follows:
@fruit = ( "Apple", "Orange", "Banana" );
print $fruit[2];
This example yields Banana, because subscripts start at zero, so 2 is the subscript for the third element of the @fruit array. A reference to $fruit[7] here returns the null value, because no array element with that subscript has been defined.
Now, here's the point of all this: Associative arrays are lists of values indexed by strings. Conceptually, that's all there is to them. The implementation of associative arrays is more complex, because all the strings (keys) need to be stored in addition to the values to which they refer.
When you want to refer to an element of an associative array, you supply a string (the key) instead of an integer (the subscript). Perl returns the corresponding value. Consider the following example:
%fruit = ("Green", "Apple", "Orange", "Orange", "Yellow", "Banana" );
print $fruit{"Yellow"};
This code prints Banana, as before. The first line defines the associative array in much the same way that you have already defined ordinary arrays; the difference is that instead of listing values, you list key/value pairs. The first value is Apple, and its key is Green. The second value is Orange, which happens to have the same string for both value and key. Finally, the value Banana has the key Yellow.
On a superficial level, you can use string subscripts to provide mnemonics for array references, allowing you to refer to $Total{'June'} instead of $Total[5]. But you wouldn't even be beginning to use the power of associative arrays. Think of the keys of an associative array as you might think of a key that links tables in a relational database, and you're closer to the idea. Consider this example:
%Folk = ( 'YY', 'Yon Yonson',
          'TC', 'Terra Cotta',
          'RE', 'Ron Everly' );
%State = ( 'YY', 'Wisconsin',
           'TC', 'Minnesota',
           'RE', 'Bliss' );
%Job = ( 'YY', 'work in a lumbermill',
         'TC', 'teach nuclear physics',
         'RE', 'watch football');
foreach $person ( 'TC', 'YY', 'RE' ) {
    print "My name is $Folk{$person},\n",
          "I live in $State{$person},\n",
          "I $Job{$person} there.\n\n";
}
In this example, we quietly slipped in the foreach construct, which is explained in detail in the flow-control section later in this chapter. For now, you'll just have to take it on trust that foreach makes Perl execute the three print statements for each of the people in the list after the foreach keyword. Otherwise, you could try executing the code in the sample and see what happens.
You also can treat the keys and values of an associative array as separate (ordinary) arrays by using the keys and values keywords, respectively. The code
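print keys %Folk;
print values %State;
prints a string such as YYRETCWisconsinBlissMinnesota. (Perl does not guarantee the order in which keys and values are returned, so the exact string may differ on your system.)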
Looks as though we need to do some more work on string handling. That task is best left
until after we cover some flow-control mechanisms, however.
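In the meantime, the sort and join functions shown earlier are enough to make associative-array output readable and repeatable. The sketch below reuses the %State data from the example above; the @pairs and $report names are illustrative only.

```perl
use strict;
use warnings;

my %State = ('YY', 'Wisconsin', 'TC', 'Minnesota', 'RE', 'Bliss');

# Sorting the keys gives a stable order; the raw order from "keys" is not guaranteed.
my @pairs;
foreach my $id (sort keys %State) {
    push @pairs, "$id => $State{$id}";
}

my $report = join('; ', @pairs);
print "$report\n";    # prints RE => Bliss; TC => Minnesota; YY => Wisconsin
```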
This chapter finishes discussing Perl data types by discussing file handles. A file handle is not really a data type at all, but a special kind of literal string. A file handle behaves like a variable in many ways, however, so this section is a good place to cover them. (Besides, you won't get very far in Perl without them.)
You can regard a file handle as being a pointer to a file from which Perl is to read or to which it will write. (C programmers are familiar with the concept.) The basic idea is that you associate a handle with a file or device, and then refer to the handle in the code whenever you need to perform a read or write operation.
File handles generally are written in uppercase. Perl has some useful predefined file
handles, as Table 1.2 shows.
Table 1.2 Perl's Predefined File Handles
File Handle | Points to… |
STDIN | Standard input (normally, the keyboard) |
STDOUT | Standard output (normally, the console; in many Web applications, the browser) |
STDERR | Device where error messages should be written (normally, the console; in a Web server environment, normally, the server-error log file) |
The print statement can take a file handle as its first argument, as follows:
print STDERR "Oops, something broke.\n";
Notice that no comma appears after the file handle in this example. That helps Perl figure out that the STDERR is not something to be printed. If you're uneasy with this implicit list syntax, you can put parentheses around all the print arguments, as follows:
print (STDERR "Oops, something broke.\n");
You still have no comma after the file handle, however.
You can use the open function to associate a new file handle with a file, as follows:
open (INDATA, "/etc/stuff/Friday.dat");
open (LOGFILE, ">/etc/logs/reclaim.log");
print LOGFILE "Log of reclaim procedure\n";
By default, open opens files for reading only. If you want to override this default behavior, add to the file name one of the special direction symbols listed in Table 1.3. (The > at the start of the file name in the second open call of the preceding example, for example, tells Perl that you intend to write to the named file.)
Table 1.3 Perl File-Access Symbols
Symbol | Meaning |
< | Open the file for reading (the default action) |
> | Open the file for writing |
>> | Open the file for appending |
+< | Open the file for both reading and writing |
+> | Open the file for both reading and writing, emptying it first |
| (before file name) | Treat file as command into which Perl is to pipe text |
| (after file name) | Treat file as command from which input is to be piped to Perl |
To take a more complex example, here's one way to feed output to the mypr printer on a UNIX system:
open (MYLPR, "|lpr -Pmypr");
print MYLPR "A line of output\n";
close MYLPR;
A special Perl operator for reading from files consists of two angle brackets (<>) around the file handle of the file from which you want to read. This operator returns the next line or lines of input from the file or device, depending on whether the operator is used in a scalar or an array context. When no more input remains, the operator returns false.
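The write, append, and read symbols from Table 1.3 can be tried out safely on a scratch file. This is a minimal sketch, assuming the standard File::Temp module is available; the handle name LOG and the file contents are made up for the illustration. It reads the file back with the angle-bracket operator that the text introduces next.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() supplies a scratch file; the actual path is arbitrary.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

open(LOG, ">$path")  || die "Cannot write $path ($!)\n";      # ">" starts the file afresh
print LOG "first line\n";
close LOG;

open(LOG, ">>$path") || die "Cannot append to $path ($!)\n";  # ">>" adds to the end
print LOG "second line\n";
close LOG;

open(LOG, "<$path")  || die "Cannot read $path ($!)\n";       # "<" opens for reading
my @lines = <LOG>;
close LOG;

print scalar(@lines), " lines read\n";    # prints 2 lines read
```

Had the second open used > instead of >>, the first line would have been lost and only one line would come back.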
A construct such as
while (<STDIN>) { print; }
simply echoes each line of input back to the console until Ctrl+D (Ctrl+Z in Windows NT) is pressed, because the print function takes the current default argument here: the most recent line of input. For an explanation, see "Special Variables" later in this chapter.
If the user types
A
Bb
Ccc
^D
the screen looks like this:
A
A
Bb
Bb
Ccc
Ccc
^D
Notice that in this case, <STDIN> is in a scalar context, so one line of standard input is returned at a time. Compare that example with the following example:
print <STDIN>;
In this case, because print expects an array of arguments (it can be a single-element array, but it's an array as far as print is concerned), the <> operator obligingly returns all the contents of STDIN as an array, and then print prints it. Because the array is fully built before it is printed, nothing is written to the console until the user presses Ctrl+D:
A
Bb
Ccc
^D
A
Bb
Ccc
This script prints out the contents of the file .signature, double-spaced:
open (SIGFILE, ".signature");
while ( <SIGFILE> ) {
    print;
    print "\n";
}
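The difference between the two contexts can also be seen without typing at the keyboard. The sketch below is an illustration only: it reads from an in-memory file handle (available in Perl 5.8 and later) that stands in for STDIN, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "A\nBb\nCcc\n";

# An in-memory file handle stands in for STDIN here.
open(my $in, '<', \$text) || die "Cannot open in-memory handle ($!)\n";

my $first = <$in>;    # scalar context: exactly one line is read
my @rest  = <$in>;    # array context: all remaining lines are read at once
close $in;

print "first: $first";                       # prints first: A
print "remaining: ", scalar(@rest), "\n";    # prints remaining: 2
```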
The first print here has no arguments, so it takes the current default argument and prints it. The second print has an argument, so it prints that instead. Perl's habit of using default arguments extends to the <> operator; if that operator is used with no file handle, Perl assumes that <ARGV> is intended. <ARGV> expands to each line in turn of each file listed in the command line.
If no files are listed in the command line, Perl instead assumes that STDIN is intended. The following code, therefore, keeps printing more as long as something other than Ctrl+D appears in standard input:
while (<>) { print "more.... "; }
Like all languages, Perl has its special hieroglyphs, which are laden with meaning. This section briefly examines some of the most common and useful variables, and provides some examples of typical Perl idioms in which you might find them.
You have already seen one special variable: the environment-variable associative array %ENV. This special associative array allows you to easily use the value of any environment variable within your Perl scripts:
print "Looking for files along the path ($ENV{'PATH'}) \n";
The %ENV array is quite useful in CGI programming, in which parameters are passed from the browser to CGI programs as environment settings.
Any arguments specified in the Perl command line are passed to the Perl script in
another special array: @ARGV.
The following code prints the command-line arguments one per line, sorted alphabetically:
print join("\n", sort @ARGV);
The command-line arguments are of limited use in CGI scripts, in which arguments are passed via the environment rather than the command line. These arguments are quite useful in normal Perl work, of course.
The special variable $_ is often used to store the current line of input. This situation is true when the <> input operator is used. The following code, for example, prints a numbered listing of the file pointed to by SOMEFILE:
$line = 0;
while ( <SOMEFILE> ) {
    ++$line;
    print "Line $line : ", $_;
}
You occasionally need to store the contents of $_ somewhere, as in the following example:
$oldvalue = $_;
But the opposite operation (setting the value of $_ manually) is rarely appropriate, as in this example:
$_ = $oldvalue;
Pattern matching and substitution take place on the contents of this variable unless you specify otherwise. These topics are covered in "Regular Expressions" later in this chapter.
The special variable $! contains the current system-error number (errno, on UNIX systems) or system-error string, depending on whether it is evaluated in a numeric or string context. This variable may not contain anything meaningful; it should be used only if an error occurred.
This example reports failure if the open call failed:
open ( INFILE, "./missing.txt") || die "Couldn't open \"./missing.txt\" ($!).\n";
The || here is the Boolean or operator, which is covered in "Flow Control" later in this chapter. die causes Perl to terminate after printing the string given to die as an argument.
If the file does not exist, Perl terminates after displaying something like this:
Couldn't open "./missing.txt" (No such file or directory).
The form and content of error messages vary from one system to the next.
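The two faces of $! can be captured separately when a script should carry on after a failed open instead of calling die. This is a sketch under one assumption: that the file named in $missing (a made-up name) does not exist in the current directory.

```perl
use strict;
use warnings;

my $missing = "./no-such-file-for-this-example.txt";    # assumed not to exist

my ($errno, $errstr) = (0, '');
if (open(INFILE, $missing)) {
    close INFILE;
} else {
    $errno  = $! + 0;    # numeric context: the system error number
    $errstr = "$!";      # string context: the system error message
}

print "open failed: errno=$errno, message=$errstr\n";
```

Both values are copied immediately after the failed call, because a later system call may overwrite $!.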
The examples that you have seen so far have been quite simple, with little or no logical structure beyond a linear sequence of steps. We managed to sneak in the occasional while and foreach; think of those as being sneak previews. Perl has all the flow-control mechanisms that you'd expect to find in a high-level language, and this section takes you through the basics of each mechanism.
Two operators, || (or) and && (and), are used like glue to hold Perl programs together. They take two operands and return either true or false, depending on the operands. In the following example, if either $Saturday or $Sunday is true, $Weekend will be true, too:
$Weekend = $Saturday || $Sunday;
In the next example, $Solvent is true only if $income is greater than 3 and $debts is less than 10:
$Solvent = ($income > 3) && ($debts < 10);
Now consider the logic of evaluating one of these expressions. It isn't always necessary to evaluate both operands of either an && or a || operator. In the first example earlier in this section, if $Saturday is true, you know that $Weekend will be true, regardless of whether $Sunday is also true (the midnight condition, perhaps?).
This means that when the left side of an or expression is evaluated as true, the right side is not evaluated. Combine this with Perl's easy way with data types, and you can say things like the following:
$value > 10 || print "Oops, low value \n";
If $value is greater than 10, the right side of the expression is never evaluated, so nothing is printed. If $value is not greater than 10, Perl needs to evaluate the right side, too, so as to decide whether the expression as a whole is true or false. That means that Perl evaluates the print statement, printing out the message.
OK, it's a trick, but it's a very useful one.
Something analogous applies to the && operator. In this case, if the left side of an expression is false, the expression as a whole is false, so Perl does not evaluate the right side. The && operator can, therefore, be used to produce the same kind of effect as the || trick, but with the opposite sense, as in the following example:
$value > 10 && print "OK, value is high enough \n";
As is true of most Perl constructs, the real power of these tricks comes when you apply a little creative thinking. Remember that the left and right sides of these expressions can be any Perl expressions; think of them as being conjunctions in a sentence rather than logical operators, and you'll get a better feel for how to use them. Expressions such as the following give you a little of the flavor of creative Perl:
$length <= 80 || die "Line too long.\n";
$errorlevel > 3 && warn "Hmmm, strange error level ($errorlevel) \n";
open ( LOGFILE, ">install.log") || &bust("Log file");
The &bust in this example is a subroutine call, by the way. Refer to "Subroutines" later in this chapter for more information.
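To watch the short-circuit behavior without terminating the script, the right-hand sides can record a message instead of calling die or print. The following is a sketch; the check_value subroutine and the @log array are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

my @log;    # collects messages so that the effect is easy to inspect

sub check_value {
    my ($value) = @_;
    $value > 10 || push(@log, "low: $value");    # right side runs only if the left is false
    $value > 10 && push(@log, "ok: $value");     # right side runs only if the left is true
    return;
}

check_value(3);
check_value(42);
print join("\n", @log), "\n";    # prints low: 3 and then ok: 42
```

Each call adds exactly one message, which shows that only one of the two right-hand sides is ever evaluated.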
The most basic kind of flow control is a simple branch. A statement is either executed or not, depending on whether a logical expression is true or false. You can do this by following the statement with a modifier and a logical expression, as follows:
open ( INFILE, "./missing.txt") if $missing;
The execution of the statement is contingent upon both the evaluation of the expression and the sense of the operator.
The expression is evaluated as either true or false and can contain any of the relational operators listed in Table 1.4 (although it need not). Following are a few examples of valid expressions:
$full
$a == $b
<STDIN>
Table 1.4 Perl's Relational Operators
Operator | Numeric Context | String Context |
Equality | == | eq |
Inequality | != | ne |
Inequality with signed result | <=> | cmp |
Greater than | > | gt |
Greater than or equal to | >= | ge |
Less than | < | lt |
Less than or equal to | <= | le |
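Perl has four modifiers, each of which behaves the way that you might expect from the corresponding English word: if executes the statement when the expression is true, unless executes it when the expression is false, while repeats the statement as long as the expression is true, and until repeats it as long as the expression is false.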
Notice that the logical expression is evaluated only one time in the case of if and unless, but multiple times in the case of while and until. In other words, the first two are simple conditionals, and the last two are loop constructs.
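All four modifiers can be seen side by side in a few lines. This is only a sketch; the @seen array and the $count variable are made-up names.

```perl
use strict;
use warnings;

my @seen;
my $count = 0;

push @seen, "if"     if     $count == 0;    # runs once: the expression is true
push @seen, "unless" unless $count > 0;     # runs once: the expression is false
$count++ while $count < 3;                  # repeats while true; $count ends at 3
$count-- until $count <= 1;                 # repeats until true; $count ends at 1

print "@seen, count=$count\n";    # prints if unless, count=1
```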
The syntax changes when you want to make the execution of multiple statements contingent on the evaluation of a logical expression. The modifier comes at the start of a line, followed by the logical expression in parentheses, followed by the conditional statements in braces. Notice that the parentheses around the logical expression are required, although they are not required in the single statement branching described in the preceding section.
The following example is somewhat similar to C's if syntax:
if ( ( $total += $value ) > $limit ) {
    print LOGFILE "Maximum limit $limit exceeded. Offending value was $value.\n";
    close (LOGFILE);
    die "Too many! Check the log file for details.\n";
}
The if statement is capable of a little more complexity, with else and elsif operators, as in the following example:
if ( !open( LOGFILE, ">install.log") ) {
    close ( INFILE );
    die "Unable to open log file!\n";
}
elsif ( !open( CFGFILE, ">system.cfg") ) {
    print LOGFILE "Error during install: Unable to open config file for writing.\n";
    close ( LOGFILE );
    die "Unable to open config file for writing!\n";
}
else {
    print CFGFILE "Your settings go here!\n";
}
The loop modifiers (while, until, for, and foreach) are used with compound statements in much the same way, as the following example shows:
until ( $total >= 50 ) {
    print "Enter a value: ";
    $value = scalar (<STDIN>);
    $total += $value;
    print "Current total is $total\n";
}
print "Enough!\n";
The while and until statements are described in "Conditional Expressions" earlier in this chapter. The for statement resembles the one in C. for is followed by an initial value, a termination condition, and an iteration expression, all enclosed in parentheses and separated by semicolons, as follows:
for ( $count = 0; $count < 100; $count++ ) {
    print "Something";
}
The foreach operator is special; it iterates over the contents of an array and executes the statements in a statement block for each element of the array. Following is a simple example:
@numbers = ("one", "two", "three", "four");
foreach $num ( @numbers ) {
    print "Number $num \n";
}
The variable $num first takes on the value one, then two, and so on. That example looks fairly trivial, but the real power of this operator lies in the fact that it can operate on any array, as follows:
foreach $arg ( @ARGV ) {
    print "Argument: \"$arg\".\n";
}
foreach $namekey ( sort keys %surnames ) {
    print REPORT "Surname: $surnames{$namekey}.\n",
                 "Address: $address{$namekey}.\n";
}
You can use labels with the next, last, and redo statements to provide more control of program flow through loops. A label consists of any word, usually in uppercase, followed by a colon. The label appears just before the loop operator (while, for, or foreach) and can be used as an anchor for jumping to from within the block. The following code snippet prints all the odd-numbered records in INFILE:
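RECORD: while ( <INFILE> ) {
    $even = !$even;
    next RECORD if $even;
    print;
}
The three label-control statements are next, which skips the rest of the block and starts the next iteration of the labeled loop; last, which exits the labeled loop altogether; and redo, which restarts the current iteration without evaluating the loop condition again.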
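Labels earn their keep in nested loops, where next and last would otherwise affect only the innermost loop. The sketch below is illustrative; the label OUTER and the numeric limits are arbitrary choices.

```perl
use strict;
use warnings;

my @found;

OUTER: foreach my $i (1 .. 5) {
    foreach my $j (1 .. 5) {
        next OUTER if $j > $i;        # abandon this $i and move on to the next one
        last OUTER if $i * $j > 6;    # leave both loops at once
        push @found, "$i*$j";
    }
}

print "@found\n";    # prints 1*1 2*1 2*2 3*1 3*2
```

Without the label, next and last would apply to the inner foreach only, and the outer loop would keep running through all five values of $i.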
Subroutines in Perl are defined with the sub keyword, as follows:
sub Usage {
    print "Usage: \n", "twiddle [-args] infile outfile\n";
    print "Copyleft 1996, Jonathan F. Squirmsby.";
}
Subroutines are called with &, as follows:
sub bust {
    print "Oops, some kind of error seems to have occurred.\n";
    die "Fatal error, terminating.\n";
}
open ( LOGFILE, ">install.log") || &bust;
In this example, the subroutine was defined before it was called. You can define and call subroutines in any order in Perl; the convention is to define them after the main routine.
Passing Arguments
You can pass arguments to a subroutine in the usual way, as follows:
open ( LOGFILE, ">install.log") || &bust("Failed to open log file \"install.log\".");
But here is where Perl's subroutine syntax starts to get a little strange; C programmers may want to take a seat before reading on.
All Perl subroutines receive their arguments as an arbitrarily long array of scalars with the special name of @_. There is no mechanism for declaring the arguments when the subroutine is declared. There is no fixed number of arguments. Also, the calling function can pass any mixture of scalars and arrays; they are all treated as one big @_ array when they get to the subroutine.
In the example earlier in this section, in which bust is called with a single argument, you can pick it up in the subroutine and use it to provide a more sensible error message, as in the following example:
sub bust {
    ($errortext) = @_;
    print "Oops, an error occurred ($errortext).\n";
    die "Fatal error, terminating.\n";
}
Notice that we went to the trouble of assigning the contents of the argument array @_ to the scalar $errortext. This assignment may seem to be unnecessary; in fact, we could have simply used $_[0] instead of $errortext in the print statement. Explicitly assigning the contents of the @_ array to named variables is much clearer, though, especially when the subroutine takes multiple arguments. Compare the example
print "Error $_[0] opening file $_[1].\n";
with this one:
($errtext, $errfile) = @_;
print "Error $errtext opening file $errfile.\n";
Notice, too, that when we assigned the @_ array to the single variable $errortext in the bust example, we placed $errortext in parentheses. We did so to force an array context, so that what gets assigned to $errortext is the first (and only) value of the @_ array, not the number of values in @_. In effect, we're telling Perl to treat $errortext as a single-element array. The earlier example that uses $errfile and $errtext is a clearer example of an array-to-array assignment.
In "Variable Scope" later in this chapter, you learn how to protect local variables such as $errortext in subroutines by using the local and my keywords.
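The effect of the parentheses is easiest to see by writing the assignment both ways. This is a sketch with hypothetical subroutine names; it uses the my keyword mentioned above to keep the variables private to each subroutine.

```perl
use strict;
use warnings;

sub first_arg {
    my ($first) = @_;    # parentheses: array assignment, takes the first element
    return $first;
}

sub arg_count {
    my $count = @_;      # no parentheses: scalar assignment, takes the element count
    return $count;
}

my $text  = first_arg("disk full", "retry");
my $count = arg_count("disk full", "retry");
print "$text, $count\n";    # prints disk full, 2
```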
Passing Arrays
Perl's grouping of all subroutine arguments makes it impossible to pass more than one array to a Perl subroutine. Suppose that you have a subroutine call of the following form:
&PrintRes( "alpha", (1, 3, 5, 7), "beta", (2, 4, 6, 8) );
Try to unpack these arguments into the following values as they come into the subroutine:
$p1 = "alpha";
@p2 = (1, 3, 5, 7);
$p3 = "beta";
@p4 = (2, 4, 6, 8);
A statement like
( $p1, @p2, $p3, @p4 ) = @_;
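won't get beyond the second parameter. $p1 receives "alpha" as expected, but @p2 then absorbs every remaining value (1, 3, 5, 7, "beta", 2, 4, 6, 8), so $p3 is left undefined and @p4 is left empty.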
There's no point in trying to specify subarrays, as in the following example, because Perl expands the array on the left to the same thing as before:
( $p1, (@p2), $p3, (@p4) ) = @_;
The moral of the story is: Don't pass more than one array into a subroutine. And if you do pass an array, make sure that it's the last argument.
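The way an array swallows the rest of the argument list can be confirmed by counting what it receives. The following is a small sketch; the subroutine name unpack_args is made up for the illustration.

```perl
use strict;
use warnings;

sub unpack_args {
    my ($p1, @p2) = @_;    # @p2 takes everything after the first argument
    return ($p1, scalar(@p2));
}

my ($first, $rest_count) =
    unpack_args("alpha", (1, 3, 5, 7), "beta", (2, 4, 6, 8));

print "first=$first, rest=$rest_count\n";    # prints first=alpha, rest=9
```

The count of 9 is the four odd numbers, the string "beta", and the four even numbers, all delivered as one flat list.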
Returning Values
Perl is just as casual about returning values from subroutines as it is about passing arguments to them. A subroutine returns a single value: the value of the last expression evaluated in the subroutine. If you pass (4, 3) to this subroutine, the value 7 is returned:
sub AddIt {
    ( $a, $b ) = @_;
    $a + $b;
}
That means that the value 7 is substituted for the subroutine call after evaluation. The code
print "Summing 4 and 3 yields ", &AddIt(4, 3), ".\n";
prints the following:
Summing 4 and 3 yields 7.
Notice that we had to keep the subroutine call outside the quotes to allow Perl to recognize & as a subroutine invocation.
It isn't always clear which statement is the last to be executed in a subroutine, particularly if it contains loops or conditional statements. One way to ensure that the correct value is returned is to place a reference to the variable on a line by itself at the end of the subroutine, as follows:
sub Maybe {
    # Various loops and conditionals here which set the value of "$result"
    $result;
}
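Perl also has a return statement, which is not used in the examples above; it leaves the subroutine at once with the given value and so removes any doubt about which statement was executed last. The sketch below contrasts the two styles; the subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Implicit return: the value of the last expression evaluated.
sub add_it {
    my ($x, $y) = @_;
    $x + $y;
}

# Explicit return: leaves the subroutine immediately with the given value.
sub classify {
    my ($n) = @_;
    return "negative" if $n < 0;
    return "zero"     if $n == 0;
    return "positive";
}

my $sum   = add_it(4, 3);
my $label = classify(-5);
print "sum=$sum, label=$label\n";    # prints sum=7, label=negative
```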
The return value can be a scalar, an array, or an associative array. Listing 1.1 shows a complete example in which a subroutine builds an associative array of names keyed by initials and then returns the associative array. The keys of this array-the initials-are then printed in sorted order. Take your time reading through this example; a lot is going on in there, but it's comprehensively commented.
Listing 1.1 INITIALS.PL: Returning an Associative Array from a Subroutine
#!/usr/local/bin/perl -w

# Pass the names into the subroutine.
# Store the results in an associative array called "keyedNames".
%keyedNames = &GetInitials("Jane Austen", "Emily Bronte", "Mary Shelley" );

# Print out the initials, sorted:
print "Initials are ", join(', ', sort keys %keyedNames), ".\n";

# The GetInitials subroutine.
sub GetInitials {
    # Let's store the arguments in a "names" array for clarity.
    @names = @_;
    # Process each name in turn:
    foreach $name ( @names ) {
        # The "split" function is explained in Chapter 15, "Function List".
        # In this statement, we're getting split to look for the ' ' in the name;
        # It returns an array of chunks of the original string (i.e. $name) which were
        # separated by spaces, i.e. the forename and surname respectively in our case.
        # The variables "$forename" and "$surname" are then assigned to this array
        # using parentheses to force an array assignment.
        ( $forename, $surname ) = split( ' ', $name );
        # OK, now we have the forename and surname. We use the "substr" function,
        # also explained in chapter 15, to extract the first character from each of these.
        # The "." operator concatenates two strings (for example, "aa"."bb" is "aabb")
        # so the variable "$inits" takes on the value of the initials of the name:
        $inits = substr( $forename, 0, 1 ) . substr( $surname, 0, 1 );
        # Now we store the name in an associative array using the initials as the key:
        $NamesByInitials{$inits} = $name;
    }
    # Having built the associative array, we simply refer to it at the end of the
    # subroutine so that its value is the last thing evaluated here. It will then
    # be passed back to the calling function.
    %NamesByInitials;
}
Perl uses separate name spaces to store scalars, arrays, associative arrays, and so on. As a result, you can use the same name for variables of different types without fear of confusion (at least on Perl's part; for your own sake, use unique names). This example uses three different kinds of variables, each called name:
$name = "Dana";
@name = ("Donna", "Dana", "Diana");
%name = ("Donna", "Elephants", "Dana", "Finches", "Diana", "Parakeets");
print "I said $name{$name}, not $name{$name[0]}!\n";
The bad news is that by default, Perl uses just one name space for each data type, for all functions. So if you have a variable called $temp in the main function, and you call a routine that uses another variable called $temp, the value of $temp in the main function gets clobbered. The references to the two variables are in fact two references to the same variable, as far as Perl is concerned.
That's where the local (Perl 4 and 5) and my (Perl 5 only) functions come in. These functions force Perl to treat variables as though they are local to the current code block, whether that block is a loop, an if-block, or a subroutine.
The following example uses two variables called $temp (one outside and one inside a while loop):
$temp = "Still here!\n";
print "Enter a few words at a time, Ctrl+D to terminate:\n";
while (<>) {
    local( $temp, @etc ) = split(' ', $_ );
    print "You said $temp";
    @etc && print " and then you said @etc";
    print ". Enter some more, or press Ctrl+D to end:\n";
}
print $temp;
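The difference between Perl 4's local() and Perl 5's my() is one of scoping. local() gives a package (global) variable a temporary value that lasts until the current block ends, and that temporary value is also seen by any subroutine called from within the block. my() creates a genuinely new variable that is visible only inside the block where it is declared, so my variables are really local.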
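The contrast between the two can be seen in a short sketch. The subroutine names show_temp, with_local, and with_my are hypothetical, and the example assumes Perl 5 (it uses my, our, and strict mode).

```perl
use strict;
use warnings;

our $temp = "global";    # a package variable, like those in the chapter

# Hypothetical helper: reports whichever $temp is visible when it runs.
sub show_temp { return $temp; }

sub with_local {
    local $temp = "localized";    # temporary value for the package variable
    return show_temp();           # the called subroutine sees "localized"
}

sub with_my {
    my $temp = "lexical";         # a new variable, private to this block
    return show_temp();           # the called subroutine sees the package variable
}

print "local: ", with_local(), "\n";
print "my:    ", with_my(),    "\n";
print "after: $temp\n";           # the original value is back
```

The value set with local leaks into show_temp for as long as with_local is running, whereas the my variable never leaves with_my.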
We'll finish this overview of Perl by discussing its pattern-matching capabilities. The capability to match and replace patterns is vital to any scripting language that claims to be capable of useful text manipulation. By this stage, you probably won't be surprised to read that Perl matches patterns better than any other general-purpose language does. Perl 4's pattern matching is excellent, but Perl 5 introduces some significant improvements, including the capability to match on even more arbitrary strings than before.
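The basic pattern-matching operations discussed in this section are matching (testing whether a string fits a pattern) and substitution (replacing the parts of a string that fit a pattern).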
The patterns referred to here are more properly known as regular expressions, and we'll start by looking at them.
A regular expression is a set of rules that describes a generalized string. If the characters that make up a particular string conform to the rules of a particular regular expression, the regular expression is said to match that string.
A few concrete examples usually help after an overblown definition like that one. The regular expression b. matches the strings bovine, above, Bobby, and Bob Jones, but not the strings Bell, b, or Bob. That's because the expression insists that the letter b (lowercase) must be in the string and must be followed immediately by another character.
The regular expression b+, on the other hand, requires the lowercase letter b at least once. This expression matches b and Bob in addition to the example matches for b. in the preceding paragraph. The regular expression b* requires zero or more bs, so it matches any string. That seems to be fairly useless, but it makes more sense as part of a larger regular expression. Bob*y, for example, matches all of Boy, Boby, and Bobby but not Boboby.
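The examples in the last two paragraphs are easy to check for yourself. The following sketch assumes a hypothetical helper called matches, which simply wraps the match operator that is introduced later in this chapter.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string matches $pattern, else 0.
sub matches {
    my ( $pattern, $string ) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

# Compare b. and b+ on a few of the strings from the text.
for my $string ( "bovine", "Bob Jones", "Bell", "Bob" ) {
    printf "%-10s b. %-3s  b+ %s\n", $string,
        ( matches( 'b.', $string ) ? "yes" : "no" ),
        ( matches( 'b+', $string ) ? "yes" : "no" );
}

# b* as part of a larger expression.
for my $string ( "Boy", "Boby", "Bobby", "Boboby" ) {
    printf "%-10s Bob*y %s\n", $string,
        ( matches( 'Bob*y', $string ) ? "yes" : "no" );
}
```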
Assertions Several so-called assertions are used to anchor parts of
the pattern to word or string boundaries. The ^ assertion matches the start of a
string, so the regular expression ^fool matches fool and foolhardy but
not tomfoolery or April fool. Table 1.5 lists the assertions.
Table 1.5 Perl's Regular-Expression Assertions
Assertion | Meaning | Example | Matches | Doesn't Match |
^ | Start of string | ^fool | foolish | tomfoolery |
$ | End of string | fool$ | April fool | foolish |
\b | Word boundary | be\b | be side | beside |
\B | Nonword boundary | be\Bside | beside | be side |
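A table-driven check is a convenient way to experiment with assertions. The sketch below is illustrative only: the \bside and \Bside patterns are examples chosen here, not entries from Table 1.5.

```perl
use strict;
use warnings;

# Each row: a pattern, a string that should match, a string that should not.
my @cases = (
    [ '^fool',  'foolish',    'tomfoolery' ],
    [ 'fool$',  'April fool', 'foolish'    ],
    [ '\bside', 'be side',    'beside'     ],
    [ '\Bside', 'beside',     'be side'    ],
);

for my $case (@cases) {
    my ( $pattern, $hit, $miss ) = @$case;
    printf "%-7s '%s': %s, '%s': %s\n",
        $pattern,
        $hit,  ( $hit  =~ /$pattern/ ? "match" : "no match" ),
        $miss, ( $miss =~ /$pattern/ ? "match" : "no match" );
}
```

Note that \b and \B match positions rather than characters: \bside needs a word boundary immediately before the s, and \Bside needs the opposite.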
Atoms The . (period) that you saw in b. earlier in this chapter is an
example of a regular-expression atom. Atoms are, as the name suggests, the
fundamental building blocks of a regular expression. A full list of atoms appears in Table
1.6.
Table 1.6 Perl's Regular-Expression Atoms
Atom | Meaning | Example | Matches | Doesn't Match |
period (.) | Any character except new line | b.b | bob | bb |
List of characters in brackets | Any one of those characters | ^[Bb] | Bob, bob | Rbob |
Regular expression in parentheses | Anything that regular expression matches | ^a(b.b)c$ | abobc | abbc |
Quantifiers A quantifier is a modifier for an atom. It can be
used to specify that a particular atom must appear at least once, as in b+. The atom
quantifiers are listed in Table 1.7.
Table 1.7 Perl's Regular-Expression Atom Quantifiers
Quantifier | Meaning | Example | Matches | Doesn't Match |
* | Zero or more instances of the atom | ab*c | ac, abc | abb |
+ | One or more instances of the atom | ab+c | abc | ac |
? | Zero or one instances of the atom | ab?c | ac, abc | abbc |
{n} | n instances of the atom | ab{2}c | abbc | abbbc |
{n,} | At least n instances of the atom | ab{2,}c | abbc, abbbc | abc |
{n,m} | At least n, at most m instances of the atom | ab{2,3}c | abbc | abbbbcat |
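To see how the quantifiers differ, it helps to try each one against strings containing zero to four instances of the atom. The helper accepted_counts below is hypothetical, and the patterns are anchored with ^ and $ so that the whole string has to fit.

```perl
use strict;
use warnings;

my @patterns = ( 'ab*c', 'ab+c', 'ab?c', 'ab{2}c', 'ab{2,}c', 'ab{2,3}c' );

# Hypothetical helper: lists the counts of "b" (0 to 4) for which the
# string "a" + that many "b"s + "c" matches the anchored pattern.
sub accepted_counts {
    my ($pattern) = @_;
    return grep { ( "a" . ( "b" x $_ ) . "c" ) =~ /^$pattern$/ } 0 .. 4;
}

for my $pattern (@patterns) {
    printf "%-9s accepts b counts: %s\n",
        $pattern, join( ', ', accepted_counts($pattern) );
}
```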
Special Characters Several special characters are denoted by
backslashed letters, with \n being especially familiar to C programmers, perhaps.
Table 1.8 lists the special characters.
Table 1.8 Perl's Regular-Expression Special Characters
Symbol | Meaning | Example | Matches | Doesn't Match |
\d | Any digit | b\dd | b4d | bad |
\D | Nondigit | b\Dd | bdd | b4d |
\n | New line | | | |
\r | Carriage return | | | |
\t | Tab | | | |
\f | Form feed | | | |
\s | White-space character | | | |
\S | Non-white-space character | | | |
\w | Alphanumeric character | a\wb | a2b | a^b |
\W | Nonalphanumeric character | a\Wb | aa^b | aabb |
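The special characters are most useful in combination. As a small sketch, assume a hypothetical record format consisting of a word, some white space, and a number; the subroutine name is_record is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical record format: a word, some white space, then a number.
sub is_record {
    my ($line) = @_;
    return $line =~ /^\w+\s+\d+$/ ? 1 : 0;
}

for my $line ( "apples 12", "pears\t7", "plums seven", "12 apples" ) {
    printf "%-12s %s\n", $line, ( is_record($line) ? "record" : "not a record" );
}
```

Because \s covers tabs as well as spaces, the tab-separated line is accepted too.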
Backslashed Tokens It is essential that regular expressions be capable of using all characters, so that all possible strings that occur in the real world can be matched. With so many characters having special meanings, a mechanism is required that allows you to represent any arbitrary character in a regular expression.
This mechanism is a backslash (\) followed by a numeric quantity, which can take several formats.
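As a hedged illustration of what such tokens look like in practice, the sketch below assumes three forms that Perl's regular expressions accept: an octal code with a leading zero, a hexadecimal code introduced by \x, and a single digit that refers back to an earlier parenthesized match. The variable names are made up for this example.

```perl
use strict;
use warnings;

# \011 is the octal code for a tab character.
my $tab_by_octal = "a\tb" =~ /a\011b/ ? 1 : 0;

# \x41 is the hexadecimal code for a capital A.
my $a_by_hex = "A" =~ /\x41/ ? 1 : 0;

# \1 stands for whatever the first parenthesized group matched,
# so (\w)\1 finds a doubled character.
my $doubled = "hello" =~ /(\w)\1/ ? $1 : "";

print "octal tab matched: $tab_by_octal\n";
print "hex A matched:     $a_by_hex\n";
print "doubled letter:    $doubled\n";
```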
Now you're ready to start putting all that information together with some real pattern matching. The match operator normally consists of two forward slashes with a regular expression in between, and it normally operates on the contents of the $_ variable. So if $_ is serendipity, /^ser/, /end/, and /^s.*y$/ are all true.
Matching on $_ The $_ variable is special; see Chapter 13, "Special Variables," for full details. In many ways, $_ is the default container for data that is being read in by Perl. The <> operator, for example, gets the next line from STDIN and, when used as the condition of a while loop, stores it in $_. So the following code snippet allows you to type lines of text and tells you when your line matches one of the regular expressions:
$prompt = "Enter some text or press Ctrl+D to stop: ";
print $prompt;
while (<>) {
    /^[aA]/ && print "Starts with a or A. ";
    /[0-9]$/ && print "Ends with a digit. ";
    /perl/ && print "You said it! ";
    print $prompt;
}
Bound Matches Matching doesn't always have to operate on $_, although this default behavior is quite convenient. A special operator, =~, evaluates to either true or false, depending on whether its first operand matches on its second operand. So $filename =~ /dat$/ is true if $filename matches on /dat$/. You can use =~ in conditionals in the usual way, as follows:
$filename =~ /dat$/ && die "Can't use .dat files.\n";
A corresponding operator, !~, has the opposite sense. !~ is true if the first operand does not match on the second, as follows:
$ENV{'PATH'} !~ /perl/ && warn "Not sure if perl is in your path ";
Alternative Delimiters The match operator can use characters other than //, a useful point if you're trying to match a complex expression that involves forward slashes. A more general form of the match operator than // is m//. If you use the leading m, you can use any character to delimit the regular expression. For example,
$installpath =~ m!^/usr/local! || warn "The path you have chosen is odd.\n";
warns that "The path you have chosen is odd." unless the variable $installpath starts with /usr/local.
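The bound-match operators and the alternative delimiters combine naturally in validation code. The sketch below is an assumption-laden example: the subroutine check and its messages are hypothetical, and it collects problems in an array instead of calling die or warn so that the result is easy to inspect.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of complaints about an
# installation path and a file name.
sub check {
    my ( $installpath, $filename ) = @_;
    my @problems;
    # m!...! avoids having to escape the forward slashes in the path.
    push @problems, "odd path" unless $installpath =~ m!^/usr/local!;
    push @problems, ".dat file" if $filename =~ /dat$/;
    # !~ is true when the string does NOT match.
    push @problems, "no extension" if $filename !~ /\./;
    return @problems;
}

for my $pair ( [ "/usr/local/bin", "report.txt" ],
               [ "/opt/tools",     "values.dat" ],
               [ "/usr/local",     "README"     ] ) {
    my @problems = check(@$pair);
    print "@$pair: ", ( @problems ? join( ', ', @problems ) : "ok" ), "\n";
}
```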
Match Options You can apply several optional switches to the match
operator (either // or m//) to alter its behavior. These options are
listed in Table 1.9.
Table 1.9 Perl's Match-Operator Optional Switches
Switch | Meaning |
g | Perform global matching |
i | Perform case-insensitive matching |
o | Evaluate the regular expression one time only |
The g switch continues matching even after the first match has been found. This switch is useful when you are using backreferences to examine the matched portions of a string, as described in the "Backreferences" section later in this chapter.
The i switch forces a case-insensitive match.
Finally, the o switch is used inside loops in which a great deal of pattern matching is taking place. This switch tells Perl that the regular expression (the match operator's operand) is to be evaluated one time only. The switch can improve efficiency when the regular expression is fixed for all iterations of the loop that contains it.
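The effect of the g and i switches is easiest to see by counting matches. This is a small sketch with made-up variable names; it leaves out the o switch, whose effect is on efficiency rather than on the result.

```perl
use strict;
use warnings;

my $text = "Perl, perl, PERL";

# Without g, a match in list context stops at the first hit.
my @first = $text =~ /(perl)/i;

# With g, every hit is returned; assigning the list to () and then to a
# scalar yields the number of hits.
my $exact_count   = () = $text =~ /perl/g;
my $anycase_count = () = $text =~ /perl/gi;

print "first hit: @first\n";
print "exact case: $exact_count, any case: $anycase_count\n";
```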
Backreferences As we mentioned in the "Backslashed Tokens" section earlier in this chapter, pattern matching produces quantities that are known as backreferences. These quantities are the parts of your string in which the match succeeded. You need to tell Perl to store them by surrounding the relevant parts of your regular expression with parentheses. You can refer to them as \1, \2, and so on inside the regular expression itself, and as $1, $2, and so on after the match. The following example determines whether the user typed three consecutive four-letter words:
while (<>) {
    /\b(\S{4})\s(\S{4})\s(\S{4})\b/ && print "Gosh, you said $1 $2 $3!\n";
}
The first four-letter word lies between a word boundary (\b) and some white space (\s), and consists of four non-white-space characters (\S). If there is a match on the expression \b(\S{4})\s (that is, if a four-letter word is found), the matching substring is stored as the first backreference, and the search continues. When the search is complete, you can refer to the backreferences as $1, $2, and so on.
What if you don't know in advance how many matches to expect? Perform the match in an array context; Perl returns the matches in an array. Consider this example:
@hits = ("Yon Yonson, Wisconsin" =~ /(\won)/g);
print "Matched on ", join(', ', @hits), ".\n";
We'll start at the right side and work backward. The regular expression (\won) means that we match any alphanumeric character followed by on and store all three characters. The g option after the // operator means that we want to do this for the entire string, even after we find a match. The =~ operator means that we carry out this operation on a given string (Yon Yonson, Wisconsin). Finally, the whole thing is evaluated in an array context, so Perl returns the array of matches, and we store it in the @hits array. Following is the output from this example:
Matched on Yon, Yon, son, con.
When you get the hang of pattern matching, you'll find that substitutions are quite straightforward and very powerful. The substitution operator is s///, which resembles the match operator but has three rather than two slashes. Just as you can do with the match operator, you can substitute any other character for the forward slashes, and you can use the optional i, g, and o switches.
The pattern to be replaced goes between the first and second delimiters, and the replacement pattern goes between the second and third delimiters. This simple example changes $house from henhouse to doghouse:
$house = "henhouse";
$house =~ s/hen/dog/;
Notice that it isn't possible to use the =~ operator with a literal string as you can when matching, because you can't modify a literal constant. Instead, store the string in a variable and modify that variable.
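The following sketch pulls these points together: a plain substitution, a substitution on a copy using alternative delimiters, and the g and i switches. The variable names and sample strings other than henhouse are made up for this example.

```perl
use strict;
use warnings;

# The example from the text: modify a variable in place.
my $house = "henhouse";
$house =~ s/hen/dog/;

# Copy a string and modify only the copy; the ! delimiters avoid
# having to escape the forward slashes.
my $path = "/usr/bin/perl";
( my $local_path = $path ) =~ s!^/usr!/usr/local!;

# With g and i, every occurrence is replaced regardless of case, and
# s/// returns the number of substitutions made.
my $farm  = "Hen and hen and HEN";
my $count = ( $farm =~ s/hen/dog/gi );

print "$house\n";
print "$path -> $local_path\n";
print "$farm ($count substitutions)\n";
```

Copying first and substituting on the copy is the usual way around the restriction on literal strings described above, and it leaves the original value intact.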
You have reached the end of your whirlwind tour of Perl. You saw how Perl's deceptively simple constructs can be used to write deceptively simple programs, and you got a brief look at the basic elements of the language. At minimum, you should have a clear idea of how the language works, and you should know where to go for more information on Perl as the need arises throughout the rest of this book.
This book now moves on to Web matters, but look in the following places for more information about Perl: