Chapter 1 Perl Overview

by Paul Doyle; translated by 劉智漢

Perl
Perl Programs
Data Types
Special Variables
Flow Control
Patterns
From Here

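Before you use Perl on WWW servers, you need to spend some time getting to know Perl as a programming language.

This chapter provides an overview of the Perl language. It does not cover the whole of the language, but it is enough to start using Perl. As with any language, once you have been programming in Perl for a while, you will want to dig deeper.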

The "Camel Book"

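When you are ready to learn more, you may want to buy Programming Perl, by Larry Wall and Randal L. Schwartz (O'Reilly & Associates, Inc.). It is the most authoritative book on Perl currently available: readable and humorous, yet with enough technical depth for any Perl programming task.

Because a dromedary appears on its cover, the book is known as the "Camel Book." So many people have used it as their guide to learning Perl that the animal has become the emblem of the language (rather like the Linux penguin).

This chapter does not go into great depth; the in-depth topics are covered in Part V of this book. By the end of the chapter, you should know where to look for answers to specific questions. If you already know some Perl, you can skip this chapter and go to the parts that you want to study. If you are not familiar with any programming language at all, this book is not suitable as a beginner's introduction to programming.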

Perl

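According to its author's own account, Perl came about as a result of one person's excessive laziness.

NOTE

This part could be skimmed, so why spend time on it here? Perl is an unusual language, and it cannot be compared with ordinary, technically driven programming languages. It is more than just a programming language, so it is worth taking some time to look at how it was developed and why.

Origins

Back in 1986, a UNIX programmer named Larry Wall found himself dealing every day with a large number of complicated reports, so he started using awk to process them. Before long, he found that awk could not meet his needs. Unable to find any other suitable tool, he decided to write some programs of his own to solve the problems that had been troubling him.

Larry wrote some utilities to handle the particular needs of his job. Before long the nature of the work changed again, and he had to rewrite the tools to meet the new requirements. To avoid wasting time, he invented a new programming language and wrote an interpreter for it. Inventing a language in order to save time looks like a paradox, but it is not.

The new language emphasized system management and text handling. After a few revisions, it could also handle regular expressions, signals, and network communication. This is the language now known as Perl.

Borrowed Concepts

Perl borrowed a great deal from other tools, particularly sed and awk. Many of the things that Perl does can also be done with sed, awk, or the UNIX shell scripting languages with the same result, but Perl generally does them better.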

NOTE

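Should it be Perl with a capital letter or perl in lowercase? According to Larry Wall, it really does not matter. Many programmers prefer Perl, while program names on UNIX systems tend to be lowercase. Either way, call it whatever you like; after all, Perl is a tool, not a dogma.

If you want to be pedantic, after reading the "Regular Expressions" section of this chapter, you might call it [Pp]erl.

Perl handles low-level tasks very well, particularly in Perl 5. Many of its ways of doing things are shared with C, but Perl handles data types and memory allocation far more automatically than C does.

Perl code also looks similar to C code. Perhaps this is because Perl is written in C, or perhaps Larry found that C-like syntax was easier to pick up. Even so, Perl's syntax is more concise than C's.

Cost and Licensing

Perl is free. The full source code and documentation are free to copy, compile, and print. When you distribute your software, you do not need to pay royalties to anyone, and no legal restrictions apply.

Perl is nonetheless a copyrighted product, and for good reason. If the source code were completely in the public domain, someone could make a few minor changes, recompile it, and sell it for profit under another product name.

The GNU General Public License is one way to protect freely distributed software from being altered and sold in this way. Under that license, anyone may use the source code free of charge, but any program derived from it must be distributed on the same terms. In other words, if the source code that you used to develop your program was obtained under the GNU license, you must also make your source code available to anyone who wants it.

This approach gives good protection to people interested in developing software, but it can lead to a proliferation of similar programs that overshadow the original. It can also cause version-control problems: a program may need to run with one particular version but end up being used with a different one, which causes users considerable trouble.

This is why Perl is released under the Artistic License, whose approach differs from GNU's. The Artistic License says that anyone who uses the Perl source code to develop his or her own program must state clearly that the released software is not Perl. All changes must be clearly indicated, and modified executables must not use the same names as the originals. If necessary, the original source code must be distributed along with the modified program. The effect is that the original author is clearly recognized as the owner of the software.

Distribution

New versions of Perl can be found on the Internet and on FTP sites. Source code and documentation are also available for non-UNIX systems. UNIX binaries generally are not available on the Internet, because nearly every UNIX system has a C compiler: you can compile Perl on your own machine to make sure that it works on your system, which is much better than compiling Perl on someone else's machine and then moving it to yours.

Perl distributions usually come with a utility called Configure, which sets up the source code and makefile for your particular machine. It detects your system's configuration automatically and sets up the information needed to compile Perl accordingly; you can also set this information yourself.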

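Perl Programs

Once you have installed Perl, how do you use this wonderful tool to liven up your Web site? What is a Perl program? And how do you run one?

Invoking Perl

The first two questions are answered by the end of this chapter, so let's go straight to the third. Running Perl is very simple, but the procedure varies slightly from one system to another.

Assume that Perl is correctly installed on your computer. The simplest way to run a Perl program is to start the Perl interpreter, as follows: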

perl sample.pl

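In this example, SAMPLE.PL is the name of the Perl file, and perl is the Perl interpreter.

This example assumes that perl is in the current execution path. If it is not, you must give the full path to perl, as follows:

/usr/local/bin/perl sample.pl

This is the preferred way to invoke Perl, because it rules out the possibility of accidentally running a copy of Perl other than the one you intended. Because our programs will run on Web servers, it is best to use the full path to avoid mistakes.

Almost every system has a command-line interface. On Windows NT, the command looks like this: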


c:\NTperl\perl sample.pl

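Invoking Perl in UNIX: UNIX systems offer another way to run Perl, in which the script file itself invokes the interpreter. Put the following statement on the first line of the Perl file to have the program run Perl automatically: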


#!/usr/local/bin/perl

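This line tells UNIX to have the perl program at /usr/local/bin/perl interpret the commands that follow. You can then run the program by itself, once you have made the file executable as follows: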


chmod +x sample.pl

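You can now run the program directly; its first line tells the operating system where to find the perl that executes it.

Invoking Perl in Windows NT: The method just described works well in UNIX but will not necessarily work in Windows NT. You can use the NT File Manager to associate the .PL extension with Perl. NT then knows that any file with the .PL extension is a Perl program and automatically calls Perl to run it.

NOTE

A few more steps are usually needed before a Web server will run Perl programs automatically. See Appendix A, "Perl Acquisition and Installation," for how to associate scripts with the interpreter on specific platforms.

Command-Line Arguments

Perl offers a variety of command-line switches for different purposes (see Table 1.1, which lists most of them). The -T switch in particular is de rigueur in Web-based Perl scripts.

Table 1.1 Perl Command-Line Switches

Option	Arguments	Purpose	Notes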
`-0`	Octal character code	Specify record separator	Default is new line (`\n`)
`-a`		Automatically split records	Used with `-n` or `-p`
`-c`		Check syntax only; do not execute
`-d`		Run script, using Perl debugger	If Perl debugger is installed
`-D`	Flags	Specify debugging behavior	Refer to the PERLDEBUG man page on the CD-ROM that comes with this book
`-e`	Command	Pass a command to Perl from the command line	Useful for quick operations; see tip after this table for an example
`-F`	Regular expression	Expression to split by if `-a` is used	Default is white space
`-i`	Extension	Replace original file with result	Useful for modifying contents of files; see tip after this table for an example
`-I`	Directory	Specify location of include files
`-l`	Octal character code	Drop new lines when used with `-n` and `-p`, and use designated character as line-termination character
`-n`		Process the script, using each specified file as an argument	Used for performing the same set of actions on a set of files
`-p`		Same as `-n`, but each line is printed
`-P`		Run the script through the C preprocessor before Perl compiles it
`-s`		Enable passing of arbitrary switches to Perl	Use `-s -what -ever` to have the Perl variables `$what` and `$ever` defined within your script
`-S`		Tell Perl to look along the path for the script
`-T`		Use taint checking; don't evaluate expressions supplied in the command line	Very important for Web use
`-u`		Makes Perl dump core after compiling your script; intended to allow for generation of Perl executables	Very messy; wait for the Perl compiler
`-U`		Unsafe mode; overrides Perl's natural caution	Avoid using it!
`-v`		Print the Perl version
`-w`		Print warnings about script syntax	Very useful, especially during development

TIP

The -e option is handy for quick Perl operations from the command line. Want to change all the foos in WIFFLE.BAT to bars? Try this:

perl -i.old -p -e "s/foo/bar/g" wiffle.bat

This code says, "Take each line of WIFFLE.BAT (-p), store the original in WIFFLE.OLD (-i), replace all instances of foo with bar (-e), and write the result (-p) to the original file (-i)."

In UNIX, you can also put Perl command-line switches on the #! line, as in this example:


#!/usr/local/bin/perl -w -T

CAUTION

The -w switch is best omitted in versions of Perl older than 5.002, because it may produce spurious warnings.

Also, take care when you use the -w switch in scripts that send data to Web browsers. Warning messages sent before the browser receives a content-type line may result in an error message.

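Program Layout

A Perl program is a text file containing a series of Perl commands. The commands look like a mixture of C, shell script, and English.

Perl code can be quite free-form, because the syntax rules are very relaxed:

Leading white space is ignored. You can start a Perl statement wherever you like: at the start of the line, indented in the conventional way (recommended), or in a style of your own (as long as you can read it).
Statements end with a semicolon (;).
One space is as good as a hundred. This means that you can split a long statement across several lines for readability.
Anything after a pound sign (#) is ignored as a comment.

Here is a Perl statement: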


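print "My name is Andy Liu\n";

When Perl runs this statement, it prints My name is Andy Liu on the screen. The \n produces a new line (in other words, output continues at the start of the next line).

To print more lines, you simply write more of the same: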


print "My name is Yon Yonson,\n";

print "I live in Wisconsin,\n",

      "I work in a lumbermill there.\n";

What does a complete Perl program look like? Here is a UNIX example, which begins by giving the location of perl and includes a few comments:


#!/usr/local/bin/perl -w                    # Show warnings


print "My name is Andy Liu,\n";           # Let's introduce ourselves

print "I live in Taiwan,\n",

      "I work in a lumbermill there.\n";    # Remember the semicolon!

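Data Types

Perl has very few data types. If you are used to C, where even characters come in signed and unsigned varieties, you will find Perl a pleasure. Basically, Perl has only two data types: scalars and arrays. Perl also has associative arrays, a special kind of array that almost forms a category of its own.

Scalars

All numbers and strings are scalars. Every scalar variable name begins with a dollar sign ($).

NOTE

All Perl variable names, including scalars, are case-sensitive. $Name and $name, for example, are treated as two completely different variables.

Perl converts scalars between numbers and strings automatically, as needed: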


$a = 2;

$b = 6;

$c = $a . $b;  # The "." operator concatenates two strings.
$d = $c / 2;

print $d;

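The result is:

13

This script converts the two integers to strings, concatenates them into a new string variable, converts the new string to an integer, divides it by 2, converts the result to a string, and prints it on the screen.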
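The conversion steps just described can be sketched as a small function. This is only an illustration: the name concat_then_halve is made up for this example, and it uses my and use strict, which this chapter has not introduced.

```perl
use strict;
use warnings;

# Concatenate two numbers as strings, then use the result as a number.
# (concat_then_halve is a hypothetical name, not part of Perl.)
sub concat_then_halve {
    my ($x, $y) = @_;
    my $joined = $x . $y;    # "." works on strings: 2 . 6 gives "26"
    return $joined / 2;      # "/" works on numbers: "26" / 2 gives 13
}

print concat_then_halve(2, 6), "\n";   # prints 13
print "3" + "4", "\n";                 # numeric addition: prints 7
print "3" . "4", "\n";                 # string concatenation: prints 34
```

The operator, not the variable, decides whether a scalar is treated as a number or as a string.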

This situation might be a problem if Perl were regularly used for tasks in which explicit memory offsets were used, for example, and data types were critical. But for the type of task for which Perl is normally used-and certainly for the types of tasks that we'll be using it for in this book-these automatic conversions are smooth, intuitive, and generally a Good Thing.

We can develop the earlier example script with some string variables, as follows:


#!/usr/local/bin/perl -w                    # Show warnings



$who = 'Yon Yonson';

$where = 'Wisconsin';

$what = 'in a lumbermill';



print "My name is $who,\n";                 # Let's introduce ourselves

print "I live in $where,\n",

      "I work $what there.\n";



print "\nSigned: \t$who,\n\t\t$where.\n";

When run, this script produces the following output:


My name is Yon Yonson,

I live in Wisconsin,

I work in a lumbermill there.



Signed:	Yon Yonson,

		Wisconsin.

Don't worry-it gets better.

Arrays

A collection of scalars is called an array. An array variable name begins with an at sign (@). The elements of an array are separated by commas, as in the following example:


@trees = ("Larch", "Hazel", "Oak");

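You refer to an element of an array by using square brackets: $trees[0], for example, is the first element of the @trees array. Notice that this is $trees[0], not @trees[0]; an individual element is a scalar, so its name takes the $ prefix.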
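As a minimal sketch of element access, the following function returns the first and last elements of whatever list it is given. The name first_and_last is hypothetical; $#list, Perl's notation for the last valid subscript of @list, is not covered in this chapter.

```perl
use strict;
use warnings;

# Return the first and last elements of a list.
# (first_and_last is a hypothetical name used only for this sketch.)
sub first_and_last {
    my @list = @_;
    return ($list[0], $list[$#list]);   # $#list is the last subscript
}

my @trees = ("Larch", "Hazel", "Oak");
my ($first, $last) = first_and_last(@trees);

# scalar(@trees) evaluates the array in scalar context: its length.
print "First: $first, last: $last, count: ", scalar(@trees), "\n";
```

Each element is read with the $ prefix, while the array as a whole keeps the @ prefix.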

Mixing scalar types in an array is not a problem. The code


@items = (15, '45.67', "case");

print "Take $items[0] $items[2]s at \$$items[1] each.\n";

prints the following:


Take 15 cases at $45.67 each.

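All arrays in Perl are dynamic; you never have to worry about memory allocation and management, because Perl does all that for you. You can also include arrays in other arrays, as in the following example: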


@A = (1, 2, 3);

@B = (4, 5, 6);

@C = (7, 8, 9);

@D = (@A, @B, @C);

As a result, the array @D contains the values 1 through 9. This is a common way of using arrays:


@Annual = (@Spring, @Summer, @Fall, @Winter);

This example expresses the seasons of a year as a compact, easily understood array. Each season array can in turn be built from month arrays, and each month array from daily data. The @Annual array then would consist of a value for each day of the year. By defining your data in chunks such as this, you give yourself the option of handling it on a daily, monthly, or annual basis.

NOTE

An aspect of Perl that often confuses newcomers (and occasionally old hands, too) is the context-sensitive nature of evaluations. Perl keeps track of the context in which an expression is being evaluated and can return a different value in an array context than in a scalar context. In this example, the array @B contains 1-4, whereas $C contains 4 (the number of values in the array):

@A = (1, 2, 3, 4); @B = @A; $C = @A;

This context sensitivity becomes more of an issue when you use functions and operators that can take either a single argument or multiple arguments. The function or operator behaves one way when it is passed a single scalar argument and another when it is passed multiple arguments, which it may interpret as a single array argument.
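The difference between the two contexts can be sketched as follows. The function name count_items is an assumption made for this example; the assignments inside it are ordinary Perl.

```perl
use strict;
use warnings;

# Return how many elements a list has after Perl flattens it.
# (count_items is a hypothetical name used only for this sketch.)
sub count_items {
    my @items = @_;   # array assignment: @items receives every element
    my $n = @items;   # scalar assignment: the array yields its length
    return $n;
}

my @A = (1, 2, 3);
my @B = (4, 5, 6);
my @D = (@A, @B);     # flattened into one six-element array

print count_items(@D), "\n";          # prints 6
print count_items(@A, @B, 7), "\n";   # prints 7
```

The function cannot tell whether it was called with one array or several; it simply sees one flat list.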

Many of Perl's built-in functions take arrays as arguments. One example is sort, which takes an array as an argument and returns the same array, sorted alphabetically. The code


print sort ( 'Beta', 'Gamma', 'Alpha' );

prints AlphaBetaGamma.

You can make this code neater by using another built-in function, called join. This function takes two arguments: a string to connect with, and an array of strings to connect. join returns a single string that consists of all elements in the array joined with the connecting string. The code


print join ( ' : ', 'Name', 'Address', 'Phone'  );

returns the string Name : Address : Phone.

Because sort returns an array, you can feed its output straight into join. The code


print join( ', ', sort ( 'Beta', 'Gamma', 'Alpha' ) );

prints Alpha, Beta, Gamma.

Notice that this code doesn't separate the initial scalar argument of join from the array that follows it. The first argument is the string to join things with. The rest of the arguments are treated as a single argument: the array to be joined. This is true even if you use parentheses to separate groups of arguments. The code


print join( ': ', ('A', 'B', 'C'), ('D', 'E'), ('F', 'G', 'H', 'I'));

returns A: B: C: D: E: F: G: H: I.

You can use one array or multiple arrays in a context such as this because of the way that Perl treats arrays; adding an array to an array gives you one larger array, not two arrays. In this case, all three arrays are bundled into one.
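One caveat is worth a short sketch: sort compares its arguments as strings by default, so numbers need an explicit comparison block. The function names sorted_list and sorted_numbers are made up for this example; the sort block using $a, $b, and the <=> operator is standard Perl.

```perl
use strict;
use warnings;

# Join a list into one comma-separated string, sorted as strings.
sub sorted_list {
    return join(', ', sort @_);
}

# Same, but sorted numerically with an explicit comparison block.
sub sorted_numbers {
    return join(', ', sort { $a <=> $b } @_);
}

print sorted_list('Beta', 'Gamma', 'Alpha'), "\n";   # Alpha, Beta, Gamma
print sorted_list(10, 9, 100), "\n";                 # 10, 100, 9
print sorted_numbers(10, 9, 100), "\n";              # 9, 10, 100
```

The second line of output shows the string ordering: "10" sorts before "100", which sorts before "9".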

TIP

For even more powerful list-manipulation capabilities, refer to the splice function in Chapter 15, "Function List."

Associative Arrays

Associative arrays have a certain elegance that makes experienced Perl programmers a little snobbish about their language of choice. Rightly so! Associative arrays give Perl a degree of database functionality at a very low, yet useful, level. Many tasks that would otherwise involve complex programming can be reduced to a handful of Perl statements by means of associative arrays.

Arrays of the type that you've already seen are lists of values indexed by subscripts. In other words, to get an individual element of an array, you supply a subscript as a reference, as follows:


@fruit = ( "Apple", "Orange", "Banana" );

print $fruit[2];

This example yields Banana, because subscripts start at zero, so 2 is the subscript for the third element of the @fruit array. A reference to $fruit[7] here returns the null value, because no array element with that subscript has been defined.

Now, here's the point of all this: Associative arrays are lists of values indexed by strings. Conceptually, that's all there is to them. The implementation of associative arrays is more complex, because all the strings (keys) need to be stored in addition to the values to which they refer.

When you want to refer to an element of an associative array, you supply a string (the key) instead of an integer (the subscript). Perl returns the corresponding value. Consider the following example:


%fruit = ("Green", "Apple", "Orange", "Orange", "Yellow", "Banana" );

print $fruit{"Yellow"};

This code prints Banana, as before. The first line defines the associative array in much the same way that you have already defined ordinary arrays; the difference is that instead of listing values, you list key/value pairs. The first value is Apple, and its key is Green. The second value is Orange, which happens to have the same string for both value and key. Finally, the value Banana has the key Yellow.

On a superficial level, you can use string subscripts to provide mnemonics for array references, allowing you to refer to $Total{'June'} instead of $Total[5]. But you wouldn't even be beginning to use the power of associative arrays. Think of the keys of an associative array as you might think of a key that links tables in a relational database, and you're closer to the idea. Consider this example:


%Folk =   ( 'YY', 'Yon Yonson',

            'TC', 'Terra Cotta',

            'RE', 'Ron Everly' );



%State = ( 'YY', 'Wisconsin',

           'TC', 'Minnesota',

           'RE', 'Bliss' );



%Job = ( 'YY', 'work in a lumbermill',

         'TC', 'teach nuclear physics',

         'RE', 'watch football');



foreach $person ( 'TC', 'YY', 'RE' )  {

        print "My name is $Folk{$person},\n",

              "I live in $State{$person},\n",

              "I $Job{$person} there.\n\n";

        }

We sneaked the foreach construct into this example; it is explained in detail in the flow-control section later in this chapter. For now, you'll just have to take it on trust that foreach makes Perl execute the three print statements for each of the people in the list after the foreach keyword. Otherwise, you could try executing the code in the sample and see what happens.

You also can treat the keys and values of an associative array as separate (ordinary) arrays by using the keys and values keywords, respectively. The code


print keys %Folk;

print values %State;

prints a string such as YYRETCWisconsinBlissMinnesota. Perl stores the pairs in no particular order, so the order that you see may differ.

Looks as though we need to do some more work on string handling. That task is best left until after we cover some flow-control mechanisms, however.
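As a minimal sketch of looking up related values by a shared key, the following code reuses two of the associative arrays from the example above. The function name describe is hypothetical, and the exists test, which checks whether a key is present, is not covered in this chapter.

```perl
use strict;
use warnings;

my %Folk  = ('YY', 'Yon Yonson', 'TC', 'Terra Cotta', 'RE', 'Ron Everly');
my %State = ('YY', 'Wisconsin',  'TC', 'Minnesota',   'RE', 'Bliss');

# Build one report line for a key, or a fallback if the key is unknown.
# (describe is a hypothetical name used only for this sketch.)
sub describe {
    my ($id) = @_;
    return "No such person: $id" unless exists $Folk{$id};
    return "$Folk{$id} lives in $State{$id}";
}

foreach my $id (sort keys %Folk) {   # sorting gives a predictable order
    print describe($id), "\n";
}
print describe('ZZ'), "\n";
```

Sorting the keys is a simple way to get repeatable output from data that has no inherent order.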

NOTE

A special associative array called %ENV stores the contents of all environment variables, indexed by variable name. $ENV{'PATH'}, for example, returns the current search path. Following is a way to print the current values of all environment variables, sorted by variable name for good measure:

foreach $var (sort keys %ENV ) { print "$var: \"$ENV{$var}\".\n"; }

The foreach clause sets $var to each of the environment-variable names in turn (in alphabetical order), and the print statement prints each name and value. The backslash-quote (\") in there produces quotation marks around the values.

File Handles

This chapter finishes discussing Perl data types by discussing file handles. A file handle is not really a data type at all, but a special kind of literal string. A file handle behaves like a variable in many ways, however, so this section is a good place to cover them. (Besides, you won't get very far in Perl without them.)

You can regard a file handle as being a pointer to a file from which Perl is to read or to which it will write. (C programmers are familiar with the concept.) The basic idea is that you associate a handle with a file or device, and then refer to the handle in the code whenever you need to perform a read or write operation.

File handles generally are written in uppercase. Perl has some useful predefined file handles, as Table 1.2 shows.

Table 1.2 Perl's Predefined File Handles

File Handle	Points to…
`STDIN`	Standard input (normally, the keyboard)
`STDOUT`	Standard output (normally, the console; in many Web applications, the browser)
`STDERR`	Device where error messages should be written (normally, the console; in a Web server environment, normally, the server-error log file)

The print statement can take a file handle as its first argument, as follows:


print STDERR "Oops, something broke.\n";

Notice that no comma appears after the file handle in this example. That helps Perl figure out that the STDERR is not something to be printed. If you're uneasy with this implicit list syntax, you can put parentheses around all the print arguments, as follows:


print (STDERR "Oops, something broke.\n");

You still have no comma after the file handle, however.

TIP

Use the standard file handles explicitly, especially in complex programs. Redefining the standard input or output device for a while is convenient sometimes; make sure that you don't accidentally wind up writing to a file what should have gone to the screen.

You can use the open function to associate a new file handle with a file, as follows:


open (INDATA, "/etc/stuff/Friday.dat");

open (LOGFILE, ">/etc/logs/reclaim.log");

print LOGFILE "Log of reclaim procedure\n";

By default, open opens files for reading only. If you want to override this default behavior, add to the file name one of the special direction symbols listed in Table 1.3. (The > at the start of the file name in the second open statement of the preceding example, for example, tells Perl that you intend to write to the named file.)

Table 1.3 Perl File-Access Symbols

Symbol	Meaning
`<`	Open the file for reading (the default action)
`>`	Open the file for writing
`>>`	Open the file for appending
`+<`	Open the file for both reading and writing
`+>`	Open the file for both reading and writing, truncating it first
`|` (before file name)	Treat file as command into which Perl is to pipe text
`|` (after file name)	Treat file as command from which input is to be piped to Perl

To take a more complex example, here's one way to feed output to the mypr printer on a UNIX system:


open (MYLPR, "|lpr -Pmypr");

print MYLPR "A line of output\n";

close MYLPR;
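The three most common symbols from Table 1.3 can be tried out with a scratch file. This is a sketch under stated assumptions: the file name, the function name write_append_read, and the handle names OUT and IN are all made up for the example. It also uses die and $!, which are explained later in this chapter, and unlink, which deletes a file.

```perl
use strict;
use warnings;

# Hypothetical scratch file name; $$ is the current process ID.
my $file = "scratch_$$.log";

# Write two lines with ">", add a third with ">>", then read all back.
sub write_append_read {
    my ($path) = @_;

    open(OUT, ">$path")  || die "Cannot write $path ($!)\n";
    print OUT "first\n", "second\n";
    close(OUT);

    open(OUT, ">>$path") || die "Cannot append to $path ($!)\n";
    print OUT "third\n";
    close(OUT);

    open(IN, "<$path")   || die "Cannot read $path ($!)\n";
    my @lines = <IN>;    # array assignment: every line is read at once
    close(IN);

    unlink($path);       # remove the scratch file again
    return @lines;
}

print write_append_read($file);
```

Opening the same file with > a second time would have discarded the first two lines; >> keeps them.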

A special Perl operator for reading from files consists of two angle brackets-<>-around the file handle of the file from which you want to read. This operator returns the next line or lines of input from the file or device, depending on whether the operator is used in a scalar or an array context. When no more input remains, the operator returns false.

A construct such as


while (<STDIN>) {

print;

}

simply echoes each line of input back to the console until Ctrl+D (Ctrl+Z in Windows NT) is pressed, because the print function takes the current default argument here: the most recent line of input. For an explanation, see "Special Variables" later in this chapter.

If the user types


A

Bb

Ccc

^D

the screen looks like this:


A

A

Bb

Bb

Ccc

Ccc

^D

Notice that in this case, <STDIN> is in a scalar context, so one line of standard input is returned at a time. Compare that example with the following example:


print <STDIN>;

In this case, because print expects an array of arguments (it can be a single-element array, but it's an array as far as print is concerned), the <> operator obligingly returns all the contents of STDIN as an array, and then print prints it. Because the array is fully built before it is printed, nothing is written to the console until the user presses Ctrl+D:


A

Bb

Ccc

^D

A

Bb

Ccc

This script prints out the contents of the file .SIGNATURE, double-spaced:


open (SIGFILE, ".signature");

while ( <SIGFILE> )  {

	print; print "\n";

	}

The first print here has no arguments, so it takes the current default argument and prints it. The second print has an argument, so it prints that instead. Perl's habit of using default arguments extends to the <> operator; if that operator is used with no file handle, Perl assumes that <ARGV> is intended. <ARGV> expands to each line in turn of each file listed in the command line.

If no files are listed in the command line, Perl instead assumes that STDIN is intended. The following code, therefore, keeps printing more as long as something other than Ctrl+D appears in standard input:


while (<>) {

print "more.... ";

}

NOTE

Perl 5 allows array elements to be references to any data type. As a result, you can build arbitrary data structures of the kind used in C and other high-level languages, but with all the power of Perl. You can have an array of associative arrays, for example.

Special Variables

Like all languages, Perl has its special hieroglyphs, which are laden with meaning. This section briefly examines some of the most common and useful variables, and provides some examples of typical Perl idioms in which you might find them.

Environment Variables

You have already seen one special variable: the environment-variable associative array %ENV. This special associative array allows you to easily use the value of any environment variable within your Perl scripts:


print "Looking for files along the path ($ENV{'PATH'})   \n";

The %ENV array is quite useful in CGI programming, in which parameters are passed from the browser to CGI programs as environment settings.

Program Arguments

Any arguments specified in the Perl command line are passed to the Perl script in another special array: @ARGV.

CAUTION

C programmers, beware: The first element of this array is the first actual argument, not the name of the program. The special variable $0 contains the name of the Perl script that is being executed.

The following code prints the command-line arguments one per line, sorted alphabetically:


print join("\n", sort @ARGV);

The command-line arguments are of limited use in CGI scripts, in which arguments are passed via the environment rather than the command line. These arguments are quite useful in normal Perl work, of course.

Current Line

The special variable $_ is often used to store the current line of input. This situation is true when the <> input operator is used. The following code, for example, prints a numbered listing of the file pointed to by SOMEFILE:


$line=0;

while ( <SOMEFILE> )  {

	++$line;

	print "Line $line : ", $_;

	}

You occasionally need to store the contents of $_ somewhere, as in the following example:


$oldvalue = $_;

But the opposite operation-setting the value of $_ manually-is rarely appropriate, as in this example:


$_ = $oldvalue;

Pattern matching and substitution take place on the contents of this variable unless you specify otherwise. These topics are covered in "Regular Expressions" later in this chapter.
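The <> operator is not the only construct that sets $_. A foreach loop with no loop variable does the same, as this sketch shows. The function name number_lines is hypothetical, and push, which appends to an array, is not covered in this chapter.

```perl
use strict;
use warnings;

# Prefix each line with its number; inside the loop, $_ is the
# current element because no loop variable is named.
# (number_lines is a hypothetical name used only for this sketch.)
sub number_lines {
    my @out;
    my $line = 0;
    foreach (@_) {
        ++$line;
        push @out, "Line $line : $_";
    }
    return @out;
}

print number_lines("alpha\n", "beta\n");
```

This prints Line 1 : alpha and Line 2 : beta on separate lines.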

System Error Messages

The special variable $! contains the current system-error number (errno, on UNIX systems) or system-error string, depending on whether it is evaluated in a numeric or string context. This variable may not contain anything meaningful; it should be used only if an error occurred.

This example reports failure if the open call failed:


open ( INFILE, "./missing.txt") || die "Couldn't open \"./missing.txt\" ($!).\n";

The || here is the Boolean or operator, which is covered in "Flow Control" later in this chapter. die causes Perl to terminate after printing the string given to die as an argument.

If the file does not exist, Perl terminates after displaying something like this:


Couldn't open "./missing.txt" (No such file or directory).

The form and content of error messages vary from one system to the next.
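The two faces of $! can be captured side by side. The sketch below assumes that the named file does not exist; the function name open_error and the handle name FH are made up for the example.

```perl
use strict;
use warnings;

# Try to open a file; on failure, return $! as a number and as a string.
# (open_error is a hypothetical name used only for this sketch.)
sub open_error {
    my ($path) = @_;
    if (open(FH, "<$path")) {
        close(FH);
        return ();            # empty list: the open succeeded
    }
    my $errno   = $! + 0;     # numeric context: the error number
    my $message = "$!";       # string context: the error text
    return ($errno, $message);
}

my ($errno, $message) = open_error("./missing.txt");
print "Error $errno: $message\n" if defined $errno;
```

Copy $! immediately after the failed call, because later system calls may overwrite it.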

Flow Control

The examples that you have seen so far have been quite simple, with little or no logical structure beyond a linear sequence of steps. We managed to sneak in the occasional while and foreach; think of those as being sneak previews. Perl has all the flow-control mechanisms that you'd expect to find in a high-level language, and this section takes you through the basics of each mechanism.

Logical Operators

Two operators-|| (or) and && (and)-are used like glue to hold Perl programs together. They take two operands and return either true or false, depending on the operands. In the following example, if either $Saturday or $Sunday is true, $Weekend will be true, too:


$Weekend = $Saturday || $Sunday;

In the next example, $Solvent is true only if $income is greater than 3 and $debts is less than 10:


$Solvent = ($income > 3) && ($debts < 10);

Now consider the logic of evaluating one of these expressions. It isn't always necessary to evaluate both operands of either an && or a || operator. In the first example earlier in this section, if $Saturday is true, you know that $Weekend will be true, regardless of whether $Sunday is also true (the midnight condition, perhaps?).

This means that when the left side of an or expression is evaluated as true, the right side is not evaluated. Combine this with Perl's easy way with data types, and you can say things like the following:


$value > 10 || print "Oops, low value   \n";

If $value is greater than 10, the right side of the expression is never evaluated, so nothing is printed. If $value is not greater than 10, Perl needs to evaluate the right side, too, so as to decide whether the expression as a whole is true or false. That means that Perl evaluates the print statement, printing out the message.

OK, it's a trick, but it's a very useful one.

Something analogous applies to the && operator. In this case, if the left side of an expression is false, the expression as a whole is false, so Perl does not evaluate the right side. The && operator can, therefore, be used to produce the same kind of effect as the || trick, but with the opposite sense, as in the following example:


$value > 10 && print "OK, value is high enough   \n";

As is true of most Perl constructs, the real power of these tricks comes when you apply a little creative thinking. Remember that the left and right sides of these expressions can be any Perl expressions; think of them as being conjunctions in a sentence rather than logical operators, and you'll get a better feel for how to use them. Expressions such as the following give you a little of the flavor of creative Perl:


$length <= 80 || die "Line too long.\n";

$errorlevel > 3 && warn "Hmmm, strange error level ($errorlevel)   \n";

open ( LOGFILE, ">install.log") || &bust("Log file");

The &bust in this example is a subroutine call, by the way. Refer to "Subroutines" later in this chapter for more information.
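To see which side of || and && actually runs, the following sketch records each right-hand side that is evaluated. The names note, check, and @log are assumptions made for the example.

```perl
use strict;
use warnings;

my @log;   # records which right-hand sides actually ran

# Record a label and return true. (note is a hypothetical helper.)
sub note {
    push @log, $_[0];
    return 1;
}

sub check {
    my ($value) = @_;
    @log = ();
    $value > 10 || note("low");    # right side runs only if left is false
    $value > 10 && note("high");   # right side runs only if left is true
    return join(',', @log);
}

print check(5), "\n";    # prints low
print check(50), "\n";   # prints high
```

In each call exactly one of the two right-hand sides is evaluated, which is what makes the || die idiom safe to use.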

Conditional Expressions

The most basic kind of flow control is a simple branch. A statement is either executed or not, depending on whether a logical expression is true or false. You can do this by following the statement with a modifier and a logical expression, as follows:


open ( INFILE, "./missing.txt") if $missing;

The execution of the statement is contingent upon both the evaluation of the expression and the sense of the operator.

The expression is evaluated as either true or false and can contain any of the relational operators listed in Table 1.4 (although it need not). Following are a few examples of valid expressions:


$full

$a == $b

<STDIN>

Table 1.4 Perl's Relational Operators

Operator	Numeric Context	String Context
Equality	`==`	`eq`
Inequality	`!=`	`ne`
Inequality with signed result	`<=>`	`cmp`
Greater than	`>`	`gt`
Greater than or equal to	`>=`	`ge`
Less than	`<`	`lt`
Less than or equal to	`<=`	`le`

NOTE

When we're comparing strings, less than means lexically less than. If $left comes before $right when the two are sorted alphabetically, $left is less than $right.

Perl has four modifiers, each of which behaves the way that you might expect from the corresponding English word:

if. The statement is executed if the logical expression is true and is not executed otherwise. Examples:
$max = 100 if $min < 100; print "Empty!\n" if !$full;
unless. The statement is not executed if the logical expression is true and is executed otherwise. Examples:
open (ERRLOG, "test.log") unless $NoLog; print "Success" unless $error>2;
while. The statement is executed repeatedly until the logical expression is false. Examples:
$total -= $decrement while $total > $decrement; $n=1000; print "$n\n" while $n-- > 0;
until. The statement is executed repeatedly until the logical expression is true. Examples:
$total += $value[$count++] until $total > $limit; print RESULTS "Next value: $value[$n++]" until $value[$n] == -1;

Notice that the logical expression is evaluated only one time in the case of if and unless, but multiple times in the case of while and until. In other words, the first two are simple conditionals, and the last two are loop constructs.
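The order of evaluation matters most with the loop modifiers, because the condition is tested before each execution of the statement. The sketch below uses two made-up functions, countdown and status, to show a while modifier and an unless modifier.

```perl
use strict;
use warnings;

# Count down using a while modifier; returns the values visited.
# (countdown and status are hypothetical names for this sketch.)
sub countdown {
    my ($n) = @_;
    my @seen;
    # $n-- > 0 compares the old value, then decrements, so the
    # statement sees the value after the decrement.
    push @seen, $n while $n-- > 0;
    return @seen;
}

# The assignment is skipped when $full is true.
sub status {
    my ($full) = @_;
    my $msg = "Full";
    $msg = "Empty!" unless $full;
    return $msg;
}

print join(' ', countdown(3)), "\n";   # prints 2 1 0
print status(0), "\n";                 # prints Empty!
```

Starting from 3, the statement runs three times and sees 2, 1, and 0.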

Compound Statements

The syntax changes when you want to make the execution of multiple statements contingent on the evaluation of a logical expression. The modifier comes at the start of a line, followed by the logical expression in parentheses, followed by the conditional statements in braces. Notice that the parentheses around the logical expression are required, although they are not required in the single statement branching described in the preceding section.

The following example is somewhat similar to C's if syntax:


if ( ( $total += $value ) > $limit )  {

   print LOGFILE "Maximum limit $limit exceeded. Offending value was $value.\n";

   close (LOGFILE);

   die "Too many! Check the log file for details.\n";

   }

The if statement is capable of a little more complexity, with else and elsif operators, as in the following example:


if ( !open( LOGFILE, ">install.log") )   {

   close ( INFILE );

   die "Unable to open log file!\n";

   }

elsif ( !open( CFGFILE, ">system.cfg") )  {

   print LOGFILE "Error during install: Unable to open config file for writing.\n";

   close ( LOGFILE );

   die "Unable to open config file for writing!\n";

   }

else  {

   print CFGFILE "Your settings go here!\n";

   }

Loops

The loop modifiers (while, until, for, and foreach) are used with compound statements in much the same way, as the following example shows:


until ( $total >= 50 )  {

   print "Enter a value: ";

   $value = scalar (<STDIN>);

   $total += $value;

   print "Current total is $total\n";

   }

print "Enough!\n";

The while and until statements are described in "Conditional Expressions" earlier in this chapter. The for statement resembles the one in C. for is followed by an initial value, a termination condition, and an iteration expression, all enclosed in parentheses and separated by semicolons, as follows:


for ( $count = 0; $count < 100; $count++ )  {

   print "Something";

   }

The foreach operator is special; it iterates over the contents of an array and executes the statements in a statement block for each element of the array. Following is a simple example:


@numbers = ("one", "two", "three", "four");

foreach $num ( @numbers )   {

   print "Number $num   \n";

   }

The variable $num first takes on the value one, then two, and so on. That example looks fairly trivial, but the real power of this operator lies in the fact that it can operate on any array, as follows:


foreach $arg ( @ARGV )   {

   print "Argument: \"$arg\".\n";

   }

foreach $namekey ( sort keys %surnames )  {

   print REPORT "Surname: $surnames{$namekey}.\n",

                "Address: $address{$namekey}.\n";

   }
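
To see the associative-array version of foreach run from start to finish, here is a minimal sketch. The contents of %surnames and %address are made-up sample data, the output goes to STDOUT rather than to a REPORT file handle, and the variables are declared with Perl 5's my (see "Variable Scope" later in this chapter).

```perl
use strict;
use warnings;

# Hypothetical sample data, keyed by a short name.
my %surnames = ( "jane", "Austen", "emily", "Bronte",  "mary", "Shelley" );
my %address  = ( "jane", "Bath",   "emily", "Haworth", "mary", "London" );

# Build one report line per key, visiting the keys in sorted order.
my @report = ();
foreach my $namekey ( sort keys %surnames ) {
    push( @report, "Surname: $surnames{$namekey}. Address: $address{$namekey}.\n" );
}

print @report;
```

Because keys returns the keys in no particular order, the sort is what makes the report come out in the same order on every run.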

Labels

You can use labels with the next, last, and redo statements to provide more control of program flow through loops. A label consists of any word, usually in uppercase, followed by a colon. The label appears just before the loop operator (while, for, or foreach) and can be used as an anchor for jumping to from within the block. The following code snippet prints all the odd-numbered records in INFILE:


RECORD:  while ( <INFILE> )  {

   $odd = !$odd;

   next RECORD unless $odd;

   print;

   }

The three label-control statements are:

next. Jumps to the next iteration of the loop marked by the label or to the innermost enclosing loop, if no label is specified.
last. Immediately breaks out of the loop marked by the label or out of the innermost enclosing loop, if no label is specified.
redo. Restarts the current iteration of the loop marked by the specified label, or of the innermost enclosing loop if no label is specified, without re-evaluating the loop condition. redo causes the block to execute again with the same iterator value.
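
Labels matter most when loops are nested, because an unlabeled next or last affects only the innermost loop. The following sketch, with invented data and an invented label name (ROW), shows next and last acting on the outer loop from inside the inner one; it does not demonstrate redo.

```perl
use strict;
use warnings;

my @outer = ( 1, 2, 3, 4 );
my @inner = ( 1, 2, 3 );
my @pairs = ();

ROW: foreach my $i (@outer) {
    foreach my $j (@inner) {
        next ROW if $j > $i;        # skip the rest of this row, go on to the next $i
        last ROW if $i * $j > 6;    # leave both loops at once
        push( @pairs, "$i x $j" );
    }
}

print join( ", ", @pairs ), "\n";   # prints: 1 x 1, 2 x 1, 2 x 2, 3 x 1, 3 x 2
```

Without the ROW label, both statements would apply only to the inner foreach, and the outer loop would carry on to $i = 4.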

Subroutines

Subroutines in Perl are defined with the sub keyword, as follows:


sub Usage {

   print "Usage: \n",

         "twiddle [-args] infile outfile\n";

   print "Copyleft 1996, Jonathan F. Squirmsby.";

 }

Subroutines are called with &, as follows:


sub bust  {

   print "Oops, some kind of error seems to have occurred.\n";

   die "Fatal error, terminating.\n";

   }

open ( LOGFILE, ">install.log") || &bust;

In this example, the subroutine was defined before it was called. You can define and call subroutines in any order in Perl; the convention is to define them after the main routine.

Passing Arguments You can pass arguments to a subroutine in the usual way, as follows:


open ( LOGFILE, ">install.log") || &bust("Failed to open log file \"install.log\".");

But here is where Perl's subroutine syntax starts to get a little strange; C programmers may want to take a seat before reading on.

All Perl subroutines receive their arguments as an arbitrarily long array of scalars with the special name of @_. There is no mechanism for declaring the arguments when the subroutine is declared. There is no fixed number of arguments. Also, the calling function can pass any mixture of scalars and arrays; they are all treated as one big @_ array when they get to the subroutine.

In the example earlier in this section, in which bust is called with a single argument, you can pick it up in the subroutine and use it to provide a more sensible error message, as in the following example:


sub bust  {

   ($errortext) = @_;

   print "Oops, an error occurred ($errortext).\n";

   die "Fatal error, terminating.\n";

   }

Notice that we went to the trouble of assigning the first element of the argument array @_ to the scalar $errortext. This assignment may seem to be unnecessary; in fact, we could have simply used $_[0] instead of $errortext in the print statement. Explicitly assigning the contents of the @_ array to named variables is much clearer, though, especially when the subroutine takes multiple arguments. Compare the example


print "Error $_[0] opening file $_[1].\n";

with this one:


($errtext, $errfile) = @_;

print "Error $errtext opening file $errfile.\n";

Notice, too, that when we assigned the @_ array to the single variable $errortext in the bust example, we placed $errortext in parentheses. We did so to force an array (list) context, so that what gets assigned to $errortext is the first (and only) value of the @_ array, not the number of values in @_. In effect, we're telling Perl to treat ($errortext) as a single-element list. The earlier example that uses $errfile and $errtext is a clearer example of an array-to-array assignment.
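
The effect of the parentheses is easy to check. In this minimal sketch, the subroutine names FirstArg and CountArgs are invented for illustration; the only difference between them is whether the variable on the left of the assignment is parenthesized.

```perl
use strict;
use warnings;

sub FirstArg {
    my ($first) = @_;    # list assignment: $first gets the first argument
    return $first;
}

sub CountArgs {
    my $count = @_;      # scalar assignment: $count gets the number of arguments
    return $count;
}

print "First: ", FirstArg( "disk full", "install.log" ), "\n";    # First: disk full
print "Count: ", CountArgs( "disk full", "install.log" ), "\n";   # Count: 2
```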

In "Variable Scope" later in this chapter, you learn how to protect local variables such as $errortext in subroutines by using the local and my keywords.

Passing Arrays Because Perl flattens all subroutine arguments into the single @_ array, you can't pass more than one array to a subroutine and expect the arrays to stay separate. Suppose that you have a subroutine call of the following form:


&PrintRes( "alpha", (1, 3, 5, 7), "beta", (2, 4, 6, 8) );

Try to unpack these arguments into the following values as they come into the subroutine:


$p1 = "alpha";

@p2 = (1, 3, 5, 7);

$p3 = "beta";

@p4 = (2, 4, 6, 8);

A statement like


( $p1, @p2, $p3, @p4 ) = @_;

won't get beyond the second parameter. The following list explains what happens:

The first variable in the list, $p1, is assigned the value of the first scalar in the @_ argument array, which is alpha.
Then the next variable in the list, @p2, is assigned the value of the next argument in the @_ argument array. This is an array assignment because @p2 is an array, so the entire remainder of the @_ array, from its second element on, is assigned to @p2: (1, 3, 5, 7, "beta", 2, 4, 6, 8), in other words.
The next variable to be assigned is $p3. This variable is assigned the value of the next element in the @_ argument array, but there aren't any left, because they've all been slurped by @p2. $p3, therefore, is undefined.
The final variable, @p4, suffers the same fate and is left empty.

There's no point in trying to specify subarrays, as in the following example, because Perl expands the array on the left to the same thing as before:


( $p1, (@p2), $p3, (@p4) ) = @_;

The moral of the story is: Don't pass more than one array into a subroutine. And if you do pass an array, make sure that it's the last argument.
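
The flattening is easy to observe, and Perl 5 offers a way around it that this chapter does not otherwise cover: pass references to the arrays (written \@array) and dereference them inside the subroutine. The sketch below assumes Perl 5; the subroutine names CountFlat and SumEach are invented for illustration.

```perl
use strict;
use warnings;

# Flattening: both arrays arrive as one @_ of eight elements.
sub CountFlat {
    my @all = @_;
    return scalar(@all);
}

# Perl 5 alternative: receive two array references and dereference each one.
sub SumEach {
    my ( $first_ref, $second_ref ) = @_;
    my ( $sum1, $sum2 ) = ( 0, 0 );
    $sum1 += $_ foreach @$first_ref;
    $sum2 += $_ foreach @$second_ref;
    return ( $sum1, $sum2 );
}

my @odds  = ( 1, 3, 5, 7 );
my @evens = ( 2, 4, 6, 8 );

print "Flattened count: ", CountFlat( @odds, @evens ), "\n";    # Flattened count: 8

my ( $odd_sum, $even_sum ) = SumEach( \@odds, \@evens );
print "Sums: $odd_sum and $even_sum\n";                          # Sums: 16 and 20
```

Each reference is an ordinary scalar, so the two arrays stay distinct inside @_.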

Returning Values Perl is just as casual about returning values from subroutines as it is about passing arguments to them. A subroutine returns the value of the last expression evaluated in the subroutine. If you pass (4, 3) to this subroutine, the value 7 is returned:


sub AddIt  {

   ( $a, $b ) = @_;

   $a + $b;

   }

That means that the value 7 is substituted for the subroutine call after evaluation. The code


print "Summing 4 and 3 yields ", &AddIt(4, 3), ".\n";

prints the following:


Summing 4 and 3 yields 7.

Notice that we had to keep the subroutine call outside the quotes to allow Perl to recognize & as a subroutine invocation.

It isn't always clear which statement is the last to be executed in a subroutine, particularly if it contains loops or conditional statements. One way to ensure that the correct value is returned is to place a reference to the variable on a line by itself at the end of the subroutine, as follows:


sub Maybe  {

   # Various loops and conditionals here which set the value of "$result"   

   $result;

   }

CAUTION

Take care not to add seemingly innocuous statements near the end of a subroutine. A print statement returns a value of 1 (if successful) for example, so a subroutine that prints something just before it returns always returns 1.
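
The pitfall is shown in the following sketch, together with one way to avoid it: Perl's return statement, which hands back a value explicitly no matter which statement ran last. The subroutine names NoisySum and QuietSum are invented for illustration.

```perl
use strict;
use warnings;

# The last statement is a print, so the sub returns print's value (1), not the sum.
sub NoisySum {
    my ( $x, $y ) = @_;
    my $sum = $x + $y;
    print "Computed a sum.\n";
}

# An explicit return makes the intended value unambiguous.
sub QuietSum {
    my ( $x, $y ) = @_;
    my $sum = $x + $y;
    print "Computed a sum.\n";
    return $sum;
}

my $noisy = NoisySum( 4, 3 );
my $quiet = QuietSum( 4, 3 );
print "NoisySum returned $noisy, QuietSum returned $quiet.\n";
# prints: NoisySum returned 1, QuietSum returned 7.
```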

The return value can be a scalar, an array, or an associative array. Listing 1.1 shows a complete example in which a subroutine builds an associative array of names keyed by initials and then returns the associative array. The keys of this array (the initials) are then printed in sorted order. Take your time reading through this example; a lot is going on in there, but it's comprehensively commented.

Listing 1.1 INITIALS.PL: Returning an Associative Array from a Subroutine


#!/usr/local/bin/perl -w



# Pass the names into the subroutine.

# Store the results in an associative array called "keyedNames".

%keyedNames = &GetInitials("Jane Austen", "Emily Bronte", "Mary Shelley" );



# Print out the initials, sorted:

print "Initials are ", join(', ', sort keys %keyedNames), ".\n";



# The GetInitials subroutine.

sub GetInitials  {



   # Let's store the arguments in a "names" array for clarity.

   @names = @_;



   # Process each name in turn:

   foreach $name ( @names )  {



      # The "split" function is explained in Chapter 15, "Function List".

      # In this statement, we're getting split to look for the ' ' in the name;

      # It returns an array of chunks of the original string (i.e. $name) which were

      # separated by spaces, i.e. the forename and surname respectively in our case.

      # The variables "$forename" and "$surname" are then assigned to this array

      # using parentheses to force an array assignment.

      ( $forename, $surname ) = split( ' ', $name );



      # OK, now we have the forename and surname. We use the "substr" function,

      # also explained in Chapter 15, to extract the first character from each

      # of these.

      # The "." operator concatenates two strings (for example, "aa"."bb" is "aabb")

      # so the variable "$inits" takes on the value of the initials of the name:

      $inits = substr( $forename, 0, 1 ) . substr( $surname, 0, 1 );



      # Now we store the name in an associative array using the initials as the key:

      $NamesByInitials{$inits} = $name;

      }



   # Having built the associative array, we simply refer to it at the end of the

   # subroutine so that its value is the last thing evaluated here. It will then

   # be passed back to the calling function.

   %NamesByInitials;

   }

Variable Scope

Perl uses separate name spaces to store scalars, arrays, associative arrays, and so on. As a result, you can use the same name for variables of different types without fear of confusion (at least on Perl's part; for your own sake, use unique names). This example uses three different kinds of variables, each called name:


$name = "Dana";

@name = ("Donna", "Dana", "Diana");

%name = ("Donna", "Elephants", "Dana", "Finches", "Diana", "Parakeets");

print "I said $name{$name}, not $name{$name[0]}!\n";

The bad news is that by default, Perl uses just one name space for each data type, for all functions. So if you have a variable called $temp in the main function, and you call a routine that uses another variable called $temp, the value of $temp in the main function gets clobbered. The references to the two variables are in fact two references to the same variable, as far as Perl is concerned.

That's where the local (Perl 4 and 5) and my (Perl 5 only) functions come in. These functions force Perl to treat variables as though they are local to the current code block, whether that block is a loop, an if-block, or a subroutine.

The following example uses two variables called $temp (one outside and one inside a while loop):


$temp = "Still here!\n";

print "Enter a few words at a time, Ctrl+D to terminate:\n";

while (<>)  {

   local( $temp, @etc ) = split(' ', $_ );

   print "You said $temp";

   @etc && print " and then you said @etc";

   print ".  Enter some more, or press Ctrl+D to end:\n";

   }

print $temp;

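The difference between local() and Perl 5's my() is one of scope. A local variable is dynamically scoped: its temporary value is visible in the current block and in any subroutine called from that block. A my variable is lexically scoped: it is visible only within the block in which it is declared, so it is really local.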
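
The practical consequence shows up when a block calls another subroutine. In this sketch (Perl 5 only; the names ShowLevel, WithLocal, and WithMy are invented, and the our declaration used for the package variable is available only in later Perl 5 releases), a local value is seen by the called subroutine, whereas a my variable is not.

```perl
use strict;
use warnings;

our $level = "global";    # a package variable, which local can temporarily override

sub ShowLevel { return $level; }    # always reads the package variable

sub WithLocal {
    local $level = "local";    # temporary value, visible to subroutines called from here
    return ShowLevel();
}

sub WithMy {
    my $level = "my";          # a separate variable, visible only inside this block
    return ShowLevel();
}

print "With local: ", WithLocal(), "\n";    # With local: local
print "With my:    ", WithMy(),    "\n";    # With my:    global
print "Afterward:  ", ShowLevel(), "\n";    # Afterward:  global
```

The last line confirms that the value installed by local is restored automatically when WithLocal finishes.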

Patterns

We'll finish this overview of Perl by discussing its pattern-matching capabilities. The capability to match and replace patterns is vital to any scripting language that claims to be capable of useful text manipulation. By this stage, you probably won't be surprised to read that Perl matches patterns better than any other general-purpose language does. Perl 4's pattern matching is excellent, but Perl 5 introduces some significant improvements, including the capability to match on even more arbitrary strings than before.

The basic pattern-matching operations discussed in this section are:

Matching, in which we want to know whether a particular string matches a pattern
Substitution, in which we want to replace portions of a string based on a pattern

The patterns referred to here are more properly known as regular expressions, and we'll start by looking at them.

Regular Expressions

A regular expression is a set of rules that describes a generalized string. If the characters that make up a particular string conform to the rules of a particular regular expression, the regular expression is said to match that string.

A few concrete examples usually help after an overblown definition like that one. The regular expression b. matches the strings bovine, above, Bobby, and Bob Jones, but not the strings Bell, b, or Bob. That's because the expression insists that the letter b (lowercase) must be in the string and must be followed immediately by another character.

The regular expression b+, on the other hand, requires the lowercase letter b at least once. This expression matches b and Bob in addition to the example matches for b. in the preceding paragraph. The regular expression b* requires zero or more bs, so it matches any string. That seems to be fairly useless, but it makes more sense as part of a larger regular expression. Bob*y, for example, matches all of Boy, Boby, and Bobby but not Boboby.

Assertions Several so-called assertions are used to anchor parts of the pattern to word or string boundaries. The ^ assertion matches the start of a string, so the regular expression ^fool matches fool and foolhardy but not tomfoolery or April fool. Table 1.5 lists the assertions.

Table 1.5 Perl's Regular-Expression Assertions

Assertion	Matches	Example	Matches	Doesn't Match
`^`	Start of string	`^fool`	`foolish`	`tomfoolery`
`$`	End of string	`fool$`	`April fool`	`foolish`
`\b`	Word boundary	`be\b`	`be side`	`beside`
`\B`	Nonword boundary	`be\Bside`	`beside`	`be side`
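
The assertions can be tried out against the fool examples from the text. The following is a minimal sketch that uses Perl's built-in grep function to keep only the strings that match each pattern; the variable names are invented for illustration.

```perl
use strict;
use warnings;

my @words = ( "fool", "foolhardy", "tomfoolery", "April fool" );

my @starts = grep { /^fool/ } @words;        # anchored at the start of the string
my @ends   = grep { /fool$/ } @words;        # anchored at the end of the string
my @whole  = grep { /\bfool\b/ } @words;     # fool as a complete word
my @inside = grep { /\Bfool\B/ } @words;     # fool buried inside a longer word

print "Starts with fool: ", join( ", ", @starts ), "\n";    # fool, foolhardy
print "Ends with fool:   ", join( ", ", @ends ),   "\n";    # fool, April fool
print "Whole word fool:  ", join( ", ", @whole ),  "\n";    # fool, April fool
print "Inside a word:    ", join( ", ", @inside ), "\n";    # tomfoolery
```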

Atoms The . (period) that you saw in b. earlier in this chapter is an example of a regular-expression atom. Atoms are, as the name suggests, the fundamental building blocks of a regular expression. A full list of atoms appears in Table 1.6.

Table 1.6 Perl's Regular-Expression Atoms

Atom	Matches	Example	Matches	Doesn't Match
period (.)	Any character except new line	`b.b`	`bob`	`bb`
List of characters in brackets	Any one of those characters	`^[Bb]`	`Bob, bob`	`Rbob`
Regular expression in parentheses	Anything that regular expression matches	`^a(b.b)c$`	`abobc`	`abbc`

Quantifiers A quantifier is a modifier for an atom. It can be used to specify that a particular atom must appear at least once, as in b+. The atom quantifiers are listed in Table 1.7.

Table 1.7 Perl's Regular-Expression Atom Quantifiers

Quantifier	Matches	Example	Matches	Doesn't Match
`*`	Zero or more instances of the atom	`ab*c`	`ac, abc`	`abb`
`+`	One or more instances of the atom	`ab+c`	`abc`	`ac`
`?`	Zero or one instances of the atom	`ab?c`	`ac, abc`	`abbc`
`{n}`	`n` instances of the atom	`ab{2}c`	`abbc`	`abbbc`
`{n,}`	At least `n` instances of the atom	`ab{2,}c`	`abbc, abbbc`	`abc`
`{n,m}`	At least `n`, at most `m` instances of the atom	`ab{2,3}c`	`abbc`	`abbbbc`
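
The quantifiers in Table 1.7 can be compared side by side by running each one over the same set of strings. This is a sketch only: the candidate strings are chosen for illustration, and each pattern is anchored with ^ and $ so that the whole string has to match.

```perl
use strict;
use warnings;

my @candidates = ( "ac", "abc", "abbc", "abbbc", "abbbbc" );

my @star  = grep { /^ab*c$/ } @candidates;       # zero or more b's
my @plus  = grep { /^ab+c$/ } @candidates;       # one or more b's
my @opt   = grep { /^ab?c$/ } @candidates;       # zero or one b
my @two   = grep { /^ab{2}c$/ } @candidates;     # exactly two b's
my @min2  = grep { /^ab{2,}c$/ } @candidates;    # at least two b's
my @range = grep { /^ab{2,3}c$/ } @candidates;   # two or three b's

print "*     : @star\n";     # ac abc abbc abbbc abbbbc
print "+     : @plus\n";     # abc abbc abbbc abbbbc
print "?     : @opt\n";      # ac abc
print "{2}   : @two\n";      # abbc
print "{2,}  : @min2\n";     # abbc abbbc abbbbc
print "{2,3} : @range\n";    # abbc abbbc
```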

Special Characters Several special characters are denoted by backslashed letters, with \n being especially familiar to C programmers, perhaps. Table 1.8 lists the special characters.

Table 1.8 Perl's Regular-Expression Special Characters

Symbol	Matches	Example	Matches	Doesn't Match
`\d`	Any digit	`b\dd`	`b4d`	`bad`
`\D`	Nondigit	`b\Dd`	`bdd`	`b4d`
`\n`	New line
`\r`	Carriage return
`\t`	Tab
`\f`	Form feed
`\s`	White-space character
`\S`	Non-white-space character
`\w`	Alphanumeric character (or underscore)	`a\wb`	`a2b`	`a^b`
`\W`	Nonalphanumeric character	`a\Wb`	`aa^b`	`aabb`

Backslashed Tokens It is essential that regular expressions be capable of using all characters, so that all possible strings that occur in the real world can be matched. With so many characters having special meanings, a mechanism is required that allows you to represent any arbitrary character in a regular expression.

This mechanism is a backslash (\), followed by a digit sequence or a character. What follows the backslash can take any of the following formats:

Single or double digit: a substring matched earlier in the same match. These matched substrings are called backreferences and are explained in the "Backreferences" section later in this chapter.
Two- or three-digit octal number: the character with that number as character code, unless it's possible to interpret it as a backreference.
x, followed by two hexadecimal digits: the character with that number as its character code. \x3e, for example, is >.
c, followed by a single character: the control character. \cG, for example, matches Ctrl+G.
Any other character: the character itself. \&, for example, matches the & character.
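
A few of these formats can be checked directly. In the following sketch, the test strings and variable names are invented for illustration; each match is reduced to 1 or 0 so that the results are easy to print.

```perl
use strict;
use warnings;

my $text = "a>b & c";
my $bell = "ring\cG";    # a string that ends with Ctrl+G (character code 7)

my $has_gt   = ( $text =~ /\x3e/ )  ? 1 : 0;    # hexadecimal 3e is >
my $has_amp  = ( $text =~ /\&/ )    ? 1 : 0;    # \& is a literal &
my $has_bell = ( $bell =~ /\cG$/ )  ? 1 : 0;    # \cG is Ctrl+G
my $is_A     = ( "A" =~ /^\101$/ )  ? 1 : 0;    # octal 101 is A

print "Found > : $has_gt\n";      # Found > : 1
print "Found & : $has_amp\n";     # Found & : 1
print "Found ^G: $has_bell\n";    # Found ^G: 1
print "Octal A : $is_A\n";        # Octal A : 1
```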

Matching

Now you're ready to start putting all that information together with some real pattern matching. The match operator normally consists of two forward slashes with a regular expression in between, and it normally operates on the contents of the $_ variable. So if $_ is serendipity, /^ser/, /end/, and /^s.*y$/ are all true.

Matching on $_ The $_ variable is special; see Chapter 13, "Special Variables," for full details. In many ways, $_ is the default container for data that is being read in by Perl. The <> operator, for example, when used on its own as the condition of a while loop, gets the next line of input (from STDIN, or from any files named on the command line) and stores it in $_. So the following code snippet allows you to type lines of text and tells you when your line matches one of the regular expressions:


$prompt = "Enter some text or press Ctrl+D to stop: ";

print $prompt;

while (<>)  {

   /^[aA]/ && print "Starts with a or A.  ";

   /[0-9]$/ && print "Ends with a digit.  ";

   /perl/ && print "You said it!   ";

   print $prompt;

   }

Bound Matches Matching doesn't always have to operate on $_, although this default behavior is quite convenient. A special operator, =~, evaluates to either true or false, depending on whether its first operand matches on its second operand. So $filename =~ /dat$/ is true if $filename matches on /dat$/. You can use =~ in conditionals in the usual way, as follows:


$filename =~ /dat$/ && die "Can't use .dat files.\n";

A corresponding operator, !~, has the opposite sense. !~ is true if the first operand does not match on the second, as follows:


$ENV{'PATH'} !~ /perl/ && warn "Not sure if perl is in your path   ";

Alternative Delimiters The match operator can use characters other than //, a useful point if you're trying to match a complex expression that involves forward slashes. A more general form of the match operator than // is m//. If you use the leading m, you can use any character to delimit the regular expression. For example,


$installpath =~ m!^/usr/local! || warn "The path you have chosen is odd.\n";

warns that "The path you have chosen is odd." if the variable $installpath does not start with /usr/local.

Match Options You can apply several optional switches to the match operator (either // or m//) to alter its behavior. These options are listed in Table 1.9.

Table 1.9 Perl's Match-Operator Optional Switches

Switch	Meaning
`g`	Perform global matching
`i`	Perform case-insensitive matching
`o`	Evaluate the regular expression one time only

The g switch continues matching even after the first match has been found. This switch is useful when you are using backreferences to examine the matched portions of a string, as described in the "Backreferences" section later in this chapter.

The i switch forces a case-insensitive match.

Finally, the o switch is used inside loops in which a great deal of pattern matching is taking place. This switch tells Perl that the regular expression (the match operator's operand) is to be evaluated one time only. The switch can improve efficiency when the regular expression is fixed for all iterations of the loop that contains it.
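
The effect of the i and g switches can be seen by matching one string in several ways. This is a minimal sketch with an invented test string; it relies on the fact that a g match in an array context returns every matched substring. The o switch is not shown, because it changes only efficiency and not the result.

```perl
use strict;
use warnings;

my $line = "Perl, perl and PERL";

my $any_case = ( $line =~ /^perl/i ) ? "yes" : "no";    # i ignores case
my @exact    = ( $line =~ /perl/g );                    # every lowercase "perl"
my @all      = ( $line =~ /perl/gi );                   # every spelling, any case

print "Starts with perl (any case): $any_case\n";                          # yes
print "Exact matches: ", scalar(@exact), "\n";                             # 1
print "Case-insensitive matches: ", join( ", ", @all ), "\n";              # Perl, perl, PERL
```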

Backreferences As we mentioned in the "Backslashed Tokens" section earlier in this chapter, pattern matching produces quantities that are known as backreferences. These quantities are the parts of your string in which the match succeeded. You need to tell Perl to store them by surrounding the relevant parts of your regular expression with parentheses. You can refer to them within the regular expression itself as \1, \2, and so on, and after the match as $1, $2, and so on. The following example determines whether the user typed three consecutive four-letter words:


while (<>)  {

   /\b(\S{4})\s(\S{4})\s(\S{4})\b/ && print "Gosh, you said $1 $2 $3!\n";

   }

The first four-letter word lies between a word boundary (\b) and some white space (\s), and consists of four non-white-space characters (\S). If there is a match on the expression \b(\S{4})\s (that is, if a four-letter word is found), the matching substring is stored as backreference \1, and the search continues. When the search is complete, you can refer to the backreferences as $1, $2, and so on.
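
The \1 form is useful inside the pattern itself, where it means "whatever the first pair of parentheses just matched." The following sketch, with an invented test sentence, uses it to find a word that has been typed twice in a row and then reports the word through $1.

```perl
use strict;
use warnings;

my $sentence = "Paris in the the spring";
my $doubled  = "";

# (\w+) captures a word; \1 then requires the same word again after white space.
if ( $sentence =~ /\b(\w+)\s+\1\b/ ) {
    $doubled = $1;    # after the match, the captured word is available as $1
}

print "Doubled word: $doubled\n";    # Doubled word: the
```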

What if you don't know in advance how many matches to expect? Perform the match in an array context; Perl returns the matches in an array. Consider this example:


@hits = ("Yon Yonson, Wisconsin" =~ /(\won)/g);

print "Matched on ", join(', ', @hits), ".\n";

We'll start at the right side and work backward. The regular expression (\won) means that we match any alphanumeric character followed by on and store all three characters. The g option after the // operator means that we want to do this for the entire string, even after we find a match. The =~ operator means that we carry out this operation on a given string (Yon Yonson, Wisconsin). Finally, the whole thing is evaluated in an array context, so Perl returns the array of matches, and we store it in the @hits array. Following is the output from this example:


Matched on Yon, Yon, son, con.

Substitution

When you get the hang of pattern matching, you'll find that substitutions are quite straightforward and very powerful. The substitution operator is s///, which resembles the match operator but has three rather than two slashes. Just as you can do with the match operator, you can substitute any other character for the forward slashes, and you can use the optional i, g, and o switches.

The pattern to be replaced goes between the first and second delimiters, and the replacement pattern goes between the second and third delimiters. This simple example changes $house from henhouse to doghouse:


$house = "henhouse";

$house  =~ s/hen/dog/;

Notice that it isn't possible to apply the substitution operator to a literal string with =~ as you can when matching, because you can't modify a literal constant. Instead, store the string in a variable and modify that variable.
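
The switches, alternative delimiters, and backreferences described for matching carry over to substitution. The following sketch uses invented strings to show each in turn; every substitution works on its own copy of the original text so that the results can be compared.

```perl
use strict;
use warnings;

my $text = "hen, Hen, hen";

my $once = $text;  $once =~ s/hen/dog/;      # first match only
my $all  = $text;  $all  =~ s/hen/dog/g;     # every lowercase match
my $any  = $text;  $any  =~ s/hen/dog/gi;    # every match, ignoring case

my $path = "/usr/local/bin";
$path =~ s!^/usr/local!/opt!;                # ! as delimiter avoids escaping the slashes

my $name = "Austen, Jane";
$name =~ s/(\w+), (\w+)/$2 $1/;              # swap the two captured words

print "$once\n";    # dog, Hen, hen
print "$all\n";     # dog, Hen, dog
print "$any\n";     # dog, dog, dog
print "$path\n";    # /opt/bin
print "$name\n";    # Jane Austen
```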

From Here...

You have reached the end of your whirlwind tour of Perl. You saw how Perl's deceptively simple constructs can be used to write deceptively simple programs, and you got a brief look at the basic elements of the language. At minimum, you should have a clear idea of how the language works, and you should know where to go for more information on Perl as the need arises throughout the rest of this book.

This book now moves on to Web matters, but look in the following places for more information about Perl:

Refer to Part V of this book for comprehensive information on Perl special variables, operators, and built-in functions.
Also refer to Part V to learn how to use modules and libraries to compartmentalize your code for greater robustness and extensibility.
Consider buying a book that deals in detail with the Perl language. The definitive work is the "Camel book," cited at the beginning of this chapter.