Adam Blum
Microsoft Consulting Services
Perl, the Practical Extraction and Report Language, has attracted a lot of attention in the computer community lately since it became the de facto tool for building Web server scripts. With the release of Windows NT Perl on the NT 3.51 Resource Kit, this phenomenon has begun to extend to the Windows NT world, and to tasks beyond Web gateways. Perl is one of the few administrator-level scripting and reporting tools available for Windows NT. There will be a Windows NT scripting language, but its details are still being determined. It is likely that this language will have many of the attributes and features of Perl that we describe here.
In the meantime, those of you with pressing automation, analysis, and reporting tasks to perform on NT platforms are often in immediate need of such a tool. Windows NT Perl is now available in several forms. It is included with the Microsoft Windows NT 3.51 Resource Kit, and it is also available on the Internet at http://www.perl.hip.com. This version of Perl can control OLE automation servers, making it a scripting tool for Win32 applications in general, a status previously reserved primarily for Visual Basic. It has built-in access to the Windows NT Event Log, making it an excellent tool for log analysis and reporting, tools for which are still relatively rare. It also features support for the ISAPI standard for Web server add-ins (ISAPI is supported by Microsoft's Internet Information Server, among other Web servers). In this paper, we'd like to show you some example uses of Perl for BackOffice and Internet tasks, and present enough information about the language to begin to use it productively.
I'd like to discuss some examples of how Perl can be used as a Windows NT back office and Internet scripting tool, before going on to describe the language.
USPTO has a large, complex Microsoft Mail environment, with many postoffices and several MTAs. It would be useful to know the exact amount of traffic between each pair of postoffices, so that postoffices could be assigned to MTAs in a way that keeps as much traffic as possible within the same MTA. All of this information is available in the mail logs, but USPTO had no way of extracting it from the huge volume of data in these log files. The log analysis task begun for this purpose can be extended to answer other questions about their environment, such as worst and average case end-to-end delivery times.
Perl was written to solve this type of problem (remember, Practical Extraction and Report Language). It has many primitives to automatically parse each line in the log file, quickly determine through pattern matching if the line is relevant, and keep arrays and totals of significant statistics. Along the way, we trained their administrative staff in how to maintain, extend, and write their own Perl scripts. This is actually what triggered this paper. Their administrators do not want to read the "camel book" (Larry Wall's authoritative reference on Perl). They want a quicker introduction to the language, which this paper supplies.
The primary problem solved was to determine the amount of traffic at each postoffice and how much traffic occurred between pairs of postoffices. This was used to determine which postoffices to co-locate on the same MTA and which postoffices needed to be split or have users moved. The method was to analyze each postoffice's RECV.LOG and SENT.LOG. The first example in the example uses section shows the Perl scripts used to analyze these logs and summarize the information gathered.
The Boeing engagement involved assisting them in planning their deployment of Microsoft Exchange as the electronic mail system for the Reserve Component Automation System for the U.S. Army Reserve and National Guard. Exchange provides an excellent tool, LoadSim, for analyzing end user response time. However no end to end performance benchmarking is done by LoadSim.
These statistics need to be gleaned from the server log. We could certainly do this in Visual Basic, but this is overkill and does not match the skill set of Boeing personnel. Almost all of them, however, have a strong Unix background. The Perl scripts that we develop to gather this information and report on it will be eminently maintainable by them. However, once again, those less familiar with Perl asked for a quick introduction to the subject.
For many clients (too numerous to mention here), we have built Perl scripts to gather data from Web-based forms, and store them in text files, databases, or Excel spreadsheets. It is very easy to write Perl code to gather HTML form information, probably easier than with any other language. Code for parsing form information from an order form and storing to a CSV file is presented as the last example script.
The code involved in creating simple Web server scripts for form data collection, or even HTML page generation, is simple enough that a brief introduction to those language features necessary to maintain and enhance these scripts would be worthwhile.
Covering all of the capabilities and nuances of Perl is far beyond the scope of this article. What we want to do here is teach you just enough Perl to be able to modify existing Perl scripts, and write your own. We also want to give you a sense of the power and flexibility of the language, and in some cases its limitations, to help you to decide between Windows NT Perl, Visual C++, and Visual Basic for various tasks.
An excellent book on Perl is O'Reilly's Programming Perl by Larry Wall (the creator of Perl) and Randal Schwartz. As good as the book is, it really takes its time in giving you the knowledge you need to program productively or modify Perl scripts. To get the overview of all of Perl's capabilities, you have to wade through an extensive set of scenarios of Job using Perl to automate his camel farming operation. While these examples are well presented and entertaining, it may at times seem to knowledgeable programmers like a slow moving caravan across a desert of less than relevant problems. So informally we subtitle our treatment "Mastering Perl Without the Trials of Job".
In this section, we present the significant elements of the language that you are likely to find useful and bypass the more obscure portions of the language as well as in depth nuances not necessary to get your code written. Hopefully this introduction will let you be productive in Perl very quickly. If you decide you need to know everything about the language you can pick up Programming Perl to expand your knowledge on individual topics.
Typically we will be supplying Perl with scripts that are written and stored in files. We then invoke Perl with an argument of the file name. All programming tutorials have to begin with printing "Hello, world!", and who are we to buck tradition? So (presuming you have installed a Perl of your choice on your system), create a file called hello.pl. Place in the file the following text:
print "Hello, world\n";
Now run the program by invoking perl hello.pl from the command line. This assumes you have made your Perl directory part of your path, which you should do anyway. The Windows NT Perl distribution describes how to set up Perl, but there really isn't much more to it than extracting the files into a directory and making the directory part of your Windows NT System Path.
Now let's look at the various data types that are available so that we can make our programs start to process data. The first example shows the most common data type used in Perl: strings. Strings enclosed in double quotes ("), as shown above, may contain escape sequences, and embedded variables will be replaced with their contents. Strings inside apostrophes (') will have no substitutions done on them. Perl also has numbers in the form of integers (a string of digits with no decimal point), floating point (a string of digits containing a decimal point or in scientific notation), hex (prefixed with "0x"), or octal (digits with a leading zero). Examples include:
'hello'      literal string
"hello\n"    string allowing escapes (such as "\n") and "variable interpolation"
100          integer
100.5        floating point
Complex data types include arrays which are represented as comma delimited sequences of values or variables inside parentheses. For example,
(1,2,3)
File handles in Perl are represented as a literal or variable inside angle brackets, e.g. <FILE>. Predefined file handles include <STDIN>, the input supplied to the program on the command line; <STDOUT>, the output of the program; <STDERR>, the error output of the program, which goes to the screen even if the standard output is redirected; and <ARGV>, a special file handle that combines all files mentioned on the command line. For example, in the following invocation
PERL PROGRAM.PL ARG1.DAT ARG2.DAT <INPUT1.DAT >OUTPUT.DAT
<STDIN> contains the contents of the INPUT1.DAT file. <STDOUT> writes out to OUTPUT.DAT and <STDERR> writes to the console. <ARGV> reads from ARG1.DAT until exhausted and then reads from ARG2.DAT. If input is not redirected to the program on the command line (e.g. "<INPUT1.DAT"), input is read from the keyboard.
In order to build useful programs, you'll need to use variables as well as data type literals. Variables are prefixed with a dollar sign (e.g. "$var1"). All data types use the same notation for variable references. The data type of the variable is determined by its contents. For example,
$var1=1
makes the $var1 variable an integer. It can become a string as quickly as it is assigned
$var1="hello"
An array variable is referred to by prefixing the name with an at sign ("@"). Individual array items are referenced by prefixing the name with a dollar sign and subscripting it in square brackets. The index of the last item of an array is available as the built-in value $#<arrayname>. In the example below, $#array is 2, since an array is indexed beginning with zero.
# simple array manipulation example - note that # begins a comment line
@array=(5,10,15);
$firstitem=$array[0];
$lastitem=$array[$#array];
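The array notation above can be tried out with a short script. This is only a minimal sketch: the variable names are illustrative, and `use strict`, `use warnings`, and `my` are Perl 5 conveniences that this paper does not otherwise cover.

```perl
use strict;
use warnings;

# Hypothetical data for illustration only.
my @array = (5, 10, 15);

my $last_index = $#array;          # index of the last element
my $count      = scalar(@array);   # number of elements
my $last_item  = $array[$#array];  # a single element uses the $ prefix

print "last index: $last_index, count: $count, last item: $last_item\n";
```

Note that the last index is always one less than the element count, because subscripts start at zero.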
The values of an array can also be treated as a list and Perl has several functions available to operate on the array as a list, which we will present shortly. Perl has a built-in array variable called @ARGV that contains all of the arguments to Perl that are present on the command line.
Perl also introduces a concept known as an associative array. That is an array whose items are retrieved by supplying the value of a key rather than an index. An associative array is referred to with a variable name prefixed with "%". Individual elements are accessed by prefixing the name with a dollar sign and including the key value inside braces (e.g. {"key"}). The contents of an associative array are specified as a list of key, value pairs. For example
%capitals=("Israel", "Jerusalem",
"Egypt", "Cairo",
"Saudi Arabia", "Riyadh");
$city=$capitals{"Egypt"};
print $city;
This will print out "Cairo". Perl has a built-in associative array called %ENV that has all of the environment variables available to the Perl program. Since environment variables play a prominent role in communication of information to CGI gateways, this is a very useful feature. The following code will retrieve the contents of the QUERY_STRING environment variable.
$query_string=$ENV{"QUERY_STRING"};
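As a rough illustration of how an associative array can hold form data from a query string, the sketch below splits a hard-coded string into key/value pairs. The string contents and the `%form` name are hypothetical; a real CGI script would read `$ENV{"QUERY_STRING"}` and would also need to decode URL escapes, which this sketch does not attempt. The `split` and `foreach` constructs it relies on are described later in this paper.

```perl
use strict;
use warnings;

# Hypothetical query string; a real script would use $ENV{"QUERY_STRING"}.
my $query_string = "name=Job&animal=camel&count=3";

my %form;
foreach my $pair (split(/&/, $query_string)) {
    # Split each pair on the first "=" only.
    my ($key, $value) = split(/=/, $pair, 2);
    $form{$key} = $value;
}

foreach my $key (sort keys %form) {
    print "$key => $form{$key}\n";
}
```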
The following operators are available in Perl to work on the data and variable types presented above.
- Numeric Operators
These operators work on numeric data types. You will want to restrict the bitwise and shift operators to integer data or variables. There are of course the traditional arithmetic operators (+ - / *) and the modulo (division remainder) operator (%). Perl includes the exponentiation operator (**) to raise the left-hand argument to the power of the right-hand argument. Bitwise operators for or (|), and (&), and exclusive-OR (^), and for left and right shift (<< and >>) are also available.
Appending any of the preceding operators with "=" assigns the resulting value to the left-side operand. For example
$a=1; $a += 2; print $a;
This will print out "3".
There are operators to perform comparison on numeric data types: ==, !=, <, >, <=, >=, and <=>. The last operator returns a number depending on whether the first operand is less than (-1), equal to (0), or greater than (1) the second operand. The ++ and -- operators take a variable as argument and increment or decrement that variable. For example, $I++ increments the variable $I.
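The numeric operators described above can be exercised in a few lines. This is a small sketch with arbitrary values; `my`, `use strict`, and `use warnings` are Perl 5 features not covered in this paper.

```perl
use strict;
use warnings;

my $x = 7;
my $y = 3;

my $remainder = $x % $y;      # modulo: 7 % 3 is 1
my $power     = 2 ** 10;      # exponentiation: 1024
my $shifted   = 1 << 4;       # left shift: 16
my $masked    = 0xFF & 0x0F;  # bitwise and: 15
my $order     = $x <=> $y;    # 1, because 7 is greater than 3

$x += 2;                      # $x is now 9

print "$remainder $power $shifted $masked $order $x\n";
```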
- String Operators
String operators include the period (".") operator for string concatenation. The .= operator concatenates the right hand side to the value on the left hand side. There are many string comparison operators: eq,ne,lt,gt,le,ge,cmp. Like <=>, cmp returns -1, 0, or 1 based on the comparison between the strings on the left and right side of the operator. The assignment operator = works for strings as it does for numeric variables. The following code uses the assignment (=) operator, the concatenation assignment (.=) operator, and the string equivalence operator (eq).
# word guessing game
# demonstrating string comparison operators
$word="camel\n";
while (1) {
    # user input, presented in later section
    print "Enter guess: ";
    $guess=<>;
    last if $guess eq $word;
    if ($guess lt $word){
        print "Word is greater\n";
    }
    else{
        print "Word is less\n";
    }
}
print "Correct!\n";
It's important to remember that string variables and data types must use their own operators for comparison. For example, using == instead of eq is a common problem in beginning Perl programmers' scripts.
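To see why the choice of operator matters, the sketch below compares the same two values numerically and as strings. The values are arbitrary and chosen only because the two comparisons disagree.

```perl
use strict;
use warnings;

my $first  = "10";
my $second = "9";

# Numeric comparison: 10 is greater than 9.
my $as_numbers = ($first <=> $second);

# String comparison: "1" sorts before "9", so "10" is less than "9".
my $as_strings = ($first cmp $second);

print "numeric: $as_numbers, string: $as_strings\n";
```

The same pair of values therefore orders one way under <=> and the opposite way under cmp.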
Perl includes many built-in functions, though not nearly as many as those of you who are C and C++ programmers may be used to. There are time functions: time, taking no arguments, returns the number of seconds since January 1, 1970. localtime() converts the value returned by time into a nine element array: $sec, $min, $hour, $mday, $mon, $year, $wday, $yday, and $isdst, analyzed for the local time zone. An example use is:
@ltime=localtime(time);
$wday=$ltime[6];
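A slightly fuller sketch of localtime() follows. It relies on three conventions of the returned array: $wday counts from 0 for Sunday, $mon counts from 0 for January, and $year is the number of years since 1900. The `@day_names` array is our own helper, not something Perl supplies.

```perl
use strict;
use warnings;

# Our own lookup table; index 0 corresponds to Sunday.
my @day_names = ("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday");

my @ltime = localtime(time);
my $wday  = $ltime[6];
my $mday  = $ltime[3];
my $month = $ltime[4] + 1;     # $mon is zero-based
my $year  = $ltime[5] + 1900;  # $year is years since 1900

print "Today is $day_names[$wday], $month/$mday/$year\n";
```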
There are trig functions (sin(), cos(), atan2()), exponential, logarithmic, and square root functions (exp(), log(), and sqrt()), and random number generation functions (srand(), rand()). Most of these functions take one argument (atan2() takes two), whose meaning is usually obvious from the purpose of the function.
There are functions to operate on strings, which include a length() function, taking one argument and returning the number of characters in the expression. The chop() function takes off the last character of all elements of the supplied list, which may be just one variable. The index() function returns the position of the second string argument within the first string argument, searching at or after the position given in the optional third argument (the search starts at the first character if it is left unspecified). The substr() function returns the portion of the string in the first argument that begins at the offset given in the second argument; an optional third argument limits its length.
These can be used as in the following code:
$string="Hello, world.";
$indx=index($string,",");
$substring=substr($string,$indx);
# chop returns the character it removed, so call it separately
chop($substring);
print $len=length($substring);
This will display the length of ", world" or 7.
There is also a special string function called eval(). It evaluates the argument and executes it as if it were Perl code. An alternate form of eval is invoked as eval followed by Perl code surrounded by braces ("{ }"). If there are runtime errors, the $@ variable contains the error message.
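The block form of eval is often used to trap runtime errors. The sketch below is one way this might look, using a deliberate division by zero; the variable names are illustrative.

```perl
use strict;
use warnings;

my $divisor = 0;

# The block runs normally unless a runtime error occurs; on an error,
# eval returns undef and the message is left in $@.
my $result = eval { 10 / $divisor };
my $error  = $@;

if ($error) {
    print "Caught error: $error";
}
else {
    print "Result: $result\n";
}
```

Copying $@ into an ordinary variable right after the eval is a cautious habit, since a later eval would overwrite it.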
There are several functions to operate on arrays and lists. As we mentioned, they are represented the same way in Perl, as array variables, but you can operate on them as either arrays, using subscripting to access individual elements or set individual elements, or setting all of the values of the array variable by setting the variable equal to a parenthesized list of values.
But there is also a set of functions that treat an array more abstractly as a list. This allows us to perform list oriented operations, such as adding to the end of a list or to the beginning of a list, accessing and/or removing the first element of the list, reversing the order of the list, and sorting the list. This level of abstraction saves us a lot of code from having to do all of these operations with raw arrays.
The list functions that operate on array variables are as follows (square brackets indicate optional arguments):
An example of using various list operations appears below.
@fruits=split(',',"apple,orange,lemon,kumquat,mango");
push(@fruits,("tangerine",'kiwi',"lime"));
$lostfruit=pop(@fruits);
@fruits=reverse(sort(@fruits));
$firstfruit=$fruits[$#fruits];
The first statement splits the literal string of fruit names and puts it into the @fruits list. The second statement pushes three fruits onto the end of the list, by listing the three fruits in a parenthesized literal list. Note that apostrophes and quotes are interchangeable in that list. The third statement pops the last item ("lime") off of the list and puts the contents into the $lostfruit variable. The list is then sorted in reverse alphabetic order with the reverse() and sort() functions. Finally the $firstfruit variable is assigned the last item of the list, "apple", using the $#fruits subscript on the @fruits array.
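The list operations mentioned earlier that work on the front of a list, shift and unshift, along with join (the inverse of split), can be sketched as follows. The `@queue` name and its contents are illustrative only.

```perl
use strict;
use warnings;

my @queue = split(/,/, "apple,orange,lemon");

unshift(@queue, "kiwi");     # add to the beginning of the list
my $first = shift(@queue);   # remove and return the first element
push(@queue, "mango");       # add to the end of the list
my $last = pop(@queue);      # remove and return the last element

# Sort, reverse, and glue the remaining items back into one string.
my $joined = join("|", reverse(sort(@queue)));

print "first: $first, last: $last, joined: $joined\n";
```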
Perl statements are delimited by semicolons and can be conditionally executed or repeatedly executed using a modifier such as if, unless, while, or until. For example,
$a=1 unless $b!=0;
Multiple statements can be combined by surrounding them with braces. The same modifiers can then precede the block of statements, and more looping constructs are available. These include:
- while (<expression>) { <block of statements> }
Executes the specified block of statements while the expression evaluates to TRUE.
- until (<expression>) { <block> }
Executes the specified block of statements until the expression evaluates to TRUE.
- for ( <expression>;<expression>;<expression>) { <block> }
The for statement operates just like its C equivalent. The first expression is executed only once when the statement is first encountered, and is typically used to initialize variables. The second expression is used to test whether to continue the loop. The block of statements will only be executed again if it returns true. The third expression is executed after each time through the block of statements. It is typically used to increment index variables for the next pass through the loop.
- foreach <variable> ( <array reference> ) { <block>}
This construct loops over all items in the specified <array reference>. On each successive iteration, the next element of the array is placed into <variable>.
- do { <block> } while <expression>
This will execute the block of statements at least once, and repeat it as long as <expression> is true.
- do { <block> } until <expression>
This will execute the block of statements at least once, and repeat it until <expression> becomes true.
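The looping constructs listed above can be compared side by side. This is only a sketch with made-up values; it totals the same array with for and foreach, and uses do/until to show that the block runs before the condition is tested.

```perl
use strict;
use warnings;

my @values = (5, 10, 15);

# for: initialize, test, and increment an index, as in C.
my $for_total = 0;
for (my $i = 0; $i <= $#values; $i++) {
    $for_total += $values[$i];
}

# foreach: each element is placed into $v in turn.
my $foreach_total = 0;
foreach my $v (@values) {
    $foreach_total += $v;
}

# do/until: the block runs once before the condition is first checked.
my $n = 0;
do { $n++ } until $n >= 3;

print "for: $for_total, foreach: $foreach_total, n: $n\n";
```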
There are several statements that can be used to control exit from the loop and modify the loop's normal flow of control.
- last
The last statement exits the surrounding loop. It is typically modified with an if or unless statement. It works similarly to the C language break statement. For example, the following code might look like an infinite loop because of the "while (1)" condition, but in fact it will exit with the last statement when $i becomes zero.
$i=10;
while (1){
    last if $i==0;
    print $i;
    $i--;
}
- next
This jumps to the next iteration of the loop without executing subsequent code, similar to the C language continue statement. It is also typically executed conditionally.
- redo
Jumps to the beginning of the statements inside the loop, skipping reevaluation of the loop condition. It is useful in situations such as handling errors, where you do not want the loop condition reevaluated or the for loop variables incremented - you just want to re-execute the loop body code.
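A short sketch of next and last working together follows. The input lines and the "STOP" marker are invented for illustration: blank lines and comment lines are skipped with next, and the loop ends early with last.

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @lines = ("# comment", "alpha", "", "beta", "STOP", "gamma");
my @kept;

foreach my $line (@lines) {
    # Skip blank lines and lines that start with "#".
    next if $line eq "" || substr($line, 0, 1) eq "#";
    # Stop processing entirely at the marker line.
    last if $line eq "STOP";
    push(@kept, $line);
}

print join(",", @kept), "\n";
```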
As stated earlier, Perl gives you several file handles "for free". The most common one to use is <ARGV>. This file handle will supply all of the contents of the files specified on the Perl command line successively. Another file handle is <STDIN>, which will read either from the user or from a file that is supplied to Perl as standard input. How do you read from a file handle? Just assign the input operator for the file handle to a variable (e.g. "$arg=<FILEHANDLE>"), or evaluate the input operator by itself, as in the condition of a while loop. This will read the next line from the file and put it into the variable on the left-hand side of the assignment or, when used alone in a while condition, test whether another line is available.
For example, the following code reads successive lines from the user and echoes them until the end of input is reached. (An empty line still contains a newline, so it does not end the loop.)
while ($_=<STDIN>){
print;
}
The print with no argument actually prints $_, which we have assigned to be the line supplied from standard input. The $_ variable is a special variable that is supplied by default to many operations that expect an argument or a variable to assign their result to. The operations that work with $_ include input from file handle. So the above loop can be written even shorter as:
while (<STDIN>) { print;}
Or even:
print while <STDIN>;
If the input operator is assigned to an array, an array will be built that contains the rest of the file handle contents, one line to an array item. For example, the following code will read the entire contents of all files supplied on the command line and make each line an item in @array.
@array=<ARGV>;
This is often done inadvertently, and can lead to huge arrays that use up gobs of memory in places where it's not anticipated or needed. On Windows NT and UNIX, this will work, but your system may slow significantly as it pages to make enough virtual memory available for the full contents of all the files.
There is also a special file handle called the null filehandle whose notation is <>. <> reads from the files specified on the command line, if there are any present. Otherwise it reads from standard input. So, assuming that no files are supplied on the command line, we can shorten the code presented above ("print while <STDIN>;") to:
print while <>;
To use the input operator on other files besides <ARGV> and <STDIN>, you should open the file with:
open(filehandle, filename);
If filename is not specified, it is obtained from a variable named $filehandle (whatever the filehandle name may be). Prefixing the filename (either in the variable contents or in the open statement) with a greater than sign (">") opens a file for output. Placing two greater than's (">>") in front of the name opens the file for append. For example the following code opens a log file to add information to it.
$LOG=">>RESULTS.LOG";
open(LOG);
Output can be written to an output filehandle with any of the following commands:
- print filehandle list
If the filehandle is not specified, then STDOUT is used by default. Note that there is no comma between the filehandle and the list.
- printf filehandle list
This assumes that the first element of the list is a C printf style format string, and allows you to create more formatted output. For example:
open(OUT,">OUTPUT.TXT");
@list=('Value in hex is: %x');
push(@list,16);
printf OUT @list;
The "%x" flag prints out a number in hexadecimal format, so this example writes "Value in hex is: 10". For details of the available C printf format flags see any C manual.
- write(filehandle)
Writes out a formatted record. The format is set for a filehandle with the format statement.
format <filehandle> = <formlist>
<filehandle> specifies which filehandle the format statement affects. If unspecified, it applies to STDOUT. <formlist> consists of a sequence of "picture lines" to format the output, and argument lines which supply values or variables to insert into the previous picture line. As in other Perl code, comment lines can be created beginning with #. The picture line will contain multiple @ codes, one for each value to be placed on the line. The @ code will be followed by a number of "justification" characters, the number of which determines the width of the field. Less than ("<") indicates left justification, vertical bar ("|") indicates centering, and greater than (">") indicates right justification. The format is terminated by a line containing only a period. An example format statement is:
format STDOUT =
@<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<< @<<<<<<<<< @>> @>>>>>>>>> @>>>>>>>>>>>>
$name, $address, $city, $st,$zip, $phone
.
Note that since STDOUT was the default its presence was not necessary in the format statement above.
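For one-off formatted lines, the printf-style formatting described above is often enough on its own. The sketch below uses sprintf, a standard Perl built-in that this paper does not otherwise cover, which takes the same format string as printf but returns the result instead of writing it. The report values are made up.

```perl
use strict;
use warnings;

# Hypothetical report values for illustration only.
my ($name, $count, $flags) = ("camel", 42, 255);

# %-10s left-justifies in 10 columns, %5d right-justifies in 5 columns,
# and %x prints the number in hexadecimal.
my $line = sprintf("%-10s|%5d|%x", $name, $count, $flags);

# No comma after the filehandle.
printf STDOUT "%s\n", $line;
```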
Perl's pattern matching capabilities for searching, text substitution, parsing input formats, and translation are one of its greatest strengths. You can write a parser for any text oriented file format with a small fraction of the code that it would require in C or C++.
Perl incorporates all of the regular expression-based pattern matching and substitution capabilities that were previously present in various UNIX tools such as awk, flex, sed, and vi. Among the regular expressions available are:
.               any character
[a-z]           any character of set
[^a-z]          any character not in set
\d              digit
\D              non digit
\w              alphanumeric
\W              non-alphanumeric
\s              whitespace
\S              non-whitespace
\n              newline (any C backslash escape sequence is valid)
\t              tab
\0              null
\nnn            ASCII character of octal value
\xnn            ASCII character of hex value
\<character>    character itself (e.g. \( is left paren, \. is period)
(pattern)       any parenthesized portion of expression saved to be referred to later with $n, where n is order of subexpression in overall pattern
x?              0 or 1 x's
x*              0 or more x's
x+              1 or more x's
this|that       matches first pattern or second pattern
\b              word boundary
\B              non-word boundary
^               beginning of line
$               end of line
There are two pattern matching operators: the matching operator, m//, which can also be invoked with just //, and the substitution operator, s///. The matching operator lets you test for matches to a regular expression and, if the match occurs, puts the matching parenthesized subexpressions into variables (numbered $1 through $n, where n is the order of the subexpression) known as "backreferences" that you can then access and manipulate. The matching operator is invoked with the expression to be searched, followed by the =~ operator, followed by the matching operator (with or without the preceding "m"). If operating on the $_ default variable, the =~ is unnecessary.
For example the following code matches the $_ variable containing an email address.
if ( /(\w*)@([a-z\.]+)/ )
{
$user=$1;
$domain=$2;
}
This expression will match any string with alphanumerics followed by an at sign ("@") followed by a sequence of lowercase letters or periods (the "\." specifies the literal period). The backreferences are assigned in the order the parentheses are encountered.
The replacement operator works very similarly except that an additional pattern is specified after the second slash ("/") which is to be used to replace the matching pattern in the original expression. If we wanted to modify a string matching the pattern in the above example calling out the name and domain, the following code would suffice.
$string =~ s/(\w*)@([a-z\.]+)/Name: $1, Domain: $2/g;
This will replace all strings in $string matching the pattern inside the first pair of slashes with the text between the second and third slash, and the trailing "g" will make it do so globally across $string's contents.
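Pattern matching and associative arrays together make the kind of log tallying described at the start of this paper quite short. The sketch below is illustrative only: the log line layout, the postoffice names, and the `%bytes` array are all hypothetical, since the real RECV.LOG and SENT.LOG formats are not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical log lines; the real mail log layout will differ.
my @log = (
    "09:01 FROM=PO1 TO=PO2 BYTES=1200",
    "09:02 FROM=PO1 TO=PO3 BYTES=300",
    "09:05 FROM=PO1 TO=PO2 BYTES=800",
    "09:07 status check, no traffic",
);

my %bytes;
foreach (@log) {
    # Skip lines that do not describe a transfer.
    next unless /FROM=(\w+) TO=(\w+) BYTES=(\d+)/;
    # $1, $2, and $3 hold the three parenthesized subexpressions.
    $bytes{"$1 to $2"} += $3;
}

foreach my $pair (sort keys %bytes) {
    print "$pair: $bytes{$pair} bytes\n";
}
```

In a real script the foreach over @log would more likely be a while loop over <ARGV>, so that the log files are read one line at a time rather than held in memory.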
Perl creates subroutines with the sub keyword followed by the name of the subroutine followed by the block of statements enclosed in braces. For example,
# prints contents of filename specified in $_ to stdout
sub print_file_contents
{
    open (INPUT,$_);
    # note that we don't use $_ default so we leave $_ intact
    while ($arg=<INPUT>)
    {
        print $arg;
    }
}
To invoke a subroutine, just supply the subroutine name prefixed with ampersand ("&"). For example, &print_file_contents would display the contents of the file specified in the $_ variable. You can also call subroutines with the do command but this is less common.
To pass arguments to subroutines by value, make a local copy of the arguments. You should assign the argument list variable, @_, containing the list of actual arguments, to a list of variables representing the formal parameters using the local function. For example, the local statement in the source below assigns the arguments from the call to the $src and $dest variables.
sub file_copy
{
    local($src,$dest)=@_;
    open(SRC,$src);
    open(DEST,">".$dest);
    while (<SRC>)
    {
        print DEST $_ ;
    }
    close SRC;
    close DEST;
}
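Subroutines can also hand a value back to the caller. The sketch below is one way this might look; it uses `return` and `my`, neither of which is covered in this paper (`my` is a Perl 5 alternative to `local` that keeps the variable private to the enclosing block), and the `average` subroutine is purely illustrative.

```perl
use strict;
use warnings;

# Returns the arithmetic mean of its arguments, or 0 for an empty list.
sub average
{
    my @values = @_;   # copy the arguments, as with local(...)=@_
    return 0 unless @values;

    my $total = 0;
    foreach my $v (@values) {
        $total += $v;
    }
    return $total / scalar(@values);
}

my $avg = &average(5, 10, 15);
print "Average: $avg\n";
```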
Call by value arguments only let us use the values locally in our procedure. If you want the ability to modify the arguments that you pass, you will need to "call by name". The exact way that Perl does this is slightly different than "call by reference", which Perl also supports, but we don't need to drill into those details here, since use of true call by reference is deprecated in Perl. This is done by assigning the argument list (@_) to a "type glob", which is an asterisk ("*") followed by a name. The type glob can then be referenced later in the subroutine with the usual prefix ("$" for a scalar, "@" for an array) followed by the name. For example, the following subroutine removes all trailing whitespace from the supplied argument, which is passed by name with the code "local(*arg)=@_;".
# removes all trailing whitespace
sub rtrim
{
    local(*arg) = @_;
    # chop one character at a time while the string ends in whitespace
    while ($arg =~ /\s$/)
    {
        chop $arg;
    }
}
This function is invoked by passing the name of the variable used to the rtrim subroutine. The name is passed by supplying an asterisk preceding the variable to the subroutine call.
$x="a string with trailing whitespace ";
&rtrim(*x);
print $x." chopped.\n";
In order to successfully reuse subroutines (which is the main purpose of writing them), we will need to have some way of including them from existing files. This mechanism is the Perl require statement. require takes an argument of a filename, and then executes all of the code found in the specified file. In that sense, it's very similar to the eval statement, except that it works on a filename rather than a literal string, and it is smart enough not to execute code that has been "required" earlier. For example, if we put the subroutine above and other related ones into a file of string subroutines, we could use them in other programs with the statement:
require "string.pl";
Perl has even more sophisticated tools for reuse and abstraction: packages. Packages provide a separate namespace for variables. Packages are created by simply invoking the package statement with the package name, as in:
package string_handling;
This provides a rudimentary form of data hiding: the variables associated with a package are not accessible to other routines unless they explicitly set their package to string_handling (which of course they shouldn't do). It also provides a tool that helps in building abstract data types, or, to use that overused term, objects. That is, building a related set of routines that all operate on the same data. The data is hidden from outside code (that is, well-behaved outside code), and is accessible only through the defined set of routines, usually called methods. The subroutines may each invoke the package statement, or may be in a file with the package statement at the top. Data shared among package routines is referenced as if it were global data. This falls short of the protection mechanisms available in most object-oriented programming languages: you can't actually package all of the data into its own explicit object, and invoke the methods on the object directly. But it's more support for abstraction and reusability than you would expect in a scripting-oriented language.
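A small sketch may make the package mechanism more concrete. The `counter` package and its routines are hypothetical, and `our` is a later Perl 5 declaration not covered in this paper. Note that outside code can still reach package data by writing the package-qualified name, so the hiding depends on callers being well behaved.

```perl
use strict;
use warnings;

package counter;

# Package data, shared by the routines below as if it were global.
our $count = 0;

sub increment { $count++; return $count; }
sub current   { return $count; }

package main;

# Well-behaved code goes through the package's routines.
&counter::increment();
&counter::increment();
my $seen = &counter::current();

# Less well-behaved code can still name the variable directly.
my $direct = $counter::count;

print "seen: $seen, direct: $direct\n";
```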
Windows NT Perl 5.001 as released on the Microsoft Windows NT Resource Kit allows Perl to control and utilize OLE Automation servers. This warrants coverage here, since you will not find this documented elsewhere at this point.
As you've been reading this paper, you may have started to see the usefulness of Perl for many scripting applications, a category that controlling an OLE automation server would probably fall into. Wouldn't it be great if Perl could be an OLE automation controller? Well, with NT Perl 5.001, this is now possible. In fact, NT Perl ships out of the box with Perl packages to control Excel, Word Basic, and Netscape. The Word Basic and Excel packages are particularly relevant: you could use the Excel package to grab selected parts of an Excel spreadsheet that is updated frequently and dump the contents to a file that you analyze with Perl in other ways.
These Perl packages are created by a script called MkOLEx.BAT. MkOLEx can be supplied with just the object class of an OLE automation server that has been installed on your NT system, or with a type library. What does that mean? Well, the object class is the name of the class associated with the particular OLE automation server whose services you wish to access.
How do you find this out? Typically it can be found in your product documentation, but you can also use the Registry Editor to find it. In your \WINNT\SYSTEM32 directory is a program called REGEDT32.EXE. Invoke this program and you'll be surfing the NT registry. You'll want to use the Windows NT Resource Kit, or Richter's Advanced Windows, or another Windows NT-oriented book to understand all aspects of the registry. For now, we can help you find the class name of the application that you want to control from Perl. Go to the HKEY_CLASSES_ROOT portion of the registry (using the Window command if necessary). Scroll down past all of the extensions listed (e.g. ".ASC", ".AVI", etc.). Then you'll see a list of application names. For example, if you have Microsoft Office installed, you'll see "Access.Database.2", "Excel.Application", and "Word.Basic". You can run MkOLEx.BAT and supply it with any of the class names in the registry, and it will determine the methods available for that program and create a Perl package that exposes the capabilities of that automation server. At the same time it creates an HTML document that describes the available methods of the package.
MkOLEx can also create Perl packages from type libraries. In this case, if you're not familiar with what that means, you probably can't use one. A type library is a file (typically with a .TLB extension, although groups of type libraries can appear in .OLB files) containing information about the methods and properties available for a class object. Most commercial software that provides an OLE automation server interface still does not ship with type libraries. Once the software is installed, however, the information that MkOLEx needs is available in the registry, and the script can then be run with the class name as described. When would you use a type library? Possibly on your own OLE automation servers, if you wanted to make manual changes to the .TLB files rather than the registry, and then quickly regenerate the Perl packages to match.
MkOLEx does not support every type of method, property or parameter. However, it graciously reports those methods and properties which it cannot convert. MkOLEx is a very exciting capability: giving Perl the ability to control any OLE automation server. This truly opens up the door to access to all types of information without the necessity of writing a full-fledged special-purpose program for each type of data you need to access.
Microsoft Internet Information Server (IIS) can run Perl scripts with a specific extension, usually .pl, directly, without embedding references to PERL.EXE in your HTML pages. This is done with IIS script mapping, which in effect tells IIS to run a specific executable when it sees a file with a particular extension.
To enable this for the .pl extension, use the Registry Editor to add the value .pl to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W3SVC\Parameters\Script Map key. The data value should refer to where your PERL.EXE is installed, for example C:\PERL\PERL.EXE %s %s %s. The example Perl script and HTML for form processing presented below assume that script mapping has been established.
Microsoft Internet Information Server introduces a new standard for Web server to gateway program communication, known as the Internet Services API (ISAPI). With ISAPI the Web server (e.g. IIS) does not fire off a complete program as it does with the Common Gateway Interface (CGI), but merely loads a DLL and calls a function. Subsequent invocations of the ISAPI script just call the function from the loaded DLL. ISAPI scripts scale much better than CGI scripts since they do not require loading and executing an entire program.
It would be great if Perl scripts could be written with ISAPI, and they now can. The Windows NT Resource Kit does not yet include the ISAPI version of Perl, but it is available on the Hip Communications Web site, http://www.perl.hip.com. The ISAPI version of Perl is just a DLL called perlis.dll. Copy this DLL to a directory where IIS can run it (often the cgi-bin directory, which is mapped to /Scripts). Then either change your mapping of .PL files, as shown in the previous section, to refer to PERLIS.DLL, or just embed references to the PerlIS DLL in your HTML. For example, the following code invokes the Perl script order.pl to process the HTML form.
<FORM ACTION="perlis.dll?order.pl">
The ISAPI version of Perl (the PerlIS DLL) has a few limitations such as the inability to run executables from within the Perl script. In other words, shell escape, the system call, and exec won't work. Use the ISAPI version if you can for performance, but if you need to run other programs or shell to the operating system, use the ordinary PERL.EXE version of Windows NT Perl.
The following script is invoked successively on multiple postoffices within a batch file to count the messages received from each postoffice. Note that it only looks at the current and previous months of data to ensure that all postoffices have an equal number of days analyzed.
#################################################
# recvpo.pl
# Perl script to analyze MSMail RECV.LOG on a
# particular postoffice
# and identify how many messages from
# each postoffice
#
# Usage:
# perl recvpo.pl <postoffice name> <log file name>
#
# Adam Blum, Microsoft Consulting Services, 1995
#
#################################################
# first argument is the name of the postoffice being analyzed
$to=shift @ARGV;
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)=gmtime(time);
# gmtime returns months as 0-11; convert to 1-12
$month=$mon+1;
# previous month, wrapping from January back to December
$prevmonth=($month==1) ? 12 : $month-1;
while (<>){
    # only consider lines that start with a date
    next unless /^[0-1]/;
    # the split() function takes apart the input line and puts it into a @flds array
    @flds=split(/\s+/,$_);
    # parse out from mm-dd-yy
    next unless $flds[0]=~/([0-9]+)\-([0-9]+)\-([0-9]+)/;
    # now $1 is mm, $2 is dd, $3 is yy
    next unless (($1==$month)||($1==$prevmonth));
    $from=uc $flds[3];
    next unless $from=~/(\w+\/\w+)/;
    ($count{$1})++;
}
# ok. The array is built. Now display, busiest postoffice first
@counts=sort {$count{$b} <=> $count{$a}} keys %count;
foreach (@counts) {
    print "$to,RECV,$_,$count{$_}\n";
    $totcount+=$count{$_};
}
The sentpo.pl script is very similar but writes out SENT and looks at the recipient postoffice instead of the originator postoffice. The following batch file shows how this would be executed against successive postoffices to create a perpo.csv file (which may be loaded into Excel) that summarizes traffic between postoffices.
del perpo.csv
rem connect to postoffice on NetWare volume
net use t: /delete
net use t: \\pto1\vol1
perl recvpo.pl PTO1 t:\maildata\log\recv.log>> perpo.csv
perl sentpo.pl PTO1 t:\maildata\log\sent.log>> perpo.csv
net use t: /delete
net use t: \\pto2\vol1
perl recvpo.pl PTO2 t:\maildata\log\recv.log>> perpo.csv
perl sentpo.pl PTO2 t:\maildata\log\sent.log>> perpo.csv
The results of this analysis can be used to answer a number of questions:
- which postoffices have too much load and should be split or have users moved to other postoffices (another Perl script could be used to analyze which users to move)
- which postoffices (based on high levels of traffic between them) should be co-located on the same hub to improve performance
Specifically, at USPTO we found that a number of postoffices with high levels of traffic between them were situated several MTA hops away from each other.
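The first question above can be approached with a short follow-up script. The sketch below totals the message counts per analyzed postoffice from lines in the perpo.csv layout written by recvpo.pl (postoffice, RECV or SENT, other postoffice, count). The routine name total_load and the sample data are hypothetical, and the sample lines stand in for a real perpo.csv file.

```perl
use strict;

# Sum the message counts in perpo.csv-style lines for each analyzed postoffice.
# Each line is assumed to look like: postoffice,RECV|SENT,other postoffice,count
sub total_load {
    my (@lines) = @_;
    my %load;
    foreach my $line (@lines) {
        chomp $line;
        my ($po, $direction, $other, $count) = split(/,/, $line);
        # skip blank or malformed lines
        next unless defined $count && $count =~ /^\d+$/;
        $load{$po} += $count;
    }
    return %load;
}

# Hypothetical sample data standing in for a real perpo.csv
my @sample = (
    "PTO1,RECV,NET1/PTO2,120",
    "PTO1,SENT,NET1/PTO2,80",
    "PTO2,RECV,NET1/PTO1,75",
);

my %load = total_load(@sample);
# Busiest postoffice first
foreach my $po (sort { $load{$b} <=> $load{$a} } keys %load) {
    print "$po,$load{$po}\n";
}
```

In practice the lines would be read from perpo.csv rather than from a hard-coded list; the same hash could also be keyed on postoffice pairs to study which postoffices belong on the same hub.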
Recall from our discussion of the Boeing project that we needed to determine average and worst-case message delivery times for a variety of delivery scenarios. To do that, what we'd really like is a Perl script that zips through a log file, finds all delivered messages, and reports the send and delivery times for each message.
This should be a good example Perl script (as well as being essential to this engagement) since it leverages so many of Perl's features. Perl's default input operator feeds us each line, and pattern matching lets us parse each line easily. Perl's associative arrays allow us to easily store the message origination time, and then add the delivery time to the record when it's found later in the log.
The Event Log Application log has many, many entries written to it, even when the MTA is configured in basic mode, so we won't present them all here. Below is the entry (event 2029) that the script treats as the message being sent:
9/24/95,2:43:58 PM,MSExchangeIS Private,Information,Transport ,2029,N/A,ADAMBLUM_P90,Deliver Message to /o=Microsoft/ou=MCS/cn=RECIPIENTS/cn=Adamblum, FID=1-715, MID=1-2638, MTS_IDENTIFIER=c=US;a= ;p=ECS;l=Private MDB-950924184358Z-2.
This is a Local Delivery Of a Message
Below is the entry for when the message is actually delivered:
9/24/95,2:45:23 PM,MSExchangeIS Private,Information,Transport ,2054,N/A,ADAMBLUM_P90,Submitted Message To MDB recipients with MID=1-742, MTS_IDENTIFIER=c=US;a= ;p=ECS;l=Private MDB-950924184358Z-2 at 9/24/95 6:45:23 PM
OK. We know what we're looking for. Now let's write the script. Note that the actual code below is quite short (the equivalent would take much more code in Visual Basic or Visual C++), but is lengthened a bit by all of the comments.
# xchgperf.pl
#
# analyzes message delivery times and summarizes performance
# from the Event Log Application log
#
# loop through all the lines in the log
while (<>){
    # ... but only consider relevant lines
    next unless /;l=Private MDB-([\w\-]+)/;
    # the message ID field is in $1 due to the parentheses in the pattern above
    $id=$1;
    # the split() function takes apart the input line and puts it into a @flds array
    @flds=split(/,/);
    # second field is time
    $time=$flds[1];
    # sixth field is the event ID
    $type=$flds[5];
    # is it a message sent log event?
    if ($type == 2029) {
        # put the time into the associative array
        $msgsends{$id}=$time;
    }
    else {
        # it's a message delivered log event
        $msgdelivs{$id}=$time;
    }
}
# ok. The arrays are built. Now display them
foreach (sort keys %msgsends) {
    print "Sent: $msgsends{$_}, Delivered: $msgdelivs{$_}\n";
}
What this script does is traverse the log, picking up only records with message IDs. Then it adds the message ID and the time associated with it to an associative array of either send or delivery times, based on the type of the record (2029 means a send). Once the associative arrays are built it reports back on the send and delivery times of all messages. Obviously we can, and will, go much further and compute average and worst case elapsed delivery times in a real program. This simple example, however, demonstrates some of the great strengths of Perl for analysis and reporting.
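As a rough sketch of that next step, the code below computes the average and worst case elapsed times from two associative arrays shaped like %msgsends and %msgdelivs. The routine names to_seconds and delivery_stats and the sample data are hypothetical. The sketch also assumes that times look like "2:43:58 PM" and that each message is sent and delivered on the same day.

```perl
use strict;

# Convert a time such as "2:43:58 PM" to seconds since midnight.
sub to_seconds {
    my ($time) = @_;
    return undef unless $time =~ /^\s*(\d+):(\d+):(\d+)\s*(AM|PM)/i;
    my ($h, $m, $s, $ampm) = ($1, $2, $3, uc $4);
    $h = 0 if $h == 12;          # 12 AM is hour 0, 12 PM is hour 12
    $h += 12 if $ampm eq 'PM';
    return $h * 3600 + $m * 60 + $s;
}

# Return (average, worst case) elapsed seconds over all messages that
# have both a send time and a delivery time.
sub delivery_stats {
    my ($sends, $delivs) = @_;
    my ($total, $worst, $n) = (0, 0, 0);
    foreach my $id (keys %$sends) {
        next unless defined $delivs->{$id};
        my $sent      = to_seconds($sends->{$id});
        my $delivered = to_seconds($delivs->{$id});
        next unless defined $sent && defined $delivered;
        my $elapsed = $delivered - $sent;
        $total += $elapsed;
        $worst = $elapsed if $elapsed > $worst;
        $n++;
    }
    return $n ? ($total / $n, $worst) : (0, 0);
}

# Hypothetical data in the shape xchgperf.pl builds
my %msgsends  = ('950924184358Z-2' => '2:43:58 PM',
                 '950924190000Z-3' => '3:00:00 PM');
my %msgdelivs = ('950924184358Z-2' => '2:45:23 PM',
                 '950924190000Z-3' => '3:00:35 PM');

my ($avg, $worst) = delivery_stats(\%msgsends, \%msgdelivs);
print "Average: $avg seconds, worst case: $worst seconds\n";
```

A production version would also need to take the date field into account so that messages delivered after midnight are measured correctly.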
To execute this script and analyze server performance, run through the performance testing scenario on the various Exchange client workstations, making sure MTA logging is set to basic. Export the Event Log Application log to a text file. Now you're ready to run the Perl script to analyze your server performance.
perl xchgperf.pl eventlog.txt
Collecting data from Web forms requires a fair amount of parsing through the submitted data, breaking it into field names and values, and then "URL-decoding" the values. This is a fairly significant amount of code in C, C++, or Visual Basic. Due to Perl's pattern matching and associative array capabilities, it's very little Perl code. Here is the code to read the data from standard input (which assumes the POST method in the HTML form action).
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
# Split the name-value pairs
@pairs = split(/&/, $buffer);
foreach $pair (@pairs){
    ($name, $value) = split(/=/, $pair);
    # plus signs stand for spaces
    $value =~ tr/+/ /;
    # convert %XX hex escapes back into characters
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $form{$name}=$value;
}
At the end of this code, the %form associative array will contain all of the field values. The value of a particular field can be retrieved with $form{<fieldname>}. For example, to retrieve the product cost, use $form{'Cost'}. If you wish to store the data as a single record, to a comma-separated value (CSV) file, here's the code to do that.
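open(ORDERS,">>orders.csv") || die "Cannot open orders.csv: $!";
# write the values in a fixed order (alphabetical by field name)
print ORDERS join(",", map { $form{$_} } sort keys %form), "\n";
close(ORDERS);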
Although HTML form writing is beyond the scope of this article, just for completeness here's how this would be embedded into an HTML form (it does assume that Perl script mapping has been performed on the server).
<FORM ACTION="store.pl" METHOD="POST">
<PRE>
<INPUT NAME="Product">
<INPUT NAME="PartNo">
<INPUT NAME="Cost">
<INPUT TYPE="submit">
</PRE>
</FORM>
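To try the decoding logic without a Web server, the same steps can be wrapped in a routine and run against a hard-coded string. This is only a sketch: the routine name parse_form and the sample submission are hypothetical, and the field names are borrowed from the form above.

```perl
use strict;

# Decode a URL-encoded form submission into an associative array.
sub parse_form {
    my ($buffer) = @_;
    my %form;
    foreach my $pair (split(/&/, $buffer)) {
        # limit of 2 keeps any "=" inside the value intact
        my ($name, $value) = split(/=/, $pair, 2);
        $value = '' unless defined $value;
        # plus signs stand for spaces
        $value =~ tr/+/ /;
        # convert %XX hex escapes back into characters
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $form{$name} = $value;
    }
    return %form;
}

# Hypothetical submission for the Product, PartNo and Cost fields
my %form = parse_form('Product=Blue+Widget&PartNo=A%2D100&Cost=%2419.95');
foreach my $name (sort keys %form) {
    print "$name=$form{$name}\n";
}
```

In a real gateway script the string passed to parse_form would be the buffer read from standard input.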
Due to Perl's long history (since around 1989) on the Internet, there are many Internet resources on Perl including the comp.lang.perl newsgroup. There are also lots of canned Perl scripts for many tasks. A good resource for this is the Yahoo Perl directory: http://www.yahoo.com/Computers_and_Internet/Languages/Perl/.
The Hip Communications Web site for Perl is also a good place to check for new versions: http://www.perl.hip.com. Hip also hosts several mailing lists devoted to NT Perl, including perl-win32 for issues with porting Perl scripts to Win32 and perl-win32-users for general NT Perl usage issues. You can join either list from Hip's Web site.
Formal references on the subject appear below.
- Programming Perl. Larry Wall and Randal Schwartz. O'Reilly & Associates, 1991. Also known as "the camel book".
- Learning Perl. Randal Schwartz. O'Reilly & Associates, 1993. Also known as "the llama book".
- "The Perl Faq", ftp://ftp.cis.ufl.edu/pub/perl/doc/FAQ
- Microsoft Windows NT 3.51 Resource Kit, Microsoft Press. 1995.