Perl: Difference between revisions
Line 1,056: | Line 1,056: | ||
More complex examples: |
More complex examples: |
||
< |
<source lang=bash> |
||
#Remove duplicate words from a white-space separated list |
#Remove duplicate words from a white-space separated list |
||
WORDS="foo bar baz bar bar foo baz" |
WORDS="foo bar baz bar bar foo baz" |
Revision as of 23:08, 23 May 2012
Reference
- Perldoc on local computer
% perldoc -q duplicate
"How can I remove duplicate elements from a list or array?"
% perldoc -f split
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
...
% perldoc -f join
join EXPR,LIST
- Links
- FAQ
- The FAQ is the primary source of answer to questions like How can I do....
- Manpages - List of highly recommended perldoc manpages (from the FAQ).
Basics perldata, perlvar, perlsyn, perlop, perlsub Execution perlrun, perldebug Functions perlfunc Objects perlref, perlmod, perlobj, perltie Data Structures perlref, perllol, perldsc Modules perlmod, perlmodlib, perlsub Regexes perlre, perlfunc, perlop, perllocale Moving to perl5 perltrap, perl Linking w/C perlxstut, perlxs, perlcall, perlguts, perlembed Various http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz (not a man-page but still useful, a collection of various essays on Perl techniques)
- Command-Line - Useful command-line options
- -e expression
- specififies perl expressions.
- -p
- loops over and prints input.
- -n
- loops over and does not print input.
- -l
- strip newlines on input, and adds them on output. Use this option by default, unless the newlines need special handling, or for efficiency reasons.
Quick Introduction
Program Structure
Example of a simple Hello World program:
#!/usr/bin/perl
use strict; # Immediately stops on potential problem - highly recommended for simplified debugging
use warning; # Warnings - highly recommended for simplified debugging
print "Hello, World!\n";
exit 0;
Data Types
$ |
for scalar values (number, string or reference) |
@ |
for arrays |
% |
for hashes (associative arrays) |
& |
for subroutines (aka functions, procedures, methods) |
* |
for all types of that symbol name. In version 4 you used them like pointers, but in modern perls you can just use references. |
<> |
are used for inputting a record from a filehandle. |
\ |
takes a reference to something. |
Note that the last 2 are not really type specifiers.
Arrays
Some example
my @array1 = ("titi","tutu"); # (...) is an array constructor
my @array2 = ("tata","toto");
push(@array1,"tete"); # Append an element to an array
push(@array1,@array2); # Append another array to an array
print $array1[1]; # 2nd element of array array1
print $#array1; # The last index of array array1
@array1 = (); # Truncate array
$#array = -1; # ... same effect
print scalar( @array1 ); # The length of the array array1
print $#array1 + 1; # ... same effect (since perl 5, $[ deprecated)
Arrays can be easily constructed through autovivification. Below we create a hash of arrays
my %Projects; # Projects is a hash, but we say nothing on the types of its elements...
foreach my $VOBName (keys %VOBs)
{
my $ProjectName = $VOBs{$VOBName}{'ProjectName'};
push(@{$Projects{$ProjectName}}, $VOBName); # <-- we dereference value returned by $Projects{$ProjectName} as
} # an array, hence creating automatically an array if undef
Below some difference of handling @
in SCALAR or LIST context:
# RESULT CONTEXT EXPLANATION
my @a = ("titi","tutu");
my $varnoquote=@a; print "$varnoquote\n"; # "2" (SCALAR - @_ is evaluated in scalar context)
my $varquote="@a"; print "$varquote\n"; # "titi tutu" (EXPAND - @_ is quote-expanded, each item being separated by space)
print @a; print"\n"; # "tititutu" (LIST - $, is empty)
print(@a); print"\n"; # "tititutu" (LIST - $, is empty)
printf @a; print"\n"; # "titi" (LIST - 1st element in list is interpreted as the format string)
printf(@a); print"\n"; # "titi" (LIST - 1st element in list is interpreted as the format string)
print @a,"\n"; # "tititutu" (LIST - $, is empty)
printf "%s\n",@a; # "titi" (LIST - only 1st element is read)
Set variable $,
to modify the list separator used when printing arrays
my @a = ("titi","tutu");
$,="\n";
print @a;
Hashes
Some example of hashes:
my %cities = ( # (...) is a hash constructor
"US" => "Washington",
"GB" => "London"
);
print $cities{"US"},"\n";
my %hashofhash = ( # This is actually a hash of references to hash (note the {...})
"address" => {name => "US",
city => "Washington" },
"identity" => {firstname => "smith",
lastname => "Smith" } );
print $hashofhash{"address"}{"name"},"\n";
print $hashofhash{"address"}->{"name"},"\n";
keys(%users) = 1000; # allocate 1024 buckets
print scalar(keys %users),"\n"; # length of the hash
Note that in LIST context, a hash is transformed into an array containing both the keys and values in the hash!
my %myhash = ( key1 => "value1", key2 => "value2" );
my @myarray= ( "element1", "element2" );
push (@myarray, %myhash);
$, = ",";
print @myarray; # outputs "element1, element2, key2, value2, key1, value1"
References
my $VOBAttrRef = $VOBs{'AdminMask'}; # This return a reference to a Hash
my %VOBAttr = %$VOBAttrRef; # This dereference the reference above and return a Hash
print $VOBAttr{'ProjectName'},"\n"; # We can use our new Hash variable
print $$VOBAttrRef{'ProjectName'},"\n"; # ... or we can dereference our reference variable using the $$ construct
print $VOBAttrRef->{'ProjectName'},"\n"; # ... but -> can also be used to dereference
print $VOBs{'AdminMask'}->{'ProjectName'},"\n"; # We can also skip altoghether the reference variable
print $VOBs{'AdminMask'}{'ProjectName'},"\n"; # ... This notation is also available as a shortcut, -> can be omitted
Passing reference to sub-routines:
$tab{'somekey'} = '...';
process(\$tab);
sub process ()
{
my $tab = $_[0];
$tab->{'somekey'} = '...';
}
Using Anonymous Hash References:
#!/usr/bin/perl
use strict;
my @myarray;
foreach my $iter ( 1..10 )
{
my $value1 = "value1_".$iter;
my $value2 = "value2_".$iter;
print "Creating our \$hashref... ";
my $hashref = { index1 => $value1, index2 => $value2 }; # { key1 => value1, ... } creates a REFERENCE to an anonymous hash.
# Since reference are SCALAR, we assign it to a scalar variable
print "Done.\n",
" \$hashref: ",$hashref,"\n";
print " content: ",$$hashref{'index1'},",",$$hashref{'index2'},"\n";
print "Adding \$hashref to our array... ";
push( @myarray, $hashref );
print "Done. There are currently ", scalar(@myarray), " elements in \@myarray.\n";
print "Accessing last element of our array...";
print " content: @myarray[$#myarray], ${@myarray[$#myarray]}{'index1'} or better yet @myarray[$#myarray]->{'index2'}\n";
}
print "\n\nNow we will traverse our array again...\n";
foreach ( @myarray )
{
print "$_ containing ",
"index1 => $$_{'index1'},",
"index2 => $$_{'index2'}\n";
print "... or using -> operator: ",
"index1 => $_->{'index1'},",
"index2 => $_->{'index2'}\n";
}
String
# Concat 2 strings
$stringC = $stringA . ucfirst($stringB);
$stringC = "$stringA$stringB";
$stringC = join('', ($stringA, ucfirst($stringB)));
Statement modifiers
The statement modifiers are
if EXPR
unless EXPR
while EXPR
until EXPR
when EXPR
for LIST
foreach LIST
Examples:
print "Basset hounds got long ears" if length $ear >= 10;
go_outside() and play() unless $is_raining;
print "Hello $_!\n" foreach qw(world Dolly nurse);
Compound Statements
The following compound statements can be used to control flow:
if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
unless (EXPR) BLOCK
unless (EXPR) BLOCK else BLOCK
unless (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
LABEL while (EXPR) BLOCK
LABEL while (EXPR) BLOCK continue BLOCK
LABEL until (EXPR) BLOCK
LABEL until (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL foreach VAR (LIST) BLOCK
LABEL foreach VAR (LIST) BLOCK continue BLOCK
LABEL BLOCK continue BLOCK
Examples:
foreach ( @myarray ) { print "There is $_ in my array\n"; }
foreach my $iter ( 1..10 ) { print "value1_".$iter."\n"; }
foreach my $keyname (keys %myHash) { print "key is $keyname\n"; }
Operators
Quote and quote-like operators
See perldoc for detailed information.
Customary | Generic | Meaning | Interpolates |
---|---|---|---|
'' |
q{} |
Literal | no |
"" |
qq{} |
Literal | yes |
`` |
qx{} |
Command | yes(*) |
|
qw{} |
Word list | no |
// |
m{} |
Pattern match | yes(*) |
|
qr{} |
Pattern | yes(*) |
|
s{}{} |
Substitution | yes(*) |
|
tr{}{} |
Transliteration | no (but see below) |
|
<<EOF |
here-doc | yes(*) |
- (*) unless the delimiter is
''
.
- (*) unless the delimiter is
Interpolates means that variables like $VAR
are expanded, and that escaped sequence like \n
are processed.
Also other delimiters can be used. For instance:
#Use any brackets
print q{Hello World};
print q(Hello World);
print q[Hello World];
print q<Hello World>;
#Brackets delimiters nest correctly, like
print q{Hello {my} World}; # Equivalent to 'Hello {my} World!
#We can use any non-whitespace character
print q!Hello World!;
print q|Hello World|;
print q#Hello World#;
Beware of some caveats:
$s = q{ if($a eq "}") ... }; # WRONG - } inside "}" is not nested, so quoting will stop there
$s = q #Hello World# # WRONG - Because of the whitespace, #Hello World# is taken as a comment
Regular expressions
Use /regex/
or m!regex!
(where !
can be any quoting character).
Use =~
to match a given variable, otherwise $_
is used. Use !~
to reverse the meaning of the match (i.e. must not match).
Finding matches
In SCALAR context, /regex/
returns true/false if matching is found
$myvar =~ /World/ #scalar context, returns true if $myvar contains World
/World/ #scalar context, same as above except that now it is $_ that is matched
"Hello World" =~ /World/ #scalar contect, same as above, to show that left member doesn't need to be an L-Value
Extracting matches
The grouping metacharacters ()
also allow the extraction of the parts of a string that matched. For each grouping, the part that matched inside goes into the special variables $1
, $2
... They can be used just as ordinary variables:
# extract hours, minutes, seconds
$time =~ /(\d\d):(\d\d):(\d\d)/; # match hh:mm:ss format
$hours = $1;
$minutes = $2;
$seconds = $3;
In LIST context, /regex/
with groupings will return the list of matched values ($1,$2,...) . So we could rewrite the above as:
($hours, $minutes, $second) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
If the groupings in a regex are nested, $1
gets the group with the leftmost opening parenthesis, $2
the next opening parenthesis... For example, here is a complex regex and the matching variables indicated below it:
/(ab(cd|ef)((gi)|j))/; 1 2 34
Using back-references
Associated with the matching variables $1
, $2
... are the backreferences \1
, \2
... Backreferences are matching variables that can be used inside a regex:
/(\w\w\w)\s\1/; # find sequences like 'the the' in string
Note that $1
, $2
.... should only be used outside of a regex, and \1
, \2
... only inside a regex.
Search & Replace
Use s/regex/replacement/modifiers
. Use =~
to match a given variable, otherwise $_
is used.
In SCALAR context, s///
returns the number of matches, or false if no match.
$x = "Time to feed the cat!";
$x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
Note that the matching variablle $1
, $2
can be used in the replacement string.
Some modifiers:
g
- Find all matchese
- wraps aneval{...}
around the replacement string and the evaluated result is substituted for the matched substring. Example:
# reverse all the words in a string
$x = "the cat in the hat";
$x =~ s/(\w+)/reverse $1/ge; # $x contains "eht tac ni eht tah"
The split operator
split /regex/, string
splits string into a list of substrings and returns that list. The regex determines the character sequence that string is split with respect to. For example, to extract a comma-delimited list of numbers, use
$x = "1.618,2.718, 3.142";
@const = split /,\s*/, $x; # $const[0] = '1.618', $const[1] = '2.718', $const[2] = '3.142'
If the empty regex //
is used, the string is split into individual characters. If the regex has groupings, then the list produced contains the matched substrings from the groupings as well:
$x = "/usr/bin";
@parts = split m!(/)!, $x; # $parts[0] = '' Since the first character of $x matched the regex, an initial element was prepended.
# $parts[1] = '/' The delimiter is also in the list because of the grouping (/)
# $parts[2] = 'usr'
# $parts[3] = '/' Yet a delimiter because of the grouping
# $parts[4] = 'bin'
Lookahead / Lookbehind
The lookahead and lookbehind assertions are generalizations of the anchor concept. Lookahead and lookbehind are zero-width assertions that let us specify which characters we want to test for. The lookahead assertion is denoted by (?=regexp)
and the lookbehind assertion is denoted by (?<=fixed-regexp)
. Some examples are
$x = "I catch the housecat 'Tom-cat' with catnip";
$x =~ /cat(?=\s+)/; # matches 'cat' in 'housecat'
@catwords = ($x =~ /(?<=\s)cat\w+/g); # matches, $catwords[0] = 'catch' $catwords[1] = 'catnip'
$x =~ /\bcat\b/; # matches 'cat' in 'Tom-cat'
$x =~ /(?<=\s)cat(?=\s)/; # doesn't match; no isolated 'cat' in middle of $x
Grep / Map
Use grep
on a list to return the element of that list for which the expression is true. For instance
@foo = grep(!/^#/, @bar); # Only returns line that are not comments
my @array = ("el1","gel2","el3","gel1","gel2");
my @array2 = grep {s/(.*el)/reverse $1/e} @array; # grep may also modify the elements in the returned list
Use map
on a list to apply a given expression on all elements in the list.
@chars = map(chr, @nums); # Returns the list of character corresponding to the list of of numbers
File and I/O
Basics
See perlopentut for more details.
Use open
to open a file. Special variable $!
contains the status of last operation:
open (my $in, "<", "input.txt") or die "can't open input.txt: $!";
open (my $out,">", "output.txt") or die "can't open output.txt: $!";
open (my $log,">>", "my.log") or die "can't open my.log: $!";
Read from the file using operator <>
:
my $line = <$in>; # Read one line
my @lines = <$in>; # Read all lines
while (<$in>) { # assigns each line in turn to $_
print "Just read in this line: $_";
}
TRAP!!! — while(<$in>)
≠ for(<$in>)
! See below:
while(<$in>) { #(scalar context) Read line-by-line, stop when <$in> returns undef
print "_=$_";
print "<in>=".<$in>
}
for(<$in>) { #(list) read whole file, then foreach on the resulting list
print "_=$_";
print "<in>=".<$in> # ... <$in> is always undefined (because EOF reached already!)
}
print
accepts an optional first argument specifying which filehandle to print to:
print STDERR "This is your final warning.\n";
print $out $record;
print $log $logmessage;
Finally close the file:
close $in or die "$in: $!";
There are 2 advantages to using an indirect filehandles such as my $in
:
- it eases namespace management (filehandle like
INFO
are global to the package). - an indirect filehandle automatically closes when it goes out of scope or when you undefine it.
The old way has some caveats:
open INFO, "< info.txt" or die "can't open info.txt: $!"; # Leading/trailing whitespace are stripped!
open (INFO, "<", "info.txt") || die "can't open info.txt: $!"; # Using regular FILEHANDLE - name clashes in current package !!!
Common functions / modules
Some commonly used functions / modules:
Name | Description |
---|---|
chdir (function) | Change the current working directory |
-X (function) | Various test on files, directories... pretty much like in Bash scripts. |
CWD (module) | get pathname of current working directory (provides getcwd and abs_path ).
|
File::Basename (module) | Parse file paths into directory, filename and suffix |
System / STDIN / STDOUT / STDERR
Some examples related to handling of system calls, STDIN, STDOUT and STDERR.
Read something from standard input | $line = <STDIN>;
$line = readline(*STDIN); # same thing
chomp($line = <STDIN>); # remove trailing newline
|
Read one character from STDIN | print "Press RETURN...";
$key = getc();
|
System calls | system "echo hello world!";
system qq(echo hello world!);
system $MYCMD, qw(param1), 'the name is'.getname($index);
|
Discard STDERR on Windows / Linux. Note that on Windows, we use \nul because each folder as a nul handler and we want to reduce the number of used handle | my $STDERRNULL = "2>\\nul"; #use this on windows
my $STDERRNULL = "2>/dev/null"; #use this on unix
my @Results = qx(ls somedirectory $STDERRNULL);
|
Capture STDOUT | my @ouput = `ls`;
my @ouput = qx(ls);
system("ls >output.txt");
|
Capture command exit status | my $exit_status = system("del file.txt");
|
Temporarily disable STDERR and restore it afterwards | open(SAVE_STDERR, '>&STDERR');
close(STDERR) unless $ENV{CLEARCASE_TRACE_TRIGGERS};
$exe = qx(file $ptmp) =~ /executable|bourne|commands text|\bscript/i;
open(STDERR, '>&SAVE_STDERR');
close(SAVE_STDERR);
|
Processes, Pipes, IPC
References
See
- http://perldoc.perl.org/perlipc.html
- http://docstore.mik.ua/orelly/perl/prog3/ch16_03.htm
- http://perldoc.perl.org/IO/Pipe.html
See also the page on Unix's pipes.
exec
source: perldoc
# exec - executes a system command and never returns (except if failed)
exec ('foo') or die ("Can't execute foo: $!");
exec "sort $outfile | uniq" # parsed by system's command shell - support pipes and redirect
exec '/bin/echo', 'Your arguments are: ', @ARGV; # not parsed
system
source: perldoc
# system - same as exec, but forks first
system ('foo') or die ("Can't execute foo: $?");
system "sort $outfile | uniq" # parsed by system's command shell - support pipes and redirect
system '/bin/echo', 'Your arguments are: ', @ARGV; # not parsed
open
source: perldoc
open(my $fh, "input.txt") or die $!; # for input
open(my $fh, '<', "input.txt") or die $!; # for input
open(my $fh, '>', "output.txt") or die $!; # for output
open(my $fh, '>>', "output.txt") or die $!; # for appending
open(my $fh, '+<', "output.txt") or die $!; # for input / output (open existing file)
open(my $fh, '+>', "output.txt") or die $!; # for input / output (create/rewrite new file)
open(my $fh, '+>', undef) or die $!; # anonymous temporary file
open(my $fh, '| sort') or die $!; # Equiv to fork, exec, then a waitpid when closing $fh (for the parent)
open(my $fh, 'ps aux |') or die $!;
my $childpid = open(my $fh, '|-', 'sort') or die "Can't fork: $!"; # implicit fork - child reads from STDIN
my $childpid = open(my $fh, '-|', 'ps aux') or die "Can't fork: $!"; # implicit fork - child writes to STDOUT
open my $oldstdin, "<&STDIN" # Duplicate STDIN handle
open STDINBAK, "<&", \*STDIN # Same
open STDIN, "<&READER" # Redirect STDIN from an existing handle READER
open ($fh,'>', \$variable) or die "$!"; #Since 5.8.0, PerlIO is enabled by default.
close STDOUT;
open STDOUT, '>', \$variable or die "Can't open STDOUT: $!"; #Need to close STDOUT first...
IO::Pipe
source: perldoc
use IO::Pipe;
$pipe = new IO::Pipe;
if($pid = fork()) { # Parent
$pipe->reader();
while(<$pipe>) {
...
}
}
elsif(defined $pid) { # Child
$pipe->writer();
print $pipe ...
}
use IO::Pipe;
$pipe = new IO::Pipe;
$pipe->reader(qw(ls -l));
while(<$pipe>) {
...
}
Pipes
source: O'Reilly - Programming Perl
open SPOOLER, "| cat -v | lpr -h 2>/dev/null" or die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER or die "bad spool: $! $?";
So to page your program's output, you'd use:
if (-t STDOUT) { # only if stdout is a terminal
my $pager = $ENV{PAGER} || 'more';
open(STDOUT, "| $pager") or die "can't fork a pager: $!";
}
END {
close(STDOUT) or die "can't close STDOUT: $!"
}
open STATUS, "netstat -an 2>/dev/null |" or die "can't fork: $!";
while (<STATUS>) {
next if /^(tcp|udp)/;
print;
}
close STATUS or die "bad netstat: $! $?";
Using pipes to talk to self. From parent to child:
if (open(TO, "|-")) {
print TO $fromparent;
}
else {
$tochild = <STDIN>;
exit; #Don't forget this!
}
From child to parent:
if (open(FROM, "-|")) {
$toparent = <FROM>;
}
else {
print STDOUT $fromchild;
exit; #Don't forget this!
}
Perlipc
source: perldoc
Using open
for IPC:
open(SPOOLER, "| cat -v | lpr -h 2>/dev/null")
|| die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER || die "bad spool: $! $?";
open(STATUS, "netstat -an 2>&1 |")
|| die "can't fork: $!";
while (<STATUS>) {
next if /^(tcp|udp)/;
print;
}
close STATUS || die "bad netstat: $! $?";
Safe Pipe Opens
source: perldoc
Doing a safe pipe for reading from child:
# add error processing as above
$pid = open(KID_TO_READ, "-|");
if ($pid) { # parent
while (<KID_TO_READ>) {
# do something interesting
}
close(KID_TO_READ) || warn "kid exited $?";
} else { # child
($EUID, $EGID) = ($UID, $GID); # suid only
exec($program, @options, @args)
|| die "can't exec program: $!";
# NOTREACHED
}
Doing a safe pipe for writing to child:
# add error processing as above
$pid = open(KID_TO_WRITE, "|-");
$SIG{PIPE} = sub { die "whoops, $program pipe broke" };
if ($pid) { # parent
for (@data) {
print KID_TO_WRITE;
}
close(KID_TO_WRITE) || warn "kid exited $?";
} else { # child
($EUID, $EGID) = ($UID, $GID);
exec($program, @options, @args)
|| die "can't exec program: $!";
# NOTREACHED
}
Avoiding Pipe Deadlocks
Source: perlipc (perldoc)
A child process that reads from STDIN
only exits when it gets an EOF
. This EOF
is sent only when any processes on the other side closes the handle! If there are multiple processes connected to the same pipe, having one process closes it is not enough; the last process with the pipe open must close it for the child to read EOF
.
In the example below, the grand-parent waits until child and grand-child terminate their communication. This is because WRITER
was opened using open WRITER, "|-"
, which has a special behaviour: closing it will call waitpid()
, which waits for the sub-process to exit. If the child happens to wait for the parent to do something before exiting, we have a deadlock.
$pid = open WRITER, "|-";
defined $pid or die "fork failed; $!";
if ($pid) {
if (my $sub_pid = fork()) {
close WRITER; #!!! will call waitpid(), so will wait until child exits...
# do something else...
}
else {
# write to WRITER...
close WRITER;
exit;
}
}
else {
# do something with STDIN...
exit;
}
A 1st solution is to build the pipe manually with pipe()
, fork()
, and the form of open()
which sets one file descriptor to another, as below:
pipe(READER, WRITER);
$pid = fork();
defined $pid or die "fork failed; $!";
if ($pid) {
close READER;
if (my $sub_pid = fork()) {
close WRITER; #! will not call waitpid() !
# do something else...
}
else {
# write to WRITER...
close WRITER;
exit;
}
}
else {
open STDIN, "<&READER";
close WRITER;
# do something...
exit;
}
Another solution is simply not to fork open in the first parent (i.e. don't create an handle in the process that does not need it):
$pid = fork();
if ($pid)
{
# do something else...
}
else
{
$sub_pid = open WRITER,"|-";
defined $sub_pid or die "fork failed; $!";
if ($sub_pid)
{
# write to WRITER...
close WRITER; # wait for child to exits...
}
else
{
# do something with STDIN...
exit;
}
exit;
}
Avoiding Zombie Processes
Source: Perl Cookbook
- If you don't need to record the children that have terminated (<defunct>), use:
$SIG{CHLD} = 'IGNORE';
- To keep better track of deceased children, install a SIGCHLD handler to call waitpid :
use POSIX ":sys_wait_h";
$SIG{CHLD} = \&REAPER;
sub REAPER {
my $stiff;
while (($stiff = waitpid(-1, &WNOHANG)) > 0) {
# do something with $stiff if you want
}
$SIG{CHLD} = \&REAPER; # install *after* calling waitpid
}
Sub-routines
Declaration and definition syntax:
sub NAME[(PROTO)] [: ATTRS]; # A "forward" declaration
sub NAME[(PROTO)] [: ATTRS] BLOCK # A declaration and definition
$subref = sub (PROTO) : ATTRS BLOCK; # An anonymous sub-routine, called with &$subref
Importing a sub-routine:
use MODULE qw(NAME1 NAME2 NAME3);
Calling a sub-routine:
NAME(LIST); # & is optional with parentheses.
NAME LIST; # Parentheses optional if predeclared/imported.
&NAME(LIST); # Circumvent prototypes.
&NAME; # Makes current @_ visible to called subroutine.
Examples:
sub mySub1
{
my ($param1, $param2) = @_
return $param1.$param2;
}
sub mySub2
{
my $param1 = shift
my $param2 = shift
return $param1.$param2;
}
Using default value for sub-routine parameters: sub myfunc { my($suffix) = @_ ? "@_" : "defaultvalue"; }
Modules
See Perl Module Primer.
Functions
See [1] for a detailed list of Perl functions.
Chop / Chomp
chop
removes the last character of a string. It also works on lists.
chop( my $userinput=<STDIN> ); #Chop the trailing "\n" in user input
chop( my @list=qx(ls); #Chop the trailing "\n" in the command output
chomp
removes the trailing record separator (typically \n
) of a string. It also works on lists.
chomp( my $userinput=<STDIN> ); #Chomp the trailing "\n" in user input IF PRESENT
chomp( my @list=qx(ls); #Chomp the trailing "\n" in the command output IF PRESENT
-X
The function -X can be used for various test on the files, directories... similar to the test command in Bash:
print "The file exists\n" if -e "../somefile";
print "The directory exists\n" if -d "../some/directory";
Use _ to save a system call, like in:
stat($filename);
print "Readable\n" if -r _;
print "Writable\n" if -w _;
print "Executable\n" if -x _;
print "Text\n" if -T _;
print "Binary\n" if -B _;
Since 5.9.1, operators can be stacked:
print "writable and executable\n" if -f -w -x $file; # same as -x $file && -w _ && -f _
Modules
See Core Modules for a detailed list of Perl modules. Here a list of frequently used ones:
Benchmark
See also [2].
use Benchmark;
&Benchmark::timethis(500000,'$match=qq/filename.txt/;$match=~/.*\.(.*)/');
use Benchmark;
use Time::HiRes;
sub mySubroutine {
my $time0 = new Benchmark;
...
my $time1 = new Benchmark;
print "mySubroutine benchmark: ".timestr(timediff($t1,$t0));
}
(CWD) getcwd / abs_path
The function getcwd returns the current working directory. abs_path transforms a given relative path into its equivalent canonical absolute form.
use Cwd qw(getcwd abs_path);
my $dir = getcwd();
my $abs_path = abs_path($file);
File::Basename
use File::Basename;
(name,$path,$suffix) = fileparse($fullname,@suffixlist);
$name = fileparse($fullname,@suffixlist);
$basename = basename($fullname,@suffixlist);
$dirname = dirname($fullname);
File::Find
File:Find
provides functions similar to the Unix find command for searching through directory trees doing work on each file.
use File::Find;
find(\&wanted, @directories_to_search); #depth-first search - preorder traversal - no options
sub wanted { ... }
use File::Find;
find({ wanted => \&process, follow => 1 }, '.'); #With options
sub process { ... }
use File::Find;
finddepth(\&wanted, @directories_to_search); #depth-first search - post-order traversal - no options
sub wanted { ... }
Example:
find (\&wanted, $directory);
sub wanted {
/(\.c|\.cpp|\.cxx|\.h|\.hpp|\.hxx)$/ &&
print "$_\n";
}
HTML::TagParser
An easy module to parse HTML files. See example below. More at [3].
#!/usr/bin/perl -w
use strict;
use HTML::TagParser;
my $html = HTML::TagParser->new("test.html");
my @addr = $html->getElementsByTagName("a");
foreach (@addr) {
my $attr = $_->attributes;
print "<a href=\"$attr->{'href'}\">", $_->innerText(), "</a>\n";
}
XML::DOM
Again an example to parse an XML file (from [4]). See also documentation on CPAN:
#!/usr/bin/perl -w
use strict;
use XML::DOM;
my $parser = new XML::DOM::Parser;
my $doc = $parser->parse( \*DATA ) or die "Unable to parse document";
my $root = $doc->getDocumentElement(); # safer than just getting the first
# child, in case the document has a
# DTD or start with comments
scanner($root);
sub scanner
{
my ($rt) = @_;
my $i=0;
foreach my $nde ( $rt->getChildNodes()) # yes it is anarray!
{
if ( ($nde->getNodeType() == TEXT_NODE )
&& ($nde->getData()=~ /\S/s) ) # to avoid extra white spaces
{
print $i++," TEXT /", $nde->getData(), "/\n";
}
if ($nde->getNodeType == ELEMENT_NODE)
{
print $i++, " ELEMENT ", $nde->getNodeName(), "\n";
}
scanner( $nde );
}
}
__DATA__
<methodCall>Level1 Text
<Level2a>Text at Level2a</Level2a>
<Level2b>Text at Level2b</Level2b>
</methodCall>
Examples
One-Liners
perl -ne 'print unless /^$/../^$/' input # print lines, unless blank
perl -ne 'print if ! /^$/../^$/' input # reduce runs of blank lines to a single blank line
perl -lne 'print $.; close ARGV if eof' input input # $. need to be reset (by closing ARGV) between 2 input files
perl -lne 'print for m/\b(\S+)\b/g' paragraphs # print words from file paragraphs
perl -lne 'while(m/(\S+)\s+the\s+(\S+)/g){print "$1 $2"}' paragraphs # ... while loop needed when using multiple back-references
perl -lne 'print for /id is <(\d+)>/g' # match pattern and extract backreference
perl -lne 'print $2 for /id is <(\d+)> or <(\d+)>/g' # ... print 2nd matched backreference
cat oldfile | perl -pe 's/(\d+)_/sprintf("%2.2d_",$1)/e' > newfile # evaluate regex substitutions
perl -pe 'tr /A-Za-z/\n/cs' file.txt # Tokenize a text file - c=complement, s=squash duplicates
-e commandline allows to run one line of program. -l enables automatic line-ending processing. -n wrap a while(<>) { ... }
around program (but does not print lines like with -p).
More complex examples:
#Remove duplicate words from a white-space separated list
WORDS="foo bar baz bar bar foo baz"
perl -e 'print join(" ", keys %{{ map { $_ => 1 } split (/ +/,"'"$WORDS"'") }} );'
Miscellaneous
Report a complete error message when loading a script. This hack allows for printing a custom error message + file not found-like error message (given by |
do "your script.pl"
or (print "Your error message\n$!\n" and die @_);
|
uppercase / lowercase | my $lowercase = lc "My StRiNg"; #mystring
my $uppercase = uc "My StRiNg"; #MYSTRING
my $firstcharlowercase = lcfirst "My StRiNg"; #my StRiNg
my $firstcharuppercase = ucfirst "My StRiNg"; #My StRiNg
|
Split a multiline variable/output
Method 1 - using an array variable.
my @myarray=qx(ls);
foreach (@myarray)
{
chomp;
print "The file is '$_'.\n";
}
Method 2 - using a scalar variable.
my $myscalar=qx(ls);
foreach (split /\n/,$myscalar)
{
print "The file is really '$_'\n"; # No need for chomping
}
Parsing Command Line Parameters
Command line parameters are parsed through variable ARGV.
print scalar @ARGV; #number of parameters
print $#ARGV; #... idem
print "1st param: $ARGV[0]"; #positional parameters
print "2nd param: $ARGV[1]";
print "Executable name: $0"; #Name of current executable
usage() unless defined($ARGV[0]) # defined($ARGV[0]) is true if there is a parameter
Simple version
#!/usr/bin/perl
use strict;
use warnings;
my $verbose=0;
my $projectdir;
# Parse command options (-option).
while ($#ARGV>=0 && $ARGV[0] =~ m/^\-/ ) {
$verbose=1 if $ARGV[0] =~ m/^\-v/i;
shift @ARGV;
}
# Parse mandatory parameter
usage() unless defined($ARGV[0]);
$projectdir=$ARGV[0];
# Show parsing result
print "verbose=$verbose\n";
print "projectdir=$projectdir\n";
exit 0;
sub usage {
print "Usage: $0 [options] directory\n";
print "\n";
print " Options:\n";
print " -v verbose mode\n";
exit;
}
Using GetOpt
use strict;
use Getopt::Long qw(:config no_ignore_case);
my $debug=0;
my $quiet=0;
my username;
# Parse options
GetOptions ("d|debug+" => \$debug,
"q|quiet" => \$quiet,
"u|user=s" => \$username ) || usage();
# Parse remaining parameters
my $url = $ARGV[0];
Internet
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use LWP::Debug;
use HTTP::Cookies;
use HTML::TableExtract;
my $debug = 0; # Set to 1 for debug information
my $proxy = 0;
my $username;
my $password;
my $url = $ARGV[0];
LWP::Debug::level('+') if $debug;
my $ua = LWP::UserAgent->new;
# Session cookie
my $jar = HTTP::Cookies->new ();
$ua->cookie_jar($jar);
# Enable proxy...
$ua->env_proxy if $proxy;
# Fetch the articles + url
my %articles = get_articles ( $ua, $url );
# Get starting URL....
my $res = $ua->get($url);
unless ($res->is_success) {
die "Die: " . $res->status_line, "\n";
}
exit 0;
print $res->content if $debug;
my $res = $ua->post( $url,
[
'j_username' => $username,
'j_password' => $password,
'Submit' => 'Entrer'
]
);
unless ($res->is_success) {
die "Die: ".$res->status_line, "\n";
}
sub get_articles {
my $ua = shift;
my $url = shift;
my $res = $ua->get ($url);
unless ($res->is_success) {
die "http-get failed: ".$res->status_line, "\n$url\n";
}
my $te = HTML::TableExtract->new ( slice_columns => 0,
keep_html => 1,
keep_headers => 1,
subtables => 1,
headers => [qw(Matter)] );
$te->parse($res->content);
open(my $fh, ">mpe.html") || die "Cannot create file: $!\n";
print $fh $res->content;
close($fh);
foreach my $ts ( $te->tables ) {
# print "Table (", join(',', $ts->coords), "):\n";
foreach my $row ( $ts->rows ) {
print "Row: " . join (';', @$row ). "\n";
next unless $row->[0] =~ m/\/content\/(.*)\/fulltext/;
print $1."\n";
}
}
return 0;
}
Passing filehandle as sub parameters and return values
This requires the use of a reference. First as return value:
sub openTimeOut($)
{
my $filename = shift;
my $timeout=15;
while( !open(LOG,$filename) ) { sleep 1; --$timeout or die "Time out trying to open file $filename"; }
return \*LOG;
}
sub printToFile($@)
{
my $filename = shift;
my $fh = openTimeOut(">$filename");
print $fh @_;
close($fh);
}
BUT BEWARE, actually OpenTimeOut
returns a reference to the same file handle in current glob! The code below illustrate this:
my ($to,$from) = @_;
$fhto = openTimeOut(\*TO,">>$to");
$fhfrom = openTimeOut(\*FROM,"<$from"); # This returns same FILEHANDLE reference as $fhto
while (<$fhfrom>) {print $fhto $_} # Failed, because now $fhto = $fhfrom, which only open for output
close($fhfrom);
close($fhto);
The solution, pass by parameters:
sub openTimeOut2(*;$)
{
my $fh = shift;
my $filename = shift;
my $timeout=15;
while( !open($fh,$filename) ) { sleep 1; --$timeout or die "Time out trying to open file $filename"; }
}
sub printToFile($@)
{
my $filename = shift;
openTimeOut2(\*LOG,">$filename");
print LOG @_;
close(LOG);
}
Embedding a perl script in a W2K shell script
Notice how the first rem
is actually a multiline assignment to perl array variable @rem
, where the value is quoted with ' '
.
@rem= 'PERL for Windows NT - ccperl must be in search path
@echo off
ccperl %0 %1 %2 %3 %4 %5 %6 %7 %8 %9
goto endofperl
@rem ';
# Your Perl code comes here
# End of Perl section
__END__
:endofperl
Kill on ALARM signal
Here a small Perl script stopafter that kills a process after some time using alarm (similar to timeout kill):
#!/usr/bin/perl
my ($limit, $command, @args) = @ARGV;
$SIG{ALARM} = 'DEFAULT';
alarm $limit;
exec $command, @args; # Exec preserves process signal table, but fork not!
die "Couldn't run '$command': $!";
Use it as follows:
stopafter 30 command arg arg arg...
Command will be killed after 30 sec, unless the command cancelled the alarm clock or caught/ignored signal ALRM
Pitfalls
Forgetting to chomp the trailing "\n" | my $path = qx(pwd); #NOK! trailing \n will corrupt path construction
chomp( my $path = qx(pwd) ); #OK!
|
Mixing case in name of package | # Imagine a module file named Vobs.pm
use Vobs; # OK
use VOBs; # NOK &rar; Will complain about double definition
# (but will not flag the mix case problem)
|
Operator precedence and strange behaviour | chomp my @emptylist = qx("dir"); #NOK ! @emptylist will be empty
chomp ( my @list = qx("dir") ); #OK !
|
Forgetting to use ${...} to separate variable identifier | my $variable;
print "$variable_temp\n"; # NOK! Print a variable named variable_temp
print "${variable}_temp\n"; # OK! Print a $variable, followed by "_temp"
|
STDERR redirection cannot be given as a command parameter to system | system "echo hello world! 2>\\nul"; # OK
system qq(echo hello world! 2>\\nul); # OK
system "echo", "hello world!"," 2>\\nul"; # NOK - 2>\\nul taken as a parameter
|
Forgetting local in sub-routines (see [9]). In particular pay attention that $_ is assigned e.g. in while loops
|
sub localized
{
local @ARGV = ("/etc/motd"); # OK
local $/ = undef; # OK
local $_ = <>; # OK
@Fields = split /^\s*=+\s*$/;
}
|
CPAN - Perl Packages
First time launch:
$ cpan # ... OR ...
$ perl -MCPAN -e shell # --> yes auto config
To adapt config related to proxy:
cpan> o conf init /proxy/ # (to enter an empty string, simply enter 1 space char as a value)
cpan> o conf commit
To install a Perl package (eg. here package GetOpt::Long):
$ cpan
cpan>install GetOpt::Long
cpan>force install GetOpt::Long # Install even if self-tests failed
Editing the configruation:
cpan> o conf init # Reset the configuration
cpan> o conf http_proxy http://proxy:8080/ # Edit a given variable (eg. here changing proxy settings):
cpan> o conf commit # To commit changes in the configuration
cpan> o # to get o options
cpan> o conf # To get o conf option
To edit CPAN database url:
cpan> o conf /urllist/
cpan> o conf init /urllist/
cpan> o conf urllist shift
cpan> o conf urllist unshift ftp://my.new.site/
cpan> o conf commit
To update CPAN itself:
cpan> install Bundle::CPAN
cpan> reload cpan