Perl

From miki
Jump to navigation Jump to search

Reference

  • Perldoc on local computer
% perldoc -q duplicate
  "How can I remove duplicate elements from a list or array?"
% perldoc -f split
  split /PATTERN/,EXPR,LIMIT
  split /PATTERN/,EXPR
  split /PATTERN/
  ...
% perldoc -f join
  join EXPR,LIST
The FAQ is the primary source of answer to questions like How can I do....
  • Manpages - List of highly recommended perldoc manpages (from the FAQ).
Basics          perldata, perlvar, perlsyn, perlop, perlsub
Execution       perlrun, perldebug
Functions       perlfunc
Objects         perlref, perlmod, perlobj, perltie
Data            Structures   perlref, perllol, perldsc
Modules         perlmod, perlmodlib, perlsub
Regexes         perlre, perlfunc, perlop, perllocale
Moving to perl5 perltrap, perl
Linking w/C     perlxstut, perlxs, perlcall, perlguts, perlembed
Various         http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz
                (not a man-page but still useful, a collection of various essays on Perl techniques)
  • Command-Line - Useful command-line options
-e expression
specififies perl expressions.
-p
loops over and prints input.
-n
loops over and does not print input.
-l
strip newlines on input, and adds them on output. Use this option by default, unless the newlines need special handling, or for efficiency reasons.

Quick Introduction

Program Structure

Example of a simple Hello World program:

#!/usr/bin/perl

use strict;                 # Immediately stops on potential problem - highly recommended for simplified debugging
use warning;                # Warnings                               - highly recommended for simplified debugging

print "Hello, World!\n";

exit 0;

Data Types

$ for scalar values (number, string or reference)
@ for arrays
% for hashes (associative arrays)
& for subroutines (aka functions, procedures, methods)
* for all types of that symbol name. In version 4 you used them like pointers, but in modern perls you can just use references.
<> are used for inputting a record from a filehandle.
\ takes a reference to something.

Note that the last 2 are not really type specifiers.

Arrays

Some example

my @array1 = ("titi","tutu");           # (...) is an array constructor
my @array2 = ("tata","toto");
push(@array1,"tete");                   # Append an element to an array
push(@array1,@array2);                  # Append another array to an array
print $array1[1];                       # 2nd element of array array1
print $#array1;                         # The last index of array array1
@array1 = ();                           # Truncate array
$#array = -1;                           # ... same effect
print scalar( @array1 );                # The length of the array array1
print $#array1 + 1;                     # ... same effect (since perl 5, $[ deprecated)

my $arrayref = ["foo","bar"]            # [...] anonymous array constructor, returns a reference
print $arrayref                         # ARRAY(0x1e39cb8)   - print the reference
print @$arrayref                        # foobar             - dereference the reference

Arrays can be easily constructed through autovivification. Below we create a hash of arrays

my %Projects;                                       # Projects is a hash, but we say nothing on the types of its elements...
foreach my $VOBName (keys %VOBs)
{
	my $ProjectName = $VOBs{$VOBName}{'ProjectName'};
	push(@{$Projects{$ProjectName}}, $VOBName);     # <-- we dereference value returned by $Projects{$ProjectName} as
}                                                   #     an array, hence creating automatically an array if undef

Below some difference of handling @ in SCALAR or LIST context:

#                                            RESULT      CONTEXT  EXPLANATION
my @a = ("titi","tutu");
my $varnoquote=@a; print "$varnoquote\n"; # "2"         (SCALAR - @_ is evaluated in scalar context)
my $varquote="@a"; print "$varquote\n";   # "titi tutu" (EXPAND - @_ is quote-expanded, each item being separated by space)
print @a; print"\n";                      # "tititutu"  (LIST   - $, is empty)
print(@a); print"\n";                     # "tititutu"  (LIST   - $, is empty)
printf @a; print"\n";                     # "titi"      (LIST   - 1st element in list is interpreted as the format string)
printf(@a); print"\n";                    # "titi"      (LIST   - 1st element in list is interpreted as the format string)
print @a,"\n";                            # "tititutu"  (LIST   - $, is empty)
printf "%s\n",@a;                         # "titi"      (LIST   - only 1st element is read)

Set variable $, to modify the list separator used when printing arrays

my @a = ("titi","tutu");
$,="\n";
print @a;

Hashes

Some example of hashes:

my %cities = (                                        # (...) is a hash constructor
    "US"   => "Washington",
    "GB"   => "London"
);
print $cities{"US"},"\n";

my %hashofhash = (                                    # This is actually a hash of references to hash (note the {...})
    "address"  => {name      => "US",
                   city      => "Washington" },
    "identity" => {firstname => "smith",
                   lastname  => "Smith"      } );
print $hashofhash{"address"}{"name"},"\n";
print $hashofhash{"address"}->{"name"},"\n";

keys(%users) = 1000;		                      # allocate 1024 buckets
print scalar(keys %users),"\n";                       # length of the hash

Note that in LIST context, a hash is transformed into an array containing both the keys and values in the hash!

my %myhash = ( key1 => "value1", key2 => "value2" );
my @myarray= ( "element1", "element2" );
push (@myarray, %myhash);

$, = ",";
print @myarray;                                     # outputs "element1, element2, key2, value2, key1, value1"

References

Reference: http://perldoc.perl.org/perlreftut.html

Make Rule 1

Put a \ in front of a variable to get a reference

$aref = \@array;         # $aref now holds a reference to @array
$href = \%hash;          # $href now holds a reference to %hash
$sref = \$scalar;        # $sref now holds a reference to $scalar

Now, we can use any reference as a regular scalar value:

$xy = $aref;             # $xy now holds a reference to @array
$p[3] = $href;           # $p[3] now holds a reference to %hash
$z = $p[3];              # $z now holds a reference to %hash
Make Rule 2

[ ITEMS ] makes a new, anonymous array, and return a reference to that array. { ITEMS } makes a new, anonymous hash, and returns a reference to that hash.

$aref = [ 1, "foo", undef, 13 ];         # $aref now holds a reference to an array
$href = { APR => 4, AUG => 8 };          # $href now holds a reference to a hash

$aref = [ 1, 2, 3 ];                     # This is the same as ...

@array = (1, 2, 3);                      # ... this
Use Rule 1

You can use an array reference, in curly brace in place of the name of an array:

$aref = \@array;
@{$aref}                # same as @a           An array
reverse @{$aref}        # same as reverse @a   Reverse the array
${$aref}[3]             # same as $a[3]        An element of the array
${$aref}[3] = 17        # same as $a[3] = 17   Assigning an element

The same applies to hashes:

$href = \@array;
%{$href}                # same as %h              A hash
keys %{$href}           # same as keys @h         Get the keys from the hash
${$href}{'red'}         # same as $h{'red'}       An element of the hash
${$href}{'red'} = 17    # same as $h{'red'} = 17  Assigning an element
Use Rule 2

Second rule is an abbreviation of first rule above when all we want is to extract a single element. It uses the arrow -> notation:

$aref->[3]         # Same as ${$aref}[3]
$href->{red}       # Same as ${$href}{red}

# Don't confuse the following
$aref->[3]         # Fourth element of an array referred to by $aref
$aref[3]           # Fourth element of an array deceptively named @aref

# nor the following
$href->{red}       # Part of the hash referred to by $href
$href{ref}         # Part of the deceptively named %href hash
Arrow rule

In between two subscripts, the arrow is optional

$a[1][2]           # Same as $a[1]->[2], which is same as ${$a[1]}[2]
$x[1][2][3]        # Same as $x[1]->[2]->[3], which is same as ${${$x[1]}[2]}[3]
More details

Reference: http://perldoc.perl.org/perlref.html

  • In Use rule 1, you can omit the curly brackets whenever the thing inside them is an atomic scalar variable like $aref
@$aref         # same as @{$aref}
$$aref[1]      # same as ${$aref}[1]


  • Use anonmymous array constructor to make deep copies of array references:
$aref2 = $aref1      # This does NOT copy the array
$aref2 = [@$aref1]   # This DOES copy the array

The same applies for copying an anonymous hash:

$href2 = [%$href1]   # This DOES copy the hash
  • Use function ref to test if a variable contains a reference. It returns HASH for hash reference, and ARRAY for array reference, which both evaluate to true.
  • If when using a reference as a string, you get strings like ARRAY(0x80f5dec) or HASH(0x826fc0), it means you printed a reference by mistake.
A side effect is that you can use eq to see if two references refer to the same thing. But using == is much faster.
  • You can use a string as if it were a reference. If you use the string "foo" as an array reference, it's taken to be a reference to the array @foo This is called a soft reference or symbolic reference. The declaration use strict 'refs' disables this feature, which can cause all sorts of trouble if you use it by accident.
Some examples
my $VOBAttrRef = $VOBs{'AdminMask'};                # This return a reference to a Hash
my %VOBAttr = %$VOBAttrRef;                         # This dereference the reference above and return a Hash

print $VOBAttr{'ProjectName'},"\n";                 # We can use our new Hash variable
print $$VOBAttrRef{'ProjectName'},"\n";             #  ... or we can dereference our reference variable using the $$ construct
print $VOBAttrRef->{'ProjectName'},"\n";            #  ... but -> can also be used to dereference
print $VOBs{'AdminMask'}->{'ProjectName'},"\n";     # We can also skip altoghether the reference variable
print $VOBs{'AdminMask'}{'ProjectName'},"\n";       #  ... This notation is also available as a shortcut, -> can be omitted

Passing reference to sub-routines:

$tab{'somekey'} = '...';
process(\$tab);
sub process ()
{
	my $tab = $_[0];
	$tab->{'somekey'} = '...';
}

Using Anonymous Hash References:

#!/usr/bin/perl
use strict;
my @myarray;

foreach my $iter ( 1..10 )
{
	my $value1 = "value1_".$iter;
	my $value2 = "value2_".$iter;

   	print "Creating our \$hashref... ";
   	my $hashref = { index1 => $value1, index2 => $value2 }; # { key1 => value1, ... } creates a REFERENCE to an anonymous hash.
                                                            # Since reference are SCALAR, we assign it to a scalar variable
	print "Done.\n",
	      "  \$hashref: ",$hashref,"\n";
	print "  content: ",$$hashref{'index1'},",",$$hashref{'index2'},"\n";

	print "Adding \$hashref to our array... ";
	push( @myarray, $hashref );

	print "Done. There are currently ", scalar(@myarray), " elements in \@myarray.\n";
	print "Accessing last element of our array...";
	print " content: @myarray[$#myarray], ${@myarray[$#myarray]}{'index1'} or better yet @myarray[$#myarray]->{'index2'}\n";
}

print "\n\nNow we will traverse our array again...\n";
foreach ( @myarray )
{
	print 	"$_ containing ",
	 		"index1 => $$_{'index1'},",
			"index2 => $$_{'index2'}\n";
	print 	"... or using -> operator: ",
	 		"index1 => $_->{'index1'},",
			"index2 => $_->{'index2'}\n";
}

String

# Concat 2 strings
$stringC = $stringA . ucfirst($stringB);
$stringC = "$stringA$stringB";
$stringC = join('', ($stringA, ucfirst($stringB)));

Statement modifiers

The statement modifiers are

if EXPR
unless EXPR
while EXPR
until EXPR
when EXPR
for LIST
foreach LIST

Examples:

print "Basset hounds got long ears" if length $ear >= 10;
go_outside() and play() unless $is_raining;
print "Hello $_!\n" foreach qw(world Dolly nurse);

Compound Statements

The following compound statements can be used to control flow:

if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
unless (EXPR) BLOCK
unless (EXPR) BLOCK else BLOCK
unless (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
LABEL while (EXPR) BLOCK
LABEL while (EXPR) BLOCK continue BLOCK
LABEL until (EXPR) BLOCK
LABEL until (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL foreach VAR (LIST) BLOCK
LABEL foreach VAR (LIST) BLOCK continue BLOCK
LABEL BLOCK continue BLOCK

Examples:

foreach ( @myarray ) { print "There is $_ in my array\n"; }
foreach my $iter ( 1..10 ) { print "value1_".$iter."\n"; }
foreach my $keyname (keys %myHash) { print "key is $keyname\n"; }

Operators

Boolean

# String - see perldoc perlop - 
# See https://stackoverflow.com/questions/1175390/how-do-i-compare-two-strings-in-perl
eq     True if two strings identical
eq     True if two strings different
cmp    Return -1, 0, +1 when comparing two strings
~~     Smart match
lt, le, ge, gt    Compare using locale


Quote and quote-like operators

See perldoc for detailed information.

Customary Generic Meaning Interpolates
'' q{} Literal no
"" qq{} Literal yes
`` qx{} Command yes(*)
qw{} Word list no
// m{} Pattern match yes(*)
qr{} Pattern yes(*)
s{}{} Substitution yes(*)
tr{}{} Transliteration no (but see below)
<<EOF here-doc yes(*)
(*) unless the delimiter is ''.

Interpolates means that variables like $VAR are expanded, and that escaped sequence like \n are processed.
Also other delimiters can be used. For instance:

#Use any brackets
print q{Hello World};
print q(Hello World);
print q[Hello World];
print q<Hello World>;
#Brackets delimiters nest correctly, like
print q{Hello {my} World};    # Equivalent to 'Hello {my} World!
#We can use any non-whitespace character
print q!Hello World!;
print q|Hello World|;
print q#Hello World#;

Beware of some caveats:

$s = q{ if($a eq "}") ... }; # WRONG - } inside "}" is not nested, so quoting will stop there
$s = q #Hello World#         # WRONG - Because of the whitespace, #Hello World# is taken as a comment

Regular expressions

Use /regex/ or m!regex! (where ! can be any quoting character).
Use =~ to match a given variable, otherwise $_ is used. Use !~ to reverse the meaning of the match (i.e. must not match).

Finding matches

In SCALAR context, /regex/ returns true/false if matching is found

$myvar =~ /World/               #scalar context, returns true if $myvar contains World
/World/                         #scalar context, same as above except that now it is $_ that is matched
"Hello World" =~ /World/        #scalar contect, same as above, to show that left member doesn't need to be an L-Value

Extracting matches

The grouping metacharacters () also allow the extraction of the parts of a string that matched. For each grouping, the part that matched inside goes into the special variables $1 , $2... They can be used just as ordinary variables:

# extract hours, minutes, seconds
$time =~ /(\d\d):(\d\d):(\d\d)/;  # match hh:mm:ss format
$hours = $1;
$minutes = $2;
$seconds = $3;

In LIST context, /regex/ with groupings will return the list of matched values ($1,$2,...) . So we could rewrite the above as:

($hours, $minutes, $second) = ($time =~ /(\d\d):(\d\d):(\d\d)/);

If the groupings in a regex are nested, $1 gets the group with the leftmost opening parenthesis, $2 the next opening parenthesis... For example, here is a complex regex and the matching variables indicated below it:

/(ab(cd|ef)((gi)|j))/;
 1  2      34

Using back-references

Associated with the matching variables $1, $2... are the backreferences \1, \2... Backreferences are matching variables that can be used inside a regex:

    /(\w\w\w)\s\1/; # find sequences like 'the the' in string

Note that $1, $2.... should only be used outside of a regex, and \1, \2... only inside a regex.

Search & Replace

Use s/regex/replacement/modifiers. Use =~ to match a given variable, otherwise $_ is used.

In SCALAR context, s/// returns the number of matches, or false if no match.

$x = "Time to feed the cat!";
$x =~ s/cat/hacker/;   # $x contains "Time to feed the hacker!"

Note that the matching variablle $1, $2 can be used in the replacement string.

Some modifiers:

  • g - Find all matches
  • e - wraps an eval{...} around the replacement string and the evaluated result is substituted for the matched substring. Example:
# reverse all the words in a string
$x = "the cat in the hat";
$x =~ s/(\w+)/reverse $1/ge;    # $x contains "eht tac ni eht tah"

The split operator

split /regex/, string splits string into a list of substrings and returns that list. The regex determines the character sequence that string is split with respect to. For example, to extract a comma-delimited list of numbers, use

$x = "1.618,2.718,   3.142";
@const = split /,\s*/, $x;      # $const[0] = '1.618', $const[1] = '2.718', $const[2] = '3.142'

If the empty regex // is used, the string is split into individual characters. If the regex has groupings, then the list produced contains the matched substrings from the groupings as well:

$x = "/usr/bin";
@parts = split m!(/)!, $x;      # $parts[0] = ''    Since the first character of $x matched the regex, an initial element was prepended.
                                # $parts[1] = '/'   The delimiter is also in the list because of the grouping (/)
                                # $parts[2] = 'usr'
                                # $parts[3] = '/'   Yet a delimiter because of the grouping
                                # $parts[4] = 'bin'

Lookahead / Lookbehind

The lookahead and lookbehind assertions are generalizations of the anchor concept. Lookahead and lookbehind are zero-width assertions that let us specify which characters we want to test for. The lookahead assertion is denoted by (?=regexp) and the lookbehind assertion is denoted by (?<=fixed-regexp). Some examples are

$x = "I catch the housecat 'Tom-cat' with catnip";
$x =~ /cat(?=\s+)/;                                # matches 'cat' in 'housecat'
@catwords = ($x =~ /(?<=\s)cat\w+/g);              # matches, $catwords[0] = 'catch' $catwords[1] = 'catnip'
$x =~ /\bcat\b/;                                   # matches 'cat' in 'Tom-cat'
$x =~ /(?<=\s)cat(?=\s)/;                          # doesn't match; no isolated 'cat' in middle of $x

Grep / Map

Use grep on a list to return the element of that list for which the expression is true. For instance

@foo = grep(!/^#/, @bar);                           # Only returns line that are not comments
my @array = ("el1","gel2","el3","gel1","gel2");
my @array2 = grep {s/(.*el)/reverse $1/e} @array;   # grep may also modify the elements in the returned list

Use map on a list to apply a given expression on all elements in the list.

@chars = map(chr, @nums);						# Returns the list of character corresponding to the list of of numbers

File and I/O

Basics

See perlopentut for more details.

Use open to open a file. Special variable $! contains the status of last operation:

open (my $in, "<", "input.txt") or die "can't open input.txt: $!";
open (my $out,">", "output.txt") or die "can't open output.txt: $!";
open (my $log,">>", "my.log") or die "can't open my.log: $!";

Read from the file using operator <>:

my $line  = <$in>;                           # Read one line
my @lines = <$in>;                           # Read all lines
while (<$in>) {                              # assigns each line in turn to $_
    print "Just read in this line: $_";
}

TRAP!!!while(<$in>)for(<$in>)! See below:

while(<$in>) {                #(scalar context) Read line-by-line, stop when <$in> returns undef
    print "_=$_";
    print "<in>=".<$in>
}
for(<$in>) {                  #(list) read whole file, then foreach on the resulting list
    print "_=$_";
    print "<in>=".<$in>       # ... <$in> is always undefined (because EOF reached already!)
}

print accepts an optional first argument specifying which filehandle to print to:

print STDERR "This is your final warning.\n";
print $out $record;
print $log $logmessage;

Finally close the file:

close $in or die "$in: $!";

There are 2 advantages to using an indirect filehandles such as my $in:

  • it eases namespace management (filehandle like INFO are global to the package).
  • an indirect filehandle automatically closes when it goes out of scope or when you undefine it.

The old way has some caveats:

open INFO,  "< info.txt" or die "can't open info.txt: $!";       # Leading/trailing whitespace are stripped!
open (INFO, "<", "info.txt") || die "can't open info.txt: $!";   # Using regular FILEHANDLE - name clashes in current package !!!


To 'slurp' a binary file ([1]):

#! /usr/local/bin/perl

my $content;
open(my $fh, '<', 'test.raw') or die "cannot open file $filename";
{
    local $/;
    $content = <$fh>;
}
close($fh);

Another solution:

use File::Slurp;
my $text = read_file($filename);

Common functions / modules

Some commonly used functions / modules:

Name Description
chdir (function) Change the current working directory
-X (function) Various test on files, directories... pretty much like in Bash scripts.
CWD (module) get pathname of current working directory (provides getcwd and abs_path).
File::Basename (module) Parse file paths into directory, filename and suffix

System / STDIN / STDOUT / STDERR

Some examples related to handling of system calls, STDIN, STDOUT and STDERR.

Read something from standard input
$line = <STDIN>;
$line = readline(*STDIN);       # same thing
chomp($line = <STDIN>);         # remove trailing newline
Read one character from STDIN
print "Press RETURN...";
$key = getc();
System calls
system "echo hello world!";
system qq(echo hello world!);
system $MYCMD, qw(param1), 'the name is'.getname($index);
Discard STDERR on Windows / Linux. Note that on Windows, we use \nul because each folder as a nul handler and we want to reduce the number of used handle
my $STDERRNULL = "2>\\nul";             #use this on windows
my $STDERRNULL = "2>/dev/null";         #use this on unix

my @Results =	qx(ls somedirectory $STDERRNULL);
Capture STDOUT
my @ouput = `ls`;
my @ouput = qx(ls);
system("ls >output.txt");
Capture command exit status
my $exit_status = system("del file.txt");
Temporarily disable STDERR and restore it afterwards
open(SAVE_STDERR, '>&STDERR');
close(STDERR) unless $ENV{CLEARCASE_TRACE_TRIGGERS};
$exe = qx(file $ptmp) =~ /executable|bourne|commands text|\bscript/i;
open(STDERR, '>&SAVE_STDERR');
close(SAVE_STDERR);

Processes, Pipes, IPC

References

See

See also the page on Unix's pipes.

exec

source: perldoc

# exec - executes a system command and never returns (except if failed)
exec ('foo') or die ("Can't execute foo: $!");
exec "sort $outfile | uniq"                                  # parsed by system's command shell - support pipes and redirect
exec '/bin/echo', 'Your arguments are: ', @ARGV;             # not parsed

system

source: perldoc

# system - same as exec, but forks first
system ('foo') or die ("Can't execute foo: $?");
system "sort $outfile | uniq"                                # parsed by system's command shell - support pipes and redirect
system '/bin/echo', 'Your arguments are: ', @ARGV;           # not parsed

open

source: perldoc

open(my $fh, "input.txt") or die $!;                         # for input
open(my $fh, '<', "input.txt") or die $!;                    # for input
open(my $fh, '>', "output.txt") or die $!;                   # for output
open(my $fh, '>>', "output.txt") or die $!;                  # for appending
open(my $fh, '+<', "output.txt") or die $!;                  # for input / output (open existing file)
open(my $fh, '+>', "output.txt") or die $!;                  # for input / output (create/rewrite new file)

open(my $fh, '+>', undef) or die $!;                         # anonymous temporary file

open(my $fh, '| sort') or die $!;                            # Equiv to fork, exec, then a waitpid when closing $fh (for the parent)
open(my $fh, 'ps aux |') or die $!;

my $childpid = open(my $fh, '|-', 'sort') or die "Can't fork: $!";      # implicit fork - child reads from STDIN
my $childpid = open(my $fh, '-|', 'ps aux') or die "Can't fork: $!";    # implicit fork - child writes to STDOUT

open my $oldstdin, "<&STDIN"                                 # Duplicate STDIN handle
open STDINBAK, "<&", \*STDIN                                 # Same
open STDIN, "<&READER"                                       # Redirect STDIN from an existing handle READER

open ($fh,'>', \$variable) or die "$!";                      #Since 5.8.0, PerlIO is enabled by default.
close STDOUT;
open STDOUT, '>', \$variable or die "Can't open STDOUT: $!"; #Need to close STDOUT first...

IO::Pipe

source: perldoc

use IO::Pipe;
$pipe = new IO::Pipe;
if($pid = fork()) {			# Parent
    $pipe->reader();
    while(<$pipe>) {
        ...
    }
}
elsif(defined $pid) {		        # Child
    $pipe->writer();
    print $pipe ...
}
use IO::Pipe;
$pipe = new IO::Pipe;
$pipe->reader(qw(ls -l));
while(<$pipe>) {
    ...
}

Pipes

source: O'Reilly - Programming Perl

open SPOOLER, "| cat -v | lpr -h 2>/dev/null" or die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER or die "bad spool: $! $?";

So to page your program's output, you'd use:

if (-t STDOUT) {                                                   # only if stdout is a terminal
    my $pager = $ENV{PAGER} || 'more';
    open(STDOUT, "| $pager")    or die "can't fork a pager: $!";
}
END {
    close(STDOUT)               or die "can't close STDOUT: $!"
}

open STATUS, "netstat -an 2>/dev/null |" or die "can't fork: $!";
while (<STATUS>) {
    next if /^(tcp|udp)/;
    print;
}
close STATUS or die "bad netstat: $! $?";

Using pipes to talk to self. From parent to child:

if (open(TO, "|-")) {
    print TO $fromparent;
}
else {
    $tochild = <STDIN>;
    exit;					#Don't forget this!
}

From child to parent:

if (open(FROM, "-|")) {
    $toparent = <FROM>;
}
else {
    print STDOUT $fromchild;
    exit;                                       #Don't forget this!
}

Perlipc

source: perldoc

Using open for IPC:

open(SPOOLER, "| cat -v | lpr -h 2>/dev/null")
    || die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER || die "bad spool: $! $?";
open(STATUS, "netstat -an 2>&1 |")
    || die "can't fork: $!";
while (<STATUS>) {
    next if /^(tcp|udp)/;
    print;
}
close STATUS || die "bad netstat: $! $?";

Safe Pipe Opens

source: perldoc

Doing a safe pipe for reading from child:

# add error processing as above
$pid = open(KID_TO_READ, "-|");
if ($pid) {   # parent
    while (<KID_TO_READ>) {
        # do something interesting
    }
    close(KID_TO_READ) || warn "kid exited $?";
} else {      # child
    ($EUID, $EGID) = ($UID, $GID); # suid only
    exec($program, @options, @args)
        || die "can't exec program: $!";
    # NOTREACHED
}

Doing a safe pipe for writing to child:

# add error processing as above
$pid = open(KID_TO_WRITE, "|-");
$SIG{PIPE} = sub { die "whoops, $program pipe broke" };
if ($pid) {  # parent
    for (@data) {
        print KID_TO_WRITE;
    }
    close(KID_TO_WRITE) || warn "kid exited $?";
} else {     # child
    ($EUID, $EGID) = ($UID, $GID);
    exec($program, @options, @args)
        || die "can't exec program: $!";
    # NOTREACHED
}

Avoiding Pipe Deadlocks

Source: perlipc (perldoc)

A child process that reads from STDIN only exits when it gets an EOF. This EOF is sent only when any processes on the other side closes the handle! If there are multiple processes connected to the same pipe, having one process closes it is not enough; the last process with the pipe open must close it for the child to read EOF.

In the example below, the grand-parent waits until child and grand-child terminate their communication. This is because WRITER was opened using open WRITER, "|-", which has a special behaviour: closing it will call waitpid(), which waits for the sub-process to exit. If the child happens to wait for the parent to do something before exiting, we have a deadlock.

$pid = open WRITER, "|-";
defined $pid or die "fork failed; $!";
if ($pid) {
    if (my $sub_pid = fork()) {
        close WRITER;					#!!! will call waitpid(), so will wait until child exits...
        # do something else...
    }
    else {
        # write to WRITER...
        close WRITER;
        exit;
    }
}
else {
    # do something with STDIN...
    exit;
}

A 1st solution is to build the pipe manually with pipe(), fork(), and the form of open() which sets one file descriptor to another, as below:

pipe(READER, WRITER);
$pid = fork();
defined $pid or die "fork failed; $!";
if ($pid) {
    close READER;
    if (my $sub_pid = fork()) {
        close WRITER;                          #! will not call waitpid() !
        # do something else...           
    }
    else {
        # write to WRITER...
        close WRITER;
        exit;
    }
}
else {
    open STDIN, "<&READER";
    close WRITER;
    # do something...
    exit;
}

Another solution is simply not to fork open in the first parent (i.e. don't create an handle in the process that does not need it):

$pid = fork();
if ($pid)
{
    # do something else...
}
else
{
    $sub_pid = open WRITER,"|-";
    defined $sub_pid or die "fork failed; $!";
    if ($sub_pid)
    {
        # write to WRITER...
        close WRITER;                               # wait for child to exits...
    }
    else
    {
        # do something with STDIN...
        exit;
    }
    exit;
}

Avoiding Zombie Processes

Source: Perl Cookbook

  • If you don't need to record the children that have terminated (<defunct>), use:
$SIG{CHLD} = 'IGNORE';
  • To keep better track of deceased children, install a SIGCHLD handler to call waitpid :
use POSIX ":sys_wait_h";
$SIG{CHLD} = \&REAPER; 
sub REAPER {
    my $stiff;
    while (($stiff = waitpid(-1, &WNOHANG)) > 0) {
                                            # do something with $stiff if you want
    }
    $SIG{CHLD} = \&REAPER;                  # install *after* calling waitpid 
}

Sub-routines

Declaration and definition syntax:

sub NAME[(PROTO)] [: ATTRS];            #  A "forward" declaration
sub NAME[(PROTO)] [: ATTRS] BLOCK       #  A declaration and definition
$subref = sub (PROTO) : ATTRS BLOCK;    #  An anonymous sub-routine, called with &$subref

Importing a sub-routine:

use MODULE qw(NAME1 NAME2 NAME3);

Calling a sub-routine:

NAME(LIST);	   # & is optional with parentheses.
NAME LIST;	   # Parentheses optional if predeclared/imported.
&NAME(LIST);   # Circumvent prototypes.
&NAME;	       # Makes current @_ visible to called subroutine.

Examples:

sub mySub1
{
	my ($param1, $param2) = @_
	return $param1.$param2;
}

sub mySub2
{
	my $param1 = shift
	my $param2 = shift
	return $param1.$param2;
}

Using default value for sub-routine parameters: sub myfunc { my($suffix) = @_ ? "@_" : "defaultvalue"; }

Modules

See Perl Module Primer.

Functions

See [2] for a detailed list of Perl functions.

Chop / Chomp

chop removes the last character of a string. It also works on lists.

chop( my $userinput=<STDIN> );                  #Chop the trailing "\n" in user input
chop( my @list=qx(ls);                          #Chop the trailing "\n" in the command output

chomp removes the trailing record separator (typically \n) of a string. It also works on lists.

chomp( my $userinput=<STDIN> );	                #Chomp the trailing "\n" in user input IF PRESENT
chomp( my @list=qx(ls);                         #Chomp the trailing "\n" in the command output IF PRESENT

-X

The function -X can be used for various test on the files, directories... similar to the test command in Bash:

print "The file exists\n" if -e "../somefile";
print "The directory exists\n" if -d "../some/directory";

Use _ to save a system call, like in:

stat($filename);
print "Readable\n" if -r _;
print "Writable\n" if -w _;
print "Executable\n" if -x _;
print "Text\n" if -T _;
print "Binary\n" if -B _;

Since 5.9.1, operators can be stacked:

print "writable and executable\n" if -f -w -x $file;       # same as -x $file && -w _ && -f _

Modules

See Core Modules for a detailed list of Perl modules. Here a list of frequently used ones:

Benchmark

See also [3].

use Benchmark;
&Benchmark::timethis(500000,'$match=qq/filename.txt/;$match=~/.*\.(.*)/');
use Benchmark;
use Time::HiRes;

sub mySubroutine {
    my $time0 = new Benchmark;
    ...
    my $time1 = new Benchmark;
    print "mySubroutine benchmark: ".timestr(timediff($t1,$t0));
}

(CWD) getcwd / abs_path

The function getcwd returns the current working directory. abs_path transforms a given relative path into its equivalent canonical absolute form.

use Cwd qw(getcwd abs_path);
my $dir = getcwd();
my $abs_path = abs_path($file);

File::Basename

use File::Basename;
(name,$path,$suffix) = fileparse($fullname,@suffixlist);
$name                = fileparse($fullname,@suffixlist);
$basename            = basename($fullname,@suffixlist);
$dirname             = dirname($fullname);

File::Find

File:Find provides functions similar to the Unix find command for searching through directory trees doing work on each file.

use File::Find;
find(\&wanted, @directories_to_search);                 #depth-first search - preorder traversal - no options
sub wanted { ... }

use File::Find;
find({ wanted => \&process, follow => 1 }, '.');        #With options
sub process { ... }

use File::Find;
finddepth(\&wanted, @directories_to_search);            #depth-first search - post-order traversal - no options
sub wanted { ... }

Example:

find (\&wanted, $directory);
sub wanted {
    /(\.c|\.cpp|\.cxx|\.h|\.hpp|\.hxx)$/ &&
        print "$_\n";
}

HTML::TagParser

An easy module to parse HTML files. See example below. More at [4].

#!/usr/bin/perl -w
use strict;
use HTML::TagParser;

my $html = HTML::TagParser->new("test.html");
my @addr = $html->getElementsByTagName("a");
foreach (@addr) {
  my $attr = $_->attributes;
  print "<a href=\"$attr->{'href'}\">", $_->innerText(), "</a>\n";
}

XML::DOM

Again an example to parse an XML file (from [5]). See also documentation on CPAN:

#!/usr/bin/perl -w
use strict;
use XML::DOM;

my $parser = new XML::DOM::Parser;
my $doc = $parser->parse( \*DATA ) or die "Unable to parse document";
my $root = $doc->getDocumentElement(); # safer than just getting the first 
                                       # child, in case the document has a
                                       # DTD or start with comments 
scanner($root);

sub scanner
{
   my ($rt) = @_;
   my $i=0;
   foreach my $nde ( $rt->getChildNodes())         # yes it is anarray!
   {

       if (   ($nde->getNodeType() == TEXT_NODE ) 
           && ($nde->getData()=~ /\S/s) )          # to avoid extra white spaces
       {
           print $i++," TEXT /", $nde->getData(), "/\n";
       }

       if ($nde->getNodeType == ELEMENT_NODE)
       {
           print $i++, " ELEMENT ", $nde->getNodeName(), "\n";
       }
       scanner( $nde );
   }
}

__DATA__
<methodCall>Level1 Text
       <Level2a>Text at Level2a</Level2a>
       <Level2b>Text at Level2b</Level2b>
   </methodCall>

Tips and How-Tos

One-Liners

See [6], [7], [8], [9].

perl -ne 'print unless /^$/../^$/' input                              # print lines, unless blank
perl -ne 'print if ! /^$/../^$/' input                                # reduce runs of blank lines to a single blank line
perl -lne 'print $.; close ARGV if eof' input input                   # $. need to be reset (by closing ARGV) between 2 input files
perl -lne 'print for m/\b(\S+)\b/g' paragraphs                        # print words from file paragraphs
perl -lne 'while(m/(\S+)\s+the\s+(\S+)/g){print "$1 $2"}' paragraphs  # ... while loop needed when using multiple back-references
perl -lne 'print for /id is <(\d+)>/g'                                # match pattern and extract backreference
perl -lne 'print $2 for /id is <(\d+)> or <(\d+)>/g'                  # ... print 2nd matched backreference
perl -lne 'print "$2 $1" if /id is <(\d+)> or <(\d+)>/g'              # Rearrange backreferences (!if)
cat oldfile | perl -pe 's/(\d+)_/sprintf("%2.2d_",$1)/e' > newfile    # evaluate regex substitutions
perl -pe 'tr /A-Za-z/\n/cs' file.txt                                  # Tokenize a text file - c=complement, s=squash duplicates

-e commandline allows to run one line of program. -l enables automatic line-ending processing. -n wrap a while(<>) { ... } around program (but does not print lines like with -p).

The following one-liner simply emulates sed, which can be useful when using PCRE regex (like when using look-ahead or look-behind assertions):

perl -ne 's/before/after/g; print;' input

More complex examples:

#Remove duplicate words from a white-space separated list
WORDS="foo bar baz bar bar foo baz"
perl -e 'print join(" ", keys %{{ map { $_ => 1 } split (/ +/,"'"$WORDS"'") }} );'

Miscellaneous

Report a complete error message when loading a script.

This hack allows for printing a custom error message + file not found-like error message (given by $!) + syntax error messages (@_)

do "your script.pl"
	or (print "Your error message\n$!\n" and die @_);
uppercase / lowercase
my $lowercase = lc "My StRiNg";                 #mystring
my $uppercase = uc "My StRiNg";	                #MYSTRING
my $firstcharlowercase = lcfirst "My StRiNg";   #my StRiNg
my $firstcharuppercase = ucfirst "My StRiNg";   #My StRiNg
Detect operating system (see man perlvar)
print "On cygwin\n" if $^O =~ "cygwin";
print "On linux\n" if $^O =~ "linux";
print "On windows\n" if $^O =~ "MSWin32";

Split a multiline variable/output

Method 1 - using an array variable.

my @myarray=qx(ls);
foreach (@myarray)
{
	chomp;
	print "The file is '$_'.\n";
}

Method 2 - using a scalar variable.

my $myscalar=qx(ls);
foreach (split /\n/,$myscalar)
{
	print "The file is really '$_'\n";           # No need for chomping
}

Parsing Command Line Parameters

Command line parameters are parsed through variable ARGV.

print scalar @ARGV;                   #number of parameters
print $#ARGV;                         #... idem
print "1st param: $ARGV[0]";          #positional parameters
print "2nd param: $ARGV[1]";
print "Executable name: $0";          #Name of current executable
usage() unless defined($ARGV[0])      # defined($ARGV[0]) is true if there is a parameter

Simple version

#!/usr/bin/perl

use strict;
use warnings;

my $verbose=0;
my $projectdir;

# Parse command options (-option).
while ($#ARGV>=0 && $ARGV[0] =~ m/^\-/ ) {
    $verbose=1 if $ARGV[0] =~ m/^\-v/i;
    shift @ARGV;
}

# Parse mandatory parameter
usage() unless defined($ARGV[0]);
$projectdir=$ARGV[0];

# Show parsing result
print "verbose=$verbose\n";
print "projectdir=$projectdir\n";

exit 0;

sub usage {
    print "Usage: $0 [options] directory\n";
    print "\n";
    print "  Options:\n";
    print "    -v   verbose mode\n";
    exit;
}

Using GetOpt

use strict;
use Getopt::Long qw(:config no_ignore_case);

my $debug=0;
my $quiet=0;
my username;

# Parse options
GetOptions ("d|debug+"  => \$debug,
            "q|quiet"   => \$quiet,
            "u|user=s"  => \$username ) || usage();

# Parse remaining parameters
my $url = $ARGV[0];

Internet

#!/usr/bin/perl

use strict;
use LWP::UserAgent;
use LWP::Debug;
use HTTP::Cookies;
use HTML::TableExtract;

my $debug = 0;          # Set to 1 for debug information
my $proxy = 0;
my $username;
my $password;

my $url = $ARGV[0];

LWP::Debug::level('+') if $debug;
my $ua = LWP::UserAgent->new;

# Session cookie
my $jar = HTTP::Cookies->new ();
$ua->cookie_jar($jar);

# Enable proxy...
$ua->env_proxy if $proxy;

# Fetch the articles + url
my %articles = get_articles ( $ua, $url );

# Get starting URL....
my $res = $ua->get($url);
unless ($res->is_success) {
    die "Die: " . $res->status_line, "\n";
}
exit 0;

print $res->content if $debug;

my $res = $ua->post( $url,
                     [
                      'j_username'  =>  $username,
                      'j_password'  =>  $password,
                      'Submit'      =>  'Entrer'
                     ]
    );

unless ($res->is_success) {
    die "Die: ".$res->status_line, "\n";
}

sub get_articles {
    my $ua = shift;
    my $url = shift;

    my $res = $ua->get ($url);
    unless ($res->is_success) {
        die "http-get failed: ".$res->status_line, "\n$url\n";
    }

    my $te = HTML::TableExtract->new ( slice_columns => 0,
                                       keep_html => 1,
                                       keep_headers => 1,
                                       subtables => 1,
                                       headers => [qw(Matter)] );

    $te->parse($res->content);
    open(my $fh, ">mpe.html") || die "Cannot create file: $!\n";
    print $fh $res->content;
    close($fh);

    foreach my $ts ( $te->tables ) {
#       print "Table (", join(',', $ts->coords), "):\n";
        foreach my $row ( $ts->rows ) {
            print "Row: " . join (';', @$row ). "\n";
            next unless $row->[0] =~ m/\/content\/(.*)\/fulltext/;
            print $1."\n";
        }
    }
    return 0;
}

Passing filehandle as sub parameters and return values

This requires the use of a reference. First as return value:

sub openTimeOut($)
{
	my $filename = shift;
	my $timeout=15;
	while( !open(LOG,$filename) ) { sleep 1; --$timeout or die "Time out trying to open file $filename"; }
	return \*LOG;
}

sub printToFile($@)
{
	my $filename = shift;
	my $fh = openTimeOut(">$filename");
	print $fh @_;
	close($fh);
}

BUT BEWARE, actually OpenTimeOut returns a reference to the same file handle in current glob! The code below illustrate this:

my ($to,$from) = @_;
$fhto = openTimeOut(\*TO,">>$to");
$fhfrom = openTimeOut(\*FROM,"<$from");    # This returns same FILEHANDLE reference as $fhto
while (<$fhfrom>) {print $fhto $_}         # Failed, because now $fhto = $fhfrom, which only open for output
close($fhfrom);
close($fhto);

The solution, pass by parameters:

sub openTimeOut2(*;$)
{
	my $fh = shift;
	my $filename = shift;
	my $timeout=15;
	while( !open($fh,$filename) ) { sleep 1; --$timeout or die "Time out trying to open file $filename"; }
}

sub printToFile($@)
{
	my $filename = shift;
	openTimeOut2(\*LOG,">$filename");
	print LOG @_;
	close(LOG);
}

Embedding a perl script in a W2K shell script

Notice how the first rem is actually a multiline assignment to perl array variable @rem, where the value is quoted with ' '.

@rem= 'PERL for Windows NT - ccperl must be in search path
@echo off
ccperl %0 %1 %2 %3 %4 %5 %6 %7 %8 %9
goto endofperl
@rem ';

# Your Perl code comes here

# End of Perl section

__END__
:endofperl

Kill on ALARM signal

Here a small Perl script stopafter that kills a process after some time using alarm (similar to timeout kill):

#!/usr/bin/perl
my ($limit, $command, @args) = @ARGV;
$SIG{ALARM} = 'DEFAULT';
alarm $limit;
exec $command, @args;                  # Exec preserves process signal table, but fork not!
die "Couldn't run '$command': $!";

Use it as follows:

stopafter 30 command arg arg arg...

Command will be killed after 30 sec, unless the command cancelled the alarm clock or caught/ignored signal ALRM

Print each array elements surrounded in quotes

From [10]

#!/usr/bin/perl

use strict;
use warnings;

my @a    = qw/ A B C /;
my @b;
my $text = join ' ', map { qq/"$_"/ } @a;
print "text for (@a) is [$text]\n";

$text = join ' ', map { qq/"$_"/ } @b;
print "text for (@b) is [$text]\n";

Or use map { "'$_'" } @a to surround in single quotes.

Remove trailing slash

str =~ s|/\z||;                # Use \z instead of $ to get absoluete end of a string (deal with \n)

Force buffered output

Buffered output is handy when for instance piping to a program (for instance tee).

Perl enabled buffered output when connected to a terminal. Also <STDIN> triggers flushing of STDOUT. In other cases, we can trigger STDOUT flush with either [11]:

select()->flush();                  # ... or ...
$|=1;                               # ... or ...
$|++;                               # ... or ...
BEGIN{ $|++; }

Pitfalls

Forgetting to chomp the trailing "\n"
my $path = qx(pwd);           #NOK! trailing \n will corrupt path construction
chomp( my $path = qx(pwd) );  #OK!
Mixing case in name of package
# Imagine a module file named Vobs.pm
use Vobs;                 # OK
use VOBs;                 # NOK &rar; Will complain about double definition
                          #           (but will not flag the mix case problem)
Operator precedence and strange behaviour
chomp my @emptylist = qx("dir");  #NOK ! @emptylist will be empty
chomp ( my @list = qx("dir") );	  #OK !
Forgetting to use ${...} to separate variable identifier
my $variable;
print "$variable_temp\n";         # NOK! Print a variable named variable_temp
print "${variable}_temp\n";       # OK! Print a $variable, followed by "_temp"
STDERR redirection cannot be given as a command parameter to system
system "echo hello world! 2>\\nul";       # OK
system qq(echo hello world! 2>\\nul);     # OK
system "echo", "hello world!"," 2>\\nul"; # NOK - 2>\\nul taken as a parameter
Forgetting local in sub-routines (see [12]). In particular pay attention that $_ is assigned e.g. in while loops
sub localized
{
    local @ARGV = ("/etc/motd");     # OK
    local $/ = undef;                # OK
    local $_ = <>;                   # OK
    @Fields = split /^\s*=+\s*$/;
}

CPAN - Perl Packages

Things to consider
  • Consider disabling tests during CPAN install (turn on -notest)
  • Better, use CPANMINUS (see README; see this for an example)


First time launch:

$ cpan                                       # ... OR ...
$ perl -MCPAN -e shell                       #  --> yes auto config

To adapt config related to proxy:

cpan> o conf init /proxy/                  # (to enter an empty string, simply enter 1 space char as a value)
cpan> o conf commit

To install a Perl package (eg. here package GetOpt::Long):

$ cpan
cpan>install GetOpt::Long
cpan>force install GetOpt::Long              # Install even if self-tests failed

or

cpan install GetOpt::Long

Editing the configruation:

cpan> o conf init                            # Reset the configuration
cpan> o conf http_proxy http://proxy:8080/   # Edit a given variable (eg. here changing proxy settings):
cpan> o conf commit                          # To commit changes in the configuration
cpan> o                                      # to get o options
cpan> o conf                                 # To get o conf option

To edit CPAN database url:

cpan> o conf /urllist/
cpan> o conf init /urllist/
cpan> o conf urllist shift
cpan> o conf urllist unshift ftp://my.new.site/
cpan> o conf commit

To update CPAN itself:

cpan> install Bundle::CPAN
cpan> reload cpan