Perl: Difference between revisions
(10 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
== Reference == |
== Reference == |
||
* '''Perldoc''' on local computer |
* '''Perldoc''' on local computer |
||
<source lang="perl"> |
|||
% perldoc -q duplicate |
% perldoc -q duplicate |
||
"How can I remove duplicate elements from a list or array?" |
"How can I remove duplicate elements from a list or array?" |
||
Line 11: | Line 11: | ||
% perldoc -f join |
% perldoc -f join |
||
join EXPR,LIST |
join EXPR,LIST |
||
</source> |
</source> |
||
* '''Links''' |
* '''Links''' |
||
** http://www.perl.org/learn.html |
** http://www.perl.org/learn.html |
||
Line 90: | Line 90: | ||
print scalar( @array1 ); # The length of the array array1 |
print scalar( @array1 ); # The length of the array array1 |
||
print $#array1 + 1; # ... same effect (since perl 5, $[ deprecated) |
print $#array1 + 1; # ... same effect (since perl 5, $[ deprecated) |
||
my $arrayref = ["foo","bar"] # [...] anonymous array constructor, returns a reference |
|||
print $arrayref # ARRAY(0x1e39cb8) - print the reference |
|||
print @$arrayref # foobar - dereference the reference |
|||
</source> |
</source> |
||
Line 155: | Line 159: | ||
=== References === |
=== References === |
||
Reference: http://perldoc.perl.org/perlreftut.html |
|||
;Make Rule 1 |
|||
<source lang="perl"> |
|||
Put a <code>\</code> in front of a variable to get a reference |
|||
<source lang=perl> |
|||
$aref = \@array; # $aref now holds a reference to @array |
|||
$href = \%hash; # $href now holds a reference to %hash |
|||
$sref = \$scalar; # $sref now holds a reference to $scalar |
|||
</source> |
|||
Now, we can use any reference as a regular scalar value: |
|||
<source lang=perl> |
|||
$xy = $aref; # $xy now holds a reference to @array |
|||
$p[3] = $href; # $p[3] now holds a reference to %hash |
|||
$z = $p[3]; # $z now holds a reference to %hash |
|||
</source> |
|||
;Make Rule 2 |
|||
<code>[ ITEMS ]</code> makes a new, anonymous array, and return a reference to that array. <code>{ ITEMS }</code> makes a new, anonymous hash, and returns a reference to that hash. |
|||
<source lang=perl> |
|||
$aref = [ 1, "foo", undef, 13 ]; # $aref now holds a reference to an array |
|||
$href = { APR => 4, AUG => 8 }; # $href now holds a reference to a hash |
|||
$aref = [ 1, 2, 3 ]; # This is the same as ... |
|||
@array = (1, 2, 3); # ... this |
|||
</source> |
|||
;Use Rule 1 |
|||
You can use an array reference, ''in curly brace'' in place of the name of an array: |
|||
<source lang=perl> |
|||
$aref = \@array; |
|||
@{$aref} # same as @a An array |
|||
reverse @{$aref} # same as reverse @a Reverse the array |
|||
${$aref}[3] # same as $a[3] An element of the array |
|||
${$aref}[3] = 17 # same as $a[3] = 17 Assigning an element |
|||
</source> |
|||
The same applies to hashes: |
|||
<source lang=perl> |
|||
$href = \@array; |
|||
%{$href} # same as %h A hash |
|||
keys %{$href} # same as keys @h Get the keys from the hash |
|||
${$href}{'red'} # same as $h{'red'} An element of the hash |
|||
${$href}{'red'} = 17 # same as $h{'red'} = 17 Assigning an element |
|||
</source> |
|||
;Use Rule 2 |
|||
Second rule is an abbreviation of first rule above when all we want is to extract a single element. It uses the arrow <code>-></code> notation: |
|||
<source lang=perl> |
|||
$aref->[3] # Same as ${$aref}[3] |
|||
$href->{red} # Same as ${$href}{red} |
|||
# Don't confuse the following |
|||
$aref->[3] # Fourth element of an array referred to by $aref |
|||
$aref[3] # Fourth element of an array deceptively named @aref |
|||
# nor the following |
|||
$href->{red} # Part of the hash referred to by $href |
|||
$href{ref} # Part of the deceptively named %href hash |
|||
</source> |
|||
;Arrow rule |
|||
In between two ''subscripts'', the arrow is optional |
|||
<source lang=perl> |
|||
$a[1][2] # Same as $a[1]->[2], which is same as ${$a[1]}[2] |
|||
$x[1][2][3] # Same as $x[1]->[2]->[3], which is same as ${${$x[1]}[2]}[3] |
|||
</source> |
|||
;More details |
|||
Reference: http://perldoc.perl.org/perlref.html |
|||
* In ''Use rule 1'', you can omit the curly brackets whenever the thing inside them is an atomic scalar variable like <code>$aref</code> |
|||
<source lang=perl> |
|||
@$aref # same as @{$aref} |
|||
$$aref[1] # same as ${$aref}[1] |
|||
</source> |
|||
* Use anonmymous array constructor to make deep copies of array references: |
|||
<source lang=perl> |
|||
$aref2 = $aref1 # This does NOT copy the array |
|||
$aref2 = [@$aref1] # This DOES copy the array |
|||
</source> |
|||
The same applies for copying an anonymous hash: |
|||
<source lang=perl> |
|||
$href2 = [%$href1] # This DOES copy the hash |
|||
</source> |
|||
* Use function <code>ref</code> to test if a variable contains a reference. It returns <code>HASH</code> for hash reference, and <code>ARRAY</code> for array reference, which both evaluate to true. |
|||
* If when using a reference as a string, you get strings like <tt>ARRAY(0x80f5dec)</tt> or <tt>HASH(0x826fc0)</tt>, it means you printed a reference by mistake. |
|||
:A side effect is that you can use <code>eq</code> to see if two references refer to the same thing. But using <code>==</code> is much faster. |
|||
* You can use a string as if it were a reference. If you use the string <code>"foo"</code> as an array reference, it's taken to be a reference to the array <code>@foo</code> This is called a ''soft reference'' or ''symbolic reference''. The declaration <code>use strict 'refs'</code> disables this feature, which can cause all sorts of trouble if you use it by accident. |
|||
;Some examples |
|||
<source lang=perl> |
|||
my $VOBAttrRef = $VOBs{'AdminMask'}; # This return a reference to a Hash |
my $VOBAttrRef = $VOBs{'AdminMask'}; # This return a reference to a Hash |
||
my %VOBAttr = %$VOBAttrRef; # This dereference the reference above and return a Hash |
my %VOBAttr = %$VOBAttrRef; # This dereference the reference above and return a Hash |
||
Line 270: | Line 369: | ||
=== Operators === |
=== Operators === |
||
=== Boolean === |
|||
<source lang="text"> |
|||
# String - see perldoc perlop - |
|||
# See https://stackoverflow.com/questions/1175390/how-do-i-compare-two-strings-in-perl |
|||
eq True if two strings identical |
|||
eq True if two strings different |
|||
cmp Return -1, 0, +1 when comparing two strings |
|||
~~ Smart match |
|||
lt, le, ge, gt Compare using locale |
|||
</source> |
|||
==== Quote and quote-like operators ==== |
==== Quote and quote-like operators ==== |
||
See [http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators perldoc] for detailed information. |
See [http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators perldoc] for detailed information. |
||
Line 1,057: | Line 1,168: | ||
</source> |
</source> |
||
== |
== Tips and How-Tos == |
||
=== One-Liners === |
=== One-Liners === |
||
Line 1,070: | Line 1,181: | ||
perl -lne 'print for /id is <(\d+)>/g' # match pattern and extract backreference |
perl -lne 'print for /id is <(\d+)>/g' # match pattern and extract backreference |
||
perl -lne 'print $2 for /id is <(\d+)> or <(\d+)>/g' # ... print 2nd matched backreference |
perl -lne 'print $2 for /id is <(\d+)> or <(\d+)>/g' # ... print 2nd matched backreference |
||
perl -lne 'print "$2 $1" if /id is <(\d+)> or <(\d+)>/g' # Rearrange backreferences (!if) |
|||
cat oldfile | perl -pe 's/(\d+)_/sprintf("%2.2d_",$1)/e' > newfile # evaluate regex substitutions |
cat oldfile | perl -pe 's/(\d+)_/sprintf("%2.2d_",$1)/e' > newfile # evaluate regex substitutions |
||
perl -pe 'tr /A-Za-z/\n/cs' file.txt # Tokenize a text file - c=complement, s=squash duplicates |
perl -pe 'tr /A-Za-z/\n/cs' file.txt # Tokenize a text file - c=complement, s=squash duplicates |
||
</source> |
</source> |
||
'''<tt>-e ''commandline''</tt>''' allows to run one line of program. '''<tt>-l</tt>''' enables automatic line-ending processing. '''<tt>-n</tt>''' wrap a <code>while(<>) { ... }</code> around program (but does not print lines like with '''<tt>-p</tt>'''). |
'''<tt>-e ''commandline''</tt>''' allows to run one line of program. '''<tt>-l</tt>''' enables automatic line-ending processing. '''<tt>-n</tt>''' wrap a <code>while(<>) { ... }</code> around program (but does not print lines like with '''<tt>-p</tt>'''). |
||
The following one-liner simply emulates <code>sed</code>, which can be useful when using PCRE regex (like when using ''look-ahead'' or ''look-behind'' assertions): |
|||
<source lang=bash> |
|||
perl -ne 's/before/after/g; print;' input |
|||
</source> |
|||
More complex examples: |
More complex examples: |
||
Line 1,356: | Line 1,473: | ||
</source> |
</source> |
||
Command will be killed after 30 sec, unless the command cancelled the alarm clock or caught/ignored signal ALRM |
Command will be killed after 30 sec, unless the command cancelled the alarm clock or caught/ignored signal ALRM |
||
=== Print each array elements surrounded in quotes === |
|||
From [http://stackoverflow.com/questions/739214/how-do-i-output-each-perl-array-element-surrounded-in-quotes] |
|||
<source lang=perl> |
|||
#!/usr/bin/perl |
|||
use strict; |
|||
use warnings; |
|||
my @a = qw/ A B C /; |
|||
my @b; |
|||
my $text = join ' ', map { qq/"$_"/ } @a; |
|||
print "text for (@a) is [$text]\n"; |
|||
$text = join ' ', map { qq/"$_"/ } @b; |
|||
print "text for (@b) is [$text]\n"; |
|||
</source> |
|||
Or use <code>map { "'$_'" } @a</code> to surround in single quotes. |
|||
=== Remove trailing slash === |
|||
<source lang="perl"> |
|||
str =~ s|/\z||; # Use \z instead of $ to get absoluete end of a string (deal with \n) |
|||
</source> |
|||
=== Force buffered output === |
|||
''Buffered output'' is handy when for instance piping to a program (for instance <code>tee</code>). |
|||
Perl enabled buffered output when connected to a terminal. Also <code><STDIN></code> triggers flushing of STDOUT. |
|||
In other cases, we can trigger STDOUT flush with either [https://stackoverflow.com/questions/33812618/can-you-force-flush-output-in-perl]: |
|||
<source lang=perl> |
|||
select()->flush(); # ... or ... |
|||
$|=1; # ... or ... |
|||
$|++; # ... or ... |
|||
BEGIN{ $|++; } |
|||
</source> |
|||
== Pitfalls == |
== Pitfalls == |
Latest revision as of 19:25, 23 June 2018
Reference
- Perldoc on local computer
% perldoc -q duplicate
"How can I remove duplicate elements from a list or array?"
% perldoc -f split
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
...
% perldoc -f join
join EXPR,LIST
- Links
- FAQ
- The FAQ is the primary source of answer to questions like How can I do....
- Manpages - List of highly recommended perldoc manpages (from the FAQ).
Basics perldata, perlvar, perlsyn, perlop, perlsub Execution perlrun, perldebug Functions perlfunc Objects perlref, perlmod, perlobj, perltie Data Structures perlref, perllol, perldsc Modules perlmod, perlmodlib, perlsub Regexes perlre, perlfunc, perlop, perllocale Moving to perl5 perltrap, perl Linking w/C perlxstut, perlxs, perlcall, perlguts, perlembed Various http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz (not a man-page but still useful, a collection of various essays on Perl techniques)
- Command-Line - Useful command-line options
- -e expression
- specififies perl expressions.
- -p
- loops over and prints input.
- -n
- loops over and does not print input.
- -l
- strip newlines on input, and adds them on output. Use this option by default, unless the newlines need special handling, or for efficiency reasons.
Quick Introduction
Program Structure
Example of a simple Hello World program:
#!/usr/bin/perl
use strict; # Immediately stops on potential problem - highly recommended for simplified debugging
use warning; # Warnings - highly recommended for simplified debugging
print "Hello, World!\n";
exit 0;
Data Types
$ |
for scalar values (number, string or reference) |
@ |
for arrays |
% |
for hashes (associative arrays) |
& |
for subroutines (aka functions, procedures, methods) |
* |
for all types of that symbol name. In version 4 you used them like pointers, but in modern perls you can just use references. |
<> |
are used for inputting a record from a filehandle. |
\ |
takes a reference to something. |
Note that the last 2 are not really type specifiers.
Arrays
Some example
my @array1 = ("titi","tutu"); # (...) is an array constructor
my @array2 = ("tata","toto");
push(@array1,"tete"); # Append an element to an array
push(@array1,@array2); # Append another array to an array
print $array1[1]; # 2nd element of array array1
print $#array1; # The last index of array array1
@array1 = (); # Truncate array
$#array = -1; # ... same effect
print scalar( @array1 ); # The length of the array array1
print $#array1 + 1; # ... same effect (since perl 5, $[ deprecated)
my $arrayref = ["foo","bar"] # [...] anonymous array constructor, returns a reference
print $arrayref # ARRAY(0x1e39cb8) - print the reference
print @$arrayref # foobar - dereference the reference
Arrays can be easily constructed through autovivification. Below we create a hash of arrays
my %Projects; # Projects is a hash, but we say nothing on the types of its elements...
foreach my $VOBName (keys %VOBs)
{
my $ProjectName = $VOBs{$VOBName}{'ProjectName'};
push(@{$Projects{$ProjectName}}, $VOBName); # <-- we dereference value returned by $Projects{$ProjectName} as
} # an array, hence creating automatically an array if undef
Below some difference of handling @
in SCALAR or LIST context:
# RESULT CONTEXT EXPLANATION
my @a = ("titi","tutu");
my $varnoquote=@a; print "$varnoquote\n"; # "2" (SCALAR - @_ is evaluated in scalar context)
my $varquote="@a"; print "$varquote\n"; # "titi tutu" (EXPAND - @_ is quote-expanded, each item being separated by space)
print @a; print"\n"; # "tititutu" (LIST - $, is empty)
print(@a); print"\n"; # "tititutu" (LIST - $, is empty)
printf @a; print"\n"; # "titi" (LIST - 1st element in list is interpreted as the format string)
printf(@a); print"\n"; # "titi" (LIST - 1st element in list is interpreted as the format string)
print @a,"\n"; # "tititutu" (LIST - $, is empty)
printf "%s\n",@a; # "titi" (LIST - only 1st element is read)
Set variable $,
to modify the list separator used when printing arrays
my @a = ("titi","tutu");
$,="\n";
print @a;
Hashes
Some example of hashes:
my %cities = ( # (...) is a hash constructor
"US" => "Washington",
"GB" => "London"
);
print $cities{"US"},"\n";
my %hashofhash = ( # This is actually a hash of references to hash (note the {...})
"address" => {name => "US",
city => "Washington" },
"identity" => {firstname => "smith",
lastname => "Smith" } );
print $hashofhash{"address"}{"name"},"\n";
print $hashofhash{"address"}->{"name"},"\n";
keys(%users) = 1000; # allocate 1024 buckets
print scalar(keys %users),"\n"; # length of the hash
Note that in LIST context, a hash is transformed into an array containing both the keys and values in the hash!
my %myhash = ( key1 => "value1", key2 => "value2" );
my @myarray= ( "element1", "element2" );
push (@myarray, %myhash);
$, = ",";
print @myarray; # outputs "element1, element2, key2, value2, key1, value1"
References
Reference: http://perldoc.perl.org/perlreftut.html
- Make Rule 1
Put a \
in front of a variable to get a reference
$aref = \@array; # $aref now holds a reference to @array
$href = \%hash; # $href now holds a reference to %hash
$sref = \$scalar; # $sref now holds a reference to $scalar
Now, we can use any reference as a regular scalar value:
$xy = $aref; # $xy now holds a reference to @array
$p[3] = $href; # $p[3] now holds a reference to %hash
$z = $p[3]; # $z now holds a reference to %hash
- Make Rule 2
[ ITEMS ]
makes a new, anonymous array, and return a reference to that array. { ITEMS }
makes a new, anonymous hash, and returns a reference to that hash.
$aref = [ 1, "foo", undef, 13 ]; # $aref now holds a reference to an array
$href = { APR => 4, AUG => 8 }; # $href now holds a reference to a hash
$aref = [ 1, 2, 3 ]; # This is the same as ...
@array = (1, 2, 3); # ... this
- Use Rule 1
You can use an array reference, in curly brace in place of the name of an array:
$aref = \@array;
@{$aref} # same as @a An array
reverse @{$aref} # same as reverse @a Reverse the array
${$aref}[3] # same as $a[3] An element of the array
${$aref}[3] = 17 # same as $a[3] = 17 Assigning an element
The same applies to hashes:
$href = \@array;
%{$href} # same as %h A hash
keys %{$href} # same as keys @h Get the keys from the hash
${$href}{'red'} # same as $h{'red'} An element of the hash
${$href}{'red'} = 17 # same as $h{'red'} = 17 Assigning an element
- Use Rule 2
Second rule is an abbreviation of first rule above when all we want is to extract a single element. It uses the arrow ->
notation:
$aref->[3] # Same as ${$aref}[3]
$href->{red} # Same as ${$href}{red}
# Don't confuse the following
$aref->[3] # Fourth element of an array referred to by $aref
$aref[3] # Fourth element of an array deceptively named @aref
# nor the following
$href->{red} # Part of the hash referred to by $href
$href{ref} # Part of the deceptively named %href hash
- Arrow rule
In between two subscripts, the arrow is optional
$a[1][2] # Same as $a[1]->[2], which is same as ${$a[1]}[2]
$x[1][2][3] # Same as $x[1]->[2]->[3], which is same as ${${$x[1]}[2]}[3]
- More details
Reference: http://perldoc.perl.org/perlref.html
- In Use rule 1, you can omit the curly brackets whenever the thing inside them is an atomic scalar variable like
$aref
@$aref # same as @{$aref}
$$aref[1] # same as ${$aref}[1]
- Use anonmymous array constructor to make deep copies of array references:
$aref2 = $aref1 # This does NOT copy the array
$aref2 = [@$aref1] # This DOES copy the array
The same applies for copying an anonymous hash:
$href2 = [%$href1] # This DOES copy the hash
- Use function
ref
to test if a variable contains a reference. It returnsHASH
for hash reference, andARRAY
for array reference, which both evaluate to true.
- If when using a reference as a string, you get strings like ARRAY(0x80f5dec) or HASH(0x826fc0), it means you printed a reference by mistake.
- A side effect is that you can use
eq
to see if two references refer to the same thing. But using==
is much faster.
- You can use a string as if it were a reference. If you use the string
"foo"
as an array reference, it's taken to be a reference to the array@foo
This is called a soft reference or symbolic reference. The declarationuse strict 'refs'
disables this feature, which can cause all sorts of trouble if you use it by accident.
- Some examples
my $VOBAttrRef = $VOBs{'AdminMask'}; # This return a reference to a Hash
my %VOBAttr = %$VOBAttrRef; # This dereference the reference above and return a Hash
print $VOBAttr{'ProjectName'},"\n"; # We can use our new Hash variable
print $$VOBAttrRef{'ProjectName'},"\n"; # ... or we can dereference our reference variable using the $$ construct
print $VOBAttrRef->{'ProjectName'},"\n"; # ... but -> can also be used to dereference
print $VOBs{'AdminMask'}->{'ProjectName'},"\n"; # We can also skip altoghether the reference variable
print $VOBs{'AdminMask'}{'ProjectName'},"\n"; # ... This notation is also available as a shortcut, -> can be omitted
Passing reference to sub-routines:
$tab{'somekey'} = '...';
process(\$tab);
sub process ()
{
my $tab = $_[0];
$tab->{'somekey'} = '...';
}
Using Anonymous Hash References:
#!/usr/bin/perl
use strict;
my @myarray;
foreach my $iter ( 1..10 )
{
my $value1 = "value1_".$iter;
my $value2 = "value2_".$iter;
print "Creating our \$hashref... ";
my $hashref = { index1 => $value1, index2 => $value2 }; # { key1 => value1, ... } creates a REFERENCE to an anonymous hash.
# Since reference are SCALAR, we assign it to a scalar variable
print "Done.\n",
" \$hashref: ",$hashref,"\n";
print " content: ",$$hashref{'index1'},",",$$hashref{'index2'},"\n";
print "Adding \$hashref to our array... ";
push( @myarray, $hashref );
print "Done. There are currently ", scalar(@myarray), " elements in \@myarray.\n";
print "Accessing last element of our array...";
print " content: @myarray[$#myarray], ${@myarray[$#myarray]}{'index1'} or better yet @myarray[$#myarray]->{'index2'}\n";
}
print "\n\nNow we will traverse our array again...\n";
foreach ( @myarray )
{
print "$_ containing ",
"index1 => $$_{'index1'},",
"index2 => $$_{'index2'}\n";
print "... or using -> operator: ",
"index1 => $_->{'index1'},",
"index2 => $_->{'index2'}\n";
}
String
# Concat 2 strings
$stringC = $stringA . ucfirst($stringB);
$stringC = "$stringA$stringB";
$stringC = join('', ($stringA, ucfirst($stringB)));
Statement modifiers
The statement modifiers are
if EXPR
unless EXPR
while EXPR
until EXPR
when EXPR
for LIST
foreach LIST
Examples:
print "Basset hounds got long ears" if length $ear >= 10;
go_outside() and play() unless $is_raining;
print "Hello $_!\n" foreach qw(world Dolly nurse);
Compound Statements
The following compound statements can be used to control flow:
if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
unless (EXPR) BLOCK
unless (EXPR) BLOCK else BLOCK
unless (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
LABEL while (EXPR) BLOCK
LABEL while (EXPR) BLOCK continue BLOCK
LABEL until (EXPR) BLOCK
LABEL until (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL foreach VAR (LIST) BLOCK
LABEL foreach VAR (LIST) BLOCK continue BLOCK
LABEL BLOCK continue BLOCK
Examples:
foreach ( @myarray ) { print "There is $_ in my array\n"; }
foreach my $iter ( 1..10 ) { print "value1_".$iter."\n"; }
foreach my $keyname (keys %myHash) { print "key is $keyname\n"; }
Operators
Boolean
# String - see perldoc perlop -
# See https://stackoverflow.com/questions/1175390/how-do-i-compare-two-strings-in-perl
eq True if two strings identical
eq True if two strings different
cmp Return -1, 0, +1 when comparing two strings
~~ Smart match
lt, le, ge, gt Compare using locale
Quote and quote-like operators
See perldoc for detailed information.
Customary | Generic | Meaning | Interpolates |
---|---|---|---|
'' |
q{} |
Literal | no |
"" |
qq{} |
Literal | yes |
`` |
qx{} |
Command | yes(*) |
|
qw{} |
Word list | no |
// |
m{} |
Pattern match | yes(*) |
|
qr{} |
Pattern | yes(*) |
|
s{}{} |
Substitution | yes(*) |
|
tr{}{} |
Transliteration | no (but see below) |
|
<<EOF |
here-doc | yes(*) |
- (*) unless the delimiter is
''
.
- (*) unless the delimiter is
Interpolates means that variables like $VAR
are expanded, and that escaped sequence like \n
are processed.
Also other delimiters can be used. For instance:
#Use any brackets
print q{Hello World};
print q(Hello World);
print q[Hello World];
print q<Hello World>;
#Brackets delimiters nest correctly, like
print q{Hello {my} World}; # Equivalent to 'Hello {my} World!
#We can use any non-whitespace character
print q!Hello World!;
print q|Hello World|;
print q#Hello World#;
Beware of some caveats:
$s = q{ if($a eq "}") ... }; # WRONG - } inside "}" is not nested, so quoting will stop there
$s = q #Hello World# # WRONG - Because of the whitespace, #Hello World# is taken as a comment
Regular expressions
Use /regex/
or m!regex!
(where !
can be any quoting character).
Use =~
to match a given variable, otherwise $_
is used. Use !~
to reverse the meaning of the match (i.e. must not match).
Finding matches
In SCALAR context, /regex/
returns true/false if matching is found
$myvar =~ /World/ #scalar context, returns true if $myvar contains World
/World/ #scalar context, same as above except that now it is $_ that is matched
"Hello World" =~ /World/ #scalar contect, same as above, to show that left member doesn't need to be an L-Value
Extracting matches
The grouping metacharacters ()
also allow the extraction of the parts of a string that matched. For each grouping, the part that matched inside goes into the special variables $1
, $2
... They can be used just as ordinary variables:
# extract hours, minutes, seconds
$time =~ /(\d\d):(\d\d):(\d\d)/; # match hh:mm:ss format
$hours = $1;
$minutes = $2;
$seconds = $3;
In LIST context, /regex/
with groupings will return the list of matched values ($1,$2,...) . So we could rewrite the above as:
($hours, $minutes, $second) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
If the groupings in a regex are nested, $1
gets the group with the leftmost opening parenthesis, $2
the next opening parenthesis... For example, here is a complex regex and the matching variables indicated below it:
/(ab(cd|ef)((gi)|j))/; 1 2 34
Using back-references
Associated with the matching variables $1
, $2
... are the backreferences \1
, \2
... Backreferences are matching variables that can be used inside a regex:
/(\w\w\w)\s\1/; # find sequences like 'the the' in string
Note that $1
, $2
.... should only be used outside of a regex, and \1
, \2
... only inside a regex.
Search & Replace
Use s/regex/replacement/modifiers
. Use =~
to match a given variable, otherwise $_
is used.
In SCALAR context, s///
returns the number of matches, or false if no match.
$x = "Time to feed the cat!";
$x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
Note that the matching variablle $1
, $2
can be used in the replacement string.
Some modifiers:
g
- Find all matchese
- wraps aneval{...}
around the replacement string and the evaluated result is substituted for the matched substring. Example:
# reverse all the words in a string
$x = "the cat in the hat";
$x =~ s/(\w+)/reverse $1/ge; # $x contains "eht tac ni eht tah"
The split operator
split /regex/, string
splits string into a list of substrings and returns that list. The regex determines the character sequence that string is split with respect to. For example, to extract a comma-delimited list of numbers, use
$x = "1.618,2.718, 3.142";
@const = split /,\s*/, $x; # $const[0] = '1.618', $const[1] = '2.718', $const[2] = '3.142'
If the empty regex //
is used, the string is split into individual characters. If the regex has groupings, then the list produced contains the matched substrings from the groupings as well:
$x = "/usr/bin";
@parts = split m!(/)!, $x; # $parts[0] = '' Since the first character of $x matched the regex, an initial element was prepended.
# $parts[1] = '/' The delimiter is also in the list because of the grouping (/)
# $parts[2] = 'usr'
# $parts[3] = '/' Yet a delimiter because of the grouping
# $parts[4] = 'bin'
Lookahead / Lookbehind
The lookahead and lookbehind assertions are generalizations of the anchor concept. Lookahead and lookbehind are zero-width assertions that let us specify which characters we want to test for. The lookahead assertion is denoted by (?=regexp)
and the lookbehind assertion is denoted by (?<=fixed-regexp)
. Some examples are
$x = "I catch the housecat 'Tom-cat' with catnip";
$x =~ /cat(?=\s+)/; # matches 'cat' in 'housecat'
@catwords = ($x =~ /(?<=\s)cat\w+/g); # matches, $catwords[0] = 'catch' $catwords[1] = 'catnip'
$x =~ /\bcat\b/; # matches 'cat' in 'Tom-cat'
$x =~ /(?<=\s)cat(?=\s)/; # doesn't match; no isolated 'cat' in middle of $x
Grep / Map
Use grep
on a list to return the element of that list for which the expression is true. For instance
@foo = grep(!/^#/, @bar); # Only returns line that are not comments
my @array = ("el1","gel2","el3","gel1","gel2");
my @array2 = grep {s/(.*el)/reverse $1/e} @array; # grep may also modify the elements in the returned list
Use map
on a list to apply a given expression on all elements in the list.
@chars = map(chr, @nums); # Returns the list of character corresponding to the list of of numbers
File and I/O
Basics
See perlopentut for more details.
Use open
to open a file. Special variable $!
contains the status of last operation:
open (my $in, "<", "input.txt") or die "can't open input.txt: $!";
open (my $out,">", "output.txt") or die "can't open output.txt: $!";
open (my $log,">>", "my.log") or die "can't open my.log: $!";
Read from the file using operator <>
:
my $line = <$in>; # Read one line
my @lines = <$in>; # Read all lines
while (<$in>) { # assigns each line in turn to $_
print "Just read in this line: $_";
}
TRAP!!! — while(<$in>)
≠ for(<$in>)
! See below:
while(<$in>) { #(scalar context) Read line-by-line, stop when <$in> returns undef
print "_=$_";
print "<in>=".<$in>
}
for(<$in>) { #(list) read whole file, then foreach on the resulting list
print "_=$_";
print "<in>=".<$in> # ... <$in> is always undefined (because EOF reached already!)
}
print
accepts an optional first argument specifying which filehandle to print to:
print STDERR "This is your final warning.\n";
print $out $record;
print $log $logmessage;
Finally close the file:
close $in or die "$in: $!";
There are 2 advantages to using an indirect filehandles such as my $in
:
- it eases namespace management (filehandle like
INFO
are global to the package). - an indirect filehandle automatically closes when it goes out of scope or when you undefine it.
The old way has some caveats:
open INFO, "< info.txt" or die "can't open info.txt: $!"; # Leading/trailing whitespace are stripped!
open (INFO, "<", "info.txt") || die "can't open info.txt: $!"; # Using regular FILEHANDLE - name clashes in current package !!!
To 'slurp' a binary file ([1]):
#! /usr/local/bin/perl
my $content;
open(my $fh, '<', 'test.raw') or die "cannot open file $filename";
{
local $/;
$content = <$fh>;
}
close($fh);
Another solution:
use File::Slurp;
my $text = read_file($filename);
Common functions / modules
Some commonly used functions / modules:
Name | Description |
---|---|
chdir (function) | Change the current working directory |
-X (function) | Various test on files, directories... pretty much like in Bash scripts. |
CWD (module) | get pathname of current working directory (provides getcwd and abs_path ).
|
File::Basename (module) | Parse file paths into directory, filename and suffix |
System / STDIN / STDOUT / STDERR
Some examples related to handling of system calls, STDIN, STDOUT and STDERR.
Read something from standard input | $line = <STDIN>;
$line = readline(*STDIN); # same thing
chomp($line = <STDIN>); # remove trailing newline
|
Read one character from STDIN | print "Press RETURN...";
$key = getc();
|
System calls | system "echo hello world!";
system qq(echo hello world!);
system $MYCMD, qw(param1), 'the name is'.getname($index);
|
Discard STDERR on Windows / Linux. Note that on Windows, we use \nul because each folder as a nul handler and we want to reduce the number of used handle | my $STDERRNULL = "2>\\nul"; #use this on windows
my $STDERRNULL = "2>/dev/null"; #use this on unix
my @Results = qx(ls somedirectory $STDERRNULL);
|
Capture STDOUT | my @ouput = `ls`;
my @ouput = qx(ls);
system("ls >output.txt");
|
Capture command exit status | my $exit_status = system("del file.txt");
|
Temporarily disable STDERR and restore it afterwards | open(SAVE_STDERR, '>&STDERR');
close(STDERR) unless $ENV{CLEARCASE_TRACE_TRIGGERS};
$exe = qx(file $ptmp) =~ /executable|bourne|commands text|\bscript/i;
open(STDERR, '>&SAVE_STDERR');
close(SAVE_STDERR);
|
Processes, Pipes, IPC
References
See
- http://perldoc.perl.org/perlipc.html
- http://docstore.mik.ua/orelly/perl/prog3/ch16_03.htm
- http://perldoc.perl.org/IO/Pipe.html
See also the page on Unix's pipes.
exec
source: perldoc
# exec - executes a system command and never returns (except if failed)
exec ('foo') or die ("Can't execute foo: $!");
exec "sort $outfile | uniq" # parsed by system's command shell - support pipes and redirect
exec '/bin/echo', 'Your arguments are: ', @ARGV; # not parsed
system
source: perldoc
# system - same as exec, but forks first
system ('foo') or die ("Can't execute foo: $?");
system "sort $outfile | uniq" # parsed by system's command shell - support pipes and redirect
system '/bin/echo', 'Your arguments are: ', @ARGV; # not parsed
open
source: perldoc
open(my $fh, "input.txt") or die $!; # for input
open(my $fh, '<', "input.txt") or die $!; # for input
open(my $fh, '>', "output.txt") or die $!; # for output
open(my $fh, '>>', "output.txt") or die $!; # for appending
open(my $fh, '+<', "output.txt") or die $!; # for input / output (open existing file)
open(my $fh, '+>', "output.txt") or die $!; # for input / output (create/rewrite new file)
open(my $fh, '+>', undef) or die $!; # anonymous temporary file
open(my $fh, '| sort') or die $!; # Equiv to fork, exec, then a waitpid when closing $fh (for the parent)
open(my $fh, 'ps aux |') or die $!;
my $childpid = open(my $fh, '|-', 'sort') or die "Can't fork: $!"; # implicit fork - child reads from STDIN
my $childpid = open(my $fh, '-|', 'ps aux') or die "Can't fork: $!"; # implicit fork - child writes to STDOUT
open my $oldstdin, "<&STDIN" # Duplicate STDIN handle
open STDINBAK, "<&", \*STDIN # Same
open STDIN, "<&READER" # Redirect STDIN from an existing handle READER
open ($fh,'>', \$variable) or die "$!"; #Since 5.8.0, PerlIO is enabled by default.
close STDOUT;
open STDOUT, '>', \$variable or die "Can't open STDOUT: $!"; #Need to close STDOUT first...
IO::Pipe
source: perldoc
use IO::Pipe;
$pipe = new IO::Pipe;
if($pid = fork()) { # Parent
$pipe->reader();
while(<$pipe>) {
...
}
}
elsif(defined $pid) { # Child
$pipe->writer();
print $pipe ...
}
use IO::Pipe;
$pipe = new IO::Pipe;
$pipe->reader(qw(ls -l));
while(<$pipe>) {
...
}
Pipes
source: O'Reilly - Programming Perl
open SPOOLER, "| cat -v | lpr -h 2>/dev/null" or die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER or die "bad spool: $! $?";
So to page your program's output, you'd use:
if (-t STDOUT) { # only if stdout is a terminal
my $pager = $ENV{PAGER} || 'more';
open(STDOUT, "| $pager") or die "can't fork a pager: $!";
}
END {
close(STDOUT) or die "can't close STDOUT: $!"
}
open STATUS, "netstat -an 2>/dev/null |" or die "can't fork: $!";
while (<STATUS>) {
next if /^(tcp|udp)/;
print;
}
close STATUS or die "bad netstat: $! $?";
Using pipes to talk to self. From parent to child:
if (open(TO, "|-")) {
print TO $fromparent;
}
else {
$tochild = <STDIN>;
exit; #Don't forget this!
}
From child to parent:
if (open(FROM, "-|")) {
$toparent = <FROM>;
}
else {
print STDOUT $fromchild;
exit; #Don't forget this!
}
Perlipc
source: perldoc
Using open
for IPC:
open(SPOOLER, "| cat -v | lpr -h 2>/dev/null")
|| die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER || die "bad spool: $! $?";
open(STATUS, "netstat -an 2>&1 |")
|| die "can't fork: $!";
while (<STATUS>) {
next if /^(tcp|udp)/;
print;
}
close STATUS || die "bad netstat: $! $?";
Safe Pipe Opens
source: perldoc
Doing a safe pipe for reading from child:
# add error processing as above
$pid = open(KID_TO_READ, "-|");
if ($pid) { # parent
while (<KID_TO_READ>) {
# do something interesting
}
close(KID_TO_READ) || warn "kid exited $?";
} else { # child
($EUID, $EGID) = ($UID, $GID); # suid only
exec($program, @options, @args)
|| die "can't exec program: $!";
# NOTREACHED
}
Doing a safe pipe for writing to child:
# add error processing as above
$pid = open(KID_TO_WRITE, "|-");
$SIG{PIPE} = sub { die "whoops, $program pipe broke" };
if ($pid) { # parent
for (@data) {
print KID_TO_WRITE;
}
close(KID_TO_WRITE) || warn "kid exited $?";
} else { # child
($EUID, $EGID) = ($UID, $GID);
exec($program, @options, @args)
|| die "can't exec program: $!";
# NOTREACHED
}
Avoiding Pipe Deadlocks
Source: perlipc (perldoc)
A child process that reads from STDIN
only exits when it gets an EOF
. This EOF
is sent only when any processes on the other side closes the handle! If there are multiple processes connected to the same pipe, having one process closes it is not enough; the last process with the pipe open must close it for the child to read EOF
.
In the example below, the grand-parent waits until child and grand-child terminate their communication. This is because WRITER
was opened using open WRITER, "|-"
, which has a special behaviour: closing it will call waitpid()
, which waits for the sub-process to exit. If the child happens to wait for the parent to do something before exiting, we have a deadlock.
$pid = open WRITER, "|-";
defined $pid or die "fork failed; $!";
if ($pid) {
if (my $sub_pid = fork()) {
close WRITER; #!!! will call waitpid(), so will wait until child exits...
# do something else...
}
else {
# write to WRITER...
close WRITER;
exit;
}
}
else {
# do something with STDIN...
exit;
}
A 1st solution is to build the pipe manually with pipe()
, fork()
, and the form of open()
which sets one file descriptor to another, as below:
pipe(READER, WRITER);
$pid = fork();
defined $pid or die "fork failed; $!";
if ($pid) {
close READER;
if (my $sub_pid = fork()) {
close WRITER; #! will not call waitpid() !
# do something else...
}
else {
# write to WRITER...
close WRITER;
exit;
}
}
else {
open STDIN, "<&READER";
close WRITER;
# do something...
exit;
}
Another solution is simply not to fork open in the first parent (i.e. don't create an handle in the process that does not need it):
$pid = fork();
if ($pid)
{
# do something else...
}
else
{
$sub_pid = open WRITER,"|-";
defined $sub_pid or die "fork failed; $!";
if ($sub_pid)
{
# write to WRITER...
close WRITER; # wait for child to exits...
}
else
{
# do something with STDIN...
exit;
}
exit;
}
Avoiding Zombie Processes
Source: Perl Cookbook
- If you don't need to record the children that have terminated (<defunct>), use:
$SIG{CHLD} = 'IGNORE';
- To keep better track of deceased children, install a SIGCHLD handler to call waitpid :
use POSIX ":sys_wait_h";
$SIG{CHLD} = \&REAPER;
sub REAPER {
my $stiff;
while (($stiff = waitpid(-1, &WNOHANG)) > 0) {
# do something with $stiff if you want
}
$SIG{CHLD} = \&REAPER; # install *after* calling waitpid
}
Sub-routines
Declaration and definition syntax:
sub NAME[(PROTO)] [: ATTRS]; # A "forward" declaration
sub NAME[(PROTO)] [: ATTRS] BLOCK # A declaration and definition
$subref = sub (PROTO) : ATTRS BLOCK; # An anonymous sub-routine, called with &$subref
Importing a sub-routine:
use MODULE qw(NAME1 NAME2 NAME3);
Calling a sub-routine:
NAME(LIST); # & is optional with parentheses.
NAME LIST; # Parentheses optional if predeclared/imported.
&NAME(LIST); # Circumvent prototypes.
&NAME; # Makes current @_ visible to called subroutine.
Examples:
sub mySub1
{
my ($param1, $param2) = @_
return $param1.$param2;
}
sub mySub2
{
my $param1 = shift
my $param2 = shift
return $param1.$param2;
}
Using default value for sub-routine parameters: sub myfunc { my($suffix) = @_ ? "@_" : "defaultvalue"; }
Modules
See Perl Module Primer.
Functions
See [2] for a detailed list of Perl functions.
Chop / Chomp
chop
removes the last character of a string. It also works on lists.
chop( my $userinput=<STDIN> ); #Chop the trailing "\n" in user input
chop( my @list=qx(ls); #Chop the trailing "\n" in the command output
chomp
removes the trailing record separator (typically \n
) of a string. It also works on lists.
chomp( my $userinput=<STDIN> ); #Chomp the trailing "\n" in user input IF PRESENT
chomp( my @list=qx(ls); #Chomp the trailing "\n" in the command output IF PRESENT
-X
The function -X can be used for various test on the files, directories... similar to the test command in Bash:
print "The file exists\n" if -e "../somefile";
print "The directory exists\n" if -d "../some/directory";
Use _ to save a system call, like in:
stat($filename);
print "Readable\n" if -r _;
print "Writable\n" if -w _;
print "Executable\n" if -x _;
print "Text\n" if -T _;
print "Binary\n" if -B _;
Since 5.9.1, operators can be stacked:
print "writable and executable\n" if -f -w -x $file; # same as -x $file && -w _ && -f _
Modules
See Core Modules for a detailed list of Perl modules. Here a list of frequently used ones:
Benchmark
See also [3].
use Benchmark;
&Benchmark::timethis(500000,'$match=qq/filename.txt/;$match=~/.*\.(.*)/');
use Benchmark;
use Time::HiRes;
sub mySubroutine {
my $time0 = new Benchmark;
...
my $time1 = new Benchmark;
print "mySubroutine benchmark: ".timestr(timediff($t1,$t0));
}
(CWD) getcwd / abs_path
The function getcwd returns the current working directory. abs_path transforms a given relative path into its equivalent canonical absolute form.
use Cwd qw(getcwd abs_path);
my $dir = getcwd();
my $abs_path = abs_path($file);
File::Basename
use File::Basename;
(name,$path,$suffix) = fileparse($fullname,@suffixlist);
$name = fileparse($fullname,@suffixlist);
$basename = basename($fullname,@suffixlist);
$dirname = dirname($fullname);
File::Find
File:Find
provides functions similar to the Unix find command for searching through directory trees doing work on each file.
use File::Find;
find(\&wanted, @directories_to_search); #depth-first search - preorder traversal - no options
sub wanted { ... }
use File::Find;
find({ wanted => \&process, follow => 1 }, '.'); #With options
sub process { ... }
use File::Find;
finddepth(\&wanted, @directories_to_search); #depth-first search - post-order traversal - no options
sub wanted { ... }
Example:
find (\&wanted, $directory);
sub wanted {
/(\.c|\.cpp|\.cxx|\.h|\.hpp|\.hxx)$/ &&
print "$_\n";
}
HTML::TagParser
An easy module to parse HTML files. See example below. More at [4].
#!/usr/bin/perl -w
use strict;
use HTML::TagParser;
my $html = HTML::TagParser->new("test.html");
my @addr = $html->getElementsByTagName("a");
foreach (@addr) {
my $attr = $_->attributes;
print "<a href=\"$attr->{'href'}\">", $_->innerText(), "</a>\n";
}
XML::DOM
Again an example to parse an XML file (from [5]). See also documentation on CPAN:
#!/usr/bin/perl -w
use strict;
use XML::DOM;
my $parser = new XML::DOM::Parser;
my $doc = $parser->parse( \*DATA ) or die "Unable to parse document";
my $root = $doc->getDocumentElement(); # safer than just getting the first
# child, in case the document has a
# DTD or start with comments
scanner($root);
sub scanner
{
my ($rt) = @_;
my $i=0;
foreach my $nde ( $rt->getChildNodes()) # yes it is anarray!
{
if ( ($nde->getNodeType() == TEXT_NODE )
&& ($nde->getData()=~ /\S/s) ) # to avoid extra white spaces
{
print $i++," TEXT /", $nde->getData(), "/\n";
}
if ($nde->getNodeType == ELEMENT_NODE)
{
print $i++, " ELEMENT ", $nde->getNodeName(), "\n";
}
scanner( $nde );
}
}
__DATA__
<methodCall>Level1 Text
<Level2a>Text at Level2a</Level2a>
<Level2b>Text at Level2b</Level2b>
</methodCall>
Tips and How-Tos
One-Liners
perl -ne 'print unless /^$/../^$/' input # print lines, unless blank
perl -ne 'print if ! /^$/../^$/' input # reduce runs of blank lines to a single blank line
perl -lne 'print $.; close ARGV if eof' input input # $. need to be reset (by closing ARGV) between 2 input files
perl -lne 'print for m/\b(\S+)\b/g' paragraphs # print words from file paragraphs
perl -lne 'while(m/(\S+)\s+the\s+(\S+)/g){print "$1 $2"}' paragraphs # ... while loop needed when using multiple back-references
perl -lne 'print for /id is <(\d+)>/g' # match pattern and extract backreference
perl -lne 'print $2 for /id is <(\d+)> or <(\d+)>/g' # ... print 2nd matched backreference
perl -lne 'print "$2 $1" if /id is <(\d+)> or <(\d+)>/g' # Rearrange backreferences (!if)
cat oldfile | perl -pe 's/(\d+)_/sprintf("%2.2d_",$1)/e' > newfile # evaluate regex substitutions
perl -pe 'tr /A-Za-z/\n/cs' file.txt # Tokenize a text file - c=complement, s=squash duplicates
-e commandline allows to run one line of program. -l enables automatic line-ending processing. -n wrap a while(<>) { ... }
around program (but does not print lines like with -p).
The following one-liner simply emulates sed
, which can be useful when using PCRE regex (like when using look-ahead or look-behind assertions):
perl -ne 's/before/after/g; print;' input
More complex examples:
#Remove duplicate words from a white-space separated list
WORDS="foo bar baz bar bar foo baz"
perl -e 'print join(" ", keys %{{ map { $_ => 1 } split (/ +/,"'"$WORDS"'") }} );'
Miscellaneous
Report a complete error message when loading a script. This hack allows for printing a custom error message + file not found-like error message (given by |
do "your script.pl"
or (print "Your error message\n$!\n" and die @_);
|
uppercase / lowercase | my $lowercase = lc "My StRiNg"; #mystring
my $uppercase = uc "My StRiNg"; #MYSTRING
my $firstcharlowercase = lcfirst "My StRiNg"; #my StRiNg
my $firstcharuppercase = ucfirst "My StRiNg"; #My StRiNg
|
Detect operating system (see man perlvar )
|
print "On cygwin\n" if $^O =~ "cygwin";
print "On linux\n" if $^O =~ "linux";
print "On windows\n" if $^O =~ "MSWin32";
|
Split a multiline variable/output
Method 1 - using an array variable.
my @myarray=qx(ls);
foreach (@myarray)
{
chomp;
print "The file is '$_'.\n";
}
Method 2 - using a scalar variable.
my $myscalar=qx(ls);
foreach (split /\n/,$myscalar)
{
print "The file is really '$_'\n"; # No need for chomping
}
Parsing Command Line Parameters
Command line parameters are parsed through variable ARGV.
print scalar @ARGV; #number of parameters
print $#ARGV; #... idem
print "1st param: $ARGV[0]"; #positional parameters
print "2nd param: $ARGV[1]";
print "Executable name: $0"; #Name of current executable
usage() unless defined($ARGV[0]) # defined($ARGV[0]) is true if there is a parameter
Simple version
#!/usr/bin/perl
use strict;
use warnings;
my $verbose=0;
my $projectdir;
# Parse command options (-option).
while ($#ARGV>=0 && $ARGV[0] =~ m/^\-/ ) {
$verbose=1 if $ARGV[0] =~ m/^\-v/i;
shift @ARGV;
}
# Parse mandatory parameter
usage() unless defined($ARGV[0]);
$projectdir=$ARGV[0];
# Show parsing result
print "verbose=$verbose\n";
print "projectdir=$projectdir\n";
exit 0;
sub usage {
print "Usage: $0 [options] directory\n";
print "\n";
print " Options:\n";
print " -v verbose mode\n";
exit;
}
Using GetOpt
use strict;
use Getopt::Long qw(:config no_ignore_case);
my $debug=0;
my $quiet=0;
my username;
# Parse options
GetOptions ("d|debug+" => \$debug,
"q|quiet" => \$quiet,
"u|user=s" => \$username ) || usage();
# Parse remaining parameters
my $url = $ARGV[0];
Internet
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use LWP::Debug;
use HTTP::Cookies;
use HTML::TableExtract;
my $debug = 0; # Set to 1 for debug information
my $proxy = 0;
my $username;
my $password;
my $url = $ARGV[0];
LWP::Debug::level('+') if $debug;
my $ua = LWP::UserAgent->new;
# Session cookie
my $jar = HTTP::Cookies->new ();
$ua->cookie_jar($jar);
# Enable proxy...
$ua->env_proxy if $proxy;
# Fetch the articles + url
my %articles = get_articles ( $ua, $url );
# Get starting URL....
my $res = $ua->get($url);
unless ($res->is_success) {
die "Die: " . $res->status_line, "\n";
}
exit 0;
print $res->content if $debug;
my $res = $ua->post( $url,
[
'j_username' => $username,
'j_password' => $password,
'Submit' => 'Entrer'
]
);
unless ($res->is_success) {
die "Die: ".$res->status_line, "\n";
}
sub get_articles {
my $ua = shift;
my $url = shift;
my $res = $ua->get ($url);
unless ($res->is_success) {
die "http-get failed: ".$res->status_line, "\n$url\n";
}
my $te = HTML::TableExtract->new ( slice_columns => 0,
keep_html => 1,
keep_headers => 1,
subtables => 1,
headers => [qw(Matter)] );
$te->parse($res->content);
open(my $fh, ">mpe.html") || die "Cannot create file: $!\n";
print $fh $res->content;
close($fh);
foreach my $ts ( $te->tables ) {
# print "Table (", join(',', $ts->coords), "):\n";
foreach my $row ( $ts->rows ) {
print "Row: " . join (';', @$row ). "\n";
next unless $row->[0] =~ m/\/content\/(.*)\/fulltext/;
print $1."\n";
}
}
return 0;
}
Passing filehandle as sub parameters and return values
This requires the use of a reference. First as return value:
sub openTimeOut($)
{
my $filename = shift;
my $timeout=15;
while( !open(LOG,$filename) ) { sleep 1; --$timeout or die "Time out trying to open file $filename"; }
return \*LOG;
}
sub printToFile($@)
{
my $filename = shift;
my $fh = openTimeOut(">$filename");
print $fh @_;
close($fh);
}
BUT BEWARE, actually OpenTimeOut
returns a reference to the same file handle in current glob! The code below illustrate this:
my ($to,$from) = @_;
$fhto = openTimeOut(\*TO,">>$to");
$fhfrom = openTimeOut(\*FROM,"<$from"); # This returns same FILEHANDLE reference as $fhto
while (<$fhfrom>) {print $fhto $_} # Failed, because now $fhto = $fhfrom, which only open for output
close($fhfrom);
close($fhto);
The solution, pass by parameters:
sub openTimeOut2(*;$)
{
my $fh = shift;
my $filename = shift;
my $timeout=15;
while( !open($fh,$filename) ) { sleep 1; --$timeout or die "Time out trying to open file $filename"; }
}
sub printToFile($@)
{
my $filename = shift;
openTimeOut2(\*LOG,">$filename");
print LOG @_;
close(LOG);
}
Embedding a perl script in a W2K shell script
Notice how the first rem
is actually a multiline assignment to perl array variable @rem
, where the value is quoted with ' '
.
@rem= 'PERL for Windows NT - ccperl must be in search path
@echo off
ccperl %0 %1 %2 %3 %4 %5 %6 %7 %8 %9
goto endofperl
@rem ';
# Your Perl code comes here
# End of Perl section
__END__
:endofperl
Kill on ALARM signal
Here a small Perl script stopafter that kills a process after some time using alarm (similar to timeout kill):
#!/usr/bin/perl
my ($limit, $command, @args) = @ARGV;
$SIG{ALARM} = 'DEFAULT';
alarm $limit;
exec $command, @args; # Exec preserves process signal table, but fork not!
die "Couldn't run '$command': $!";
Use it as follows:
stopafter 30 command arg arg arg...
Command will be killed after 30 sec, unless the command cancelled the alarm clock or caught/ignored signal ALRM
Print each array elements surrounded in quotes
From [10]
#!/usr/bin/perl
use strict;
use warnings;
my @a = qw/ A B C /;
my @b;
my $text = join ' ', map { qq/"$_"/ } @a;
print "text for (@a) is [$text]\n";
$text = join ' ', map { qq/"$_"/ } @b;
print "text for (@b) is [$text]\n";
Or use map { "'$_'" } @a
to surround in single quotes.
Remove trailing slash
str =~ s|/\z||; # Use \z instead of $ to get absoluete end of a string (deal with \n)
Force buffered output
Buffered output is handy when for instance piping to a program (for instance tee
).
Perl enabled buffered output when connected to a terminal. Also <STDIN>
triggers flushing of STDOUT.
In other cases, we can trigger STDOUT flush with either [11]:
select()->flush(); # ... or ...
$|=1; # ... or ...
$|++; # ... or ...
BEGIN{ $|++; }
Pitfalls
Forgetting to chomp the trailing "\n" | my $path = qx(pwd); #NOK! trailing \n will corrupt path construction
chomp( my $path = qx(pwd) ); #OK!
|
Mixing case in name of package | # Imagine a module file named Vobs.pm
use Vobs; # OK
use VOBs; # NOK &rar; Will complain about double definition
# (but will not flag the mix case problem)
|
Operator precedence and strange behaviour | chomp my @emptylist = qx("dir"); #NOK ! @emptylist will be empty
chomp ( my @list = qx("dir") ); #OK !
|
Forgetting to use ${...} to separate variable identifier | my $variable;
print "$variable_temp\n"; # NOK! Print a variable named variable_temp
print "${variable}_temp\n"; # OK! Print a $variable, followed by "_temp"
|
STDERR redirection cannot be given as a command parameter to system | system "echo hello world! 2>\\nul"; # OK
system qq(echo hello world! 2>\\nul); # OK
system "echo", "hello world!"," 2>\\nul"; # NOK - 2>\\nul taken as a parameter
|
Forgetting local in sub-routines (see [12]). In particular pay attention that $_ is assigned e.g. in while loops
|
sub localized
{
local @ARGV = ("/etc/motd"); # OK
local $/ = undef; # OK
local $_ = <>; # OK
@Fields = split /^\s*=+\s*$/;
}
|
CPAN - Perl Packages
- Things to consider
- Consider disabling tests during CPAN install (turn on
-notest
) - Better, use CPANMINUS (see README; see this for an example)
First time launch:
$ cpan # ... OR ...
$ perl -MCPAN -e shell # --> yes auto config
To adapt config related to proxy:
cpan> o conf init /proxy/ # (to enter an empty string, simply enter 1 space char as a value)
cpan> o conf commit
To install a Perl package (eg. here package GetOpt::Long):
$ cpan
cpan>install GetOpt::Long
cpan>force install GetOpt::Long # Install even if self-tests failed
or
cpan install GetOpt::Long
Editing the configruation:
cpan> o conf init # Reset the configuration
cpan> o conf http_proxy http://proxy:8080/ # Edit a given variable (eg. here changing proxy settings):
cpan> o conf commit # To commit changes in the configuration
cpan> o # to get o options
cpan> o conf # To get o conf option
To edit CPAN database url:
cpan> o conf /urllist/
cpan> o conf init /urllist/
cpan> o conf urllist shift
cpan> o conf urllist unshift ftp://my.new.site/
cpan> o conf commit
To update CPAN itself:
cpan> install Bundle::CPAN
cpan> reload cpan