Perl: Difference between revisions

Revision as of 20:43, 24 May 2010

Reference

Perldoc on local computer

% perldoc -q duplicate
  "How can I remove duplicate elements from a list or array?"
% perldoc -f split
  split /PATTERN/,EXPR,LIMIT
  split /PATTERN/,EXPR
  split /PATTERN/
  ...

The FAQ is the primary source of answer to questions like How can I do....

Manpages - List of highly recommended perldoc manpages (from the FAQ).

Basics          perldata, perlvar, perlsyn, perlop, perlsub
Execution       perlrun, perldebug
Functions       perlfunc
Objects         perlref, perlmod, perlobj, perltie
Data            Structures   perlref, perllol, perldsc
Modules         perlmod, perlmodlib, perlsub
Regexes         perlre, perlfunc, perlop, perllocale
Moving to perl5 perltrap, perl
Linking w/C     perlxstut, perlxs, perlcall, perlguts, perlembed
Various         http://www.cpan.org/misc/olddoc/FMTEYEWTK.tgz
                (not a man-page but still useful, a collection of various essays on Perl techniques)

Command-Line - Useful command-line options

-e expression: specififies perl expressions.
-p: loops over and prints input.
-n: loops over and does not print input.
-l: strip newlines on input, and adds them on output. Use this option by default, unless the newlines need special handling, or for efficiency reasons.

Quick Introduction

Program Structure

Example of a simple Hello World program:

#!/usr/bin/perl

use strict;                 # Immediately stops on potential problem - highly recommended for simplified debugging
use warning;                # Warnings                               - highly recommended for simplified debugging

print "Hello, World!\n";

exit 0;

Data Types

`$`	for scalar values (number, string or reference)
`@`	for arrays
`%`	for hashes (associative arrays)
`&`	for subroutines (aka functions, procedures, methods)
*``**	for all types of that symbol name. In version 4 you used them like pointers, but in modern perls you can just use references.
`<>`	are used for inputting a record from a filehandle.
`\`	takes a reference to something.

Note that the last 2 are not really type specifiers.

Arrays

Some example

my @array1 = ("titi","tutu");           # (...) is an array constructor
my @array2 = ("tata","toto");
push(@array1,"tete");                   # Append an element to an array
push(@array1,@array2);                  # Append another array to an array
print $array1[1];                       # 2nd element of array array1
print $#array1;                         # The last index of array array1
@array1 = ();                           # Truncate array
$#array = -1;                           # ... same effect
print scalar( @array1 );                # The length of the array array1
print $#array1 + 1;                     # ... same effect (since perl 5, $[ deprecated)

Arrays can be easily constructed through autovivification. Below we create a hash of arrays

my %Projects;                                       # Projects is a hash, but we say nothing on the types of its elements...
foreach my $VOBName (keys %VOBs)
{
	my $ProjectName = $VOBs{$VOBName}{'ProjectName'};
	push(@{$Projects{$ProjectName}}, $VOBName);     # <-- we dereference value returned by $Projects{$ProjectName} as
}                                                   #     an array, hence creating automatically an array if undef

Below some difference of handling @ in SCALAR or LIST context:

#                                            RESULT      CONTEXT  EXPLANATION
my @a = ("titi","tutu");
my $varnoquote=@a; print "$varnoquote\n"; # "2"         (SCALAR - @_ is evaluated in scalar context)
my $varquote="@a"; print "$varquote\n";   # "titi tutu" (EXPAND - @_ is quote-expanded, each item being separated by space)
print @a; print"\n";                      # "tititutu"  (LIST   - $, is empty)
print(@a); print"\n";                     # "tititutu"  (LIST   - $, is empty)
printf @a; print"\n";                     # "titi"      (LIST   - 1st element in list is interpreted as the format string)
printf(@a); print"\n";                    # "titi"      (LIST   - 1st element in list is interpreted as the format string)
print @a,"\n";                            # "tititutu"  (LIST   - $, is empty)
printf "%s\n",@a;                         # "titi"      (LIST   - only 1st element is read)

Set variable $, to modify the list separator used when printing arrays

my @a = ("titi","tutu");
$,="\n";
print @a;

Hashes

Some example of hashes:

my %cities = (                                        # (...) is a hash constructor
    "US"   => "Washington",
    "GB"   => "London"
);
print $cities{"US"},"\n";

my %hashofhash = (                                    # This is actually a hash of references to hash
    "address"  => {name      => "US",
                   city      => "Washington" },
    "identity" => {firstname => "smith",
                   lastname  => "Smith"      } );
print $hashofhash{"address"}{"name"},"\n";
print $hashofhash{"address"}->{"name"},"\n";

keys(%users) = 1000;		                      # allocate 1024 buckets
print scalar(keys %users),"\n";                       # length of the hash

Note that in LIST context, a hash is transformed into an array containing both the keys and values in the hash!

my %myhash = ( key1 => "value1", key2 => "value2" );
my @myarray= ( "element1", "element2" );
push (@myarray, %myhash);

$, = ",";
print @myarray;                                     # outputs "element1, element2, key2, value2, key1, value1"

References

my $VOBAttrRef = $VOBs{'AdminMask'};                # This return a reference to a Hash
my %VOBAttr = %$VOBAttrRef;                         # This dereference the reference above and return a Hash

print $VOBAttr{'ProjectName'},"\n";                 # We can use our new Hash variable
print $$VOBAttrRef{'ProjectName'},"\n";             #  ... or we can dereference our reference variable using the $$ construct
print $VOBAttrRef->{'ProjectName'},"\n";            #  ... but -> can also be used to dereference
print $VOBs{'AdminMask'}->{'ProjectName'},"\n";     # We can also skip altoghether the reference variable
print $VOBs{'AdminMask'}{'ProjectName'},"\n";       #  ... This notation is also available as a shortcut, -> can be omitted

Passing reference to sub-routines:

$tab{'somekey'} = '...';
process(\$tab);
sub process ()
{
	my $tab = $_[0];
	$tab->{'somekey'} = '...';
}

Using Anonymous Hash References:

#!/usr/bin/perl
use strict;
my @myarray;

foreach my $iter ( 1..10 )
{
	my $value1 = "value1_".$iter;
	my $value2 = "value2_".$iter;

   	print "Creating our \$hashref... ";
   	my $hashref = { index1 => $value1, index2 => $value2 }; # { key1 => value1, ... } creates a REFERENCE to an anonymous hash.
                                                            # Since reference are SCALAR, we assign it to a scalar variable
	print "Done.\n",
	      "  \$hashref: ",$hashref,"\n";
	print "  content: ",$$hashref{'index1'},",",$$hashref{'index2'},"\n";

	print "Adding \$hashref to our array... ";
	push( @myarray, $hashref );

	print "Done. There are currently ", scalar(@myarray), " elements in \@myarray.\n";
	print "Accessing last element of our array...";
	print " content: @myarray[$#myarray], ${@myarray[$#myarray]}{'index1'} or better yet @myarray[$#myarray]->{'index2'}\n";
}

print "\n\nNow we will traverse our array again...\n";
foreach ( @myarray )
{
	print 	"$_ containing ",
	 		"index1 => $$_{'index1'},",
			"index2 => $$_{'index2'}\n";
	print 	"... or using -> operator: ",
	 		"index1 => $_->{'index1'},",
			"index2 => $_->{'index2'}\n";
}

String

# Concat 2 strings
$stringC = $stringA . ucfirst($stringB);
$stringC = "$stringA$stringB";
$stringC = join('', ($stringA, ucfirst($stringB)));

Statement modifiers

The statement modifiers are

if EXPR
unless EXPR
while EXPR
until EXPR
when EXPR
for LIST
foreach LIST

Examples:

print "Basset hounds got long ears" if length $ear >= 10;
go_outside() and play() unless $is_raining;
print "Hello $_!\n" foreach qw(world Dolly nurse);

Compound Statements

The following compound statements can be used to control flow:

if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
unless (EXPR) BLOCK
unless (EXPR) BLOCK else BLOCK
unless (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
LABEL while (EXPR) BLOCK
LABEL while (EXPR) BLOCK continue BLOCK
LABEL until (EXPR) BLOCK
LABEL until (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL foreach VAR (LIST) BLOCK
LABEL foreach VAR (LIST) BLOCK continue BLOCK
LABEL BLOCK continue BLOCK

Examples:

foreach ( @myarray ) { print "There is $_ in my array\n"; }
foreach my $iter ( 1..10 ) { print "value1_".$iter."\n"; }
foreach my $keyname (keys %myHash) { print "key is $keyname\n"; }

Operators

Quote and quote-like operators

See perldoc for detailed information.

Customary	Generic	Meaning	Interpolates
`''`	`q{}`	Literal	no
`""`	`qq{}`	Literal	yes
``	`qx{}`	Command	yes(*)
	`qw{}`	Word list	no
`//`	`m{}`	Pattern match	yes(*)
	`qr{}`	Pattern	yes(*)
	`s{}{}`	Substitution	yes(*)
	`tr{}{}`	Transliteration	no (but see below)
	`<<EOF`	here-doc	yes(*)

(*) unless the delimiter is ''.

Interpolates means that variables like $VAR are expanded, and that escaped sequence like \n are processed.
Also other delimiters can be used. For instance:

#Use any brackets
print q{Hello World};
print q(Hello World);
print q[Hello World];
print q<Hello World>;
#Brackets delimiters nest correctly, like
print q{Hello {my} World};    # Equivalent to 'Hello {my} World!
#We can use any non-whitespace character
print q!Hello World!;
print q|Hello World|;
print q#Hello World#;

Beware of some caveats:

$s = q{ if($a eq "}") ... }; # WRONG - } inside "}" is not nested, so quoting will stop there
$s = q #Hello World#         # WRONG - Because of the whitespace, #Hello World# is taken as a comment

Regular expressions

Use /regex/ or m!regex! (where ! can be any quoting character).
Use =~ to match a given variable, otherwise $_ is used. Use !~ to reverse the meaning of the match (i.e. must not match).

Finding matches

In SCALAR context, /regex/ returns true/false if matching is found

$myvar =~ /World/               #scalar context, returns true if $myvar contains World
/World/                         #scalar context, same as above except that now it is $_ that is matched
"Hello World" =~ /World/        #scalar contect, same as above, to show that left member doesn't need to be an L-Value

Extracting matches

The grouping metacharacters () also allow the extraction of the parts of a string that matched. For each grouping, the part that matched inside goes into the special variables $1 , $2... They can be used just as ordinary variables:

# extract hours, minutes, seconds
$time =~ /(\d\d):(\d\d):(\d\d)/;  # match hh:mm:ss format
$hours = $1;
$minutes = $2;
$seconds = $3;

In LIST context, /regex/ with groupings will return the list of matched values ($1,$2,...) . So we could rewrite the above as:

($hours, $minutes, $second) = ($time =~ /(\d\d):(\d\d):(\d\d)/);

If the groupings in a regex are nested, $1 gets the group with the leftmost opening parenthesis, $2 the next opening parenthesis... For example, here is a complex regex and the matching variables indicated below it:

/(ab(cd|ef)((gi)|j))/;
 1  2      34

Using back-references

Associated with the matching variables $1, $2... are the backreferences \1, \2... Backreferences are matching variables that can be used inside a regex:

    /(\w\w\w)\s\1/; # find sequences like 'the the' in string

Note that $1, $2.... should only be used outside of a regex, and \1, \2... only inside a regex.

Search & Replace

Use s/regex/replacement/modifiers. Use =~ to match a given variable, otherwise $_ is used.

In SCALAR context, s/// returns the number of matches, or false if no match.

$x = "Time to feed the cat!";
$x =~ s/cat/hacker/;   # $x contains "Time to feed the hacker!"

Note that the matching variablle $1, $2 can be used in the replacement string.

Some modifiers:

g - Find all matches
e - wraps an eval{...} around the replacement string and the evaluated result is substituted for the matched substring. Example:

# reverse all the words in a string
$x = "the cat in the hat";
$x =~ s/(\w+)/reverse $1/ge;    # $x contains "eht tac ni eht tah"

The split operator

split /regex/, string splits string into a list of substrings and returns that list. The regex determines the character sequence that string is split with respect to. For example, to extract a comma-delimited list of numbers, use

$x = "1.618,2.718,   3.142";
@const = split /,\s*/, $x;      # $const[0] = '1.618', $const[1] = '2.718', $const[2] = '3.142'

If the empty regex // is used, the string is split into individual characters. If the regex has groupings, then the list produced contains the matched substrings from the groupings as well:

$x = "/usr/bin";
@parts = split m!(/)!, $x;      # $parts[0] = ''    Since the first character of $x matched the regex, an initial element was prepended.
                                # $parts[1] = '/'   The delimiter is also in the list because of the grouping (/)
                                # $parts[2] = 'usr'
                                # $parts[3] = '/'   Yet a delimiter because of the grouping
                                # $parts[4] = 'bin'

Lookahead / Lookbehind

The lookahead and lookbehind assertions are generalizations of the anchor concept. Lookahead and lookbehind are zero-width assertions that let us specify which characters we want to test for. The lookahead assertion is denoted by (?=regexp) and the lookbehind assertion is denoted by (?<=fixed-regexp). Some examples are

$x = "I catch the housecat 'Tom-cat' with catnip";
$x =~ /cat(?=\s+)/;                                # matches 'cat' in 'housecat'
@catwords = ($x =~ /(?<=\s)cat\w+/g);              # matches, $catwords[0] = 'catch' $catwords[1] = 'catnip'
$x =~ /\bcat\b/;                                   # matches 'cat' in 'Tom-cat'
$x =~ /(?<=\s)cat(?=\s)/;                          # doesn't match; no isolated 'cat' in middle of $x

Grep / Map

Use grep on a list to return the element of that list for which the expression is true. For instance

@foo = grep(!/^#/, @bar);                           # Only returns line that are not comments
my @array = ("el1","gel2","el3","gel1","gel2");
my @array2 = grep {s/(.*el)/reverse $1/e} @array;   # grep may also modify the elements in the returned list

Use map on a list to apply a given expression on all elements in the list.

@chars = map(chr, @nums);						# Returns the list of character corresponding to the list of of numbers

File and I/O

Basics

See perlopentut for more details.

Use open to open a file. Special variable $! contains the status of last operation:

open (my $in, "<", "input.txt") or die "can't open input.txt: $!";
open (my $out,">", "output.txt") or die "can't open output.txt: $!";
open (my $log,">>", "my.log") or die "can't open my.log: $!";

Read from the file using operator <>:

my $line  = <$in>;                           # Read one line
my @lines = <$in>;                           # Read all lines
while (<$in>) {                              # assigns each line in turn to $_
    print "Just read in this line: $_";
}

print accepts an optional first argument specifying which filehandle to print to:

print STDERR "This is your final warning.\n";
print $out $record;
print $log $logmessage;

Finally close the file:

close $in or die "$in: $!";

There are 2 advantages to using an indirect filehandles such as my $in:

it eases namespace management (filehandle like INFO are global to the package).
an indirect filehandle automatically closes when it goes out of scope or when you undefine it.

The old way has some caveats:

open INFO,  "< info.txt" or die "can't open info.txt: $!";       # Leading/trailing whitespace are stripped!
open (INFO, "<", "info.txt") || die "can't open info.txt: $!";   # Using regular FILEHANDLE - name clashes in current package !!!

Common functions / modules

Some commonly used functions / modules:

Name	Description
chdir (function)	Change the current working directory
-X (function)	Various test on files, directories... pretty much like in Bash scripts.
CWD (module)	get pathname of current working directory (provides `getcwd` and `abs_path`).
File::Basename (module)	Parse file paths into directory, filename and suffix

System / STDIN / STDOUT / STDERR

Some examples related to handling of system calls, STDIN, STDOUT and STDERR.

Read something from standard input	$line = <STDIN>; $line = readline(*STDIN); # same thing chomp($line = <STDIN>); # remove trailing newline
Read one character from STDIN	print "Press RETURN..."; $key = getc();
System calls	system "echo hello world!"; system qq(echo hello world!); system $MYCMD, qw(param1), 'the name is'.getname($index);
Discard STDERR on Windows / Linux. Note that on Windows, we use `\nul` because each folder as a nul handler and we want to reduce the number of used handle	my $STDERRNULL = "2>\\nul"; #use this on windows my $STDERRNULL = "2>/dev/null"; #use this on unix my @Results = qx(ls somedirectory $STDERRNULL);
Capture STDOUT	my @ouput = `ls`; my @ouput = qx(ls); system("ls >output.txt");
Capture command exit status	my $exit_status = system("del file.txt");
Temporarily disable STDERR and restore it afterwards	open(SAVE_STDERR, '>&STDERR'); close(STDERR) unless $ENV{CLEARCASE_TRACE_TRIGGERS}; $exe = qx(file $ptmp) =~ /executable\|bourne\|commands text\|\bscript/i; open(STDERR, '>&SAVE_STDERR'); close(SAVE_STDERR);

Pipes

See

Sub-routines

Declaration and definition syntax:

sub NAME[(PROTO)] [: ATTRS];            #  A "forward" declaration
sub NAME[(PROTO)] [: ATTRS] BLOCK       #  A declaration and definition
$subref = sub (PROTO) : ATTRS BLOCK;    #  An anonymous sub-routine, called with &$subref

Importing a sub-routine:

use MODULE qw(NAME1 NAME2 NAME3);

Calling a sub-routine:

NAME(LIST);	   # & is optional with parentheses.
NAME LIST;	   # Parentheses optional if predeclared/imported.
&NAME(LIST);   # Circumvent prototypes.
&NAME;	       # Makes current @_ visible to called subroutine.

Examples:

sub mySub1
{
	my ($param1, $param2) = @_
	return $param1.$param2;
}

sub mySub2
{
	my $param1 = shift
	my $param2 = shift
	return $param1.$param2;
}

Using default value for sub-routine parameters: sub myfunc { my($suffix) = @_ ? "@_" : "defaultvalue"; }

Functions

See [1] for a detailed list of Perl functions.

Chop / Chomp

chop removes the last character of a string. It also works on lists.

chop( my $userinput=<STDIN> );                  #Chop the trailing "\n" in user input
chop( my @list=qx(ls);                          #Chop the trailing "\n" in the command output

chomp removes the trailing record separator (typically \n) of a string. It also works on lists.

chomp( my $userinput=<STDIN> );	                #Chomp the trailing "\n" in user input IF PRESENT
chomp( my @list=qx(ls);                         #Chomp the trailing "\n" in the command output IF PRESENT

-X

The function -X can be used for various test on the files, directories... similar to the test command in Bash:

print "The file exists\n" if -e "../somefile";
print "The directory exists\n" if -d "../some/directory";

Use _ to save a system call, like in:

stat($filename);
print "Readable\n" if -r _;
print "Writable\n" if -w _;
print "Executable\n" if -x _;
print "Text\n" if -T _;
print "Binary\n" if -B _;

Since 5.9.1, operators can be stacked:

print "writable and executable\n" if -f -w -x $file;       # same as -x $file && -w _ && -f _

Modules

See Core Modules for a detailed list of Perl modules. Here a list of frequently used ones:

Benchmark

(CWD) getcwd / abs_path

The function getcwd returns the current working directory. abs_path transforms a given relative path into its equivalent canonical absolute form.

use Cwd qw(getcwd abs_path);
my $dir = getcwd();
my $abs_path = abs_path($file);

File::Basename

use File::Basename;
(name,$path,$suffix) = fileparse($fullname,@suffixlist);
$name                = fileparse($fullname,@suffixlist);
$basename            = basename($fullname,@suffixlist);
$dirname             = dirname($fullname);

File::Find

File:Find provides functions similar to the Unix find command for searching through directory trees doing work on each file.

use File::Find;
find(\&wanted, @directories_to_search);                 #depth-first search - preorder traversal - no options
sub wanted { ... }

use File::Find;
find({ wanted => \&process, follow => 1 }, '.');        #With options
sub process { ... }

use File::Find;
finddepth(\&wanted, @directories_to_search);            #depth-first search - post-order traversal - no options
sub wanted { ... }

Example:

find (\&wanted, $directory);
sub wanted {
    /(\.c|\.cpp|\.cxx|\.h|\.hpp|\.hxx)$/ &&
        print "$_\n";
}

Examples

One-Liners

See [3], [4], [5], [6].

perl -ne 'print unless /^$/../^$/' input                              # print lines, unless blank
perl -ne 'print if ! /^$/../^$/' input                                # reduce runs of blank lines to a single blank line
perl -nle 'print $.; close ARGV if eof' input input                   # $. need to be reset (by closing ARGV) between 2 input files
perl -nle 'print for m/\b(\S+)\b/g' paragraphs                        # print words from file paragraphs
perl -nle 'while(m/(\S+)\s+the\s+(\S+)/g){print "$1 $2"}' paragraphs  # ... while loop needed when using multiple back-references
perl -lne 'print for /id is <(\d+)>/g'                                # match pattern and extract backreference
perl -lne 'print $2 for /id is <(\d+)> or <(\d+)>/g'                  # ... print 2nd matched backreference
cat oldfile | perl -pe 's/(\d+)_/sprintf("%2.2d_",$1)/e' > newfile    # evaluate regex substitutions
perl -pe 'tr /A-Za-z/\n/cs' file.txt                                  # Tokenize a text file - c=complement, s=squash duplicates

Miscellaneous

Report a complete error message when loading a script.

This hack allows for printing a custom error message + file not found-like error message (given by $!) + syntax error messages (@_)

do "your script.pl"
	or (print "Your error message\n$!\n" and die @_);

uppercase / lowercase

my $lowercase = lc "My StRiNg";                 #mystring
my $uppercase = uc "My StRiNg";	                #MYSTRING
my $firstcharlowercase = lcfirst "My StRiNg";   #my StRiNg
my $firstcharuppercase = ucfirst "My StRiNg";   #My StRiNg

Split a multiline variable/output

Method 1 - using an array variable.

my @myarray=qx(ls);
foreach (@myarray)
{
	chomp;
	print "The file is '$_'.\n";
}

Method 2 - using a scalar variable.

my $myscalar=qx(ls);
foreach (split /\n/,$myscalar)
{
	print "The file is really '$_'\n";           # No need for chomping
}

Parsing Command Line Parameters

Command line parameters are parsed through variable ARGV.

print scalar @ARGV;                   #number of parameters
print $#ARGV;                         #... idem
print "1st param: $ARGV[0]";          #positional parameters
print "2nd param: $ARGV[1]";
print "Executable name: $0";          #Name of current executable
usage() unless defined($ARGV[0])      # defined($ARGV[0]) is true if there is a parameter

Simple version

#!/usr/bin/perl

use strict;
use warnings;

my $verbose=0;
my $projectdir;

# Parse command options (-option).
while ($#ARGV>=0 && $ARGV[0] =~ m/^\-/ ) {
    $verbose=1 if $ARGV[0] =~ m/^\-v/i;
    shift @ARGV;
}

# Parse mandatory parameter
usage() unless defined($ARGV[0]);
$projectdir=$ARGV[0];

# Show parsing result
print "verbose=$verbose\n";
print "projectdir=$projectdir\n";

exit 0;

sub usage {
    print "Usage: $0 [options] directory\n";
    print "\n";
    print "  Options:\n";
    print "    -v   verbose mode\n";
    exit;
}

Using GetOpt

use strict;
use Getopt::Long qw(:config no_ignore_case);

my $debug=0;
my $quiet=0;
my username;

# Parse options
GetOptions ("d|debug+"  => \$debug,
            "q|quiet"   => \$quiet,
            "u|user=s"  => \$username ) || usage();

# Parse remaining parameters
my $url = $ARGV[0];

Internet

#!/usr/bin/perl

use strict;
use LWP::UserAgent;
use LWP::Debug;
use HTTP::Cookies;
use HTML::TableExtract;

my $debug = 0;          # Set to 1 for debug information
my $proxy = 0;
my $username;
my $password;

my $url = $ARGV[0];

LWP::Debug::level('+') if $debug;
my $ua = LWP::UserAgent->new;

# Session cookie
my $jar = HTTP::Cookies->new ();
$ua->cookie_jar($jar);

# Enable proxy...
$ua->env_proxy if $proxy;

# Fetch the articles + url
my %articles = get_articles ( $ua, $url );

# Get starting URL....
my $res = $ua->get($url);
unless ($res->is_success) {
    die "Die: " . $res->status_line, "\n";
}
exit 0;

print $res->content if $debug;

my $res = $ua->post( $url,
                     [
                      'j_username'  =>  $username,
                      'j_password'  =>  $password,
                      'Submit'      =>  'Entrer'
                     ]
    );

unless ($res->is_success) {
    die "Die: ".$res->status_line, "\n";
}

sub get_articles {
    my $ua = shift;
    my $url = shift;

    my $res = $ua->get ($url);
    unless ($res->is_success) {
        die "http-get failed: ".$res->status_line, "\n$url\n";
    }

    my $te = HTML::TableExtract->new ( slice_columns => 0,
                                       keep_html => 1,
                                       keep_headers => 1,
                                       subtables => 1,
                                       headers => [qw(Matter)] );

    $te->parse($res->content);
    open(my $fh, ">mpe.html") || die "Cannot create file: $!\n";
    print $fh $res->content;
    close($fh);

    foreach my $ts ( $te->tables ) {
#       print "Table (", join(',', $ts->coords), "):\n";
        foreach my $row ( $ts->rows ) {
            print "Row: " . join (';', @$row ). "\n";
            next unless $row->[0] =~ m/\/content\/(.*)\/fulltext/;
            print $1."\n";
        }
    }
    return 0;
}

Passing filehandle as sub parameters and return values

This requires the use of a reference. First as return value:

sub openTimeOut($)
{
	my $filename = shift;
	my $timeout=15;
	while( !open(LOG,$filename) ) { sleep 1; --$timeout or die "Time out trying to open file $filename"; }
	return \*LOG;
}

sub printToFile($@)
{
	my $filename = shift;
	my $fh = openTimeOut(">$filename");
	print $fh @_;
	close($fh);
}

BUT BEWARE, actually OpenTimeOut returns a reference to the same file handle in current glob! The code below illustrate this:

my ($to,$from) = @_;
$fhto = openTimeOut(\*TO,">>$to");
$fhfrom = openTimeOut(\*FROM,"<$from");    # This returns same FILEHANDLE reference as $fhto
while (<$fhfrom>) {print $fhto $_}         # Failed, because now $fhto = $fhfrom, which only open for output
close($fhfrom);
close($fhto);

The solution, pass by parameters:

sub openTimeOut2(*;$)
{
	my $fh = shift;
	my $filename = shift;
	my $timeout=15;
	while( !open($fh,$filename) ) { sleep 1; --$timeout or die "Time out trying to open file $filename"; }
}

sub printToFile($@)
{
	my $filename = shift;
	openTimeOut2(\*LOG,">$filename");
	print LOG @_;
	close(LOG);
}

Embedding a perl script in a W2K shell script

Notice how the first rem is actually a multiline assignment to perl array variable @rem, where the value is quoted with ' '.

@rem= 'PERL for Windows NT - ccperl must be in search path
@echo off
ccperl %0 %1 %2 %3 %4 %5 %6 %7 %8 %9
goto endofperl
@rem ';

# Your Perl code comes here

# End of Perl section

__END__
:endofperl

Pitfalls

Forgetting to chomp the trailing "\n"	my $path = qx(pwd); #NOK! trailing \n will corrupt path construction chomp( my $path = qx(pwd) ); #OK!
Mixing case in name of package	# Imagine a module file named Vobs.pm use Vobs; # OK use VOBs; # NOK &rar; Will complain about double definition # (but will not flag the mix case problem)
Operator precedence and strange behaviour	chomp my @emptylist = qx("dir"); #NOK ! @emptylist will be empty chomp ( my @list = qx("dir") ); #OK !
Forgetting to use ${...} to separate variable identifier	my $variable; print "$variable_temp\n"; # NOK! Print a variable named variable_temp print "${variable}_temp\n"; # OK! Print a $variable, followed by "_temp"
STDERR redirection cannot be given as a command parameter to system	system "echo hello world! 2>\\nul"; # OK system qq(echo hello world! 2>\\nul); # OK system "echo", "hello world!"," 2>\\nul"; # NOK - 2>\\nul taken as a parameter
Forgetting local in sub-routines (see [7]). In particular pay attention that `$_` is assigned e.g. in while loops	sub localized { local @ARGV = ("/etc/motd"); # OK local $/ = undef; # OK local $_ = <>; # OK @Fields = split /^\s=+\s$/; }

CPAN - Perl Packages

First time launch:

$ cpan                                       # ... OR ...
$ perl -MCPAN -e shell                       #  --> yes auto config

To adapt config related to proxy:

cpan> o config init /proxy/                  # (to enter an empty string, simply enter 1 space char as a value)
cpan> o conf commit

To install a Perl package (eg. here package GetOpt::Long):

$ cpan
cpan>install GetOpt::Long

Editing the configruation:

cpan> o conf init                            # Reset the configuration
cpan> o conf http_proxy http://proxy:8080/   # Edit a given variable (eg. here changing proxy settings):
cpan> o conf commit                          # To commit changes in the configuration
cpan> o                                      # to get o options
cpan> o conf                                 # To get o conf option

To edit CPAN database url:

cpan> o conf /urllist/
cpan> o conf init /urllist/
cpan> o conf urllist shift
cpan> o conf urllist unshift ftp://my.new.site/
cpan> o conf commit

To update CPAN itself:

cpan> install Bundle::CPAN
cpan> reload cpan

Perl: Difference between revisions

Revision as of 20:43, 24 May 2010

Reference

Quick Introduction

Program Structure

Data Types

Arrays

Hashes

References

String

Statement modifiers

Compound Statements

Operators

Quote and quote-like operators

Regular expressions

Finding matches

Extracting matches

Using back-references

Search & Replace

The split operator

Lookahead / Lookbehind

Grep / Map

File and I/O

Basics

Common functions / modules

System / STDIN / STDOUT / STDERR

Pipes

Sub-routines

Functions

Chop / Chomp

-X

Modules

Benchmark

(CWD) getcwd / abs_path

File::Basename

File::Find

Examples

One-Liners

Miscellaneous

Split a multiline variable/output

Parsing Command Line Parameters

Simple version

Using GetOpt

Internet

Passing filehandle as sub parameters and return values

Embedding a perl script in a W2K shell script

Pitfalls

CPAN - Perl Packages

Navigation menu

Search