Awk
Latest revision as of 16:00, 28 August 2023
References
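- An Awk Primer (good tutorial on Awk): http://vc.airvectors.net/tsawk.html
- gawk User guide: https://www.gnu.org/software/gawk/manual/
- GAWK: Effective AWK Programming (gawk.pdf from package gawk-doc)
- CLI text processing with GNU Awk (book, many examples): https://learnbyexample.github.io/learn_gnuawk/
On Awk:
- AWK As A Major Systems Programming Language, Revisited: http://www.skeeve.com/awk-sys-prog.html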
Awk Examples
ps al | awk '{print $2}' # Print second field of ps output
arp -n 10.137.3.129|awk '/ether/{print $3}' # Print third field of arp output, if line contains 'ether' somewhere
getent hosts unix.stackexchange.com | awk '{ print $1 ; exit }' # Print only first line, then exit
find /proc -type l | awk -F"/" '{print $3}' # Print second folder name (i.e. process pid)
Example of parsing an XML file (and comparing with perl):
cat FILE
# <configuration buildProperties="" description="" id="some_id.1525790178" name="some_name" parent="some_parent">
awk -F "[= <>\"]+" '/<configuration / { if ($8 == "some_name") print $6 }' FILE
# some_id.1525790178
perl -lne 'print $1 if /<configuration .* id="([^"]*)" name="some_name"/' FILE
# some_id.1525790178
Language reference
Awk program structure
@include "script1" # gawk extension
pattern {action}
pattern {action}
# ...
function name (args) { ... }
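A rule is a pattern followed by an action. Either one can be omitted: a rule without a pattern runs for every input record, and a rule without an action prints the matching record (as if the action were { print }).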
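As a small illustration of omitted patterns and omitted actions, here is a sketch run from the shell. The three-line input and the variable names are made up for this example.

```shell
# Hypothetical three-line input used only for this illustration.
input=$(printf 'France 67\nNorway 5\nSpain 47')

# Pattern only: the default action { print } prints each matching record.
pattern_only=$(printf '%s\n' "$input" | awk '/^N/')

# Action only: with no pattern, the action runs for every record.
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

echo "$pattern_only"
echo "$action_only"
```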
Patterns
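/regular expression/ { } # matches when the input record matches the regular expression
expression { } # matches when the expression is nonzero (or a non-empty string)
begpat, endpat { } # range: from a record matching begpat through the next record matching endpat
BEGIN { } # runs before any input is read. All BEGIN rules are merged.
END { } # runs after all input is read. All END rules are merged.
BEGINFILE { } # runs at the beginning of each file (merged; gawk extension)
ENDFILE { } # runs at the end of each file (merged; gawk extension)
{ } # empty pattern: matches every input record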
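The range and END patterns can be combined in one program. A minimal sketch, assuming a made-up five-line input in which the words start and stop mark the range:

```shell
# Print records from the first one matching /start/ through the next one
# matching /stop/ and count them; END runs once after all input is read.
range_out=$(printf 'a\nstart\nb\nstop\nc\n' |
  awk '/start/, /stop/ { n++; print } END { print n " records in range" }')
echo "$range_out"
```

Only the portable patterns are used here; BEGINFILE and ENDFILE are left out because they need gawk.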
Search patterns using regex can be constrained to a given field:
$1 ~ /^France$/ { } # searches for lines whose first field is the word France
$1 !~ /^Norway$/ { } # searches for lines whose first field is NOT the word Norway
Examples of expressions:
NR == 10 { } # Match line number 10
NR == 10, NR == 20 { } # Match line 10 through 20
NF == 0 { } # Match empty or whitespace-only lines (i.e. with zero fields)
$1 == "France" { } # Match line whose first word is "France"
Be careful with numeric comparisons:
(( $1 + 0 ) == $1 ) { } # Match if first field is numeric
(( $1 + 0 ) != $1 ) { } # Match if first field is a string
$1 == 100 { } # Equality test -- always OK (simply false when $1 is not numeric)
$1 < 100 { } # DANGEROUS - falls back to string comparison if $1 is not numeric
((( $1 + 0 ) == $1 ) && ( $1 > 100 )) { } # BETTER - first check that the field is numeric
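The difference between the guarded and unguarded comparison can be seen on a small made-up input. In this sketch the field abc does not look like a number, so the unguarded test compares it as a string against "100" and matches, because letters sort after digits.

```shell
# "9" and "250" look numeric and are compared as numbers.
# "abc" is compared as a string, so the unguarded rule matches it.
cmp_out=$(printf '9\nabc\n250\n' | awk '
  $1 > 100                      { print "unguarded:", $1 }
  ($1 + 0 == $1) && ($1 > 100)  { print "guarded:", $1 }
')
echo "$cmp_out"
```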
Control statements
- Block and sequences
- Instructions are grouped with braces { ... } and separated by newlines or semicolons (;).
{ if (NR) { print NR; print "hello" } }
- If statement
# multiline
if (x % 2 == 0)
print "x is even"
else
print "x is odd"
# single line
if (x % 2 == 0) print "x is even"; else print "x is odd"
- While statement
i = 1; while (i <= 3) { print $i; i++ }
- For statement
for (i = 1; i <= 3; i++) print $i
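The statements above can be combined in a single action. A minimal sketch, assuming a made-up input of space-separated numbers; it loops over all fields using NF rather than a fixed count:

```shell
# Sum the fields of each record with a for loop bounded by NF,
# then label the total with an if/else.
loop_out=$(printf '1 2 3\n10 21\n' | awk '{
  total = 0
  for (i = 1; i <= NF; i++) total += $i
  if (total % 2 == 0) print total, "is even"; else print total, "is odd"
}')
echo "$loop_out"
```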
Functions
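t = mktime("2020 12 26 23 43 11") # Convert to seconds since the epoch (gawk extension)
gsub(/[:-]/, " ", $1); t = mktime($1) # if the input is in the 1st field, formatted as 2020-12-26 23:43:11 (needs a non-whitespace field separator so that date and time stay in one field)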
How-To
Execute a system command and capture its output
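To run a system command, we use system("cmd"). However, to capture its output, we use cmd | getline value [1] (https://stackoverflow.com/questions/1960895/assigning-system-commands-output-to-variable).
We must also close the command with close(cmd). Otherwise awk keeps the pipe open, so a later cmd | getline does not re-execute the command (it only tries to read further output from the first run), and awk may complain or produce strange results.
Example of program: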
/\/\/ test password/ {
cmd = "openssl rand -hex 16";
cmd | getline r;
gsub(/[0-9a-f][0-9a-f]/,"0x&, ",r);
print " { ", r, "}, // test password - DO NOT EDIT THIS COMMENT";
close(cmd);
next;
}
{print}
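The effect of close() can be observed with a harmless command. A minimal sketch, assuming echo hello stands in for the real command; the variable names are made up for this example:

```shell
# Read from the same command three times; only close() makes it run again.
close_out=$(awk 'BEGIN {
  cmd = "echo hello"
  r1 = (cmd | getline a)   # 1: one line was read
  r2 = (cmd | getline b)   # 0: same pipe, already at end of output
  close(cmd)
  r3 = (cmd | getline c)   # 1: the command was executed again
  close(cmd)
  print r1, r2, r3, a, c
}')
echo "$close_out"
```

The second getline returns 0 and leaves b unset, which is the kind of silent failure the close() call prevents.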
Tips
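Defining an environment variable
Using an Awk script and the Bash builtin eval. This sets the variable in the current shell; emit export MY_VAR=value instead if child processes must see it: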
eval $(awk 'BEGIN{printf "MY_VAR=value";}')
echo $MY_VAR
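A slightly fuller sketch of the same idea, with the command substitution quoted and one variable exported. MY_COUNT is a hypothetical second variable added for illustration. Since eval executes whatever awk prints, this should only be used when the awk output is trusted.

```shell
# awk prints shell assignments; eval runs them in the current shell.
eval "$(awk 'BEGIN {
  printf "export MY_VAR=%s\n", "value"   # exported: visible to child processes
  printf "MY_COUNT=%d\n", 6 * 7          # plain shell variable
}')"
echo "$MY_VAR $MY_COUNT"
```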
Hexadecimal conversion
Use strtonum (a gawk extension) to convert a value:
{
print strtonum($1); # decimal, octal or hexa (guessed from prefix)
print strtonum("0"$2); # To force octal
print strtonum("0x"$3); # To force hexadecimal
}
Alternatively, use gawk --non-decimal-data to have gawk interpret hexadecimal and octal values in the input data directly.
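strtonum is specific to gawk. Where it is not available, a hand-written conversion is one option. The following is a minimal sketch in plain POSIX awk; hex2dec is a hypothetical helper name, not a built-in, and it handles only non-negative hexadecimal strings with an optional 0x prefix.

```shell
hex_out=$(printf '1F\nff\n0x10\n' | awk '
  # Hypothetical helper: convert a hexadecimal string to a number.
  function hex2dec(s,    i, d, n) {
    s = tolower(s)
    sub(/^0x/, "", s)                 # drop an optional 0x prefix
    n = 0
    for (i = 1; i <= length(s); i++) {
      d = index("0123456789abcdef", substr(s, i, 1)) - 1
      if (d < 0) return -1            # not a hexadecimal digit
      n = n * 16 + d
    }
    return n
  }
  { print hex2dec($1) }
')
echo "$hex_out"
```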
Using environment variables
Use ENVIRON["NAME"]:
{ print strtonum("0x"ENVIRON["STARTADDR"]); }
Pass command-line parameters
Awk variables can be defined directly on the invocation line:
awk -v myvar=123 'BEGIN { printf "myvar is %d\n",myvar }' # Use -v (before program text) for var used in BEGIN section
echo foo | awk '{ printf "myvar is %d\n",myvar }' myvar=123 # Otherwise specify var after program text
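The difference between the two forms shows up when a BEGIN rule is present. A minimal sketch, using the same myvar name as above; the brackets are only there to make an empty value visible:

```shell
# -v assignments are already set when BEGIN runs.
with_v=$(awk -v myvar=123 'BEGIN { print "[" myvar "]" }')

# An assignment placed after the program text is applied later,
# so BEGIN still sees an empty value while the main rule sees 123.
late=$(echo foo | awk 'BEGIN { print "[" myvar "]" } { print "[" myvar "]" }' myvar=123)

echo "$with_v"
echo "$late"
```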
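Pass command-line parameters
Awk also defines the variables ARGC (number of arguments) and ARGV (array of arguments, where ARGV[0] is the name of the awk program itself):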
BEGIN {
for (i = 0; i < ARGC; i++)
print ARGV[i]
}
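As a usage sketch, the loop can be run with two made-up operands, alpha and beta. Because the program has only a BEGIN rule, awk does not try to open them as input files.

```shell
# Start at 1 to skip ARGV[0], which holds the awk program name.
argv_out=$(awk 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' alpha beta)
echo "$argv_out"
```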
$0 is the whole line
# Concatenate DNS
/^A\?/{print record; record=$0}
/^A /{record=record " " $0;}
END {print record}
String concatenation
Simply place the strings next to each other, with no operator.
print "The result is " result;
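Because concatenation has no operator, its precedence is easy to overlook: arithmetic binds more tightly. A minimal sketch with made-up values; parentheses make the grouping explicit when in doubt.

```shell
concat_out=$(awk 'BEGIN {
  result = 6 * 7
  print "The result is " result     # plain concatenation
  print "sum: " 1 + 2               # arithmetic is done first, then concatenated
  print "joined: " (1 " " 2)        # parentheses make the grouping explicit
}')
echo "$concat_out"
```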
Next line on pattern match
Use next to stop processing the current record as soon as one rule has matched, so that at most one rule in the list fires per line:
/PATTERN1/ {print $1; next}
/PATTERN2/ {print $2; next}
{print $3}
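The effect of next is visible when a record matches more than one pattern. In this sketch, with made-up input, the first line matches both /apple/ and /an/, but only the first rule fires.

```shell
next_out=$(printf 'apple and pear\nbanana\ncherry\n' | awk '
  /apple/ { print "first:", $1; next }    # handled, skip the remaining rules
  /an/    { print "second:", $1; next }
          { print "other:", $1 }          # fallback for unmatched records
')
echo "$next_out"
```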
Force numeric conversion with x+0
Say we have a file foo with numbers stuck to non-digit characters:
( 1 2)
( 1 3)
We can force numeric conversion by applying an arithmetic operation; awk then uses only the leading numeric part of the field:
awk '{print $3}' foo
# 2)
# 3)
awk '{print $3+0}' foo
# 2
# 3
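The conversion is numeric rather than strictly integer, and a field with no leading number becomes 0. A minimal sketch with made-up input in the same layout as the file above:

```shell
# 2) has the leading number 2, 3.5) has 3.5, abc) has none.
conv_out=$(printf '( 1 2)\n( 1 3.5)\n( 1 abc)\n' | awk '{ print $3 + 0 }')
echo "$conv_out"
```

Wrapping the expression in int(), as in int($3 + 0), would truncate the result to an integer.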
Pattern conversion
2014-01 2,277.40
2014-02 2,282.20
2014-03 3,047.90
2014-04 4,127.60
2014-05 5,117.60
Use gsub for regex replacement (here, to remove the commas , before summing):
awk '{gsub(/,/,"",$2);sum+=$2}END{printf("%f",sum)}'
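For reference, the same command can be run against the five sample lines above. This sketch feeds them through printf instead of a file and uses %.2f so the total prints with two decimals.

```shell
sum_out=$(printf '%s\n' \
  '2014-01 2,277.40' \
  '2014-02 2,282.20' \
  '2014-03 3,047.90' \
  '2014-04 4,127.60' \
  '2014-05 5,117.60' |
  awk '{ gsub(/,/, "", $2); sum += $2 } END { printf "%.2f\n", sum }')
echo "$sum_out"
```

Without the gsub call, awk would convert 2,277.40 to 2, because the conversion stops at the comma.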
Remove duplicates, keeping line order
A simple awk script to remove duplicate lines from a file, keeping the original order [2] (https://iridakos.com/how-to/2019/05/16/remove-duplicate-lines-preserving-order-linux.html):
awk '!visited[$0]++' your_file > deduplicated_file
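The one-liner works because visited[$0]++ returns the count before incrementing: 0 (false) the first time a line is seen, nonzero afterwards. Negating it gives a pattern that is true only for first occurrences, and the default action prints the line. A small sketch with made-up input; the second command is a variation that deduplicates on the first field only.

```shell
# Whole-line deduplication, first occurrence wins.
dedup_out=$(printf 'b\na\nb\nc\na\n' | awk '!visited[$0]++')

# Variation: treat lines as duplicates when their first field repeats.
dedup_key=$(printf 'x 1\ny 2\nx 3\n' | awk '!visited[$1]++')

echo "$dedup_out"
echo "$dedup_key"
```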
Remove the first, second... line matching a pattern
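From Stack Overflow (https://stackoverflow.com/questions/23696871/how-to-remove-only-the-first-occurrence-of-a-line-in-a-file-using-sed):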
awk '/foo/{ if (++f == 1) next} 1' file # Delete 1st matching line
awk '/foo/{ if (++f == 2) next} 1' file # Delete 2nd matching line
awk '/foo/{ if (++f ~ /^(1|2)$/) next} 1' file # Delete 1st and 2nd matching line
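The occurrence number can also be passed as a variable. A minimal sketch, assuming a hypothetical variable n set with -v and made-up input; the counter f is only incremented on matching lines because && short-circuits.

```shell
# Delete the n-th line matching /foo/ and print everything else.
nth_out=$(printf 'foo 1\nbar\nfoo 2\nfoo 3\n' |
  awk -v n=2 '/foo/ && ++f == n { next } 1')
echo "$nth_out"
```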
Process CSV files
See csvquote in Linux Commands.
There is also GoAWK, a rewrite of AWK in Go with CSV support (https://github.com/benhoyt/goawk).