Sed
References
- The SED Homepage on SourceForge
- The SED FAQ
- The SED man page
- Sed, a stream editor
- The sed one-liners
Installation
It is recommended to add the following alias in your ~/.bashrc:
alias sed="sed -r"
Of course, this alias has no effect in shell scripts. There you have to specify the option explicitly at each invocation.
Usage
Some basic usage:
sed [OPTION]... {script-only-if-no-other-script} [input-file]...
sed -n # Silent - suppress automatic printing of pattern space
sed -r # Use extended regular expressions
sed -i "s/foo/bar/" *.txt # In-place file modification
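As a quick illustration of these three options, the sketch below runs each of them against a throwaway file. It assumes GNU sed (for -r, and for -i without a backup suffix); the file contents and the variable names are made up for the example.

```shell
# Scratch file with three sample lines (hypothetical data).
tmp=$(mktemp)
printf 'foo 1\nbar 22\nfoo 333\n' > "$tmp"

# -n suppresses automatic printing, so only lines selected by 'p' appear.
matches=$(sed -n '/foo/p' "$tmp")

# -r enables extended regular expressions: ( ) and + need no backslash.
swapped=$(sed -r 's/([a-z]+) ([0-9]+)/\2 \1/' "$tmp")

# -i rewrites the file itself instead of printing to standard output.
sed -i 's/foo/baz/' "$tmp"
edited=$(cat "$tmp")

printf '%s\n' "$matches" "$swapped" "$edited"
rm -f "$tmp"
```

Without -r, the second command would need \( \) and \+ to get the same result.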
Commands a, i and c
The commands a\text, i\text and c\text (append, insert, change) take an address. The command is terminated by a newline. To insert a newline character, use \n:
cat mytext
# First line
# Second line
cat mysedscript
# 1 {i\inserted text
# s/$/ (not anymore)/g}
sed -f mysedscript mytext
# inserted text
# First line (not anymore)
# Second line
All on one line: use echo -e to generate the newline that terminates the command i:
echo -e "1 {i\\inserted text\ns/$/ (not anymore)/g}"| sed -f - mytext
# inserted text
# First line (not anymore)
# Second line
Same result without command i:
sed "1 {s/^/inserted text\n/; s/$/ (not anymore)/}" mytext
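The examples above only exercise i. A minimal sketch of a and c follows, assuming GNU sed and its one-line a\text form; each -e ends with an implicit newline, which is what terminates the text. The sample input mirrors mytext.

```shell
# Append after line 1 and replace line 2, on the same two-line input as mytext.
result=$(printf 'First line\nSecond line\n' |
  sed -e '1a\appended text' -e '2c\changed text')
printf '%s\n' "$result"
# First line
# appended text
# changed text
```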
Regular expressions
See Regular Expressions.
Script Examples
Remove <script>...</script> HTML tag
s!<script[>\x20\t].*</script>!!g
/<script[>\x20\t]/{
s!<script[>\x20\t].*!!g
:NEXTCYCLE
n
/<\/script>/!{
s!.*!!g
b NEXTCYCLE
}
s!.*</script>!!g
}
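One way to try the script is to save it to a file and run it with sed -f. The sketch below does that with a scratch file and a made-up HTML fragment, assuming GNU sed (for \x20 and \t inside a bracket expression).

```shell
# Scratch copy of the script above.
script=$(mktemp)
cat > "$script" <<'EOF'
s!<script[>\x20\t].*</script>!!g
/<script[>\x20\t]/{
s!<script[>\x20\t].*!!g
:NEXTCYCLE
n
/<\/script>/!{
s!.*!!g
b NEXTCYCLE
}
s!.*</script>!!g
}
EOF

# Hypothetical input: one single-line tag and one tag spanning three lines.
cleaned=$(printf '%s\n' \
  '<p>before</p><script>inline();</script>' \
  '<script type="text/javascript">' \
  'alert("x");' \
  '</script>' \
  '<p>after</p>' | sed -f "$script")
printf '%s\n' "$cleaned"
rm -f "$script"
```

The lines that belonged to a multi-line tag are emptied rather than deleted, so they remain in the output as blank lines.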
Remove newlines
Newline characters are added to the pattern space when using the append command N. The script below replaces every newline in standard input with a space (use s/\n//g instead to drop them entirely):
:a N
s/\n/ /g
b a
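A quick check of this loop on three sample lines is sketched below. It assumes GNU sed, which prints the pattern space when N finds no further input; the script is split over several -e options only so that the label is unambiguously terminated.

```shell
# Join three sample lines; with the script as written, each newline becomes a space.
joined=$(printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N' -e 's/\n/ /g' -e 'b a')
printf '%s\n' "$joined"
# one two three
```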
Remove trailing whitespace
find -name '*.[chs]' -print0 | xargs -r0 sed -e 's/[[:blank:]]\+$//' -i
ack-grep --text --type-set=pdf=.pdf --nopdf -f --print0 | xargs -r0 sed -r -i 's/\s+$//'
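Before running such a command over a whole tree, the substitution can be tried on a single scratch file. This is a minimal sketch assuming GNU sed (for \+ in a basic regular expression and for -i); the file contents are made up.

```shell
# Scratch file whose first two lines end in spaces or a tab (hypothetical data).
tmp=$(mktemp)
printf 'int x;  \nreturn 0;\t\n}\n' > "$tmp"

# Same substitution as above, applied in place to one file.
sed -i -e 's/[[:blank:]]\+$//' "$tmp"

cleaned=$(cat "$tmp")
printf '%s\n' "$cleaned"
rm -f "$tmp"
```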
Recursive patterns
For instance, to transform a path like /usr/local/share/bin/../../../bin/foo into /usr/bin/foo:
s!^([^./])!\./\1! # Prefix with './' unless starts with '.' or '/'
s!/\./!/!g # Remove any './' in middle
:a s!/[^/]*[^/.]/\.\.!!g # Remove /foo/.. (1st letter must not be '/', last letter must not be '.')
t a # ... and repeat until no more substitutions
echo "/usr/local/share/bin/../../../bin/foo" | sed -r 's!^([^./])!\./\1!; s!/\./!/!g; :a s!/[^/]*[^/.]/\.\.!!g; t a'
Test paths:
/usr/local/share/../../../bin/foo # /bin/foo
/usr/local/./share/../../../bin/foo # /bin/foo
./usr/../bin/foo # ./bin/foo
usr/../bin/foo # ./bin/foo
usr/../bin # ./bin
usr/../bin/.. # .
usr/../bin/../.. # ./..
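The test paths can be checked mechanically by wrapping the script in a small shell function. The sketch below assumes GNU sed; normalize_path is a hypothetical helper name, and the dot in the './' rule is escaped so that it matches only a literal '.'.

```shell
# normalize_path: hypothetical wrapper around the sed script above.
normalize_path() {
  printf '%s\n' "$1" | sed -r \
    -e 's!^([^./])!\./\1!' \
    -e 's!/\./!/!g' \
    -e ':a' \
    -e 's!/[^/]*[^/.]/\.\.!!g' \
    -e 't a'
}

# Print "input -> output" for a few of the test paths.
for p in /usr/local/share/bin/../../../bin/foo \
         /usr/local/./share/../../../bin/foo \
         usr/../bin/..; do
  printf '%s -> %s\n' "$p" "$(normalize_path "$p")"
done
```

The loop formed by :a and t a is what makes the pattern "recursive": each pass removes the innermost 'name/..' pairs, and t branches back as long as the previous pass changed something.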
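Hex conversion in .reg file
The command below joins continuation lines (a trailing backslash followed by indentation), wraps each line in an echo command, turns every byte of the hex: values into a \xNN escape (dropping 00 bytes and commas), and finally evaluates the result so that echo -e prints the decoded text: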
eval "$(sed -r ':a N; s/\\\n *//g; b a' mapi-utf8.reg | sed -r "s/(.*)/echo \'\1\'/; /hex:/s/echo/echo -e/" | sed -r '/hex:/{s/,00//g; s/([:,])([0-9a-f][0-9a-f])/\1\\x\2/g}; s/,//g')"
Find whole word matches only
Use \b, as in:
sed -rn '/\bWORD\b/p' myfile.txt
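To see what \b excludes, the sketch below feeds a few made-up lines through the same command; it assumes GNU sed, where \b is available as an extension.

```shell
# Only lines where WORD stands alone should be printed.
words=$(printf 'WORD\nSWORD\nWORDS\na WORD here\nPASSWORD\n' |
  sed -rn '/\bWORD\b/p')
printf '%s\n' "$words"
# WORD
# a WORD here
```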
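Concatenate C statements spanning multiple lines
Say we have C files in which some statements span multiple lines, and we want each of them back on a single line (for instance, to process them further). The script below skips #define lines and, for each line containing my_function, keeps appending the next line until the pattern space contains a ';':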
find -name "*.[ch]" -type f -print0|xargs -0 sed -r '/#define/b a; /my_function/{:b /;/b a;N;s/\n//; b b};:a'|grep my_function # To review result
find -name "*.[ch]" -type f -print0|xargs -0 sed -ri '/#define/b a; /my_function/{:b /;/b a;N;s/\n//; b b};:a' # To apply result in-place
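The same logic is easier to follow when split over several -e options with descriptive labels. This is a sketch on a made-up source file, assuming GNU sed; the labels join and end are renamed versions of b and a.

```shell
# Hypothetical C fragment with one call spread over three lines.
src=$(mktemp)
cat > "$src" <<'EOF'
#define CALL my_function(1, 2)
int r = my_function(a,
    b,
    c);
other();
EOF

joined=$(sed -r \
  -e '/#define/b end' \
  -e '/my_function/{' \
  -e ':join' \
  -e '/;/b end' \
  -e 'N' \
  -e 's/\n//' \
  -e 'b join' \
  -e '}' \
  -e ':end' "$src")
printf '%s\n' "$joined"
rm -f "$src"
```

Only the newline is removed, so the indentation of the continuation lines stays in the joined statement.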
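Match non-ASCII characters / Invalid collation character
sed matches characters according to the current locale, so in a UTF-8 locale a lone 8-bit byte that is not valid UTF-8 is not treated as a character [1], [2].
Here, in LANG=en_US.UTF-8, we see that the non-ASCII byte is not matched by '.':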
echo $'Copyright \xa9 1999' | sed -r 's/./x/g'
# xxxxxxxxxx�xxxxx
Trying to specify a non-ASCII range gives the error Invalid collation character:
echo $'Copyright \xa9 1999' | sed -r 's/[\d128-\d255]/x/g'
# sed: -e expression #1, char 19: Invalid collation character
We can bypass this issue by using an 8-bit character set, for instance iso-8859-1:
echo $'Copyright \xa9 1999' | LANG=iso-8859-1 sed -r 's/./x/g'
# xxxxxxxxxxxxxxxx
echo $'Copyright \xa9 1999' | LANG=iso-8859-1 sed -r 's/[\d128-\d255]/x/g'
# Copyright x 1999
Another solution is to set LC_COLLATE=C LC_CTYPE=C, which is generally advisable in shell scripts to avoid locale-related bugs [3]:
echo $'Copyright \xa9 1999' | LC_COLLATE=C LC_CTYPE=C sed -r 's/./x/g'
# xxxxxxxxxxxxxxxx
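Building on the C-locale approach, the following sketch strips every byte outside printable ASCII. It uses LC_ALL=C, which overrides both LC_COLLATE and LC_CTYPE; the bracket expression [^ -~] is an assumption chosen for this example and also removes tabs and other control characters.

```shell
# \251 is the octal form of byte 0xA9 (the ISO-8859-1 copyright sign).
ascii_only=$(printf 'Copyright \251 1999\n' | LC_ALL=C sed 's/[^ -~]//g')
printf '%s\n' "$ascii_only"
```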