Sed
Latest revision as of 16:28, 3 July 2021
References
- The SED Homepage on SourceForge
- The SED FAQ
- The SED man page
- Sed, a stream editor
- The sed one-liners
Installation
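It is recommended to add the following alias to your ~/.bashrc, so that sed uses extended regular expressions by default:
alias sed="sed -r"
Of course, this alias has no effect in shell scripts, since aliases are not expanded in non-interactive shells by default. There you have to specify the option explicitly at each invocation.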
Usage
Some basic usage:
sed [OPTION]... {script-only-if-no-other-script} [input-file]...
sed -n # Silent - suppress automatic printing of pattern space
sed -r # Use extended regular expression
sed -i "s/foo/bar/" *.txt # In-place file modification
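A minimal sketch of the options above on throwaway input. The sample lines and the temporary file are made up for the illustration, and -i is used with GNU sed syntax:

```shell
# -n suppresses automatic printing, so only lines printed by 'p' appear
printf 'one\ntwo\nthree\n' | sed -n '2p'     # prints: two

# -i edits a file in place; here a temporary file created with mktemp
tmp=$(mktemp)
printf 'foo\nfoo bar foo\n' > "$tmp"
sed -i 's/foo/baz/' "$tmp"                  # without the g flag, only the 1st match per line
cat "$tmp"                                   # prints: baz, then: baz bar foo
rm -f "$tmp"
```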
Portable scripts / dealing with locale
It is recommended to set the environment variables LC_COLLATE and LC_CTYPE to C [1] to avoid bugs in shell scripts:
export LC_COLLATE=C LC_CTYPE=C
# Now the following line works as expected
echo $'Copyright \xa9 1999' | sed -r 's/./x/g'
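Another solution is to set the environment variable LANG to an 8-bit character set such as iso-8859-1 (this only takes effect if LC_ALL, LC_CTYPE and LC_COLLATE are not set, since they take precedence over LANG).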
Commands a, i and c
Use of the commands a\text (append), i\text (insert) and c\text (change). The text extends to the end of the line, so the command is terminated by a *newline*, as in this script file:
cat mytext
# First line
# Second line
cat mysedscript
# 1 {i\inserted text
# s/$/ (not anymore)/g}
sed -f mysedscript mytext
# inserted text
# First line (not anymore)
# Second line
All on one line: use echo -e to generate the newline that terminates the command i:
echo -e "1 {i\\inserted text\ns/$/ (not anymore)/g}" | sed -f - mytext
# inserted text
# First line (not anymore)
# Second line
Same result without the command i, using \n in the replacement to insert a newline:
sed "1 {s/^/inserted text\n/; s/$/ (not anymore)/}" mytext
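With GNU sed, the newline that terminates i can also come from splitting the script across several -e options, since GNU sed joins -e fragments with newlines. A sketch that reads the two sample lines from standard input instead of mytext:

```shell
# The first -e fragment ends right after the i text; the second one holds the s command
printf 'First line\nSecond line\n' |
  sed -e '1{i\inserted text' -e 's/$/ (not anymore)/}'
# inserted text
# First line (not anymore)
# Second line
```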
Empty regular expression
Using // matches the last regular expression used, without repeating it (see [2]).
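A small illustration on made-up input: the address /foo/ selects the lines, and the empty regex in s//bar/g reuses foo for the substitution:

```shell
# '//' repeats /foo/ from the address, so foo is replaced only on lines matching foo
printf 'foo 1\nbaz\nfoo foo\n' | sed '/foo/s//bar/g'
# bar 1
# baz
# bar bar
```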
Regular expressions
See Regular Expressions.
Script Examples
Remove <script>...</script> HTML tag
s!<script[>\x20\t].*</script>!!g
/<script[>\x20\t]/{
s!<script[>\x20\t].*!!g
:NEXTCYCLE
n
/<\/script>/!{
s!.*!!g
b NEXTCYCLE
}
s!.*</script>!!g
}
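One way to try the script above is to save it to a file and run it with sed -f. In this sketch the file is a temporary one and the HTML input is made up. Note that the lines inside a multi-line script block are emptied rather than deleted, so blank lines remain:

```shell
script=$(mktemp)   # hypothetical location for the script above
cat > "$script" <<'EOF'
s!<script[>\x20\t].*</script>!!g
/<script[>\x20\t]/{
s!<script[>\x20\t].*!!g
:NEXTCYCLE
n
/<\/script>/!{
s!.*!!g
b NEXTCYCLE
}
s!.*</script>!!g
}
EOF
printf 'a <script>x()</script> b\n<script>\nvar y;\n</script>\nc\n' | sed -f "$script"
# a  b
# (three empty lines)
# c
rm -f "$script"
```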
Remove newlines
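Newline characters are added to the pattern space by the command N, which appends the next input line. The script below replaces every newline with a space, joining all lines of standard input into a single line (with GNU sed, N prints the pattern space when it reaches the end of input):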
:a N
s/\n/ /g
b a
One-liner in bash (this variant deletes the newlines instead of replacing them with a space):
sed -r ':a N; s/\n//; b a' FILE
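A frequently used GNU sed variant of the same idea first accumulates the whole input and then substitutes once. $!ba branches back to the label a on every line except the last. Sample input is made up:

```shell
# :a labels the loop, N appends the next line, $!ba loops unless on the last line,
# then a single s replaces all embedded newlines with spaces
printf 'a\nb\nc\n' | sed ':a;N;$!ba;s/\n/ /g'
# a b c
```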
Remove trailing whitespace
find -name '*.[chs]' -print0 | xargs -r0 sed -e 's/[[:blank:]]\+$//' -i
ack-grep --text --type-set=pdf=.pdf --nopdf -f --print0 | xargs -r0 sed -r -i 's/\s+$//'
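The substitution itself can be checked on a stream before running it in place over many files. This is a sketch on made-up lines with trailing spaces and a tab:

```shell
# [[:blank:]] matches a space or a tab; '*$' removes the whole run at the end of the line
printf 'foo  \nbar\t\nbaz\n' | sed 's/[[:blank:]]*$//'
# foo
# bar
# baz
```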
Recursive patterns
For instance, to transform a path like /usr/local/share/bin/../../../bin/foo into /usr/bin/foo:
s!^([^./])!./\1!             # Prefix with './' unless starts with '.' or '/'
s!/\./!/!g                   # Remove any '/./' in the middle
:a s!/[^/]*[^/.]/\.\.!!g     # Remove /foo/.. (foo must not contain '/' nor end with '.')
t a                          # ... and repeat until no more substitutions
echo "/usr/local/share/bin/../../../bin/foo" | sed -r 's!^([^./])!./\1!; s!/\./!/!g; :a s!/[^/]*[^/.]/\.\.!!g; t a'
Test paths (expected result as comment):
/usr/local/share/../../../bin/foo # /bin/foo
/usr/local/./share/../../../bin/foo # /bin/foo
./usr/../bin/foo # ./bin/foo
usr/../bin/foo # ./bin/foo
usr/../bin # ./bin
usr/../bin/.. # .
usr/../bin/../.. # ./..
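A small loop to check some of these paths. It runs the substitutions written with ';' separators, and escapes the dot in the './' removal ('/\./') so that it matches a literal dot rather than any character:

```shell
for p in /usr/local/share/bin/../../../bin/foo /usr/local/./share/../../../bin/foo usr/../bin/..; do
  printf '%s\n' "$p" | sed -r 's!^([^./])!./\1!; s!/\./!/!g; :a; s!/[^/]*[^/.]/\.\.!!g; t a'
done
# /usr/bin/foo
# /bin/foo
# .
```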
Hex conversion in .reg file
eval "$(sed -r ':a N; s/\\\n *//g; b a' mapi-utf8.reg | sed -r "s/(.*)/echo \'\1\'/; /hex:/s/echo/echo -e/" | sed -r '/hex:/{s/,00//g; s/([:,])([0-9a-f][0-9a-f])/\1\\x\2/g}; s/,//g')"
Find whole word matches only
Use \b (word boundary, a GNU extension), as in:
sed -rn '/\bWORD\b/p' myfile.txt
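For example, on some made-up lines, WORDS and KEYWORD are not printed because WORD is not delimited by word boundaries there:

```shell
printf 'WORD\nWORDS\nsome WORD here\nKEYWORD\n' | sed -rn '/\bWORD\b/p'
# WORD
# some WORD here
```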
Concatenate C statements spanning multiple lines
Say we have a C file where some statements span multiple lines, and we want them back on a single line (for instance, to process them further). The following script joins each line containing my_function with the next lines until a ';' is reached, and leaves #define lines untouched:
find -name "*.[ch]" -type f -print0|xargs -0 sed -r '/#define/b a; /my_function/{:b /;/b a;N;s/\n//; b b};:a'|grep my_function # To review result
find -name "*.[ch]" -type f -print0|xargs -0 sed -ri '/#define/b a; /my_function/{:b /;/b a;N;s/\n//; b b};:a' # To apply result in-place
Match non-ascii characters / invalid collation character
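By default (in a UTF-8 locale) sed does not match bytes that are not valid characters in the locale's encoding, such as the single 8-bit byte \xa9 [3], [4].
Here, with LANG=en_US.UTF-8, we see that the non-ASCII byte is not matched by .: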
echo $'Copyright \xa9 1999' | sed -r 's/./x/g'
# xxxxxxxxxx�xxxxx
Trying to give a non-ASCII range gives the error Invalid collation character:
echo $'Copyright \xa9 1999' | sed -r 's/[\d128-\d255]/x/g'
# sed: -e expression #1, char 19: Invalid collation character
We can bypass this issue by using an 8-bit character set, for instance iso-8859-1:
echo $'Copyright \xa9 1999' | LANG=iso-8859-1 sed -r 's/./x/g'
# xxxxxxxxxxxxxxxx
echo $'Copyright \xa9 1999' | LANG=iso-8859-1 sed -r 's/[\d128-\d255]/x/g'
# Copyright x 1999
Another solution is to set LC_COLLATE=C LC_CTYPE=C, which avoids such bugs in shell scripts [5]:
echo $'Copyright \xa9 1999' | LC_COLLATE=C LC_CTYPE=C sed -r 's/./x/g'
# xxxxxxxxxxxxxxxx
Delete the first matching line
From Stack Overflow:
# Delete first line matching 'foo'
sed '0,/foo/{//d}' inputfile # Use 0,ADDR2, so that ADDR2 can match the 1st line
Note the special construction //d, which uses the empty regular expression [6] to match the last regular expression used (here /foo/).
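A quick check on made-up input: only the first foo line is deleted, and because the range starts at 0, a match on the very first line is deleted too:

```shell
printf 'a\nfoo 1\nb\nfoo 2\n' | sed '0,/foo/{//d}'
# a
# b
# foo 2
printf 'foo 1\nfoo 2\n' | sed '0,/foo/{//d}'
# foo 2
```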