Sed: Difference between revisions

Latest revision as of 16:28, 3 July 2021

References

Installation

It is recommended to add the following alias in your ~/.bashrc:

alias sed="sed -r"

Of course, this alias has no effect in shell scripts. There you'll have to specify the option explicitly at each invocation.

Usage

Some basic usage:

sed [OPTION]... {script-only-if-no-other-script} [input-file]...
sed -n                              # Silent - suppress automatic printing of pattern space
sed -r                              # Use extended regular expression
sed -i "s/foo/bar/" *.txt           # In-place file modification
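
As a quick illustration of the -n and -r options listed above, here is a minimal sketch run on made-up input. The sample text and the variable names are illustrative only, and GNU sed is assumed, as elsewhere on this page.

```shell
# Sample input; the content is made up for this illustration.
input='alpha
beta
gamma'

# -n suppresses automatic printing, so only lines selected by p are shown.
only_beta=$(printf '%s\n' "$input" | sed -n '/beta/p')

# Without -r, + is a literal character; with -r it means "one or more".
basic=$(printf 'aaa\n' | sed 's/a+/X/')
extended=$(printf 'aaa\n' | sed -r 's/a+/X/')

echo "$only_beta"   # beta
echo "$basic"       # aaa
echo "$extended"    # X
```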

Portable scripts / deal with locale

It is recommended to set environment variables LC_COLLATE and LC_CTYPE to C [1] to avoid bugs in shell scripts:

export LC_COLLATE=C LC_CTYPE=C

# Now the following line works as expected
echo $'Copyright \xa9 1999' | sed -r 's/./x/g'

Another solution is to set the environment variable LANG to a locale with an 8-bit character set, such as iso-8859-1.
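
When exporting the variables for the whole script is not desirable, the locale can also be set for a single sed invocation. The sketch below wraps this in a helper; the name sed_c is hypothetical, and LC_ALL is used here instead of LC_COLLATE and LC_CTYPE because it overrides both.

```shell
# Hypothetical helper: run sed in the C locale for this invocation only.
# LC_ALL takes precedence over LC_COLLATE, LC_CTYPE and LANG.
sed_c() {
    LC_ALL=C sed "$@"
}

# \251 is the octal form of byte 0xA9. In the C locale every byte is
# a character, so . matches it like any other.
result=$(printf 'Copyright \251 1999\n' | sed_c 's/./x/g')
echo "$result"   # xxxxxxxxxxxxxxxx
```

The rest of the script keeps whatever locale the user has configured.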

Commands `a`, `i` and `c`

The commands a\text, i\text and c\text respectively append text after, insert text before, and change the addressed line. The text is terminated by a *newline*. To insert a newline character within the text, use \n:

cat mytext
# First line
# Second line
cat mysedscript
# 1 {i\inserted text
# s/$/ (not anymore)/g}
sed -f mysedscript mytext
# inserted text
# First line (not anymore)
# Second line

All on one line: use echo -e to generate the newline that terminates the command i:

echo -e "1 {i\\inserted text\ns/$/ (not anymore)/g}"| sed -f - mytext
# inserted text
# First line (not anymore)
# Second line

Same result without command i:

sed "1 {s/^/inserted text\n/; s/$/ (not anymore)/}" mytext
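
The examples above only exercise i. As a sketch of the other two commands, the following uses the traditional form in which the text starts on the line after the backslash. The input is the same two lines as mytext, fed on standard input; the variable names are illustrative.

```shell
# a\ appends the text after the addressed line.
appended=$(printf 'First line\nSecond line\n' | sed '1a\
appended text')

# c\ replaces the addressed line with the text.
changed=$(printf 'First line\nSecond line\n' | sed '2c\
changed text')

echo "$appended"
# First line
# appended text
# Second line
echo "$changed"
# First line
# changed text
```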

Empty regular expression

The empty regular expression // matches the last regular expression used, so it does not have to be repeated (see [2]).
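
A minimal sketch of the empty regular expression, on made-up input: the address selects lines containing foo, and the empty pattern in the s command reuses that same expression.

```shell
# /foo/ selects the line; the empty pattern in s//bar/ reuses /foo/.
out=$(printf 'foo 1\nbaz\nfoo 2\n' | sed -n '/foo/s//bar/p')
echo "$out"
# bar 1
# bar 2
```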

Regular expressions

See Regular Expressions.

Script Examples

Remove <script>...</script> HTML tag

s!<script[>\x20\t].*</script>!!g
/<script[>\x20\t]/{
    s!<script[>\x20\t].*!!g
    :NEXTCYCLE
    n
    /<\/script>/!{
        s!.*!!g
        b NEXTCYCLE
    }
    s!.*</script>!!g
}

Remove newlines

Newline characters are added to the pattern space when using the append command N. The script below joins all input lines into a single line, replacing each newline with a space:

:a N
s/\n/ /g
b a

One-liner in bash, this time deleting the newlines outright:

sed -r ':a N; s/\n//; b a' FILE
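
A commonly seen variant, sketched here under the assumption of GNU sed, first collects all lines in the pattern space and performs the substitution only once, at the end. The sample input is made up.

```shell
# :a        define label a
# N         append the next input line to the pattern space
# $!ba      if this is not the last line, branch back to a
# s/\n/ /g  once everything is collected, replace each newline with a space
joined=$(printf 'a\nb\nc\n' | sed ':a;N;$!ba;s/\n/ /g')
echo "$joined"   # a b c
```

Both approaches hold the whole input in the pattern space, so they are best suited to inputs of moderate size.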

Remove trailing whitespace

find -name '*.[chs]' -print0 | xargs -r0 sed -e 's/[[:blank:]]\+$//' -i
ack-grep --text --type-set=pdf=.pdf --nopdf -f --print0 | xargs -r0 sed -r -i 's/\s+$//'
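
Before running a substitution with -i over a whole tree, it can be checked on a sample line. The sketch below uses [[:blank:]]* rather than \+ so that it does not depend on a GNU extension; the sample text is made up.

```shell
# Two spaces and a tab follow the semicolon in the sample line.
cleaned=$(printf 'int x;  \t\n' | sed 's/[[:blank:]]*$//')
printf '[%s]\n' "$cleaned"   # [int x;]
```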

Recursive patterns

For instance, to transform a path like /usr/local/share/bin/../../../bin/foo into /usr/bin/foo:

s!^([^./])!\./\1!             # Prefix with './' unless starts with '.' or '/'
s!/\./!/!g                    # Remove any './' in middle
:a
s!/[^/]*[^/.]/\.\.!!g         # Remove /foo/.. (1st letter must not be '/', last letter must not be '.')
t a                           # ... and repeat until no more substitutions

echo "/usr/local/share/bin/../../../bin/foo" | sed -r 's!^([^./])!\./\1!; s!/\./!/!g; :a s!/[^/]*[^/.]/\.\.!!g; t a'

Test paths:

/usr/local/share/../../../bin/foo           # /bin/foo
/usr/local/./share/../../../bin/foo         # /bin/foo
./usr/../bin/foo                            # ./bin/foo
usr/../bin/foo                              # ./bin/foo
usr/../bin                                  # ./bin
usr/../bin/..                               # .
usr/../bin/../..                            # ./..

hex conversion in .reg file

eval "$(sed -r ':a N; s/\\\n *//g; b a' mapi-utf8.reg | sed -r "s/(.*)/echo \'\1\'/; /hex:/s/echo/echo -e/" | sed -r '/hex:/{s/,00//g; s/([:,])([0-9a-f][0-9a-f])/\1\\x\2/g}; s/,//g')"

Find whole word matches only

Use \b, as in

sed -rn '/\bWORD\b/p' myfile.txt

Concatenate C commands spanning on multiple lines

Say we have some C files where some commands span multiple lines, and we want them back on a single line (for instance, to process them further). The following script joins lines, starting from one that contains my_function, until a semicolon is found, and skips #define lines:

find -name "*.[ch]" -type f -print0|xargs -0 sed -r '/#define/b a; /my_function/{:b /;/b a;N;s/\n//; b b};:a'|grep my_function     # To review result
find -name "*.[ch]" -type f -print0|xargs -0 sed -ri '/#define/b a; /my_function/{:b /;/b a;N;s/\n//; b b};:a'                     # To apply result in-place

Match non-ascii characters / invalid collation character

By default, in a UTF-8 locale, sed does not match bytes outside 7-bit ASCII that do not form a valid character [3], [4].

Here, in LANG=en_US.UTF-8, we see that the non-ASCII character is ignored:

echo $'Copyright \xa9 1999' | sed -r 's/./x/g'
# xxxxxxxxxx�xxxxx

Trying to give a non-ASCII range gives the error Invalid collation character:

echo $'Copyright \xa9 1999' | sed -r 's/[\d128-\d255]/x/g'
# sed: -e expression #1, char 19: Invalid collation character

We can bypass this issue by using an 8-bit character set, for instance iso-8859-1:

echo $'Copyright \xa9 1999' | LANG=iso-8859-1 sed -r 's/./x/g'
# xxxxxxxxxxxxxxxx
echo $'Copyright \xa9 1999' | LANG=iso-8859-1 sed -r 's/[\d128-\d255]/x/g'
# Copyright x 1999

Another solution is to set LC_COLLATE=C LC_CTYPE=C, which also avoids locale-related bugs in shell scripts [5]:

echo $'Copyright \xa9 1999' | LC_COLLATE=C LC_CTYPE=C sed -r 's/./x/g'
# xxxxxxxxxxxxxxxx

Delete the first matching line

From Stack Overflow:

# Delete first line matching 'foo'
sed '0,/foo/{//d}' inputfile              # Use 0,ADDR2, so that ADDR2 can match the 1st line

Note the special construction //d, which uses the empty regular expression [6] to match the last regular expression used.
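
To see why the 0,/foo/ address form matters when deleting the first matching line, the sketch below compares it with 1,/foo/ on made-up input whose first line already matches. It assumes GNU sed, since 0,ADDR2 is a GNU extension; the variable names are illustrative.

```shell
input='foo 1
bar
foo 2'

# 0,/foo/ ends at the first line matching foo, even if that is line 1.
first_only=$(printf '%s\n' "$input" | sed '0,/foo/{//d}')

# With 1,/foo/ the end address is only tested from line 2 onwards, so the
# range runs to the second match and both foo lines are deleted.
both=$(printf '%s\n' "$input" | sed '1,/foo/{/foo/d}')

echo "$first_only"
# bar
# foo 2
echo "$both"
# bar
```

The second command spells out /foo/ instead of // because, with a line-number start address, no regular expression has been used yet when line 1 is processed.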
