Introduction
Let’s define the problem with an example. This is the kind of text I want to modify:
# include <stdio.h>
int main(int argc, char ** argv) {
	printf("Hello world!\n");
	printf("Version 2.0\n\
		You can use this program as you want.\n");
	return 0;
}
This is a C program, and I want to use gettext so that translated versions of it are easy to provide. The usual approach is to wrap each string in a call to a function, with the string itself serving as the lookup key. I won't go into the details of gettext here; in short, we want the following result:
# include <stdio.h>
int main(int argc, char ** argv) {
	printf(_("Hello world!\n"));
	printf(_("Version 2.0\n\
		You can use this program as you want.\n"));
	return 0;
}
Note that each string has been wrapped in a call to the function _(), which is the usual convention for gettext. If you want the quick solution, go directly to the end of this article.
Brief introduction to sed
To understand the explanations that follow, you need at least the basics of sed. You can read this section, but I strongly encourage you to read the official sed documentation, which is well written.
Spaces
The first thing to know is that sed parses its input line by line. Each line read is put into the pattern space, commands are applied to it, and finally the pattern space is printed (unless automatic printing is turned off with -n).
However, there is also a hold space where you can store information. If you read the sed documentation, you’ll see that some commands copy data from one space to the other. This will be fundamental for multi-line pattern replacement, since sed is mostly meant for line-by-line editing. Keep that in mind.
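To make the two spaces concrete, here is a small sketch (assuming a POSIX or GNU sed) that stores every line in the hold space with H and, on the last line ($), swaps the two spaces with x and prints everything joined on one line. The sample input and the variable name joined are only illustrative.

```shell
#!/bin/sh
# Sketch: accumulate every line in the hold space, then print them joined.
#   H         append a newline plus the pattern space to the hold space
#   ${ ... }  run the grouped commands on the last line only
#   x         exchange the pattern space and the hold space
#   s/^\n//   drop the newline that H put before the very first line
#   s/\n/ /g  replace the remaining embedded newlines with spaces
#   p         print the result (-n turns off automatic printing)
joined=$(printf 'one\ntwo\nthree\n' | sed -n 'H
${
x
s/^\n//
s/\n/ /g
p
}')
printf '%s\n' "$joined"
```

Running it prints one two three.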
sed with a script file
There are two main ways of running sed: you can give the commands as command-line arguments, or put them in a sed script. For simple commands, you may prefer the command line, since it’s faster than creating a script file and passing it to sed. For complex commands, however, you may want to work in a script file so you can test, improve and correct it. As an example, let’s do a simple replacement and deletion and see how to run it each way.
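On the command line, you would do it this way:
sed -e 's/PATTERN/TEXTIWANT/' -e '/OTHERPATTERN/d' my_file.txt
Each command gets its own -e option, including the first one: as soon as any -e is given, sed treats the other arguments as input files, so an unmarked 's/PATTERN/TEXTIWANT/' would be read as a file name.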
For the script-file solution, you would create a file, my_file.sed for example, containing the commands:
s/PATTERN/TEXTIWANT/
/OTHERPATTERN/d
Note that no quotes are needed in the script file. You can now run the script:
sed -f my_file.sed my_file.txt
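As a quick check that both forms are equivalent, the sketch below runs the same two commands with -e and with -f. It uses the concrete commands s/cat/dog/ and /delete me/d as illustrative stand-ins, because with the literal placeholders s/PATTERN/TEXTIWANT/ would also rewrite OTHERPATTERN before d gets to see it. The temporary directory is just a convenience.

```shell
#!/bin/sh
# Sketch: the same two commands given with -e and read from a script file.
# The sample text and commands are illustrative stand-ins for PATTERN etc.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf 'my cat\ndelete me\nthe end\n' > "$dir/my_file.txt"

# Command-line form: each command gets its own -e option.
cli=$(sed -e 's/cat/dog/' -e '/delete me/d' "$dir/my_file.txt")

# Script-file form: one command per line, no shell quoting inside the file.
cat > "$dir/my_file.sed" <<'EOF'
s/cat/dog/
/delete me/d
EOF
script=$(sed -f "$dir/my_file.sed" "$dir/my_file.txt")

printf '%s\n' "$cli"
if [ "$cli" = "$script" ]; then echo "both forms agree"; fi
```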
Actions on target
In sed, most commands can be restricted to specific lines. For example, in the previous section we used the d command to delete lines, and we prefixed it with the pattern that should trigger it. Such a prefix is called an address and can take the following forms:
- n (a line number) selects the nth line of the input text
- $ selects the last line of the input text
- /PATTERN/ selects the lines containing PATTERN
You can also select a range by giving two of these addresses separated by a comma. For example, the following deletes every line from a line containing BPATTERN through the next line containing EPATTERN, both included:
/BPATTERN/,/EPATTERN/d
Group actions
Sometimes you need to apply more than one command to the same selected lines. In this case, you can group the commands between { and }. For example, suppose that between BPATTERN and EPATTERN there are lines you want to delete (those containing BAR) and others you want to modify (changing PATT into OUT):
/BPATTERN/,/EPATTERN/ {
/BAR/d
s/PATT/OUT/g
}
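Here is a sketch of the grouping in action. It uses begin, end and foo/bar in place of BPATTERN, EPATTERN and PATT/OUT, because the placeholder names overlap: PATT is a substring of BPATTERN, so s/PATT/OUT/g would also rewrite the range markers themselves.

```shell
#!/bin/sh
# Sketch: the grouped commands with illustrative names
# (begin/end for BPATTERN/EPATTERN, foo/bar for PATT/OUT).
out=$(printf '%s\n' 'foo before' 'begin' 'BAR line' 'foo and foo' 'end' 'foo after' |
  sed '/begin/,/end/ {
/BAR/d
s/foo/bar/g
}')
# Lines outside the range keep "foo"; inside it, the BAR line is gone.
printf '%s\n' "$out"
```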
Multi-line patterns
Now you should have the basics to understand what follows. Here are the steps that make it possible:
- Print every line outside our pattern unchanged
- Put the line that contains the beginning of our pattern into the hold space
- Append every line that is not the beginning of our pattern to the hold space
- Only within our pattern, and only once its last line is reached, perform the remaining steps
- Copy the hold space into the pattern space (it should then contain all the lines we need)
- Do the replacement
- Print the result
The pattern
The pattern is simple: we know where it begins and where it ends. It starts with /("/ and ends with /");/, so it can be defined as a range of lines:
/("/,/");/
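It can help to check which lines this range actually selects. The sketch below writes a copy of the example program (indented with spaces rather than tabs) to a temporary file and prints the number of each selected line with the = command. Line 3 both opens the range and contains "); but sed only looks for the end of a range on the lines after the one that opened it, so the range runs from line 3 to line 5. This is why the inner /");/ test used later in the replacement step also fires on line 3.

```shell
#!/bin/sh
# Sketch: show which line numbers the range /("/,/");/ selects.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Copy of the example program, indented with spaces.
cat > "$dir/main.c" <<'EOF'
# include <stdio.h>
int main(int argc, char ** argv) {
    printf("Hello world!\n");
    printf("Version 2.0\n\
        You can use this program as you want.\n");
    return 0;
}
EOF

# = prints the current line number; -n suppresses everything else.
selected=$(sed -n '/("/,/");/{
=
}' "$dir/main.c")
printf '%s\n' "$selected"
```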
Print lines not in the pattern
We know how to select our pattern, but not yet how to select all the lines that are not in it. sed provides a way to do that with the ! character: placed between an address and a command, it negates the address, so the command applies to every line that does not match the pattern or range.
The command to print the current line is p. Since we now print lines ourselves, sed has to be run with the -n option, which turns off the automatic printing of the pattern space; otherwise the lines outside our pattern would be printed twice. Printing the lines that are not in our pattern can be done like this:
/("/,/");/!p
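The effect of -n can be seen by running this command on the example program with and without it. In this sketch, with -n only lines 1, 2, 6 and 7 are printed; without it, sed prints all 7 lines automatically plus the 4 lines printed by p, 11 lines in total. The temporary file is only for illustration.

```shell
#!/bin/sh
# Sketch: the same !p command with and without -n.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/main.c" <<'EOF'
# include <stdio.h>
int main(int argc, char ** argv) {
    printf("Hello world!\n");
    printf("Version 2.0\n\
        You can use this program as you want.\n");
    return 0;
}
EOF

# With -n: only the lines outside the range are printed, once each.
printed=$(sed -n '/("/,/");/!p' "$dir/main.c")
# Without -n: all 7 lines are auto-printed, plus 4 extra copies from p.
total=$(sed '/("/,/");/!p' "$dir/main.c" | wc -l | tr -d ' ')

printf '%s\n' "$printed"
echo "lines printed without -n: $total"
```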
Filling the hold space
There are two commands to fill the hold space (there is a third one, x, which exchanges the two spaces, but we will not discuss it):
- h replaces the hold space with the content of the pattern space (usually the current line)
- H appends a newline, then the content of the pattern space, to the hold space
When we reach the beginning of our pattern, we call h. Otherwise, we append the current line to the hold space with H:
/("/h
/("/!H
In our 7-line example, the hold space successively takes the following values (to see them, you can append g then l to the script):
- \n# include <stdio.h>
- \n# include <stdio.h>\nint main(int argc, char ** argv) {
- \tprintf("Hello world!\\n");
- \tprintf("Version 2.0\\n\\
- \tprintf("Version 2.0\\n\\\n\t\tYou can use this program as you want.\\n");
- \tprintf("Version 2.0\\n\\\n\t\tYou can use this program as you want.\\n");\n\treturn 0;
- \tprintf("Version 2.0\\n\\\n\t\tYou can use this program as you want.\\n");\n\treturn 0;\n}
Note that l displays backslashes as \\, tabs as \t and embedded newlines as \n. The leading \n in the first two values comes from H, which adds a newline before appending even when the hold space is empty. The fourth line also contains (", so h starts the hold space over from it.
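The trace above can be reproduced on a shorter input, which keeps the l output from being folded at the line-length limit. This sketch appends g and l to the two hold-space commands, as suggested above; the four input lines are made up for illustration.

```shell
#!/bin/sh
# Sketch: after each input line, copy the hold space into the pattern
# space (g) and display it unambiguously (l) to follow how it evolves.
trace=$(printf '%s\n' 'a' 'f("x");' 'g("y' 'z");' | sed -n '/("/h
/("/!H
g
l')
printf '%s\n' "$trace"
```

It should print \na$, then f("x");$, g("y$ and g("y\nz");$: l marks the end of each line with $, and the first value again starts with the newline added by H.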
Do the replacement
Now, we want to do the replacement when the last line of our pattern is reached. To confine this change to the pattern, we first nest all the following commands inside the desired range:
/("/,/");/ {
...
}
Then, when we reach the end of our pattern, we open a nested block where the commands will take place:
/("/,/");/ {
/");/ {
...
}
}
The commands are the following. First, we copy the content of the hold space into the pattern space (where commands operate) with g. Then we do the replacement. Finally, we print the result:
/("/,/");/ {
/");/ {
g
s/(\("[^"]*"\))/(_(\1))/g
p
}
}
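The key point of the substitution is that [^"]* also matches the newline embedded in the pattern space, so a string spread over two lines is wrapped as a whole. The sketch below isolates that step on a made-up two-line input, collecting the lines with h/H and running g, s and p on the last line. One limitation to keep in mind: a string containing an escaped quote (\") is not matched by this regex and would be left unwrapped.

```shell
#!/bin/sh
# Sketch: the substitution applied to a two-line pattern space.
#   1h       first line: start the hold space with it
#   1!H      other lines: append them (newline-separated)
#   ${ ... } last line: copy the hold space back (g), wrap, print
wrapped=$(printf '%s\n' 'puts("first' 'second");' | sed -n '1h
1!H
${
g
s/(\("[^"]*"\))/(_(\1))/g
p
}')
printf '%s\n' "$wrapped"
```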
Summary
First, we have the following file, main.c:
# include <stdio.h>
int main(int argc, char ** argv) {
	printf("Hello world!\n");
	printf("Version 2.0\n\
		You can use this program as you want.\n");
	return 0;
}
Then we have the main.sed script:
/("/,/");/!p
/("/h
/("/!H
/("/,/");/ {
/");/ {
g
s/(\("[^"]*"\))/(_(\1))/g
p
}
}
We run it with the following command; the -n option is required, since the script prints every line itself with p:
sed -n -f main.sed main.c
We should then obtain the following result:
# include <stdio.h>
int main(int argc, char ** argv) {
	printf(_("Hello world!\n"));
	printf(_("Version 2.0\n\
		You can use this program as you want.\n"));
	return 0;
}
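To check the whole recipe end to end, this sketch recreates main.c (indented with spaces instead of tabs) and main.sed in a temporary directory, runs sed -n -f main.sed main.c and compares the output with the expected result. It assumes a POSIX or GNU sed and the mktemp utility.

```shell
#!/bin/sh
# Sketch: run the complete main.sed on a copy of main.c.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

cat > "$dir/main.c" <<'EOF'
# include <stdio.h>
int main(int argc, char ** argv) {
    printf("Hello world!\n");
    printf("Version 2.0\n\
        You can use this program as you want.\n");
    return 0;
}
EOF

cat > "$dir/main.sed" <<'EOF'
/("/,/");/!p
/("/h
/("/!H
/("/,/");/ {
/");/ {
g
s/(\("[^"]*"\))/(_(\1))/g
p
}
}
EOF

# -n is required: the script prints every line itself with p.
result=$(sed -n -f "$dir/main.sed" "$dir/main.c")
printf '%s\n' "$result"
```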