Replacing a multi-line pattern with sed

Introduction

Let’s define the problem with an example. This is the kind of text I want to modify.

# include <stdio.h>
void main(int argc, char ** argv) {
	printf("Hello world!\n");
	printf("Version 2.0\n\
			You can use this program as you want.\n");
	return 0;
}

This is a C program, and I want to use gettext to provide translated versions of it easily. The usual solution is to wrap each string in a call to a function (the string being the lookup key). I will not go into the details of gettext here, but in short, we want something like this as the result.

# include <stdio.h>
void main(int argc, char ** argv) {
	printf(_("Hello world!\n"));
	printf(_("Version 2.0\n\
			You can use this program as you want.\n"));
	return 0;
}

Note that each string has been wrapped in the function _(), which is the usual convention for gettext. If you want the quick solution, go directly to the end of this article.

Brief introduction to sed

To understand the explanations that follow, you need at least the basics of sed. You can read this section, but I strongly encourage you to read the official sed documentation as well, which is well written.

Spaces

The first thing to know is that sed processes its input one line at a time. Each line is put into the pattern space, actions are applied to it, and finally the pattern space is printed (unless sed is run with the -n option, which turns this automatic printing off).

However, there is also a hold space where you can store information. If you read the sed documentation, you’ll see that there are actions which copy from one space to the other. This is fundamental for multi-line pattern replacement, since sed is mostly designed for line-by-line editing. Keep that in mind.
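
As a minimal sketch of the two spaces (the two-line input is made up for illustration), the following keeps the first line in the hold space with h and brings it back with g once the second line has been printed. It runs with -n so that only the explicit p actions produce output.

```shell
# Line 1: h copies it into the hold space (nothing is printed because of -n).
# Line 2: p prints it, g overwrites the pattern space with the held line,
#         and the second p prints that held line.
swapped=$(printf 'first\nsecond\n' | sed -n '1h; 2{p;g;p;}')
printf '%s\n' "$swapped"
```

The two lines come out in reverse order, which is only possible because the first line survived in the hold space while sed moved on to the second one.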

sed with a script file

There are two main ways of running sed. You can either give the actions as command-line arguments or put them in a sed script. For simple commands, the command line is quicker than creating a script file and passing it to sed. For complex commands, however, you may prefer to work in a script file that you can test, improve and correct. As an example, let’s do a simple replacement and deletion and see how to run it both ways.

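On the command line, you would do it this way.

	sed -e 's/PATTERN/TEXTIWANT/' -e '/OTHERPATTERN/d' my_file.txt

Each action is introduced by its own -e; add another -e for each new action. As soon as one -e is used, every action needs one, otherwise sed takes the bare argument for an input file name.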

For the script-file solution, you would create a file, my_file.sed for example, that contains the actions.

	s/PATTERN/TEXTIWANT/
	/OTHERPATTERN/d

Note that the quotes are not needed in the script file. Now, you can run the script.

	sed -f my_file.sed my_file.txt
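
To check that both ways of running sed behave the same, here is a small self-contained sketch. The sample text and the two concrete patterns (colour and ^#) are made up for the demonstration and stand in for PATTERN and OTHERPATTERN.

```shell
# Hypothetical sample data and patterns, for illustration only.
workdir=$(mktemp -d)
printf '%s\n' '# a comment' 'favourite colour' 'plain line' > "$workdir/my_file.txt"

# The same two actions, one per line, stored in a script file.
printf '%s\n' 's/colour/color/' '/^#/d' > "$workdir/my_file.sed"

# Command-line form: one -e per action.
from_cli=$(sed -e 's/colour/color/' -e '/^#/d' "$workdir/my_file.txt")
# Script-file form.
from_file=$(sed -f "$workdir/my_file.sed" "$workdir/my_file.txt")

printf '%s\n' "$from_cli"
[ "$from_cli" = "$from_file" ] && echo 'both forms agree'
rm -r "$workdir"
```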

Actions on target

In sed, most actions can be restricted to specific lines. For example, in the previous section, we used the d action to delete lines, prefixed with the pattern that triggers it. Such a prefix is called an address, and it can take several forms:

  • n selects the n-th line of the input text

  • $ selects the last line of the input text

  • /PATTERN/ selects the lines containing PATTERN

You can also select a range by giving two of these addresses separated by a comma. For example, the following deletes every line from a line containing BPATTERN through the next line containing EPATTERN.

	/BPATTERN/,/EPATTERN/d
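
The three kinds of address and a range can be tried on a throwaway input. This is only a sketch; the five-line input and the patterns two and four are made up for the occasion.

```shell
input=$(printf '%s\n' one two three four five)

# Numeric address: delete only the 2nd line.
by_number=$(printf '%s\n' "$input" | sed '2d')
# $ address: delete only the last line.
by_last=$(printf '%s\n' "$input" | sed '$d')
# Range: delete from the line matching "two" through the next line matching "four".
by_range=$(printf '%s\n' "$input" | sed '/two/,/four/d')

printf 'by number: %s\n' "$(printf '%s' "$by_number" | tr '\n' ' ')"
printf 'by last:   %s\n' "$(printf '%s' "$by_last" | tr '\n' ' ')"
printf 'by range:  %s\n' "$(printf '%s' "$by_range" | tr '\n' ' ')"
```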

Group actions

Sometimes, you need to apply more than one action to the same selected lines. In this case, you can group the actions between { and }. For example, suppose that between BPATTERN and EPATTERN there are lines you want to delete (those containing BAR) and others you want to modify (changing PATT into OUT).

	/BPATTERN/,/EPATTERN/ {
		/BAR/d
		s/PATT/OUT/g
	}
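
Here is a runnable sketch of the same idea. To keep the output easy to read, it uses made-up markers and patterns (BEGIN and END for the range, debug for the lines to delete, foo to bar for the substitution) instead of the placeholders above.

```shell
grouped=$(printf '%s\n' 'foo outside' 'BEGIN' 'debug foo' 'foo inside' 'END' 'foo after' |
  sed '/BEGIN/,/END/ {
    /debug/d
    s/foo/bar/g
  }')
printf '%s\n' "$grouped"
```

Only the lines inside the range are touched: the debug line disappears and foo becomes bar there, while the foo lines before BEGIN and after END stay as they are.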

Multi-line pattern

You should now have the basics needed to understand what follows. These are the steps that make it possible:

  • Print every line that is not part of our pattern as is

  • Put the line that contains the beginning of our pattern into the hold space

  • Append all lines that are not the beginning of our pattern to the hold space

  • When the last line of the pattern is reached, and only then:

    • Copy what’s in the hold space into the pattern space (it should contain all the lines we need)

    • Do the replacement

    • Print the result

The pattern

The pattern is simple: we know where it begins and where it ends. The start of the pattern is /("/ and the end is /");/. Our pattern can therefore be defined as a range of lines, as follows.

	/("/,/");/

Print lines not in the pattern

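We know how to select our pattern. However, we don’t yet know how to select all the lines that are not in the pattern. sed provides a way to do that with the character !. Placing this character between the address and the action applies the action to all the lines that are not matched by the pattern or range.

The action to print the current line is p. Because sed already prints every line on its own by default, p is meant to be combined with the -n option, which turns automatic printing off; otherwise each selected line would appear twice. Printing the lines that are not in our pattern can be done as follows.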

	/("/,/");/!p
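
As a quick check of the negated range, this sketch feeds sed a made-up four-line input in which a string literal spans the two middle lines. With -n, only the lines outside the range are printed.

```shell
outside=$(printf '%s\n' 'int x;' 'puts("a' 'b");' 'return 0;' |
  sed -n '/("/,/");/!p')
printf '%s\n' "$outside"
```

The two lines that make up the string literal are left out, which is what we want at this stage: they will be handled separately.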

Filling the hold space

To fill the hold space, there are two possible actions (there is a third one, x, but we will not discuss it):

  • h will replace the hold space with the content of the pattern space (usually the current line).

  • H will append to the hold space what’s in the pattern space

When we reach the beginning of our pattern, we call the h action. Otherwise, we append the current line to the hold space with H.

	/("/h
	/("/!H

In our example of 7 lines, the hold space will successively have the following values (you may use g then l to see it).

  1. \n# include <stdio.h>

  2. \n# include <stdio.h>\nvoid main(int argc, char ** argv) {

  3. \tprintf("Hello world!\\n");

  4. \tprintf("Version 2.0\\n\\

  5. \tprintf("Version 2.0\\n\\\n\t\t\tYou can use this program as you want.\\n");

  6. \tprintf("Version 2.0\\n\\\n\t\t\tYou can use this program as you want.\\n");\n\treturn 0;

  7. \tprintf("Version 2.0\\n\\\n\t\t\tYou can use this program as you want.\\n");\n\treturn 0;\n}

Note that l displays backslashes, tabs and embedded newlines as escape sequences (\\, \t and \n). The leading \n in the first two values is there because H always inserts a newline before the text it appends, even when the hold space is empty.
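
To watch the hold space evolve yourself, the g then l trick can be sketched as follows. The four-line input is made up and kept short so that l does not wrap its output.

```shell
# After each line: update the hold space (h or H), copy it into the
# pattern space with g, then show it unambiguously with l.
trace=$(printf '%s\n' 'a' 'f("x' 'y");' 'b' |
  sed -n -e '/("/h' -e '/("/!H' -e 'g' -e 'l')
printf '%s\n' "$trace"
```

Each output line is one snapshot of the hold space, terminated by the $ that l adds. The snapshot taken at the line containing ("  starts afresh, and the following ones grow by one \n-separated line each.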

Do the replacement

Now, we want to do the replacement when the last line of our pattern is reached. To confine this modification, we first nest all the following operations inside the desired range.

	/("/,/");/ {
		...
	}

Then, when we reach the end of our pattern, we create a new block where the actions will take place.

	/("/,/");/ {
		/");/ {
			...
		}
	}

The actions will be the following. First, we restore (with g) the content of the hold space into the pattern space (where actions take place). Then we do the replacement. And finally, we print the result.

	/("/,/");/ {
		/");/ {
			g
			s/(\("[^"]*"\))/(_(\1))/g
			p
		}
	}
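
The reason the substitution works on several lines at once is that [^"]* also matches the newlines stored in the pattern space. The following sketch shows this on a made-up two-line input, gathering both lines by hand with h, H and g before substituting.

```shell
wrapped=$(printf '%s\n' 'f("a' 'b");' | sed -n '
  1h
  2{
    H
    g
    s/(\("[^"]*"\))/(_(\1))/g
    p
  }')
printf '%s\n' "$wrapped"
```

Keep in mind that this expression assumes the string contains no escaped quote (\"); such a string would need a more careful pattern.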

Summary

First, we have the following file main.c.

# include <stdio.h>
void main(int argc, char ** argv) {
	printf("Hello world!\n");
	printf("Version 2.0\n\
			You can use this program as you want.\n");
	return 0;
}

Then we have the main.sed script.

/("/,/");/!p
/("/h
/("/!H
/("/,/");/ {
	/");/ {
		g
		s/(\("[^"]*"\))/(_(\1))/g
		p
	}
}

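We can run the following command. The -n option is required: the script prints everything it needs with p, so sed’s automatic printing must be turned off, otherwise lines would be duplicated.

	sed -n -f main.sed main.c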

Then we should obtain the following result.

# include <stdio.h>
void main(int argc, char ** argv) {
	printf(_("Hello world!\n"));
	printf(_("Version 2.0\n\
			You can use this program as you want.\n"));
	return 0;
}
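
To try the whole thing without creating the files by hand, the following sketch writes both files into a temporary directory (an arbitrary choice for the demonstration) and runs the script on them. It passes -n to sed so that only the explicit p actions produce output.

```shell
workdir=$(mktemp -d)

# The C source to transform (quoted here-document: backslashes are kept as is).
cat > "$workdir/main.c" <<'EOF'
# include <stdio.h>
void main(int argc, char ** argv) {
	printf("Hello world!\n");
	printf("Version 2.0\n\
			You can use this program as you want.\n");
	return 0;
}
EOF

# The sed script from this article.
cat > "$workdir/main.sed" <<'EOF'
/("/,/");/!p
/("/h
/("/!H
/("/,/");/ {
	/");/ {
		g
		s/(\("[^"]*"\))/(_(\1))/g
		p
	}
}
EOF

actual=$(sed -n -f "$workdir/main.sed" "$workdir/main.c")
printf '%s\n' "$actual"
rm -r "$workdir"
```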
