The Concept of 'Hold Space' and 'Pattern Space' in Sed

When sed reads a file line by line, the line currently being read is placed into the pattern buffer (pattern space). The pattern buffer is like a temporary buffer, the scratchpad where the current information is stored. When you tell sed to print, it prints the pattern buffer.

The hold buffer (hold space) is like long-term storage: you can catch something, store it, and reuse it later when sed is processing another line. You do not process the hold space directly; instead, you need to copy or append it to the pattern space if you want to do something with it. For example, the print command p prints the pattern space only. Likewise, s operates on the pattern space.

Here is an example:

sed -n '1!G;h;$p'

(the -n option suppresses automatic printing of lines)

There are three commands here: 1!G, h and $p. 1!G has an address, 1 (first line), but the ! means that the command will be executed everywhere but on the first line. $p on the other hand will only be executed on the last line. So what happens is this:

  1. The first line is read and placed automatically into the pattern space.
  2. On the first line, the first command (G) is not executed; h copies the first line into the hold space.
  3. Now the second line replaces whatever was in the pattern space.
  4. On the second line, we first execute G, appending the contents of the hold buffer to the pattern buffer, separated by a newline. The pattern space now contains the second line, a newline, and the first line.
  5. Then the h command copies the concatenated contents of the pattern buffer into the hold space, which now holds lines two and one in reverse order.
  6. We proceed to line number three and go back to step 3 above.

Finally, after the last line has been read and the hold space (containing all the previous lines in reverse order) has been appended to the pattern space, the pattern space is printed with p. As you may have guessed, the above does exactly what the tac command does: it prints the file in reverse.
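A quick way to check the walkthrough above, assuming any POSIX-compatible sed and a small made-up three-line input, is to run the script and look at the order of the output lines:

```shell
# 1!G : on every line except the first, append a newline plus the hold space
# h   : copy the (growing) pattern space back into the hold space
# $p  : on the last line only, print the accumulated, reversed text
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"   # three, two, one (one per line)
```

Because -n is in effect, nothing is printed until $p fires on the last line, so the intermediate pattern spaces never reach the output.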

Understanding sed hold-space workflow

You almost have it. Here is what your script does:

/[0-9]+/h   # if the line contains a number, save it to the hold space
x           # swap the contents of the pattern space and the hold space
$p          # on the last line, print the pattern space

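You save the line to the hold space and then swap it back into the pattern space. Because x runs on every line, whether or not h ran, the hold space ends up holding the previous line rather than the last line containing a number. The contents of the pattern space and hold space can be illustrated like this: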

Line  Command      Pattern Space  Hold Space
~~~~  ~~~~~~~~~~~  ~~~~~~~~~~~~~  ~~~~~~~~~~
1     /[0-9]+/h    dog
1     x                           dog
2     /[0-9]+/h    lion 34        lion 34
2     x            lion 34        lion 34
3     /[0-9]+/h    elephant       lion 34
3     x            lion 34        elephant
4     /[0-9]+/h    tiger 7        tiger 7
4     x            tiger 7        tiger 7
.
.
.
$     /[0-9]+/h    cat            geopard
$     x            geopard        cat
$     p            geopard        cat

What you really want is to swap the contents only when the last line of the input file is reached. You can do this by grouping the x and p commands under the $ address:

gsed -n -r '/[0-9]+/h; $ {x;p}' testfile

Output:

hippo 9991

The corresponding pattern space and hold space sequence is now:

Line  Command      Pattern Space  Hold Space
~~~~  ~~~~~~~~~~~  ~~~~~~~~~~~~~  ~~~~~~~~~~
1     /[0-9]+/h    dog
2     /[0-9]+/h    lion 34        lion 34
3     /[0-9]+/h    elephant       lion 34
4     /[0-9]+/h    tiger 7        tiger 7
.
.
.
$     /[0-9]+/h    cat            hippo 9991
$     x            hippo 9991     cat
$     p            hippo 9991     cat
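To compare both versions side by side, here is a minimal sketch that runs the original script and the grouped fix on a hypothetical input modeled on the two tables (the real testfile is not shown, so these lines are an assumption). It spells [0-9]+ as the basic regular expression [0-9][0-9]* so that neither -r nor gsed is required:

```shell
# Hypothetical input: chosen to reproduce the tables above; the actual
# testfile is not shown in the question.
input='dog
lion 34
elephant
tiger 7
hippo 9991
geopard
cat'

# Original script: x runs on every line, so on the last line the pattern
# space receives whatever the previous line left in the hold space.
buggy=$(printf '%s\n' "$input" | sed -n '/[0-9][0-9]*/h;x;$p')

# Fixed script: x and p are grouped under $, so the swap happens only once.
fixed=$(printf '%s\n' "$input" | sed -n '/[0-9][0-9]*/h;${x;p;}')

printf 'buggy: %s\nfixed: %s\n' "$buggy" "$fixed"
```

With this input the first script prints geopard and the second prints hippo 9991, matching the last rows of the two tables. The semicolon before the closing brace keeps the group portable to sed implementations that require it.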

Are sed's pattern space and hold space initialized to null or to an empty string?

I think the answer is that the p command, like the default print action, is actually adding a newline to the end of the empty pattern space. This is based on this little snippet from the GNU sed documentation (just below that bit you quote, by the way):

sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space.

... blah, blah blah ...

When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed.

In other words, the line being held in the pattern (and hold) space does not include the trailing newline: the aa line is held as aa rather than aa<newline>.

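Of course, the hold space may still contain multiple lines, but that just means that running h on the first line and H on the second line of your file will give you a hold space containing aa<newline>bb, not aa<newline>bb<newline>. (Running H on both lines gives <newline>aa<newline>bb instead, because H always inserts a newline before appending, even when the hold space is still empty.)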
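The difference is easy to check. The sketch below assumes POSIX sed and a two-line input standing in for the aa/bb file from the question; on the last line it swaps the hold space into the pattern space and prints it between brackets, so the leading newline added by the first H is visible:

```shell
# h on line 1 replaces the empty hold space; H on line 2 appends a newline plus "bb".
h_then_H=$(printf 'aa\nbb\n' | sed -n -e '1h' -e '2H' -e '${x;p;}')

# H on every line: the very first H still adds a newline to the empty hold space.
H_twice=$(printf 'aa\nbb\n' | sed -n -e 'H' -e '${x;p;}')

printf '[%s]\n' "$h_then_H"   # "[aa", then "bb]"
printf '[%s]\n' "$H_twice"    # "[", then "aa", then "bb]"
```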

sed: match the hold space against the pattern space

This is not sed, but something like this?

echo "foo: one foo three" | awk -F": " '$2~$1 {print $2}'
one foo three
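For comparison, here is a sed-only sketch of the same match that actually uses the hold space. It assumes every line has the form key: rest, parks the key in the hold space, appends it to the rest of the line with G, and then uses a backreference across the embedded newline (\n) to test whether the rest contains the key. The second input line is a made-up non-matching case:

```shell
# Assumes every input line looks like "key: rest".
#   h, s, x, s, x : leave "rest" in the pattern space and "key" in the hold space
#   G             : the pattern space becomes "rest<newline>key"
#   /.../         : \1 is a non-empty part of "rest" that must equal the text
#                   after the newline, i.e. "rest" contains "key"
#   s/\n.*//, p   : drop the appended key and print the rest
matched=$(printf 'foo: one foo three\nbar: one foo three\n' | sed -n '
  h
  s/^[^:]*: //
  x
  s/:.*//
  x
  G
  /^.*\(..*\).*\n\1$/ {
    s/\n.*//
    p
  }
')
printf '%s\n' "$matched"   # one foo three
```

The awk version is shorter; the sed version is mainly useful when the key has to be remembered from an earlier line, since the hold space persists between cycles.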

bash: How to use sed's hold and pattern space to dynamically swap list content?

Stop using sed to modify a multi-line string, and use an array instead.

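array=(
  "Line 5"
  "Line 4"
  "Line 3"
  "Line 2"
  "Line 1"
)

counter=1

# Swap the last element with the element `counter` positions before it,
# then advance counter so the next call reaches one element further back.
change_selection () {
  local src=$(( ${#array[@]} - 1 ))   # index of the last element
  local dest=$(( src - counter ))     # index of the element to swap with
  local tmp="${array[dest]}"
  array[dest]=${array[src]}
  array[src]=$tmp
  (( counter++ ))
}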

Does deleting sed pattern space with 'd' erase hold space as well?

I guess it's related to the following line in man sed:

d Delete pattern space. Start next cycle.

The following works as expected:

$ printf 'foo\nbar\n' | sed -n 'h; s/.*//; g; p'
foo
bar

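So d does not touch the hold space: it discards only the pattern space and starts the next cycle, while the hold space keeps its contents. A minimal sketch with a made-up two-line input shows the hold space surviving a d:

```shell
# Line 1: h saves "foo" to the hold space, then d empties the pattern space
#         and starts the next cycle (nothing is printed because of -n).
# Line 2: x swaps the buffers, so "foo" from line 1 comes back; p prints it.
survivor=$(printf 'foo\nbar\n' | sed -n -e '1{h;d;}' -e '2{x;p;}')
printf '%s\n' "$survivor"   # foo
```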

At what stage is sed's pattern space printed?

An address is a way of selecting lines. Lines can be selected using zero, one or two addresses. This has nothing to do with the capacity of pattern space.

Consider the following input file:

aaa
bbb
ccc
ddd
eee

This sed command has zero addresses, so it processes every line:

s/./X/

Result:

Xaa
Xbb
Xcc
Xdd
Xee

This command has one address; it selects only the third line:

3s/./X/

Result:

aaa
bbb
Xcc
ddd
eee

An address of $ as in $s/./X/ would function the same way, but for the last line (regardless of the number of lines).

Here is a two-address command. In this case, it selects lines based on their content: the range starts at a line matching b and ends at the next line matching d. A single-address command can select by content, too.

/b/,/d/s/./X/

Result:

aaa
Xbb
Xcc
Xdd
eee

Pattern space is printed when an explicit p or P command is executed. Unless the -n (suppress automatic printing) option is in effect, it is also printed automatically when the script is complete for the current line of the input file, which includes ending the processing of the file with the q command.
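A short sketch of those cases, assuming POSIX sed and a made-up three-line input: an explicit p on top of the automatic print, -n suppressing the automatic print, and q still printing the current line before it exits:

```shell
input='aaa
bbb
ccc'

# Explicit p plus the automatic end-of-cycle print: every line appears twice.
twice=$(printf '%s\n' "$input" | sed 'p')

# -n suppresses the automatic print, so only the explicit p remains.
once=$(printf '%s\n' "$input" | sed -n 'p')

# q stops processing, but the current pattern space is still auto-printed first.
upto2=$(printf '%s\n' "$input" | sed '2q')

printf '%s\n---\n' "$twice" "$once" "$upto2"
```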

Here's a demonstration of sed printing each line immediately upon receiving and processing it:

for i in {1..3}; do echo aaa$i; sleep 2; done | sed 's/./X/'

The capacity of pattern space (and hold space) has to do with the number of characters it can hold (and is implementation dependent) rather than the number of input lines. The newlines separating those lines are simply another character in that total. The G command appends a newline and then a copy of the hold space to the end of whatever is in the pattern space, so applying G several times appends that many copies.
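A minimal sketch of how G grows the pattern space, assuming POSIX sed. Each G adds a newline and then a copy of the hold space; with an empty hold space (as at the start of input), G adds only a newline, which is the familiar sed G double-spacing idiom:

```shell
# h copies the line to the hold space; each G then appends a newline plus that copy.
tripled=$(echo abc | sed 'h;G;G')
printf '%s\n' "$tripled"    # abc on three separate lines

# The hold space is never filled here, so G appends only a newline:
# every input line is followed by a blank line.
spaced=$(printf 'one\ntwo\n' | sed G)
printf '%s\n' "$spaced"
```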

In the tutorial that you linked to, the statement "The maximum number of addresses is two." is somewhat ambiguous. What it means is that you can use zero, one or two addresses to select the lines a command applies to. As in the examples above, you could apply G to all lines, to one line or to a range of lines. Depending on the command, it accepts no address at all, zero or one address, or zero, one or two addresses. See man sed under the Synopsis section for subheadings that group the commands by the number of addresses they accept.

From info sed:

3.1 How `sed' Works

'sed' maintains two data buffers: the active pattern space, and the
auxiliary hold space. Both are initially empty.

'sed' operates by performing the following cycle on each lines of
input: first, 'sed' reads one line from the input stream, removes any
trailing newline, and places it in the pattern space. Then commands
are executed; each command can have an address associated to it:
addresses are a kind of condition code, and a command is only executed
if the condition is verified before the command is to be executed.

When the end of the script is reached, unless the '-n' option is in
use, the contents of pattern space are printed out to the output
stream, adding back the trailing newline if it was removed.(1) Then the
next cycle starts for the next input line.

Unless special commands (like 'D') are used, the pattern space is
deleted between two cycles. The hold space, on the other hand, keeps
its data between cycles (see commands 'h', 'H', 'x', 'g', 'G' to move
data between both buffers).

What is the actual meaning of -n in sed?

Just try a do-nothing sed script:

sed '' file

and

sed -n '' file

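The first will print the whole file, but the second will NOT print anything, because -n turns off the automatic printing of the pattern space at the end of each cycle.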


