Extract Text Between Two Strings Repeatedly Using Sed or Awk

Extract text between two strings repeatedly using sed or awk?

Using sed:

sed -E 's/.*\/(.*)-.*/\1/' plainlinks

Output:

999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404

To save the changes back to the file, use the -i option (GNU sed syntax shown):

sed -Ei 's/.*\/(.*)-.*/\1/' plainlinks

Or, to save the result to a new file, redirect the output:

sed -E 's/.*\/(.*)-.*/\1/' plainlinks > newfile.txt
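As a hedged aside on in-place editing: attaching a backup suffix to -i (for example -i.bak) is accepted by both GNU and BSD sed and keeps a copy of the original file. The sketch below works on a throwaway copy; the sample line is hypothetical, since the real contents of plainlinks are not shown.

```shell
# Work in a temporary directory so the demonstration cannot touch real data.
tmpdir=$(mktemp -d)

# Hypothetical sample line; the actual contents of plainlinks are not shown.
printf '%s\n' 'http://example.com/docs/999999-94092-report.pdf' > "$tmpdir/plainlinks"

# -i.bak edits the file in place and keeps the original as plainlinks.bak.
sed -E -i.bak 's/.*\/(.*)-.*/\1/' "$tmpdir/plainlinks"

edited=$(cat "$tmpdir/plainlinks")
backup=$(cat "$tmpdir/plainlinks.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

rm -rf "$tmpdir"
```

If the edit turns out to be wrong, the .bak file can simply be copied back over the edited one.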

Explanation:

s/    # substitution
.*    # match anything
\/    # up to the last forward slash (escaped so sed does not treat it as the delimiter)
(.*)  # anything after the last forward slash (captured in parentheses)
-     # up to the last hyphen (the preceding .* is greedy)
.*    # anything else left on the line
/     # end of match; start of replacement
\1    # the value captured in the first (only) set of parentheses
/     # end
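To see the greedy matching described above in action, here is a minimal sketch. Both sample lines are hypothetical (the real plainlinks file is not shown); the second one illustrates that a line without a slash is passed through unchanged because the pattern does not match.

```shell
# Hypothetical sample line with several slashes and several hyphens.
line='http://example.com/docs/999999-94092-report.pdf'

# Greedy .*\/ consumes up to the LAST slash; greedy (.*)- stops at the LAST hyphen.
result=$(printf '%s\n' "$line" | sed -E 's/.*\/(.*)-.*/\1/')

# A line with no slash does not match, so sed prints it as-is.
unmatched=$(printf '%s\n' 'plain text' | sed -E 's/.*\/(.*)-.*/\1/')

printf '%s\n%s\n' "$result" "$unmatched"
```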

How to use sed/grep to extract text between two words?

sed -e 's/Here\(.*\)String/\1/'
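Note that this substitution only replaces the matched part, so any text before Here or after String stays on the line. A small sketch of the difference, using a made-up input line and an alternative form (an assumption, not part of the original answer) that prints only the captured text:

```shell
# Made-up input with extra text on both sides of the delimiters.
input='abc Here is a String xyz'

# Replaces only the matched part, so the surrounding text remains.
kept=$(printf '%s\n' "$input" | sed -e 's/Here\(.*\)String/\1/')

# Matching the whole line, with -n and the p flag, prints only the capture.
only=$(printf '%s\n' "$input" | sed -n 's/.*Here\(.*\)String.*/\1/p')

# Brackets make the leading and trailing spaces visible.
printf '[%s]\n[%s]\n' "$kept" "$only"
```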

Extract data between two strings using either AWK or SED

I think the easiest way is with cut:

$ echo "xyz>someurl>xyz" | cut -d'>' -f2
someurl

With awk, it can be done like this:

$ echo "xyz>someurl>xyz" | awk  'BEGIN { FS = ">" } ; { print $2 }'
someurl

And with sed it is a little trickier:

$ echo "xyz>someurl>xyz" | sed 's/\(.*\)>\(.*\)>\(.*\)/\2/g'
someurl

We capture three blocks, something1>something2>something3, and print the second one.
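One caveat worth sketching: the sed pattern above is greedy, so it only agrees with cut and awk when the line contains exactly two delimiters. The variant with [^>]* below is an assumption added for illustration, not part of the original answer.

```shell
# With exactly two delimiters, all three tools agree.
two='xyz>someurl>xyz'
by_cut=$(printf '%s\n' "$two" | cut -d'>' -f2)
by_awk=$(printf '%s\n' "$two" | awk -F'>' '{ print $2 }')
by_sed=$(printf '%s\n' "$two" | sed 's/[^>]*>\([^>]*\)>.*/\1/')

# With a third delimiter, the greedy pattern picks a later field.
three='a>b>c>d'
greedy=$(printf '%s\n' "$three" | sed 's/\(.*\)>\(.*\)>\(.*\)/\2/')
strict=$(printf '%s\n' "$three" | sed 's/[^>]*>\([^>]*\)>.*/\1/')

printf '%s %s %s\n' "$by_cut" "$by_awk" "$by_sed"
printf 'greedy=%s strict=%s\n' "$greedy" "$strict"
```

Using a negated character class keeps each piece from running past the next delimiter, which is why the strict form always returns the second field.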

Sed or awk. Find text between two strings + and additional identifier

Here's an awk one-liner:

awk -v sid=1630710955 '/HOOK_EV_OFF$/{flag=0;next}{if(flag && $0 ~ "SID:"sid){print}}/HOOK_EV$/{flag=1;next}' infile

Explanation:

awk -v sid=1630710955 '/HOOK_EV_OFF$/{flag=0;next} # Final pattern found   --> turn off the flag and read next line
{if(flag && $0 ~ "SID:"sid){print}} # if flag and SID pattern in line print it
/HOOK_EV$/{flag=1;next} # Initial pattern found --> turn on the flag and read the next line
' infile
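A runnable sketch of the fixed-SID version, using a few log lines in the same format as the sample input in this answer, written to a temporary file. The parentheses around the concatenation are added only for readability.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
2015-04-29T08:05:24.668345-04:00 test1 [S=4444] [SID:1630710955] HOOK_EV
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710955]
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710956]
2015-04-29T08:05:24.668345-04:00 test1 [S=4444] [SID:1630710955] HOOK_EV_OFF
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710955]
EOF

matches=$(awk -v sid=1630710955 '
  /HOOK_EV_OFF$/ { flag = 0; next }        # closing marker: stop printing
  flag && $0 ~ ("SID:" sid) { print }      # inside a block and SID matches
  /HOOK_EV$/ { flag = 1; next }            # opening marker: start printing
' "$log")

printf '%s\n' "$matches"
rm -f "$log"
```

Only the second line is printed: the third has a different SID, and the last one falls outside the HOOK_EV ... HOOK_EV_OFF block.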

For dynamic SID extraction, you can use:

awk '/HOOK_EV_OFF$/{flag=0;SID="";next} 
flag && $NF==SID
/HOOK_EV$/{flag=1;SID=$(NF-1);next}' infile

Having this input file:

2015-04-29T08:05:24.668345-04:00 test1 [S=4444] [SID:1630710955] HOOK_EV
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710955]
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710956]
2015-04-29T08:05:24.668345-04:00 test1 [S=4444] [SID:1630710955] HOOK_EV_OFF
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710955]
2015-04-29T08:05:24.668345-04:00 test2 [S=4444] [SID:1630710965] HOOK_EV
2015-04-29T08:05:24.668345-04:00 test2 [S=4447] [SID:1630710965]
2015-04-29T08:05:24.668345-04:00 test2 [S=4447] [SID:1630710967]
2015-04-29T08:05:24.668345-04:00 test2 [S=4444] [SID:1630710965] HOOK_EV_OFF

The output will be:

2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710955]
2015-04-29T08:05:24.668345-04:00 test2 [S=4447] [SID:1630710965]

how to print text between two specific words using awk, sed?

The following awk may help you here (assuming your input to awk is in the same form as the sample shown):

your_command | awk '{sub(/[^-]*/,"");sub(/ .*/,"");sub(/-/,"");print}' 

Solution 2: using sed.

your_command | sed 's/\([^-]*\)-\([^ ]*\).*/\2/'

Solution 3: using awk's match function:

your_command | awk 'match($0,/[0-9]+\.[0-9]+-[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/){print substr($0,RSTART,RLENGTH)}'
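The sample input from the original question is not reproduced here, so the sketch below uses a hypothetical line purely to show what the first two solutions do: both keep the text between the first hyphen and the next space.

```shell
# Hypothetical input line; NOT the sample from the original question.
line='kernel-4.18.0 x86_64'

# Solution 1: strip the leading non-hyphen text, then everything from the
# first space onward, then the hyphen itself.
via_awk=$(printf '%s\n' "$line" | awk '{ sub(/[^-]*/, ""); sub(/ .*/, ""); sub(/-/, ""); print }')

# Solution 2: capture the run of non-space characters after the first hyphen.
via_sed=$(printf '%s\n' "$line" | sed 's/\([^-]*\)-\([^ ]*\).*/\2/')

printf '%s %s\n' "$via_awk" "$via_sed"
```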

Sed to extract text between two strings

sed -n '/^START=A$/,/^END$/p' data

The -n option means 'don't print by default'; the script then says 'do print from the line that is exactly START=A through the next line that is exactly END'.

You can also do it with awk:

A pattern may consist of two patterns separated by a comma; in this case, the action is performed for
all lines from an occurrence of the first pattern though an occurrence of the second.

(from man awk on Mac OS X).

awk '/^START=A$/,/^END$/ { print }' data

Given a modified form of the data file in the question:

START=A
xxx01
xxx02
END
START=A
xxx03
xxx04
END
START=A
xxx05
xxx06
END
START=B
xxx07
xxx08
END
START=A
xxx09
xxx10
END
START=C
xxx11
xxx12
END
START=A
xxx13
xxx14
END
START=D
xxx15
xxx16
END

The output using GNU sed or Mac OS X (BSD) sed, and using GNU awk or BSD awk, is the same:

START=A
xxx01
xxx02
END
START=A
xxx03
xxx04
END
START=A
xxx05
xxx06
END
START=A
xxx09
xxx10
END
START=A
xxx13
xxx14
END

Note how I modified the data file so it is easier to see where the various blocks of data printed came from in the file.

If you have a different output requirement (such as 'only the first block between START=A and END', or 'only the last ...'), then you need to articulate that more clearly in the question.
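For instance, if only the first START=A block were wanted, one possible approach (a sketch, not something the question asked for) is to exit as soon as the first closing END has been printed. The flag name f and the shortened data file are assumptions for illustration.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
START=B
xxx07
xxx08
END
START=A
xxx01
xxx02
END
START=A
xxx03
xxx04
END
EOF

# f is set on the first START=A line; awk exits right after printing the
# END line that closes that block, so later blocks are never read.
first_block=$(awk '/^START=A$/ { f = 1 } f { print } f && /^END$/ { exit }' "$data")

printf '%s\n' "$first_block"
rm -f "$data"
```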

How to select lines between two marker patterns which may occur multiple times with awk/sed

Use awk with a flag to trigger the print when necessary:

$ awk '/abc/{flag=1;next}/mno/{flag=0}flag' file
def1
ghi1
jkl1
def2
ghi2
jkl2

How does this work?

  • /abc/ matches lines containing this text, as does /mno/.
  • /abc/{flag=1;next} sets the flag when the text abc is found. Then, it skips the line.
  • /mno/{flag=0} unsets the flag when the text mno is found.
  • The final flag is a pattern with the default action, which is to print $0: if flag equals 1, the line is printed.
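The input file for this answer is not shown, so here is a sketch with an assumed file that would produce the output listed above. The pqr line sits between the two blocks and is skipped because the flag is off at that point.

```shell
file=$(mktemp)
# Assumed input; the original question's file is not reproduced in this answer.
cat > "$file" <<'EOF'
abc
def1
ghi1
jkl1
mno
pqr
abc
def2
ghi2
jkl2
mno
EOF

selected=$(awk '/abc/{flag=1;next}/mno/{flag=0}flag' "$file")

printf '%s\n' "$selected"
rm -f "$file"
```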

For a more detailed description and examples, together with cases when the patterns are either shown or not, see How to select lines between two patterns?.

How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?

Print lines between PAT1 and PAT2

$ awk '/PAT1/,/PAT2/' file
PAT1
3 - first block
4
PAT2
PAT1
7 - second block
PAT2
PAT1
10 - third block

Or, using variables:

awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' file

How does this work?

  • /PAT1/ matches lines containing this text, as does /PAT2/.
  • /PAT1/{flag=1} sets the flag when the text PAT1 is found in a line.
  • /PAT2/{flag=0} unsets the flag when the text PAT2 is found in a line.
  • flag is a pattern with the default action, which is to print $0: if flag equals 1, the line is printed. This way, it prints every line from the point where PAT1 occurs up to and including the next PAT2. It also prints the lines from the last match of PAT1 to the end of the file if no PAT2 follows.
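The sample file is not included in this answer, so the sketch below uses an assumed reconstruction consistent with the outputs shown. It checks that the range form and the flag form print the same nine lines, including the unterminated third block.

```shell
file=$(mktemp)
# Assumed reconstruction of the sample file, inferred from the outputs shown.
cat > "$file" <<'EOF'
1
2
PAT1
3 - first block
4
PAT2
5
6
PAT1
7 - second block
PAT2
8
9
PAT1
10 - third block
EOF

by_range=$(awk '/PAT1/,/PAT2/' "$file")
by_flag=$(awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' "$file")

printf '%s\n' "$by_flag"
rm -f "$file"
```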

Print lines between PAT1 and PAT2 - not including PAT1 and PAT2

$ awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file
3 - first block
4
7 - second block
10 - third block

This uses next to skip the line that contains PAT1 in order to avoid this being printed.

This call to next can be dropped by reshuffling the blocks: awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' file.
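A quick way to convince yourself that the two exclusive forms are equivalent is to run both on the same input. The file below is an assumed reconstruction of the sample, inferred from the outputs shown in this answer.

```shell
file=$(mktemp)
# Assumed reconstruction of the sample file.
cat > "$file" <<'EOF'
1
2
PAT1
3 - first block
4
PAT2
5
6
PAT1
7 - second block
PAT2
8
9
PAT1
10 - third block
EOF

# Form using next to skip the PAT1 line.
with_next=$(awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' "$file")

# Reshuffled form: unset first, print, then set.
reshuffled=$(awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' "$file")

printf '%s\n' "$reshuffled"
rm -f "$file"
```

In the reshuffled form, the PAT1 line is tested by flag before flag is set, and the PAT2 line is tested after flag has been cleared, so neither marker is printed.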

Print lines between PAT1 and PAT2 - including PAT1

$ awk '/PAT1/{flag=1} /PAT2/{flag=0} flag' file
PAT1
3 - first block
4
PAT1
7 - second block
PAT1
10 - third block

Because flag is evaluated at the very end, it reflects whatever PAT1 or PAT2 has just set on the current line: the PAT1 line is printed, the PAT2 line is not.

Print lines between PAT1 and PAT2 - including PAT2

$ awk 'flag; /PAT1/{flag=1} /PAT2/{flag=0}' file
3 - first block
4
PAT2
7 - second block
PAT2
10 - third block

Because flag is evaluated at the very beginning, it reflects the value set by earlier lines, so the closing pattern is printed but the opening one is not.

Print lines between PAT1 and PAT2 - excluding lines from the last PAT1 to the end of file if no other PAT2 occurs

This is based on a solution by Ed Morton.

awk 'flag{
if (/PAT2/)
{printf "%s", buf; flag=0; buf=""}
else
buf = buf $0 ORS
}
/PAT1/ {flag=1}' file

As a one-liner:

$ awk 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' file
3 - first block
4
7 - second block

# note the lack of third block, since no other PAT2 happens after it

This keeps all the selected lines in a buffer that starts being populated once PAT1 is found. The buffer keeps being filled with the following lines until PAT2 is found. At that point, the stored content is printed and the buffer is emptied.
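To make the effect of the buffer concrete, the sketch below runs the buffered command next to the plain exclusive form on an assumed reconstruction of the sample file. Only the buffered version drops the trailing, unterminated block.

```shell
file=$(mktemp)
# Assumed reconstruction of the sample file; the last PAT1 has no closing PAT2.
cat > "$file" <<'EOF'
1
2
PAT1
3 - first block
4
PAT2
5
6
PAT1
7 - second block
PAT2
8
9
PAT1
10 - third block
EOF

# Buffered: lines are held in buf and only printed once PAT2 is seen.
closed_only=$(awk 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' "$file")

# Unbuffered exclusive form, for comparison: prints as it goes.
all_blocks=$(awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' "$file")

printf 'buffered:\n%s\n\nunbuffered:\n%s\n' "$closed_only" "$all_blocks"
rm -f "$file"
```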


