How to Extract Characters Between the Delimiters Using Sed

How to use sed/grep to extract text between two words?

sed -e 's/Here\(.*\)String/\1/'
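
As a quick check, here is that command run on a made-up input line (the sample text is an assumption, not taken from the question). Only the matched part of the line is replaced, so anything before Here or after String would be kept; the second form, with .* on both sides, is one way to drop the surrounding text too.

```shell
# Hypothetical sample line.
line='Here is a String'
# Replace the whole "Here...String" match with the captured middle part.
middle=$(printf '%s\n' "$line" | sed -e 's/Here\(.*\)String/\1/')
printf '[%s]\n' "$middle"    # prints "[ is a ]"

# With surrounding text, add .* on both sides so it is consumed as well.
line2='xx Here is a String yy'
middle2=$(printf '%s\n' "$line2" | sed -e 's/.*Here\(.*\)String.*/\1/')
printf '[%s]\n' "$middle2"   # prints "[ is a ]"
```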

sed extract every occurrence of string between specific delimiters

This might work for you (GNU sed):

sed -r 's/[^:]*:([^:]*):\S*(\s)*/\1\2/g' file
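
A minimal sketch of what this does, assuming a line of whitespace-separated fields of the form a:b:c (the sample input is made up for illustration). \S and \s are GNU extensions, so this is not expected to work with BSD sed.

```shell
# Hypothetical input: colon-delimited fields separated by a space.
line='aaa:bbb:ccc ddd:eee:fff'
# Keep the middle part of each field (\1) and the whitespace after it (\2).
middles=$(printf '%s\n' "$line" | sed -r 's/[^:]*:([^:]*):\S*(\s)*/\1\2/g')
printf '%s\n' "$middles"   # prints "bbb eee"
```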

Sed to extract text between two strings

sed -n '/^START=A$/,/^END$/p' data

The -n option means 'don't print by default'; the script then says 'print every line from a line that is exactly START=A through the next line that is exactly END'.

You can also do it with awk:

A pattern may consist of two patterns separated by a comma; in this case, the action is performed for
all lines from an occurrence of the first pattern through an occurrence of the second.

(from man awk on Mac OS X).

awk '/^START=A$/,/^END$/ { print }' data

Given a modified form of the data file in the question:

START=A
xxx01
xxx02
END
START=A
xxx03
xxx04
END
START=A
xxx05
xxx06
END
START=B
xxx07
xxx08
END
START=A
xxx09
xxx10
END
START=C
xxx11
xxx12
END
START=A
xxx13
xxx14
END
START=D
xxx15
xxx16
END

The output using GNU sed or Mac OS X (BSD) sed, and using GNU awk or BSD awk, is the same:

START=A
xxx01
xxx02
END
START=A
xxx03
xxx04
END
START=A
xxx05
xxx06
END
START=A
xxx09
xxx10
END
START=A
xxx13
xxx14
END

Note how I modified the data file so it is easier to see where the various blocks of data printed came from in the file.

If you have a different output requirement (such as 'only the first block between START=A and END', or 'only the last ...'), then you need to articulate that more clearly in the question.
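
For the two variations mentioned, one possible sketch (a guess at what such a requirement might look like, not part of the original answer): quit sed at the first END inside the range, or let awk buffer each START=A block and print only the last one. The sample file is a shortened, made-up version of the data above, and the variable names are hypothetical.

```shell
data=$(mktemp)
printf '%s\n' START=A xxx01 xxx02 END START=B xxx07 xxx08 END \
              START=A xxx09 xxx10 END > "$data"

# First block only: print the range, then quit at its closing END.
first_block=$(sed -n '/^START=A$/,/^END$/{p;/^END$/q;}' "$data")

# Last block only: restart the buffer at each START=A, print it at the end.
last_block=$(awk '/^START=A$/ { buf = ""; inblock = 1 }
                  inblock     { buf = buf $0 ORS }
                  /^END$/     { inblock = 0 }
                  END         { printf "%s", buf }' "$data")

printf '%s\n' "$first_block"
printf '%s\n' "$last_block"
rm -f "$data"
```

The first command prints the xxx01/xxx02 block and the second prints the xxx09/xxx10 block, each with its START=A and END lines.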

grep substring between two delimiters

Assuming there's no more than one occurrence per line, you can use

sed -nr 's/.*Begin(.*)End.*/\1/p'

With grep -P and a non-greedy quantifier, you could also print more than one match per line.
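
A sketch of that grep variant, assuming GNU grep built with -P (PCRE) support; the sample line is made up. The sed command above is run on the same line for contrast: because .* is greedy, it keeps only the last occurrence.

```shell
# Hypothetical line with two Begin...End pairs.
line='x BeginoneEnd y BegintwoEnd z'

# -o prints each match on its own line; the lazy .*? stops at the nearest End.
matches=$(printf '%s\n' "$line" | grep -oP '(?<=Begin).*?(?=End)')
printf '%s\n' "$matches"     # prints "one" and "two" on separate lines

# The greedy sed version reports only the last pair on the line.
last_only=$(printf '%s\n' "$line" | sed -nr 's/.*Begin(.*)End.*/\1/p')
printf '%s\n' "$last_only"   # prints "two"
```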

extract text between two words using sed

grep solution (since you are only looking to find a match, you are not looking to edit anything):

$ echo "$extract"
sometext Query State: FINISHED\n Query Status: OK\n soonnnnnnnnnnnn

$ echo "$extract" | grep -oP '(?<=Query State: ).*?(?=\\n)'
FINISHED

Explanation:

-o Return only the matched substring (this will return all matches, one per line)

-P For perl-compatible regular expressions; needed for lookaround as well as lazy quantifier

(?<= ... ) lookbehind : The match must start immediately after the text given between (?<= and the closing parenthesis (in this case, right after the space following "Query State:"). That text is not part of the match itself.

.*? zero or more characters (any characters), as few as possible. *? is called a lazy (or non-greedy) quantifier.

(?=\\n) lookahead : Similar to lookbehind, but for what follows: the match must be immediately followed by the literal two-character sequence \n, which is not included in the match. The backslash must be escaped.

EDIT:

If the "Query State: ..." fragment may appear at the very end of the string, not terminated by the \n marker, and if in that case the state must still be returned, the regular expression needs to be modified as follows:

$ echo "$extract"
sometext Query State: FINISHED

$ echo "$extract" | grep -oP '(?<=Query State: ).*?((?=\\n)|$)'
FINISHED

Notice the alternation around the lookahead: the match ends either just before the substring \n or at the end of the input string; either one will work.
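
Since the question asked about sed, here is a rough sed equivalent, offered as a sketch rather than as part of the original answer. It assumes the state value itself never contains a backslash, so "everything up to the first backslash" covers both the \n-terminated case and the end-of-string case. printf is used instead of echo because some shells' echo would turn the literal \n into a newline.

```shell
# Literal backslash-n markers, as in the example above (single quotes keep them literal).
extract='sometext Query State: FINISHED\n Query Status: OK\n soonnnnnnnnnnnn'
# Capture everything after "Query State: " up to the first backslash (or end of line).
state=$(printf '%s\n' "$extract" | sed -n 's/.*Query State: \([^\\]*\).*/\1/p')
printf '%s\n' "$state"          # prints "FINISHED"

# Same command when the fragment ends the string with no \n marker.
state_at_end=$(printf '%s\n' 'sometext Query State: FINISHED' |
  sed -n 's/.*Query State: \([^\\]*\).*/\1/p')
printf '%s\n' "$state_at_end"   # prints "FINISHED"
```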

Extracting string between two slashes using sed

If you want to use sed, this would work:

~/tmp> str="directory_root /root/config/data/"
~/tmp> echo $str | sed 's|^[^/]*\(/[^/]*/\).*$|\1|'
/root/

Or as a one-liner (assuming the literal directory_root is in the line):

sed -e 's|^directory_root \(/[^/]*/\).*$|\1|;tx;d;:x' file

Explanation of regex in first example:

s| : use | as the delimiter (makes it easier to read in this case)

^ : match beginning of line

[^/]* : match as many non-/ characters as possible (it stops when it hits the first /)

\( : start recording string 1

/ : match literal /

[^/]* : match all non-/ characters

/ : match the closing literal /

\) : finish recording string 1

.*$ : match everything else to the end of the line

| : delimiter

\1 : replace match with string 1

| : delimiter

In the second example, I appended ;tx;d;:x so that lines that do not match are not echoed: t branches to the label x when the substitution succeeded, and otherwise d deletes the line. You can then run this on the entire file, and it will only print the lines it modified.

~/tmp> echo "xx" > tmp.txt
~/tmp> echo "directory_root /root/config/data/" >> tmp.txt
~/tmp> echo "xxxx ttt" >> tmp.txt
~/tmp>
~/tmp> cat tmp.txt | sed -e 's|^directory_root \(/[^/]*/\).*$|\1|;tx;d;:x'
/root/
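
The ;tx;d;:x suffix relies on sed accepting a semicolon right after a label name, which GNU sed does but BSD sed may not. A more portable way to get the same effect, sketched here with the same three sample lines, is to suppress default output with -n and print only when the substitution succeeds (the p flag).

```shell
# Same three sample lines as in the transcript above.
root_dir=$(printf '%s\n' 'xx' 'directory_root /root/config/data/' 'xxxx ttt' |
  sed -n 's|^directory_root \(/[^/]*/\).*$|\1|p')
printf '%s\n' "$root_dir"   # prints "/root/"
```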

Sed syntax to extract all of text BEFORE last delimiter?

You can use a negated character class in your regex:

sed 's/-[^-]*$//' <<< 'scanning-client-container-0.2.tar'

scanning-client-container

RegEx Details:

  • -: Match a -
  • [^-]*: Match 0 or more characters that are not -
  • $: Match end
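
If the value is already in a shell variable, POSIX parameter expansion can do the same job without starting sed. The sketch below (variable names are hypothetical) also shows the complementary case, the text after the last delimiter, where a greedy .* does the work.

```shell
name='scanning-client-container-0.2.tar'

# ${var%pattern} removes the shortest matching suffix: the last "-" and what follows.
before_last="${name%-*}"

# Text after the last "-": the greedy .* consumes everything up to the final "-".
after_last=$(printf '%s\n' "$name" | sed 's/.*-//')

printf '%s\n' "$before_last"   # prints "scanning-client-container"
printf '%s\n' "$after_last"    # prints "0.2.tar"
```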

Extract string between combination of words and characters

Merging this into a single regex is hard here because POSIX regex does not support lazy quantifiers.

With GNU sed, you can pass the command as

sed 's/.*FROM \(.*\) as.*/\1/;s/FROM //' file
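
A small run of that command, using a made-up two-line input (one line with an "as" alias, one without) to show how the two substitutions split the work. Note that sed is case-sensitive here, so a line starting with From rather than FROM would pass through unchanged.

```shell
# Hypothetical sample input.
input='FROM some_registry as registry1
FROM another_registry'

# First s/// handles "FROM x as y"; the second strips a bare "FROM " prefix.
names=$(printf '%s\n' "$input" | sed 's/.*FROM \(.*\) as.*/\1/;s/FROM //')
printf '%s\n' "$names"   # prints "some_registry" then "another_registry"
```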

However, if you have a GNU grep you can use a bit more precise expression:

#!/bin/bash
s='FROM some_registry as registry1
From another_registry'
grep -oP '(?i)\bFROM\s+\K.*?(?=\s+as\b|$)' <<< "$s"

Details:

  • (?i) - case insensitive matching ON
  • \b - a word boundary
  • FROM - a word
  • \s+ - one or more whitespaces
  • \K - "forget" all text matched so far
  • .*? - zero or more chars other than line break chars, as few as possible
  • (?=\s+as\b|$) - a positive lookahead that matches a location immediately followed with one or more whitespaces and then a whole word as, or end of string.

