Extract text between two strings repeatedly using sed or awk?
Using sed:
sed -E 's/.*\/(.*)-.*/\1/' plainlinks
Output:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404
To save the changes to the file, use the -i option:
sed -Ei 's/.*\/(.*)-.*/\1/' plainlinks
Or, to save to a new file, redirect the output:
sed -E 's/.*\/(.*)-.*/\1/' plainlinks > newfile.txt
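A note on portability: the combined -Ei form is GNU sed usage. BSD/macOS sed expects a backup suffix after -i, so attaching a suffix (-i.bak) should work on both. A minimal sketch, assuming a throwaway directory and a made-up sample line, since the real plainlinks file is not shown:

```shell
# Hypothetical sample data standing in for the real "plainlinks" file.
tmpdir=$(mktemp -d)
printf '%s\n' 'http://example.com/a/999999-94092-title' > "$tmpdir/plainlinks"

# -i.bak (suffix attached to -i) edits in place and keeps a backup copy.
sed -E -i.bak 's/.*\/(.*)-.*/\1/' "$tmpdir/plainlinks"

cat "$tmpdir/plainlinks"        # the edited file
cat "$tmpdir/plainlinks.bak"    # the untouched original
```

The backup file can be deleted once the result has been checked.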
Explanation:
s/ # substitution
.* # match anything
\/ # up to the last forward slash (escaped so sed does not read it as the delimiter)
(.*) # anything after the last forward slash (captured in parentheses)
- # up to the last hyphen
.* # anything else left on the line
/ # end match; start replacement
\1 # the value captured in the first (and only) set of parentheses
/ # end
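Because .* is greedy, the capture runs up to the last hyphen on the line, not the first. A small sketch of that behaviour, using hypothetical input lines (the original input is not shown) and a helper named extract_id that is introduced here only for illustration:

```shell
# extract_id is a hypothetical helper wrapping the same sed expression.
extract_id() {
    printf '%s\n' "$1" | sed -E 's/.*\/(.*)-.*/\1/'
}

# One hyphen after the ID: the capture is the ID itself.
extract_id 'http://example.com/x/999999-94092-title'        # 999999-94092

# Extra hyphens in the tail: the greedy capture keeps everything up to the LAST hyphen.
extract_id 'http://example.com/x/999999-94092-long-title'   # 999999-94092-long
```

If the tail can contain hyphens, the pattern would need to be tightened, for example by matching digits explicitly.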
How to use sed/grep to extract text between two words?
sed -e 's/Here\(.*\)String/\1/'
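As written, that command keeps any text outside the Here...String match and prints non-matching lines unchanged. If only the text between the two words is wanted, one possible variant anchors the match with .* on both sides and prints only on a successful substitution. This is a sketch with made-up input; between_words is a hypothetical helper name:

```shell
# between_words is a hypothetical helper; -n plus the p flag prints only lines that matched.
between_words() {
    printf '%s\n' "$1" | sed -n 's/.*Here\(.*\)String.*/\1/p'
}

between_words 'foo Here is a String bar'   # prints " is a " (surrounding spaces kept)
between_words 'no markers on this line'    # prints nothing
```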
Extract data between two strings using either AWK or SED
I think the best, easiest way is with cut:
$ echo "xyz>someurl>xyz" | cut -d'>' -f2
someurl
With awk it can be done like this:
$ echo "xyz>someurl>xyz" | awk 'BEGIN { FS = ">" } ; { print $2 }'
someurl
And with sed it is a little trickier:
$ echo "xyz>someurl>xyz" | sed 's/\(.*\)>\(.*\)>\(.*\)/\2/g'
someurl
This splits the line into three blocks, something1>something2>something3, and prints the second one.
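One caveat with the sed version: .* is greedy, so when a line has more than two > separators it returns the second-to-last field rather than the second, unlike cut and awk. A sketch of a variant that uses [^>]* instead (second_field is a hypothetical helper name, and the second input line is made up):

```shell
# second_field is a hypothetical helper; [^>]* cannot cross a '>' so the match stays on field 2.
second_field() {
    printf '%s\n' "$1" | sed 's/^[^>]*>\([^>]*\)>.*/\1/'
}

second_field 'xyz>someurl>xyz'   # someurl
second_field 'a>b>c>d'           # b (the greedy version above would print c)
```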
Sed or awk. Find text between two strings + and additional identifier
Here's an awk one-liner:
awk -v sid=1630710955 '/HOOK_EV_OFF$/{flag=0;next}{if(flag && $0 ~ "SID:"sid){print}}/HOOK_EV$/{flag=1;next}' infile
Explanation:
awk -v sid=1630710955 '/HOOK_EV_OFF$/{flag=0;next} # Final pattern found --> turn off the flag and read next line
{if(flag && $0 ~ "SID:"sid){print}} # if flag and SID pattern in line print it
/HOOK_EV$/{flag=1;next} # Initial pattern found --> turn on the flag and read the next line
' infile
For dynamic SID extraction, you can use:
awk '/HOOK_EV_OFF$/{flag=0;SID="";next}
flag && $NF==SID
/HOOK_EV$/{flag=1;SID=$(NF-1);next}' infile
Having this input file:
2015-04-29T08:05:24.668345-04:00 test1 [S=4444] [SID:1630710955] HOOK_EV
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710955]
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710956]
2015-04-29T08:05:24.668345-04:00 test1 [S=4444] [SID:1630710955] HOOK_EV_OFF
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710955]
2015-04-29T08:05:24.668345-04:00 test2 [S=4444] [SID:1630710965] HOOK_EV
2015-04-29T08:05:24.668345-04:00 test2 [S=4447] [SID:1630710965]
2015-04-29T08:05:24.668345-04:00 test2 [S=4447] [SID:1630710967]
2015-04-29T08:05:24.668345-04:00 test2 [S=4444] [SID:1630710965] HOOK_EV_OFF
The output will be:
2015-04-29T08:05:24.668345-04:00 test1 [S=4445] [SID:1630710955]
2015-04-29T08:05:24.668345-04:00 test2 [S=4447] [SID:1630710965]
how to print text between two specific words using awk, sed?
The following awk may help you here (assuming that the input to awk is the same as the sample shown):
your_command | awk '{sub(/[^-]*/,"");sub(/ .*/,"");sub(/-/,"");print}'
Solution 2: with sed:
your_command | sed 's/\([^-]*\)-\([^ ]*\).*/\2/'
Solution 3: using awk's match function:
your_command | awk 'match($0,/[0-9]+\.[0-9]+\-[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/){print substr($0,RSTART,RLENGTH)}'
Sed to extract text between two strings
sed -n '/^START=A$/,/^END$/p' data
The -n option means don't print by default; then the script says 'do print between the line containing START=A and the next END'.
You can also do it with awk:
A pattern may consist of two patterns separated by a comma; in this case, the action is performed for
all lines from an occurrence of the first pattern through an occurrence of the second.
(from man awk on Mac OS X).
awk '/^START=A$/,/^END$/ { print }' data
Given a modified form of the data file in the question:
START=A
xxx01
xxx02
END
START=A
xxx03
xxx04
END
START=A
xxx05
xxx06
END
START=B
xxx07
xxx08
END
START=A
xxx09
xxx10
END
START=C
xxx11
xxx12
END
START=A
xxx13
xxx14
END
START=D
xxx15
xxx16
END
The output using GNU sed or Mac OS X (BSD) sed, and using GNU awk or BSD awk, is the same:
START=A
xxx01
xxx02
END
START=A
xxx03
xxx04
END
START=A
xxx05
xxx06
END
START=A
xxx09
xxx10
END
START=A
xxx13
xxx14
END
Note how I modified the data file so it is easier to see where the various blocks of data printed came from in the file.
If you have a different output requirement (such as 'only the first block between START=A and END', or 'only the last ...'), then you need to articulate that more clearly in the question.
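For instance, if the requirement were 'only the first block between START=A and END', one possible approach is to set a flag at the start marker and exit at the first END seen while the flag is set. This is a sketch under that assumed requirement; first_block is a hypothetical helper name and the input is a shortened version of the data above:

```shell
# first_block is a hypothetical helper: print from the first START=A to its END, then stop reading.
first_block() {
    awk '/^START=A$/ {f=1} f {print} f && /^END$/ {exit}' "$@"
}

printf '%s\n' START=A xxx01 xxx02 END START=B xxx07 END START=A xxx09 END | first_block
# START=A
# xxx01
# xxx02
# END
```

Exiting early also avoids reading the rest of a large file.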
How to select lines between two marker patterns which may occur multiple times with awk/sed
Use awk with a flag to trigger the print when necessary:
$ awk '/abc/{flag=1;next}/mno/{flag=0}flag' file
def1
ghi1
jkl1
def2
ghi2
jkl2
How does this work?
- /abc/ matches lines having this text, as well as /mno/ does.
- /abc/{flag=1;next} sets the flag when the text abc is found. Then, it skips the line.
- /mno/{flag=0} unsets the flag when the text mno is found.
- The final flag is a pattern with the default action, which is to print $0: if flag is equal to 1, the line is printed.
For a more detailed description and examples, together with cases when the patterns are either shown or not, see How to select lines between two patterns?.
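To try the command end to end, here is a sketch with a made-up input file (the original file is not shown; the lines pqr and stu are invented filler outside the markers):

```shell
# Hypothetical input: two abc...mno blocks with filler lines outside them.
sample=$(mktemp)
printf '%s\n' abc def1 ghi1 jkl1 mno pqr abc def2 ghi2 jkl2 mno stu > "$sample"

# Marker lines and filler lines are skipped; only lines inside the blocks are printed.
awk '/abc/{flag=1;next} /mno/{flag=0} flag' "$sample"
```

With this input the command prints the six lines def1 through jkl2, matching the output shown above.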
How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?
Print lines between PAT1 and PAT2
$ awk '/PAT1/,/PAT2/' file
PAT1
3 - first block
4
PAT2
PAT1
7 - second block
PAT2
PAT1
10 - third block
Or, using variables:
awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' file
How does this work?
- /PAT1/ matches lines having this text, as well as /PAT2/ does.
- /PAT1/{flag=1} sets the flag when the text PAT1 is found in a line.
- /PAT2/{flag=0} unsets the flag when the text PAT2 is found in a line.
- flag is a pattern with the default action, which is to print $0: if flag is equal to 1, the line is printed. This way, it will print all those lines occurring from the time PAT1 occurs until the next PAT2 is seen. This will also print the lines from the last match of PAT1 up to the end of the file.
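The two forms can be compared directly. The sketch below rebuilds a sample file that is consistent with the output shown (the exact original file is an assumption) and checks that the range form and the flag form print the same lines:

```shell
# Assumed reconstruction of the sample file: three PAT1 blocks, the last one never closed by PAT2.
patfile=$(mktemp)
printf '%s\n' 1 2 PAT1 '3 - first block' 4 PAT2 5 6 \
    PAT1 '7 - second block' PAT2 8 9 PAT1 '10 - third block' > "$patfile"

range_out=$(awk '/PAT1/,/PAT2/' "$patfile")
flag_out=$(awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' "$patfile")

# Both forms include the marker lines and the unterminated third block.
[ "$range_out" = "$flag_out" ] && echo "same output"
```

The flag form is longer, but it is the one that can be rearranged to include or exclude either marker.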
Print lines between PAT1 and PAT2 - not including PAT1 and PAT2
$ awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file
3 - first block
4
7 - second block
10 - third block
This uses next to skip the line that contains PAT1 in order to avoid this being printed.
This call to next can be dropped by reshuffling the blocks: awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' file.
Print lines between PAT1 and PAT2 - including PAT1
$ awk '/PAT1/{flag=1} /PAT2/{flag=0} flag' file
PAT1
3 - first block
4
PAT1
7 - second block
PAT1
10 - third block
By placing flag at the very end, it triggers the action that was set on either PAT1 or PAT2: to print on PAT1, not to print on PAT2.
Print lines between PAT1 and PAT2 - including PAT2
$ awk 'flag; /PAT1/{flag=1} /PAT2/{flag=0}' file
3 - first block
4
PAT2
7 - second block
PAT2
10 - third block
By placing flag at the very beginning, it triggers the action that was set previously and hence prints the closing pattern but not the starting one.
Print lines between PAT1 and PAT2 - excluding lines from the last PAT1 to the end of file if no other PAT2 occurs
This is based on a solution by Ed Morton.
awk 'flag{
       if (/PAT2/)
          {printf "%s", buf; flag=0; buf=""}
       else
          buf = buf $0 ORS
     }
     /PAT1/ {flag=1}' file
As a one-liner:
$ awk 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' file
3 - first block
4
7 - second block
# note the lack of third block, since no other PAT2 happens after it
This keeps all the selected lines in a buffer that starts being populated from the moment PAT1 is found. It then keeps being filled with the following lines until PAT2 is found. At that point, it prints the stored content and empties the buffer.