How to Extract Patterns From Text Files in Bash

How to extract patterns from a text file in Bash

Do you really need sed? You could use cut:

cut -d. -f2 filename
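As a minimal sketch of what that cut command does, assuming the input lines are made of dot-separated fields (the sample data and the variable names below are made up for illustration):

```shell
# Hypothetical sample data: three dot-separated fields per line.
sample=$(mktemp)
printf '%s\n' 'alpha.beta.gamma' 'one.two.three' > "$sample"

# -d. sets the field delimiter to a dot; -f2 selects the second field.
second_fields=$(cut -d. -f2 "$sample")
printf '%s\n' "$second_fields"

rm -f "$sample"
```

Note that cut prints a line unchanged when it contains no delimiter at all; add -s if such lines should be skipped.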

How to find/extract a pattern from a file?

Try using grep

v=$(grep -oE '\bb[0-9]{3}\b' file)

This will search for a word that starts with b followed by exactly three digits.


Using sed

v=$(sed -nr 's/.*\b(b[0-9]{3})\b.*/\1/p' file)

extract words matching a pattern and print character length

With awk

awk '{for (i=1;i<=NF;i++) if ($i~/abc\.smart/) print $i,length($i)}' file

You can run it directly on the first file. Output:

"abc.smartxyz" 14
abc.smartabc 12
"https://abc.smart/strings" 27
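The original input file is not shown, so the following is only a sketch with a made-up input that happens to produce the same output. It escapes the dot in the pattern so that it matches a literal period only; the field abcXsmart is included to show that it is not picked up.

```shell
# Hypothetical input; whitespace-separated fields, some of them quoted.
sample=$(mktemp)
printf '%s\n' \
  'name "abc.smartxyz" tag' \
  'abc.smartabc abcXsmart' \
  'url "https://abc.smart/strings"' > "$sample"

# Print every field containing the literal text abc.smart, with its length.
# Quotes are part of the field, so they count towards the length.
lengths=$(awk '{for (i=1;i<=NF;i++) if ($i ~ /abc\.smart/) print $i, length($i)}' "$sample")
printf '%s\n' "$lengths"

rm -f "$sample"
```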

How to extract a string after matching characters from a variable in shell script

Try:

sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName
  • 2 is the line number.

    • { open a sed group command.

      • s/ substitute below match
        • ^ is anchor for beginning of the line
        • \(...\) is known as a capture group, with \1 as its back-reference
        • [^ ]* means zero or more characters that are not a space
        • \(AB[^ ]*\) captures AB followed by everything up to, but not including, the first space (back-reference is \1)
        • a space followed by * means zero or more spaces
        • $ is anchor for end of the line
      • / with below
        • \1 back-reference of above capture group
      • / end of substitution
      • q quit to avoid reading rest of the file unnecessarily
    • } close group command.
  • d deletes every line other than line 2 (in practice only line 1, because q stops sed at line 2).

To capture the result in a variable:

your_variableName=$(sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName)
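To try the command end to end, here is a small sketch. The file contents and the variable name class_b are assumptions for illustration; the only thing taken from the answer is the sed command itself, which expects the classB line to be line 2.

```shell
# Hypothetical file: the wanted value sits on line 2, with trailing spaces.
sample=$(mktemp)
printf '%s\n' 'classA = XY123' 'classB = AB456   ' 'classC = AB789' > "$sample"

# Line 1 is deleted by d; line 2 is reduced to the captured group and
# printed when q quits; line 3 is never read.
class_b=$(sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' "$sample")
printf '%s\n' "$class_b"

rm -f "$sample"
```

If the classB line can appear on a different line number, the address 2 would have to be replaced by a pattern address; that case is not covered here.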

how to extract consecutive pattern using bash script

You may use this awk:

awk '/^OAA-/ {if (dt) print "\n" dt; print; dt=""} /[0-9]{4}-/ {dt=$0} ' file

2021-04-27T05:30:13.292507-04:00
OAA-06512: at "PATCH", line 001

2021-05-27T05:30:13.292507-04:00
OAA-06513: at "PATCH", line 002
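The logic is: remember the most recent timestamp line, and when an OAA- line appears, print the remembered timestamp once, then the error line. The sketch below replays that on a made-up log. As a portability assumption it spells out [0-9][0-9][0-9][0-9] instead of [0-9]{4}, since some awk implementations do not support interval expressions.

```shell
# Hypothetical log: timestamps, an unrelated line, and OAA- error lines.
sample=$(mktemp)
printf '%s\n' \
  '2021-04-27T05:30:13.292507-04:00' \
  'some unrelated message' \
  'OAA-06512: at "PATCH", line 001' \
  '2021-05-27T05:30:13.292507-04:00' \
  'OAA-06513: at "PATCH", line 002' \
  'OAA-06514: at "PATCH", line 003' > "$sample"

# dt holds the last timestamp seen; it is cleared after being printed so
# consecutive OAA- lines share a single timestamp header.
grouped=$(awk '/^OAA-/ {if (dt) print "\n" dt; print; dt=""}
               /[0-9][0-9][0-9][0-9]-/ {dt=$0}' "$sample")
printf '%s\n' "$grouped"

rm -f "$sample"
```

Each group starts with an empty line because of the "\n" printed before the timestamp.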

Extract string based on pattern in shell script

You may try this shorter awk:

awk '{gsub(/^.*,"|",.*/, "")} 1' file

MxMonitor_Marvel_PI49
alert_manager
MxMonitor_Marvel_PI49

A similar sed command:

sed -E 's/^.*,"|",.*//g' file
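Both commands delete everything up to and including the ," that opens the quoted field, and everything from the closing ", onward. The question's input is not shown, so the sketch below assumes comma-separated lines with exactly one double-quoted field each; the sample lines are invented.

```shell
# Hypothetical CSV-like input with one quoted field per line.
sample=$(mktemp)
printf '%s\n' \
  '101,host1,"MxMonitor_Marvel_PI49",critical' \
  '102,host2,"alert_manager",warning' > "$sample"

# awk: strip the prefix and suffix in place, then print the line (the "1").
awk_out=$(awk '{gsub(/^.*,"|",.*/, "")} 1' "$sample")

# sed: the same alternation, applied globally on each line.
sed_out=$(sed -E 's/^.*,"|",.*//g' "$sample")

printf '%s\n' "$awk_out"

rm -f "$sample"
```

With more than one quoted field per line the greedy ^.* would skip to the last opening quote, so the result would differ from this simple case.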

How to extract content between two patterns in Unix

Given the need to handle multiple lines, you can choose sed, or awk, or one of the more complex scripting languages like Perl or Python.

With a bit of care, sed is adequate. I created a file script.4 (after creating script and script2, losing most of what little hair was left on my head**, and restarting with script.1, script.2 and script.3, which were deliberately incomplete) like this:

/from.*where/  { s/.*from *//; s/ *where.*//;          p; n; }
/from/,/where/ { s/.*from *//; s/ *where.*//; /^ *$/d; p; }

And I created a test file, data, like this:

select * from emp where empid=1;  

select *
from dep
where jkdsfj

select *
from sal
where jkdsfj

select elephants
from abject poverty
join flying tigers
where abelone = shellfish;

select mouse
from toolset
join animals where tail = cord
and buttons = legs

and ran the command like this, to get the output shown:

$ sed -n -f script.4 data
emp
dep
sal
abject poverty
join flying tigers
toolset
join animals
$

The script is 'simple'. For lines which contain both from and where, delete everything up to the from (plus any spaces after it), delete everything from the where onward (plus any spaces before it), print what's left, and go to the next line of input.

Otherwise, between a line which contains from and a line that contains where, delete everything up to the from (plus any spaces after it), delete everything from the where onward (plus any spaces before it), and if the line is then empty, delete it; otherwise print it. Note that adding an n command to the second line makes the script misbehave (I need to spend time working out why), but the delete operation can be added to the first command line without doing any harm (if a line contains from where, nothing is printed).
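The whole experiment can be replayed in one go. The sketch below writes the two files described above (script.4 and data) into a temporary directory, which is my own choice rather than part of the answer, and runs sed on them.

```shell
workdir=$(mktemp -d)

# The two-line sed script from the answer.
cat > "$workdir/script.4" <<'EOF'
/from.*where/  { s/.*from *//; s/ *where.*//;          p; n; }
/from/,/where/ { s/.*from *//; s/ *where.*//; /^ *$/d; p; }
EOF

# The test data from the answer.
cat > "$workdir/data" <<'EOF'
select * from emp where empid=1;

select *
from dep
where jkdsfj

select *
from sal
where jkdsfj

select elephants
from abject poverty
join flying tigers
where abelone = shellfish;

select mouse
from toolset
join animals where tail = cord
and buttons = legs
EOF

# -n suppresses automatic printing; only the explicit p commands print.
extracted=$(sed -n -f "$workdir/script.4" "$workdir/data")
printf '%s\n' "$extracted"

rm -rf "$workdir"
```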

Note that there are many SELECT statements that would be mishandled by this code.

For example:

SELECT *
FROM Table1 AS T1
JOIN (SELECT T2.A, T3.B
FROM Table2 AS T2
JOIN Table3 AS T3 ON T2.PK = T3.FK
WHERE T2.ColumnN > T3.ColumnM
) AS T4
ON T1.A = T4.B
WHERE T1.DateOfBirth > DATE(2000-01-01)

Quite apart from the upper-case keywords, the WHERE in the sub-query would be where the matching between FROM and WHERE stopped.
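The upper-case keywords, at least, can be handled if GNU sed is available, because it accepts an I modifier on address patterns and on s commands. This is a sketch under that assumption (the I modifier is a GNU extension, and the sample input is invented); it does nothing for the nested sub-query problem.

```shell
script=$(mktemp)

# Same script as before, with the GNU-only I (case-insensitive) modifier.
cat > "$script" <<'EOF'
/from.*where/I   { s/.*from *//I; s/ *where.*//I;          p; n; }
/from/I,/where/I { s/.*from *//I; s/ *where.*//I; /^ *$/d; p; }
EOF

# Hypothetical input mixing upper-case and capitalised keywords.
mixed_case=$(printf '%s\n' \
  'SELECT * FROM emp WHERE empid=1;' \
  'Select *' \
  'From dep' \
  'Where x = 1' | sed -n -f "$script")
printf '%s\n' "$mixed_case"

rm -f "$script"
```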


** In case you're curious about the cause of the hair loss, see "Why does an n instead of a b or d or nothing change the behaviour of sed in this script?"

linux: extract pattern from file

I think awk is better suited for this task:

$ awk  '{for (i=1;i<=NF;i++){if ($i ~ /ref\|/){print $i}}}' FS='( )|(,)' infile
ref|name3
ref|name4
ref|name5
ref|name6

FS='( )|(,)' sets the field separator to a regular expression that matches either a single space or a comma, so awk can iterate over fields split on both; the loop then prints every field that contains the ref| pattern.
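Since the question's infile is not shown, here is a sketch with an invented input in which the ref| items are separated by a mix of spaces and commas:

```shell
# Hypothetical input: ref|... items mixed with other fields.
sample=$(mktemp)
printf '%s\n' \
  'id=7 ref|name3,ref|name4 misc' \
  'x,ref|name5 y ref|name6,z' > "$sample"

# FS is assigned on the command line before the file is read, so every
# record is split on single spaces and commas; \| matches a literal pipe.
refs=$(awk '{for (i=1;i<=NF;i++){if ($i ~ /ref\|/){print $i}}}' FS='( )|(,)' "$sample")
printf '%s\n' "$refs"

rm -f "$sample"
```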


