How to Extract Text Which Matches Particular Fields in Text File Using Linux Commands

how to extract text which matches particular fields in text file using linux commands

GNU sed

sed -r '/title.*java/I!d;s/.*:.(.*).}$/\1/' file

java cook book.pdf
Java book.pdf
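
As a quick check of the command above, the following sketch runs it on a small sample. The JSON-like record format and the file name books.txt are assumptions, since the original input is not shown. The I flag on the address is a GNU sed extension.

```shell
# Assumed sample input: one JSON-like record per line (hypothetical format).
tmpdir=$(mktemp -d)
cat > "$tmpdir/books.txt" <<'EOF'
{"id":1,"title":"java cook book.pdf"}
{"id":2,"title":"python guide.pdf"}
{"id":3,"title":"Java book.pdf"}
EOF

# Delete lines that do not match title...java (case-insensitive), then keep
# only the text between the character after the last ':' and the final '"}'.
titles=$(sed -r '/title.*java/I!d;s/.*:.(.*).}$/\1/' "$tmpdir/books.txt")
printf '%s\n' "$titles"
rm -rf "$tmpdir"
```

Because .*: is greedy, the capture starts after the last colon on the line, so this only works when the title is the last property of the record.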

Extract text from each line from a multiple-line text file based on a condition, Linux

1st solution: To get the expected output shown, there is no need to first substitute one delimiter for the other and then print. awk accepts several field separators at once, so the needed field can be printed directly.

awk -F'-|_' '{print $2}' Input_file

Explanation: -F'-|_' makes both - and _ field separators for every line of Input_file; print $2 then prints the second field.
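
To see how the two separators split a line, the sketch below prints every field with its number. The sample line abc-DEF_123 is an assumption, since the real Input_file is not shown.

```shell
# Assumed sample line (hypothetical; the original input is not shown).
line='abc-DEF_123'

# Print each field with its index to show how -F'-|_' splits the line.
fields=$(printf '%s\n' "$line" | awk -F'-|_' '{ for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"
```

The value between the - and the _ ends up in field 2, which is why print $2 is enough.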




2nd solution: Using sed with a back-reference.

sed -E 's/^[^-]*-([^_]*).*/\1/' Input_file

Explanation: The -E option enables ERE (extended regular expressions). The pattern matches from the start of the line up to the first -, captures everything after it up to the first _ in the 1st capture group (stored so that \1 can refer back to it in the replacement), and then matches the rest of the line. The substitution replaces the whole line with only the captured value, which gives the desired result.




3rd solution: Using GNU grep. The -P option enables the PCRE regex engine and -o prints only the matched part of each line. The pattern matches everything from the start of the line up to the first -, then discards that part of the match with \K. It then matches everything up to, but not including, the next _, and only that part is printed.

grep -oP '^.*?-\K[^_]*' Input_file
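
The three solutions can be compared side by side. This is a minimal sketch on an assumed sample line (abc-DEF_123); it needs a grep built with PCRE support for -P.

```shell
# Assumed sample line (hypothetical; the original input is not shown).
line='abc-DEF_123'

# Each command extracts the text between the first '-' and the next '_'.
with_awk=$(printf '%s\n' "$line" | awk -F'-|_' '{print $2}')
with_sed=$(printf '%s\n' "$line" | sed -E 's/^[^-]*-([^_]*).*/\1/')
with_grep=$(printf '%s\n' "$line" | grep -oP '^.*?-\K[^_]*')

printf 'awk: %s\nsed: %s\ngrep: %s\n' "$with_awk" "$with_sed" "$with_grep"
```

The commands differ on lines that contain no -: sed prints the line unchanged, grep -o prints nothing, and awk prints whatever the second field happens to be.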

Extract text from a txt file in Linux using grep when there is ambiguity

You can use grep -w to match a word:

grep -iw 'tony' file
name Tony Mcgill

Alternatively use word boundary in your grep:

grep -i '\<tony\>' file

OR:

grep -i '\btony\b' file
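
The difference between a substring match and a whole-word match shows up on names that merely contain "tony". The sketch below uses an assumed names.txt; only the first line comes from the output shown above, the other two are hypothetical.

```shell
# Assumed sample file; the second and third names are made up for illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/names.txt" <<'EOF'
name Tony Mcgill
name Antony Smith
name Tonya Lee
EOF

# Without -w, every line containing "tony" (case-insensitive) is counted.
plain=$(grep -ic 'tony' "$tmpdir/names.txt")
# With -w, "tony" must be delimited by non-word characters on both sides.
word=$(grep -iw 'tony' "$tmpdir/names.txt")

echo "substring matches: $plain"
echo "whole-word match: $word"
rm -rf "$tmpdir"
```

The \< \> and \b forms behave the same way here; they are GNU extensions, while -w is the more portable choice.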

Extract a property value from a text file

Sed is better at simple matching tasks:

sed -n 's/.*committed=\([0-9]*\).*/\1/p' input_file
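
A small sketch of the command on an assumed input line (the property names other than committed are hypothetical). With -n and the p flag, lines without the property produce no output at all.

```shell
# Assumed sample line; only the committed= property comes from the answer.
line='heap used=120 committed=4096 max=8192'

# Replace the whole line with the digits after committed= and print it.
committed=$(printf '%s\n' "$line" | sed -n 's/.*committed=\([0-9]*\).*/\1/p')
echo "$committed"

# A line without the property prints nothing because of -n and p.
missing=$(printf 'no such property here\n' | sed -n 's/.*committed=\([0-9]*\).*/\1/p')
echo "missing: [$missing]"
```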

How to use sed/grep to extract text between two words?

sed -e 's/Here\(.*\)String/\1/'
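
Note that the command above only removes the two marker words; any text before Here or after String stays on the line. The sketch below shows that on an assumed sample line, together with a variant (an assumption, not part of the original answer) that keeps only the text in between.

```shell
# Assumed sample line with extra text around the two marker words.
line='prefix Here is a String suffix'

# As given: strips "Here" and "String" but keeps the surrounding text.
as_given=$(printf '%s\n' "$line" | sed -e 's/Here\(.*\)String/\1/')

# Variant: also match the text before and after, so only the middle remains.
only_between=$(printf '%s\n' "$line" | sed -n 's/.*Here \(.*\) String.*/\1/p')

printf '[%s]\n[%s]\n' "$as_given" "$only_between"
```

Both patterns are greedy, so if String occurs more than once on a line the capture runs up to the last occurrence.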

How to extract a string after matching characters from a variable in shell script

Try:

sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName
  • 2 is the line number the command group applies to.
  • { opens a sed command group.
    • s/ substitutes the match described below:
      • ^ anchors the match at the beginning of the line
      • \(...\) is a capture group, with \1 as its back-reference
      • [^ ]* matches zero or more characters that are not a space
      • \(AB[^ ]*\) captures AB followed by everything up to, but not including, the first space (back-reference \1)
      • a space followed by * matches zero or more trailing spaces
      • $ anchors the match at the end of the line
    • / separates the pattern from the replacement:
      • \1 is the back-reference to the capture group above
    • / ends the substitution
    • q prints the current line and quits, so sed does not read the rest of the file unnecessarily
  • } closes the command group.
  • d deletes every other line; because q exits at line 2, in practice this is only line 1.

To get the result into a variable:

your_variableName=$(sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName)
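
A minimal run of the command, assuming a three-line file in which line 2 has the form classB = AB... (the file name classes.txt and its contents are hypothetical). The q } spelling relies on GNU sed.

```shell
# Assumed sample file; only the "classB = AB..." shape comes from the answer.
tmpdir=$(mktemp -d)
cat > "$tmpdir/classes.txt" <<'EOF'
classA = XY12
classB = AB34
classC = ZZ99
EOF

# Line 1 is deleted, line 2 is reduced to the captured value and printed,
# and sed quits before reading line 3.
value=$(sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' "$tmpdir/classes.txt")
echo "$value"
rm -rf "$tmpdir"
```

If line 2 does not match the pattern, the substitution does nothing and q prints line 2 unchanged, so it is worth checking the result before using it.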

extracting lines if the first field matches another list saved in a different file -- shell command

If the order of the output lines doesn't matter, then:

awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2

Explanation

  • FNR==NR{ arr[$1]; next } runs only while the first file (file1) is being read. arr is an array whose index key is the first field, $1; next skips the remaining code for that line.
  • $1 in arr runs while the second file (file2) is being read. It is true if the array arr built from file1 has an index key equal to the first column of the current file2 line, in which case the current record/row/line of file2 is printed.

Test Results:

akshay@db-3325:/tmp$ cat file1
Allie
Bob
John
Laurie

akshay@db-3325:/tmp$ cat file2
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23

akshay@db-3325:/tmp$ awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2
Laurie 45 56 6 75
Allie 45 56 67 23
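
The output above follows the order of file2. If the order of file1 should be kept instead, one possible variant (a sketch, not part of the original answer) is to store the file2 lines first and then walk through file1. It assumes the first field is unique within file2; with duplicates, only the last line per key is kept.

```shell
# Recreate the two sample files from the test results above.
tmpdir=$(mktemp -d)
printf '%s\n' Allie Bob John Laurie > "$tmpdir/file1"
cat > "$tmpdir/file2" <<'EOF'
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23
EOF

# First pass (file2): remember each whole line under its first field.
# Second pass (file1): print the remembered line for every name that has one.
ordered=$(awk 'FNR==NR{ row[$1] = $0; next } $1 in row { print row[$1] }' \
    "$tmpdir/file2" "$tmpdir/file1")
printf '%s\n' "$ordered"
rm -rf "$tmpdir"
```

Note the swapped argument order: file2 is read first here because its lines are the ones being stored.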

How can I extract a predetermined range of lines from a text file on Unix?

sed -n '16224,16482p;16483q' filename > newfile

From the sed manual:

p -
Print out the pattern space (to the standard output). This command is usually only used in conjunction with the -n command-line option.

n -
If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If
there is no more input then sed exits without processing any more
commands.

q -
Exit sed without processing any more commands or input.
Note that the current pattern space is printed if auto-print is not disabled with the -n option.

and

Addresses in a sed script can be in any of the following forms:

number
Specifying a line number will match only that line in the input.

An address range can be specified by specifying two addresses
separated by a comma (,). An address range matches lines starting from
where the first address matches, and continues until the second
address matches (inclusively).
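
The same pattern can be tried on a small file. This sketch uses a 100-line file generated with seq (an assumption for illustration) and extracts lines 10 to 12; the q on line 13 stops sed from reading the remaining lines.

```shell
# Build a 100-line sample file where each line holds its own line number.
tmpdir=$(mktemp -d)
seq 1 100 > "$tmpdir/numbers.txt"

# Print lines 10-12 only, then quit at line 13 without printing it (-n).
sed -n '10,12p;13q' "$tmpdir/numbers.txt" > "$tmpdir/newfile"

extracted=$(cat "$tmpdir/newfile")
printf '%s\n' "$extracted"
rm -rf "$tmpdir"
```

The quit address is the end of the range plus one; on a very large file this saves scanning everything after the range.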

Extracting Information of some columns from a large file based on ID in the another file

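You can certainly do this with awk alone, reading the IDs into a hash table and testing whether your field is in the table, but I find this approach much easier:

grep -Fwf ids.txt data.txt | awk '{ print $1, $2, $4, $5, $8, $9 }'

This tells grep to use each line of ids.txt as a fixed-string (-F), whole-word (-w) pattern to search for in data.txt; grep -F is the current spelling of the older fgrep. Then, with awk, we print only the desired columns. Note that grep matches an ID anywhere on the line, not only in the first column.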
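
Both approaches mentioned in this answer can be sketched on a small sample. The contents of ids.txt and data.txt below are assumptions (nine whitespace-separated columns with the ID in column 1); grep -F is used as the equivalent of fgrep.

```shell
# Assumed sample data: ID in column 1, eight more columns per line.
tmpdir=$(mktemp -d)
printf '%s\n' id2 id4 > "$tmpdir/ids.txt"
cat > "$tmpdir/data.txt" <<'EOF'
id1 a1 b1 c1 d1 e1 f1 g1 h1
id2 a2 b2 c2 d2 e2 f2 g2 h2
id3 a3 b3 c3 d3 e3 f3 g3 h3
id4 a4 b4 c4 d4 e4 f4 g4 h4
EOF

# grep selects the lines containing a listed ID as a whole word,
# awk then prints the wanted columns.
via_grep=$(grep -Fwf "$tmpdir/ids.txt" "$tmpdir/data.txt" |
    awk '{ print $1, $2, $4, $5, $8, $9 }')

# awk-only version: load the IDs into an array, then test column 1 only.
via_awk=$(awk 'FNR==NR{ ids[$1]; next } $1 in ids { print $1, $2, $4, $5, $8, $9 }' \
    "$tmpdir/ids.txt" "$tmpdir/data.txt")

printf '%s\n' "$via_grep"
rm -rf "$tmpdir"
```

The two give the same result on this sample. They can differ when an ID also appears as a word in another column, because grep looks at the whole line while the awk version compares only the first field.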


