How to extract text which matches particular fields in a text file using Linux commands
GNU sed
sed -r '/title.*java/I!d;s/.*:.(.*).}$/\1/' file
java cook book.pdf
Java book.pdf
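The input file is not shown above, so the following is only a sketch of how the command behaves. It assumes lines shaped like {"title":"..."}; the sample file and its contents are hypothetical. The I flag and -r are GNU sed features.

```shell
# Hypothetical sample input; the real file from the question is not shown.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{"title":"java cook book.pdf"}
{"title":"python notes.pdf"}
{"title":"Java book.pdf"}
EOF

# Delete every line where "title" is not followed by "java" (case-insensitive),
# then replace the line with the text between the last ':' plus one character
# and the final character before '}'.
titles=$(sed -r '/title.*java/I!d;s/.*:.(.*).}$/\1/' "$tmp")
echo "$titles"
rm -f "$tmp"
```

If the real lines use a different layout, the substitution part would need adjusting.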
Extract text from each line from a multiple-line text file based on a condition, Linux
1st solution: To get your shown expected sample output, you do not need to first substitute one separator for the other and then print. We can use the power of awk here to define multiple field separators and then print the needed value accordingly.
awk -F'-|_' '{print $2}' Input_file
Explanation: The awk program above makes _ and - the field separators for the whole Input_file, then prints the 2nd field/column of each line.
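As a quick illustration of the field splitting, here is a minimal sketch. The sample line is an assumption, since the question's Input_file is not shown.

```shell
# Hypothetical sample line; not taken from the question.
line='alpha-beta_gamma'

# A multi-character -F value is treated as a regular expression,
# so '-|_' splits on either character and yields three fields.
second=$(printf '%s\n' "$line" | awk -F'-|_' '{print $2}')
count=$(printf '%s\n' "$line" | awk -F'-|_' '{print NF}')
echo "$second ($count fields)"
```

Writing the separator as -F'[-_]' should behave the same way.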
2nd solution: Using sed's back-reference capability.
sed -E 's/^[^-]*-([^_]*).*/\1/' Input_file
Explanation: sed's -E option enables ERE (extended regular expressions). The main program matches everything from the start of the line up to the first occurrence of -, then captures everything up to the next _ into the 1st back reference (a temporary location in memory, retrieved later during substitution), and then matches the rest of the line. The substitution replaces the whole line with only the captured value to get the desired result.
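A small sketch of the capture in action, again on an assumed sample line. It also shows what happens to a line that contains no - at all.

```shell
# Hypothetical sample line; not taken from the question.
line='alpha-beta_gamma'
captured=$(printf '%s\n' "$line" | sed -E 's/^[^-]*-([^_]*).*/\1/')

# A line without '-' does not match the pattern, so the substitution
# is skipped and sed prints the line unchanged.
unmatched=$(printf '%s\n' 'nodash' | sed -E 's/^[^-]*-([^_]*).*/\1/')

echo "$captured"
echo "$unmatched"
```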
3rd solution: Using GNU grep. Its -oP options print only the matched part of each line (-o) and enable the PCRE regex engine (-P). The pattern matches everything from the start of the line up to the first - and discards that part of the match with \K, then matches everything before the next _ and prints it.
grep -oP '^.*?-\K[^_]*' Input_file
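The effect of \K can be checked with a short sketch. The sample line is hypothetical, and the example assumes a GNU grep built with PCRE support.

```shell
# Hypothetical sample line; not taken from the question.
line='alpha-beta_gamma'

# Everything matched before \K is dropped from the reported match.
matched=$(printf '%s\n' "$line" | grep -oP '^.*?-\K[^_]*')

# With -o, a line that does not match produces no output at all
# (grep exits non-zero, hence the '|| true').
none=$(printf '%s\n' 'nodash' | grep -oP '^.*?-\K[^_]*' || true)

echo "$matched"
```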
Extract text from txt file in Linux using grep when there is ambiguity
You can use grep -w to match a whole word:
grep -iw 'tony' file
name Tony Mcgill
Alternatively, use word boundaries in your grep pattern:
grep -i '\<tony\>' file
OR:
grep -i '\btony\b' file
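To see the difference the word match makes, here is a minimal sketch. Apart from the Tony Mcgill line shown above, the sample names are made up for illustration.

```shell
# Sample file; only the first line comes from the answer above,
# the other two are hypothetical.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
name Tony Mcgill
name Antony Smith
name tonya Reed
EOF

# Without -w, 'tony' also matches inside Antony and tonya.
plain=$(grep -ic 'tony' "$tmp")

# With -w, only the standalone word matches.
whole=$(grep -iw 'tony' "$tmp")

echo "$plain matching lines without -w"
echo "$whole"
rm -f "$tmp"
```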
Extract a property value from a text file
Sed is better at simple matching tasks:
sed -n 's/.*committed=\([0-9]*\).*/\1/p' input_file
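A short sketch of this command on an assumed input line; the field names other than committed are hypothetical.

```shell
# Hypothetical sample line; only the committed= key comes from the answer.
line='heap: used=1024 committed=2048 max=4096'

# -n suppresses automatic printing; the trailing p prints only lines
# where the substitution succeeded, so non-matching lines are dropped.
value=$(printf '%s\n' "$line" | sed -n 's/.*committed=\([0-9]*\).*/\1/p')
echo "$value"
```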
How to use sed/grep to extract text between two words?
sed -e 's/Here\(.*\)String/\1/'
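Note that the command above removes only the two marker words, so any text before Here or after String stays in the output. If only the text in between is wanted, one possible variant (a sketch, not part of the original answer) also consumes the surrounding text. The sample line is an assumption.

```shell
# Hypothetical sample line.
line='prefix Here is a String suffix'

# Original command: drops the words Here and String, keeps everything else.
stripped=$(printf '%s\n' "$line" | sed -e 's/Here\(.*\)String/\1/')

# Variant: also matches the text around the markers (and the single
# spaces next to them), leaving only what lies in between.
between=$(printf '%s\n' "$line" | sed -e 's/.*Here \(.*\) String.*/\1/')

echo "$stripped"
echo "$between"
```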
How to extract a string after matching characters from a variable in shell script
Try:
sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName
2 is the line number.
{ opens a sed command group.
s/ substitutes the match described below.
^ is the anchor for the beginning of the line.
\(...\) is a capture group, with \1 as its back-reference.
[^ ]* means any characters except a space.
\(AB[^ ]*\) captures AB followed by everything up to, but not including, the first space (its back-reference is \1).
 * (a space followed by *) means zero or more spaces.
$ is the anchor for the end of the line.
/ separates the pattern from the replacement below.
\1 is the back-reference to the capture group above.
/ ends the substitution.
q quits, to avoid reading the rest of the file unnecessarily.
} closes the command group.
d deletes any lines seen before line number 2.
To capture the result in a variable:
your_variableName=$(sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName)
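If line 2 does not match the pattern, the command above still prints that line unchanged. A possible alternative, sketched here under the assumption of a small sample file, prints only when the substitution succeeds. The file contents and the variable name value are hypothetical.

```shell
# Hypothetical sample file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
classA = XY12
classB = AB34
classC = AB56
EOF

# -n turns off automatic printing; on line 2 the p flag prints the
# result only if the substitution matched, then q stops reading.
value=$(sed -n '2{s/^classB = \(AB[^ ]*\) *$/\1/p;q;}' "$tmp")
echo "$value"
rm -f "$tmp"
```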
extracting lines if the first field matches another list saved in a different file -- shell command
If order doesn't matter then
awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2
Explanation
FNR==NR{ arr[$1]; next }
Here we read the first file (file1). arr is an array whose index key is the first field, $1.
$1 in arr
Here we read the second file (file2). If the array arr, which was created while reading the first file, has an index key equal to the second file's first column ($1 in arr is true if the index key exists), then the current record/row/line from file2 is printed.
Test Results:
akshay@db-3325:/tmp$ cat file1
Allie
Bob
John
Laurie
akshay@db-3325:/tmp$ cat file2
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23
akshay@db-3325:/tmp$ awk 'FNR==NR{ arr[$1]; next }$1 in arr' file1 file2
Laurie 45 56 6 75
Allie 45 56 67 23
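The output above follows the order of file2. If the order of file1 should be kept instead, one possible approach (a sketch, not part of the original answer) is to read file2 first and store each full line by its first field. It assumes the first field is unique within file2, since a repeated key would keep only the last line.

```shell
# Recreate the two sample files from the test results above.
tmp1=$(mktemp); tmp2=$(mktemp)
printf '%s\n' Allie Bob John Laurie > "$tmp1"
cat > "$tmp2" <<'EOF'
Laurie 45 56 6 75
Moxipen 10 45 56 56
Allie 45 56 67 23
EOF

# Read the data file first, remembering each line under its first field,
# then walk the name list and print the remembered lines in that order.
ordered=$(awk 'FNR==NR{ row[$1]=$0; next } $1 in row{ print row[$1] }' "$tmp2" "$tmp1")
echo "$ordered"
rm -f "$tmp1" "$tmp2"
```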
How can I extract a predetermined range of lines from a text file on Unix?
sed -n '16224,16482p;16483q' filename > newfile
From the sed manual:
p - Print out the pattern space (to the standard output). This command is usually only used in conjunction with the -n command-line option.
n - If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands.
q - Exit sed without processing any more commands or input. Note that the current pattern space is printed if auto-print is not disabled with the -n option.
and
Addresses in a sed script can be in any of the following forms:
number
Specifying a line number will match only that line in the input.
An address range can be specified by specifying two addresses separated by a comma (,). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively).
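The same pattern can be tried on a small scale. This sketch uses a generated 100-line file and a much smaller range than the one in the answer; both are chosen only for illustration.

```shell
# Generate a throwaway file with the numbers 1..100, one per line.
tmp=$(mktemp)
seq 1 100 > "$tmp"

# Print lines 5 through 8, then quit on line 9. Because of -n,
# the q on line 9 does not print that line.
range=$(sed -n '5,8p;9q' "$tmp")
echo "$range"
rm -f "$tmp"
```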
Extracting Information of some columns from a large file based on ID in the another file
You can certainly do this with AWK only, reading the data into a hash-table and testing if your field is in the table, but I find this heuristic much easier:
fgrep -wf ids.txt data.txt | awk '{ print $1, $2, $4, $5, $8, $9 }'
This tells grep (fgrep is equivalent to grep -F) to use the lines in ids.txt as fixed-string, whole-word patterns to search for in data.txt. Then, with AWK, we print only the desired columns.
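For comparison, the AWK-only approach mentioned at the start could look roughly like the sketch below. It assumes the ID sits in the first column of data.txt, which the answer does not state, and the sample rows are hypothetical. Unlike the grep pipeline, it ignores IDs that appear in other columns.

```shell
# Hypothetical sample files standing in for ids.txt and data.txt.
ids=$(mktemp); data=$(mktemp)
printf '%s\n' id2 id3 > "$ids"
cat > "$data" <<'EOF'
id1 a1 b1 c1 d1 e1 f1 g1 h1
id2 a2 b2 c2 d2 e2 f2 g2 h2
id9 id2 b9 c9 d9 e9 f9 g9 h9
EOF

# First file: store each ID as an array key. Second file: print the
# chosen columns only when column 1 is one of the stored IDs.
picked=$(awk 'FNR==NR{ ids[$1]; next } $1 in ids{ print $1, $2, $4, $5, $8, $9 }' "$ids" "$data")
echo "$picked"
rm -f "$ids" "$data"
```

The third sample row contains id2 in its second column, so the grep pipeline would select it while this version does not.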