awk one liner select only rows based on value of a column
awk -F"\t" '$2 == "victorian" { print $1"\t"$3 }' file.in
Using awk to select rows with a specific value in column greater than x
Okay, if you have a series of files, *input.file
and you want to select those lines where $3 > 98
and then write the values to the same prefix, but with output.file
as the rest of the filename, you can use:
awk '$3 > 98 {
match (FILENAME,/input.file$/)
print $0 > substr(FILENAME,1,RSTART-1) "output.file"
}' *input.file
Which uses match
to find the index where input.file
begins and then uses substr
to get the part of the filename before that and appends "output.file"
to the substring for the final output filename.
match()
sets the RSTART
value to the index where input.file
begins in the current filename which is then used by substr
truncate the current filename at that index. See GNU awk String Functions for complete details.
For exmaple, if you had input files:
$ ls -1 *input.file
v1input.file
v2input.file
Both with your example content:
$ cat v1input.file
A chr11 98.80 83 1 0 2 84
B chr7 95.45 22 1 0 40 61
C chr7 88.89 27 0 1 46 72
D chr6 100.00 20 0 0 1 20
Running the awk
command above would results in two output files:
$ ls -1 *output.file
v1output.file
v2output.file
Containing the records where the third-field was greater than 98
:
$ cat v1output.file
A chr11 98.80 83 1 0 2 84
D chr6 100.00 20 0 0 1 20
awk: Selecting rows based on the first 3 different values of a given column
Your question is a little confusing. But if I understand right you want to print out any row that has a day of the week matching one of whatever the first 3 distinct values the script finds in the file. You could do that with awk like so
BEGIN { FS="," }
{
if(dayCount < 3 && !($2 in days)) { days[$2] = 1; ++dayCount }
if ($2 in days) { print }
}
extract rows with specific value in a specific column using bash
Please read Correct Bash and shell script variable capitalization and https://mywiki.wooledge.org/Quotes to understand some of the issues in your script and copy/paste any shell script you write into https://www.shellcheck.net/ until you get the fundamentals down.
Regarding But this produces empty files
- sure, for any give command cmd
with
for f in *; do
cmd "$f" > "out$f"
done
you're creating an output file for each input file in the shell loop so if any input file doesn't match $4==12
in your awk script (the cmd
in this case) you'll still get an output file, it'll just be empty. If you don't want that you could do:
tmp=$(mktemp)
for f in *; do
cmd "$f" > "$tmp" &&
mv -- "$tmp" "out$f"
done
and write cmd
to exit with a succ/fail status like grep
does when it finds a match (trivial in awk), or you could check the size of "$tmp"
before the mv
:
tmp=$(mktemp)
for f in *; do
cmd "$f" > "$tmp" &&
[[ -s "$tmp" ]] &&
mv -- "$tmp" "out$f"
done
You don't need a shell loop or other commands for this, though, just 1 call to awk to process all of your files at once. Using any awk in any shell on every Unix box do only this
awk -v file='random' -F'\t' '
FNR == 1 {
close(out)
f = FILENAME
sub(".*/","",f)
out = "/home/user/extra/" file "/" f
}
$4 == 12 {
print > out
}
' /home/user/data/*.txt
If you want a string instead of numeric comparison so that 12.
doesn't match 12
then do $4 == "12"
instead of $4 == 12
.
In the above file
is a poor choice of variable name to hold the name of a directory but I left it alone to avoid changing anything I didn't have to.
Filter out rows from column A based on values in column B
With an array. I assume that there are no duplicates in the first column.
awk -F ',' 'NR>1{
array[$1]++; array[$2]--
}
END{
for(i in array){ if(array[i]==1){ print i } }
}' file
As one line:
awk -F ',' 'NR>1{ array[$1]++; array[$2]-- } END{for(i in array){ if(array[i]==1){ print i } } }' file
Output:
esther@example.com
daisy@example.com
bill@example.com
Loop through files in a directory and select rows based on column value using awk for large files
Assuming:
- The input filenames have the form as "*.txt".
- The columns are separated by a tab character.
- Each of five values are compared with the target column (COL_ABC or COL_DEF) one by one and individual
result files are created according to the value. Then 15 x 5 = 75 files will be created. (If this is not what you want, please let me know.)
Then would you please try:
awk -F"\t" '
BEGIN {
values["a123"] # assign values
values["b234"]
values["c345"]
values["d456"]
values["e567"]
}
FNR==1 { # header line
for (i in values) { # loop over values
if (outfile[i] != "") close(outfile[i]) # close previous file
outfile[i] = "result_" i "_" FILENAME # filename to create
print > outfile[i] # print the header
}
abc = def = 0 # reset the indexes
for (i = 1; i <= NF; i++) { # loop over the column names
if ($i == "COL_ABC") abc = i # "COL_ABC" is found: assign abc to the index
else if ($i == "COL_DEF") def = i # "COL_DEF" is found: assign def to the index
}
next
}
{
for (i in values) {
if (abc > 0 && $abc == i || def > 0 && $def == i)
print > outfile[i] # abc_th column or def_th column matches i
}
}
' *.txt
If your 15 text files are located in the directory, e.g. /path/to/the/dir/
and you want to specify the directory as an argument, change the *.txt
in the last line to /path/to/the/dir/*.txt
.
Getting all values of various rows which have the same value in one column with awk
You may use this gnu awk
:
awk '
BEGIN {
FS = OFS = ","
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
++counts[$3] # Increment count of lines.
name[$3] = $2
map[$3] = ($3 in map ? map[$3] " " : "") $1
}
END {
# Iterate over all third-column values.
PROCINFO["sorted_in"]="@ind_str_asc";
for (k in counts)
print name[k], k, counts[k], map[k]
}
' test-file.csv
Output:
contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
how to select rows with max value based on one column and group by second column using awk?
Credits to sofan's
idea over in the comments, with some extra manipulations the below logic will do the trick as the OP wants.
sort -r -n -k2 < file| awk '!x[$1 FS $3]++' | sort -k1
- Idea is to sort the file in ascending order first (
sort -n -k2
) and reverse it (-r
) on column 2 (which now will be descending order) awk '!x[$1 FS $3]++
does the grouping of that content unique by column 1 and 3 andsort -k1
sorts/groups it by column 1 contents
Select row and element in awk
To print the second line:
awk 'FNR == 2 {print}'
To print the second field:
awk '{print $2}'
To print the third field of the fifth line:
awk 'FNR == 5 {print $3}'
Here's an example with a header line and (redundant) field descriptions:
awk 'BEGIN {print "Name\t\tAge"} FNR == 5 {print "Name: "$3"\tAge: "$2}'
There are better ways to align columns than "\t\t" by the way.
Use exit
to stop as soon as you've printed the desired record if there's no reason to process the whole file:
awk 'FNR == 2 {print; exit}'
Filtering Rows Based On Number of Columns with AWK
You need to use the NF
(number of fields) variable to control the actions, such as in the following transcript:
$ echo '0333 foo
> bar
> 23243 qux' | awk 'NF==2{print}{}'
0333 foo
23243 qux
This will print the line if the number of fields is two, otherwise it will do nothing. The reason I have the (seemingly) strange construct NF==2{print}{}
is because some implementations of awk
will print by default if no rules are matched for a line. The empty command {}
guarantees that this will not happen.
If you're lucky enough to have one of those that doesn't do this, you can get away with:
awk 'NF==2'
but the first solution above will work in both cases.
Related Topics
Current Linux Kernel Debugging Techniques
How to Sleep in the Linux Kernel Space
Understanding Bash Short-Circuiting
Exclude .Svn Directories from Grep
Centos Through a Vm - No Urls in Mirrorlist
Exclude List of Files from Find
Does Ldd Also Show Dependencies of Dependencies
Linux Serial Port Listener and Interpreter
Saving Gmon.Out Before Killing a Process
Swift on Os X Compiling for Linux
Arch Linux - Apt-Get Update Equivalent Command
Best Approach to Writing a Generic Installer for a Linux Application
How to Get Started Developing on *Nix
Can't Su to User Jenkins After Installing Jenkins
Convert All File Extensions to Lower-Case
Remove Line of Text from Multiple Files in Linux
How Is the Microsecond Time of Linux Gettimeofday() Obtained and What Is Its Accuracy
How to Get the Physical Address from the Logical One in a Linux Kernel Module