Sort a File Based on a Column in Another File

Linux shell sort file according to the second column?

If this is UNIX:

sort -k 2 file.txt

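You can use multiple -k flags to sort on more than one column. Note that -k 2,2 limits a key to field 2 alone, while -k 2 (as above) runs from field 2 to the end of the line. For example, to sort by family name, then by first name as a tie-breaker: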

sort -k 2,2 -k 1,1 file.txt
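For comparison, here is a rough Python analogue of that two-key sort. It is a sketch under a few assumptions: fields are whitespace-separated, every line has at least two fields, and strings compare by code point (closer to LC_ALL=C sort than to locale-aware collation). The helper name sort_by_family_then_first is hypothetical. The key returns a tuple, and tuples compare element by element, so the first name only matters when the family names are equal.

```python
def sort_by_family_then_first(lines):
    """Rough analogue of `sort -k 2,2 -k 1,1` for whitespace-separated lines."""
    def key(line):
        fields = line.split()
        # Primary key: field 2 (family name); tie-breaker: field 1 (first name)
        return (fields[1], fields[0])
    return sorted(lines, key=key)


people = ["John Smith", "Anna Smith", "Zoe Adams"]
print(sort_by_family_then_first(people))  # ['Zoe Adams', 'Anna Smith', 'John Smith']
```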

Relevant options from "man sort":

-k, --key=POS1[,POS2]

start a key at POS1, end it at POS2 (origin 1)

POS is F[.C][OPTS], where F is the field number and C the character position in the field. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.

-t, --field-separator=SEP

use SEP instead of non-blank to blank transition
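The -t option only changes where fields start and end. Below is a minimal Python sketch of something like sort -t, -k2,2, assuming no field contains a quoted or escaped separator (this is not a CSV parser) and every line has the requested field. The name sort_by_field is hypothetical, and sep=None splits on runs of whitespace, which only approximates sort's default blank-transition fields.

```python
def sort_by_field(lines, field, sep=None):
    """Sort lines by one 1-based field, roughly like `sort -t SEP -k FIELD,FIELD`.

    sep=None splits on runs of whitespace (an approximation of sort's default);
    any other string is used as a literal separator.
    """
    return sorted(lines, key=lambda line: line.split(sep)[field - 1])


rows = ["id3,cherry", "id1,banana", "id2,apple"]
print(sort_by_field(rows, 2, sep=","))  # ['id2,apple', 'id1,banana', 'id3,cherry']
```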

Sort a file based on a column in another file

An awk solution: store the second file in memory, keyed by its first field, then loop over the first file, emitting the matching line from the second file for each key:

awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first
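The same two-pass idea, sketched in Python to make the data flow explicit. It assumes whitespace-separated fields and uses sample data matching the test files in this answer; the function name lines_in_first_order is hypothetical. One deliberate difference from awk: blank lines are skipped rather than stored under an empty key.

```python
def lines_in_first_order(first_lines, second_lines):
    """Mirror of: awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first"""
    # Pass 1 (the FNR==NR block): remember each line of `second` by its first field
    x2 = {}
    for line in second_lines:
        fields = line.split()
        if fields:
            x2[fields[0]] = line  # a later duplicate key overwrites an earlier one, as in awk
    # Pass 2: walk `first` and emit the stored line whenever its key is known
    result = []
    for line in first_lines:
        fields = line.split()
        if fields and fields[0] in x2:
            result.append(x2[fields[0]])
    return result


first = ["foo x y", "bar x y", "baz x y"]
second = ["bar x1 y1", "baz x2 y2", "foo x3 y3"]
print(lines_in_first_order(first, second))  # ['foo x3 y3', 'bar x1 y1', 'baz x2 y2']
```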

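Implementing @Barmar's comment with join: number the lines of first, join the two files on the key, then sort on those line numbers to restore the original order and cut them off: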

join -1 2 -o "1.1 1.2 2.2 2.3" <(cat -n first | sort -k2) <(sort second) | 
sort -n |
cut -d ' ' -f 2-
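A Python sketch of what that pipeline does step by step: number the lines of first, sort both inputs by key, merge them the way join does, then restore the original numbering. It assumes whitespace-separated fields, no blank lines, and that each key occurs at most once in second. Unlike -o "1.1 1.2 2.2 2.3", it keeps all remaining fields of the second-file line rather than exactly fields 2 and 3. The function name join_then_restore_order is hypothetical.

```python
def join_then_restore_order(first_lines, second_lines):
    """Sketch of: join -1 2 ... <(cat -n first | sort -k2) <(sort second) | sort -n | cut ..."""
    # cat -n first | sort -k2: (key, original line number), ordered by key
    left = sorted((line.split()[0], n) for n, line in enumerate(first_lines, 1))
    # sort second: split lines, ordered by key (first field)
    right = sorted(line.split() for line in second_lines)

    # join: walk both key-sorted lists in step and pair up equal keys
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        key, n = left[i]
        if key < right[j][0]:
            i += 1  # key only in first: dropped, as join does by default
        elif key > right[j][0]:
            j += 1  # key only in second: dropped
        else:
            joined.append((n, [key] + right[j][1:]))
            i += 1  # keep j: another line of first may share this key

    # sort -n | cut -d ' ' -f 2-: order by original line number, then drop it
    joined.sort(key=lambda pair: pair[0])
    return [" ".join(fields) for _, fields in joined]


print(join_then_restore_order(["foo x y", "bar x y", "baz x y"],
                              ["bar x1 y1", "baz x2 y2", "foo x3 y3"]))
```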

Note to other answerers: I tested with these files:

$ cat first
foo x y
bar x y
baz x y
$ cat second
bar x1 y1
baz x2 y2
foo x3 y3

Explanation of

awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first

This part reads the first file named on the command line (here, "second"):

FNR==NR {x2[$1] = $0; next}

The condition FNR == NR is true only while awk reads the first named file. FNR is awk's per-file record number and restarts at 1 for each input file, whereas NR counts records across all input files, so the two stay equal only during the first file (provided that file is not empty). The current line is stored in an associative array named x2 (not a great variable name), indexed by the first field of the record.
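To see why the condition singles out the first file, here is a small Python simulation of how awk numbers records. It is only a model: each input file is represented as a hypothetical (name, lines) pair, and awk_record_counters is an illustrative name. It also shows a known pitfall of the idiom: if the first file is empty, FNR == NR becomes true for lines of the next file.

```python
def awk_record_counters(files):
    """Yield (name, FNR, NR) for every record, numbering them the way awk does.

    files: list of (name, lines) pairs standing in for the command-line files.
    """
    nr = 0
    for name, lines in files:
        for fnr, _line in enumerate(lines, 1):  # FNR restarts at 1 in each file
            nr += 1                               # NR keeps counting across files
            yield name, fnr, nr


inputs = [("second", ["bar x1 y1", "baz x2 y2", "foo x3 y3"]),
          ("first", ["foo x y", "bar x y", "baz x y"])]
for name, fnr, nr in awk_record_counters(inputs):
    print(name, fnr, nr, "FNR==NR" if fnr == nr else "")
```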

The next rule, $1 in x2, only ever sees lines of the file "first": while "second" is being read, next skips it. For each line of "first", it checks whether the first field is a key in x2 and, if so, the action prints the corresponding line from "second" that was stored in the array. Lines of "first" with no match are silently dropped.

Note that the order of the files in the awk command matters. Because the output order is driven by the file named "first", it must be the last file processed by awk.

How to sort a file according to a column in another file?

Python's built-in sorted function accepts a keyword argument called key: a function that returns a comparable value for each element of the iterable being sorted.

If you have two lists, with the File_1 columns in one and the File_2 columns in the other, you could use:

indexes = sorted(range(len(File_2Column)), key=lambda i: File_1Col4[i])
sortedFile_2Col = [File_2Column[i] for i in indexes]
# you can repeat this line for all the columns you want to be sorted by that order
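A self-contained run of that snippet with made-up data, assuming row i of File_1Col4 corresponds to row i of File_2Column. The sample values and the extra column File_2Other are hypothetical, added only to show the same index order being reused for a second column.

```python
# Hypothetical sample columns: File_1Col4 supplies the ordering,
# File_2Column and File_2Other are columns to be reordered by it.
File_1Col4 = [30, 10, 20]
File_2Column = ["c", "a", "b"]
File_2Other = ["z", "x", "y"]

# Positions 0..n-1, ordered by the value File_1Col4 holds at each position
indexes = sorted(range(len(File_2Column)), key=lambda i: File_1Col4[i])
# Apply the same permutation to every column that should follow that order
sortedFile_2Col = [File_2Column[i] for i in indexes]
sortedFile_2Other = [File_2Other[i] for i in indexes]
print(indexes, sortedFile_2Col, sortedFile_2Other)  # [1, 2, 0] ['a', 'b', 'c'] ['x', 'y', 'z']
```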

Sorting lines in one file given the order in another file

Use awk to prefix each line of file1 with the line number at which its first field appears in file2. Sort the result numerically on that prefix column, then remove the prefix with cut.

awk 'FNR == NR { lineno[$1] = NR; next}
{print lineno[$1], $0;}' file2 file1 | sort -k 1,1n | cut -d' ' -f2-
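A Python sketch of the same decorate, sort, undecorate idea, assuming whitespace-separated fields and no blank lines in file1. The function name sort_by_order_in is hypothetical, and giving keys that are missing from file2 rank 0 (so they come first) is a choice made for this sketch. Python's sorted is stable, so lines of file1 that share a key keep their original relative order.

```python
def sort_by_order_in(file1_lines, file2_lines):
    """Sort file1's lines by the line number at which their first field appears in file2."""
    lineno = {}
    for n, line in enumerate(file2_lines, 1):
        fields = line.split()
        if fields:
            lineno[fields[0]] = n  # like lineno[$1] = NR; a repeated key keeps its last position
    # Keys missing from file2 get rank 0 and so come first (an assumption of this sketch)
    return sorted(file1_lines, key=lambda line: lineno.get(line.split()[0], 0))


print(sort_by_order_in(["a 1", "b 2", "c 3"], ["c", "a", "b"]))  # ['c 3', 'a 1', 'b 2']
```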

Sorting data in file based on first column in another file

$ cat tst.awk
NR==FNR {
    if (NR==1) {
        print
    }
    else {
        map[$1] = $0
    }
    next
}
{ print map[$1] }

$ awk -f tst.awk dataframe1 dataframe2
N02_M N05_F N06_M N07_F N08_F N09_M N02_M N026_F N03_M
586 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364
2237895 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225
7499 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803
35209 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94
2255280 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995
7294280 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478
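A Python sketch of what tst.awk does, with made-up sample rows rather than the real data. It assumes whitespace-separated columns, that the first field of every dataframe2 line is a row ID, and that dataframe2 has no blank lines; reorder_rows is a hypothetical name.

```python
def reorder_rows(dataframe1_lines, dataframe2_lines):
    """Sketch of tst.awk: keep dataframe1's header, then emit its rows in dataframe2's order."""
    out = []
    row_by_id = {}
    for n, line in enumerate(dataframe1_lines, 1):
        if n == 1:
            out.append(line)                   # header line printed as-is (NR==1)
        else:
            row_by_id[line.split()[0]] = line  # map[$1] = $0
    for line in dataframe2_lines:
        # print map[$1]: an unknown ID yields an empty line, as in awk
        out.append(row_by_id.get(line.split()[0], ""))
    return out


df1 = ["H1 H2", "586 0.8364", "7499 0.803"]
print(reorder_rows(df1, ["7499", "586"]))  # ['H1 H2', '7499 0.803', '586 0.8364']
```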

Sorting data based on second column of a file

You can use the key option of the sort command, which takes a "field number", so if you wanted the second column:

sort -k2 -n yourfile

-n, --numeric-sort compare according to string numerical value

For example:

$ cat ages.txt 
Bob 12
Jane 48
Mark 3
Tashi 54

$ sort -k2 -n ages.txt
Mark 3
Bob 12
Jane 48
Tashi 54
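The same comparison in Python shows why -n matters: compared as text, "12" sorts before "3" because '1' precedes '3'. This sketch assumes every line has a number in the second column, since float() raises ValueError on anything else.

```python
ages = ["Bob 12", "Jane 48", "Mark 3", "Tashi 54"]

# Text comparison of field 2 (no -n): "12" < "3" because '1' < '3'
by_text = sorted(ages, key=lambda line: line.split()[1])
# Numeric comparison of field 2, like sort -k2 -n
by_number = sorted(ages, key=lambda line: float(line.split()[1]))

print(by_text)    # ['Bob 12', 'Mark 3', 'Jane 48', 'Tashi 54']
print(by_number)  # ['Mark 3', 'Bob 12', 'Jane 48', 'Tashi 54']
```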

Sort a file by first (or second, or any other) column in Python

The problem you're having is that you're not turning each line into a list. When you read in the file, you're just getting the whole line as a string. You're then sorting by the first character of each line, and this is always the same character in your input, 'E'.

To sort by the first column, split each line into fields and use the first field as the sort key:

for line in sorted(lines, key=lambda line: line.split()[0]):
    print(line, end="")  # lines read from a file keep their trailing newline

split() turns the line into a list of whitespace-separated fields, and [0] takes the first column from that list.
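A slightly more general sketch of the same fix, assuming whitespace-separated columns and a 0-based column index (so column=1 is the second column). The name sorted_by_column is hypothetical, and sending lines that are too short for the column to the front, instead of raising IndexError, is a choice made for this sketch. With a real file, call it on the open file object inside a with open(...) block, since iterating over a file yields its lines.

```python
def sorted_by_column(lines, column=0):
    """Sort lines by one whitespace-separated column (0-based, like split()[column]).

    Lines that are too short for that column get an empty key and sort first.
    """
    def key(line):
        fields = line.split()
        return fields[column] if len(fields) > column else ""
    return sorted(lines, key=key)


sample = ["E3 b\n", "E1 c\n", "E2 a\n"]
print(sorted_by_column(sample))            # ['E1 c\n', 'E2 a\n', 'E3 b\n']
print(sorted_by_column(sample, column=1))  # ['E2 a\n', 'E3 b\n', 'E1 c\n']
```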


