Linux Combine Two Files by Column


$ awk -v OFS='\t' '
NR==1 { print $0, "Remark1", "Remark2"; next }
NR==FNR { a[$1]=$0; next }
$1 in a { print a[$1], $2, $3 }
' Test1.txt Test2.txt
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8

Merge two files column-wise, interleaving the columns of the second file with the columns of the first

You can use a loop in awk, for example:

paste file_A file_B | awk '{ 
half = NF/2;
for(i = 1; i < half; i++)
{
printf("%s %s ", $i, $(i+half));
}
printf("%s %s\n", $half, $NF);
}'

or

paste file_A file_B | awk '{ 
i = 1; j = NF/2 + 1;
while(j < NF)
{
printf("%s %s ", $i, $j);
i++; j++;
}
printf("%s %s\n", $i, $j);
}'

Both versions assume that the number of columns in awk's input is even, which holds when file_A and file_B have the same number of columns.
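As a quick check of the interleaving, here is a minimal sketch run on two made-up three-column files. The file contents and the `work` and `interleaved` names are assumptions for illustration only; the awk body is the first loop from above.

```shell
# Hypothetical sample inputs with the same number of columns.
work=$(mktemp -d)
printf 'a1 a2 a3\nb1 b2 b3\n' > "$work/file_A"
printf 'x1 x2 x3\ny1 y2 y3\n' > "$work/file_B"

# paste puts the rows side by side; awk pairs column i with column i+half.
interleaved=$(paste "$work/file_A" "$work/file_B" | awk '{
    half = NF / 2
    for (i = 1; i < half; i++)
        printf("%s %s ", $i, $(i + half))
    printf("%s %s\n", $half, $NF)
}')

printf '%s\n' "$interleaved"
rm -r "$work"
```

With these inputs the first output line is `a1 x1 a2 x2 a3 x3`.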

How to merge two .txt files in Unix based on one common column

Thanks for adding your own attempts to solve the problem - it makes troubleshooting a lot easier.

This answer is a bit convoluted, but here is a potential solution (GNU join):

join -t $'\t' -1 2 -2 1 <(head -n 1 File1.txt && tail -n +2 File1.txt | sort -k2,2 ) <(head -n 1 File2.txt && tail -n +2 File2.txt | sort -k1,1)

#Sam_ID Sub_ID v1 code V3 V4
#2253734 1878372 SAMN06396112 20481 NA DNA
#2275341 1884646 SAMN06432785 20483 NA DNA
#2277481 1860945 SAMN06407597 20488 NA DNA

Explanation:

  • join uses a single character as a separator, so you can't use "\t", but you can use $'\t' (as far as I know)
  • the -1 2 and -2 1 means "for the first file, use the second field" and "for the second file, use the first field" when combining the files
  • in each process substitution (<(...)), sort the file by the Sam_ID column but exclude the header from the sort (per "Is there a way to ignore header lines in a UNIX sort?")
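For shells without process substitution, the same header-preserving sort can be written with intermediate files. This is only a sketch under assumptions: the sample rows, the `work` directory and the `.sorted` file names are hypothetical, `LC_ALL=C` is set so that sort and join use the same collation, and `-t` is also passed to sort so that both tools split fields on tabs.

```shell
export LC_ALL=C                      # same collation for sort and join
tab=$(printf '\t')
work=$(mktemp -d)

# Hypothetical inputs: File1 is keyed on column 2, File2 on column 1.
printf 'Sub_ID\tSam_ID\tv1\nsub2\tid20\tbeta\nsub1\tid10\talpha\n' > "$work/File1.txt"
printf 'Sam_ID\tcode\tV3\nid20\tc20\tNA\nid10\tc10\tNA\n' > "$work/File2.txt"

# Print the header first, then sort only the data rows on the key column.
{ head -n 1 "$work/File1.txt"; tail -n +2 "$work/File1.txt" | sort -t "$tab" -k2,2; } > "$work/File1.sorted"
{ head -n 1 "$work/File2.txt"; tail -n +2 "$work/File2.txt" | sort -t "$tab" -k1,1; } > "$work/File2.sorted"

# Join field 2 of the first file with field 1 of the second.
joined=$(join -t "$tab" -1 2 -2 1 "$work/File1.sorted" "$work/File2.sorted")
printf '%s\n' "$joined"
rm -r "$work"
```

The header lines pair with each other because both carry `Sam_ID` in the join field, so the header comes out as the first line of the result.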

Edit

To specify the order of the columns in the output (to put the Sub_ID before the Sam_ID), you can use the -o option, e.g.

join -t $'\t' -1 2 -2 1 -o 1.1,1.2,1.3,2.2,2.3,2.4 <(head -n 1 File1.txt && tail -n +2 File1.txt | sort -k2,2 ) <(head -n 1 File2.txt && tail -n +2 File2.txt | sort -k1,1)

#Sub_ID Sam_ID v1 code V3 V4
#1878372 2253734 SAMN06396112 20481 NA DNA
#1884646 2275341 SAMN06432785 20483 NA DNA
#1860945 2277481 SAMN06407597 20488 NA DNA

Merging two files with unequal lengths based on two keys in linux

Your approach is correct, but when printing you need to use A[$2,$3]. You are using A[$1,$2], which does not exist in array A, because the 1st and 2nd columns of file1 have to be compared with the 2nd and 3rd columns of file2. That is why only the current line of file2 ends up in your file3.

awk 'NR==FNR{a[$1,$2]=$3;next} (($2,$3) in a) {print $0, a[$2,$3]}' file1 file2

Also see (thanks to James for providing the link): "Why we shouldn't use variables in capital letters".
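A minimal sketch of the two-key lookup on made-up data: file1 holds key1, key2 and a value, and file2 carries the same keys in its 2nd and 3rd columns. The sample rows and the `work` and `matched` names are assumptions for illustration.

```shell
work=$(mktemp -d)

# Hypothetical inputs of unequal length.
printf 'a x 100\nb y 200\nc z 300\n' > "$work/file1"
printf 'r1 a x\nr2 b y\nr3 d w\n'   > "$work/file2"

# First pass stores file1 values under (col1, col2);
# second pass looks them up with (col2, col3) of file2.
matched=$(awk 'NR == FNR { a[$1, $2] = $3; next }
               (($2, $3) in a) { print $0, a[$2, $3] }' "$work/file1" "$work/file2")

printf '%s\n' "$matched"
rm -r "$work"
```

The row `r3 d w` is dropped because the pair (d, w) never appears in file1.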

How to merge two CSV files with Linux column wise?

Use paste -d , to merge the two files and > to redirect the command output to another file:

$ paste -d , file1.csv file2.csv > output.csv

E.g.:

$ cat file1.csv
A,B

$ cat file2.csv
C,D

$ paste -d , file1.csv file2.csv > output.csv

$ cat output.csv
A,B,C,D

-d , tells paste to use , as the delimiter to join the columns.

> tells the shell to write the output of the paste command to the file output.csv

How to merge two files based on one column and print both matching and non-matching?

Assuming your real files are sorted like your samples are:

$ join -o 0,1.2,2.2 -e0 -a1 -a2 tmptest1.txt tmptest2.txt
aaa 231 222
bbb 132 0
ccc 111 0
ddd 0 132

If not sorted and using bash, zsh, ksh93 or another shell that understands <(command) process substitution:

join -o 0,1.2,2.2 -e0 -a1 -a2 <(sort tmptest1.txt) <(sort tmptest2.txt)
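To see what the options do: `-a1 -a2` also print the lines of each file that have no partner, `-o 0,1.2,2.2` selects the join field followed by the second field of each file, and `-e0` fills any missing selected field with 0. The sketch below rebuilds inputs that would give the output shown above; the file contents are an assumption inferred from that output, and sorted temporary files are used so it also runs in a plain POSIX shell.

```shell
export LC_ALL=C                      # same collation for sort and join
work=$(mktemp -d)

# Assumed, deliberately unsorted inputs.
printf 'ccc 111\naaa 231\nbbb 132\n' > "$work/tmptest1.txt"
printf 'ddd 132\naaa 222\n'          > "$work/tmptest2.txt"

sort "$work/tmptest1.txt" > "$work/sorted1.txt"
sort "$work/tmptest2.txt" > "$work/sorted2.txt"

# Full outer join on the first field, with 0 as the filler value.
merged=$(join -o 0,1.2,2.2 -e 0 -a 1 -a 2 "$work/sorted1.txt" "$work/sorted2.txt")

printf '%s\n' "$merged"
rm -r "$work"
```

Note that `-e` only fills fields that are listed with `-o`.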

Combining two columns from different files by common strings

Using GNU awk:

awk 'NR==FNR { map[$1]=$2;next } { map1[$1]=$2 } END { PROCINFO["sorted_in"]="@ind_str_asc";for (i in map) { print i"\t"map[i]"\t"map1[i] } }' file-1 file2

Explanation:

awk 'NR==FNR { 
map[$1]=$2; # Process the first file only and set up an array called map with the first space separated field as the index and the second the value
next
}
{
map1[$1]=$2 # When processing the second file, set up a second array called map1, using the first field as the index and the second as the value.
}
END {
PROCINFO["sorted_in"]="@ind_str_asc"; # Set the index ordering
for (i in map) {
print i"\t"map[i]"\t"map1[i] # Loop through the map array and print the values along with the values in map1.
}
}' file-1 file2
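PROCINFO["sorted_in"] is specific to GNU awk. With another awk, one possible substitute is to drop that line and pipe the output through sort instead. This is a sketch of that swapped-in approach, not the answer's own method; the sample data and the `work` and `combined` names are hypothetical.

```shell
export LC_ALL=C
tab=$(printf '\t')
work=$(mktemp -d)

# Hypothetical inputs; key b is missing from the second file.
printf 'b 2\na 1\nc 3\n' > "$work/file-1"
printf 'c z\na x\n'      > "$work/file2"

# Same two arrays as above, but the ordering is left to sort.
combined=$(awk '
    NR == FNR { map[$1] = $2; next }     # first file: index by field 1
    { map1[$1] = $2 }                    # second file: index by field 1
    END { for (i in map) print i "\t" map[i] "\t" map1[i] }
' "$work/file-1" "$work/file2" | sort -t "$tab" -k1,1)

printf '%s\n' "$combined"
rm -r "$work"
```

Keys that appear only in the first file, such as `b` here, are printed with an empty third column.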

