Trying to join two text files based on the first column in both files and want to keep all the columns of the matches from the second file
I'm sure there are ways to do this in awk, but join is also relatively simple.
join -1 1 -2 1 List1.txt <(sort -k 1,1 List2.txt) > List3.txt
You are joining List1 based on the first column, and List2 also based on the first column. Both files must be sorted on that column for join to work; the command above sorts only List2.txt, so List1.txt must already be sorted.
This produces the columns you want, separated by single spaces.
List3.txt
action e KK SS @ n
adan a d @ n
adap a d a p
adapka a d a p k a
adat a d a t
yen j e n
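As a minimal sketch of the sorting requirement, the following sorts both inputs into temporary files before joining. The sample rows are made up for illustration (the real List1.txt and List2.txt are not shown in the question), and forcing LC_ALL=C is an assumption made here so that sort and join agree on the ordering.

```shell
# Hypothetical sample data; the real input files are not shown.
tmp=$(mktemp -d)
printf '%s\n' 'yen' 'adan' 'adat' > "$tmp/List1.txt"
printf '%s\n' 'adat a d a t' 'zebra z e b r a' 'adan a d @ n' 'yen j e n' > "$tmp/List2.txt"

# Sort both inputs on field 1 under the same locale that join will use.
LC_ALL=C sort -k 1,1 "$tmp/List1.txt" > "$tmp/List1.sorted"
LC_ALL=C sort -k 1,1 "$tmp/List2.txt" > "$tmp/List2.sorted"

# Join on field 1 of each file; keys present in only one file (zebra) are dropped.
result=$(LC_ALL=C join -1 1 -2 1 "$tmp/List1.sorted" "$tmp/List2.sorted")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Writing the sorted copies to temporary files avoids process substitution, so the sketch should also run under a plain POSIX sh.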
Join on first column of two files
Try this one-liner:
awk 'NR==FNR{a[$1]=$2;next}$1 in a{print $1,a[$1]}' file2 file1
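The one-liner can be spelled out as below. This is only a sketch with invented sample rows: the first file named on the command line (file2) is loaded into an array, and the second (file1) is then checked against it. Note that only the second column of file2 is kept.

```shell
# Hypothetical sample data for illustration.
tmp=$(mktemp -d)
printf '%s\n' 'b x' 'a y' 'd z' > "$tmp/file1"
printf '%s\n' 'a 1' 'b 2' 'c 3' > "$tmp/file2"

result=$(awk '
  # NR == FNR holds only while the first file argument (file2) is being read:
  # remember column 2 under the key in column 1.
  NR == FNR { a[$1] = $2; next }
  # Second file argument (file1): print the key and the stored value on a match.
  $1 in a { print $1, a[$1] }
' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"

rm -rf "$tmp"
```

The output follows the line order of file1, and neither input needs to be sorted.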
joining two files based on first column IDs
Given:
$ cat file1
001 word1
002 word2
00n wordn1
$ cat file2
001 word3
002 word4
003 word_u1
004 word_u2
00n wordn2
(Note the extra 003 word_u1 and 004 word_u2 in file2...)
You can use join to join those files (as presented) together:
$ join file1 file2
001 word1 word3
002 word2 word4
00n wordn1 wordn2
If the files are not already sorted on the first column (the ones you have presented are), you can sort them first:
$ join <(sort file1) <(sort file2)
If you want to double up the digits, pipe to sed:
$ join file1 file2 | sed -nE 's/^([^[:space:]]*)/\1 \1/p'
001 001 word1 word3
002 002 word2 word4
00n 00n wordn1 wordn2
Or specify the join output list:
$ join -o 1.1,2.1,1.2,2.2 file1 file2
001 001 word1 word3
002 002 word2 word4
00n 00n wordn1 wordn2
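If the extra 003 and 004 lines from file2 should be kept rather than dropped, the output list can be combined with -a and -e. This goes beyond what the question asked, so treat it as a sketch; the NA placeholder is an arbitrary choice made here.

```shell
tmp=$(mktemp -d)
printf '%s\n' '001 word1' '002 word2' '00n wordn1' > "$tmp/file1"
printf '%s\n' '001 word3' '002 word4' '003 word_u1' '004 word_u2' '00n wordn2' > "$tmp/file2"

# -a 2 also prints lines of file2 that have no partner in file1.
# -e NA fills the fields that are missing for those lines.
# 0 in the -o list stands for the join field itself.
result=$(LC_ALL=C join -a 2 -e NA -o 0,1.2,2.2 "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"

rm -rf "$tmp"
```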
How to join two huge files based on the first two columns in awk/Bash programs?
With join, sed and bash (process substitution):
join -t $'\t' -a 1 <(sed 's/\t/:/' file1.tsv) <(sed 's/\t/:/' file2.tsv) | sed 's/:/\t/' > file3.txt
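This solution assumes that both files are sorted in ascending order on the first two columns taken together, and that neither of those columns contains a colon. The \t escape inside the sed expressions is a GNU sed extension.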
See: man join
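The same idea can be sketched without GNU-specific escapes by keeping a literal tab in a shell variable. The file contents below are invented for illustration, and temporary files stand in for the process substitution.

```shell
tab=$(printf '\t')
tmp=$(mktemp -d)
# Hypothetical tab-separated data, already sorted on columns 1 and 2.
printf 'chr1\t100\tA\nchr1\t200\tB\nchr2\t100\tC\n' > "$tmp/file1.tsv"
printf 'chr1\t100\tx\nchr2\t100\ty\n' > "$tmp/file2.tsv"

# Fuse the first two columns into one key by replacing the first tab with a colon.
sed "s/$tab/:/" "$tmp/file1.tsv" > "$tmp/keyed1"
sed "s/$tab/:/" "$tmp/file2.tsv" > "$tmp/keyed2"

# Join on the fused key, keep unpaired lines of file1 (-a 1), then split the key again.
result=$(LC_ALL=C join -t "$tab" -a 1 "$tmp/keyed1" "$tmp/keyed2" | sed "s/:/$tab/")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Fusing the columns can change the relative sort order in edge cases, so if join complains about unsorted input, sort the keyed files first.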
join 2 files based on 1st & 2nd column of file AND 3rd & 4th column of second file
You may use this awk:
awk 'FNR == NR {map[$1,$2] = $3; next} ($3,$4) in map {$NF = map[$3,$4]} 1' f1 f2 | column -t
3 22745180 rs12345 G C
12 67182999 rs78901 A T
A more readable version:
awk '
FNR == NR {
map[$1,$2] = $3
next
}
($3,$4) in map {
$NF = map[$3,$4]
}
1' file1 file2 | column -t
column -t is used for tabular output only.
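Since the input files are not shown here, the following sketch assumes a hypothetical layout: f1 holds "chromosome position id", and f2 holds two allele columns, then chromosome and position, then a placeholder in the last column. The sample values are only illustrative.

```shell
tmp=$(mktemp -d)
# Assumed layouts (not taken from the question):
#   f1: chromosome position id
#   f2: allele1 allele2 chromosome position placeholder
printf '%s\n' '3 22745180 rs12345' '12 67182999 rs78901' > "$tmp/f1"
printf '%s\n' 'G C 3 22745180 .' 'A T 12 99 .' > "$tmp/f2"

result=$(awk '
  # First file: index the id by the (chromosome, position) pair.
  FNR == NR { map[$1,$2] = $3; next }
  # Second file: on a match, overwrite the last field with the stored id.
  ($3,$4) in map { $NF = map[$3,$4] }
  # Print every line of the second file, changed or not.
  1
' "$tmp/f1" "$tmp/f2")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Because of the trailing 1, lines of f2 without a match are still printed, unchanged.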
match values in first column of two files and join the matching lines in a new file
awk 'BEGIN {
FS = OFS = "\t"
}
NR == FNR {
# while reading the 1st file
# store its records in the array f
f[$1] = $0
next
}
$1 in f {
# when match is found
# print all values
print f[$1], $0
}' file1 file2
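A small run with invented tab-separated rows shows what this prints. Note that the key appears twice in each output line, once from each file.

```shell
tmp=$(mktemp -d)
# Hypothetical tab-separated inputs.
printf 'k1\ta\nk2\tb\n' > "$tmp/file1"
printf 'k2\tx\nk3\ty\n' > "$tmp/file2"

result=$(awk 'BEGIN { FS = OFS = "\t" }
  # First file: keep the whole record under its first column.
  NR == FNR { f[$1] = $0; next }
  # Second file: print the stored record followed by the current one.
  $1 in f { print f[$1], $0 }
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Only k2 occurs in both files, so a single line comes out: k2, b, k2, x, separated by tabs.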
Compare first column of one file with the first column of second and print associated column of each if there was a match
Could you please try the following:
awk 'FNR==NR{a[$1]=$2;next} ($1 in a){print $2,a[$1]}' Input_file1 Input_file2
The output will be as follows:
foo 1589.0
hi 33.7
Problem in your attempt: you were on the right track, but in the FNR==NR condition your a[$1] was never assigned a value. It only created the index in array a, so there was no stored value to print when the second Input_file was read.
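The difference between referencing a[$1] and assigning to it can be seen side by side. The input rows below are invented to match the output shown; the original attempt is not reproduced here, so the first command is only an assumed reconstruction of it.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs chosen to reproduce the output shown above.
printf '%s\n' 'id1 1589.0' 'id2 33.7' > "$tmp/Input_file1"
printf '%s\n' 'id1 foo' 'id2 hi' 'id3 bar' > "$tmp/Input_file2"

# Bare reference: the keys exist afterwards, but every stored value is empty.
empty=$(awk 'FNR==NR{a[$1];next} ($1 in a){print $2,a[$1]}' \
  "$tmp/Input_file1" "$tmp/Input_file2")

# Assignment: column 2 of the first file is stored under each key.
filled=$(awk 'FNR==NR{a[$1]=$2;next} ($1 in a){print $2,a[$1]}' \
  "$tmp/Input_file1" "$tmp/Input_file2")

printf 'without assignment:\n%s\n' "$empty"
printf 'with assignment:\n%s\n' "$filled"

rm -rf "$tmp"
```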
Inner join two files based on one column in unix when row names don't match with sort
We haven't seen a sample of your original gene2accession file yet, but let's assume it's a tab-separated file with accession in the 2nd column and gene in the 16th (since that's what your cut is selecting), with a header line. Let's also assume that your Accessions file isn't absolutely enormous.
Given that, this script should do what you want:
awk -F'\t' 'NR==FNR{a[$1];next} ($2 in a) && !seen[$2]++{print $2, $16}' Accessions gene2accession
but you could try this to see if it's faster:
awk -F'\t' 'NR==FNR{a[$1];next} $2 in a{print $2, $16}' Accessions <(sort -u -t$'\t' -k2,2 gene2accession)
and if it is, and you want an intermediate file for the output of the sort to use in subsequent runs:
sort -u -t$'\t' -k2,2 gene2accession > unq_gene2accession &&
awk -F'\t' 'NR==FNR{a[$1];next} $2 in a{print $2, $16}' Accessions unq_gene2accession
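Both variants can be compared on a tiny made-up data set. The mkrow helper, the accession and gene values, and the 16-column filler are all assumptions for illustration, since no sample of gene2accession has been shown. A literal tab in a variable is used as the sort separator so that the sketch does not depend on bash.

```shell
tab=$(printf '\t')
tmp=$(mktemp -d)

# Hypothetical helper: emit one 16-column tab-separated row with the
# accession in field 2, the gene in field 16 and "x" everywhere else.
mkrow() {
  awk -v acc="$1" -v gene="$2" 'BEGIN {
    for (i = 1; i <= 16; i++)
      printf "%s%s", (i == 2 ? acc : (i == 16 ? gene : "x")), (i < 16 ? "\t" : "\n")
  }'
}

{ mkrow accession gene; mkrow NM_1 GENEA; mkrow NM_2 GENEB
  mkrow NM_1 GENEA; mkrow NM_3 GENEC; } > "$tmp/gene2accession"
printf '%s\n' 'NM_1' 'NM_3' > "$tmp/Accessions"

# Variant 1: de-duplicate inside awk with the seen[] array.
dedup_in_awk=$(awk -F'\t' 'NR==FNR{a[$1];next} ($2 in a) && !seen[$2]++{print $2, $16}' \
  "$tmp/Accessions" "$tmp/gene2accession")

# Variant 2: de-duplicate on field 2 with sort -u first, then filter.
LC_ALL=C sort -u -t "$tab" -k2,2 "$tmp/gene2accession" > "$tmp/unq_gene2accession"
dedup_in_sort=$(awk -F'\t' 'NR==FNR{a[$1];next} $2 in a{print $2, $16}' \
  "$tmp/Accessions" "$tmp/unq_gene2accession")

printf '%s\n' "$dedup_in_awk"

rm -rf "$tmp"
```

On this sample both variants give the same two lines. On real data the line order can differ, because the second variant emits rows in sorted order.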