Inner join on two text files
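Shouldn't file2 contain LUA at the end? If so, you can still use join. This assumes file2 is already sorted on its first field, since join expects both inputs to be sorted on the join field:
join -t'|' -1 2 <(sort -t'|' -k2,2 file1) file2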
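If join is not available, the same inner join can be sketched in Python. This is a minimal sketch, assuming the key is field 2 of file1 and field 1 of file2 (as in the join command above) and that keys in file2 are unique. The function name inner_join and the sample rows are hypothetical, not taken from the question.

```python
def inner_join(lines1, lines2, sep="|", key1=1, key2=0):
    """Join two iterables of delimited lines on 0-based key columns.

    Output mimics join(1): the key first, then the other fields of the
    first input, then the other fields of the second input.
    Assumption: keys in lines2 are unique (later duplicates overwrite).
    """
    # Index the second input by its key column.
    index = {}
    for line in lines2:
        fields = line.rstrip("\n").split(sep)
        index[fields[key2]] = fields[:key2] + fields[key2 + 1:]

    joined = []
    for line in lines1:
        fields = line.rstrip("\n").split(sep)
        key = fields[key1]
        if key in index:  # inner join: keep only keys present in both inputs
            rest = fields[:key1] + fields[key1 + 1:]
            joined.append(sep.join([key] + rest + index[key]))
    return joined


# Hypothetical sample data.
file1 = ["1|LUA|x", "2|PERL|y"]
file2 = ["LUA|scripting", "RUBY|scripting"]
print(inner_join(file1, file2))
```

Because the lookup goes through a dictionary, neither input has to be sorted; rows come out in the order of the first input.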
Inner join two files based on one column in unix when row names don't match with sort
We haven't seen a sample of your original gene2accession file yet, but let's assume it's a tab-separated file with a header line, with the accession in the 2nd column and the gene in the 16th (since that's what your cut is selecting). Let's also assume that your Accessions file isn't absolutely enormous.
Given that, this script should do what you want:
awk -F'\t' 'NR==FNR{a[$1];next} ($2 in a) && !seen[$2]++{print $2, $16}' Accessions gene2accession
but you could try this to see if it's faster:
awk -F'\t' 'NR==FNR{a[$1];next} $2 in a{print $2, $16}' Accessions <(sort -u -t$'\t' -k2,2 gene2accession)
and if it is, and you want an intermediate file for the output of the sort to use in subsequent runs:
sort -u -t$'\t' -k2,2 gene2accession > unq_gene2accession &&
awk -F'\t' 'NR==FNR{a[$1];next} $2 in a{print $2, $16}' Accessions unq_gene2accession
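The same filter can be sketched in Python for comparison. This is a minimal sketch under the same assumptions as the awk script: tab-separated input, the accession in column 2 and the gene in column 16. The names filter_accessions and make_row and the sample accessions are hypothetical.

```python
def filter_accessions(wanted_lines, table_lines, key_col=1, value_col=15):
    """Yield (accession, gene) for the first row matching each wanted accession.

    Columns are 0-based, so the defaults address the 2nd and 16th fields.
    """
    # Load the wanted accessions (first tab-separated field of each line).
    wanted = {line.rstrip("\n").split("\t")[0] for line in wanted_lines}
    seen = set()
    for line in table_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(key_col, value_col):
            continue  # skip short or malformed rows
        key = fields[key_col]
        if key in wanted and key not in seen:
            seen.add(key)  # report each accession only once
            yield key, fields[value_col]


def make_row(accession, gene):
    """Build a hypothetical 16-column row for the demonstration."""
    row = ["-"] * 16
    row[1], row[15] = accession, gene
    return "\t".join(row)


table = [make_row("NM_1", "geneA"), make_row("NM_2", "geneB"),
         make_row("NM_1", "geneA_dup")]
print(list(filter_accessions(["NM_1", "NM_3"], table)))
```

Only the set of wanted accessions is held in memory; the large table is streamed one line at a time, which is the same trade-off the awk version makes.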
How do I selectively merge two text files using R?
Read the two files with '=' as the separator so that each becomes a two-column data frame. Keep the rows of file2 whose first column (V1) is present in file1. Write the result back to a new text file if needed.
file1 <- read.table('file1.txt', sep = '=', quote = '')
file2 <- read.table('file2.txt', sep = '=', quote = '')
result <- file2[file2$V1 %in% file1$V1, ]
To include all the rows of file1, irrespective of whether they are present in file2, you may try the join approach.
library(dplyr)
inner_join(file1 %>% select(-any_of('V2')), file2, by = 'V1') %>%
bind_rows(anti_join(file1, file2, by = 'V1')) %>%
data.frame() -> result
Write the result:
write.table(result, 'result.txt', sep = '=', col.names = FALSE, row.names = FALSE, quote = FALSE)
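For comparison, the second variant (keep every key of file1, taking the value from file2 where one exists) can be sketched in Python. This is a minimal sketch, assuming every line has the form key=value and keys are unique within a file. The name selective_merge and the sample lines are hypothetical.

```python
def selective_merge(lines1, lines2, sep="="):
    """For each key in lines1, prefer the value from lines2, else keep its own."""
    def parse(lines):
        pairs = {}
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            key, _, value = line.partition(sep)
            pairs[key] = value
        return pairs

    first, second = parse(lines1), parse(lines2)
    # Dictionaries preserve insertion order, so rows follow the order of lines1.
    return [f"{key}{sep}{second.get(key, value)}" for key, value in first.items()]


# Hypothetical sample data.
file1 = ["a=1", "b=2", "c=3"]
file2 = ["b=20", "d=40"]
print(selective_merge(file1, file2))
```

Unlike the dplyr pipeline, which lists the matched rows first and appends the unmatched ones, this sketch keeps the rows in file1 order.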
How to join two text files with Python?
Use zip (itertools.izip on Python 2, where the built-in zip is not lazy) to combine the lines from both files, like this:
with open('res.txt', 'w') as res, open('in1.txt') as f1, open('in2.txt') as f2:
    for line1, line2 in zip(f1, f2):
        res.write("{} {}\n".format(line1.rstrip(), line2.rstrip()))
Note: This solution writes lines from both files only until either file is exhausted. For example, if the second file contains 1000 lines and the first has only 2, then only two lines from each file are copied to the result. If you want the lines from the longer file even after the shorter one is exhausted, use itertools.zip_longest (izip_longest on Python 2), like this:
from itertools import zip_longest
with open('res.txt', 'w') as res, open('in1.txt') as f1, open('in2.txt') as f2:
    for line1, line2 in zip_longest(f1, f2, fillvalue=""):
        res.write("{} {}\n".format(line1.rstrip(), line2.rstrip()))
In this case, even after the shorter file is exhausted, the lines from the longer file are still copied, and the fillvalue is used in place of the missing lines from the shorter file.
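The difference between the two behaviours is easy to check on in-memory lists. This is a minimal sketch for Python 3, where the built-in zip and itertools.zip_longest take the place of izip and izip_longest; the sample lines are hypothetical.

```python
from itertools import zip_longest

# Hypothetical file contents: the first "file" is one line shorter.
first = ["a1\n", "a2\n"]
second = ["b1\n", "b2\n", "b3\n"]

# Stops as soon as the shorter input is exhausted.
short = ["{} {}".format(x.rstrip(), y.rstrip()) for x, y in zip(first, second)]

# Continues to the end of the longer input, padding with fillvalue.
full = ["{} {}".format(x.rstrip(), y.rstrip())
        for x, y in zip_longest(first, second, fillvalue="")]

print(short)
print(full)
```

The last padded row comes out as " b3", with a leading space, because the separator is written even when one side is empty; strip the joined string if that matters.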
Using ADO to join and query text files
Yes, it is possible, and it works. I was intrigued by your question, so I tried it out myself. The text driver does not understand bracketing on the field names, only on the table name.
So give each file a table alias and qualify the field names with it, like this:
Select tb1.[fieldname], tb2.[fieldname] From [file_name.txt] as tb1
Inner Join [file_name2.txt] as tb2
On tb1.[fieldname]=tb2.[fieldname]
What worked for me:
SELECT tb1.[Month], tb2.[Year] FROM [Text;DATABASE=E:\].[MoneyAndCreditStats 0409 to 0417.csv] as tb1
INNER JOIN [Text;DATABASE=E:\].[StackaOverFlowTest.csv] as tb2 ON
tb2.[Month] = tb1.[Month] AND
tb1.[Year] = tb2.[Year]
The text driver is a nifty tool, especially when shuffling data formats and files around for Business Intelligence.
Joining multiple fields in text files on Unix
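You can try this:
awk '{
    o1=$1; o2=$2; o3=$3
    # blank the three key fields, then trim the leading spaces left behind
    $1=$2=$3=""; sub(/^ +/,"")
    # append the remaining fields under the three-field key
    _[o1 FS o2 FS o3]=_[o1 FS o2 FS o3] FS $0
}
END{ for(i in _) print i _[i] }' file1 file2
Output: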
$ ./shell.sh
foo 1 scaf 3 4.5
bar 2 scaf 3.3 1.00
foo 1 boo 2.3
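If you want to omit lines whose key does not appear in both files:
awk 'FNR==NR{
    # first file (file2): remember the trailing fields under the three-field key
    s=""
    for(i=4;i<=NF;i++){ s=s FS $i }
    _[$1,$2,$3] = s
    next
}
($1,$2,$3) in _ {
    # second file (file1): print only rows whose key was seen in file2
    printf "%s %s %s", $1, $2, $3
    for(o=4;o<=NF;o++){
        printf " %s", $o
    }
    printf "%s\n", _[$1,$2,$3]
}' file2 file1
Output: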
$ ./shell.sh
foo 1 scaf 3 4.5
bar 2 scaf 3.3 1.00
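Both awk variants can be expressed as one Python function with a switch for the inner join. This is a minimal sketch, assuming whitespace-separated fields and a key made of the first three fields. The name join_on_prefix is hypothetical, and the sample lines are a guess at the inputs, reconstructed from the output shown above.

```python
def join_on_prefix(lines1, lines2, key_len=3, inner=False):
    """Join two iterables of lines on their first key_len fields.

    With inner=False every key from either input is kept; with inner=True
    only keys present in both inputs are kept.
    """
    def index(lines):
        table = {}
        for line in lines:
            fields = line.split()
            if len(fields) >= key_len:
                key = tuple(fields[:key_len])
                table.setdefault(key, []).extend(fields[key_len:])
        return table

    left, right = index(lines1), index(lines2)
    # Keys of the first input first, then keys that only the second input has.
    keys = list(left) + [k for k in right if k not in left]
    rows = []
    for key in keys:
        if inner and not (key in left and key in right):
            continue
        rows.append(" ".join(list(key) + left.get(key, []) + right.get(key, [])))
    return rows


# Assumed inputs, reconstructed from the output shown above.
file1 = ["foo 1 scaf 3", "bar 2 scaf 3.3"]
file2 = ["foo 1 scaf 4.5", "foo 1 boo 2.3", "bar 2 scaf 1.00"]
print(join_on_prefix(file1, file2))
print(join_on_prefix(file1, file2, inner=True))
```

Using a tuple of fields as the dictionary key avoids the ambiguity of concatenating the fields without a separator, where "ab" + "c" and "a" + "bc" would collide.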
Join on first column of two files
Try this one-liner:
awk 'NR==FNR{a[$1]=$2;next}$1 in a{print $1,a[$1]}' file2 file1