merge/join two tables fast linux command line
join -j 1 <(sort file1.txt) <(sort file2.txt)
This implements your 'case 2' approach with only standard Unix tools. If the files are already sorted, you can drop the sort calls.
If the files include header lines, you can rely on the ids being numeric: a final sort -n moves the joined header to the top, because the non-numeric "id" compares as zero:
join -j 1 <(sort file1.txt) <(sort file2.txt) | sort -n
With
file1.txt
id city car type model
1 york subaru impreza king
2 kampala toyota corolla sissy
3 luzern chrysler gravity falcon
file2.txt
id name rating
3 zanzini PG
2 tara X
output:
id city car type model name rating
2 kampala toyota corolla sissy tara X
3 luzern chrysler gravity falcon zanzini PG
PS: To preserve the TAB separator character, pass the -t option:
join -t' ' ...
It's hard to show on SO that the ' ' above contains a literal TAB character. Type it as Ctrl-V followed by Tab (e.g. in bash).
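As a minimal sketch of the -t option, the TAB can also be kept in a variable instead of being typed by hand. The file names and their contents below are hypothetical sample data, not the files from the question.

```shell
# Hypothetical sample data; fields are separated by a single TAB.
tmp=$(mktemp -d)
printf '1\tNew York\n2\tKampala\n' > "$tmp/a.tsv"
printf '1\tsubaru\n2\ttoyota\n'    > "$tmp/b.tsv"

# Store a literal TAB in a variable (works in any POSIX shell).
tab=$(printf '\t')

# With -t, "New York" stays a single field and the output is TAB-separated.
joined=$(join -t "$tab" "$tmp/a.tsv" "$tmp/b.tsv")
printf '%s\n' "$joined"

rm -r "$tmp"
```

In bash, join -t $'\t' should have the same effect.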
Use Unix JOIN command to merge two files
You used the -a option.
-a file_number
In addition to the default output, produce a line for each unpairable line in file file_number.
In addition, the odd overwriting behavior indicates that you have embedded carriage returns (\r). I would examine those files closely with cat -v or a text editor that doesn't try to be "smart" about Windows files.
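One way to check for carriage returns and strip them before joining might look like the sketch below. The file names and contents are made up for illustration, and tr -d '\r' removes every carriage return in the file, not only those at line ends.

```shell
tmp=$(mktemp -d)
# Hypothetical input: win.txt has Windows (CRLF) line endings.
printf '1 apple\r\n2 pear\r\n' > "$tmp/win.txt"
printf '1 red\n2 green\n'      > "$tmp/unix.txt"

# Count the lines that contain a carriage return.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" "$tmp/win.txt")
echo "lines with CR: $crlf_lines"

# Strip the carriage returns, then join the cleaned file.
tr -d '\r' < "$tmp/win.txt" > "$tmp/win.clean.txt"
joined=$(join "$tmp/win.clean.txt" "$tmp/unix.txt")
printf '%s\n' "$joined"

rm -r "$tmp"
```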
unix awk command to merge two tables based on matching columns
I would use join for that:
join -1 7 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,2.3 <(sort tableA -k7) <(sort tableB -k1)
Don't forget to sort the input files on the join field. The -1 7 option joins on the seventh field of tableA, and the -o option selects and orders the output columns.
Output :
OTU_142 dbj|AB021887.1| 5.05e-82 99.412 307 0 AB021887 7936
OTU_8 dbj|AB021887.1| 3.04e-84 100.000 315 0 AB021887 7936
OTU_124 gb|AF156149.1| 4.97e-25 76.106 119 0 AF156149 114741
OTU_145 gb|AF156149.1| 2.28e-33 78.319 147 0 AF156149 114741
OTU_27 gb|AF156151.1| 2.36e-18 84.000 97.1 0 AF156151 114754
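The same pattern on a smaller scale, as a sketch: the two tables below are hypothetical three-column stand-ins for tableA and tableB, with the key in field 3 of the first file and field 1 of the second. Using -k3,3 rather than -k3 limits the sort key to the join field, and LC_ALL=C is an assumption made here so that sort and join agree on ordering.

```shell
export LC_ALL=C   # assumption: same collation for sort and join
tmp=$(mktemp -d)

# Hypothetical stand-ins for tableA (key in field 3) and tableB (key in field 1).
printf 'OTU_1 99.4 AB02\nOTU_2 76.1 AF15\nOTU_3 100.0 AB02\n' > "$tmp/tableA"
printf 'AF15 x 114741\nAB02 y 7936\n'                          > "$tmp/tableB"

# Sort each file on its own join field.
sort -k3,3 "$tmp/tableA" > "$tmp/tableA.sorted"
sort -k1,1 "$tmp/tableB" > "$tmp/tableB.sorted"

# Join field 3 of the first file with field 1 of the second,
# then print all of tableA plus the third column of tableB.
joined=$(join -1 3 -2 1 -o 1.1,1.2,1.3,2.3 "$tmp/tableA.sorted" "$tmp/tableB.sorted")
printf '%s\n' "$joined"

rm -r "$tmp"
```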
MySQL merge two tables and get sum
So, instead of JOIN, what you need is UNION. You can use "UNION ALL" or "UNION", depending on whether you want to keep the duplicated rows or not.
In any case, after the UNION, wrap that result in a subquery and group it to get the SUM():
SELECT
u.name,
u.code,
SUM(u.num)
FROM
(
SELECT name, code, num FROM tableA
UNION ALL
SELECT name, code, num FROM tableB
) u
GROUP BY u.name, u.code
Join two tables on several columns which are split from a string column
I would unnest the string for the join:
select t1.*, t2.*
from table1 t1 cross join
unnest(split(t1.col2, '|')) col join
table2 t2
on t2.col_v = col
Merge two CSVs while resolving duplicates
If the suggestion to reverse the order of the files passed to the sort command doesn't work (see other answer), another way to do this is to concatenate the files, file2 first, and then sort them with the -s switch.
cat file2 file1 | sort -t"," -u -k 1,1 -k 2,2 -s
-s forces a stable sort, meaning that lines with equal keys keep their relative input order. Since the input to sort has all of the lines from file2 before those from file1, all of the duplicates kept in the output should come from file2.
The sort man page doesn't explicitly state that input files will be read in the order that they're supplied on the command line, so I guess it's possible that an implementation could read the files in reverse order, or alternating lines, or whatever. But if you concatenate the files first then there's no ambiguity.
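A small sketch of the concatenate-then-stable-sort idea, using made-up CSV rows. It assumes GNU sort, where -u keeps the first line of each run of equal keys; other implementations may behave differently.

```shell
export LC_ALL=C   # assumption: byte-wise ordering for reproducible output
tmp=$(mktemp -d)

# Hypothetical data: both files contain a row with the key a,1.
printf 'a,1,old\nb,2,old\n' > "$tmp/file1"
printf 'a,1,new\nc,3,new\n' > "$tmp/file2"

# file2 goes first, so for duplicate keys its row is the one that survives.
deduped=$(cat "$tmp/file2" "$tmp/file1" | sort -t"," -u -k 1,1 -k 2,2 -s)
printf '%s\n' "$deduped"

rm -r "$tmp"
```

With GNU sort this should keep a,1,new and drop a,1,old, while the rows with unique keys pass through unchanged.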
merging files based on common column in bash shell
- sort the files
- join them
- sed the output
- (columnate them if you want)
example:
$ join -j1 <(sort -k1 file1.txt) <(sort -k1 file2.txt) | sed 's/TRUE/1/g; s/FALSE/0/g' # | column -t -s' '
Note: this will however reorder your result to:
Canada 0
France 0
Italy 1
USA 0
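The sort, join, sed steps can be sketched end to end as follows. The two input files are hypothetical (a capital column is added just to have something to join), and temporary files replace the process substitution so the snippet also runs in a plain POSIX shell. Note that the sed expressions replace TRUE and FALSE anywhere on the line, not only in the last column.

```shell
export LC_ALL=C   # assumption: same collation for sort and join
tmp=$(mktemp -d)

# Hypothetical inputs, both keyed on the country name in column 1.
printf 'USA Washington\nFrance Paris\nCanada Ottawa\nItaly Rome\n' > "$tmp/file1.txt"
printf 'Italy TRUE\nUSA FALSE\nCanada FALSE\nFrance FALSE\n'        > "$tmp/file2.txt"

# 1. sort the files on the join column
sort -k1,1 "$tmp/file1.txt" > "$tmp/file1.sorted"
sort -k1,1 "$tmp/file2.txt" > "$tmp/file2.sorted"

# 2. join them, 3. rewrite the flags with sed
merged=$(join -j 1 "$tmp/file1.sorted" "$tmp/file2.sorted" |
         sed 's/TRUE/1/g; s/FALSE/0/g')
printf '%s\n' "$merged"

rm -r "$tmp"
```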
How to merge two files using AWK?
$ awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' file2 file1
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6
4046 S00344 31322 4000 1
Explanation: (Partly based on another question. A bit late though.)
FNR refers to the record number (typically the line number) in the current file, and NR refers to the total record number across all files. The operator == is a comparison operator, which returns true when the two surrounding operands are equal. So FNR==NR{commands} means that the commands inside the braces are only executed while processing the first file (file2 here).
FS refers to the field separator, and $1, $2 etc. are the 1st, 2nd etc. fields in a line. a[$1]=$2 FS $3 means that a dictionary (associative array) named a is filled with $1 as the key and $2 FS $3 as the value.
; separates the commands.
next means that any remaining commands are skipped for the current line. (Processing continues with the next line.)
$0 is the whole line.
{print $0, a[$1]} simply prints the whole line followed by the value of a[$1] (if $1 is not in the dictionary, a[$1] is empty, so only $0 and a trailing separator are printed). It is only executed for the 2nd file (file1 here), because of FNR==NR{...;next}.
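To see the two-pass FNR==NR idiom on a reduced input, here is a sketch with hypothetical three-column files shaped like the ones above. The third row has no match in the lookup file, so its line ends with a trailing space.

```shell
tmp=$(mktemp -d)

# Hypothetical lookup table (read first) and main table (read second).
printf '4050 12.1 23.6\n4049 14.4 47.8\n'                             > "$tmp/file2"
printf '4050 S00001 31228\n4049 S00001 28342\n4046 S00344 31322\n'    > "$tmp/file1"

# Pass 1 (FNR==NR): remember columns 2 and 3 of file2 under the key in column 1.
# Pass 2: append the remembered value to each line of file1.
merged=$(awk 'FNR==NR { a[$1] = $2 FS $3; next }
              { print $0, a[$1] }' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$merged"

rm -r "$tmp"
```

One caveat: if the first file is empty, FNR==NR also holds for the second file, so the idiom assumes a non-empty lookup file.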
Merging two data tables with missing values using bash
You should use join with the -a 1 -a 2, -e '0' and -o '0,1.2,1.3,1.4,1.5,2.2,2.3,2.4,2.5' options:
join -a 1 -a 2 -e '0' -1 1 -2 1 -o '0,1.2,1.3,1.4,1.5,2.2,2.3,2.4,2.5' -t $'\t' file1 file2 > joinedfile
Since join needs sorted input, and you want the header line to stay on top, you have to exclude this first line and then sort:
sed -n '2,$p' file1unsorted | sort >file1
sed -n '2,$p' file2unsorted | sort >file2
After that, run the above join command on the sorted files (notice also the -t option that specifies the column delimiter; I assume you have Tab-separated files).
Join your headers separately:
head -1 file1unsorted | join -1 1 -2 1 -o '0,1.2,1.3,1.4,1.5,2.2,2.3,2.4,2.5' -t $'\t' - <(head -1 file2unsorted) >headerfile
And then "reassemble" your final file (add new header to the rest of the file):
cat headerfile joinedfile >resulfile
Update:
As to the dependence of join on the number of columns (in case your files have more columns): yes, there is a dependence, to some degree. To be precise, column numbers are used in the -1 and -2 options (the value for both is 1, which is the number of the column in the respective file that you are joining on; this doesn't depend on the total number of columns as long as you are joining on the first column). Column numbers are also used in the -o option that specifies the output format (i.e. which columns are output and in which order; the format is "file#.column#", both starting from 1, and the join column has the special syntax "0"). The format we specified in our example is actually the default one (first the column to join on, then all the remaining columns from the 1st file, followed by all the other columns of the 2nd file), but unfortunately we still cannot omit this option since the -e option requires it (it might not in your version of join, so try omitting the -o part and see what happens).
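The whole procedure, reduced to two columns per file, might look like the sketch below. The data is invented, the file names mirror the ones used above, and temporary files stand in for the process substitution so that it runs in any POSIX shell. The rows are sorted on the join field only (-k1,1), which is an assumption added here to keep sort and join consistent.

```shell
export LC_ALL=C   # assumption: same collation for sort and join
tmp=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical TAB-separated inputs, each with a header line.
printf 'id\tcolA\nb\t2\na\t1\n' > "$tmp/file1unsorted"
printf 'id\tcolB\nc\t9\na\t7\n' > "$tmp/file2unsorted"

# Drop the header line, then sort the remaining rows on the join field.
sed -n '2,$p' "$tmp/file1unsorted" | sort -t "$tab" -k1,1 > "$tmp/file1"
sed -n '2,$p' "$tmp/file2unsorted" | sort -t "$tab" -k1,1 > "$tmp/file2"

# Keep the two header lines aside so they can be joined on their own.
head -n 1 "$tmp/file1unsorted" > "$tmp/head1"
head -n 1 "$tmp/file2unsorted" > "$tmp/head2"

# Header first, then the full outer join with 0 filled in for missing fields.
result=$(
  join -t "$tab" -o 0,1.2,2.2 "$tmp/head1" "$tmp/head2"
  join -t "$tab" -a 1 -a 2 -e 0 -o 0,1.2,2.2 "$tmp/file1" "$tmp/file2"
)
printf '%s\n' "$result"

rm -r "$tmp"
```

Row b exists only in the first file and row c only in the second, so each gets a 0 in the column that comes from the other file.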
Combine two files with unequal length on common column with multiple matches with linux command line
Using awk
Left Outer Join on file2
$ awk 'FNR==NR{a[$1]=$2FS$3; next} ($1 in a){print $0,a[$1]; next} {print $0,"NA","NA"}' file1 file2
Text1 Text4 Text5 Text6 Text2 Text3
1000 1003 19901001 1 128 128/D59
1000 1002 19901001 2 128 128/D59
1001 1003 19971005 0 116 116/A95
2000 1003 19971005 0 NA NA
FNR==NR{a[$1]=$2FS$3; next}: Stores the contents of file1 in the associative array a, where the key is the unique first field.
($1 in a){print $0,a[$1]}: While iterating over file2, checks whether the first field (the key) exists in the array. If so, prints its value alongside the record.
If the key doesn't exist in the array (e.g. 2000), the record from file2 is printed with NA placeholders; this reflects the behaviour of a left outer join on file2.
Inner Join on both files :
$ awk 'FNR==NR{a[$1]=$2FS$3; next} ($1 in a){print $0,a[$1]}' file1 file2
Text1 Text4 Text5 Text6 Text2 Text3
1000 1003 19901001 1 128 128/D59
1000 1002 19901001 2 128 128/D59
1001 1003 19971005 0 116 116/A95
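For a side-by-side comparison, the two awk variants can be run on the same reduced input. This is only a sketch: the rows are taken from the example above without the header line, and the file names are placed in a temporary directory.

```shell
tmp=$(mktemp -d)

# Lookup file (unique keys) and main file (repeated and unmatched keys).
printf '1000 128 128/D59\n1001 116 116/A95\n'                                   > "$tmp/file1"
printf '1000 1003 19901001 1\n1000 1002 19901001 2\n2000 1003 19971005 0\n'    > "$tmp/file2"

# Left outer join on file2: unmatched rows are kept and padded with NA.
left=$(awk 'FNR==NR { a[$1] = $2 FS $3; next }
            ($1 in a) { print $0, a[$1]; next }
            { print $0, "NA", "NA" }' "$tmp/file1" "$tmp/file2")

# Inner join: unmatched rows are dropped.
inner=$(awk 'FNR==NR { a[$1] = $2 FS $3; next }
             ($1 in a) { print $0, a[$1] }' "$tmp/file1" "$tmp/file2")

printf 'left outer join:\n%s\n\ninner join:\n%s\n' "$left" "$inner"
rm -r "$tmp"
```

The test ($1 in a) checks for the key without creating it, unlike a bare reference to a[$1].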