Join two CSV files with key value
Here's how to use join in bash:
{
echo "City, Tmin, Tmax, Date, Tmin1, Tmax1"
join -t, <(sort d01.csv) <(sed 1d d02.csv | sort)
} > d03.csv
cat d03.csv
City, Tmin, Tmax, Date, Tmin1, Tmax1
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5
Note that join only outputs records where the key exists in both files. To get all of them, ask for the unpaired records from both files (-a1 -a2), list the output fields you want (-o), and give a default value for the missing fields (-e):
join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3,2.4 -e '?' <(sort d01.csv) <(sed 1d d02.csv | sort)
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Girona, 17.2, 32.5,?,?,?
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5
Tortosa,?,?, 20140916, 20.5, 30.4
Vic, 17.5, 31.4,?,?,?
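To try the difference between the inner and the full outer join without touching real data, here is a minimal sketch. The files left.csv and right.csv and their contents are hypothetical stand-ins, already sorted on the first field, and the column layout is shorter than in the example above.

```shell
# Hypothetical sample data in a scratch directory (names and values are illustrative).
dir=$(mktemp -d)
printf '%s\n' 'Barcelona,19.5,29.5' 'Girona,17.2,32.5' > "$dir/left.csv"
printf '%s\n' 'Barcelona,20140916,19.9' 'Tortosa,20140916,20.5' > "$dir/right.csv"

# Inner join: only keys present in both files survive.
inner=$(join -t, "$dir/left.csv" "$dir/right.csv")

# Full outer join: -a1 -a2 keep unpaired lines from both files,
# -o fixes the column layout, -e fills fields that have no value.
outer=$(join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3 -e '?' "$dir/left.csv" "$dir/right.csv")

printf '%s\n' "$inner" '---' "$outer"
rm -r "$dir"
```

Without -o, the unpaired lines would be printed with only their own fields, so the columns would no longer line up.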
How to merge two CSV files column-wise on Linux?
Use paste -d , to merge the two files and > to redirect the command output to another file:
$ paste -d , file1.csv file2.csv > output.csv
E.g.:
$ cat file1.csv
A,B
$ cat file2.csv
C,D
$ paste -d , file1.csv file2.csv > output.csv
$ cat output.csv
A,B,C,D
-d , tells paste to use , as the delimiter to join the columns.
> tells the shell to write the output of the paste command to the file output.csv.
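One thing worth checking before relying on paste is what happens when the files have different numbers of rows. The sketch below uses hypothetical two-row and one-row files; paste treats the missing row as empty, so the last output line ends with a bare delimiter.

```shell
# Hypothetical sample files with unequal row counts.
dir=$(mktemp -d)
printf '%s\n' 'A,B' 'E,F' > "$dir/file1.csv"   # two rows
printf '%s\n' 'C,D' > "$dir/file2.csv"         # one row

# paste pairs rows by position; the exhausted file contributes an empty field.
merged=$(paste -d , "$dir/file1.csv" "$dir/file2.csv")

printf '%s\n' "$merged"
rm -r "$dir"
```

paste matches rows by line number only, so it is not a substitute for join when rows need to be matched on a key.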
How to outer-join two CSV files, using shell script?
We suggest a gawk script (gawk is the awk implementation shipped by default on many Linux distributions):
script.awk
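# First file: store column 2 and use "na" as a placeholder for the second file's value.
NR == FNR {
    valsStr = sprintf("%s,%s", $2, "na");
    rowsArr[$1] = valsStr;
}
# Second file, key already seen in the first file: replace the placeholder.
NR != FNR && $1 in rowsArr {
    split(rowsArr[$1],valsArr);
    valsStr = sprintf("%s,%s", valsArr[1], $2);
    rowsArr[$1] = valsStr;
    next;
}
# Second file, key missing from the first file: "na" goes in the first column.
NR != FNR {
    valsStr = sprintf("%s,%s", "na", $2);
    rowsArr[$1] = valsStr;
}
END {
    # Print the header row first; for-in visits the other keys in unspecified order.
    printf("%s,%s\n", "label", rowsArr["label"]);
    for (rowName in rowsArr) {
        if (rowName == "label") continue;
        printf("%s,%s\n", rowName, rowsArr[rowName]);
    }
}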
output:
awk -F, -f script.awk input.{1,2}.txt
label,Part-A,Part-B
LMN,na,8
ABC,2,na
PQR,6,6
EFG,na,1
XYZ,3,4
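The rows above come out in no particular order because awk's for-in loop does not guarantee one. If a stable order matters, one option is to pass the header through and sort the rest. This is a minimal sketch; sort_body is a hypothetical helper name, and the sample lines are copied from the output above to stand in for the awk output.

```shell
# Reads CSV on stdin, prints the first line unchanged, then the remaining lines sorted.
sort_body() {
    IFS= read -r header
    printf '%s\n' "$header"
    LC_ALL=C sort
}

# Sample lines standing in for the output of the awk script.
sorted=$(printf '%s\n' 'label,Part-A,Part-B' 'LMN,na,8' 'ABC,2,na' \
    'PQR,6,6' 'EFG,na,1' 'XYZ,3,4' | sort_body)

printf '%s\n' "$sorted"
```

With the real script this would be used as awk -F, -f script.awk input.{1,2}.txt | sort_body.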
BASH: Joining 2 CSV files based on common field name
The following join command should do the trick:
join --header -t',' -j 1 file_2.csv file_1.csv
Just make sure that your CSV files are sorted on the join fields; having track_id as the first field in each file makes this easy.
You should use test data in both files first. When you're satisfied that the command is doing what you want, run it against the actual data and redirect its output to file_3.csv.
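If the files are not sorted yet, the data rows can be sorted while the header line stays in place. The sketch below assumes GNU join (the --header option is a GNU coreutils extension); the sample rows, the title and plays columns, and the sort_keep_header helper are hypothetical.

```shell
# Hypothetical sample files; only the track_id key comes from the question.
dir=$(mktemp -d)
printf '%s\n' 'track_id,title' '3,Gamma' '1,Alpha' > "$dir/file_1.csv"
printf '%s\n' 'track_id,plays' '1,10' '3,30' > "$dir/file_2.csv"

# Print the header line as-is, then the data rows sorted on the first field.
sort_keep_header() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t, -k1,1
}

sort_keep_header "$dir/file_1.csv" > "$dir/file_1.sorted.csv"
sort_keep_header "$dir/file_2.csv" > "$dir/file_2.sorted.csv"

# Same join command as above, run on the sorted copies.
joined=$(LC_ALL=C join --header -t, -j 1 "$dir/file_2.sorted.csv" "$dir/file_1.sorted.csv")

printf '%s\n' "$joined"
rm -r "$dir"
```

Using the same LC_ALL=C setting for both sort and join keeps the two tools in agreement about the sort order.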
How can I merge two CSV files from command line?
A basic merge would be
cat a.csv <(tail -n +2 b.csv) > c.csv
This will put all of b.csv after a.csv. The <(tail -n +2 b.csv) part skips the header in the b.csv file.
E.g.:
$ cat a.csv
hdr
a
b
c
$ cat b.csv
hdr
e
f
g
$ cat a.csv <(tail -n +2 b.csv)
hdr
a
b
c
e
f
g
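For more than two files, the same idea can be sketched with awk, which avoids one tail call per file. The three files below are hypothetical samples; the awk program keeps the header of the first file only.

```shell
# Hypothetical sample files, each starting with the same header line.
dir=$(mktemp -d)
printf '%s\n' 'hdr' 'a' 'b' > "$dir/a.csv"
printf '%s\n' 'hdr' 'e' 'f' > "$dir/b.csv"
printf '%s\n' 'hdr' 'x' > "$dir/c.csv"

# FNR == 1 marks the first line of every file; NR == 1 is true only for the
# first file, so every later header is skipped.
merged=$(awk 'FNR == 1 && NR != 1 { next } { print }' \
    "$dir/a.csv" "$dir/b.csv" "$dir/c.csv")

printf '%s\n' "$merged"
rm -r "$dir"
```

This assumes every file has exactly one header line; a file without a header would lose its first data row.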