Linux - Join 2 CSV Files

How to join two CSV files on a key value?

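Here's how to use join in bash. join needs both inputs sorted on the join field (by default the first field, here the city name). The header line of d02.csv is therefore removed with sed 1d before sorting, and a combined header is printed with echo instead: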

{
  echo "City, Tmin, Tmax, Date, Tmin1, Tmax1"
  join -t, <(sort d01.csv) <(sed 1d d02.csv | sort)
} > d03.csv
cat d03.csv

City, Tmin, Tmax, Date, Tmin1, Tmax1
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5 
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5
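To try the command end to end, here is a minimal sketch that first creates small versions of d01.csv and d02.csv. Their contents are an assumption, reconstructed from the outputs in this answer (d01.csv without a header, d02.csv with one, and no spaces after the commas). Two adjustments are not part of the original answer: sorting on the key field only (sort -t, -k1,1) and fixing LC_ALL=C, so that sort and join agree on the collation order.

```shell
#!/usr/bin/env bash
# Minimal sketch: inner join of two CSV files on the first field (City).
# The input data is hypothetical, reconstructed from the outputs in this answer.
set -euo pipefail
cd "$(mktemp -d)"     # scratch directory, so no real files are touched
export LC_ALL=C       # sort and join must use the same collation order

# d01.csv: City,Tmin,Tmax (no header row)
printf '%s\n' 'Barcelona,19.5,29.5' 'Girona,17.2,32.5' 'Lleida,16.5,33.5' \
  'Tarragona,20.4,31.5' 'Vic,17.5,31.4' > d01.csv
# d02.csv: City,Date,Tmin1,Tmax1 (with a header row)
printf '%s\n' 'City,Date,Tmin1,Tmax1' 'Barcelona,20140916,19.9,28.5' \
  'Lleida,20140916,17.5,32.5' 'Tarragona,20140916,21.4,30.5' \
  'Tortosa,20140916,20.5,30.4' > d02.csv

{
  echo "City,Tmin,Tmax,Date,Tmin1,Tmax1"        # combined header, written once
  join -t, <(sort -t, -k1,1 d01.csv) \
           <(sed 1d d02.csv | sort -t, -k1,1)   # drop d02's header, sort on the key
} > d03.csv
cat d03.csv
```

Girona, Vic and Tortosa appear in only one of the two sample files, so they are left out of d03.csv.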

Note that join only outputs records whose key exists in both files. To get all of them, ask for the unpairable lines of both files (-a1 -a2), list the fields you want (-o, where 0 is the join field and 1.2 means field 2 of file 1), and give a placeholder value for the missing fields (-e):

join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3,2.4 -e '?' <(sort d01.csv) <(sed 1d d02.csv | sort)

Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Girona, 17.2, 32.5,?,?,?
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5 
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5  
Tortosa,?,?, 20140916, 20.5, 30.4
Vic, 17.5, 31.4,?,?,?
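A self-contained version of the outer join, as a sketch: the sample files are hypothetical, reconstructed from the output above, and sorting on the key only with LC_ALL=C is an adjustment rather than part of the original command. The comments describe what each join option does.

```shell
#!/usr/bin/env bash
# Sketch: full outer join with join; fields missing on either side become '?'.
# The input data is hypothetical, reconstructed from the output above.
set -euo pipefail
cd "$(mktemp -d)"
export LC_ALL=C

# d01.csv: City,Tmin,Tmax (no header); d02.csv: City,Date,Tmin1,Tmax1 (header)
printf '%s\n' 'Barcelona,19.5,29.5' 'Girona,17.2,32.5' 'Lleida,16.5,33.5' \
  'Tarragona,20.4,31.5' 'Vic,17.5,31.4' > d01.csv
printf '%s\n' 'City,Date,Tmin1,Tmax1' 'Barcelona,20140916,19.9,28.5' \
  'Lleida,20140916,17.5,32.5' 'Tarragona,20140916,21.4,30.5' \
  'Tortosa,20140916,20.5,30.4' > d02.csv

# -a1 -a2 : also print lines whose key is found in only one file
# -o ...  : output fields (0 = join field, 1.2 = field 2 of file 1, ...)
# -e '?'  : text printed for fields requested by -o that are missing
join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3,2.4 -e '?' \
  <(sort -t, -k1,1 d01.csv) <(sed 1d d02.csv | sort -t, -k1,1) > d04.csv
cat d04.csv
```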

How to merge two CSV files column-wise in Linux?

Use paste -d , to merge the two files and > to redirect the command output to another file:

$ paste -d , file1.csv file2.csv > output.csv

E.g.:

$ cat file1.csv
A,B

$ cat file2.csv
C,D

$ paste -d , file1.csv file2.csv > output.csv

$ cat output.csv
A,B,C,D

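-d , tells paste to use , as the delimiter to join the columns. paste pairs lines by position (line 1 with line 1, line 2 with line 2, and so on); it does not match rows on a key, so both files must already list their rows in the same order.

> tells the shell to write the output of the paste command to the file output.csv, overwriting it if it already exists.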
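Because paste pairs lines purely by position, a cheap safeguard is to check that both files have the same number of lines before merging. This is a minimal sketch with made-up sample files; the line-count check is an addition, not something paste does itself.

```shell
#!/usr/bin/env bash
# Sketch: column-wise merge with paste, refusing to run if the row counts differ.
# file1.csv / file2.csv contents are hypothetical sample data.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' 'id,name' '1,alice' '2,bob' > file1.csv
printf '%s\n' 'score' '90' '85' > file2.csv

n1=$(wc -l < file1.csv)
n2=$(wc -l < file2.csv)
if [ "$n1" -ne "$n2" ]; then
  echo "row counts differ: $n1 vs $n2" >&2   # paste would silently misalign rows
  exit 1
fi
paste -d , file1.csv file2.csv > output.csv
cat output.csv
```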

How to outer-join two CSV files using a shell script?

We suggest a gawk script. It uses only standard (POSIX) awk features, so other awk implementations should run it as well:

script.awk

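# First file (NR == FNR): store its value, with "na" as a placeholder for the second file
NR == FNR {
  valsStr = sprintf("%s,%s", $2, "na");
  rowsArr[$1] = valsStr;
}
# Second file, key already seen: keep the first value and replace "na"
NR != FNR && $1 in rowsArr {
  split(rowsArr[$1],valsArr);   # splits on FS, i.e. "," when run with -F,
  valsStr = sprintf("%s,%s", valsArr[1], $2);
  rowsArr[$1] = valsStr;
  next;
}
# Second file, key not in the first file: "na" for the missing first value
NR != FNR {
  valsStr = sprintf("%s,%s", "na", $2);
  rowsArr[$1] = valsStr;
}
# Print the header row ("label") first, then every other row
END {
  printf("%s,%s\n", "label", rowsArr["label"]);
  for (rowName in rowsArr) {
     if (rowName == "label") continue;
     printf("%s,%s\n", rowName, rowsArr[rowName]);
  }
}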

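Run it with:

awk -F, -f script.awk input.{1,2}.txt

Output (apart from the label row, which is printed first, the row order is unspecified, because awk's for (key in array) loop does not define one):
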
label,Part-A,Part-B
LMN,na,8
ABC,2,na
PQR,6,6
EFG,na,1
XYZ,3,4
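For comparison, here is a sketch of the same outer join done with join (the tool from the first answer) instead of awk; this swaps in a different technique and is not the script above. The contents of input.1.txt and input.2.txt are assumptions, reconstructed from the output shown. Unlike the awk version, the data rows come out sorted by key.

```shell
#!/usr/bin/env bash
# Sketch: two-file outer join with join instead of awk.
# input.1.txt / input.2.txt are hypothetical, reconstructed from the output above.
set -euo pipefail
cd "$(mktemp -d)"
export LC_ALL=C

printf '%s\n' 'label,Part-A' 'ABC,2' 'PQR,6' 'XYZ,3' > input.1.txt
printf '%s\n' 'label,Part-B' 'LMN,8' 'PQR,6' 'EFG,1' 'XYZ,4' > input.2.txt

{
  # header: join the first line of each file on the "label" key
  join -t, <(head -n 1 input.1.txt) <(head -n 1 input.2.txt)
  # data rows: outer join, "na" for missing values, sorted by key
  join -t, -a1 -a2 -o 0,1.2,2.2 -e na \
    <(tail -n +2 input.1.txt | sort -t, -k1,1) \
    <(tail -n +2 input.2.txt | sort -t, -k1,1)
} > output.csv
cat output.csv
```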

BASH: Joining 2 CSV files based on common field name

The following join command should do the trick:

join --header -t',' -j 1 file_2.csv file_1.csv

Just make sure that the data lines of your CSV files (everything after the header line, which --header handles separately) are sorted on the join field; having track_id as the first field in each file makes this easy.

Test the command on small sample files first; once you're satisfied that it does what you want, run it against the actual data and redirect its output to file_3.csv.
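A minimal sketch of the whole workflow, assuming hypothetical file_1.csv and file_2.csv contents (a track_id column plus one other column each). The helper function sort_body is also hypothetical: it leaves the header line in place and sorts only the data lines on the first field, which is what join expects. --header is a GNU join option.

```shell
#!/usr/bin/env bash
# Sketch: join on a shared first column (track_id) while keeping the header row.
# Requires GNU join for --header; file contents are hypothetical.
set -euo pipefail
cd "$(mktemp -d)"
export LC_ALL=C

printf '%s\n' 'track_id,title' '3,Gamma' '1,Alpha' '2,Beta' > file_1.csv
printf '%s\n' 'track_id,plays' '2,40' '1,15' '3,7' > file_2.csv

# hypothetical helper: print the header, then the data lines sorted on field 1
sort_body() { head -n 1 "$1"; tail -n +2 "$1" | sort -t, -k1,1; }

join --header -t, -j 1 <(sort_body file_2.csv) <(sort_body file_1.csv) > file_3.csv
cat file_3.csv
```

Because file_2.csv is given first, its columns come before those of file_1.csv in each output line.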

How can I merge two CSV files from the command line?

A basic merge would be:

cat a.csv <(tail -n +2 b.csv) > c.csv

This puts all of b.csv after a.csv. The <(tail -n +2 b.csv) process substitution (a bash feature) outputs b.csv starting from its second line, so the header of b.csv is skipped and only the header of a.csv is kept.

Example:

$ cat a.csv
hdr
a
b
c
$ cat b.csv
hdr
e
f
g

$ cat a.csv <(tail -n +2 b.csv)
hdr
a
b
c
e
f
g
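The same idea extends to more than two files. This sketch uses a hypothetical merge_csv function and a made-up third file, d.csv; it assumes every file has exactly one header line and that all the headers are identical.

```shell
#!/usr/bin/env bash
# Sketch: concatenate any number of CSV files that share the same header,
# keeping the header from the first file only. merge_csv is a hypothetical helper.
set -euo pipefail
cd "$(mktemp -d)"
printf '%s\n' hdr a b c > a.csv      # same sample data as above
printf '%s\n' hdr e f g > b.csv
printf '%s\n' hdr h > d.csv          # hypothetical third file

merge_csv() {
  head -n 1 "$1"                     # header from the first file
  for f in "$@"; do
    tail -n +2 "$f"                  # data rows (header skipped) from every file
  done
}

merge_csv a.csv b.csv d.csv > c.csv
cat c.csv
```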
