How to join two CSV files by a temporary common column in awk?
Could you please try following, tested and written on shown samples only.
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1]=$2;next} ($1 in a){print $2,a[$1]}' file2 file1
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS="," ##Setting field and output field separator as comma here.
}
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file2 is being read.
a[$1]=$2 ##Creating array a with index $1 and value is $2 from current line.
next ##next will skip all further statement from here.
}
($1 in a){ ##Statements from here will be executed when file1 is being read and it's checking if $1 is present in array a then do following.
print $2,a[$1] ##Printing 2nd field and value of array a with index $1 here.
}
' file2 file1 ##Mentioning Input_file names here.
Output will be as follows for shown samples.
44,7
21,7
65,4
79,11
2nd solution: More Generic solution, where considering that your both Input_files could have duplicates in that case it will print 1st value of A in Input_file1 to first value of Input_file2 and so on.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
a[$1]
b[$1,++c[$1]]=$2
next
}
($1 in a){
print $2,b[$1,++d[$1]]
}
' file2 file1
merge two csv files according to matching rows and add new columns in linux
awk may work for you:
kent$ awk -F, -v OFS=","
'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
NR==FNR&&NR>1{a[$1]=$0;next}
FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,abc,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,xyz,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06
update
kent$ awk -F, -v OFS="," 'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
NR==FNR&&NR>1{a[$1]=$2 FS $3;next}
FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06
use awk to left outer join two csv file based on multiple columns while keeping order of the first file observations
awk
to the rescue!
$ awk -F, 'NR==FNR{a[$1,$2]=$3; next} {print $0 FS a[$NF,$2]}' file2 file1
ID,Name,Gender,Salary,DOB,Children
11,Jim,M,200,90,2
12,David,M,100,89,
12,David,M,300,89,
13,Lucy,F,150,86,
14,Lily,F,200,85,0
13,Lucy,F,100,86,
join
will require sorted input and you need embellishments to recover initial ordering. You can redirect the output to a file by adding > outputfile.csv
Merging two files on a common column and printing all columns using awk
Your error is in your print
command:
{print $0 FS a[$2]}
should be
{print $0, ( $1 in a ? a[$1] : a[$6] )}
because (1) the "interesting" field is only in $2
while you're in file1, not as you traverse file2
, and (2) you need to pull data from the array based on either of your conditions. We can select which array element using a classic ternary operator. (I've added extra spacing for easier reading.)
Note the use of the comma, which separates fields by OFS
, rather than using FS
which is your input field separator.
Otherwise, your code looks fine to me.
How to merge files based on common string (in different column number) using awk
awk
to the rescue!
awk 'NR==FNR {k=$1; $1=""; a[k]=$0; next}
{k=$NF; NF--; print k,$0 a[k]}' file2 file1b
000001 c-2-3 p045 238744 1 0 0 rs333 HESN
000001 c-2-4 p042 439709 1 0 0 rs333 HESN
000002 c-2-4 p055 234744 1 0 0 rs333 POS
000003 c-2-5 p099 956755 1 0 0 rs333 POS
000004 c-2-9 p064 504435 0 1 0 rs333 POS
000005 c-1-5 p043 384029 0 0 1 rs333 NEG
000006 c-2-2 p011 434444
000009 c-1-3 p083 035905
there are some subtle tricks employed to simplify the code...
Merge CSV files using join/awk/sed
You could use multiple calls to join
:
join -t , -a 1 -o auto template.csv file1.csv | join -t , -a 1 -o auto - file2.csv | join -t , -a 1 -o auto - file3.csv
Or more clearer :
alias myjoin='join -t , -a 1 -o auto'
myjoin template.csv file1.csv | myjoin - file2.csv | myjoin - file3.csv
Explanation :
-t ,
specifies the field separator (,
)-a 1
instructs to print unpairable lines coming from the first file (an assumption is made that the header file contains all possible headers)-o auto
controls formatting and is necessary to print the empty fields
Proof :
$ join -t , -a 1 -o auto template.csv file1.csv | join -t , -a 1 -o auto - file2.csv | join -t , -a 1 -o auto - file3.csv
header,value1,value2,value3
2000-01-01 00:00:00,,value21,
2000-01-01 00:15:00,value12,value22,value32
2000-01-01 00:30:00,value13,value23,
2000-01-01 00:45:00,,value24,value34
2000-01-01 01:00:00,,,
2000-01-01 01:15:00,,,value36
2000-01-01 01:30:00,value17,,value37
2000-01-01 01:45:00,value18,,value38
2000-01-01 02:00:00,value19,,
Note :
For this to work, the files MUST be sorted on the join fields (the header in your case). You can use the sort
command if this is not the case.
Related Topics
Replace Line with Space and Backslash with a String Containing Spaces
How Come _Exit(0) (Exiting by Syscall) Prevents Me from Receiving Any Stdout Content
How to Increase the Maximum Number of Characters That Ksh Variable Accepts
How Provide Nested Mount of Overlayfs
Adding Timestamps to Packet Payload with Tcpreplay
Bash-Script Error 0=1: Command Not Found
Visual Studio Code Doesnt Run on Ubuntu
Making Proprietary Elf Binaries Portable on Linux
Bash - Calculate the Average of Numbers Inputted
Shell Script Issue with Filenames Containing Spaces
Linux Script Start,Stop,Restart
How to Extract Patterns Form a Text Files in Shell Bash
How to Set Firefox Binary Path of Firefox in Selenium in Linux
Find Files with a Certain Extension That Exceeds a Certain File Size