bash join multiple files with empty replacement (-e option)
It's poorly documented, but when using join, the -e option only works in conjunction with the -o option. The order string needs to be amended on each pass through the loop. The following code should generate your desired output.
i=3
orderl='0,1.2'
orderr=',2.2'
for k in file?            # glob directly; don't parse ls output
do
    if [ -e final.results ]
    then
        join -a1 -a2 -e "0" -o "$orderl$orderr" final.results "$k" > tmp.res
        orderl="$orderl,1.$i"
        i=$((i+1))
        mv tmp.res final.results
    else
        cp "$k" final.results
    fi
done
As you can see, it starts to become messy. If you need to extend this much further it might be worth deferring to a beefier tool such as awk or python.
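As a minimal standalone demonstration of that -e/-o interaction (sample file names and contents invented for illustration):

```shell
# Two small sorted key/value files
printf 'a 1\nb 2\n' > f1.sample
printf 'a 9\nc 8\n' > f2.sample

# -e 0 fills the empty slots, but only for the fields listed in -o
join -a1 -a2 -e 0 -o 0,1.2,2.2 f1.sample f2.sample
# a 1 9
# b 2 0
# c 0 8
```

Dropping the -o list from the command above leaves the unpaired lines short instead of zero-filled, which is the behavior the answer warns about.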
Join loop on multiple files, filling empty fields
Just use R; you can change the desired extension as necessary.
Here are the files I used as an example:
f1.ext
a 1
b 4
c 6
e 3
f2.ext
c 1
d 4
f 5
z 3
f3.ext
a 1
b 4
c 5
e 7
g 12
R code:
#!/usr/bin/env Rscript
ext='.ext' #can alter this to desired extension
files <- list.files(pattern=ext) #get name of files in a directory
listOfFiles <- lapply(files, function(x){ read.table(x, row.names=1) } )
#The big reduction of all the files into a table
tbl <- Reduce(function(...) data.frame(merge(..., all = T, by = 0), row.names=1), listOfFiles)
tbl[is.na(tbl)] <- 0 #set all NA vals to 0
colnames(tbl) <- files #set the columns to the corresponding filenames (optional)
tbl #print out the table
Output:
f1.ext f2.ext f3.ext
a 1 0 1
b 4 0 4
c 6 1 5
d 0 4 0
e 3 0 7
f 0 5 0
g 0 0 12
z 0 3 0
Merging multiple files with two common columns, and replace the blank to 0
One more variant; could you please try the following, written and tested with the shown samples.
awk '
{
if(!a[FILENAME]++){
file[++count]=FILENAME
}
b[$1 OFS $2 OFS FILENAME]=$NF
c[$1 OFS $2]++
if(!d[$1 OFS $2]++){
e[++count1]=$1 OFS $2
}
}
END{
for(i=1;i<=length(c);i++){
printf("%s ",e[i])
for(j=1;j<=count;j++){
printf("%s %s",(b[e[i] OFS file[j]]!=""?b[e[i] OFS file[j]]:0),j==count?ORS:OFS)
}
}
}
' file{1..4} | sort -k1
Output will be as follows.
chr1 111001 234 42 92 129
chr1 430229 0 267 0 0
chr2 22099 108 0 0 442
chr5 663800 0 0 311 0
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
{
if(!a[FILENAME]++){ ##Checking condition if FILENAME is present in a then do following.
file[++count]=FILENAME ##Creating file with index of count and value is current file name.
}
b[$1 OFS $2 OFS FILENAME]=$NF ##Creating array b with index of 1st 2nd and filename and which has value as last field.
c[$1 OFS $2]++ ##Creating array c with index of 1st and 2nd field and keep increasing its value with 1.
if(!d[$1 OFS $2]++){ ##Checking condition if 1st and 2nd field are NOT present in d then do following.
e[++count1]=$1 OFS $2 ##Creating e with index of count1 with increasing value of 1 and which has first and second fields here.
}
}
END{ ##Starting END block of this awk program from here.
for(i=1;i<=length(c);i++){ ##Starting for loop which runs from i=1 till length of c here.
printf("%s ",e[i]) ##Printing value of array e with index i here.
for(j=1;j<=count;j++){ ##Starting for loop till value of count here.
printf("%s %s",(b[e[i] OFS file[j]]!=""?b[e[i] OFS file[j]]:0),j==count?ORS:OFS) ##Printing value of b with index e[i] OFS file[j] if present, else 0; print a newline if j==count, else a space.
}
}
}
' file{1..4} | sort -k1 ##Mentioning Input_files 1 to 4 here and sorting output with 1st field here.
EDIT: As per @anubhava's comments, adding a solution with ARGC and ARGV in GNU awk.
awk '
{
b[$1 OFS $2 OFS FILENAME]=$NF
c[$1 OFS $2]++
if(!d[$1 OFS $2]++){
e[++count1]=$1 OFS $2
}
}
END{
count=(ARGC-1)
for(i=1;i<=length(c);i++){
printf("%s ",e[i])
for(j=1;j<=(ARGC-1);j++){
printf("%s %s",(b[e[i] OFS ARGV[j]]!=""?b[e[i] OFS ARGV[j]]:0),j==count?ORS:OFS)
}
}
}
' file{1..4} | sort -k1
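For reference, ARGC and ARGV in awk hold the command-line operand count (plus one) and the operand names, with ARGV[0] set to "awk"; a BEGIN-only program shows the layout without reading any input, which is what lets the END block above recover the file order without tracking FILENAME:

```shell
awk 'BEGIN{ for(i=0; i<ARGC; i++) print i, ARGV[i] }' file1 file2
# 0 awk
# 1 file1
# 2 file2
```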
join multiple files
From man join:
NAME
join - join lines of two files on a common field
SYNOPSIS
join [OPTION]... FILE1 FILE2
It only works with two files. If you need to join three, you can join the first two, then join the result with the third. Try:
join file1 file2 | join - file3 > output
That should join the three files without creating an intermediate temp file. The - tells join to read its first input stream from stdin.
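For example, with three small sorted sample files (contents invented for illustration):

```shell
printf 'a 1\nb 2\n' > file1
printf 'a x\nb y\n' > file2
printf 'a p\nb q\n' > file3

# Join the first two, then join the result (read from stdin via -) with the third
join file1 file2 | join - file3
# a 1 x p
# b 2 y q
```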
Merging many files based on matching column
With GNU awk for true multi-dimensional arrays and sorted_in:
$ cat tst.awk
FNR==1 { numCols = colNr }
{
key = $1
for (i=2; i<=NF; i++) {
colNr = numCols + i - 1
val = $i
lgth = length(val)
vals[key][colNr] = val
wids[colNr] = (lgth > wids[colNr] ? lgth : wids[colNr])
}
}
END {
numCols = colNr
PROCINFO["sorted_in"] = "@ind_num_asc"
for (key in vals) {
printf "%s", key
for (colNr=1; colNr<=numCols; colNr++) {
printf "%s%*d", OFS, wids[colNr], vals[key][colNr]
}
print ""
}
}
$ awk -f tst.awk file*
1001 1 2 0 5 0 102
1002 1 2 7 3 10 0
1003 3 5 8 0 0 305
1004 6 7 0 0 60 0
1005 8 9 0 0 0 809
1007 0 0 0 0 4 0
1009 2 3 0 0 0 0
Linux - Join multiple CSV files into one
UNIX join should get you a long way:
join -a 1 -e '0' "-t " -j 1
<(sort <(join -a 1 -e '0' "-t " -j 1 <(sort file1) <(sort file2)))
<(sort file3)
(all on one line). Note that "-t " has the TAB character within quotes. Enter it using ^V<Tab>.
If you know the input is sorted, it would be better to use
join -a 1 -e '0' "-t " -j 1
<(join -a 1 -e '0' "-t " -j 1 file1 file2)
file3
(all on one line) prints:
id header1 header2 header3 header1 header2 header3 header1 header2 header3
result_A 10 11 12
result_B 13 14 15 40 41 42
result_C 16 17 18 60 61 62
result_D 19 20 21 63 64 65
result_E 22 23 24
result_F 25 26 27 43 44 45 66 67 68
Now, as you can see, on my Cygwin system -e '0'
apparently doesn't work as advertised. I'd suggest trying this on a different system though, as I don't imagine having uncovered such an essential bug in a standard UNIX utility.
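In fact this matches the caveat in the first answer above: -e only substitutes into fields that are named via -o. A minimal tab-separated check (sample file names invented for illustration):

```shell
printf 'a\t1\n' > left.tsv
printf 'b\t2\n' > right.tsv

# With an -o list present, -e 0 fills the fields missing on the unpaired side
join -t "$(printf '\t')" -a1 -a2 -e 0 -o 0,1.2,2.2 left.tsv right.tsv
# a	1	0
# b	0	2
```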
Concatenating multiple text files into a single file in Bash
This appends the output to all.txt
cat *.txt >> all.txt
This overwrites all.txt
cat *.txt > all.txt
Best way to do a find/replace in several files?
find . -type f -print0 | xargs -0 -n 1 sed -i -e 's/from/to/g'
The first part of that is a find command to find the files you want to change. You may need to modify that appropriately. The xargs
command takes every file the find found and applies the sed
command to it. The sed
command takes every instance of from and replaces it with to. That's a standard regular expression, so modify it as you need.
If you are using svn, beware: your .svn directories will be searched and replaced as well. You have to exclude those, e.g., like this:
find . ! -regex ".*[/]\.svn[/]?.*" -type f -print0 | xargs -0 -n 1 sed -i -e 's/from/to/g'
or
find . -name .svn -prune -o -type f -print0 | xargs -0 -n 1 sed -i -e 's/from/to/g'
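A quick sanity check of the pipeline (hypothetical directory and file name; assumes GNU sed, whose -i needs no suffix argument):

```shell
mkdir -p demo
printf 'from here\n' > 'demo/a b.txt'   # the space in the name is handled by -print0/-0

find demo -type f -print0 | xargs -0 -n 1 sed -i -e 's/from/to/g'

cat 'demo/a b.txt'
# to here
```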