Bash join command
First, sort both files. Then use join to join on the first field of both files. You also need to pipe the output through sed if you want to remove the space and thus convert "a a" into "aa". This is shown below:
$ join -t " " -1 1 -2 1 -a 1 -a 2 <(sort file1) <(sort file2) | sed 's/ \([a-z]\) / \1/g'
1 aa
2 b
3 c
4 d
5 e
6 ff
7 g
8 h
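The original file1 and file2 are not shown, so the following is only a sketch with hypothetical inputs chosen to reproduce the output above. It writes the sorted copies to temporary files instead of using <(...), so it does not depend on bash; the join and sed steps are the same as in the command above.

```shell
# Hypothetical sample inputs (not from the original post), deliberately unsorted.
workdir=$(mktemp -d)
printf '%s\n' '3 c' '1 a' '7 g' '6 f' '2 b' > "$workdir/file1"
printf '%s\n' '8 h' '6 f' '1 a' '5 e' '4 d' > "$workdir/file2"

# join needs both inputs sorted on the join field, with the same collation.
LC_ALL=C sort "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort "$workdir/file2" > "$workdir/file2.sorted"

# -a 1 -a 2 also prints lines that have no partner in the other file.
# The sed step then merges a paired line such as "1 a a" into "1 aa".
result=$(LC_ALL=C join -t ' ' -1 1 -2 1 -a 1 -a 2 \
    "$workdir/file1.sorted" "$workdir/file2.sorted" |
    sed 's/ \([a-z]\) / \1/g')

printf '%s\n' "$result"
rm -rf "$workdir"
```

Keys 1 and 6 exist in both hypothetical files, which is why only those two lines end up with a doubled letter.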
Running multiple commands in one line in shell
You are using | (pipe) to direct the output of a command into another command. What you are looking for is the && operator, which executes the next command only if the previous one succeeded:
cp /templates/apple /templates/used && cp /templates/apple /templates/inuse && rm /templates/apple
Or
cp /templates/apple /templates/used && mv /templates/apple /templates/inuse
To summarize (non-exhaustively) bash's command operators/separators:
| pipes (pipelines) the standard output (stdout) of one command into the standard input of another one. Note that stderr still goes to its default destination, whatever that happens to be.
|& pipes both stdout and stderr of one command into the standard input of another one. Very useful, available in bash version 4 and above.
&& executes the right-hand command only if the previous one succeeded.
|| executes the right-hand command only if the previous one failed.
; executes the right-hand command always, regardless of whether the previous command succeeded or failed, unless set -e was previously invoked, which causes bash to exit as soon as a command fails.
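The rules above can be checked with a small sketch. The helper emit_both and the variable names are made up for illustration. The last line uses 2>&1 |, which bash documents as the long form of |&, so the sketch also runs in shells that lack |&.

```shell
# true always succeeds and false always fails, which makes the rules visible.
and_result=$(true && echo "ran after success")   # && : left side succeeded
or_result=$(false || echo "ran after failure")   # || : left side failed
seq_result=$(false; echo "ran regardless")       # ;  : exit status is ignored

# Hypothetical helper that writes one line to stdout and one to stderr.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# A plain pipe carries only stdout, so wc sees a single line
# (the stderr line goes to the terminal instead).
stdout_lines=$(emit_both | wc -l | tr -d ' ')

# Redirecting stderr into the pipe makes wc see both lines.
both_lines=$(emit_both 2>&1 | wc -l | tr -d ' ')

echo "$and_result / $or_result / $seq_result / $stdout_lines / $both_lines"
```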
Join two files including unmatched lines in Shell
Could you please try the following.
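awk '
FNR==NR{                 # first file on the command line (Input_file2)
  a[$1]=$2               # remember its 2nd field, keyed by the 1st field
  next
}
($1 in a){               # key is present in both files
  print $0,a[$1]
  b[$1]                  # mark the key as matched
  next
}
{                        # key is only in Input_file1
  print $1,$2 " ----- "
}
END{
  for(i in a){           # keys that are only in Input_file2
    if(!(i in b)){
      print i" ----- "a[i]
    }
  }
}
' Input_file2 Input_file1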
Output will be as follows.
207.46.13.90 37556 62343
157.55.39.51 34268 58451
40.77.167.109 21824 21824
157.55.39.253 19683 -----
157.55.39.200 ----- 37675
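Since this page is about join, here is a sketch of the same full outer join done with join itself rather than awk. This is a different technique from the answer above, not a restatement of it. The two input files are assumptions reconstructed from the output shown, and the result comes out sorted by key instead of in input order.

```shell
workdir=$(mktemp -d)
# Assumed inputs, reconstructed from the output shown above.
printf '%s\n' '207.46.13.90 37556' '157.55.39.51 34268' \
    '40.77.167.109 21824' '157.55.39.253 19683' > "$workdir/Input_file1"
printf '%s\n' '207.46.13.90 62343' '157.55.39.51 58451' \
    '40.77.167.109 21824' '157.55.39.200 37675' > "$workdir/Input_file2"

LC_ALL=C sort "$workdir/Input_file1" > "$workdir/sorted1"
LC_ALL=C sort "$workdir/Input_file2" > "$workdir/sorted2"

# -a 1 -a 2     keep unpaired lines from both files
# -o 0,1.2,2.2  print the join key, then field 2 of each file
# -e '-----'    placeholder for a field that is missing on an unpaired line
outer=$(LC_ALL=C join -a 1 -a 2 -e '-----' -o 0,1.2,2.2 \
    "$workdir/sorted1" "$workdir/sorted2")

printf '%s\n' "$outer"
rm -rf "$workdir"
```

The trade-off is the usual one: join streams through sorted input and keeps almost nothing in memory, while the awk version avoids sorting but holds one file in an array.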
Join command for two big files based on one column gives empty output
Based on the sample inputs, the general issue with join -j2 is that field #2 in file2 has an 'extra' prefix of >, e.g.:
# file1 / line #1 / field #2
lcl|NC_003197.2_prot_NP_463122.1_4111
# file2 / line #1 / field #2
>lcl|NC_003197.2_prot_NP_463122.1_4111
Because of the 'extra' >, no joins can be made.
Short of adding (or removing?) the 'extra' > during pre-processing, one small change to OP's sample awk:
awk 'NR==FNR {a[$2]=$1; next} (substr($2,2) in a) {$2=substr($2,2);print $0,a[$2]}' file1 file2
NOTE: one big issue with using awk arrays and 'massive' files is that you could hit an Out Of Memory (OOM) error (this depends on the actual volume of data that needs to be stored in the awk arrays).
Going back to pre-processing ... OP could look at stripping the > prefix from file2's 2nd field.
One idea using sed to strip out the first > it encounters on each line of file2 (assumes this will always be the first character of field #2):
sed 's/>//' file2
Adding this into OP's sample join:
join -j2 -o1.1,2.1,1.2,1.3,1.4,1.5 <(sort -k2 file1) <(sed 's/>//' file2|sort -k2)
Which generates:
SiiA Salmonella_enterica_subsp_enterica_Typhimurium_LT2 lcl|NC_003197.2_prot_NP_463122.1_4111 100.000 100 MEDESNPWPSFVDTFSTVLCIFIFLMLVFALNNMIIMYDNSIKVYKANIENKTKSTAQNSGANDDSNPNEIVNKEVNTQDVSDGMTTMSGKEVGVYDIADGQKTDITSTKNELVITYHGRLRSFSEEDTYKIKAWLEDKINSNLLIEMVIPQADISFSDSLRLGYERGIILMKEIKKIYPDVVIDMSVNSAASSTTSKAIITTINKKVSE
SiiB Salmonella_enterica_subsp_enterica_Typhimurium_LT2 lcl|NC_003197.2_prot_NP_463123.1_4112 100.000 100 MKYINHYRYLFVCFFLAILPFFALSFPGIREYVFDNFMVSAIYNGVIIAIYITGSLCALFTILKNISAKDILIAQDASRKNSILSNLNQVLFAGESKQCDFNLLMELDDNVSTARNQRLSFIMSCSNVSTLVGLLGTFAGLSITIGSIGNLLSSPSDVGGDNASNTLNMIVTMVASLSEPLKGMNTAFVSSIYGVVCAILLTSQSVFVRSSYSLVSTEIKKLKIISNRANNKQRSLRVESETLVEFKELFKAFFDNYLTVENLRTQDEEKKREMLSDSFVTLQNRLLDNSAKLEQISTLIDGYLVSSNENLKKLSDGVITITSRLSEGNILLADNNARLEAMSTIQNIIDKKNDSIMTSV DKCYQESLSHGKTINDIAAGSADISHTLDGLRKEMDEDMNNVHLALSDLSATDKKIIANTKEISAEMVSYRDTYMPLMEKITSMHQEIVKQRLLNKEEKNED
SiiA Salmonella_bongori lcl|NZ_CP053416.1_prot_WP_079774927.1_2027 77.619 100 MEDESNPWPSFVDTFSTVLCIFIFLMLVFALNNMLIMYDNSIKVYKTNIEKHANSKDEKSGDNKKENTNEKVENETISKDSSAESTEMSGKEIGIYDIADDQRIDITSEEKELVITYRGRLRSFSKEDLNKITVWLEDKANSNLLIEMIIPQADISFSDSLRLGYERGIILMKEIKKIYPDVVIDMSVNSTASSSTSKAIITTTNKKVPE
NOTE: OP's join format (-o...) will place single spaces between the fields, while OP's desired output shows multiple spaces (or are those tabs?); I'll leave it up to OP to work out the differences in white space.
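The effect of the stray > can be reproduced on a tiny example. The two-line inputs below are hypothetical; only the > prefix mirrors the real data. The sketch uses -1 2 -2 2, the long form of -j2, and temporary files instead of <(...).

```shell
workdir=$(mktemp -d)
# Hypothetical inputs; field 2 is the join key in both files.
printf '%s\n' 'SiiA id_1 100.000' 'SiiB id_2 100.000' > "$workdir/file1"
printf '%s\n' 'species_x >id_1' 'species_y >id_2' > "$workdir/file2"

LC_ALL=C sort -k2,2 "$workdir/file1" > "$workdir/file1.sorted"

# With the prefix left in place, ">id_1" never equals "id_1": no output.
LC_ALL=C sort -k2,2 "$workdir/file2" > "$workdir/file2.raw"
with_prefix=$(LC_ALL=C join -1 2 -2 2 \
    "$workdir/file1.sorted" "$workdir/file2.raw")

# Strip the first ">" on each line before sorting, then join again.
sed 's/>//' "$workdir/file2" | LC_ALL=C sort -k2,2 > "$workdir/file2.clean"
without_prefix=$(LC_ALL=C join -1 2 -2 2 -o 1.1,2.1,1.2,1.3 \
    "$workdir/file1.sorted" "$workdir/file2.clean")

printf 'with prefix: [%s]\n' "$with_prefix"
printf '%s\n' "$without_prefix"
rm -rf "$workdir"
```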
join command leaving out a row of numbers
As an alternative to two sort commands (which can be very expensive for big files) followed by a join, you can use this single awk command to get your output:
awk 'FNR == NR{a[$3]=$0; next} $3 in a{print $3, a[$3], $1, $2, $4}' file1 file2
3 4 5 3 1 2 4
c c c c a b d
Explanation:
NR == FNR {                  # While processing the first file
    a[$3] = $0               # store the whole line in array a using $3 as key
    next
}
$3 in a {                    # While processing the 2nd file, when $3 is found in the array
    print $3,a[$3],$1,$2,$4  # print relevant fields from file2 and the remembered
                             # value from the first file.
}
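To try the command end to end, here is a sketch with inputs inferred from the two output lines above; the real files may well contain more rows. The third line of file2 is an added hypothetical row whose key has no match in file1, to show that this is an inner join and that no sorting is needed.

```shell
workdir=$(mktemp -d)
# Inputs inferred from the output above, plus one unmatched row (x y z w).
printf '%s\n' '4 5 3' 'c c c' > "$workdir/file1"
printf '%s\n' '1 2 3 4' 'a b c d' 'x y z w' > "$workdir/file2"

joined=$(awk 'FNR == NR { a[$3] = $0; next }
              $3 in a   { print $3, a[$3], $1, $2, $4 }' \
    "$workdir/file1" "$workdir/file2")

printf '%s\n' "$joined"
rm -rf "$workdir"
```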
unix join command to return all columns in one file
I'm not aware of wildcards in the format string.
From your desired output I think that what you want may be achievable like so without having to specify all the enumerations:
grep -f <(awk '{print $1}' file2.tsv ) file1.tsv
1 a ant
2 b bat
3 c cat
Or as an awk-only solution:
awk '{if(NR==FNR){a[$1]++}else{if($1 in a){print}}}' file2.tsv file1.tsv
1 a ant
2 b bat
3 c cat
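One difference between the two commands is worth checking on your own data: grep -f matches each key anywhere in the line, while the awk version compares the first field exactly. The sketch below uses hypothetical rows, including a key 13 that is not in file2.tsv, and writes the keys to a temporary file instead of using <(...).

```shell
workdir=$(mktemp -d)
# Hypothetical data: key 13 is NOT listed in file2.tsv.
printf '1\ta\tant\n2\tb\tbat\n3\tc\tcat\n13\tm\tmole\n' > "$workdir/file1.tsv"
printf '1\n2\n3\n' > "$workdir/file2.tsv"

# grep treats every key as a pattern that may match anywhere in the line,
# so the row with key 13 is selected as well (it contains "1" and "3").
awk '{print $1}' "$workdir/file2.tsv" > "$workdir/keys"
grep_count=$(grep -c -f "$workdir/keys" "$workdir/file1.tsv")

# awk compares the whole first field, so only keys 1, 2 and 3 are kept.
awk_count=$(awk 'NR == FNR { a[$1]; next } $1 in a' \
    "$workdir/file2.tsv" "$workdir/file1.tsv" | wc -l | tr -d ' ')

echo "grep -f selected $grep_count rows, awk selected $awk_count rows"
rm -rf "$workdir"
```

If keys can be substrings of other fields, the awk form is the safer of the two.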
Ignore header in join command (outdated coreutils)
If you can't use --header, help yourself out with tail:
join <(tail -n+2 file1) <(tail -n+2 file2)
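A minimal sketch of the same idea on hypothetical files, assuming the data rows below the header are already sorted on the join field. One body goes to a temporary file and the other is piped in through -, which join accepts as standard input, so no <(...) is needed.

```shell
workdir=$(mktemp -d)
# Hypothetical inputs: one header line, then rows already sorted by id.
printf '%s\n' 'id name' '1 ant' '2 bat' > "$workdir/file1"
printf '%s\n' 'id food' '1 sugar' '2 fruit' > "$workdir/file2"

# tail -n +2 prints everything from line 2 onward, i.e. drops the header.
tail -n +2 "$workdir/file2" > "$workdir/body2"
body=$(tail -n +2 "$workdir/file1" | join - "$workdir/body2")

printf '%s\n' "$body"
rm -rf "$workdir"
```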
Alternative to join command in bash
You might want to check out q, with which you can run SQL queries on structured text files.
join command in linux says that files aren't sorted but they are
I suggest removing sort's -n option.
From man join:
Important: FILE1 and FILE2 must be sorted on the join fields. E.g., use "sort -k 1b,1" if 'join' has no options, or use "join -t ''" if 'sort' has no options. Note, comparisons honor the rules specified by 'LC_COLLATE'. If the input is not sorted and some lines cannot be joined, a warning message will be given.
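The reason -n gets in the way is that join compares keys as text, where "10" sorts before "2", while sort -n orders them by numeric value. The sketch below uses hypothetical data and sort -c, which only checks whether a file is already in the requested order.

```shell
workdir=$(mktemp -d)
# Hypothetical data with keys that sort differently as numbers and as text.
printf '%s\n' '10 ten' '2 two' '9 nine' > "$workdir/data"

LC_ALL=C sort -n "$workdir/data" > "$workdir/numeric"        # 2, 9, 10
LC_ALL=C sort -k 1b,1 "$workdir/data" > "$workdir/lexical"   # 10, 2, 9

# sort -c exits with status 0 only if the file is already in that order.
if LC_ALL=C sort -c -k 1b,1 "$workdir/numeric" 2>/dev/null; then
    numeric_ok=yes
else
    numeric_ok=no
fi
if LC_ALL=C sort -c -k 1b,1 "$workdir/lexical" 2>/dev/null; then
    lexical_ok=yes
else
    lexical_ok=no
fi

first_lexical=$(head -n 1 "$workdir/lexical")
echo "numeric order usable by join: $numeric_ok"
echo "lexical order usable by join: $lexical_ok"
rm -rf "$workdir"
```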