Linux Join Utility Complains About Input File Not Being Sorted

How do I pipe comm outputs to a file?

You've sorted numerically; comm works on lexically sorted files.

For instance, in file2, the line 103 is dramatically out of order relative to the lines 21..87: lexically, 103 sorts before 21. Your files must be sorted with plain sort (no -n).
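To see the difference between the two orderings, here is a minimal sketch using the values 21, 87 and 103 mentioned above. The variable names are only for illustration, and LC_ALL=C is set as an assumption so that the collation order is predictable on any system.

```shell
# Sample values taken from the answer above.
nums=$(printf '21\n87\n103\n')

# Numeric order (sort -n): 21, 87, 103
numeric=$(printf '%s\n' "$nums" | LC_ALL=C sort -n)

# Lexical order (plain sort): 103, 21, 87, because "1" < "2" < "8"
lexical=$(printf '%s\n' "$nums" | LC_ALL=C sort)

printf 'numeric:\n%s\n' "$numeric"
printf 'lexical:\n%s\n' "$lexical"

# sort -c only checks the order; it exits non-zero if the input is not sorted.
if printf '%s\n' "$numeric" | LC_ALL=C sort -c 2>/dev/null; then
    check="sorted"
else
    check="not sorted"
fi
printf 'numeric list, checked lexically: %s\n' "$check"
```

The numerically sorted list fails the lexical check, which is the same condition comm and join complain about.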

If you've got bash (or another shell that supports process substitution, such as ksh or zsh), you can use it here:

comm <(sort file1) <(sort file2)

This runs the two commands and ensures that the comm process gets to read their standard output as if they were files.

Failing that:

(
sort -o file1 file1 &
sort -o file2 file2 &
wait
comm file1 file2
)

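This uses parallelism to sort both files at the same time, then runs comm once both sorts have finished. Note that sort -o file1 file1 overwrites each file with its sorted contents. The sub-shell (the ( ... )) ensures that wait only waits for these two sorts, not for other background processes started earlier in the parent shell.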
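If you would rather not overwrite the original files, one possible variant sorts into separate files and redirects the comm output to a file. This is only a sketch: the sample data, the workdir directory and the .sorted and comm.out names are made up for illustration, and mktemp -d is assumed to be available.

```shell
# Hypothetical sample inputs so the sketch runs on its own.
workdir=$(mktemp -d)
printf '87\n21\n103\n' > "$workdir/file1"
printf '103\n5\n21\n' > "$workdir/file2"

# Sort into separate files in parallel; the originals are left untouched.
LC_ALL=C sort "$workdir/file1" > "$workdir/file1.sorted" &
LC_ALL=C sort "$workdir/file2" > "$workdir/file2.sorted" &
wait

# Redirect comm's three-column output to a file.
LC_ALL=C comm "$workdir/file1.sorted" "$workdir/file2.sorted" > "$workdir/comm.out"

# Keep only the lines common to both files (suppress columns 1 and 2).
common=$(LC_ALL=C comm -12 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$common"
```

Using the same LC_ALL=C setting for both sort and comm keeps the two tools in agreement about what "sorted" means.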

Sorting with filehandle perl

You could sort the output of <> directly in list context, which removes a loop and, in my opinion, makes the code a lot easier to read.

If you are sorting lines, there is no need to chomp the end of line. If you leave the newline in place, the print statement gets simpler because you no longer have to add the newline manually.

Also, if you use lexical variables (e.g. my $input) instead of bareword file handles (e.g. INPUT) with the open function, the file handles are automatically closed at the end of the scope.

use strict;
use warnings;

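open my $input, "<", "input.txt" or die "Cannot open input.txt: $!";
open my $output, ">", "output.txt" or die "Cannot open output.txt: $!";

my @lines = sort <$input>; # List context reads all lines in the file at once

# Lines still end in "\n", so print needs no explicit newline
for (@lines) {
    print $output $_;
}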

