Sort Across Multiple Files in Linux

sort across multiple files in linux

I don't know of a single command that sorts across files in place, but a faster "merge sort" approach is possible:

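# Sort each input file in place first
for file in *.txt; do
    sort -o "$file" "$file"
done
# Merge the sorted files and write the result in 1,000,000-line chunks
sort -m *.txt | split -d -l 1000000 - output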
  • The sort in the for loop makes sure the content of each input file is sorted. If you don't want to overwrite the originals, change the value after the -o option. (If you expect the files to be sorted already, you could change the sort statement to check-only: sort -c "$file" || exit 1)
  • The second sort does efficient merging of the input files, all while keeping the output sorted.
  • This is piped to the split command which will then write to suffixed output files. Notice the - character; this tells split to read from standard input (i.e. the pipe) instead of a file.
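To see the pipeline end to end without touching real data, here is a minimal sketch on throwaway files. The scratch directory, the file names a.txt and b.txt, the chunk- prefix and the two-line chunk size are all made up for illustration, and split -d (numeric suffixes) assumes GNU coreutils.

```shell
# Scratch directory with two unsorted sample files (hypothetical names).
demo=$(mktemp -d)
printf '%s\n' pear apple mango > "$demo/a.txt"
printf '%s\n' kiwi banana cherry > "$demo/b.txt"

# Step 1: sort each file in place, as in the loop above.
for file in "$demo"/*.txt; do
    LC_ALL=C sort -o "$file" "$file"
done

# Step 2: merge the sorted files and split the stream into 2-line chunks
# named chunk-00, chunk-01, ... (-d is a GNU split option).
LC_ALL=C sort -m "$demo"/*.txt | split -d -l 2 - "$demo/chunk-"

# Joining the chunks in name order gives back the fully sorted stream.
all_chunks=$(cat "$demo"/chunk-*)
printf '%s\n' "$all_chunks"
```

LC_ALL=C is used on both sort calls so that the per-file sort and the merge agree on the ordering; mixing locales between the two steps can produce a merge that looks out of order.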

Also, here's a short summary of how the merge sort works:

  1. sort reads a line from each file.
  2. It orders these lines and selects the one which should come first. This line gets sent to the output, and a new line is read from the file which contained this line.
  3. Repeat step 2 until there are no more lines in any file.
  4. At this point, the output should be a perfectly sorted file.
  5. Profit!
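The steps above can be checked on a small scale. This is only a sketch with made-up numeric files (x.txt and y.txt are hypothetical names): it confirms that merging two pre-sorted files with -m gives the same result as fully sorting their concatenation.

```shell
# Two already-sorted numeric files in a scratch directory (hypothetical names).
mergedir=$(mktemp -d)
printf '%s\n' 1 4 9 > "$mergedir/x.txt"
printf '%s\n' 2 3 10 > "$mergedir/y.txt"

# sort -c exits non-zero if a file is out of order; -n compares numerically.
sort -n -c "$mergedir/x.txt" && sort -n -c "$mergedir/y.txt"

# -m only interleaves the inputs; it does not re-sort within a file.
merged=$(sort -n -m "$mergedir/x.txt" "$mergedir/y.txt")

# For comparison: a full sort of the concatenated input.
full=$(cat "$mergedir/x.txt" "$mergedir/y.txt" | sort -n)

printf '%s\n' "$merged"
```

If one of the inputs were not sorted, -m would still run, but the output would no longer be guaranteed to be in order, which is why the check-only pass is worth keeping.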

Using linux sort on multiple files

I assume you have many input files, and you want to create a sorted version of each of them. I would do this using something like

for f in file*
do
    sort "$f" > "$f.sort"
done

Now, this has the small problem that if you run it again, it will not only sort all the files again, it will also create file1.sort.sort to go with file1.sort. There are various ways to fix that. We can fix the second problem by creating sorted files that don't have names beginning with "file":

for f in file*
do
    sort "$f" > "sorted.$f"
done

But that's kind of weird, and I wouldn't want files named like that. Alternatively, we could use a slightly more clever script that checks whether the file needs sorting, and avoids both problems:

for f in file*
do
    if expr "$f" : '.*\.sort$' > /dev/null
    then
        : no need to sort
    elif test -e "$f.sort"
    then
        : already sorted
    else
        sort -nr -k 2 "$f" > "$f.sort"
    fi
done
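Another way to sidestep both problems is to write the results into a separate directory, so that the file* glob never matches its own output. This is a minimal sketch, not the approach from the answer above: the sorted/ subdirectory, the sort_all function name and the sample file are all assumptions made for the demo, and the sort options are copied from the script above.

```shell
# Hypothetical helper: sort every file* in the current directory into sorted/.
sort_all() {
    mkdir -p sorted
    for f in file*; do
        [ -f "$f" ] || continue            # skip if the glob matched nothing
        [ -e "sorted/$f" ] && continue     # a sorted copy exists from an earlier run
        sort -nr -k 2 "$f" > "sorted/$f"
    done
}

# Demo in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' 'a 2' 'b 10' 'c 5' > file1
sort_all
sort_all    # the second run skips file1 and creates nothing new
result=$(cat sorted/file1)
listing=$(ls sorted)
printf '%s\n' "$result"
```

The trade-off is that a changed input is not re-sorted until its copy in sorted/ is deleted, which is the same behaviour as the .sort check in the script above.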

How to sort multiple files? Unix

With GNU sort you can do:

$ sort file -o file 

You could use xargs instead of a loop (-I% replaces the deprecated -i%, and it already runs one command per input line, so -n1 is not needed):

$ ls | xargs -I% sort % -o %

If you don't have the -o option:

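$ sort file > tmp && mv tmp file

With xargs, the redirect and the mv have to run inside a per-file shell; otherwise the outer shell applies them once to the whole pipeline:

$ ls | xargs -I% sh -c 'sort "$1" > tmp && mv tmp "$1"' sh %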
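If parsing ls output is a concern (names with spaces or newlines can break it), a plain loop with a per-file temporary file does the same job. A minimal sketch, assuming mktemp is available; sort_in_place is a hypothetical helper name, not a standard command:

```shell
# Hypothetical helper: sort each given file via a temporary file, no -o needed.
sort_in_place() {
    for f in "$@"; do
        tmpfile=$(mktemp) || return 1
        # Replace the original only if sort succeeded.
        if sort "$f" > "$tmpfile"; then
            mv "$tmpfile" "$f"
        else
            rm -f "$tmpfile"
            return 1
        fi
    done
}

# Demo on a file whose name contains a space.
work=$(mktemp -d)
printf '%s\n' delta alpha charlie bravo > "$work/my list.txt"
sort_in_place "$work/my list.txt"
sorted_content=$(cat "$work/my list.txt")
printf '%s\n' "$sorted_content"
```

Note that mv replaces the file rather than rewriting it, so the result carries the temporary file's permissions (mktemp creates it readable and writable by the owner only).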

Sort two files in Linux and find lines unique to each file

Using awk you can do this without sorting:

awk 'FNR==NR {
    a[$0]
    next
}
{
    if ($0 in a)
        delete a[$0]
    else
        print
}
END {
    for (i in a)
        print i
}' file1 file2

Similarly, grep can produce much the same result:

{ grep -vxFf file1 file2; grep -vxFf file2 file1; }
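Since the question mentions sorting, it may help to see the sorted-input tool for this job, comm, which is a different technique from the awk and grep versions above. A minimal sketch with made-up sample files; comm expects both inputs to be sorted in the same locale, hence LC_ALL=C on every command:

```shell
# Two unsorted sample files in a scratch directory (contents are made up).
cmp_dir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$cmp_dir/file1"
printf '%s\n' banana date cherry > "$cmp_dir/file2"

LC_ALL=C sort "$cmp_dir/file1" > "$cmp_dir/file1.sorted"
LC_ALL=C sort "$cmp_dir/file2" > "$cmp_dir/file2.sorted"

# -23 suppresses columns 2 and 3, leaving lines found only in the first file;
# -13 suppresses columns 1 and 3, leaving lines found only in the second file.
only_in_1=$(LC_ALL=C comm -23 "$cmp_dir/file1.sorted" "$cmp_dir/file2.sorted")
only_in_2=$(LC_ALL=C comm -13 "$cmp_dir/file1.sorted" "$cmp_dir/file2.sorted")

printf 'only in file1: %s\n' "$only_in_1"
printf 'only in file2: %s\n' "$only_in_2"
```

Unlike the awk version, this needs a sort pass first, but it streams both files and does not hold either one in memory.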

Grep keyword in multiple files and sort results by files modified date or name

Just sort with ls and pass the results into grep or ag, e.g. to sort by date:

grep "keyword" $(ls -1rt)
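One caveat with $(ls ...) is that the shell splits the result on whitespace, so file names containing spaces break. A minimal sketch of a loop that reads one name per line instead (it still assumes no newlines in file names, and -H is a GNU/BSD grep option rather than a POSIX one); the scratch files and timestamps are made up for the demo:

```shell
# Two sample files, one with a space in its name.
logs=$(mktemp -d)
printf '%s\n' 'keyword in the newer file' > "$logs/new.txt"
printf '%s\n' 'keyword in the older file' > "$logs/old notes.txt"

# Set explicit modification times (format: CCYYMMDDhhmm).
touch -t 202001010000 "$logs/old notes.txt"
touch -t 202101010000 "$logs/new.txt"

# ls -1rt lists oldest first; read each name whole and grep it.
# -H prints the file name even though grep sees one file at a time.
matches=$(cd "$logs" && ls -1rt | while IFS= read -r f; do
    grep -H "keyword" -- "$f"
done)
printf '%s\n' "$matches"
```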

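To sort by name, again use ls; it sorts alphabetically by default, so no extra flag is needed. A caveat for macOS: GNU's ls is available as gls (brew install coreutils), but its -U flag lists entries in directory order rather than by name, and --color can add escape sequences to the names, so both are best left out here:

ag "keyword" $(ls -1) # sort by name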


