Sort across multiple files in Linux
I don't know about a command doing in-place sorting, but I think a faster "merge sort" is possible:
for file in *.txt; do
    sort -o "$file" "$file"
done
sort -m *.txt | split -d -l 1000000 - output
- The sort in the for loop makes sure the content of each input file is sorted. If you don't want to overwrite the original, simply change the value after the -o parameter. (If you expect the files to be sorted already, you could change the sort statement to "check-only": sort -c "$file" || exit 1)
- The second sort (sort -m) efficiently merges the input files while keeping the output sorted. It assumes every input is already sorted and does not sort them itself, which is why the loop comes first.
- This is piped to the split command, which then writes to suffixed output files. Notice the - character; it tells split to read from standard input (i.e. the pipe) instead of a file.
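The three steps above can be wrapped in one function. This is a minimal sketch, assuming GNU coreutils (split -d is a GNU option); sort_merge_split and its argument order are hypothetical. LC_ALL=C is also an assumption: it makes both the per-file sort and the merge use byte order, and any locale works as long as both passes use the same one, because sort -m relies on the inputs being sorted the way it compares them.

```shell
#!/bin/sh
# Hypothetical helper: sort_merge_split LINES PREFIX FILE...
#   LINES  - lines per output chunk
#   PREFIX - output name prefix (chunks become PREFIX00, PREFIX01, ...)
sort_merge_split() {
  lines=$1
  prefix=$2
  shift 2
  for file in "$@"; do
    # Sort each input in place; -o makes writing back to the same file safe.
    # LC_ALL=C (an assumption) keeps this pass and the merge in the same order.
    LC_ALL=C sort -o "$file" -- "$file" || return 1
  done
  # Merge the already sorted inputs and cut the stream into numbered chunks;
  # "-" makes split read standard input.
  LC_ALL=C sort -m -- "$@" | split -d -l "$lines" - "$prefix"
}
```

Usage, matching the one-liner: sort_merge_split 1000000 output *.txt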
Also, here's a short summary of how the merge sort works:
- sort reads a line from each file.
- It orders these lines and selects the one which should come first. This line gets sent to the output, and a new line is read from the file which contained this line.
- Repeat step 2 until there are no more lines in any file.
- At this point, the output should be a perfectly sorted file.
- Profit!
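The steps above can be illustrated with a two-file merge written in awk. This is only a sketch of the idea, not GNU sort's actual code; merge2 is a hypothetical name, and it assumes both inputs are already sorted in byte order (LC_ALL=C).

```shell
#!/bin/sh
# Hypothetical helper: merge2 FILE_A FILE_B
# Two-way merge of two files that are already sorted in byte order.
merge2() {
  LC_ALL=C awk -v other="$2" '
    # Read the next line of the second file into b; has_b becomes 0 at its end.
    function next_b() { has_b = ((getline b < other) > 0) }
    BEGIN { next_b() }
    {
      # Emit every pending line of the second file that sorts before this one.
      # Appending "" forces a string comparison instead of a numeric one.
      while (has_b && (b "") < ($0 "")) { print b; next_b() }
      print
    }
    # The first file is exhausted: flush what is left of the second one.
    END { while (has_b) { print b; next_b() } }
  ' "$1"
}
```

Each output line is chosen by comparing only the current head of each file, so memory use stays constant no matter how large the inputs are; sort -m applies the same idea to any number of files.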
Using Linux sort on multiple files
I assume you have many input files, and you want to create a sorted version of each of them. I would do this using something like
for f in file*
do
    sort "$f" > "$f.sort"
done
Now, this has the small problem that if you run it again, it will not only sort all the files again, it will also create file1.sort.sort to go with file1.sort. There are various ways to fix that. We can fix the second problem by creating sorted files that don't have names beginning with "file":
for f in file*
do
    sort "$f" > "sorted.$f"
done
But that's kind of weird, and I wouldn't want files named like that. Alternatively, we could use a slightly more clever script that checks whether the file needs sorting, and avoids both problems:
for f in file*
do
    if expr "$f" : '.*\.sort' > /dev/null
    then
        : no need to sort
    elif test -e "$f.sort"
    then
        : already sorted
    else
        sort -nr -k 2 "$f" > "$f.sort"
    fi
done
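One limitation of the script above is that an existing .sort file is trusted even if the original has changed since. A sketch that also re-sorts when the source is newer, assuming a shell whose test supports -nt (bash and dash do); it uses case instead of expr to skip the .sort files themselves. resort_stale is a hypothetical name, and the sort options are the ones from the script above.

```shell
#!/bin/sh
# Hypothetical helper: resort_stale FILE...
# Writes FILE.sort for each FILE unless an up-to-date one already exists.
resort_stale() {
  for f in "$@"; do
    case $f in
      *.sort) continue ;;  # an output file, not an input
    esac
    # Skip when FILE.sort exists and FILE has not been modified since.
    if [ -e "$f.sort" ] && ! [ "$f" -nt "$f.sort" ]; then
      continue
    fi
    sort -nr -k 2 -- "$f" > "$f.sort"
  done
}
```

Usage: resort_stale file*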
How to sort multiple files? Unix
With GNU sort you can do:
$ sort file -o file
You could use xargs instead of looping, like this (-I% replaces % with each file name and runs one sort per name; the older -i form is deprecated):
$ ls | xargs -I% sort % -o %
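If you don't have the -o option:
$ sort file > tmp && mv tmp file
With xargs, the redirection and the mv have to run inside a shell once per file; otherwise > tmp would apply to xargs as a whole and mv would see a literal %:
$ ls | xargs -I% sh -c 'sort "$1" > "$1.tmp" && mv "$1.tmp" "$1"' _ %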
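The tmp-file variant can be made safer with mktemp, which picks a unique temporary name instead of a fixed tmp, and by only replacing the original when sort succeeds. A minimal sketch; sort_in_place is a hypothetical name, and it assumes mktemp is available (it is on Linux and macOS).

```shell
#!/bin/sh
# Hypothetical helper: sort_in_place FILE...
# Sorts each FILE without relying on sort -o.
sort_in_place() {
  for f in "$@"; do
    # Create the temporary file next to the original so mv is a simple rename.
    tmp=$(mktemp "$f.XXXXXX") || return 1
    if sort -- "$f" > "$tmp"; then
      mv -- "$tmp" "$f"
    else
      rm -f -- "$tmp"   # leave the original untouched if sort fails
      return 1
    fi
  done
}
```

One trade-off of the rename: mktemp creates its file with owner-only (0600) permissions, so the sorted file ends up with those permissions rather than the original's.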
Sort two files in Linux and find lines unique to each file
Using awk you can do this without sorting:
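awk 'FNR==NR {        # first file: remember every line
    a[$0]
    next
}
{                     # second file: print lines the first file lacks
    seen[$0]
    if (!($0 in a))
        print
}
END {                 # then print first-file lines never seen in the second
    for (i in a)
        if (!(i in seen))
            print i
}' file1 file2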
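Similarly, using grep you can get the same result (-v inverts the match, -x matches whole lines only, -F treats the patterns as fixed strings, and -f reads them from the given file):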
{ grep -vxFf file1 file2; grep -vxFf file2 file1; }
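Since the question title mentions sorting: once both files are sorted, comm can do the same job. A sketch that sorts into temporary copies; comm requires both inputs sorted with the same collation, hence LC_ALL=C on every call. unique_lines is a hypothetical name. Unlike the awk and grep versions, sort -u collapses repeated lines, and the output comes out in sorted order.

```shell
#!/bin/sh
# Hypothetical helper: unique_lines FILE1 FILE2
# Prints lines only in FILE1, then lines only in FILE2.
unique_lines() {
  s1=$(mktemp) || return 1
  s2=$(mktemp) || { rm -f -- "$s1"; return 1; }
  # comm needs both inputs sorted with the same collation.
  LC_ALL=C sort -u -- "$1" > "$s1"
  LC_ALL=C sort -u -- "$2" > "$s2"
  LC_ALL=C comm -23 "$s1" "$s2"   # hide columns 2 and 3: lines only in FILE1
  LC_ALL=C comm -13 "$s1" "$s2"   # hide columns 1 and 3: lines only in FILE2
  rm -f -- "$s1" "$s2"
}
```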
Grep keyword in multiple files and sort results by files modified date or name
Just sort with ls and pass the results into grep or ag, e.g. to sort by modification date, oldest first:
grep "keyword" $(ls -1rt)
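To sort by name, again use ls, which sorts by name by default. grep searches its file arguments in the order given, which is what makes this work. A caveat worth mentioning for macOS: if you use GNU's ls there (brew install coreutils installs it as gls), don't pass its -U flag, which turns sorting off, or --color, which adds color escape codes to the names:
ag "keyword" $(gls)
# sort by name on macOS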
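An unquoted $(ls -1rt) splits names on spaces and also expands any wildcard characters in them. A sketch that splits only on newlines and turns globbing off, assuming no file names contain newlines; grep_by_mtime is a hypothetical name. -H forces the file name prefix and -d skip ignores directories that ls lists (both GNU and BSD grep accept these).

```shell
#!/bin/sh
# Hypothetical helper: grep_by_mtime PATTERN
# Searches the files in the current directory, oldest first.
grep_by_mtime() {
  pattern=$1
  (
    # Split the ls output on newlines only, and do not glob-expand the names.
    IFS='
'
    set -f
    set -- $(ls -1rt)
    [ $# -gt 0 ] || exit 1   # empty directory: nothing to search
    grep -H -d skip -- "$pattern" "$@"
  )
}
```

For name order, replace ls -1rt with plain ls.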