The sort -R Command Doesn't Sort Lines Randomly in Linux

How can I shuffle the lines of a text file on the Unix command line or in a shell script?

You can use shuf, on some systems at least: it is part of GNU coreutils and doesn't appear to be in POSIX.
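A minimal sketch of how a script might cope with shuf being absent. The function name shuffle_file is an assumption made for illustration; the fallback is the Perl one-liner quoted further down this page.

```shell
#!/bin/sh
# shuffle_file: hypothetical helper name, not a standard command.
# Prints the lines of the given files (or stdin) in random order.
shuffle_file() {
    if command -v shuf >/dev/null 2>&1; then
        # GNU coreutils is installed: let shuf do the work.
        shuf -- "$@"
    else
        # Fallback: List::Util ships with core Perl.
        perl -MList::Util=shuffle -e 'print shuffle <>' -- "$@"
    fi
}
```

Called as `shuffle_file names.txt` or `some_command | shuffle_file`, it should print the same lines in a different order on each run.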

As jleedev pointed out, sort -R might also be an option, again only on some systems. It has been pointed out that sort -R doesn't really shuffle but instead sorts items according to their hash value.

[Editor's note: sort -R almost shuffles, except that duplicate lines / sort keys always end up next to each other. In other words: only with unique input lines / keys is it a true shuffle. While it's true that the output order is determined by hash values, the randomness comes from choosing a random hash function - see manual.]
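The grouping of duplicates is easy to see on a small input. The sketch below assumes a sort that supports -R (GNU coreutils or a recent BSD sort); the sample data and variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: three distinct values, some repeated.
input=$(printf '%s\n' a b a c b a)

# sort -R orders lines by a randomly keyed hash, so identical lines
# get identical hash values and end up next to each other.
sorted_r=$(printf '%s\n' "$input" | sort -R)
printf '%s\n' "$sorted_r"

# Collapsing adjacent duplicates leaves exactly one line per distinct
# value, which confirms that the duplicates were adjacent.
groups=$(printf '%s\n' "$sorted_r" | uniq | wc -l | tr -d ' ')
echo "runs of identical lines: $groups"

# shuf, where available, permutes lines regardless of their content,
# so the duplicates are usually scattered.
if command -v shuf >/dev/null 2>&1; then
    printf '%s\n' "$input" | shuf
fi
```

The order of the three runs changes between invocations, but the number of runs does not.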

How can I randomize the lines in a file using standard tools on Red Hat Linux?

And a Perl one-liner you get!

perl -MList::Util -e 'print List::Util::shuffle <>'

It uses a module, but the module is part of the core Perl distribution. If that's not good enough, you may consider rolling your own.
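As one possible way of rolling your own, here is a sketch that swaps Perl for awk and uses a Fisher-Yates shuffle. The function name shuffle_lines is hypothetical. It assumes the whole input fits in memory, and because srand() seeds from the clock, two runs within the same second may produce the same order.

```shell
#!/bin/sh
# shuffle_lines: hypothetical helper name. Reads files or stdin and
# prints the lines in random order using only POSIX awk features.
shuffle_lines() {
    awk '
        BEGIN { srand() }                 # seed from the time of day
        { line[NR] = $0 }                 # keep every line in memory
        END {
            # Fisher-Yates: walk backwards, swapping slot i with a
            # randomly chosen slot j in 1..i.
            for (i = NR; i > 1; i--) {
                j = int(rand() * i) + 1
                tmp = line[i]; line[i] = line[j]; line[j] = tmp
            }
            for (i = 1; i <= NR; i++) print line[i]
        }
    ' "$@"
}
```

Unlike sort -R, this treats duplicate lines like any others, so they are not forced next to each other.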

I tried using the Perl one-liner with the -i flag ("edit-in-place") to have it edit the file. The documentation suggests it should work, but it doesn't. It still prints the shuffled lines to stdout, but this time the original file's contents are lost, presumably because <> has already read to the end of the input by the time print runs. I suggest you don't use it.

Consider a shell script:

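#!/bin/bash
# bash rather than sh, because [[ ]] is not available in plain sh

if [[ $# -eq 0 ]]
then
    echo "Usage: $0 [file ...]" >&2
    exit 1
fi

for i in "$@"
do
    # Write the shuffled lines to a new file next to the original
    perl -MList::Util -e 'print List::Util::shuffle <>' "$i" > "$i.new"
    # Reading from stdin makes wc print the byte count without the file name
    if [[ $(wc -c < "$i") -eq $(wc -c < "$i.new") ]]
    then
        mv "$i.new" "$i"
    else
        echo "Error for file $i!" >&2
    fi
done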

Untested, but hopefully works.

How to randomly sort one key while the other is kept in its original sort order with GNU sort

You can do this with awk pretty easily.

As a one-liner:

awk -F: 'BEGIN{cmd="sort -R"} $1 != key {close(cmd)} {key=$1; print | cmd}' input.txt

Or, broken apart for easier explanation:

  • -F: - Set awk's field separator to colon.
  • BEGIN{cmd="sort -R"} - before we start, set a variable that is a command to do the "randomized sort". This one works for me on FreeBSD. Should work with GNU sort as well.
  • $1 != key {close(cmd)} - If the current line has a different first field than the last one processed, close the output pipe...
  • {key=$1; print | cmd} - And finally, set the "key" var, and print the current line, piping output through the command stored in the cmd variable.

This usage takes advantage of a bit of awk awesomeness. When you pipe through a string (be it stored in a variable or not), that pipe is automatically created upon use. You can close it any time, and a subsequent use will reopen a new command.

The impact of this is that each time you close(cmd), you print the current set of randomly sorted lines. And awk closes cmd automatically once you come to the end of the file.

Of course, for this solution to work, it's vital that all lines with a shared first field are grouped together.
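If the input is not already grouped, a stable sort on the first field can establish that grouping first. The sketch below is one way to do it, assuming a sort that supports both -s and -R (GNU and FreeBSD sort do; neither flag is in POSIX). The function name shuffle_within_groups is hypothetical.

```shell
#!/bin/sh
# shuffle_within_groups: hypothetical wrapper name. Input is
# colon-separated, read from the given files or from stdin.
shuffle_within_groups() {
    # -t: sets the separator, -k1,1 limits the key to the first field,
    # and -s keeps the sort stable. Afterwards all lines sharing a
    # first field are contiguous.
    sort -s -t: -k1,1 "$@" |
    # Same awk program as above: one "sort -R" pipe per key.
    awk -F: 'BEGIN{cmd="sort -R"} $1 != key {close(cmd)} {key=$1; print | cmd}'
}
```

For input lines b:1, a:1, b:2, a:2 the first column of the output is always a, a, b, b, while the order of the lines within each key varies from run to run.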

How to shuffle a list in vim?

You could go "UNIX style" and use the shuf command from the coreutils package:

:10,20!shuf<CR>

