How to Sort Very Large Files

Sort a file with a huge volume of data under a memory constraint

It looks like what you are looking for is external sorting.

Basically, you sort small chunks of the data first, write each sorted chunk back to disk, and then merge those sorted chunks to produce the fully sorted output.

How do I sort very large files

That isn't exactly a Java problem. You need to look into an efficient algorithm for sorting data that isn't completely read into memory. A few adaptations to Merge-Sort can achieve this.

Take a look at this:
http://en.wikipedia.org/wiki/Merge_sort

and:
http://en.wikipedia.org/wiki/External_sorting

Basically, the idea here is to break the file into smaller pieces, sort each piece (either with merge sort or another method), and then use the merge step of merge sort to combine the sorted pieces into the new, sorted file.
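To make the split, sort, and merge idea concrete, here is a minimal Python sketch. It assumes the file is line-oriented UTF-8 text and that plain string comparison is the ordering you want. The function name external_sort and the parameters lines_per_chunk and tmp_dir are made up for illustration; tune the chunk size to your memory budget.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(input_path, output_path, lines_per_chunk=100_000, tmp_dir=None):
    """Sort the lines of input_path into output_path using bounded memory."""
    run_paths = []
    try:
        # Phase 1: read a bounded number of lines, sort them in memory,
        # and write each sorted run to its own temporary file.
        with open(input_path, "r", encoding="utf-8") as src:
            while True:
                chunk = list(itertools.islice(src, lines_per_chunk))
                if not chunk:
                    break
                # Ensure every line ends with a newline so runs merge cleanly.
                chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".run", dir=tmp_dir, text=True)
                with os.fdopen(fd, "w", encoding="utf-8") as run:
                    run.writelines(chunk)
                run_paths.append(path)

        # Phase 2: merge the sorted runs. heapq.merge is lazy, so only one
        # pending line per run is held in memory; all runs are open at once.
        run_files = [open(p, "r", encoding="utf-8") for p in run_paths]
        try:
            with open(output_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*run_files))
        finally:
            for f in run_files:
                f.close()
    finally:
        # Remove the temporary run files whether or not the sort succeeded.
        for p in run_paths:
            os.remove(p)
```

Something like external_sort("big.log", "big.sorted.log") would be the typical call. Memory use is governed by lines_per_chunk, and you need roughly as much free disk space for the runs as the input occupies.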

How can I sort a very large log file, too large to load into main memory?

If you have GNU sort, use it. It knows how to deal with large files. For details, see the answers to "How to sort big files" on Unix SE. You will of course need sufficient free disk space.

How could the UNIX sort command sort a very large file?

The page "Algorithmic details of UNIX Sort command" says that UNIX sort uses an external R-way merge sort. The link goes into more detail, but in essence it divides the input into smaller portions that fit into memory, sorts each portion, and then merges the sorted portions together at the end.
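The "R-way" part means that at most R sorted runs are merged at a time, which is repeated in passes until a single file remains. The Python sketch below illustrates that general idea only; it is not the actual implementation inside UNIX sort. The names merge_runs and r_way_merge and the default r=16 are assumptions for the example, and each run is assumed to be an already-sorted text file whose lines all end with a newline.

```python
import heapq
import os
import tempfile


def merge_runs(run_paths, output_path):
    """Merge already-sorted text files into output_path."""
    files = [open(p, "r", encoding="utf-8") for p in run_paths]
    try:
        with open(output_path, "w", encoding="utf-8") as dst:
            dst.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()


def r_way_merge(run_paths, output_path, r=16):
    """Merge sorted runs, never opening more than r of them at once.

    The input run files are deleted as they are consumed.
    """
    if r < 2:
        raise ValueError("r must be at least 2")
    runs = list(run_paths)
    # Each pass merges groups of up to r runs into one longer run,
    # shrinking the number of runs by about a factor of r.
    while len(runs) > r:
        next_runs = []
        for i in range(0, len(runs), r):
            group = runs[i:i + r]
            fd, merged = tempfile.mkstemp(suffix=".run")
            os.close(fd)
            merge_runs(group, merged)
            for p in group:
                os.remove(p)
            next_runs.append(merged)
        runs = next_runs
    # Final pass: at most r runs remain, so merge them into the output.
    merge_runs(runs, output_path)
    for p in runs:
        os.remove(p)
```

A larger r means fewer passes over the data but more files open at once and more buffers in memory, so the right value depends on the system.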

Fastest way to sort huge (50-100 GB) files when you have enough memory

Use parallel sorting algorithms for huge data.

Useful topic:
Which parallel sorting algorithm has the best average case performance?
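As one simple illustration of the parallel approach (not necessarily the fastest algorithm), the Python sketch below sorts slices of an in-memory list in separate processes and then merges the results. It assumes the data fits in memory, as in the question. The name parallel_sort and the default workers=4 are made up for the example.

```python
import heapq
from concurrent.futures import ProcessPoolExecutor


def parallel_sort(items, workers=4):
    """Sort a list by sorting slices in worker processes, then merging."""
    items = list(items)
    if len(items) < 2 or workers < 2:
        return sorted(items)
    # Split into roughly equal contiguous slices, one per worker.
    size = -(-len(items) // workers)  # ceiling division
    slices = [items[i:i + size] for i in range(0, len(items), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # The built-in sorted() can be pickled, so it can run in workers.
        sorted_slices = list(pool.map(sorted, slices))
    # k-way merge of the sorted slices into one sorted list.
    return list(heapq.merge(*sorted_slices))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing
    # the main module.
    import random

    data = [random.randrange(1_000_000) for _ in range(200_000)]
    assert parallel_sort(data) == sorted(data)
```

Copying the slices to and from the worker processes has a cost of its own, so measure on your data before assuming this beats a single-process sort.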

How to sort a large file on two levels efficiently?

The UNIX sort utility can handle sorting large data (e.g. larger than your 16 GB of working RAM) by creating temporary working files on disk.

So, I'd recommend simply using UNIX sort for this as you've suggested, invoking the option -T tmp_dir, and making sure that tmp_dir has enough disk space to hold all of the temporary working files that will be created there.
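If you are driving the sort from a program, the invocation with -T can be wrapped as in the Python sketch below. It assumes a sort binary that accepts -T and -o (GNU and BSD sort both do) is on the PATH. The function names and the paths in the usage note are placeholders, and setting LC_ALL=C is an extra assumption that requests plain byte-order comparison.

```python
import os
import subprocess


def build_sort_command(input_path, output_path, tmp_dir):
    """Return the argument list for sorting input_path into output_path."""
    # -T: directory for sort's temporary working files
    # -o: write the result to this file instead of standard output
    return ["sort", "-T", tmp_dir, "-o", output_path, input_path]


def sort_with_unix_sort(input_path, output_path, tmp_dir):
    """Run the system sort utility, keeping its temporary files in tmp_dir."""
    os.makedirs(tmp_dir, exist_ok=True)
    # LC_ALL=C is an assumption: it asks for byte-order comparison.
    # Drop it to use the collation rules of the current locale.
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(
        build_sort_command(input_path, output_path, tmp_dir),
        check=True,  # raise CalledProcessError if sort exits non-zero
        env=env,
    )
```

For example, sort_with_unix_sort("big.log", "big.sorted.log", "/mnt/scratch/sort_tmp") would keep the temporary files on whichever volume holds /mnt/scratch (a hypothetical path here).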

By the way, this is discussed in a previous SO question.


