How to Sort a 10GB File

Sorting 10 GB of data in 1 GB of memory. How do I do it?

Split the file into parts (buffers) that are each small enough to sort in place in memory, and write each sorted part back to disk.

Then, once all the buffers are sorted, take two (or more) at a time and merge them, as in merge sort, until only one buffer remains. That buffer is the sorted file.
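The two steps above can be sketched roughly as follows. This is only an illustration under some assumptions that are not in the answer: the file is line-oriented text, lines are compared as plain strings, and the names `write_sorted_runs`, `merge_two`, `external_sort` and the `lines_per_run` parameter are hypothetical.

```python
import os
import shutil
import tempfile
from itertools import islice


def write_sorted_runs(src_path, lines_per_run, tmp_dir):
    """Split src_path into sorted run files of at most lines_per_run lines."""
    runs = []
    with open(src_path) as src:
        while True:
            chunk = list(islice(src, lines_per_run))
            if not chunk:
                break
            # Make sure every line ends with a newline so runs merge cleanly.
            chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
            chunk.sort()  # in-memory sort of one buffer
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
            with os.fdopen(fd, "w") as out:
                out.writelines(chunk)
            runs.append(path)
    return runs


def merge_two(path_a, path_b, tmp_dir):
    """Merge two sorted runs into a new run, holding one line of each in memory."""
    fd, merged = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with open(path_a) as a, open(path_b) as b, os.fdopen(fd, "w") as out:
        line_a, line_b = a.readline(), b.readline()
        while line_a and line_b:
            if line_a <= line_b:
                out.write(line_a)
                line_a = a.readline()
            else:
                out.write(line_b)
                line_b = b.readline()
        # One input is exhausted; copy whatever is left of the other.
        while line_a:
            out.write(line_a)
            line_a = a.readline()
        while line_b:
            out.write(line_b)
            line_b = b.readline()
    os.remove(path_a)
    os.remove(path_b)
    return merged


def external_sort(src_path, dst_path, lines_per_run=100_000):
    """Sort a text file line by line using bounded memory."""
    work_dir = os.path.dirname(os.path.abspath(dst_path))
    with tempfile.TemporaryDirectory(dir=work_dir) as tmp_dir:
        runs = write_sorted_runs(src_path, lines_per_run, tmp_dir)
        while len(runs) > 1:
            next_runs = []
            for i in range(0, len(runs) - 1, 2):
                next_runs.append(merge_two(runs[i], runs[i + 1], tmp_dir))
            if len(runs) % 2:
                next_runs.append(runs[-1])  # odd run waits for the next pass
            runs = next_runs
        if runs:
            shutil.move(runs[0], dst_path)
        else:
            open(dst_path, "w").close()  # empty input gives empty output
```

Calling `external_sort("big.txt", "big.sorted.txt")` keeps at most `lines_per_run` lines in memory during the split phase and only two lines during each merge. The file names here are placeholders.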

How can I sort a very large log file, too large to load into main memory?

If you have GNU sort, use it. It knows how to deal with large files. For details, see the answers to "How to sort big files" on Unix SE. You will of course need sufficient free disk space for its temporary files.
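If you want to drive GNU sort from a script, a minimal sketch might look like this. The wrapper names `gnu_sort_command` and `run_gnu_sort` and the default values are hypothetical; `--buffer-size`, `--temporary-directory` and `--output` are GNU sort options, and the sketch assumes a GNU `sort` binary is on the `PATH`.

```python
import os
import subprocess


def gnu_sort_command(src_path, dst_path, buffer_size="1G", tmp_dir=None):
    """Build the argument list for a GNU sort invocation."""
    cmd = ["sort", "--buffer-size=" + buffer_size, "--output=" + dst_path]
    if tmp_dir is not None:
        # Directory where sort spills its intermediate files.
        cmd.append("--temporary-directory=" + tmp_dir)
    cmd += ["--", src_path]  # "--" guards against file names starting with "-"
    return cmd


def run_gnu_sort(src_path, dst_path, **options):
    """Run GNU sort; LC_ALL=C selects plain byte-order comparison."""
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(gnu_sort_command(src_path, dst_path, **options),
                   check=True, env=env)
```

The temporary directory should sit on a volume with enough free space to hold the intermediate files.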

Sort a file with huge volume of data given memory constraint

It looks like what you are looking for is external sorting.

Basically, you sort small chunks of the data first, write each sorted chunk back to disk, and then merge the sorted chunks to produce the fully sorted output.

How do I sort very large files

That isn't exactly a Java problem. You need to look into an efficient algorithm for sorting data that isn't completely read into memory. A few adaptations to Merge-Sort can achieve this.

Take a look at this:
http://en.wikipedia.org/wiki/Merge_sort

and:
http://en.wikipedia.org/wiki/External_sorting

Basically, the idea here is to break the file into smaller pieces, sort them (either with merge sort or another method), and then use the merge step from merge sort to create the new, sorted file.
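The question is about Java, but the merge step is easy to show in a few lines of Python. The sketch below merges any number of already sorted piece files in a single pass using the standard library's `heapq.merge`, which reads lazily and keeps only one line per input in memory. The function name `merge_sorted_files` is hypothetical, and it assumes every line in every piece ends with a newline.

```python
import heapq
from contextlib import ExitStack


def merge_sorted_files(piece_paths, dst_path):
    """Merge already sorted text files into one sorted file."""
    with ExitStack() as stack:
        # Open every piece; ExitStack closes them all on exit.
        pieces = [stack.enter_context(open(p)) for p in piece_paths]
        out = stack.enter_context(open(dst_path, "w"))
        # heapq.merge pulls the smallest pending line from the inputs.
        out.writelines(heapq.merge(*pieces))
```

With a very large number of pieces you may hit the operating system's limit on open files; in that case the pieces can be merged in several passes.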

Sort the contents of a large file with low RAM

When all the items to be sorted fit in memory, we call it internal sorting. When the items are too big to fit in memory, we call it external sorting.

The Art of Computer Programming, Vol. 3: Sorting and Searching, discusses algorithms for external sorting in detail on page 248 (merge sort is one of them).

You also mention that the file contains 50GB of numbers. Maybe there are a lot of duplicated numbers. If so, you might as well use counting sort.
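A counting approach could look roughly like the sketch below. It assumes, beyond what the question states, that the file holds one integer per line and that the number of distinct values is small enough for a table of counts to fit in memory. The name `counting_sort_file` is hypothetical.

```python
from collections import Counter
from itertools import repeat


def counting_sort_file(src_path, dst_path):
    """Sort a file of integers by counting how often each value occurs."""
    counts = Counter()
    with open(src_path) as src:
        for line in src:
            line = line.strip()
            if line:  # skip blank lines
                counts[int(line)] += 1
    with open(dst_path, "w") as out:
        # Only the distinct values are sorted, not the whole file.
        for value in sorted(counts):
            out.writelines(repeat(f"{value}\n", counts[value]))
```

Memory use grows with the number of distinct values rather than with the file size, so this only pays off when duplicates really are common.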


