How to Sort a File In-Place

How to sort a file in-place?

You can use the -o, --output=FILE option of sort to indicate the same input and output file:

sort -o file file

Without repeating the filename (bash brace expansion turns file{,} into file file):

sort -o file{,}

⚠️ Important note: a common mistake is to redirect the output to the input file itself (e.g. sort file > file). This does not work because the shell, not the sort(1) program, performs the redirection: it truncates the file before sort(1) starts, so the input is already empty by the time sort(1) gets a chance to read it.
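
For comparison, here is a minimal Python sketch of the safe ordering: read all of the input first, write the sorted result somewhere else, and only then swap it into place. The function name sort_file_in_place is hypothetical. The sketch assumes the file fits in memory and that plain code-point comparison (roughly what LC_ALL=C sort gives) is acceptable.

```python
import os
import tempfile


def sort_file_in_place(path):
    """Sort the lines of `path`, replacing the file only once the output is complete."""
    with open(path) as src:
        lines = src.readlines()  # read everything BEFORE any output file is opened

    # Give the last line a newline so lines do not fuse together after reordering.
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"

    lines.sort()  # plain string comparison, not locale-aware collation

    # Write to a temporary file in the same directory, then rename it over the original.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as dst:
            dst.writelines(lines)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temporary file behind
        raise
```

Note that mkstemp creates the temporary file with restrictive permissions, so the replaced file may not keep the original's mode; copy it over with shutil.copymode if that matters.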

How to sort content of a file in-place by a column?

It's easy if you give up on sorting in place:

sort -k 1 original > by_col_1
sort -k 2 original > by_col_2
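
If the sorting is done in Python instead, the column choice becomes a key function. A minimal sketch, assuming whitespace-separated columns and 1-based column numbers as in sort -k; sort_lines_by_column is a hypothetical helper. Unlike sort -k 2, which compares from field 2 to the end of the line, it compares only the one field, which is closer to sort -k2,2.

```python
def sort_lines_by_column(lines, column):
    """Return `lines` ordered by the given 1-based whitespace-separated column."""
    def key(line):
        fields = line.split()
        # Lines with too few columns get an empty key and therefore sort first.
        return fields[column - 1] if len(fields) >= column else ""
    return sorted(lines, key=key)  # stable: ties keep their input order


rows = ["b 3 x", "a 2 z", "c 1 y"]
by_col_1 = sort_lines_by_column(rows, 1)
by_col_2 = sort_lines_by_column(rows, 2)
```

Because the result is a new list, the original lines stay untouched until you decide where to write them.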

Sorting a file in-place in Python

The problem is that the second split call uses [] instead of (), and there is also an issue in your sort_list_key method. Here is a simpler working example that uses a lambda as the sort key.

file_content = ['0.wav, stop',
                '1.wav, no',
                '10.wav, up',
                '100.wav, yes',
                '1000.wav, bed',
                '1001.wav, four',
                '1002.wav, three',
                '1003.wav, five',
                '1004.wav, nine',
                '1005.wav, go']

file_content.sort(key=lambda f: f.split(',')[0].split('.')[0], reverse=False)
print(file_content)
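
One caveat worth illustrating: the key in that example is a string, so names are compared character by character and '10' sorts before '2'. The sample list happens to come out the same either way. A small sketch with different, made-up file names shows what changes when the stem is converted with int(); numeric_stem is a hypothetical helper and assumes every name starts with digits before the first dot.

```python
def numeric_stem(entry):
    """Return the leading number of an entry like '10.wav, up' as an int."""
    name = entry.split(",")[0]      # '10.wav'
    return int(name.split(".")[0])  # 10


entries = ["10.wav, up", "2.wav, no", "1.wav, stop"]

as_text = sorted(entries, key=lambda e: e.split(",")[0].split(".")[0])
as_numbers = sorted(entries, key=numeric_stem)

print(as_text)     # ['1.wav, stop', '10.wav, up', '2.wav, no']
print(as_numbers)  # ['1.wav, stop', '2.wav, no', '10.wav, up']
```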

Sort a text file by a certain field in place in shell/Python

From the bash command line, try the sort(1) command:

$ sort -k2,2 -n -o log.dat log.dat

How to sort an output file based on second parameter?

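with open(filepath) as file:
    r = file.readlines()

# Keep the first and last lines where they are; sort the lines in between
# by the integer value of the second ":"-separated field (the binary numbers)
body = sorted(r[1:-1], key=lambda line: int(line.split(":")[1]))
s = [r[0]] + body + [r[-1]]

# Write 's' into a file (here, back to the same path)
with open(filepath, "w") as file:
    file.writelines(s)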

Sorting a file in place with Python on a Unix system

If you don't want to create temporary files, you can use subprocess as in:

import sys
import subprocess

fname = sys.argv[1]
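# Run sort(1) and collect all of its output before reopening the file for writing
proc = subprocess.Popen(['sort', fname], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
# communicate() returns bytes here, so the file is written in binary mode
with open(fname, 'wb') as f:
    f.write(stdout)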

How to sort a file based on key name instead of its position in unix?

sort doesn't have a concept of named keys, but you can perform a Schwartzian transform to temporarily add the key as a prefix to the line, sort on the first field, then discard it.

sed 's/\(.*\)\(party_id="[^"]*"\)/\2    \1\2/' file |
sort -t ' ' -k1,1 |
cut -f2-

(The whitespace between the first two back-references, and in the sort -t argument, is a literal tab, although Stack Overflow renders it as a run of spaces.)
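
The same decorate, sort, undecorate idea can be sketched in Python, where the key is pulled out with a regular expression instead of being glued onto the line. This assumes each line carries an attribute of the form party_id="..." as in the sed command. sort_by_named_key is a hypothetical name, the first match on a line is used, and lines without the attribute get an empty key so they sort first.

```python
import re


def sort_by_named_key(lines, name):
    """Sort lines by the value of a name="value" attribute, wherever it appears."""
    pattern = re.compile(re.escape(name) + r'="([^"]*)"')

    def extract(line):
        match = pattern.search(line)
        return match.group(1) if match else ""

    # Decorate each line with (key, original position), sort, then strip the decoration.
    decorated = [(extract(line), index, line) for index, line in enumerate(lines)]
    decorated.sort()  # compares the key first, then the original position
    return [line for _key, _index, line in decorated]


records = [
    'name="x" party_id="30"',
    'party_id="10" name="y"',
    'name="z" party_id="20" extra="1"',
]
print(sort_by_named_key(records, "party_id"))
```

The values are compared as text, as in the shell pipeline; wrap the extracted value in int() if the keys are numeric and of varying width.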

C++ sort index file in-place (with heapsort)

Since the first few elements of the array are accessed the most, I decided to load the first elements into RAM until I reach the limit (which is passed as a parameter). To achieve that, I modified my code like this:

// ...
size_t arraySize = 0;
IndexEntry* cacheArray;

void readIntoArray( size_t numElements ) {
    if ( arraySize != 0 )
        writeFromArray();

    arraySize = numElements;
    cacheArray = new IndexEntry[arraySize];
    file->seekg( 0 );

    for ( size_t i = 0; i < arraySize; i++ ) {
        file->read( (char*)(cacheArray + i), writeSize );
    }
}

void writeFromArray() {
    file->seekp( 0 );

    for ( size_t i = 0; i < arraySize; i++ ) {
        file->write( (char*)(cacheArray + i), writeSize );
    }

    arraySize = 0;
    delete[] cacheArray;
}

void sortIDX( string idxFile, size_t cacheSize, bool quiet ) {
    // ...

    cacheSize /= writeSize;
    readIntoArray( min(cacheSize, numDataSets) );

    sorterThread = new thread( heapifyIDX, heapifyLimit );

    // ...

    sorterThread->join();
    delete sorterThread;

    writeFromArray();

    file->close();
    delete file;
}

void readData( IndexEntry* entry, size_t pos ) {
    if ( pos < arraySize ) {
        *entry = cacheArray[pos];
    } else {
        file->seekg( pos * writeSize );
        file->read( (char*)entry, writeSize );
    }
}

void writeData( IndexEntry* entry, size_t pos ) {
    if ( pos < arraySize ) {
        cacheArray[pos] = *entry;
    } else {
        file->seekp( pos * writeSize );
        file->write( (char*)entry, writeSize );
    }
}
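
The caching idea can also be sketched compactly in Python. This is not the author's code but an illustration under stated assumptions: each record is a single 8-byte little-endian unsigned integer (the real IndexEntry layout is not shown), the first cache_records records are held in a list, and everything else is read and written through seek. The names CachedRecords, sift_down and heapsort_file are hypothetical.

```python
import struct

RECORD = struct.Struct("<Q")  # assumed record layout: one 8-byte unsigned integer


class CachedRecords:
    """Fixed-size records in a binary file, with the first `cache_records` held in RAM."""

    def __init__(self, file, cache_records):
        self.file = file
        file.seek(0, 2)  # seek to the end to learn the file size
        self.count = file.tell() // RECORD.size
        file.seek(0)
        cached = min(cache_records, self.count)
        self.cache = [RECORD.unpack(file.read(RECORD.size))[0] for _ in range(cached)]

    def read(self, pos):
        if pos < len(self.cache):
            return self.cache[pos]
        self.file.seek(pos * RECORD.size)
        return RECORD.unpack(self.file.read(RECORD.size))[0]

    def write(self, pos, value):
        if pos < len(self.cache):
            self.cache[pos] = value
        else:
            self.file.seek(pos * RECORD.size)
            self.file.write(RECORD.pack(value))

    def flush(self):
        """Write the cached prefix back to the start of the file."""
        self.file.seek(0)
        for value in self.cache:
            self.file.write(RECORD.pack(value))
        self.file.flush()


def sift_down(records, root, end):
    """Restore the max-heap property for the subtree at `root`, within [0, end)."""
    value = records.read(root)
    while True:
        child = 2 * root + 1
        if child >= end:
            break
        child_value = records.read(child)
        if child + 1 < end:
            right_value = records.read(child + 1)
            if right_value > child_value:
                child, child_value = child + 1, right_value
        if child_value <= value:
            break
        records.write(root, child_value)
        root = child
    records.write(root, value)


def heapsort_file(file, cache_records):
    """Sort the records of a file opened in 'r+b' mode in ascending order, in place."""
    records = CachedRecords(file, cache_records)
    n = records.count
    for root in range(n // 2 - 1, -1, -1):  # build the max-heap
        sift_down(records, root, n)
    for end in range(n - 1, 0, -1):  # move the current maximum to the end
        largest, last = records.read(0), records.read(end)
        records.write(0, last)
        records.write(end, largest)
        sift_down(records, 0, end)
    records.flush()
```

Every sift starts at or near the root, so the low positions are touched far more often than the tail, which is why caching only a prefix of the file pays off.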

