How to Remove X Bytes from the End of a Large File Without Reading the Whole File

How to remove X bytes from the end of a large file without reading the whole file?

Use the truncate() system call, or ftruncate() if the file is already open:

http://linux.die.net/man/2/truncate

#include <unistd.h>

int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length);

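truncate takes a file name.

ftruncate takes an open file descriptor, which must be open for writing.

Both set the file's length to exactly length bytes, so the file is either truncated or extended; when it is extended, the new part reads as null bytes ('\0'). To remove X bytes from the end, pass the current size minus X as length. Neither call reads the file's contents.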
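As a minimal sketch of the same idea in Python, os.truncate() wraps the truncate call described above. The helper name remove_tail and its parameters are made up for this illustration:

```python
import os

def remove_tail(path, num_bytes):
    """Remove num_bytes from the end of the file at path; return the new size."""
    if num_bytes < 0:
        raise ValueError("num_bytes must not be negative")
    size = os.path.getsize(path)
    new_size = max(size - num_bytes, 0)  # never pass a negative length
    os.truncate(path, new_size)          # no file data is read or rewritten
    return new_size
```

Calling remove_tail("/tmp/foo", 10) would drop the last 10 bytes of /tmp/foo. If the file is shorter than num_bytes, this sketch simply empties it.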

[edit]

The truncate shell command (part of GNU coreutils on Linux) also works.

**SYNTAX**

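truncate -s SIZE <filename>

**OPTIONS**

-s SIZE sets the new file length. If the new length is smaller than the current file length, the data beyond it is lost. If the new length is greater, the file is padded with zero bytes. SIZE may be prefixed with - to shrink the file by that amount relative to its current size (truncate -s -10 /tmp/foo removes the last 10 bytes) or with + to grow it by that amount. A unit suffix makes large sizes easier to write; the suffixes below are those accepted by GNU coreutils, and other implementations may differ:
K or KiB: size is in units of 1024 bytes.
KB: size is in units of 1000 bytes.
M or MiB: size is in units of 1024^2 bytes.
MB: size is in units of 10^6 bytes.
G or GiB: size is in units of 1024^3 bytes.
GB: size is in units of 10^9 bytes.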


**EXAMPLES**

To shrink a file to 10 bytes:

truncate -s 10 /tmp/foo

To set a file's size to 345 MiB (345 * 1024^2 bytes), enlarging or shrinking it as needed:

truncate -s 345M /tmp/foo

[/edit]

fstream delete N bytes from the end of a binary file

I'm not aware of a generic C++ (platform independent) way to do this without writing a new file. However, on POSIX systems (Linux, etc.) you can use the ftruncate() function. On Windows, you can use SetEndOfFile().

This also means you'll need to open the file using the native functions instead of fstream since you need the native descriptor/handle for those functions.

EDIT: If you are able to use the Boost library, it has a resize_file() function in its Filesystem library which would do what you want. Since C++17 the standard library provides the same facility as std::filesystem::resize_file().

Node.js v0.10: Replace certain bytes in file without reading whole file

As you've discovered, fs.write with r+ mode allows you to overwrite bytes. This suffices for the case where the added and deleted pieces are exactly the same length.

When the added text is shorter than the deleted text, I advise that you not fill in with \x00 bytes, as you suggest in one of your edits. Those are ordinary data bytes rather than ignorable padding, so whatever reads the file will treat them as content (in source code, they will usually cause the compiler/interpreter to throw an error).

The short answer is that this is not generally possible. This is not an abstraction issue; at the file system level, files are stored in chunks of contiguous bytes. There is no generic way to insert/remove from the middle of a file.

The correct way to do this is to seek to the first byte you need to change, and then write the rest of the file (unless you get to a point at which you've added/deleted the same number of bytes, in which case you can stop writing).
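The seek-and-rewrite approach can be sketched as follows. This is an illustration in Python rather than Node.js, the name splice_in_place and its parameters are hypothetical, and it assumes the tail of the file fits in memory:

```python
def splice_in_place(path, offset, remove_len, new_bytes):
    """Replace remove_len bytes at offset with new_bytes, rewriting only the tail."""
    with open(path, "r+b") as f:
        if len(new_bytes) == remove_len:
            # Same length: a plain overwrite is enough, nothing after it moves.
            f.seek(offset)
            f.write(new_bytes)
            return
        # Different length: everything after the removed range has to move.
        f.seek(offset + remove_len)
        tail = f.read()   # assumption: the tail fits in memory
        f.seek(offset)
        f.write(new_bytes)
        f.write(tail)
        f.truncate()      # cut off leftover bytes if the file got shorter
```

Bytes before offset are never touched, so the cost depends on how far the edit is from the end of the file, not on the total file size.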

To avoid leaving a half-written file behind if the process crashes during a long write, it is common to write to a temporary file and then mv (rename) the temporary file over the file you wish to save.
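A rough Python sketch of the temporary-file approach, here used to drop a byte range from the middle of a file. The function name remove_range_via_temp is invented for this example, and the sketch assumes the temporary file can be created in the same directory as the target:

```python
import os
import shutil
import tempfile

def remove_range_via_temp(path, offset, length, chunk_size=1 << 20):
    """Write a copy of path without bytes [offset, offset + length), then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            remaining = offset
            while remaining > 0:              # copy the part before the range
                chunk = src.read(min(chunk_size, remaining))
                if not chunk:
                    break
                dst.write(chunk)
                remaining -= len(chunk)
            src.seek(offset + length)         # skip the range being removed
            shutil.copyfileobj(src, dst, chunk_size)
            dst.flush()
            os.fsync(dst.fileno())            # make sure the data is on disk
        os.replace(tmp_path, path)            # rename over the original
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)               # do not leave a stray temp file
        raise
```

If the process dies before os.replace(), the original file is still intact. Note that the new file gets the temporary file's permissions and ownership, so a real implementation may need to copy those over as well.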

Add/Remove bytes from end of file on Windows

You truncate a file by calling SetFilePointer or SetFilePointerEx to the desired location followed by SetEndOfFile. The following shows how a truncate function can be implemented:

#include <windows.h>
#include <string>   // std::wstring, used by the file-name helper

bool truncate( HANDLE hFile, LARGE_INTEGER NewSize ) {
    LARGE_INTEGER Size = { 0 };
    if ( GetFileSizeEx( hFile, &Size ) ) {
        LARGE_INTEGER Distance = { 0 };
        // Negative values move the pointer backward in the file
        Distance.QuadPart = NewSize.QuadPart - Size.QuadPart;
        return ( SetFilePointerEx( hFile, Distance, NULL, FILE_END ) &&
                 SetEndOfFile( hFile ) );
    }
    return false;
}

// Helper function taking a file name instead of a HANDLE
bool truncate( const std::wstring& PathName, LARGE_INTEGER NewSize ) {
    HANDLE hFile = CreateFileW( PathName.c_str(), GENERIC_WRITE, FILE_SHARE_READ,
                                NULL, OPEN_EXISTING,
                                FILE_ATTRIBUTE_NORMAL, NULL );
    if ( hFile == INVALID_HANDLE_VALUE ) {
        return false;
    }
    bool Success = truncate( hFile, NewSize );
    CloseHandle( hFile );
    return Success;
}

EDIT: Shorter Version
The truncate function can be shortened to the following:

bool truncate( HANDLE hFile, LARGE_INTEGER NewSize ) {
    return ( SetFilePointerEx( hFile, NewSize, NULL, FILE_BEGIN ) &&
             SetEndOfFile( hFile ) );
}

If you would rather pass the number of bytes by which to shrink the file, the following implementation can be used instead:

bool truncate( HANDLE hFile, LARGE_INTEGER ShrinkBy ) {
    ShrinkBy.QuadPart = -ShrinkBy.QuadPart;
    return ( SetFilePointerEx( hFile, ShrinkBy, NULL, FILE_END ) &&
             SetEndOfFile( hFile ) );
}

To grow a file, open it using CreateFile with a dwDesiredAccess that contains FILE_APPEND_DATA. After using SetFilePointer again to move the file pointer to the end of the file, you can write new data by calling WriteFile. For an example, see Appending One File to Another File.


EDIT: Growing a file without writing to it

If you don't care about the file contents beyond the original file size you can apply the same sequence as shown for truncating a file to extend it:

bool SetFileSize( HANDLE hFile, LARGE_INTEGER NewSize ) {
    return ( SetFilePointerEx( hFile, NewSize, NULL, FILE_BEGIN ) &&
             SetEndOfFile( hFile ) );
}

This is documented behavior for SetEndOfFile:

The SetEndOfFile function can be used to truncate or extend a file. If the file is extended, the contents of the file between the old end of the file and the new end of the file are not defined.

Remove number of bytes from beginning of file


with open('filename.ext', 'rb') as f:
    f.seek(255)      # skip the first 255 bytes
    rest = f.read()  # read the rest; the file on disk is unchanged
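The snippet above only reads past the first bytes; it does not change the file. One way to actually drop them on disk is to shift the remaining data forward in chunks and then truncate. The following is a sketch under that assumption, with a made-up helper name remove_head and a chunk size chosen arbitrarily:

```python
def remove_head(path, num_bytes, chunk_size=1 << 20):
    """Remove the first num_bytes of the file by shifting the rest forward in place."""
    with open(path, "r+b") as f:
        read_pos, write_pos = num_bytes, 0
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            if not chunk:
                break                # reached the end of the file
            f.seek(write_pos)
            f.write(chunk)           # write_pos always trails read_pos
            read_pos += len(chunk)
            write_pos += len(chunk)
        f.truncate(write_pos)        # drop the now-duplicated bytes at the end
```

Memory use is bounded by chunk_size, but the whole remainder of the file is still rewritten, and a crash partway through leaves the file in a mixed state.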

Is there an efficient way to crop a file (remove x number of bytes from tail)?

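Use RandomAccessFile.setLength(long newLength), with newLength set to the file's current length() minus x. When newLength is smaller than the current length, the file is truncated to that size.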

Remove a single row from a csv without copying files

You have a fundamental problem here. No current filesystem (that I am aware of) provides a facility to remove a bunch of bytes from the middle of a file. You can overwrite existing bytes, or write a new file. So, your options are:

  • Create a copy of the file without the offending line, delete the old one, and rename the new file in place. (This is the option you want to avoid).
  • Overwrite the bytes of the line with something that will be ignored. Depending on exactly what is going to read the file, a comment character might work, or spaces might work (or possibly even \0). If you want to be completely generic though, this is not an option with CSV files, because there is no defined comment character.
  • As a last desperate measure, you could:

    • read up to the line you want to remove
    • read the rest of the file into memory
    • overwrite the line and all subsequent lines with the data you want to keep
    • truncate the file at the final position (filesystems usually allow this).

The last option obviously doesn't help much if you are trying to remove the first line (but it is handy if you want to remove a line near the end). It is also horribly vulnerable to crashing in the middle of the process.
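The steps of that last option can be sketched in Python as follows. The function name remove_line_in_place is hypothetical, and the sketch assumes that every CSV row occupies exactly one physical line (no quoted fields with embedded newlines) and that the rest of the file fits in memory:

```python
def remove_line_in_place(path, line_index):
    """Remove the line with the given 0-based index; return True if it existed."""
    with open(path, "r+b") as f:
        # Read up to the line to remove, remembering where it starts.
        for _ in range(line_index):
            if not f.readline():
                return False   # the file has fewer lines than line_index
        start = f.tell()
        if not f.readline():   # this is the line being dropped
            return False
        rest = f.read()        # rest of the file, held in memory
        f.seek(start)
        f.write(rest)          # overwrite the line and everything after it
        f.truncate()           # cut the file at the final position
        return True
```

As noted above, this is cheap for lines near the end of the file and expensive for lines near the start, and it offers no protection against a crash between the write and the truncate.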


