Linux Set End of File (Shrink, Truncate, Cut Out Some Data @ End)

ftruncate(fd, 10);

(The lseek call isn't needed.)

man 2 ftruncate

How to remove X bytes from the end of a large file without reading the whole file?

Use the truncate function:

http://linux.die.net/man/2/truncate

int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length);

truncate takes the file name.

ftruncate takes an open file descriptor.

Both set the file length to length, so the file is either truncated or extended. When it is extended, the added part reads back as NUL (zero) bytes.
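Python exposes the same two calls as os.truncate and os.ftruncate, so the behaviour described above can be tried out with a short sketch. The helper names remove_tail_bytes and set_length_via_fd are made up for this illustration, and the zero fill on extension is the Linux behaviour described above.

```python
import os


def remove_tail_bytes(path, count):
    """Drop the last `count` bytes of `path` without reading its contents."""
    size = os.path.getsize(path)
    # Clamp at zero so that asking for more bytes than exist empties the file.
    new_size = max(size - count, 0)
    os.truncate(path, new_size)
    return new_size


def set_length_via_fd(path, length):
    """Set the file length through an open descriptor, as ftruncate(2) does."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.ftruncate(fd, length)
    finally:
        os.close(fd)
```

Only the file's metadata changes, so the cost does not depend on how large the file is.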

[edit]
The truncate shell command on Linux will also work.

**SYNTAX**

truncate -s integer <filename>

**OPTIONS**

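-s number specifies the new file length. If the new length is smaller than the current file length, data is lost. If the new length is greater, the file is padded with zero bytes. The suffixes listed here are those of the man page being quoted; they differ between implementations (GNU coreutils truncate, for instance, uses K for 1024 and KB for 1000), so check man truncate on your system. You can specify a magnitude character to ease large numbers: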
b or B size is bytes.
k size is 1000 bytes.
K size is 1024 bytes.
m size is 10^6 bytes.
M size is 1024^2 bytes.
g size is 10^9 bytes.
G size is 1024^3 bytes.

**EXAMPLES**

To shrink a file to 10 bytes:

truncate -s 10 /tmp/foo

To enlarge or shrink a file to 345 Megabytes:

truncate -s 345M /tmp/foo

[/edit]

How do I limit (or truncate) text file by number of lines?

In-place truncation

To truncate the file in-place with sed, you can do the following:

sed -i '50001,$ d' filename
  • -i means in place.
  • d means delete.
  • 50001,$ means the lines from 50001 to the end.

You can make a backup of the file by adding an extension argument to -i, for example, .backup or .bak:

sed -i.backup '50001,$ d' filename

On macOS or FreeBSD you must provide an argument to -i, so to do this without making a backup, pass an empty string:

sed -i '' '50001,$ d' filename

The long argument name version is as follows, with and without the backup argument:

sed --in-place '50001,$ d' filename
sed --in-place=.backup '50001,$ d' filename

New File

To create a new truncated file, just redirect from head to the new file:

head -n50000 oldfilename > newfilename
  • -n50000 means the number of lines, head otherwise defaults to 10.
  • > means to redirect into, overwriting anything else that might be there.
  • Substitute >> for > if you mean to append into the new file.

Unfortunately, you cannot redirect into the same file: the shell truncates the output file before head gets to read it, so you would end up with an empty file. That is why sed is recommended for in-place truncation.
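For comparison, the head approach of writing the first lines to a new file can be sketched in Python as well. The function name copy_first_lines is hypothetical, and, as with the shell redirect, the source and destination are assumed to be different files.

```python
from itertools import islice


def copy_first_lines(src, dst, lines):
    """Write the first `lines` lines of `src` to `dst`, like `head -n`."""
    # Binary mode copies the bytes untouched, whatever the text encoding.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # islice stops after `lines` lines, so the rest is never read.
        fout.writelines(islice(fin, lines))
```

Opening the destination with "ab" instead of "wb" would correspond to >> rather than >.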

No sed? Try Python!

This takes a bit more typing than sed. Sed is short for "Stream Editor", after all, which is another reason to use it: this is the job the tool is designed for.

This was tested on Linux and Windows with Python 3:

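from collections import deque
from itertools import islice

def truncate(filename, lines):
    # 'r+' opens for reading and writing without clearing the file
    with open(filename, 'r+') as f:
        # extend() on a zero-length deque consumes an iterator and discards the items
        blackhole = deque((), 0).extend
        # readline (unlike iterating over f) keeps f.tell() usable
        file_iterator = iter(f.readline, '')
        # skip past the first `lines` lines
        blackhole(islice(file_iterator, lines))
        # cut the file off at the current position
        f.truncate(f.tell())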

To explain the Python:

The blackhole works like /dev/null. It's a bound extend method on a deque with maxlen=0, which is the fastest way to exhaust an iterator in Python (that I'm aware of).

We can't simply loop over the file object, because iterating over a text file disables its tell method (Python 3 raises an OSError), so we need the iter(f.readline, '') trick.

The function uses a context manager, which is arguably superfluous here since CPython would close the file when the function returns anyway. Usage is simply:

>>> truncate('filename', 50000)

Removing a newline character at the end of a file

Take advantage of the fact that a) the newline character is at the end of the file and b) it is 1 byte long: use the truncate command to shrink the file by one byte:

# a file with the word "test" in it, with a newline at the end (5 characters total)
$ cat foo
test

# a hex dump of foo shows the '\n' at the end (0a)
$ xxd -p foo
746573740a

# and `stat` tells us the size of the file: 5 bytes (one for each character)
$ stat -c '%s' foo
5

# so we can use `truncate` to set the file size to 4 bytes instead
$ truncate -s 4 foo

# which will remove the newline at the end
$ xxd -p foo
74657374
$ cat foo
test$

You can also roll the sizing and math into a one line command:

truncate -s $(($(stat -c '%s' foo)-1)) foo
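The one-liner above removes the last byte whatever it is. A slightly more careful variant checks that the last byte really is a newline before cutting it. The following is a minimal Python sketch of that idea; the function name strip_final_newline is invented for illustration.

```python
import os


def strip_final_newline(path):
    """Remove one trailing newline byte from `path`.

    Returns True if a byte was removed, False otherwise.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return False  # empty file, nothing to remove
        # Look at the last byte only; the rest of the file is never read.
        f.seek(size - 1)
        if f.read(1) != b"\n":
            return False
        f.truncate(size - 1)
        return True
```

Calling it twice on the foo file from the example would shorten the file once and then leave it alone.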

How to delete older contents of file that is being continuously written to?

As Carl mentioned in the comments, you cannot really do this on an actively written log file. However, if the data written so far is not relevant to you, you can empty the file as follows (beware that you will lose all the data in it):

> out.txt

In future, you can use a utility called logrotate(8).
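The > out.txt redirect simply sets the file length to zero, which can also be done from a script. This is a minimal sketch with a made-up function name, empty_file, and it assumes the writing process opened the file in append mode.

```python
import os


def empty_file(path):
    """Discard all contents of `path` in place, like `> path` in the shell."""
    # The file keeps its inode, so a process that has it open keeps writing
    # to the same (now empty) file.
    os.truncate(path, 0)
```

If the writer did not open the file in append mode, its next write lands at its old offset, and the gap before it reads back as zero bytes.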

How to truncate a file in C?

On Windows there is no <unistd.h> header, but you can still truncate a file by using _chsize, which is declared in <io.h>:

 _chsize( fileno(f), size);

How to truncate the end of a binary file past known address using PowerShell?

To simply truncate a file, i.e. to remove any content beyond a given byte offset, you can use System.IO.File's static OpenWrite() method to obtain a System.IO.FileStream instance and call its .SetLength() method:

$inputFile  = 'C:\StartFile.dat'
$outputFile = 'C:\EndFile_test.dat'

# First, copy the input file to the output file.
Copy-Item -LiteralPath $inputFile -Destination $outputFile

# Open the output file for writing.
$fs = [System.IO.File]::OpenWrite($outputFile)

# Set the file length based on the desired byte offset
# in order to truncate it (assuming it is larger).
$fs.SetLength(0x5A08B0)

$fs.Close()

Note: If the given offset is larger than the current file size, the additional space appears to be filled with NUL (0x0) bytes, as a quick test on macOS and Windows suggests. However, judging by the .SetLength() documentation, this behavior is not guaranteed:

If the stream is expanded, the contents of the stream between the old and the new length are undefined.

Remove very last character in file

Use the file object's seek() method to move to 1 byte before the end, then use its truncate() method to remove the remainder of the file:

import os

with open(filename, 'rb+') as filehandle:
    filehandle.seek(-1, os.SEEK_END)
    filehandle.truncate()

This works fine for single-byte encodings. If you have a fixed-width multi-byte encoding (such as UTF-32, or UTF-16 text that contains no surrogate pairs), you need to seek back enough bytes from the end to account for a single codepoint.
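As an illustration of the fixed-width case, here is a minimal sketch for UTF-32, where every codepoint takes exactly 4 bytes. The function name is hypothetical, and the sketch assumes the file contains at least one codepoint after any byte order mark.

```python
import os


def drop_last_utf32_codepoint(path):
    """Cut the final 4-byte code unit from a UTF-32 encoded file."""
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Every UTF-32 codepoint occupies exactly 4 bytes.
        f.truncate(max(size - 4, 0))
```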

For variable-byte encodings, it depends on the codec if you can use this technique at all. For UTF-8, you need to find the first byte (from the end) where bytevalue & 0xC0 != 0x80 is true, and truncate from that point on. That ensures you don't truncate in the middle of a multi-byte UTF-8 codepoint:

import os

with open(filename, 'rb+') as filehandle:
    # move to the last byte, then scan backward until a non-continuation byte is found
    filehandle.seek(-1, os.SEEK_END)
    # read(1) returns a bytes object, so index it to get the integer byte value
    while filehandle.read(1)[0] & 0xC0 == 0x80:
        # we just read 1 byte, which moved the file position forward,
        # skip back 2 bytes to move to the byte before the current.
        filehandle.seek(-2, os.SEEK_CUR)

    # last read byte is our truncation point, move back to it.
    filehandle.seek(-1, os.SEEK_CUR)
    filehandle.truncate()

Note that UTF-8 is a superset of ASCII, so the above works for ASCII-encoded files too.


