How to Find the Byte Position of a Specific Line in a File

How to print the whole line that contains a specified byte offset in a file?

With GNU awk, keep the number of bytes read so far in a variable, and when it reaches your byte offset print the current line and exit. E.g.:

$ awk -b '{ nb += length + 1 } nb >= 80 { print; exit }' file
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut

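The keyword length is shorthand for length($0), which returns the length of the current line in bytes (thanks to -b). We need to add 1 to it because awk strips off the line terminator. Note that nb >= 80 treats the offset as 1-based (the 80th byte of the file); for a 0-based offset, use nb > 80 instead.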
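For comparison, here is a minimal Python sketch of the same running-total idea. The function name line_at_byte_offset is made up for illustration, and like the awk command it counts byte positions from 1.

```python
def line_at_byte_offset(path, offset):
    """Return the line (as bytes, terminator included) that contains the
    1-based byte position `offset`, or None if the file is shorter."""
    bytes_read = 0
    with open(path, "rb") as f:        # binary mode: lengths are in bytes
        for line in f:                 # each line keeps its trailing newline,
            bytes_read += len(line)    # so no "+ 1" is needed here
            if bytes_read >= offset:
                return line
    return None
```

For a file containing "first", "second" and "third" on separate lines, position 7 falls on the first byte of the second line, so the call should return that line.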

Get line from file at specified byte offset

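# binary mode, so that offsets are plain byte positions
with open(filename, 'rb') as f:
    for offset in offsets:
        f.seek(offset)
        # readline() returns the bytes from the offset to the end of that line
        print(f.readline())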

References:

  • with statement
  • open
  • seek
  • readline

How to read a file's lines between two byte positions

First you need to find the byte position of the start of a line at the 1/4, 1/2, and 3/4 points. To do that:

  • fseek to the approximate position (e.g. fseek(fp, filesize/4, SEEK_SET))
  • call fgets to read up to the next newline
  • call ftell to determine the offset

The offset returned is the end of one quarter and the beginning of the next.

To read one quarter of the file:

  • fseek to the beginning of the quarter
  • call fgets to read a line
  • call ftell to see if you've reached the end of the quarter
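The two procedures above can be sketched in Python, where seek, readline and tell on a binary file play the roles of fseek, fgets and ftell. The function names quarter_boundaries and read_lines_between are hypothetical, and this is only a sketch under the assumption that the file fits the "lines ending in \n" model.

```python
import os


def quarter_boundaries(path):
    """Return five byte offsets [0, q1, q2, q3, size]; each inner offset
    is the start of a line just past the 1/4, 1/2 and 3/4 points."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in (1, 2, 3):
            f.seek(size * i // 4)    # jump to the approximate position
            f.readline()             # read up to the next newline
            bounds.append(f.tell())  # offset where the next line starts
    bounds.append(size)
    return bounds


def read_lines_between(path, start, end):
    """Yield the lines whose first byte lies in [start, end)."""
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:             # unexpected end of file
                break
            yield line
```

Reading the four ranges bounds[i] to bounds[i + 1] in turn should yield every line of the file exactly once. If one line spans more than a quarter of the file, two boundaries coincide and that quarter is simply empty.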

Getting byte offset of line in a text file?

Another approach would be to count the bytes of each line, like this:

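import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// LineOffsets is just a wrapper class so that the snippet compiles
public class LineOffsets {
    public static void main(String[] args) {
        // byteOffsets.get(i) is the running byte total through line i
        // (line terminators are not counted, see the note below)
        List<Integer> byteOffsets = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader("numbers.txt"))) {
            String line;
            int total = 0;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
                // in my test each character was one byte
                int numBytes = line.getBytes().length;
                System.out.println(numBytes);
                total += numBytes;
                byteOffsets.add(total);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}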

In this example you would also need to add the size of the line terminator (one byte for \n, two for \r\n) to the size of each line, because readLine() strips it.

How to know the byte position of a row of a CSV file in python?

If by "byte position" you mean the byte position as if you had read the file in as a normal text file, then my suggestion is to do just that. Read in the file line by line as text, and get the position within the line that way. You can still parse the CSV data row by row yourself using the csv module:

import csv

for line in myfile:
    row = next(csv.reader([line]))

I think it is perfectly good design for the CSV reader to not provide a byte position of this kind, because it really doesn't make much sense in a CSV context. After all, "data" and data are the exact same four bytes of data as far as CSV is concerned, but the d might be the 2nd byte or the 1st byte depending on whether the optional surrounding quotes were used.
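As a rough sketch of that suggestion, the following Python generator reads the file line by line in binary mode, keeps a running byte count, and parses each line with the csv module. The name csv_rows_with_offsets and the encoding parameter are assumptions made for this example, and it only works if no quoted field contains an embedded newline.

```python
import csv


def csv_rows_with_offsets(path, encoding="utf-8"):
    """Yield (byte_offset, row) for each line of a CSV file.
    Assumes no quoted field contains an embedded newline."""
    with open(path, "rb") as f:
        offset = 0
        for raw in f:                        # raw bytes, terminator included
            text = raw.decode(encoding)
            row = next(csv.reader([text]))   # parse this single line
            yield offset, row
            offset += len(raw)               # start of the next line
```

The offset is that of the raw line in the file, so a row written as "data",1 and one written as data,1 parse to the same fields but occupy different numbers of bytes.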

Go to a specific line in file and read it

This code appears to be using line numbers as byte offsets. If you seek to offset "1" the file seeks forward 1 byte, not 1 line. If you seek to offset 2, the file seeks forward 2 bytes, not 2 lines.

To seek to a specific line you need to read the file and count the number of line breaks until you get to the line you want. There is code that already does this, for example std::getline(). If you don't already know the exact byte offset of the line you want, you can call std::getline() the number of times equal to the line number you want.

Also remember that the first byte of a file is at offset 0, not offset 1, and that different platforms use different bytes to indicate the end of a line (e.g. on Windows it's "\r\n", on Unix it's "\n"). If you're using a library function to read lines, the line ending should be taken care of for you.
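The same "read and count lines" idea, sketched in Python rather than with std::getline(): the hypothetical helper read_line_number skips line_number - 1 lines and returns the next one, stripping either kind of line ending.

```python
from itertools import islice


def read_line_number(path, line_number):
    """Return line `line_number` (1-based) as bytes without its line
    ending, or None if the file has fewer lines."""
    with open(path, "rb") as f:
        # Skip line_number - 1 lines, then take the next one.
        line = next(islice(f, line_number - 1, line_number), None)
    if line is None:
        return None
    return line.rstrip(b"\r\n")   # drop trailing \r and \n bytes
```

Note that this still reads every byte before the requested line; it just avoids keeping the earlier lines around.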

Find a specific byte in a file and read until a specific byte in Lua

You can specify a file position using file:seek and read a certain number of characters (bytes) by passing an integer to file:read:

local file = io.open(somePath)
if file then
  -- set cursor to -5 bytes from the file's end
  file:seek("end", -5)
  -- read 3 bytes
  print(file:read(3))
  file:close()
end

You cannot search in a file without reading it. If you don't want to read the entire file, you can read it in chunks, either line by line (if there are lines in your file) or a specific number of bytes at a time, until you find something. Of course you can also read it byte by byte.

Whether it makes more sense to read a 64-byte file as a whole or in chunks is debatable; in most scenarios you won't notice any difference.

So you could call file:read(1) in a loop that terminates once you find a V or reach the end of the file.

local file = io.open(somePath)
if file then
  local data = ""
  for i = 1, 64 do
    local b = file:read(1)
    if not b then
      print("no V in file")
      data = nil
      break
    end
    data = data .. b
    if b == "V" then
      print(data)
      break
    end
  end
  file:close()
end

vs

local file = io.open("d:/test.txt", "r")
if file then
  -- "a" reads the whole file (Lua 5.3+; use "*a" on 5.1/5.2)
  local data = file:read("a")
  local pos = data:find("V")
  if pos then
    print(data:sub(1, pos))
  end
  file:close()
end
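A middle ground between the two Lua versions is to read fixed-size chunks and stop at the first chunk that contains the byte. Here is a minimal sketch of that in Python; the name read_until_byte and the default chunk size are arbitrary choices for illustration, and the target is assumed to be a single byte so that it cannot straddle two chunks.

```python
def read_until_byte(path, target=b"V", chunk_size=16):
    """Return the file's bytes up to and including the first `target`
    byte, or None if it never occurs."""
    collected = bytearray()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:                  # end of file, target not found
                return None
            pos = chunk.find(target)
            if pos != -1:
                collected += chunk[:pos + 1]
                return bytes(collected)
            collected += chunk             # keep going with the next chunk
```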

C: Best way to go to a known line of a file

You cannot directly access a given line of a text file (unless all lines have the same size in bytes; with UTF-8 everywhere, a Unicode character can take a variable number of bytes, 1 to 4, and in most cases line lengths differ from one line to the next). So you cannot use fseek, because you don't know the file offset in advance.

However (at least on Linux systems), lines end with \n (the newline character). So you could read byte by byte and count the newlines:

int c = EOF;
int linecount = 1;
while ((c = fgetc(file)) != EOF) {
    if (c == '\n')
        linecount++;
}

You then don't need to store the entire line.

So you could reach line 45 this way (using while (linecount<45 && (c=fgetc(file)) != EOF) ..., testing linecount first so that the loop does not consume the first byte of line 45) and only then read entire lines with fgets or, better yet, getline(3) on POSIX systems. Notice that the implementation of fgets or of getline is likely to be built on top of fgetc, or at least to share some code with it. Remember that <stdio.h> is buffered I/O; see setvbuf(3) and related functions.


Another way would be to read the file in two passes. A first pass stores the offset (using ftell(3)...) of every line start in some efficient data structure (a vector, a hash table, a tree...). A second pass uses that data structure to retrieve the offset of the line start, then calls fseek(3) with that offset.
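A minimal Python sketch of the two-pass approach, using a plain list as the data structure; build_line_index and read_indexed_line are hypothetical names, and seek/readline stand in for fseek/fgets.

```python
def build_line_index(path):
    """First pass: return a list whose element i is the byte offset at
    which line i + 1 starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)    # terminator is included in len(line)
    return offsets


def read_indexed_line(path, offsets, line_number):
    """Second pass: seek straight to a 1-based line number."""
    with open(path, "rb") as f:
        f.seek(offsets[line_number - 1])
        return f.readline()
```

The index costs one full read of the file, so it pays off only when several lines are looked up afterwards.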


A third way, POSIX specific, would be to memory-map the file using mmap(2) into your virtual address space (this works well for files that are not too large, e.g. less than a few gigabytes). With care (you might need to mmap an extra ending page, to ensure the data is zero-byte terminated) you would then be able to use strchr(3) with '\n'.
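The memory-mapping idea can also be sketched with Python's mmap module, where the find method plays the role of strchr and no zero-byte terminator is needed because the mapping knows its own length. The function name find_line_mmap is made up for this example, and it assumes a non-empty file (mapping an empty file raises an error).

```python
import mmap


def find_line_mmap(path, line_number):
    """Return 1-based line `line_number` as bytes (without the newline)
    by scanning a memory-mapped file, or None if it does not exist."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; this fails on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = 0
            for _ in range(line_number - 1):
                newline = mm.find(b"\n", start)
                if newline == -1:
                    return None          # fewer lines than requested
                start = newline + 1
            if start >= len(mm):
                return None              # file ends right after a newline
            end = mm.find(b"\n", start)
            if end == -1:
                end = len(mm)            # last line has no terminator
            return mm[start:end]         # slicing copies the bytes out
```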

In some cases, you might consider parsing your text file line by line (using fgets as appropriate, or getline on Linux, or generating your parser with flex and bison) and storing each line in a relational database (such as PostgreSQL or SQLite).

PS. BTW, the notion of a line (and the end-of-line mark) varies from one OS to the next. On Linux the end-of-line is a \n character. On Windows lines end with \r\n, etc.


