Ftell on a File Descriptor

ftell on a file descriptor?

Just use:

position = lseek(fd, 0, SEEK_CUR);
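If you want this as a reusable helper, a minimal sketch might look like the following. The name fd_tell is made up for illustration; it is not a standard function. Note that lseek fails with ESPIPE on pipes, sockets, and FIFOs, because those have no file position.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report the current offset of fd, the way ftell()
 * does for a FILE *. Returns (off_t)-1 with errno set on failure,
 * e.g. ESPIPE when fd refers to a pipe, socket, or FIFO. */
static off_t fd_tell(int fd)
{
    /* Seeking zero bytes relative to the current position moves
     * nothing; lseek() just returns where the offset already is. */
    return lseek(fd, 0, SEEK_CUR);
}
```

Callers should compare the result against (off_t)-1 before using it as a position.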

Use ftell to find the file size

File positions are like the cursor in a text entry widget: they are in between the bytes of the file. This is maybe easiest to understand if I draw a picture:

Depiction of a file that is four characters long. There are five boxes in a row; from left to right, they contain the letters "a", "b", "c", and "d", and the fifth one has its area X-ed out.  Below the boxes, aligned with the vertical lines to the left of and in between the boxes, are the numbers 0, 1, 2, 3, and 4.

This is a hypothetical file. It contains four characters: a, b, c, and d. Each character gets a little box to itself, which we call a "byte". (This file is ASCII.) The fifth box has been crossed out because it's not part of the file yet, but if you appended a fifth character to the file it would spring into existence.

The valid file positions in this file are 0, 1, 2, 3, and 4. There are five of them, not four; they correspond to the vertical lines before, after, and in between the boxes. When you open the file (assuming you don't use "a"), you start out on position 0, the line before the first byte in the file. When you seek to the end, you arrive at position 4, the line after the last byte in the file. Because we start counting from zero, this is also the number of bytes in the file. (This is one of the several reasons why we start counting from zero, rather than one.)
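The picture can be checked with a few lines of C. This is only a sketch, assuming a POSIX-like system where ftell on a binary stream returns byte offsets and seeking to SEEK_END is supported; the helper name end_position_of_abcd is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write the four bytes "abcd" to a scratch file
 * and report the position reached by seeking to the end.
 * Returns -1 on any failure. */
static long end_position_of_abcd(void)
{
    FILE *fp = tmpfile();   /* scratch file, binary update mode ("wb+") */
    if (fp == NULL)
        return -1;

    long pos = -1;
    if (fputs("abcd", fp) != EOF
        && fseek(fp, 0, SEEK_SET) == 0   /* the line before "a"         */
        && ftell(fp) == 0                /* that line is position 0     */
        && fseek(fp, 0, SEEK_END) == 0)  /* the line after "d"          */
        pos = ftell(fp);                 /* position == number of bytes */

    fclose(fp);
    return pos;
}
```

The value returned for the end position is the same as the byte count, which is the point of the drawing.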

I am obliged to warn you that there are several reasons why

fseek(fp, 0, SEEK_END);
long int nbytes = ftell(fp);

might not give you the number you actually want, depending on what you mean by "file size" and on the contents of the file. In no particular order:

  • On Windows, if you open a file in text mode, the numbers you get from ftell on that file are not byte offsets from the beginning of the file; they are more like fgetpos cookies, that can only be used in a subsequent call to fseek. If you need to seek around in a text file on Windows you may be better off opening the file in binary mode and dealing with both DOS and Unix line endings yourself — this is actually my recommendation for production code in general, because it's perfectly possible to have a file with DOS line endings on a Unix system, or vice versa.

  • On systems where long int is 32 bits, files can easily be bigger than that, in which case ftell will fail, return −1 and set errno to EOVERFLOW. POSIX.1-2001-compliant systems provide a function called ftello that returns an off_t quantity that can represent larger file sizes, provided you put #define _FILE_OFFSET_BITS 64 at the very top of all your source files (before any #includes). On Windows, the Microsoft C runtime offers _ftelli64 and _fseeki64 for the same purpose.

  • If your file contains characters that are beyond ASCII, then the number of bytes in the file is very likely to be different from the number of characters in the file. (For instance, if the file is encoded in UTF-8, a character such as € will take up three bytes, Ä will take up either two or three bytes depending on whether it's "composed", and జ్ఞా will take up twelve bytes because, despite being a single grapheme, it's a string of four Unicode code points.) ftell(o) will still tell you the correct number to pass to malloc, if your goal is to read the entire file into memory, but iterating over "characters" will not be so simple as for (i = 0; i < len; i++).

  • If you are using C's "wide streams" and "wide characters", then, just like text streams on Windows, the numbers you get from ftell on that file are not byte offsets and may not be useful for anything other than subsequent calls to fseek. But wide streams and characters are a bad design anyway; you're actually more likely to be able to handle all the world's languages correctly if you stick to processing UTF-8 by hand in narrow streams and characters.
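Putting the first two caveats together, here is a hedged sketch of a size function that opens the file in binary mode and uses fseeko/ftello instead of fseek/ftell. It assumes a POSIX system; the name file_size_bytes is made up for illustration, and the result is a byte count, not a character count.

```c
#define _FILE_OFFSET_BITS 64   /* must come before every #include */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: size in bytes of the file at path,
 * or -1 if it cannot be opened or measured. */
static off_t file_size_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: offsets are byte counts */
    if (fp == NULL)
        return -1;

    off_t size = -1;
    if (fseeko(fp, 0, SEEK_END) == 0)
        size = ftello(fp);          /* off_t, so not limited to a 32-bit long */

    fclose(fp);
    return size;
}
```

If all you need is the size and you never intend to read through the stream, fstat on a descriptor is another option on POSIX systems.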

How to get current offset of stream or file descriptor?

There is no equivalent to ftell() or fseek() in node.js and I'm not really sure why. Instead, you generally specify the position you want to read or write at whenever you read or write with fs.read() or fs.write(). If you want to just write a bunch of data sequentially or you want buffered writing, then you would more typically use a stream which buffers and sequences for you.

If you want to know where data will be appended, you can fetch the current file length. A length of zero tells you that you are at the beginning of the file after opening it for appending.

Here's node.js code that does something similar to your C code.

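const fs = require('fs');

async function myFunc() {
    // Open for appending ("a"); the file is created if it does not exist.
    let handle = await fs.promises.open("test.txt", "a");
    try {
        // A size of zero means nothing has been written yet.
        const {size} = await handle.stat();
        await handle.appendFile(size ? "Subsequent line\n" : "First line\n");
    } finally {
        await handle.close();
    }
}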

And, if you call this three times like this:

async function test() {
    await myFunc();
    await myFunc();
    await myFunc();
}

test();

You will get your desired three lines in the file:

First line
Subsequent line
Subsequent line

Relationship between file descriptors, file pointers and file position indicators

Use fseek and fread, or lseek and read, but do not mix the two APIs on the same file; it won't work.

A FILE* has its own internal buffer. fseek may or may not move the internal buffer pointer only. It is not guaranteed that the real file position indicator (one that lseek is responsible for) changes, and if it does, it is not known by how much.
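You can watch the two positions drift apart with a short experiment. This is a sketch assuming a POSIX system; the helper name positions_after_one_byte is made up, and the exact descriptor offset you see depends on your C library's buffer size.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one byte through the stream, then report
 * both the stream's position and the descriptor's position.
 * Returns 0 on success, -1 on failure. */
static int positions_after_one_byte(FILE *fp, long *stream_pos, off_t *fd_pos)
{
    if (fgetc(fp) == EOF)
        return -1;

    /* The stream has consumed exactly one byte ... */
    *stream_pos = ftell(fp);
    /* ... but the library may have read a whole buffer's worth, so the
     * descriptor can be anywhere at or beyond the stream's position. */
    *fd_pos = lseek(fileno(fp), 0, SEEK_CUR);

    return (*stream_pos == -1 || *fd_pos == (off_t)-1) ? -1 : 0;
}
```

On a small file, many implementations leave the descriptor at the end of the file after that single fgetc, while ftell reports a position one byte in.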

Get position for file descriptor in Python

So, the answer seems to be quite easy. I had to use os.lseek with SEEK_CUR flag:

import os
print(os.lseek(fd, 0, os.SEEK_CUR))

I don't know if it is the only approach, but at least it works fine.


Difference between result of ftell(FILE* fd) and lseek(int fd, off_t offset, int whence)

It is dangerous to mix standard library file operations (e.g. fread(3), fseek(3)) with low-level system calls (e.g. read(2), lseek(2)) on the same file.

This is problematic because the standard library buffers data and, depending on the buffering mode, does not write it out to the file descriptor immediately.
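The effect on the write side is easy to observe. The following is a sketch assuming a POSIX system and a stream that is not unbuffered; the helper name offsets_around_flush is made up for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write text through the stream and report the
 * descriptor's offset before and after fflush().
 * Returns 0 on success, -1 on failure. */
static int offsets_around_flush(FILE *fp, const char *text,
                                off_t *before, off_t *after)
{
    int fd = fileno(fp);

    if (fputs(text, fp) == EOF)
        return -1;
    /* With a buffered stream the bytes may still sit in the stdio
     * buffer, so the descriptor's offset is often unchanged here. */
    *before = lseek(fd, 0, SEEK_CUR);

    if (fflush(fp) == EOF)
        return -1;
    /* After the flush the kernel has seen the data. */
    *after = lseek(fd, 0, SEEK_CUR);

    return (*before == (off_t)-1 || *after == (off_t)-1) ? -1 : 0;
}
```

Only the value reported after the flush is reliable; the value before it depends on the buffering mode and on how much has been written.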

If you need to access the underlying file descriptor, you should be sure to fflush the stream before getting its fileno. I've thrown something like this in a header file:

#include <stdio.h>

/**
 * Safely get the file descriptor associated with FILE,
 * by fflush()ing its contents first.
 */
static inline int safe_fileno(FILE *f)
{
    fflush(f);
    return fileno(f);
}

Also, once you call this, you probably shouldn't go back to using the FILE* again, because you've changed the kernel file pointer, but the standard library may assume it is unchanged (since you last fflushed it). As was mentioned in a comment, you may be able to re-sync the FILE* with the file descriptor like this:

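#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/**
 * Re-synchronize the file offset of a FILE with the
 * file offset of its underlying file descriptor.
 */
static inline void fresync(FILE *f)
{
    off_t off = lseek(fileno(f), 0, SEEK_CUR);
    if (off != (off_t)-1)            /* leave the stream alone if lseek failed */
        fseeko(f, off, SEEK_SET);    /* fseeko takes an off_t; fseek takes a long */
}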

interchangeably using FILE pointer and file descriptor

This problem is more complex than you think:

The FILE*-based API (e.g. fread()) uses an internal buffer and depending on the C library (version) you use, you have to consider that ...

  • ... fopen() might already read bytes into the buffer, so the file pointer of fp is 0, but the file pointer of fd is not 0.

    This means that you have to ensure that fd is in sync with fp before using fd.

  • ... the C library might assume that the file pointer of fd is only modified by FILE-based calls.

    This means that you have to restore the file pointer of fd between using read() and calling any function based on FILE.

So the following code might work:

fp = fopen(...);
fd = fileno(fp);
/* Remember file position of fd */
oldpos = lseek(fd, 0, SEEK_CUR);
/* Ensure file position of fd is in sync with fp */
lseek(fd, ftell(fp), SEEK_SET);
read(fd, ...);
/* Get new file position of fd */
newpos = lseek(fd, 0, SEEK_CUR);
/* Restore old file position of fd */
lseek(fd, oldpos, SEEK_SET);
/* Keep fp in sync with fd */
fseek(fp, newpos, SEEK_SET);

(However, there is no guarantee that this code works with every C library that exists.)
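For a stream that is only being read, the same sequence can be written out as a compilable function. This is a sketch under the same caveat, assuming a POSIX system; the name read_via_fd is made up, and ftello/fseeko are used in place of ftell/fseek so that offsets are not squeezed through a long.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes with read(2), starting at
 * the stream's logical position, then advance the stream past them.
 * Intended for read-only streams. Returns the byte count or -1. */
static ssize_t read_via_fd(FILE *fp, void *buf, size_t len)
{
    int fd = fileno(fp);
    off_t logical = ftello(fp);             /* where the stream thinks it is  */
    off_t oldpos = lseek(fd, 0, SEEK_CUR);  /* where the descriptor really is */
    if (logical == -1 || oldpos == (off_t)-1)
        return -1;

    /* Ensure the file position of fd is in sync with fp. */
    if (lseek(fd, logical, SEEK_SET) == (off_t)-1)
        return -1;
    ssize_t n = read(fd, buf, len);

    /* Restore the old file position of fd, as the library expects. */
    if (lseek(fd, oldpos, SEEK_SET) == (off_t)-1 || n < 0)
        return -1;

    /* Keep fp in sync: move it past the bytes just read. */
    if (fseeko(fp, logical + n, SEEK_SET) != 0)
        return -1;
    return n;
}
```

Streams with pending writes would need an fflush before the first lseek, which this sketch leaves out.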


