Read a File Backwards Line by Line Using Fseek

Read a file backwards line by line using fseek

The question is asking using fseek, so can only assume that performance is an issue and file() is not the solution. Here is a simple approach using fseek:

My file.txt

#file.txt
Line 1
Line 2
Line 3
Line 4
Line 5

And the code:

<?php

$fp = fopen('file.txt', 'r');

$pos = -2; // Skip final new line character (Set to -1 if not present)

$lines = array();
$currentLine = '';

while (-1 !== fseek($fp, $pos, SEEK_END)) {
$char = fgetc($fp);
if (PHP_EOL == $char) {
$lines[] = $currentLine;
$currentLine = '';
} else {
$currentLine = $char . $currentLine;
}
$pos--;
}

$lines[] = $currentLine; // Grab final line

var_dump($lines);

Output:

array(5) {
[0]=>
string(6) "Line 5"
[1]=>
string(6) "Line 4"
[2]=>
string(6) "Line 3"
[3]=>
string(6) "Line 2"
[4]=>
string(6) "Line 1"
}

You don't have to append to the $lines array like I am, you can print the output straight away if that is the purpose of your script. Also it is easy to introduce a counter if you want to limit the number of lines.

$linesToShow = 3;
$counter = 0;
while ($counter <= $linesToShow && -1 !== fseek($fp, $pos, SEEK_END)) {
// Rest of code from example. After $lines[] = $currentLine; add:
$counter++;
}

Read file lines backwards (fgets) with php

First way:

$file = file("test.txt");
$file = array_reverse($file);
foreach($file as $f){
echo $f."<br />";
}

Second Way (a):

To completely reverse a file:


$fl = fopen("\some_file.txt", "r");
for($x_pos = 0, $output = ''; fseek($fl, $x_pos, SEEK_END) !== -1; $x_pos--) {
$output .= fgetc($fl);
}
fclose($fl);
print_r($output);

Second Way (b):
Of course, you wanted line-by-line reversal...


$fl = fopen("\some_file.txt", "r");
for($x_pos = 0, $ln = 0, $output = array(); fseek($fl, $x_pos, SEEK_END) !== -1; $x_pos--) {
$char = fgetc($fl);
if ($char === "\n") {
// analyse completed line $output[$ln] if need be
$ln++;
continue;
}
$output[$ln] = $char . ((array_key_exists($ln, $output)) ? $output[$ln] : '');
}
fclose($fl);
print_r($output);

read file backwards (last line first)

It goes like this:

  1. Seek to one byte before the end of the file using fseek. There's no guarantee that the last line will have an EOL so the last byte doesn't really matter.
  2. Read one byte using fgetc.
  3. If that byte is an EOL then the last line is a single empty line and you have it.
  4. Use fseek again to go backwards two bytes and check that byte with fgetc.
  5. Repeat the above until you find an EOL. When you have an EOL, the file pointer will be at the beginning of the next (from the end) line.
  6. ...
  7. Profit.

Basically you have to keep doing (4) and (5) while keeping track of where you were when you found the beginning of a line so that you can seek back there before starting your scan for the beginning of the next line.

As long as you open your file in text mode you shouldn't have have to worry about multibyte EOLs on Windows (thanks for the reminder Mr. Lutz).

If you happen to be given a non-seekable input (such as a pipe), then you're out of luck unless you want to dump your input to a temporary file first.

So you can do it but it is rather ugly.

You could do pretty much the same thing using mmap and a pointer if you have mmap available and the "file" you're working with is mappable. The technique would be pretty much the same: start at the end and go backwards to find the end of the previous line.


Re: "I am the one creating this file. So, can I create in a way its in the reverse order? Is that possible?"

You'll run into the same sorts of problems but they'll be worse. Files in C are inherently sequential lists of bytes that start at the beginning and go to the end; you're trying to work against this fundamental property and going against the fundamentals is never fun.

Do you really need your data in a plain text file? Maybe you need text/plain as the final output but all the way through? You could store the data in an indexed binary file (possibly even an SQLite database) and then you'd only have to worry about keeping (or windowing) the index in memory and that's unlikely to be a problem (and if it is, use a "real" database); then, when you have all your lines, just reverse the index and away you go.

Searching a text file backwards from the end

As noted by commentators, doing IO one character at a time (+ fseek + ftell) is very expensive. Remember that each time the code fgets the first character, the C library will 'read ahead' - reading BUFSIZ characters.

As an alternative, consider 'read before' logic, based on ideas provided above.

  • find out length of pattern pl
  • Allocate a buffer size of bs
  • Read blocks of bs backward. Each block will overlap with previous block for pl characters, eliminating the possibility that a match will occur in between blocks.
  • Assuming bs much larger the pl, cost of overlap reads is minimal, and performance close to optimal.

Error checks reduced to keep code clear.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define BS 100000
int main(int argc, char **argv)
{
FILE *fp = fopen(argv[1], "r");
if ( !fp ) { perror("fopen") ; exit (1) ; };

fseek(fp, 0, SEEK_END) ;
long pos = ftell(fp) ;
fprintf(stderr, "S=%ld\n", pos) ;

const char *pattern = argv[2] ;
const int PL = strlen(pattern) ;

// Read backward
char buff[BS+1] ;
long match = -1;
while ( pos > 0 ) {
pos = pos - (BS-PL) ;
if ( pos < 0 ) pos = 0 ;
fseek(fp, pos, SEEK_SET) ;
int n = fread(buff, sizeof(buff[0]), BS, fp) ;
if ( n > 0 ) {
buff[n] = 0 ;
char *loc = strstr(buff, pattern) ;
if ( loc ) ) { match = pos + (loc-buff) ; break ;} ;
} ;
} ;
fclose(fp) ;
printf("MATCH=%ld\n", match) ;
}

Note that solution will only handle TEXT files. The 'strstr' will NOT work on data loaded from binary file which may contain NUL characters.

Updated 2019-11-12 to correct file position calculation when match is in first block.

Reading large files from end

It's not pure PHP, but the common solution is to use the tac command which is the revert of cat and loads the file in reverse. Use exec() or passthru() to run it on the server and then read the results. Example usage:

<?php
$myfile = 'myfile.txt';
$command = "tac $myfile > /tmp/myfilereversed.txt";
exec($command);
$currentRow = 0;
$numRows = 20; // stops after this number of rows
$handle = fopen("/tmp/myfilereversed.txt", "r");
while (!feof($handle) && $currentRow <= $numRows) {
$currentRow++;
$buffer = fgets($handle, 4096);
echo $buffer."<br>";
}
fclose($handle);
?>

How can I read lines from bottom up using C?

It's important to understand that no modern operating system tracks the position of line breaks within a file. (VMS could, and I'm pretty sure so could some IBM mainframe operating systems, but you're probably not using any of those.) So it's not possible to seek to a line boundary. It is also not possible to read byte-by-byte in reverse order.

Therefore, the simplest way to read the last three numbers in the file, in reverse, is to read the entire file in forward order, keeping the most recently seen three numbers in a buffer. When you hit EOF, just process that buffer backward.

A more efficient, but significantly more complicated, technique is to guess a position close to but before the last three numbers in the file; seek to that position, then discard characters till you hit a line break; and use the technique in the previous paragraph from that point. If you guessed wrong and the buffer winds up with fewer than three numbers in it, guess again.

And a third approach would be to use fseek (with SEEK_END) and fread to read the last 1024 or so bytes of the file, set a pointer to the end of the block, and parse it backward. This would be quite efficient, but would have even more headache-inducing corner cases to get right than the previous suggestion. (What exactly do you do if the last three lines of the file, collectively, are more than 1024 bytes long?)

FYI, the correct way to read floating-point numbers in C is to use fgets and strtod. DO NOT use either atof or scanf for this; atof doesn't tell you about syntax errors, and scanf triggers undefined behavior on overflow.

P.S. If you have the shell utility tac (which is a GNUism), the easiest option of all would be to write your program to process the first three numbers on standard input, and then invoke it as tac < input.file | ./a.out. Skimming the code leads me to believe that tac implements my "third approach", with some additional cleverness.

Reading a text file backwards in C

I recommend a more portable (hopefully) way of file size determination since fseek(binaryStream, offset, SEEK_END) is not guaranteed to work. See the code below.

I believe that files should be at least minimally buffered at the kernel level (e.g. buffering at least one block per file by default), so seeks should not incur significant amount of extra I/O and should only advance the file position internally. If the default buffering is not satisfactory, you may try to use setvbuf() to speed up the I/O.

#include <limits.h>
#include <string.h>
#include <stdio.h>

/* File must be open with 'b' in the mode parameter to fopen() */
long fsize(FILE* binaryStream)
{
long ofs, ofs2;
int result;

if (fseek(binaryStream, 0, SEEK_SET) != 0 ||
fgetc(binaryStream) == EOF)
return 0;

ofs = 1;

while ((result = fseek(binaryStream, ofs, SEEK_SET)) == 0 &&
(result = (fgetc(binaryStream) == EOF)) == 0 &&
ofs <= LONG_MAX / 4 + 1)
ofs *= 2;

/* If the last seek failed, back up to the last successfully seekable offset */
if (result != 0)
ofs /= 2;

for (ofs2 = ofs / 2; ofs2 != 0; ofs2 /= 2)
if (fseek(binaryStream, ofs + ofs2, SEEK_SET) == 0 &&
fgetc(binaryStream) != EOF)
ofs += ofs2;

/* Return -1 for files longer than LONG_MAX */
if (ofs == LONG_MAX)
return -1;

return ofs + 1;
}

/* File must be open with 'b' in the mode parameter to fopen() */
/* Set file position to size of file before reading last line of file */
char* fgetsr(char* buf, int n, FILE* binaryStream)
{
long fpos;
int cpos;
int first = 1;

if (n <= 1 || (fpos = ftell(binaryStream)) == -1 || fpos == 0)
return NULL;

cpos = n - 1;
buf[cpos] = '\0';

for (;;)
{
int c;

if (fseek(binaryStream, --fpos, SEEK_SET) != 0 ||
(c = fgetc(binaryStream)) == EOF)
return NULL;

if (c == '\n' && first == 0) /* accept at most one '\n' */
break;
first = 0;

if (c != '\r') /* ignore DOS/Windows '\r' */
{
unsigned char ch = c;
if (cpos == 0)
{
memmove(buf + 1, buf, n - 2);
++cpos;
}
memcpy(buf + --cpos, &ch, 1);
}

if (fpos == 0)
{
fseek(binaryStream, 0, SEEK_SET);
break;
}
}

memmove(buf, buf + cpos, n - cpos);

return buf;
}

int main(int argc, char* argv[])
{
FILE* f;
long sz;

if (argc < 2)
{
printf("filename parameter required\n");
return -1;
}

if ((f = fopen(argv[1], "rb")) == NULL)
{
printf("failed to open file \'%s\'\n", argv[1]);
return -1;
}

sz = fsize(f);
// printf("file size: %ld\n", sz);

if (sz > 0)
{
char buf[256];
fseek(f, sz, SEEK_SET);
while (fgetsr(buf, sizeof(buf), f) != NULL)
printf("%s", buf);
}

fclose(f);
return 0;
}

I've only tested this on windows with 2 different compilers.



Related Topics



Leave a reply



Submit