How to Read a Large File Line by Line

How can I read large text files partially (out-of-core)?

I provided this answer because Keith's, while succinct, doesn't close the file explicitly:

with open("log.txt") as infile:
    for line in infile:
        do_something_with(line)

How to read a large file - line by line?

The correct, fully Pythonic way to read a file is the following:

with open(...) as f:
    for line in f:
        # Do something with 'line'

The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered I/O and memory management so you don't have to worry about large files.
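
As a minimal sketch of the pattern above, the following counts matching lines in a file without ever holding more than one line in memory. The function name count_matching_lines, the sample data, and the "ERROR" marker are assumptions made up for illustration; the temporary file only exists so the example runs on its own.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count lines containing `needle`, holding only one line in memory at a time."""
    matches = 0
    # The file object is its own iterator: each loop step pulls one more line
    # from an internal buffer instead of loading the whole file.
    with open(path, encoding="utf-8") as f:
        for line in f:
            if needle in line:
                matches += 1
    return matches


# Build a throwaway sample file (hypothetical data) so the sketch is runnable.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False, encoding="utf-8") as tmp:
    for i in range(100_000):
        tmp.write(f"line {i} {'ERROR' if i % 1000 == 0 else 'ok'}\n")
    sample_path = tmp.name

error_count = count_matching_lines(sample_path, "ERROR")
os.remove(sample_path)
```

The same loop works unchanged on a multi-gigabyte file, because memory use depends on the longest line rather than on the file size.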

There should be one -- and preferably only one -- obvious way to do it.

How to read a large text file line-by-line and append this stream to a file line-by-line in Ruby?

One of the great things about Ruby is that you can do file IO in a block:

File.open("test.txt", "r") do |file|
  file.each_line do |row|
    puts row
  end
end # file closed here

so things get cleaned up automatically. Maybe it doesn't matter on a little script but it's always nice to know you can get it for free.

Reading a large text file (over 4 million lines) and parsing each line in .NET

You probably get the exception at LogEntries.Add in ProcessLine, because you have so many log entries that this collection gets too large for memory.

So you should store each entry in the database immediately, without adding it to the list.

You should also read only one line at a time: process it, then read the next line and let go of the previous one. File.ReadAllLines reads all lines at once into a string[], which occupies a lot of memory (or causes an OutOfMemoryException).

You could use a StreamReader or File.ReadLines instead.
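
The streaming idea is not specific to .NET. Here is a rough sketch of it in Python (not C#), using the standard sqlite3 module with an in-memory database as a stand-in for the real one. The table name log_entries, the parse_line helper, and the "LEVEL message" line format are all hypothetical. Writing in small batches rather than one row at a time is a design choice of this sketch; the point is only that no list ever holds every entry.

```python
import io
import sqlite3


def parse_line(line):
    """Hypothetical parser: splits 'LEVEL message' into a (level, message) pair."""
    level, _, message = line.rstrip("\n").partition(" ")
    return level, message


def load_log(lines, conn, batch_size=1000):
    """Insert entries in small batches instead of collecting them all in a list."""
    conn.execute("CREATE TABLE IF NOT EXISTS log_entries (level TEXT, message TEXT)")
    batch = []
    for line in lines:  # `lines` can be an open file; it is consumed lazily
        batch.append(parse_line(line))
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO log_entries VALUES (?, ?)", batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO log_entries VALUES (?, ?)", batch)
    conn.commit()


# Stand-in for a large log file; a real run would pass an open file object.
fake_file = io.StringIO("".join(f"INFO entry {i}\n" for i in range(2500)))
conn = sqlite3.connect(":memory:")
load_log(fake_file, conn)
row_count = conn.execute("SELECT COUNT(*) FROM log_entries").fetchone()[0]
```

At most batch_size parsed entries are alive at any moment, so memory stays flat no matter how many lines the input has.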

Read large files line by line in Rust

You want to use a buffered reader, which implements the BufRead trait, and specifically the method BufRead::lines():

use std::fs::File;
use std::io::{self, prelude::*, BufReader};

fn main() -> io::Result<()> {
    let file = File::open("foo.txt")?;
    let reader = BufReader::new(file);

    for line in reader.lines() {
        println!("{}", line?);
    }

    Ok(())
}

Note that the returned lines do not include the trailing newline, as stated in the documentation.


If you do not want to allocate a string for each line, here is an example to reuse the same buffer:

fn main() -> std::io::Result<()> {
    let mut reader = my_reader::BufReader::open("Cargo.toml")?;
    let mut buffer = String::new();

    while let Some(line) = reader.read_line(&mut buffer) {
        println!("{}", line?.trim());
    }

    Ok(())
}

mod my_reader {
    use std::{
        fs::File,
        io::{self, prelude::*},
    };

    pub struct BufReader {
        reader: io::BufReader<File>,
    }

    impl BufReader {
        pub fn open(path: impl AsRef<std::path::Path>) -> io::Result<Self> {
            let file = File::open(path)?;
            let reader = io::BufReader::new(file);

            Ok(Self { reader })
        }

        pub fn read_line<'buf>(
            &mut self,
            buffer: &'buf mut String,
        ) -> Option<io::Result<&'buf mut String>> {
            buffer.clear();

            self.reader
                .read_line(buffer)
                .map(|u| if u == 0 { None } else { Some(buffer) })
                .transpose()
        }
    }
}


Or if you prefer a standard iterator, you can use this Rc trick I shamelessly took from Reddit:

fn main() -> std::io::Result<()> {
    for line in my_reader::BufReader::open("Cargo.toml")? {
        println!("{}", line?.trim());
    }

    Ok(())
}

mod my_reader {
    use std::{
        fs::File,
        io::{self, prelude::*},
        rc::Rc,
    };

    pub struct BufReader {
        reader: io::BufReader<File>,
        buf: Rc<String>,
    }

    fn new_buf() -> Rc<String> {
        Rc::new(String::with_capacity(1024)) // Tweakable capacity
    }

    impl BufReader {
        pub fn open(path: impl AsRef<std::path::Path>) -> io::Result<Self> {
            let file = File::open(path)?;
            let reader = io::BufReader::new(file);
            let buf = new_buf();

            Ok(Self { reader, buf })
        }
    }

    impl Iterator for BufReader {
        type Item = io::Result<Rc<String>>;

        fn next(&mut self) -> Option<Self::Item> {
            let buf = match Rc::get_mut(&mut self.buf) {
                Some(buf) => {
                    buf.clear();
                    buf
                }
                None => {
                    self.buf = new_buf();
                    Rc::make_mut(&mut self.buf)
                }
            };

            self.reader
                .read_line(buf)
                .map(|u| if u == 0 { None } else { Some(Rc::clone(&self.buf)) })
                .transpose()
        }
    }
}


Read huge text file line by line in C++ with buffering

I've translated my own buffering code from my Java project, and it does what I need. I had to add defines to work around a problem with the MSVC 2010 compiler's tellg, which always gives wrong negative values on huge files. This algorithm reaches the desired speed of ~100 MB/s, though it does some useless new[] calls.

void readFileFast(ifstream &file, void(*lineHandler)(char*str, int length, __int64 absPos)){
    int BUF_SIZE = 40000;
    file.seekg(0,ios::end);
    ifstream::pos_type p = file.tellg();
#ifdef WIN32
    __int64 fileSize = *(__int64*)(((char*)&p) +8);
#else
    __int64 fileSize = p;
#endif
    file.seekg(0,ios::beg);
    BUF_SIZE = min(BUF_SIZE, fileSize);
    char* buf = new char[BUF_SIZE];
    int bufLength = BUF_SIZE;
    file.read(buf, bufLength);

    int strEnd = -1;
    int strStart;
    __int64 bufPosInFile = 0;
    while (bufLength > 0) {
        int i = strEnd + 1;
        strStart = strEnd;
        strEnd = -1;
        for (; i < bufLength && i + bufPosInFile < fileSize; i++) {
            if (buf[i] == '\n') {
                strEnd = i;
                break;
            }
        }

        if (strEnd == -1) { // scroll buffer
            if (strStart == -1) {
                lineHandler(buf + strStart + 1, bufLength, bufPosInFile + strStart + 1);
                bufPosInFile += bufLength;
                bufLength = min(bufLength, fileSize - bufPosInFile);
                delete[]buf;
                buf = new char[bufLength];
                file.read(buf, bufLength);
            } else {
                int movedLength = bufLength - strStart - 1;
                memmove(buf,buf+strStart+1,movedLength);
                bufPosInFile += strStart + 1;
                int readSize = min(bufLength - movedLength, fileSize - bufPosInFile - movedLength);

                if (readSize != 0)
                    file.read(buf + movedLength, readSize);
                if (movedLength + readSize < bufLength) {
                    char *tmpbuf = new char[movedLength + readSize];
                    memmove(tmpbuf,buf,movedLength+readSize);
                    delete[]buf;
                    buf = tmpbuf;
                    bufLength = movedLength + readSize;
                }
                strEnd = -1;
            }
        } else {
            lineHandler(buf+ strStart + 1, strEnd - strStart, bufPosInFile + strStart + 1);
        }
    }
    lineHandler(0, 0, 0);//eof
}

void lineHandler(char*buf, int l, __int64 pos){
    if(buf==0) return;
    string s = string(buf, l);
    // Print through "%s" so a '%' inside the line is not treated as a format specifier.
    printf("%s", s.c_str());
}

void loadFile(){
    ifstream infile("file");
    readFileFast(infile,lineHandler);
}
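
For readers who want the idea without the pointer arithmetic, here is a simplified Python sketch of the same technique: read fixed-size chunks, split them on newlines, carry an incomplete trailing line over to the next chunk, and report each line's absolute position. It is not a translation of the C++ code above. The name read_lines_chunked and its parameters are made up for illustration, and the default chunk size only mirrors BUF_SIZE.

```python
import io


def read_lines_chunked(stream, handle_line, chunk_size=40_000):
    """Read a binary stream in fixed-size chunks and call handle_line(line, abs_pos)."""
    pending = b""     # bytes of an incomplete line carried over from earlier chunks
    pending_pos = 0   # absolute offset in the stream where `pending` starts
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = pending + chunk
        start = 0
        while True:
            end = data.find(b"\n", start)
            if end == -1:
                break
            # Each reported line keeps its trailing newline.
            handle_line(data[start:end + 1], pending_pos + start)
            start = end + 1
        pending = data[start:]
        pending_pos += start
    if pending:  # last line without a trailing newline
        handle_line(pending, pending_pos)


seen = []
sample = io.BytesIO(b"alpha\nbeta\ngamma")
# A tiny chunk size forces lines to straddle chunk boundaries.
read_lines_chunked(sample, lambda line, pos: seen.append((pos, line)), chunk_size=4)
```

Unlike the C++ version, this sketch lets the carried-over buffer grow when a line is longer than one chunk, which trades a bounded buffer for simpler code.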

