C++ High Performance File Reading and Writing (C++14)

The fastest option, if you have the memory to do it, is to read the entire file into a buffer with 1 read, process the buffer in memory, and write it all out again with 1 write.

Read it all:

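#include <fstream>
#include <string>

std::ifstream f("file.txt", std::ios::binary);
f.seekg(0, std::ios::end);                                      // jump to the end to learn the file size
std::string buffer(static_cast<std::size_t>(f.tellg()), '\0');  // size the buffer to match
f.seekg(0, std::ios::beg);                                      // rewind before reading
f.read(&buffer[0], buffer.size());                              // &buffer[0]: data() is const before C++17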

Then process it

Then write it all:

std::ofstream out("file.txt", std::ios::binary);  // named `out` so it does not clash with the input stream `f`
out.write(buffer.data(), buffer.size());
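
Putting the three steps together, a minimal sketch could look like the following. The helper names read_all, write_all, and process are hypothetical (they are not from the original answer), and the upper-casing step is only a stand-in for whatever processing you actually need.

```cpp
#include <algorithm>
#include <cctype>
#include <fstream>
#include <stdexcept>
#include <string>

// Read the whole file into memory with a single istream::read call.
std::string read_all(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at the end
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::streamoff size = in.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);
    std::string buffer(static_cast<std::size_t>(size), '\0');
    in.seekg(0, std::ios::beg);
    if (size > 0 && !in.read(&buffer[0], size))
        throw std::runtime_error("short read on " + path);
    return buffer;
}

// Write the whole buffer back with a single ostream::write call.
void write_all(const std::string& path, const std::string& buffer)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out.write(buffer.data(), static_cast<std::streamsize>(buffer.size())))
        throw std::runtime_error("cannot write " + path);
}

// Placeholder processing step (an assumption for illustration):
// upper-case the buffer in place.
void process(std::string& buffer)
{
    std::transform(buffer.begin(), buffer.end(), buffer.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}
```

Typical use would be: auto buf = read_all("file.txt"); process(buf); write_all("file.txt", buf);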

Fastest file reading in C

If you are willing to go beyond the C spec into OS specific code, memory mapping is generally considered the most efficient way.

For POSIX, check out mmap; for Windows, check out CreateFileMapping and MapViewOfFile (OpenFileMapping only opens a mapping that already exists).

Fast textfile reading in c++

Updates: Be sure to check the (surprising) updates below the initial answer

Memory mapped files have served me well [1]:

#include <boost/iostreams/device/mapped_file.hpp> // for mmap
#include <algorithm> // for std::find
#include <iostream> // for std::cout
#include <cstring>  // for memchr
#include <cstdint>  // for uintmax_t

int main()
{
    boost::iostreams::mapped_file mmap("input.txt", boost::iostreams::mapped_file::readonly);
    auto f = mmap.const_data();
    auto l = f + mmap.size();

    // count newlines: memchr returns NULL once no '\n' is left, which ends the loop
    uintmax_t m_numLines = 0;
    while (f && f!=l)
        if ((f = static_cast<const char*>(memchr(f, '\n', l-f))))
            m_numLines++, f++;

    std::cout << "m_numLines = " << m_numLines << "\n";
}

This should be rather quick.


In case it helps you test this approach, here's a version using mmap directly instead of using Boost:

#include <algorithm>
#include <iostream>
#include <cstring>
#include <cstdint>  // for uintmax_t
#include <cstdio>   // for perror
#include <cstdlib>  // for exit

// for mmap:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

const char* map_file(const char* fname, size_t& length);

int main()
{
    size_t length;
    auto f = map_file("test.cpp", length);
    auto l = f + length;

    // count newlines: memchr returns NULL once no '\n' is left, which ends the loop
    uintmax_t m_numLines = 0;
    while (f && f!=l)
        if ((f = static_cast<const char*>(memchr(f, '\n', l-f))))
            m_numLines++, f++;

    std::cout << "m_numLines = " << m_numLines << "\n";
}

// print the failing call together with the errno description, then bail out
void handle_error(const char* msg) {
    perror(msg);
    exit(255);
}

const char* map_file(const char* fname, size_t& length)
{
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        handle_error("open");

    // obtain file size
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        handle_error("fstat");

    length = sb.st_size;

    const char* addr = static_cast<const char*>(mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0u));
    if (addr == MAP_FAILED)
        handle_error("mmap");

    // TODO close fd at some point in time, call munmap(...)
    return addr;
}
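
One way to deal with the TODO above (closing the descriptor and calling munmap) is to tie the mapping's lifetime to an object. The following is only a sketch under that assumption: MappedFile and count_lines are hypothetical names, not part of the original answer, and errors are reported with exceptions instead of handle_error.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper: maps a file read-only and unmaps it
// automatically when the object goes out of scope.
class MappedFile {
public:
    explicit MappedFile(const std::string& path)
    {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd == -1) throw std::runtime_error("open failed: " + path);

        struct stat sb;
        if (::fstat(fd, &sb) == -1) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        length_ = static_cast<std::size_t>(sb.st_size);

        if (length_ > 0) {  // mmap rejects zero-length mappings
            void* addr = ::mmap(nullptr, length_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (addr == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const char*>(addr);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile()
    {
        if (data_) ::munmap(const_cast<char*>(data_), length_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* begin() const { return data_; }
    const char* end() const { return data_ + length_; }
    std::size_t size() const { return length_; }

private:
    const char* data_ = nullptr;
    std::size_t length_ = 0;
};

// Count newline characters in the mapped range.
std::size_t count_lines(const MappedFile& file)
{
    return static_cast<std::size_t>(std::count(file.begin(), file.end(), '\n'));
}
```

With this, the line count becomes count_lines(MappedFile("test.cpp")), and the mapping is released as soon as the temporary is destroyed.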


The last bit of performance I could squeeze out of this I found by looking at the source of GNU coreutils wc. To my surprise using the following (greatly simplified) code adapted from wc runs in about 84% of the time taken with the memory mapped file above:

// needs <unistd.h> (read, close) in addition to the headers above
static uintmax_t wc(char const *fname)
{
    static const auto BUFFER_SIZE = 16*1024;
    int fd = open(fname, O_RDONLY);
    if(fd == -1)
        handle_error("open");

    /* Advise the kernel of our access pattern. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL); // named constant: a literal 1 means POSIX_FADV_RANDOM on Linux

    char buf[BUFFER_SIZE + 1];
    uintmax_t lines = 0;

    while(size_t bytes_read = read(fd, buf, BUFFER_SIZE))
    {
        if(bytes_read == (size_t)-1)
            handle_error("read failed");
        if (!bytes_read)
            break;

        // count the newlines in the chunk that was just read
        for(char *p = buf; (p = (char*) memchr(p, '\n', (buf + bytes_read) - p)); ++p)
            ++lines;
    }

    close(fd);
    return lines;
}

[1] See e.g. the benchmark in: How to parse space-separated floats in C++ quickly?

Fastest way to do many small, blind writes on a huge file (in C++)?

I've tried partially sorting the record numbers by putting the A->B and B->A mappings in a sparse array, and flushing the densest clusters of entries to disk whenever I run out of memory.
it seems that it will incur extremely high syscall overhead.

You can use memory mapped access to the file to avoid syscall overhead. mmap() on *NIX, and CreateFileMapping() on Windows.

Split the file logically into blocks, e.g. 32MB each. If something needs to be changed in a block, mmap() it, modify the data, optionally msync() it, munmap() it, and then move on to the next block.
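
The block-by-block scheme could be sketched roughly as follows for POSIX. This is an illustration under stated assumptions, not a tuned implementation: apply_patches and kBlockSize are hypothetical names, the patches are single bytes for simplicity, and the file is assumed to be already open for reading and writing.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Assumed block size from the text; it must be a multiple of the page size.
constexpr std::size_t kBlockSize = 32u * 1024 * 1024;

// Hypothetical helper: apply (offset, value) single-byte patches to an open
// file of size file_size, mapping one block at a time. Patches are sorted by
// offset (stably, so later patches to the same offset win) and each touched
// block is mapped exactly once.
void apply_patches(int fd, std::size_t file_size,
                   std::vector<std::pair<std::size_t, char>> patches)
{
    std::stable_sort(patches.begin(), patches.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });

    std::size_t i = 0;
    while (i < patches.size()) {
        if (patches[i].first >= file_size)
            throw std::out_of_range("patch offset beyond end of file");

        // Map the block that contains the next pending patch.
        const std::size_t block_start = patches[i].first / kBlockSize * kBlockSize;
        const std::size_t block_len = std::min(kBlockSize, file_size - block_start);

        void* addr = ::mmap(nullptr, block_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                            fd, static_cast<off_t>(block_start));
        if (addr == MAP_FAILED) throw std::runtime_error("mmap failed");
        char* block = static_cast<char*>(addr);

        // Apply every patch that falls inside this block.
        for (; i < patches.size() && patches[i].first < block_start + block_len; ++i)
            block[patches[i].first - block_start] = patches[i].second;

        ::msync(block, block_len, MS_SYNC);  // optional: force the write-back now
        ::munmap(block, block_len);
    }
}
```

Sorting first means blocks are visited in ascending order, so the OS sees a mostly forward-moving access pattern instead of jumping back and forth across the file.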

That is what I would have tried first. The OS automatically reads whatever needs to be read (on first access to the data), and it queues the IO any way it likes.

An important thing to keep in mind is that real IO isn't that fast. Performance-wise, the limiting factors for random access are (1) the number of IO operations per second (IOPS) the storage can handle and (2) the number of disk seeks. (Typical IOPS figures are in the hundreds; typical seek latency is 3-5ms.) For example, storage that can read or write 50MB/s can transfer one contiguous 50MB block in one second. But if you tried to patch a 50MB file byte by byte, seek times would simply kill the performance. Up to some limit, it is OK to read more and write more, even if only to update a few bytes.
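
To put rough numbers on that trade-off, here is a back-of-the-envelope estimate. The figures are assumptions picked from the ranges quoted above (4ms per seek, 50MB/s sequential throughput), the function names are hypothetical, and the model deliberately ignores caching and request merging.

```cpp
#include <cstdint>

// Assumed figures, taken from the ranges quoted in the text.
constexpr std::uint64_t kSeekMicros = 4000;                             // ~4 ms per random seek
constexpr std::uint64_t kSequentialBytesPerSec = 50ull * 1000 * 1000;   // 50 MB/s

// Estimated time for n scattered small writes, paying one seek each.
constexpr std::uint64_t random_patch_micros(std::uint64_t n)
{
    return n * kSeekMicros;
}

// Estimated time to stream `bytes` sequentially.
constexpr std::uint64_t sequential_micros(std::uint64_t bytes)
{
    return bytes * 1000000 / kSequentialBytesPerSec;
}
```

Under these assumptions, 250 scattered writes already cost as much as streaming the whole 50MB file (one second each), and 10,000 scattered writes cost about 40 seconds.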

Another limit to observe is the OS's maximum size for a single IO operation: it depends on the storage, but most OSs split IO requests larger than 128K. The limit can be changed, and it is best to keep it in sync with the corresponding limit in the storage.

Also keep the storage itself in mind. Many people forget that there is often only one storage device. What I'm trying to say is that starting a crapload of threads doesn't help IO unless you have multiple storage devices. Even a single CPU core can easily saturate a RAID10 array with its 800 read IOPS and 400 write IOPS limits. (But a dedicated thread per storage device at least theoretically makes sense.)

Hope that helps. Other people here often mention Boost.Asio which I have no experience with - but it is worth checking.

P.S. Frankly, I would love to hear other (more informative) responses to your question. I have been in the same boat several times already, yet never had the chance to really get to the bottom of it. Books/links/etc. related to IO optimization (regardless of platform) are welcome ;)

C Programming File Reading/Writing Technique

This is a classic case you'll encounter time and time again in programming: do I optimize for speed or memory usage?

And, like all such conundrums, there is no "correct" answer or perfect solution. In other words, you and your classmate are both right in your solutions to the problem.

With your solution of loading all of the records into memory, you "spend" memory in order to make accessing and modifying each of those records faster at run time. Storing all of the records in an array in memory takes up space, but because memory access is orders of magnitude faster than disk access, your approach is going to run a lot faster than your classmate's.

By way of contrast, your classmate conserves RAM by waiting to load the data on demand from the hard disk. But that's going to cost her: hitting the hard disk is a terribly expensive process compared to fetching data that's already in memory, and she's going to be stuck doing this each time the user makes a change. Think about how long it takes to start a program versus switching to one that's already open.

And therein lies the tradeoff. Some of the important things to ask yourself here are:

  1. Is the data set (in the common configurations you'll be dealing with) too large (or going to become too large) to fit completely in memory? If you're dealing with typically small sets of data, computers now have enough RAM that it's probably worth it.

  2. How fast do you need to be able to access the data? Is real-time access important? Is it a particularly large or complex data set that would take too long to load from the hard disk on demand? What kind of performance do your users expect?

  3. What kind of system is your application targeting? Sometimes embedded systems and other special cases necessitate their own unique design approaches. You might have an abundance of RAM and very limited amounts of fixed storage, or you might have exactly the opposite. If you're using standard, modern PC hardware, what do your users want/need/already have? If most of your target users are using relatively "beefy" hardware already, you might make different design decisions than if you're aiming to target a larger potential audience—you've surely seen these trade offs made explicit before through a program's expressed system requirements.

  4. Do you need to allow for special situations? Things like concurrent access by multiple users make keeping all of your data in memory much more difficult. How are other users going to be able to read in the data that's only stored in memory on a local computer? Sharing a common file (perhaps even on a shared server) is probably going to be necessary here.

  5. Are there certain portions of your data that are accessed more frequently than others? Consider keeping those specific portions always in memory and lazy-loading the rest (meaning, you only attempt to fetch them into memory when/if they are accessed by the user).

And as that last point hints, something of a balanced or combined approach is probably about as close as you'll come to an "ideal" solution. You could store as much of the data in RAM as possible, while periodically writing any edits or modifications back to the file on disk during your application's idle state. There's plenty of time that the average program spends waiting on the user to do something, as opposed to the other way around. You can take advantage of these idle CPU cycles to flush out things being held in memory back to the disk without incurring any noticeable speed penalty. This approach is used all the time in software development, and helps to avoid the pitfall pointed out by EClaesson's answer. If your application crashes or otherwise quits unexpectedly, only a very small portion of data is likely to be lost because most of it was already committed to disk behind the scenes.
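
As a rough illustration of that combined approach, the sketch below keeps all records in memory, tracks which ones have been modified, and writes back only those when flush() is called (for example from an idle handler or a timer). Everything here is an assumption made for the example: the Record layout, the RecordStore name, and the fixed-size binary file format are hypothetical and not taken from the question.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fixed-size record, chosen only for illustration.
struct Record {
    char name[32];
    int  value;
};

// Keeps every record in memory and writes back only the modified ones.
class RecordStore {
public:
    explicit RecordStore(const std::string& path) : path_(path)
    {
        std::ifstream in(path_, std::ios::binary);
        Record r;
        while (in.read(reinterpret_cast<char*>(&r), sizeof r))
            records_.push_back(r);
        dirty_.assign(records_.size(), false);
    }

    std::size_t size() const { return records_.size(); }
    const Record& get(std::size_t i) const { return records_.at(i); }

    // Modify a record in memory and remember that it needs to be written.
    void set(std::size_t i, const Record& r)
    {
        records_.at(i) = r;
        dirty_[i] = true;
    }

    // Write only the dirty records back to their slots in the file.
    // Returns the number of records written.
    std::size_t flush()
    {
        std::fstream out(path_, std::ios::binary | std::ios::in | std::ios::out);
        if (!out) throw std::runtime_error("cannot open " + path_);
        std::size_t written = 0;
        for (std::size_t i = 0; i < records_.size(); ++i) {
            if (!dirty_[i]) continue;
            out.seekp(static_cast<std::streamoff>(i * sizeof(Record)), std::ios::beg);
            out.write(reinterpret_cast<const char*>(&records_[i]), sizeof(Record));
            dirty_[i] = false;
            ++written;
        }
        return written;
    }

private:
    std::string path_;
    std::vector<Record> records_;
    std::vector<bool> dirty_;
};
```

Reads and edits go through get() and set() at memory speed; the disk is touched only in flush(), so a crash loses at most the edits made since the last flush.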

Postscript: Of course, Dark Falcon's answer is correct that in a production application, you would more than likely use something like a database to handle the data. But since this appears to be for educational purposes, I think understanding the basic trade offs behind each approach is far more important.
