What Is the Most Elegant Way to Read a Text File with C++

What is the most elegant way to read a text file with C++?

There are many ways; pick whichever you find most elegant.

Reading into char*:

std::ifstream file("file.txt", std::ios::in | std::ios::binary | std::ios::ate);
if (file.is_open())
{
    file.seekg(0, std::ios::end);
    std::streamsize size = file.tellg();   // position at the end == size in bytes
    char *contents = new char[size];
    file.seekg(0, std::ios::beg);
    file.read(contents, size);
    file.close();
    //... do something with it (note: contents is not null-terminated)
    delete[] contents;
}

Into std::string:

std::ifstream in("file.txt");
std::string contents((std::istreambuf_iterator<char>(in)),
std::istreambuf_iterator<char>());

Into vector<char>:

std::ifstream in("file.txt");
std::vector<char> contents((std::istreambuf_iterator<char>(in)),
std::istreambuf_iterator<char>());

Into string, using stringstream:

std::ifstream in("file.txt");
std::stringstream buffer;
buffer << in.rdbuf();
std::string contents(buffer.str());

file.txt is just an example. Everything works for binary files as well; just make sure you pass ios::binary to the ifstream constructor.
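None of the snippets above check whether the file could be opened. As a minimal sketch, the stringstream variant could be wrapped in a helper that reports failure. The name read_file and the use of std::optional (which assumes C++17) are illustrative choices, not part of the original answer.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the file contents, or std::nullopt on failure.
std::optional<std::string> read_file(const std::string& path)
{
    // Binary mode avoids any platform-specific newline translation.
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in)
        return std::nullopt;      // could not open the file

    std::ostringstream buffer;
    buffer << in.rdbuf();         // copy the whole stream buffer
    // Inserting an empty file sets failbit (not badbit) on buffer,
    // so only badbit is treated as an error here.
    if (buffer.bad())
        return std::nullopt;

    return buffer.str();
}
```

A caller would then test the result before using it, for example `if (auto text = read_file("file.txt")) { /* use *text */ }`.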

Using fread() to read a text based file - best practices

This is not a safe method of performing fread(), since there might be an overflow if we try to read an extremely large string. Is this opinion valid?

fread() does not care about strings (null-character-terminated arrays). It reads data in multiples of unsigned char (see *1), with no special concern for the data content if the stream was opened in binary mode, and with perhaps some data processing (e.g. end-of-line, byte-order-mark) in text mode.

Are my assumptions here correct?

Failed assumptions:

  • Assuming the ftell() return value equals the sum of fread() bytes.
    The assumption can be false in text mode (as OP opened the file), and fseek() to the end is technically undefined behavior in binary mode.

  • Assuming that not checking the return value of fread() is OK. Use the return value of fread() to learn whether an error occurred, whether end-of-file was reached, and how many elements were read.

  • Assuming error checking is not required. ftell(), fread(), and fseek() (in place of rewind(), which reports no error) all deserve error checks. In particular, ftell() readily fails on streams that have no certain end.

  • Assuming no null characters are read. Reading an entire text file and appending a null character does not necessarily produce a single string. Robust code detects and/or copes with embedded null characters.

  • Multi-byte: assuming input meets the encoding requirements. Example: robust code detects (and rejects) invalid UTF8 sequences - perhaps after reading the entire file.

  • Extreme: Assuming a file length <= LONG_MAX, the max value returned from ftell(). Files may be larger.

But each byte is being read as a single char, so the text is distorted somehow? How is fread() handling reading of multi-byte chars?

fread() does not work on multi-byte character boundaries, only on multiples of unsigned char. A given fread() may end partway through a multi-byte character, and the next fread() will continue from the middle of that character.


Instead of a 2-pass approach, consider a single pass:

// Pseudo code
total_read = 0
Allocate buffer, say 4096

forever
    if buffer full
        double buffer_size (`realloc()`)
    u = unused portion of buffer
    fread u bytes into unused portion of buffer
    total_read += number_just_read
    if (number_just_read < u)
        quit loop

Resize buffer to total_read (+ 1 if appending a '\0')
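The pseudocode above could be sketched in C++ roughly as follows, with std::vector<char> standing in for the realloc()-managed buffer. The function name read_all and its bool return convention are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads everything from `fp` into `out` in one pass.
// Returns false if a read error occurred.
bool read_all(std::FILE* fp, std::vector<char>& out)
{
    std::size_t total_read = 0;
    out.resize(4096);                       // initial buffer size

    for (;;) {
        if (total_read == out.size())
            out.resize(out.size() * 2);     // buffer full: double it

        const std::size_t unused = out.size() - total_read;
        const std::size_t n = std::fread(out.data() + total_read, 1, unused, fp);
        total_read += n;

        if (n < unused)
            break;                          // short read: end-of-file or error
    }

    out.resize(total_read);                 // trim to the bytes actually read
    return std::ferror(fp) == 0;            // distinguish error from end-of-file
}
```

Because it never calls ftell() or fseek(), this approach also works on streams that cannot be sought, such as pipes.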

Alternatively, reconsider whether the entire file needs to be read in before processing the data. I do not know the higher-level goal, but processing data as it arrives often uses fewer resources and gives faster throughput.


Advanced

Text files may be simple ASCII only, defined by an 8-bit code page, or in one of various UTF encodings (byte-order-mark, etc.). The last line may or may not end with a '\n'. Robust text processing beyond simple ASCII is non-trivial.

ASCII and UTF-8 are the most common. IMO, handle one or both of those and error out on anything that does not meet their requirements.
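To illustrate what "error out on anything that does not meet their requirements" might look like for UTF-8, here is a minimal validation sketch meant to run after the whole file has been read. The name is_valid_utf8 is hypothetical, and the checks cover only well-formedness (no overlong forms, no surrogates, nothing above U+10FFFF). A byte-order-mark is treated as an ordinary character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects overlong encodings, surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t extra;       // number of continuation bytes expected
        unsigned long cp;        // code point being assembled
        unsigned long min;       // smallest code point allowed for this length

        if (b0 < 0x80)                { extra = 0; cp = b0;        min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (s.size() - i - 1 < extra)
            return false;        // sequence truncated at end of input

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;    // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;        // overlong, out of range, or surrogate

        i += extra + 1;
    }
    return true;
}
```

Pure ASCII input passes this check as well, since ASCII is a subset of UTF-8.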


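*1 fread() reads whole elements: the 2nd argument is the element size in bytes and the 3rd is the number of elements. In OP's call the element size is filesize and the count is 1, so the return value can only be 0 or 1.

//             v --- element size in bytes
//                       v --- number of elements
fread(content, filesize, 1, fp);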

Recommended ways to read certain things in a text file

If your input file structure is static, meaning the order of its lines won't change, you can use the code below instead of your //ReadLines code.

var allLines = File.ReadAllLines(path);
var dataSet = allLines.Select(line => line.Trim().Split(' ')[1]).ToArray();
// Add conditional checks on the length of dataSet and anything else you need.
var userName = dataSet[0];
var accesscode = Convert.ToInt32(dataSet[1]);
var value = Convert.ToInt32(dataSet[2]);
var email = dataSet[3];

// Then your Console.WriteLine statements here.

If you are unsure of the order, you can use a dictionary to store both parts of each split line, one as the key and the other as the value, and then print them.

How to read the content of a file to a string in C?

I tend to just load the entire buffer as a raw memory chunk into memory and do the parsing on my own. That way I have best control over what the standard lib does on multiple platforms.

This is a stub I use for this. You may also want to check the error codes for fseek, ftell and fread (omitted for clarity).

char * buffer = 0;
long length;
FILE * f = fopen (filename, "rb");

if (f)
{
    fseek (f, 0, SEEK_END);
    length = ftell (f);
    fseek (f, 0, SEEK_SET);
    /* note: the buffer is raw bytes and is not null-terminated */
    buffer = malloc (length);
    if (buffer)
    {
        fread (buffer, 1, length, f);
    }
    fclose (f);
}

if (buffer)
{
    // start to process your data / extract strings here...
}
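For comparison, here is a sketch of the same stub with the omitted error checks filled in. It is written in C++ around the C stdio calls so that std::string owns the memory. The name load_file and the bool return value are illustrative assumptions, not part of the original stub.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: loads `filename` into `out`; returns false on any failure.
bool load_file(const char* filename, std::string& out)
{
    std::FILE* f = std::fopen(filename, "rb");
    if (!f)
        return false;

    bool ok = false;
    if (std::fseek(f, 0, SEEK_END) == 0) {
        const long length = std::ftell(f);          // -1L on failure
        if (length >= 0 && std::fseek(f, 0, SEEK_SET) == 0) {
            out.resize(static_cast<std::size_t>(length));
            const std::size_t n =
                length > 0 ? std::fread(&out[0], 1, out.size(), f) : 0;
            ok = (n == out.size());                 // a short read counts as failure
        }
    }
    std::fclose(f);
    return ok;
}
```

Since std::string maintains its own terminator, the result can be passed to code that expects a C string via out.c_str(), although embedded null characters in the file would still cut such a string short.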

Fastest file reading in C

If you are willing to go beyond the C spec into OS-specific code, memory mapping is generally considered the most efficient way.

For POSIX, check out mmap; for Windows, check out CreateFileMapping and MapViewOfFile (OpenFileMapping opens a mapping object that already exists).
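As a rough POSIX-only sketch of the mmap route, the following maps a file read-only and copies the bytes out. The name read_via_mmap is hypothetical, and the copy into a std::string is there only to keep the example self-contained. Code that cares about speed would normally work on the mapped bytes in place.

```cpp
#include <cstddef>
#include <string>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

// Hypothetical helper: copies the file contents out of a read-only mapping.
// Returns false on failure. POSIX only.
bool read_via_mmap(const char* path, std::string& out)
{
    const int fd = open(path, O_RDONLY);
    if (fd == -1)
        return false;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return false;
    }

    out.clear();
    bool ok = true;
    if (st.st_size > 0) {   // mmap() rejects a zero-length mapping
        const std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            ok = false;
        } else {
            out.assign(static_cast<const char*>(p), len);
            munmap(p, len);
        }
    }
    close(fd);
    return ok;
}
```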

In C, how should I read a text file and print all strings

The simplest way is to read a character, and print it right after reading:

int c;
FILE *file;
file = fopen("test.txt", "r");
if (file) {
    while ((c = getc(file)) != EOF)
        putchar(c);
    fclose(file);
}

c is int above, since EOF is a negative number, and a plain char may be unsigned.

If you want to read the file in chunks, but without dynamic memory allocation, you can do:

#define CHUNK 1024 /* read 1024 bytes at a time */
char buf[CHUNK];
FILE *file;
size_t nread;

file = fopen("test.txt", "r");
if (file) {
    while ((nread = fread(buf, 1, sizeof buf, file)) > 0)
        fwrite(buf, 1, nread, stdout);
    if (ferror(file)) {
        /* deal with error */
    }
    fclose(file);
}

The second method above is essentially how you will read a file with a dynamically allocated array:

char *buf = malloc(chunk);

if (buf == NULL) {
    /* deal with malloc() failure */
}

/* otherwise do this. Note 'chunk' instead of 'sizeof buf' */
while ((nread = fread(buf, 1, chunk, file)) > 0) {
    /* as above */
}

Your method of fscanf() with %s as format loses information about whitespace in the file, so it is not exactly copying a file to stdout.

Reading from a text file, best way to store data C++

I would be writing something like this if I were you. Note, this is just prototype code, and it was not even tested.

The fundamental idea is to read each line in two steps, with different delimiters: first up to the tab delimiter, and then up to the default line end.

You need to make sure to quit the loop gracefully when there is nothing more to read, hence the breaks, although the first alone would be enough if your file is "correct".

You will also need to convert to the type that your vector class expects. I assumed here that it is int; if it is string, you do not need the conversion I have put in place.

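#include <fstream>
#include <string>

using namespace std;

void yourFunction()
{
    ..
    ifstream myfile("myfile.txt");
    string xword, yword;
    while (1) {
        if (!getline(myfile, xword, '\t'))
            break;
        if (!getline(myfile, yword))
            break;
        // Assumes myVector holds elements constructible from two ints,
        // e.g. std::vector<std::pair<int, int>>; push_back() takes only one argument.
        myVector.emplace_back(stoi(xword), stoi(yword));
    }
    ...
}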
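Since the prototype above leaves myVector undeclared, here is a self-contained sketch of the same two-delimiter idea. It assumes the data is stored as std::vector<std::pair<int, int>> and that every field is a valid integer. The function name read_pairs is made up for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parses "x<TAB>y" lines into pairs of ints.
// Assumes every field is a valid integer; std::stoi throws otherwise.
std::vector<std::pair<int, int>> read_pairs(std::istream& in)
{
    std::vector<std::pair<int, int>> result;
    std::string xword, yword;
    // Read up to the tab first, then up to the end of the line.
    while (std::getline(in, xword, '\t') && std::getline(in, yword))
        result.emplace_back(std::stoi(xword), std::stoi(yword));
    return result;
}
```

Taking a std::istream& means the same function works with a std::ifstream opened on myfile.txt or with a std::istringstream in a test.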

