Why Is Reading Lines from Stdin Much Slower in C++ Than Python

Why is reading lines from stdin much slower in C++ than Python?

tl;dr: Because of different default settings in C++ requiring more system calls.

By default, cin is synchronized with stdio, which causes it to avoid any input buffering. If you add this to the top of your main, you should see much better performance:

std::ios_base::sync_with_stdio(false);

Normally, when an input stream is buffered, instead of reading one character at a time, the stream will be read in larger chunks. This reduces the number of system calls, which are typically relatively expensive. However, since the FILE* based stdio and iostreams often have separate implementations and therefore separate buffers, this could lead to a problem if both were used together. For example:

int myvalue1;
cin >> myvalue1;
int myvalue2;
scanf("%d", &myvalue2);

If more input was read by cin than it actually needed, then the second integer value wouldn't be available for the scanf function, which has its own independent buffer. This would lead to unexpected results.

To avoid this, by default, streams are synchronized with stdio. One common way to achieve this is to have cin read each character one at a time as needed using stdio functions. Unfortunately, this introduces a lot of overhead. For small amounts of input, this isn't a big problem, but when you are reading millions of lines, the performance penalty is significant.

Fortunately, the library designers decided that you should also be able to disable this feature to get improved performance if you know what you are doing, so they provided the sync_with_stdio function. From its documentation (emphasis added):

If the synchronization is turned off, the C++ standard streams are allowed to buffer their I/O independently, which may be considerably faster in some cases.
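
For concreteness, here is a minimal sketch of the kind of fast line-reading loop this enables. The cin.tie(NULL) call is a common companion tweak: it stops cin from flushing cout before every read, and is not strictly required for the speedup described above.

#include <iostream>
#include <string>

int main()
{
    std::ios_base::sync_with_stdio(false); // let iostreams buffer independently of stdio
    std::cin.tie(NULL);                    // don't flush std::cout before each read from std::cin

    std::string line;
    long count = 0;
    while (std::getline(std::cin, line))   // getline returns the stream, so the loop ends cleanly at EOF
        ++count;

    std::cerr << "Saw " << count << " lines\n";
    return 0;
}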

Why is splitting a string slower in C++ than Python?

As a guess, Python strings are reference-counted immutable objects, so no string data is copied around in the Python code, while C++ std::string is a mutable value type that is copied at the smallest opportunity.

If the goal is fast splitting, then one would use constant time substring operations, which means only referring to parts of the original string, as in Python (and Java, and C#…).

The C++ std::string class has one redeeming feature, though: it is standard, so it can be used to pass strings around safely and portably where efficiency is not a main consideration. But enough chat. Code -- and on my machine this is of course faster than Python, since Python's string handling is implemented in C, which is a subset of C++ (he he):

#include <iostream>
#include <string>
#include <sstream>
#include <time.h>
#include <vector>

using namespace std;

class StringRef
{
private:
    char const*     begin_;
    int             size_;

public:
    int size() const { return size_; }
    char const* begin() const { return begin_; }
    char const* end() const { return begin_ + size_; }

    StringRef( char const* const begin, int const size )
        : begin_( begin )
        , size_( size )
    {}
};

vector<StringRef> split3( string const& str, char delimiter = ' ' )
{
    vector<StringRef> result;

    enum State { inSpace, inToken };

    State state = inSpace;
    char const* pTokenBegin = 0;    // Init to satisfy compiler.
    for( auto it = str.begin(); it != str.end(); ++it )
    {
        State const newState = (*it == delimiter? inSpace : inToken);
        if( newState != state )
        {
            switch( newState )
            {
            case inSpace:
                result.push_back( StringRef( pTokenBegin, &*it - pTokenBegin ) );
                break;
            case inToken:
                pTokenBegin = &*it;
            }
        }
        state = newState;
    }
    if( state == inToken )
    {
        // str.data() + str.size() points one past the last character,
        // without dereferencing the end() iterator.
        result.push_back( StringRef( pTokenBegin, str.data() + str.size() - pTokenBegin ) );
    }
    return result;
}

int main() {
    string input_line;
    vector<string> spline;
    long count = 0;
    int sec, lps;
    time_t start = time(NULL);

    cin.sync_with_stdio(false); //disable synchronous IO

    while(cin) {
        getline(cin, input_line);
        //spline.clear(); //empty the vector for the next line to parse

        //I'm trying one of the two implementations, per compilation, obviously:
        // split1(spline, input_line);
        //split2(spline, input_line);

        vector<StringRef> const v = split3( input_line );
        count++;
    }

    count--; //subtract for final over-read
    sec = (int) time(NULL) - start;
    cerr << "C++ : Saw " << count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = count / sec;
        cerr << " Crunch speed: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

//compiled with: g++ -Wall -O3 -o split1 split_1.cpp -std=c++0x

Disclaimer: I hope there aren't any bugs. I haven't tested the functionality, only checked the speed. But I think that even if there is a bug or two, correcting it won't significantly affect the speed.
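
As an aside (not part of the answer above): in C++17 and later, std::string_view expresses the same non-owning idea as StringRef directly. Here is a minimal sketch along those lines; the name split_view is just for illustration, and the returned views are only valid while the original string is alive and unmodified.

#include <cstddef>
#include <string_view>
#include <vector>

// Split on a single delimiter, skipping empty tokens. No character data is
// copied: each string_view refers into the caller's original string.
std::vector<std::string_view> split_view( std::string_view str, char delimiter = ' ' )
{
    std::vector<std::string_view> result;
    std::size_t pos = 0;
    while( pos < str.size() )
    {
        // Skip consecutive delimiters.
        while( pos < str.size() && str[pos] == delimiter ) ++pos;
        std::size_t const tokenBegin = pos;
        // Advance to the end of the current token.
        while( pos < str.size() && str[pos] != delimiter ) ++pos;
        if( pos > tokenBegin )
            result.push_back( str.substr( tokenBegin, pos - tokenBegin ) );
    }
    return result;
}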

In Python, why is reading from an array slower than reading from a list?

It takes time to wrap a raw integer into a Python int. An array stores raw machine-level values compactly, so every element access has to box the value into a new Python int object, whereas a list already holds references to ready-made Python objects.

Python faster than C++? How does this happen?

There isn't anything obvious here. Since Python's written in C, it must use something like printf to implement print. C++ I/O Streams, like cout, are usually implemented in a way that's much slower than printf. If you want to put C++ on a better footing, you can try changing to:

#include <cstdio>

int main()
{
    int x = 0;
    while (x != 1000000)
    {
        ++x;
        std::printf("%d\n", x);
    }
    return 0;
}

I did change to using ++x instead of x++. Years ago people thought that this was a worthwhile "optimization." I will have a heart attack if that change makes any difference in your program's performance (OTOH, I am positive that using std::printf will make a huge difference in runtime performance). Instead, I made the change simply because you aren't paying attention to what the value of x was before you incremented it, so I think it's useful to say that in code.
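
If you want to keep iostreams rather than switch to std::printf, here is a sketch of the usual mitigations (assuming output speed is all you care about here): turn off stdio synchronization, as discussed earlier, and write '\n' instead of std::endl, since std::endl forces a flush on every line.

#include <iostream>

int main()
{
    std::ios_base::sync_with_stdio(false); // buffer iostreams independently of stdio

    int x = 0;
    while (x != 1000000)
    {
        ++x;
        std::cout << x << '\n';            // '\n' does not flush; std::endl would
    }
    return 0;
}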

C++ is much slower than Python in OpenCV

I figured out what the problem was: I had customized my OpenCV build for C++ to take advantage of the CUDA cores on my Jetson Orin, while Python was using the general OpenCV build installed in another directory, which doesn't have CUDA support. When I switched the C++ side to the general build, it ran as fast as expected; in my customized build I had also customized the CPU parallelization, which seems to be slower than the default.
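
A quick way to confirm this kind of build mismatch is to ask OpenCV itself which configuration is loaded. The sketch below assumes an OpenCV 3.x/4.x C++ build on the include and link path; the Python side has the analogous cv2.getBuildInformation().

#include <iostream>
#include <opencv2/core/utility.hpp>  // cv::getBuildInformation
#include <opencv2/core/cuda.hpp>     // cv::cuda::getCudaEnabledDeviceCount

int main()
{
    // Print the build configuration (look for the CUDA section) and the
    // number of CUDA devices this particular build can actually use.
    std::cout << cv::getBuildInformation() << std::endl;
    std::cout << "CUDA devices: " << cv::cuda::getCudaEnabledDeviceCount() << std::endl;
    return 0;
}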


