Why is std::cout so time consuming?
Every insertion into std::cout ultimately results in the operating system being invoked. If you want something to compute fast, you have to make sure that no external entities are involved in the computation, especially entities that have been written with versatility rather than performance in mind, like the operating system.
Want it to run faster? You have a few options:

- Replace << std::endl; with << '\n'. This refrains from flushing the internal buffer of the C++ runtime to the operating system on every single line, and should yield a huge performance improvement.
- Use std::ios::sync_with_stdio(false);, as user Galik suggests in a comment.
- Collect as much of your outgoing text as possible in a buffer, and output the entire buffer at once with a single call.
- Write your output to a file instead of the console, and then keep that file displayed by a separate application such as Notepad++, which can track changes and keep scrolling to the bottom.
As for why it is so "time consuming" (in other words, slow): the primary purpose of std::cout (and ultimately of the operating system's standard output stream) is versatility, not performance. Think about it: std::cout is a C++ library facility which invokes the operating system; the operating system determines that the file being written to is not really a file but the console, so it sends the data to the console subsystem; the console subsystem receives the data and starts invoking the graphics subsystem to render the text in the console window; the graphics subsystem draws font glyphs on a raster display, and while rendering the data there is scrolling of the console window, which involves copying large amounts of video RAM. That's an awful lot of work, even if the graphics card takes care of some of it in hardware.
As for the C# version, I am not sure exactly what is going on, but something quite different is probably happening: in C# you are not invoking Console.Out.Flush(), so your output is cached and you are not suffering the overhead incurred by C++'s std::cout << std::endl, which causes each line to be flushed to the operating system. However, when the buffer does become full, C# must flush it to the operating system, and then it is hit not only by the overhead represented by the operating system, but also by the formidable managed-to-native and native-to-managed transition that is inherent in the way its virtual machine works.
C++: Does a cout statement make code slower?
As already mentioned, writing to the terminal is almost definitely going to be slower. Why?

- Depending on your OS, std::cout may use line buffering, which means each line may be sent to the terminal program separately. When you use std::endl rather than '\n', it definitely flushes the buffer. Writing the data in smaller chunks means extra system calls and rendering efforts that slow things down significantly.
- Some operating systems / compilers are even slower. For example, Visual C++: https://connect.microsoft.com/VisualStudio/feedback/details/642876/std-wcout-is-ten-times-slower-than-wprintf-performance-bug-in-c-library
- Terminals displaying output need to make calls to wipe out existing screen content, render the fonts, update the scroll bar, and copy the lines into the history/buffer. Especially when they receive new content in small pieces, they can't reliably guess how much longer they'd have to wait for more, and are likely to try to update the screen for the little bit they've received: that's costly, and a reason why excessive flushing or unbuffered output is slow.
- Some terminals offer the option of "jump scrolling", which means that if they find they're, say, 10 pages behind, they immediately render the last page and the earlier 9 pages of content never appear on the screen: that can be nice and fast. Still, jump scrolling is not always used or wanted, as it means output is never presented to the end user's eyes: perhaps the program is meant to print a huge red error message in some case - with jump scrolling there wouldn't even be a flicker of it to catch the user's attention, but without jump scrolling you'd probably notice it.
- When I worked for Bloomberg we had a constant stream of log file updates occupying several monitors - at times the displayed output would get several minutes behind; a switch from the default Solaris xterm to rxvt ensured it always kept pace.
- Redirecting output to /dev/null is a good way to see how much your particular terminal is slowing things down.
printf more than 5 times faster than std::cout?
For a true apples-to-apples comparison, re-write your test so that the only thing changing between the test cases is the print function being used:
#include <cstdio>    // std::printf, std::fflush
#include <ctime>     // std::clock
#include <iostream>

int main(int argc, char* argv[])
{
    const char* teststring = "Test output string\n";
    std::clock_t start;
    double duration;

    std::cout << "Starting std::cout test." << std::endl;
    start = std::clock();
    for (int i = 0; i < 1000; i++)
        std::cout << teststring;
    /* Display timing results, code trimmed for brevity */

    for (int i = 0; i < 1000; i++) {
        std::printf("%s", teststring);  // pass via "%s" so the string is not treated as a format
        std::fflush(stdout);
    }
    /* Display timing results, code trimmed for brevity */

    return 0;
}
With that, you will be testing nothing but the difference between the printf and cout function calls. You won't incur any differences due to multiple << calls, etc. If you try this, I suspect you'll get a much different result.
C++ cout printing slowly
NOTE: These experimental results are valid for MSVC. With other library implementations the results will vary.

printf can be (much) faster than cout. Although printf parses its format string at runtime, it requires far fewer function calls and far fewer instructions to do the same job as cout. Here is a summary of my experimentation:
The number of static instructions
In general, cout generates a lot more code than printf. Say we have the following cout code to print with some formats:
os << setw(width) << dec << "0x" << hex << addr << ": " << rtnname <<
": " << srccode << "(" << dec << lineno << ")" << endl;
On a VC++ compiler with optimizations, it generates around 188 bytes of code. But when you replace it with printf-based code, only 42 bytes are required.
The number of dynamically executed instructions

The static instruction count only tells us the difference in static binary code. What matters more is the actual number of instructions executed at runtime. I also ran a simple experiment:
Test code:
#include <cstdio>
#include <iomanip>
#include <iostream>
#include <windows.h>   // DWORD, GetTickCount
using namespace std;

int a = 1999;
char b = 'a';
unsigned int c = 4200000000U;
long long int d = 987654321098765LL;
long long unsigned int e = 1234567890123456789ULL;
float f = 3123.4578f;
double g = 3.141592654;
void Test1()
{
    cout << "a:" << a << "\n"
         << "a:" << setfill('0') << setw(8) << a << "\n"
         << "b:" << b << "\n"
         << "c:" << c << "\n"
         << "d:" << d << "\n"
         << "e:" << e << "\n"
         << "f:" << setprecision(6) << f << "\n"
         << "g:" << setprecision(10) << g << endl;
}
void Test2()
{
    fprintf(stdout,
            "a:%d\n"
            "a:%08d\n"
            "b:%c\n"
            "c:%u\n"
            "d:%I64d\n"
            "e:%I64u\n"
            "f:%.2f\n"
            "g:%.9lf\n",
            a, a, b, c, d, e, f, g);
    fflush(stdout);
}

int main()
{
    DWORD A, B;
    DWORD start = GetTickCount();
    for (int i = 0; i < 10000; ++i)
        Test1();
    A = GetTickCount() - start;

    start = GetTickCount();
    for (int i = 0; i < 10000; ++i)
        Test2();
    B = GetTickCount() - start;

    cerr << A << endl;
    cerr << B << endl;
    return 0;
}
Here is the result of Test1 (cout):
- # of executed instruction: 423,234,439
- # of memory loads/stores: approx. 320,000 and 980,000
- Elapsed time: 52 seconds
Then, what about printf
? This is the result of Test2:
- # of executed instruction: 164,800,800
- # of memory loads/stores: approx. 70,000 and 180,000
- Elapsed time: 13 seconds
On this machine and compiler, printf was much faster than cout. Both the number of executed instructions and the number of loads/stores (an indicator of cache misses) differ by a factor of 3~4.
I know this is an extreme case. I should also note that cout is much easier when you're handling 32/64-bit data and need 32/64-bit platform independence. There is always a trade-off. I use cout when the type handling is very tricky.
Okay, cout in MSVS just sucks :)
C++ Performance with/without cout
Most of this is a simple matter of optimization: as long as you don't produce any output from the loops, the compiler determines that the loops are basically dead code, and simply doesn't execute them at all. It has to do a little bit of pre-computation to determine the value that vector[5].y would have following the loops, but it can do that entirely at compile time, so at run-time it's basically just printing out a fixed number.
When you produce visible output inside the loops, the compiler can't just eliminate executing them, so the code runs dramatically slower.