Why is printing to stdout so slow? Can it be sped up?
Thanks for all the comments! I've ended up answering it myself with your help. It feels dirty answering your own question, though.
Question 1: Why is printing to stdout slow?
Answer: Printing to stdout is not inherently slow. It is the terminal you work with that is slow. And it has pretty much nothing to do with I/O buffering on the application side (e.g. Python file buffering). See below.
Question 2: Can it be sped up?
Answer: Yes, it can, but seemingly not from the program side (the side doing the 'printing' to stdout). To speed it up, use a different, faster terminal emulator.
Explanation...
I tried a self-described 'lightweight' terminal program called wterm and got significantly better results. Below is the output of my test script (at the bottom of the question) when running in wterm at 1920x1200 on the same system where the basic print option took 12s using gnome-terminal:
-----
timing summary (100k lines each)
-----
print : 0.261 s
write to file (+fsync) : 0.110 s
print with stdout = /dev/null : 0.050 s
0.26s is MUCH better than 12s! I don't know whether wterm is more intelligent about how it renders to screen along the lines of how I was suggesting (render the 'visible' tail at a reasonable frame rate), or whether it just "does less" than gnome-terminal. For the purposes of my question I've got the answer, though. gnome-terminal is slow.
So - If you have a long running script that you feel is slow and it spews massive amounts of text to stdout... try a different terminal and see if it is any better!
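To try this comparison on your own terminal, a minimal timing sketch along these lines may help. It is not the original test script (which is not reproduced here); the helper name time_prints and the choice of sinks are assumptions made for illustration.

```python
import io
import os
import sys
import time
from contextlib import redirect_stdout


def time_prints(stream, lines=100_000):
    """Return the seconds taken to print `lines` short lines to `stream`."""
    start = time.perf_counter()
    with redirect_stdout(stream):
        for i in range(lines):
            print("line", i)
        stream.flush()
    return time.perf_counter() - start


if __name__ == "__main__":
    # The first measurement depends on the terminal emulator you run this in.
    results = {"print to terminal": time_prints(sys.stdout)}
    with open(os.devnull, "w") as sink:
        results["print to os.devnull"] = time_prints(sink)
    results["print to in-memory buffer"] = time_prints(io.StringIO())
    # Report on stderr so the summary is not part of the measured stream.
    for label, seconds in results.items():
        print(f"{label:28s}: {seconds:.3f} s", file=sys.stderr)
```

Run it once in each terminal emulator you want to compare; only the first number should change much between them.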
Note that I pretty much randomly pulled wterm from the Ubuntu/Debian repositories. This link might be the same terminal, but I'm not sure. I did not test any other terminal emulators.
Update: Because I had to scratch the itch, I tested a whole pile of other terminal emulators with the same script and full screen (1920x1200). My manually collected stats are here:
wterm 0.3s
aterm 0.3s
rxvt 0.3s
mrxvt 0.4s
konsole 0.6s
yakuake 0.7s
lxterminal 7s
xterm 9s
gnome-terminal 12s
xfce4-terminal 12s
vala-terminal 18s
xvt 48s
The recorded times are manually collected, but they were pretty consistent. I recorded the best(ish) value. YMMV, obviously.
As a bonus, it was an interesting tour of some of the various terminal emulators available out there! I'm amazed my first 'alternate' test turned out to be the best of the bunch.
C++ does printing to terminal significantly slow down code?
Yes, rendering to screen takes longer than writing to file.
On Windows it's even slower, because the program doing the rendering is not the program that is running, so messages are constantly sent between processes to get the output drawn.
I guess it's the same on Linux, since the virtual terminal runs in a different process from the one producing the output.
Printing to the console vs writing to a file (speed)
Writing to a file would be much faster. This is especially true since you are flushing the buffer after every line with endl.
On a side note, you could speed up the printing significantly by repeating cout << "text!\n"; 5000 times and then flushing the buffer once with flush().
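The same contrast can be sketched in Python, used here only as an illustration of the idea rather than a translation of the C++ code. The function names write_flushing_each_line and write_batched are hypothetical.

```python
import sys


def write_flushing_each_line(stream, lines):
    """Write each line and flush immediately (comparable to ending lines with endl)."""
    for line in lines:
        stream.write(line + "\n")
        stream.flush()


def write_batched(stream, lines):
    """Build the whole output first, then write and flush once."""
    stream.write("".join(line + "\n" for line in lines))
    stream.flush()


if __name__ == "__main__":
    write_batched(sys.stdout, ["text!"] * 5000)
```

Both functions produce identical output; the batched version simply hands the data to the stream in one call instead of 5000.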
Slow print waiting too long before printing
You never flush stdout. Without flush=True in the print call, the characters just sit in the output buffer until it is flushed (for example when a newline is written to a terminal or when the program exits), so everything appears at once.
import time

def print_slow(text):
    for letter in text:
        # flush=True pushes each character to the terminal immediately
        print(letter, end='', flush=True)
        time.sleep(.4)

print_slow("junk")
How to speed up printf in C
Below is a slightly unoptimized implementation (although I skipped the intermediate list and print directly) of what I think you were supposed to do. Running that program on an AMD A8-6600K under a light load (mainly a YouTube music video for some personal entertainment) results in
real 0m1.211s
user 0m0.047s
sys 0m0.122s
averaged over a couple of runs. So the problem lies in your implementation of the sieve or you are hiding some essential facts about your hardware.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include <limits.h>
#include <string.h>
/* I call it a general bitset. Others might call it an abomination. YMMV. */
# define ERAT_BITS (sizeof(uint32_t)*CHAR_BIT)
/* UINT32_C(1) keeps the shift unsigned; a plain 1 << 31 overflows a signed int */
# define GET_BIT(s,n) ((*((s)+((n)/ERAT_BITS)) & (UINT32_C(1)<<((n) % ERAT_BITS))) != 0)
# define SET_BIT(s,n) (*((s)+((n)/ERAT_BITS)) |= (UINT32_C(1)<<((n) % ERAT_BITS)))
# define CLEAR_BIT(s,n) (*((s)+((n)/ERAT_BITS)) &= ~(UINT32_C(1)<<((n) % ERAT_BITS)))
# define TOG_BIT(s,n) (*((s)+((n)/ERAT_BITS)) ^= (UINT32_C(1)<<((n) % ERAT_BITS)))
/* size is the size in bits, the overall size might be bigger */
typedef struct mp_bitset_t {
uint32_t size;
uint32_t *content;
} mp_bitset_t;
# define mp_bitset_alloc(bst, n) \
do {\
(bst)->content=malloc(( n /(sizeof(uint32_t)) + 1 ));\
if ((bst)->content == NULL) {\
fprintf(stderr, "memory allocation for bitset failed");\
exit(EXIT_FAILURE);\
}\
(bst)->size = n;\
} while (0)
# define mp_bitset_size(bst) ((bst)->size)
# define mp_bitset_setall(bst) memset((bst)->content,~(uint32_t)(0),\
(bst->size /(sizeof(uint32_t) ) +1 ))
# define mp_bitset_clearall(bst) memset((bst)->content,0,\
(bst->size /(sizeof(uint32_t) ) +1 ))
# define mp_bitset_clear(bst,n) CLEAR_BIT((bst)->content, n)
# define mp_bitset_set(bst,n) SET_BIT((bst)->content, n)
# define mp_bitset_get(bst,n) GET_BIT((bst)->content, n)
# define mp_bitset_free(bst) \
do {\
free((bst)->content);\
free(bst);\
} while (0)
uint32_t mp_bitset_nextset(mp_bitset_t * bst, uint32_t n);
uint32_t mp_bitset_prevset(mp_bitset_t * bst, uint32_t n);
void mp_eratosthenes(mp_bitset_t * bst);
/* It's called Hallek's method but it has many inventors*/
static uint32_t isqrt(uint32_t n)
{
uint32_t s, rem, root;
if (n < 1)
return 0;
/* This is actually the highest square but it goes
* downward from this, quite fast */
s = 1 << 30;
rem = n;
root = 0;
while (s > 0) {
if (rem >= (s | root)) {
rem -= (s | root);
root >>= 1;
root |= s;
} else {
root >>= 1;
}
s >>= 2;
}
return root;
}
uint32_t mp_bitset_nextset(mp_bitset_t *bst, uint32_t n)
{
while ((n < mp_bitset_size(bst)) && (!mp_bitset_get(bst, n))) {
n++;
}
return n;
}
/*
* Standard method, quite antique now, but good enough for the handful
* of primes needed here.
*/
void mp_eratosthenes(mp_bitset_t *bst)
{
uint32_t n, k, r, j;
mp_bitset_setall(bst);
mp_bitset_clear(bst, 0);
mp_bitset_clear(bst, 1);
n = mp_bitset_size(bst);
r = isqrt(n);
for (k = 4; k < n; k += 2)
mp_bitset_clear(bst, k);
k = 0;
while ((k = mp_bitset_nextset(bst, k + 1)) < n) {
if (k > r) {
break;
}
for (j = k * k; j < n; j += k * 2) {
mp_bitset_clear(bst, j);
}
}
}
#define UPPER_LIMIT 1000000 /* one million */
int main(void) {
mp_bitset_t *bst;
uint32_t n, k, j;
bst = malloc(sizeof(mp_bitset_t));
if(bst == NULL) {
fprintf(stderr, "failed to allocate %zu bytes\n",sizeof(mp_bitset_t));
exit(EXIT_FAILURE);
}
mp_bitset_alloc(bst, UPPER_LIMIT);
mp_bitset_setall(bst);
mp_bitset_clear(bst, 0); // 0 is not prime b.d.
mp_bitset_clear(bst, 1); // 1 is not prime b.d.
n = mp_bitset_size(bst);
for (k = 4; k < n; k += 2) {
mp_bitset_clear(bst, k);
}
k = 0;
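while ((k = mp_bitset_nextset(bst, k + 1)) < n) {
printf("%" PRIu32 "\n", k);
/* k * k wraps around in uint32_t once k > 65535, so only sieve while k * k <= n */
if (k <= n / k) {
for (j = k * k; j < n; j += k * 2) {
mp_bitset_clear(bst, j);
}
}
}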
mp_bitset_free(bst);
return EXIT_SUCCESS;
}
Compiled with
gcc-4.9 -O3 -g3 -W -Wall -Wextra -Wuninitialized -Wstrict-aliasing -pedantic -std=c11 tests.c -o tests
(GCC is gcc-4.9.real (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4)
R writing to stdout very slow. Any ways to improve?
Binary reads are fast. Printing to stdout is slow for two reasons:
- formatting
- actual printing
You can benchmark or profile either. But if you really want to be "fast", stay away from formatting when printing lots of data.
Compiled code can help make the conversion faster. But again, the fastest solution will be to
- remain with binary
- not write to stdout or a file (but use, e.g., something like Redis).
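The question is about R, but the formatting-versus-binary distinction is language independent. As a rough sketch in Python (the function names are hypothetical and the standard array module stands in for whatever binary representation you already have), the two paths look like this:

```python
import array


def write_text(stream, values):
    """Format every value as decimal text, one per line, then write the bytes."""
    stream.write("".join(f"{v}\n" for v in values).encode("ascii"))


def write_binary(stream, values):
    """Write the raw doubles directly, with no per-value formatting step."""
    stream.write(array.array("d", values).tobytes())


def read_binary(data):
    """Inverse of write_binary: rebuild the list of floats from raw bytes."""
    out = array.array("d")
    out.frombytes(data)
    return out.tolist()
```

Timing write_text against write_binary on a large list, with the same byte stream as the target, gives a feel for how much of the cost is formatting rather than the write itself.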