Setting Smaller Buffer Size for Sys.Stdin

Setting smaller buffer size for sys.stdin?

You can completely remove buffering from stdin/stdout by using Python's -u flag (this applies to Python 2; in Python 3, -u no longer affects stdin):

-u     : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
         see man page for details on internal buffering relating to '-u'

and the man page clarifies:

   -u     Force stdin, stdout and stderr to be totally unbuffered. On
          systems where it matters, also put stdin, stdout and stderr in
          binary mode. Note that there is internal buffering in
          xreadlines(), readlines() and file-object iterators ("for line in
          sys.stdin") which is not influenced by this option. To work
          around this, you will want to use "sys.stdin.readline()" inside
          a "while 1:" loop.
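If you need reads that bypass Python's file-object buffering altogether, one option (not something the man page mentions) is to call os.read on the file descriptor directly. A minimal sketch; the helper name read_line_unbuffered is hypothetical, and reading one byte per system call is slow, so treat it as an illustration rather than a drop-in:

```python
import os


def read_line_unbuffered(fd):
    """Read one line from file descriptor `fd` using os.read.

    os.read bypasses Python's file-object buffering, so nothing past the
    newline is consumed from the descriptor. Returns b"" at EOF.
    """
    chunks = []
    while True:
        b = os.read(fd, 1)  # one byte per system call
        if not b:  # EOF
            break
        chunks.append(b)
        if b == b"\n":
            break
    return b"".join(chunks)
```

Usage: read_line_unbuffered(sys.stdin.fileno()) returns bytes; decode them yourself if you need text.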

Beyond this, altering the buffering for an existing file is not supported, but you can make a new file object with the same underlying file descriptor as an existing one, and possibly different buffering, using os.fdopen. For example:

import os
import sys
newin = os.fdopen(sys.stdin.fileno(), 'r', 100)

should bind the name newin to a file object that reads the same FD as standard input, but buffered by only about 100 bytes at a time (and you could continue with sys.stdin = newin to use the new file object as standard input from there onwards). I say "should" because this area used to have a number of bugs and issues on some platforms (it's pretty hard functionality to provide cross-platform with full generality) -- I'm not sure what its state is now, but I'd definitely recommend thorough testing on all platforms of interest to ensure that everything goes smoothly. (-u, which removes buffering entirely, should cause fewer problems across platforms, if that meets your requirements.)
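In Python 3, os.fdopen is an alias of the built-in open(), so the same idea can be written with open() directly. A minimal sketch, assuming a binary reader is what you want; reopen_with_buffer is a hypothetical helper name. Passing closefd=False means closing the new object does not close the descriptor that sys.stdin still uses:

```python
def reopen_with_buffer(fd, size):
    """Return a new binary reader over `fd` with a `size`-byte buffer.

    closefd=False: closing the returned object leaves `fd` open.
    """
    return open(fd, "rb", buffering=size, closefd=False)
```

Usage: newin = reopen_with_buffer(sys.stdin.fileno(), 100). Wrap it in io.TextIOWrapper if you need text. Keep in mind that any bytes the new object reads ahead into its buffer are no longer visible through the old sys.stdin, which is why a small buffer size matters here.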

Disable buffering of sys.stdin in Python 3

The trick is to use tty.setcbreak(sys.stdin.fileno(), termios.TCSANOW), after first saving the terminal attributes with termios.tcgetattr so the default behavior can be restored afterwards. With cbreak mode set, sys.stdin.read(1) returns as soon as a single character is available, without waiting for Enter. Because cbreak mode also turns off echo, it suppresses the display of the terminal's ANSI control-code response.

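import re
import sys
import termios
import tty


def getpos():
    """Ask the terminal for the cursor position; return (row, col) or None."""
    buf = ""
    stdin = sys.stdin.fileno()
    # Save the current terminal attributes so they can be restored afterwards.
    tattr = termios.tcgetattr(stdin)

    try:
        # cbreak mode: no line buffering and no echo.
        tty.setcbreak(stdin, termios.TCSANOW)
        # ANSI "Device Status Report"; the terminal replies with ESC[row;colR
        sys.stdout.write("\x1b[6n")
        sys.stdout.flush()

        while True:
            ch = sys.stdin.read(1)
            if not ch:  # EOF: stop instead of looping forever
                break
            buf += ch
            if ch == "R":
                break

    finally:
        termios.tcsetattr(stdin, termios.TCSANOW, tattr)

    # Parse the reply. But what if a keystroke appears while reading from
    # stdin? As a dirty workaround, getpos() returns None if parsing fails.
    matches = re.match(r"^\x1b\[(\d+);(\d+)R", buf)
    if matches is None:
        return None

    return (int(matches.group(1)), int(matches.group(2)))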
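The save, switch to cbreak, restore pattern used in getpos() can be factored into a reusable context manager. A minimal sketch, assuming a Unix-like system with the termios and tty modules; the name cbreak_mode is hypothetical:

```python
import contextlib
import termios
import tty


@contextlib.contextmanager
def cbreak_mode(fd):
    """Temporarily put the terminal on `fd` into cbreak mode.

    The original attributes are saved with termios.tcgetattr and restored
    on exit, even if the body raises.
    """
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd, termios.TCSANOW)
        yield
    finally:
        termios.tcsetattr(fd, termios.TCSANOW, saved)
```

Usage: inside "with cbreak_mode(sys.stdin.fileno()):", a call to sys.stdin.read(1) returns after a single keypress.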

Python 3 on Windows: extend stdin.readline() line buffer size

This is a bug: https://bugs.python.org/issue41849

sys.stdin.readline() has a 512-character buffer, whereas input() has a 16K-character buffer.

So currently input() can be used as a workaround.
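A minimal sketch of the input() workaround, assuming you want to collect every line until end of input; read_lines_with_input is a hypothetical name, and I have not measured the buffer limits myself. input() strips the trailing newline and raises EOFError at end of input:

```python
def read_lines_with_input():
    """Collect lines from standard input via input() until EOF."""
    lines = []
    while True:
        try:
            lines.append(input())  # newline already stripped
        except EOFError:
            break
    return lines
```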

Win32 buffer size for read from stdin

You don’t say which version of the runtime and OS you use, but I cannot reproduce this problem with MSVC 19.16.27031.1 on Windows 10. There are a few documented reasons it might fail. From the MSDN documentation of ReadFile:

Characters can be read from the console input buffer by using ReadFile with a handle to console input. The console mode determines the exact behavior of the ReadFile function. By default, the console mode is ENABLE_LINE_INPUT, which indicates that ReadFile should read until it reaches a carriage return. If you press Ctrl+C, the call succeeds, but GetLastError returns ERROR_OPERATION_ABORTED. For more information, see CreateFile.

There’s another way you could be getting this error, relating to asynchronous I/O, but that does not seem to be the problem here. You probably want to turn off the ENABLE_LINE_INPUT flag with SetConsoleMode. The documentation also says the call could fail with ERROR_NOT_ENOUGH_QUOTA if the memory pages of the buffer cannot be locked. However, you use a static buffer that should not have this problem.

If you’re reading a file on disk, and not a console stream, you might map it to memory, which eliminates any intermediate buffering and loads the sections of files as needed, by the same mechanism as virtual memory.
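The memory-mapping idea is not Win32-specific. As an illustration in Python (the language of the other examples here, not the asker's C code), the standard mmap module maps a file read-only so the OS pages it in on demand; count_newlines_mapped is a hypothetical helper:

```python
import mmap
import os


def count_newlines_mapped(path):
    """Count newline bytes in a file by memory-mapping it read-only."""
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so handle that case up front.
        if os.fstat(f.fileno()).st_size == 0:
            return 0
        # Length 0 maps the whole file; pages are loaded as they are touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b"\n")
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1)
            return count
```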

How often does sys.stdin generate data?

OK, so here's what worked for me:

import sys

while True:
    line = sys.stdin.readline()
    if not line:  # readline() returns '' only at EOF
        break
    sys.stdout.write(line)

And start the script with python -u ....

I'll admit that Thomas' link to the other thread helped me find out that .readline() should be used directly in order for -u to have any effect.

Explanation: -u disables process-level buffering of stdin (as in "the standard input" and not the sys.stdin object specifically), and using .readline() instead of for line in sys.stdin avoids the internal buffering of sys.stdin.
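The same readline-driven loop can be written with the two-argument form of iter(), which keeps calling sys.stdin.readline() until it returns the '' sentinel at EOF. A minimal sketch; echo_lines and its stream/out parameters are hypothetical, added so it can be exercised without a real pipe:

```python
import sys


def echo_lines(stream=None, out=None):
    """Copy lines from stream to out as soon as readline() returns each one.

    Returns the number of lines copied. Defaults to sys.stdin/sys.stdout.
    """
    stream = sys.stdin if stream is None else stream
    out = sys.stdout if out is None else out
    count = 0
    # iter(callable, sentinel) calls stream.readline() until it returns ''
    for line in iter(stream.readline, ""):
        out.write(line)
        out.flush()  # hand each line on immediately
        count += 1
    return count
```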

UPDATE: As to your question about this one-liner: "How is it assumed that interpreter will cross this line if t > e: every one second?"... the "one liner" under observation is:

import sys, time
l = 0
e = int(time.time())  # current whole second
for line in sys.stdin:
    t = int(time.time())  # int() truncates to whole seconds
    l += 1
    if t > e:  # a new second has started
        e = t
        print(l)
        l = 0

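time.time() returns the current time in seconds as a float; converting it to int truncates it to whole seconds. Since e was also set to int(time.time()), the first moment int(time.time()) is greater than e is when the clock crosses the next whole-second boundary. That first interval can therefore be shorter than a full second, and later rollovers happen at whole-second boundaries, but only when a line actually arrives, so a second with no input is not reported at all.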
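To make the timing logic easier to check, here is a sketch of the same per-second counting with the clock passed in as a parameter (count_per_second and clock are hypothetical names). One deliberate difference from the quoted snippet: the line that triggers the rollover is counted in the new second rather than the old one.

```python
import time


def count_per_second(lines, clock=time.time):
    """Return line counts, one per whole-second bucket that saw input.

    A bucket closes when a line arrives after int(clock()) has moved past
    the current second; seconds with no input produce no bucket.
    """
    counts = []
    current = int(clock())
    n = 0
    for _line in lines:
        t = int(clock())  # truncate to whole seconds
        if t > current:
            counts.append(n)  # close the previous bucket
            current = t
            n = 0
        n += 1
    counts.append(n)  # final, possibly partial bucket
    return counts
```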

But the snippet still suffers from exactly the same input buffering issue as your original snippet; also, it's invoked without the -u flag, so I cannot imagine why it would ever work reliably on any system, unless the buffering semantics on that system were different both at the Python process STDIN level and in the implementation of sys.stdin.

Why can scanf read more than 1024 characters when the stdin stream buffer is only 1024 bytes?

As noted in a comment, when scanf() gets to the end of the first buffer full, if it still needs more data, it goes back to the system to get more, possibly many times. The buffer is merely a convenience and optimization measure.
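The same refill behavior can be observed in Python's io.BufferedReader, used here as a stand-in for the C stdio buffer (an analogy, not scanf itself). CountingRaw is a hypothetical raw stream that records how many times the buffer layer asks it for data:

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts refill requests."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n


raw = CountingRaw(b"x" * 5000 + b"\n")
reader = io.BufferedReader(raw, buffer_size=1024)
line = reader.readline()  # one line, larger than the buffer
```

With a 1024-byte buffer, reading this 5001-byte line needs at least five refills, yet readline() still returns the whole line.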

Taking multiline input with sys.stdin

So, I took your code out of the function and ran some tests.

import sys

buffer = []
run = True  # must be initialized before the loop
while run:
    raw = sys.stdin.readline()
    line = raw.rstrip('\n')
    if not raw or line == 'quit':  # stop on EOF or on 'quit'
        run = False
    else:
        buffer.append(line)

print(buffer)

Changes:

Removed the 'for' loop
Using 'readline' instead of 'readlines'
Stripped the '\n' from each line after reading it, so all later processing is much easier.

Another way:

import sys

buffer = []
while True:
    line = sys.stdin.readline()
    if not line:  # EOF
        break
    line = line.rstrip('\n')
    if line == 'quit':
        break
    buffer.append(line)
print(buffer)

This version drops the 'run' variable, which is not really needed.
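For completeness, the same "read until quit" loop can be expressed with iter() and itertools.takewhile. A minimal sketch; read_until and its parameters are hypothetical, and it also stops cleanly at EOF:

```python
import itertools
import sys


def read_until(sentinel="quit", stream=None):
    """Read lines until one equals `sentinel` or EOF is reached.

    Trailing newlines are stripped; the sentinel line itself is not kept.
    `stream` defaults to sys.stdin but any text stream works.
    """
    if stream is None:
        stream = sys.stdin
    lines = (line.rstrip("\n") for line in iter(stream.readline, ""))
    return list(itertools.takewhile(lambda s: s != sentinel, lines))
```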
