How to Write Custom Input Stream in C++

The proper way to create a new stream in C++ is to derive from std::streambuf and to override the underflow() operation for reading and the overflow() and sync() operations for writing. For your purpose you'd create a filtering stream buffer which takes another stream buffer (and possibly a stream from which the stream buffer can be extracted using rdbuf()) as argument and implements its own operations in terms of this stream buffer.

The basic outline of a stream buffer would be something like this:

#include <cstddef>
#include <streambuf>

class compressbuf
    : public std::streambuf {
    std::streambuf* sbuf_;
    char*           buffer_;
    // context for the compression
public:
    compressbuf(std::streambuf* sbuf)
        : sbuf_(sbuf), buffer_(new char[1024]) {
        // initialize compression context
    }
    ~compressbuf() { delete[] this->buffer_; }
    int_type underflow() {
        if (this->gptr() == this->egptr()) {
            std::size_t size = 0; // set by the decompression step below
            // decompress data into buffer_, obtaining its own input from
            // this->sbuf_; if necessary resize buffer
            // the next statement assumes "size" characters were produced (if
            // no more characters are available, size == 0).
            this->setg(this->buffer_, this->buffer_, this->buffer_ + size);
        }
        return this->gptr() == this->egptr()
             ? std::char_traits<char>::eof()
             : std::char_traits<char>::to_int_type(*this->gptr());
    }
};

How underflow() looks exactly depends on the compression library being used. Most libraries I have used keep an internal buffer which needs to be filled and which retains the bytes which are not yet consumed. Typically, it is fairly easy to hook the decompression into underflow().

Once the stream buffer is created, you can just initialize an std::istream object with the stream buffer:

std::ifstream fin("some.file");
compressbuf   sbuf(fin.rdbuf());
std::istream  in(&sbuf);

If you are going to use the stream buffer frequently, you might want to encapsulate the object construction into a class, e.g., icompressstream. Doing so is a bit tricky because the base class std::ios is a virtual base and is the actual location where the stream buffer is stored. To construct the stream buffer before passing a pointer to a std::ios thus requires jumping through a few hoops: It requires the use of a virtual base class. Here is how this could look roughly:

struct compressstream_base {
    compressbuf sbuf_;
    compressstream_base(std::streambuf* sbuf): sbuf_(sbuf) {}
};
class icompressstream
    : virtual compressstream_base
    , public std::istream {
public:
    icompressstream(std::streambuf* sbuf)
        : compressstream_base(sbuf)
        , std::ios(&this->sbuf_)
        , std::istream(&this->sbuf_) {
    }
};

(I just typed this code without a simple way to test that it is reasonably correct; please expect typos but the overall approach should work as described)

Custom input stream. Stream buffer and underflow method

In my experience, a stream buffer is the underlying buffer of a stream. When the stream needs more data, it calls underflow(); inside this method you are supposed to read from your source and call setg(). When the put area is full and the stream has more data to write, it calls overflow(); inside this method you write the buffered characters to your destination and call setp(). For example, if you are reading the data from a socket in your streambuf:

socketbuf::int_type socketbuf::underflow(){
  int bytesRead = 0;
  try{
    bytesRead = soc->read(inbuffer,BUFFER_SIZE-1,0);
    if( bytesRead <= 0 ){
      return traits_type::eof();
    }
  }catch(const IOException& ioe){
    cout<<"Unable to read data"<<endl;
    return traits_type::eof();
  }
  setg(inbuffer,inbuffer,inbuffer+bytesRead);
  return traits_type::to_int_type(inbuffer[0]);
}

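socketbuf::int_type socketbuf::overflow(socketbuf::int_type c){
  int bytesWritten = 0;
  try{
    if(pptr() - pbase() > 0){
      bytesWritten = soc->write(pbase(),(pptr() - pbase()),0);
      // a failed write must be reported as eof(), not as success
      if( bytesWritten <= 0 )  return traits_type::eof();
    }
  }catch(const IOException& ioe){
    cout<<"Unable to write data"<<endl;
    return traits_type::eof();
  }
  // setp() takes only the begin and end of the put area and resets pptr()
  setp(outbuffer,outbuffer+BUFFER_SIZE);
  if(!traits_type::eq_int_type(c, traits_type::eof())){
    outbuffer[0] = traits_type::to_char_type(c);
    pbump(1); // account for the character just stored
  }
  return traits_type::not_eof(c);
}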

Now coming to your code, you added

result = traits_type::to_int_type('+'); // <-- this was added

A stream reads a line until it sees an LF (line feed). When the LF character arrives, you overwrite it with a '+', so the stream waits for an LF forever. With the following check, your code should do what you expect: output '+++' if you input 'abc'.

if (result != 10)// <-- add this in addition
    result = traits_type::to_int_type('+'); // <-- this was added

Hope it helps you.

How to create stream which handles both input and output in C++?

Creating a class that behaves like a stream is easy. Let's say we want to create such a class with the name MyStream; the definition of the class is as simple as:

#include <istream> // class "basic_iostream" is defined here

class MyStream : public std::basic_iostream<char> {
private:
    std::basic_streambuf<char> buffer; // placeholder: basic_streambuf has a protected constructor, so a derived buffer class goes here
public:
    MyStream() : std::basic_iostream<char>(&buffer) {} // note that ampersand
};

The constructor of your class should call the constructor of std::basic_iostream<char> with a pointer to a custom std::basic_streambuf<char> object. std::basic_streambuf is just a template class which defines the structure of a stream buffer. So you have to get your own stream buffer. You can get it in two ways:

From another stream: Every stream has a member rdbuf which takes no arguments and returns a pointer to the stream buffer being used by it. Example:

...
std::basic_streambuf<char>* buffer = std::cout.rdbuf(); // take from std::cout
...

Create your own: You can always create a buffer class by deriving from std::basic_streambuf<char> and customize it as you want.

Now that we have defined the MyStream class, we need the stream buffer. Let's select option 2 from above, create our own stream buffer, and name it MyBuffer. We will need the following:

Constructor to initialize the object.
Contiguous memory block to store output from the program temporarily.
Contiguous memory block to store input from the user (or some other source) temporarily.
Method overflow , which is called when allocated memory for storing output is full.
Method underflow , which is called when all input is read by the program and more input requested.
Method sync , which is called when output is flushed.

As we know what things are needed to create a stream buffer class, let's declare it:

class MyBuffer : public std::basic_streambuf<char> {
private:
    char inbuf[10];
    char outbuf[10];

    int sync();
    int_type overflow(int_type ch);
    int_type underflow();
public:
    MyBuffer();
};

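Here inbuf and outbuf are two arrays which will store input and output respectively. int_type is an integer type that can hold every value of the character type plus a distinct end-of-file value; for char it is int. It is defined through the character traits, so the same code works for other character types such as wchar_t.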

Before we jump into the implementation of our buffer class, we need to know how the buffer will work.

To understand how buffers work, we need to know how arrays work. An array is a contiguous block of memory, and its name converts to a pointer to the first element in most expressions. When we declare a char array with two elements, 2 * sizeof(char) bytes are reserved for it. When we access an element with array[n], it is converted to *(array + n), where n is the index. Adding n to that pointer moves it forward by n * sizeof(<the_type_the_array_points_to>) bytes (figure 1). If you don't know pointer arithmetic, I would recommend learning it before you continue. cplusplus.com has a good article on pointers for beginners.

             array    array + 1
               \        /
------------------------------------------
  |     |     | 'a' | 'b' |     |     |
------------------------------------------
    ...   105   106   107   108   ...
                 |     |
                 -------
                    |
            memory allocated by the operating system

                     figure 1: memory address of an array

Now that we know about pointers, let's see how stream buffers work. Our buffer contains two arrays, inbuf and outbuf. But how would the standard library know that input must be stored in inbuf and output in outbuf? For this there are two areas, called the get area and the put area, which are the input and output areas respectively.

Put area is specified with the following three pointers (figure 2):

pbase() or put base: start of put area
epptr() or end put pointer: end of put area
pptr() or put pointer: where next character will be put

These are actually functions which return the corresponding pointer. The pointers are set by setp(pbase, epptr). After this call, pptr() equals pbase(). To move it we use pbump(n), which repositions pptr() by n characters; n can be positive or negative. Note that epptr() points one past the last writable position: the stream writes up to the character just before epptr(), but never to epptr() itself.

  pbase()                         pptr()                       epptr()
     |                              |                             |
------------------------------------------------------------------------
  | 'H' | 'e' | 'l' | 'l' | 'o'  |     |     |     |     |     |     |
------------------------------------------------------------------------
     |                                                      |
     --------------------------------------------------------
                                 |
                   allocated memory for the buffer

           figure 2: output buffer (put area) with sample data

Get area is specified with the following three pointers (figure 3):

eback() or end back, start of get area
egptr() or end get pointer, end of get area
gptr() or get pointer, the position which is going to be read

These pointers are set with the setg(eback, gptr, egptr) function. Note that egptr() points one past the last readable character: the stream reads up to the character just before egptr(), but never egptr() itself.

  eback()                         gptr()                       egptr()
     |                              |                             |
------------------------------------------------------------------------
  | 'H' | 'e' | 'l' | 'l' | 'o'  | ' ' | 'C' | '+' | '+' |     |     |
------------------------------------------------------------------------
     |                                                      |
     --------------------------------------------------------
                                 |
                   allocated memory for the buffer

           figure 3: input buffer (get area) with sample data

Now that we have discussed almost all we need to know before creating a custom stream buffer, it's time to implement it! We'll try to implement our stream buffer in such a way that it works like std::cout!

Let's start with the constructor:

MyBuffer() {
    setg(inbuf+4, inbuf+4, inbuf+4);
    setp(outbuf, outbuf+9);
}

Here we set all three get pointers to one position, which means there are no readable characters, forcing underflow() when input is requested. Then we set the put pointers so that the stream can write to the whole outbuf array except the last element. We reserve that element for later use.

Now, let's implement sync() method, which is called when output is flushed:

int sync() {
    int return_code = 0;

    for (int i = 0; i < (pptr() - pbase()); i++) {
        if (std::putchar(outbuf[i]) == EOF) {
            return_code = EOF;
            break;
        }
    }

    pbump(pbase() - pptr());
    return return_code;
}

This does its work simply. First, it determines how many characters there are to print, then prints them one by one and repositions pptr() (the put pointer) to the start of the put area. It returns EOF (-1) if printing any character fails, 0 otherwise.

But what to do if put area is full? So, we need overflow() method. Let's implement it:

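int_type overflow(int_type ch) {
    // overflow() may be called with EOF, which must not be stored
    if (!traits_type::eq_int_type(ch, traits_type::eof())) {
        *pptr() = traits_type::to_char_type(ch); // fits in the reserved last element
        pbump(1);
    }

    return (sync() == EOF ? traits_type::eof() : traits_type::not_eof(ch));
}

Nothing special: this just puts the extra character into the reserved last element of outbuf and advances pptr() (the put pointer), then calls sync(). It returns EOF if sync() returned EOF, otherwise the extra character.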

Everything is now complete, except input handling. Let's implement underflow() , which is called when all characters in input buffer are read:

// needs <algorithm>, <cstddef>, <cstdio> and <cstring>
int_type underflow() {
    // keep at most 4 already-read characters so they can be put back
    std::ptrdiff_t keep = std::min<std::ptrdiff_t>(4, gptr() - eback());
    std::memmove(inbuf + 4 - keep, gptr() - keep, keep);

    int ch, position = 4;
    // check the bound first so that no character is read and then dropped
    while (position < 10 && (ch = std::getchar()) != EOF) {
        inbuf[position++] = char(ch);
    }

    if (position == 4) return traits_type::eof(); // nothing was read
    setg(inbuf + 4 - keep, inbuf + 4, inbuf + position);
    return traits_type::to_int_type(*gptr());
}

This is a little harder to understand, so let's see what's going on. First, it calculates how many characters it should preserve in the buffer (at most 4) and stores that in the keep variable. Then it copies the last keep characters to the start of the buffer. This is done because characters can be put back into the buffer with the unget() method of std::basic_iostream. (A program can also look at the next character without extracting it, using the peek() method.) After the last few characters are preserved, it reads new characters until it reaches the end of the input buffer or gets EOF as input. It returns EOF if no characters were read; otherwise it repositions all get pointers and returns the first character read.

As our stream buffer is implemented now, we can setup our stream class MyStream so it uses our stream buffer. So we change the private buffer variable:

...
private:
    MyBuffer buffer;
public:
...

You can now test your own stream; it should take input from the terminal and show output on it.

Note that this stream and buffer can only handle char-based input and output. To handle other character types, your classes must derive from the corresponding specializations (e.g. std::basic_streambuf<wchar_t> for wide characters) and implement the member functions so that they handle that character type.

C++ custom stream

If you look at how streams work, it's just a case of overloading operator<< for both your stream object, and the various things you want to send to it. There's nothing special about <<, it just reads nicely, but you could use + or whatever else you want.

Custom buffered input stream. End of input

Pressing Enter sends a newline character to the I/O buffer. That doesn't mean 'end of input'.
In your file you can easily have something like

Dear Mr. Smith,<CR><EOL>I am writing to you this message.<CR><EOL>Kind regards,<CR><EOL>Your Name<EOF>

The standard stream gives you a lot of flexibility in how to read this input.

For example:

istream get() will return you 'D'
istream operator >> will return "Dear"
istream getline () will return "Dear Mr. Smith,"
streambuf sgetn (6) will return "Dear M"

You can also adjust their behaviour to your needs. So you can read as much or as little as you want.

In your code the reading operation is:

std::streamsize read = m_stream_buffer->sgetn(m_buffer, m_size);

which means "give me m_size characters or less if end of input occurred".
Have a look at the documentation of streambuf for a better explanation.
http://www.cplusplus.com/reference/streambuf/streambuf

std::streambuf works on per character basis. No getline() or operator>> here.
If you want to stop at a particular character (e.g. ) you will probably need a loop with sgetc().

Writing a custom input manipulator

You did not show your input, but I don't think getline() would be appropriate to use in this situation. operator>> is meant to read a single word, not a whole line.

In any case, you are leaking both char[] arrays that you allocate. You need to delete[] them when you are done using them. For the str array (which FYI, you don't actually need, as you could just copy characters from the temp string directly into res instead), you can just delete[] it before exiting. But for res, the membuf would have to hold on to that pointer and delete[] it when the membuf itself is no longer being used.

But, more importantly, your use of membuf is simply wrong. You are creating it as a local variable of skipchar(), so it will be destroyed when skipchar() exits, leaving the stream with a dangling pointer to an invalid object. The streambuf* pointer you assign to the stream must remain valid for the entire duration that it is assigned to the istream, which means creating the membuf object with new, and then the caller will have to remember to manually delete it at a later time (which kind of defeats the purpose of using operator>>). However, a stream manipulator really should not change the rdbuf that the stream is pointing at in the first place, since there is not a good way to restore the previous streambuf after subsequent read operations are finished (unless you define another manipulator to handle that, ie cin >> skipchar >> str >> stopskipchar;).

In this situation, I would suggest a different approach. Don't make a stream manipulator that assigns a new streambuf to the stream, thus affecting all subsequent operator>> calls. Instead, make a manipulator that takes a reference to the output variable, and then reads from the stream and outputs only what is needed (similar to how standard manipulators like std::quoted and std::get_time work), eg:

#include <iostream>
#include <string>
using namespace std;

struct skipchars
{
    string &str;
};

istream& operator>>(istream& stream, skipchars output)
{
    string temp;
    if (stream >> temp) {
        for (size_t i = 0; i < temp.size(); i += 10) {
            output.str += temp.substr(i, 5);
        }
    }
    return stream;
}

int main()
{
    string str;
    cout << "enter smth:\n";
    cin >> skipchars{str};
    cout << "entered string: " << str;
    return 0;
}

Alternatively:

#include <iostream>
#include <string>
using namespace std;

struct skipcharsHelper
{
    string &str;
};

istream& operator>>(istream& stream, skipcharsHelper output)
{
    string temp;
    if (stream >> temp) {
        for (size_t i = 0; i < temp.size(); i += 10) {
            output.str += temp.substr(i, 5);
        }
    }
    return stream;
}

skipcharsHelper skipchars(string &str)
{
    return skipcharsHelper{str};
}

int main()
{
    string str;
    cout << "enter smth:\n";
    cin >> skipchars(str);
    cout << "entered string: " << str;
    return 0;
}

How to write custom input function for Flex in C++ mode?

The simple solution, if you just want to provide a string input, is to make the string into a std::istringstream, which is a valid std::istream. The simplicity of this solution reduces the need for an equivalent to yy_scan_string.

On the other hand, if you have a data source you want to read from which is not derived from std::istream, you can easily create a lexical scanner which does whatever is necessary. Just subclass yyFlexLexer, add whatever private data members you will need and a constructor which initialises them, and override int LexerInput(char* buffer, size_t maxsize); to read at least one and no more than maxsize bytes into buffer, returning the number of characters read. (YY_INPUT also works in the C++ interface, but subclassing is more convenient precisely because it lets you maintain your own reader state.)

Notes:

If you decide to subclass and override LexerInput, you need to be aware that "interactive" mode is actually implemented in LexerInput. So if you want your lexer to have an interactive mode, you'll have to implement it in your override, too. In interactive mode, LexerInput always reads exactly one character (unless, of course, it's at the end of the file).
As you can see in the Flex code repository, a future version of Flex will use refactored versions of these functions, so you might need to be prepared to modify your code in the future, although Flex generally maintains backwards compatibility for a long time.

implementing simple input stream

Have you looked at boost.iostreams? It does most of the grunt work for you (possibly not for your exact use case, but for C++ standard library streams in general).
