Read Serial Data Without High CPU Use

Read serial data without high CPU use

The OP has probably long since solved this, but for the sake of anyone who gets here via Google:

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* serial_fd is the descriptor returned by open() on the port */
struct pollfd fds[1];
fds[0].fd = serial_fd;
fds[0].events = POLLIN;

int pollrc = poll(fds, 1, 1000);    /* wait up to 1000 ms for input */
if (pollrc < 0)
{
    perror("poll");
}
else if (pollrc > 0)
{
    if (fds[0].revents & POLLIN)
    {
        char buff[1024];
        ssize_t rc = read(serial_fd, buff, sizeof(buff));
        if (rc > 0)
        {
            /* You've got rc characters. Do something with buff. */
        }
    }
}

Make sure the serial port is opened in nonblocking mode, as poll() can sometimes indicate readiness when there are actually no characters waiting.
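
For completeness, here's one way to do that when opening the port - a minimal sketch assuming a Linux system, where the device path is an assumption (substitute your own):

#include <fcntl.h>
#include <stdio.h>

int open_serial_nonblocking(const char *path)
{
    /* O_NONBLOCK guarantees read() won't stall even if poll()
       indicated readiness spuriously; O_NOCTTY prevents the device
       from becoming our controlling terminal. */
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd < 0)
        perror("open");
    return fd;
}

/* usage: int serial_fd = open_serial_nonblocking("/dev/ttyUSB0"); */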

Does CPU usage affect reading of serial data?

If you are hitting the ceiling of your CPU usage, it is possible that the PySerial process that is reading the data is not able to complete each cycle before the next sampling window (e.g. your code wants a sample per second, but the cycle takes two seconds to complete). Adding too many processes or CPU-heavy processes will eventually lead to the CPU being a bottleneck.

Why does Linux C serial read only work with high CPU?

As your code doesn't work in a common condition that you can't prevent from happening, the why of "Why [it] only work[s] with high CPU?" doesn't really matter. It might be interesting to spend a lot of time and effort finding out the "Why?", but you're going to have to change your code anyway, because anything that stops working when CPU load goes down is, IMO, waaaay too fragile to trust for any length of time.

First, is threading even useful on your system? If there's only one CPU that can run only one thread at a time, creating multiple threads will be counterproductive. Have you tried a simple single-threaded solution and actually found that it doesn't work?

If you have tried a single-threaded solution and it doesn't work, the first thing I note is that your currently posted code is doing a tremendous amount of extra work it doesn't need to do, and it's likely contending over a single lock when that doesn't help much at all.

So eliminate your extraneous copying of data along with all the unnecessary bookkeeping you're doing.

You also probably have a lot of contention with just a single mutex and condition variable. There's no need to not read because the logging thread is doing something, or the logging thread not processing because the read thread is doing some bookkeeping. You'd almost certainly benefit from finer lock granularity.

I'd do something like this:

#include <semaphore.h>
#include <unistd.h>

#define CHUNK_SIZE ( 4 * 1024 )
#define NUM_BUFFERS 16

struct dataStruct
{
    sem_t full;
    sem_t empty;
    ssize_t count;
    char data[ CHUNK_SIZE ];
};

struct dataStruct dataArray[ NUM_BUFFERS ];

void initDataArray( void )
{
    for ( int ii = 0; ii < NUM_BUFFERS; ii++ )
    {
        // buffers all start empty
        sem_init( &( dataArray[ ii ].full ), 0, 0 );
        sem_init( &( dataArray[ ii ].empty ), 0, 1 );
    }
}

void *readPortThread( void *arg )
{
    unsigned currBuffer = 0;

    // get portFD from arg
    int portFD = *( ( int * ) arg );
    for ( ;; )
    {
        // wait until this buffer is free for refilling
        sem_wait( &( dataArray[ currBuffer ].empty ) );

        // probably should loop and read more, and don't
        // infinite loop on any error (hint...)
        dataArray[ currBuffer ].count = read( portFD,
            dataArray[ currBuffer ].data,
            sizeof( dataArray[ currBuffer ].data ) );

        // hand the filled buffer to the logging thread
        sem_post( &( dataArray[ currBuffer ].full ) );
        currBuffer++;
        currBuffer %= NUM_BUFFERS;
    }
    return( NULL );
}

void processData( char *data, ssize_t count )
{
    // ...
}

void *logDataThread( void *arg )
{
    unsigned currBuffer = 0;

    for ( ;; )
    {
        // wait until this buffer has been filled
        sem_wait( &( dataArray[ currBuffer ].full ) );

        processData( dataArray[ currBuffer ].data,
            dataArray[ currBuffer ].count );

        // mark the buffer empty so the reader can reuse it
        sem_post( &( dataArray[ currBuffer ].empty ) );
        currBuffer++;
        currBuffer %= NUM_BUFFERS;
    }
    return( NULL );
}

Note the much finer locking granularity, and the complete lack of extraneous copying of data. Proper headers, all error checking, and full implementation are left as an exercise...

You'll have to test to find optimum values for CHUNK_SIZE and NUM_BUFFERS. Setting the number of buffers dynamically would be a good improvement also.

OT: There's no need for any indirection in your int open_port(int flags, int *fd) function. Simply return the fd value - it's good enough for open() itself:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <termios.h>

int open_port( int flags )
{
    struct termios options;
    int fd = open( "/dev/ttyUSB0", flags | O_NOCTTY );

    if ( fd < 0 )
    {
        printf( "Error opening port\n" );
        return fd;
    }

    // no need for else - the above if statement returns
    printf( "Port handle is %d\n", fd );

    // did you set **ALL** the fields???
    memset( &options, 0, sizeof( options ) );

    options.c_cflag = BAUDRATE | CS8 | CLOCAL | CREAD;  // BAUDRATE from the OP's code, e.g. B115200
    options.c_iflag = 0;
    options.c_oflag = 0;
    options.c_lflag = 0;
    options.c_cc[VTIME] = 0;
    options.c_cc[VMIN] = 200;
    tcflush( fd, TCIFLUSH );
    tcsetattr( fd, TCSANOW, &options );

    return fd;
}

Python/PySerial and CPU usage

Maybe you could issue a blocking read(1) call, and when it succeeds use read(inWaiting()) (in_waiting in PySerial 3.x) to get the right number of remaining bytes.
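
For anyone doing the same thing in C: the equivalent pattern (block for the first byte, then drain whatever is already queued) can be sketched with the FIONREAD ioctl, which is essentially what inWaiting() queries on POSIX systems. A rough sketch, not a drop-in solution:

#include <sys/ioctl.h>
#include <unistd.h>

/* Block until at least one byte arrives, then read whatever else
   the driver has already buffered (capped to bufsize). */
ssize_t read_burst(int fd, char *buf, size_t bufsize)
{
    ssize_t n = read(fd, buf, 1);   /* blocking read of the first byte */
    if (n <= 0)
        return n;

    int queued = 0;
    if (ioctl(fd, FIONREAD, &queued) == 0 && queued > 0)
    {
        size_t want = (size_t)queued;
        if (want > bufsize - 1)
            want = bufsize - 1;
        ssize_t more = read(fd, buf + 1, want);
        if (more > 0)
            n += more;
    }
    return n;
}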

CPU High Usage with time for serial port

I found out what the problem was with the above code.

I am using a GraphicsPath for plotting the trajectory:

if (k == 14)
{
    try
    {
        curr_x = (pictureBox2.Width / 2) + (int)((engval[13] * (pictureBox2.Width)) / map_width);
        curr_y = (pictureBox2.Height / 2) - (int)((engval[14] * (pictureBox2.Height)) / map_height);
        PointF p1 = new Point(curr_x, curr_y);
        if (_gPath != null && _gPath.PointCount > 0)
            p1 = _gPath.PathPoints[_gPath.PathPoints.Length - 1];
        PointF p2 = new Point(curr_x, curr_y);
        _gPath.AddLine(p1, p2);
        pictureBox2.Invalidate();
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

Now, as the application continues to run, it gathers lots of graph points, and plotting this huge number of points is consuming the resources.

Can anybody suggest a solution to this problem, i.e. how to plot the trajectory without slowing down the system?

CPU load of python serial read

There is not quite enough information to know exactly what is going on, but it appears that you are trying to receive messages that take one of the following forms:

  • A "Data" message, which looks like: | 2 | Length | d1 | d2 | ... | dN |
  • Some error indication...which you don't mention, but perhaps is just a single byte that is not 2?

In your first implementation, the line data = ser.read(2) will "wait forever until at least two bytes are received". But if a single 1-byte message is sent, then the code will keep waiting forever until another message is received.

While there is no "right" way to read from a serial port, you might want to try separating the byte receiving and the message parsing.

For example, you might have something like this:

from collections import deque

import serial


def receive_bytes(ser, buffer):
    """Read bytes from the serial port and append them to the buffer."""
    data = ser.read(1)
    while data:
        buffer.append(data[0])
        data = ser.read(1)


def consume_messages(buffer):
    """Consume available messages from the buffer destructively."""

    # This loop uses the first byte as the key for what
    # message should be in the buffer. We only consume
    # a message from the buffer when it is present in
    # its entirety (i.e. we only read a data message once
    # all the bytes are present).
    messages = []
    while len(buffer) > 0:
        # Check if we have a data message
        if buffer[0] == 2:
            if len(buffer) < 2 or len(buffer) - 2 < buffer[1]:
                # not all of the data message has arrived yet
                break
            data_message_len = buffer[1]
            buffer.popleft()  # the 2 byte
            buffer.popleft()  # the length byte
            data_message = [
                buffer.popleft() for _ in range(data_message_len)
            ]
            messages.append(data_message)

        # Check if we have a "1" message
        elif buffer[0] == 1:
            messages.append(buffer.popleft())

        else:
            # looks like the first byte is something we don't
            # know about...likely a synchronization error, so just
            # drop it
            buffer.popleft()

    return messages


def handle_messages(messages):
    for m in messages:
        print(m)


if __name__ == "__main__":
    # setup - 'port' is your device name, e.g. "/dev/ttyUSB0"
    ser = serial.Serial(port, 38400, timeout=0)
    buf = deque()

    # loop
    while True:
        receive_bytes(ser, buf)
        messages = consume_messages(buf)
        handle_messages(messages)

FWIW, I don't think that time complexity is relevant here. The blocking nature of the original code and the interaction between your application code, the serial library, and the underlying OS are all larger factors than the time complexity.

Basically...your computer is waaaay faster than the hardware controlling the serial port, so a block-and-wait approach generally costs your program more (it can do nothing else while blocked) than checking with no timeout - even if you end up checking thousands/millions of times between bytes.

Best Practice to Read a Desired Amount of Data from Serial?

But AFAIK system calls are expensive, ...

True, a system call can consume many more CPU cycles than a local procedure/function call. The system call requires CPU-mode transitions from user mode to (protected) kernel mode, and then back to user mode.

... thus, isn't that approach somehow crude, especially when the desired length is large?

The first question you have to ask yourself when reading from a serial terminal (e.g. a /dev/ttyXn device, rather than a serial port) is: what kind of data is going to be received? That is, is the data lines of (ASCII) text terminated by some type of EOL (end-of-line) character, or does the data need to be treated simply as binary (or raw) data?

Lines of text should be read from a serial terminal using canonical (aka cooked) mode. The OS will perform a lexical scan of the received data for your program, and delimit each line of text based on the EOL characters you specify. The read() can then return a line, assuming that blocking I/O is used and the line of text is not longer than the buffer provided.

Binary data should be read from a serial terminal using noncanonical (aka raw) mode. The OS will ignore the values of the data, and (when blocking I/O is used) each read() will return an amount of data based on constraints of time and number of bytes.

See this answer for more details.
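
As a rough illustration of the canonical case, the configuration might look something like this on Linux - a minimal sketch with error handling kept short, and the baud rate is an assumption:

#include <termios.h>

/* Configure fd for canonical (line-oriented) input. Each blocking
   read() will then return (at most) one EOL-terminated line. */
int set_canonical_mode(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;

    cfmakeraw(&tio);               /* start from a known clean state */
    tio.c_lflag |= ICANON;         /* line-delimited reads           */
    tio.c_cflag |= CLOCAL | CREAD;
    cfsetispeed(&tio, B115200);    /* assumed baud rate              */
    cfsetospeed(&tio, B115200);

    return tcsetattr(fd, TCSANOW, &tio);
}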

Note that the post your question refers to actually is about reading text, yet the OP is (mis)using non-canonical mode. If the OP had used the proper mode to match the input, then he might have never had a partial-read problem.



I am just curious that whether there is a better practise to avoid intensive system calls in this scenario?

Proper termios configuration is essential for efficient I/O with serial terminals.

Blocking I/O mode should be considered the preferred mode:

  • The OS can perform its scheduling for multitasking better when processes relinquish control more often.
  • The OS is more efficient in determining when data is available for return to a user.

Note also that the termios configuration is most effective when blocking mode is used, e.g. the VMIN and VTIME specifications in noncanonical mode.
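
A minimal sketch of such a blocking noncanonical configuration (the VMIN/VTIME values here are assumptions you would tune for your data rate and latency needs):

#include <termios.h>

/* Noncanonical blocking reads: read() returns once 64 bytes have
   arrived, or once the line has been idle for 0.5 s after the
   first byte (the inter-byte timer). */
int set_raw_blocking_mode(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;

    cfmakeraw(&tio);           /* noncanonical, no character translation */
    tio.c_cflag |= CLOCAL | CREAD;
    tio.c_cc[VMIN]  = 64;      /* assumed chunk size                     */
    tio.c_cc[VTIME] = 5;       /* tenths of a second between bytes       */

    return tcsetattr(fd, TCSANOW, &tio);
}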

For example, using a select() or poll() and then a read() is one additional syscall compared to just the (blocking) read(). And yet you can find many such code examples, because there seems to be some popular misconception that the program can get the data faster from the "UART" that way.

But non-blocking and async modes are not necessarily faster (in a multitasking OS), and the read() merely fetches data from the termios buffer which is several layers removed from the actual hardware.

If your program uses non-blocking mode but does not perform useful work while waiting for data, and instead uses select() or poll() (or even worse, calls sleep()), then your program is unnecessarily complex and inefficient. See this answer.

A blocking-mode read() can do all that waiting for your program, make your program simpler and easier to write and maintain, and be more runtime efficient.

However, for blocking noncanonical reads, you will have to accept some degree of inefficiency. The best you can do is trade off latency against the number of syscalls. One possible example is this answer, which tries to fetch as much data as possible per syscall, yet allows for an easy byte-by-byte lexical scan of the received binary data.


Note that a possible source of latency when reading a serial terminal is a poorly configured kernel, rather than the termios API and read() overhead.

For instance, setting the ASYNC_LOW_LATENCY flag via ioctl() (e.g. see High delay in RS232 communication on a PXA270) is one way to improve read() latency.
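
On Linux that looks something like the following sketch (note that not every driver honors the flag):

#include <linux/serial.h>
#include <sys/ioctl.h>

/* Ask the driver to push received bytes up to the tty layer
   immediately instead of batching them. */
int set_low_latency(int fd)
{
    struct serial_struct ser;

    if (ioctl(fd, TIOCGSERIAL, &ser) < 0)
        return -1;

    ser.flags |= ASYNC_LOW_LATENCY;
    return ioctl(fd, TIOCSSERIAL, &ser);
}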

PySerial's readlines() consumes 25x as much CPU time as read()

My guess is that readlines and readline busily poll the serial line for new characters in order to fulfill your request for a full line (or lines), whereas .read will only read and return when there indeed is new data. You'll probably have to implement buffering and splitting into lines yourself (code untested since I don't have anything on a serial line right now :-) ):

import serial


def read_lines(s, sep=b"\n"):
    buffer = b""
    while True:
        # the timeout on the port (set below) makes this return with
        # whatever has arrived instead of stalling for all 1000 bytes
        buffer += s.read(1000)
        while sep in buffer:
            line, _, buffer = buffer.partition(sep)
            yield line


s = serial.Serial("/dev/ttyACM0", 9600, timeout=1)

for line in read_lines(s):
    print(line)

