Non-Blocking Worker - Interrupt File Copy

I don't think the file size has any effect on how long a renaming will take.

For the copy, Qt offers nothing built in; you have to implement it yourself. The key gotcha is that you have to poll for a copy cancellation continuously, which means you cannot lock up the main thread, since it has to keep processing events.

Whether you go for an extra thread in order to keep the main thread responsive, or decide to use the main thread - in both cases you will need to implement "fragmented" copying - one chunk at a time using a buffer, until the file is copied or copying is cancelled. You need this to be able to process user events and track copying progress.

I suggest you implement a QObject-derived copy helper worker class which tracks the file names, total size, buffer size and progress, and cleans up on cancellation. Then it is a matter of choice whether you use it in the main thread or in a dedicated thread.

EDIT: Found it, but you'd better double-check it, since it was done as an example and has not been thoroughly tested:

#include <QObject>
#include <QFile>
#include <QDebug>
#include <cstdlib>

class CopyHelper : public QObject {
    Q_OBJECT
    Q_PROPERTY(qreal progress READ progress WRITE setProgress NOTIFY progressChanged)
public:
    CopyHelper(QString sPath, QString dPath, quint64 bSize = 1024 * 1024) :
        isCancelled(false), bufferSize(bSize), prog(0.0), source(sPath),
        destination(dPath), fileSize(0), position(0), buff(nullptr) { }
    ~CopyHelper() { free(buff); } // free(nullptr) is a no-op if begin() never ran

    qreal progress() const { return prog; }
    void setProgress(qreal p) {
        if (p != prog) {
            prog = p;
            emit progressChanged();
        }
    }

public slots:
    void begin() {
        if (!source.open(QIODevice::ReadOnly)) {
            qDebug() << "could not open source, aborting";
            emit done();
            return;
        }
        fileSize = source.size();
        if (!destination.open(QIODevice::WriteOnly)) {
            qDebug() << "could not open destination, aborting";
            // maybe check for overwriting and ask to proceed
            emit done();
            return;
        }
        if (!destination.resize(fileSize)) {
            qDebug() << "could not resize, aborting";
            emit done();
            return;
        }
        buff = (char*)malloc(bufferSize);
        if (!buff) {
            qDebug() << "could not allocate buffer, aborting";
            emit done();
            return;
        }
        // queue the first chunk; control returns to the event loop between chunks
        QMetaObject::invokeMethod(this, "step", Qt::QueuedConnection);
        //timer.start();
    }
    void step() {
        if (!isCancelled) {
            if (position < fileSize) {
                // copy one buffer-sized chunk (or whatever is left of the file)
                quint64 chunk = fileSize - position;
                quint64 l = chunk > bufferSize ? bufferSize : chunk;
                qint64 bytesRead = source.read(buff, l);
                if (bytesRead <= 0 || destination.write(buff, bytesRead) != bytesRead) {
                    qDebug() << "read/write failed, aborting";
                    destination.remove();
                    emit done();
                    return;
                }
                position += bytesRead;
                setProgress((qreal)position / fileSize);
                //std::this_thread::sleep_for(std::chrono::milliseconds(100)); // for testing
                // queue the next chunk instead of looping, so pending events get processed
                QMetaObject::invokeMethod(this, "step", Qt::QueuedConnection);
            } else {
                //qDebug() << timer.elapsed();
                emit done();
            }
        } else {
            // cancelled - remove the partially written destination file
            if (!destination.remove()) qDebug() << "delete failed";
            emit done();
        }
    }
    void cancel() { isCancelled = true; }

signals:
    void progressChanged();
    void done();

private:
    bool isCancelled;
    quint64 bufferSize;
    qreal prog;
    QFile source, destination;
    quint64 fileSize, position;
    char *buff;
    //QElapsedTimer timer;
};

The done() signal is used to deleteLater() the copy helper, close the copy dialog, or whatever. You can enable the elapsed timer and use it to implement an elapsed-time property and an estimated time remaining as well. Pausing is another possible feature to implement. Using QMetaObject::invokeMethod() lets the event loop periodically process user events, so you can cancel and update the progress, which goes from 0 to 1. You can easily tweak it for moving files as well.
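For completeness, a minimal usage sketch (untested) that runs the helper in a dedicated thread; the paths and cancelButton are placeholders for your own UI:

QThread *thread = new QThread;
CopyHelper *helper = new CopyHelper("/path/to/source", "/path/to/destination");
helper->moveToThread(thread);

// start copying once the worker thread's event loop is running
QObject::connect(thread, &QThread::started, helper, &CopyHelper::begin);
// cancelButton is whatever triggers cancellation; the queued call is picked up between chunks
QObject::connect(cancelButton, &QPushButton::clicked, helper, &CopyHelper::cancel);
// tear everything down when the copy finishes or is cancelled
QObject::connect(helper, &CopyHelper::done, thread, &QThread::quit);
QObject::connect(helper, &CopyHelper::done, helper, &CopyHelper::deleteLater);
QObject::connect(thread, &QThread::finished, thread, &QThread::deleteLater);
thread->start();

Because each step() copies only one chunk before returning to the worker thread's event loop, the queued cancel() slot gets a chance to run between chunks.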

Qt. Non-blocking progress dialog during huge amount of non-controllable calculations

Use the Qt Concurrent framework, specifically its QtConcurrent::run method, to spawn a thread for your expensive task. You get back a QFuture that you can watch with a QFutureWatcher.

The Qt documentation has an asynchronous image scaling example.
I'm adding selected parts below.

  • Creating a QFutureWatcher to be signaled when a future is done:

        imageScaling = new QFutureWatcher<QImage>(this);
        connect(imageScaling, &QFutureWatcher<QImage>::resultReadyAt, this, &Images::showImage);
        connect(imageScaling, &QFutureWatcher<QImage>::finished, this, &Images::finished);

  • Constructing the future:

        std::function<QImage(const QString&)> scale = [imageSize](const QString &imageFileName) {
            QImage image(imageFileName);
            return image.scaled(QSize(imageSize, imageSize), Qt::IgnoreAspectRatio, Qt::SmoothTransformation);
        };
        QFuture<QImage> fut = QtConcurrent::run(scale, "test.png");

  • Attaching the future to the future watcher:

        imageScaling->setFuture(fut);
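Applied to the question's scenario (a long, uncontrollable calculation rather than image scaling), a minimal sketch could look like this; startCalculation() and expensiveCalculation() are illustrative names, not part of any Qt API:

// Sketch: run an expensive computation with QtConcurrent::run, show an
// indeterminate QProgressDialog, and react when the QFutureWatcher reports
// that the future has finished.
#include <QtConcurrent/QtConcurrent>
#include <QFutureWatcher>
#include <QProgressDialog>
#include <QDebug>

int expensiveCalculation(); // stands in for your own blocking function

void startCalculation(QWidget *parent) {
    // range 0..0 makes the dialog show a busy indicator instead of a percentage
    auto *dialog = new QProgressDialog("Calculating...", QString(), 0, 0, parent);
    auto *watcher = new QFutureWatcher<int>(parent);

    QObject::connect(watcher, &QFutureWatcher<int>::finished, parent, [=]() {
        dialog->close();
        qDebug() << "result:" << watcher->result(); // safe: the future is finished
        watcher->deleteLater();
        dialog->deleteLater();
    });

    watcher->setFuture(QtConcurrent::run([]() { return expensiveCalculation(); }));
    dialog->show();
}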

Non-blocking access to the file system

Indeed there is no other method.

Actually, there is another kind of blocking that can't be dealt with other than by threads, and that is page faults. Those may happen in program code, program data, memory allocation or data mapped from files. It's almost impossible to avoid them (you can lock some pages into memory, but that is a privileged operation and would probably backfire by making the kernel do a poor job of memory management somewhere else). So:

  1. You can't really weed out every last chance of blocking for a particular client, so don't bother with the likes of open and stat. The network will probably add larger delays than these functions anyway.
  2. For optimal performance you should have enough threads so some can be scheduled if the others are blocked on page fault or similar difficult blocking point.

Also, if you need to read and process (or process and write) data while handling a network request, it is faster to access the file using memory mapping, but that is blocking and can't be made non-blocking. So modern network servers tend to stick with blocking calls for most things and simply have enough threads to keep the CPU busy while other threads are waiting for I/O.
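For illustration, a minimal POSIX sketch of the memory-mapping approach mentioned above (names and error handling kept deliberately simple):

// Sketch: map a file and process it in place. Touching the mapped pages
// is where the (unavoidable) blocking happens: each first access to a page
// may fault and block the thread while the kernel reads it from disk.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

bool process_file(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return false; }

    void *data = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after closing the descriptor
    if (data == MAP_FAILED) return false;

    const char *bytes = static_cast<const char *>(data);
    // ... read bytes[0 .. st.st_size - 1] and build the response here ...
    (void)bytes;

    munmap(data, st.st_size);
    return true;
}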

The fact that most modern servers are multi-core is another reason why you need multiple threads anyway.

How to cleanly interrupt a thread blocking on a recv call?

So you have at least these possibilities:

(1) pthread_kill will blow the thread out of recv with errno == EINTR (provided the signal handler is installed without SA_RESTART), and you can clean up and exit the thread on your own. Some people think this is nasty; it depends, really.

(2) Make your client socket(s) non-blocking and use select to wait for input for a specific period of time before checking whether a flag shared between the threads has been set to indicate they should shut down.

(3) In combination with (2), have each thread share a pipe with the master thread and add it to the select. If it becomes readable and contains a shutdown request, the thread shuts itself down (see the sketch after this list).

(4) Look into the pthread_cancel mechanism if none of the above (or variations thereof) meet your needs.
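Here is a minimal sketch of option (3), the classic self-pipe trick; worker_loop, client_fd and shutdown_fd are illustrative names. The master thread creates a pipe with pipe(fds), hands fds[0] to the worker as shutdown_fd, and writes a single byte to fds[1] when the worker should stop:

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>

void worker_loop(int client_fd, int shutdown_fd) {
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(client_fd, &readfds);
        FD_SET(shutdown_fd, &readfds);
        int nfds = (client_fd > shutdown_fd ? client_fd : shutdown_fd) + 1;

        if (select(nfds, &readfds, nullptr, nullptr, nullptr) < 0) {
            if (errno == EINTR) continue; // interrupted by a signal, retry
            break;                        // real error
        }
        if (FD_ISSET(shutdown_fd, &readfds))
            break;                        // master requested shutdown
        if (FD_ISSET(client_fd, &readfds)) {
            char buf[4096];
            ssize_t n = recv(client_fd, buf, sizeof buf, 0);
            if (n <= 0) break;            // peer closed the connection or error
            // ... process n bytes ...
        }
    }
    // clean up the connection before the thread exits
    close(client_fd);
}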

How is non-blocking I/O for regular files implemented in .NET on Linux?

It's worth pointing out that there are multiple contexts at play here.

The Linux operating system

From Non-Blocking descriptors:

By default, read on any descriptor blocks if there’s no data
available. The same applies to write or send. This applies to
operations on most descriptors except disk files, since writes to disk
never happen directly but via the kernel buffer cache as a proxy. The
only time when writes to disk happen synchronously is when the O_SYNC
flag was specified when opening the disk file.

Any descriptor (pipes, FIFOs, sockets, terminals, pseudo-terminals,
and some other types of devices) can be put in the nonblocking mode.
When a descriptor is set in nonblocking mode, an I/O system call on
that descriptor will return immediately, even if that request can’t be
immediately completed (and will therefore result in the process being
blocked otherwise). The return value can be either of the following:

  • an error: when the operation cannot be completed at all
  • a partial count: when the input or output operation can be partially completed
  • the entire result: when the I/O operation could be fully completed

As explained above, non-blocking descriptors prevent pipes (or sockets, or ...) from blocking indefinitely. They were never meant for disk files, however: whether you want to read an entire file or just part of it, the data is already there. It is not going to arrive at some point in the future, so you can start processing it right away.
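For reference, a minimal sketch of how a descriptor is put in non-blocking mode with fcntl(); set_nonblocking is an illustrative helper, not a system call:

// On a pipe or socket, read() then fails with errno == EAGAIN/EWOULDBLOCK when
// no data is available; on a regular file it changes nothing, as the quote
// below explains.
#include <fcntl.h>

bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}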

Quoting your linked post:

Regular files are always readable and they are also always writeable.
This is clearly stated in the relevant POSIX specifications. I cannot
stress this enough. Putting a regular file in non-blocking has
ABSOLUTELY no effects other than changing one bit in the file flags.

Reading from a regular file might take a long time. For instance, if
it is located on a busy disk, the I/O scheduler might take so much
time that the user will notice that the application is frozen.

Nevertheless, non-blocking mode will not fix it. It will simply not
work. Checking a file for readability or writeability always succeeds
immediately. If the system needs time to perform the I/O operation, it
will put the task in non-interruptible sleep from the read or write
system call. In other words, if you can assume that a file descriptor
refers to a regular file, do not waste your time (or worse, other
people's time) in implementing non-blocking I/O.

The only safe way to read data from or write data to a regular file
while not blocking a task... consists of not performing the operation,
not in that particular task anyway. Concretely, you need to create a separate thread (or process), or use asynchronous I/O (functions whose
name starts with aio_). Whether you like it or not, and even if you
think multiple threads suck, there are no other options.

The .NET runtime

Implements the async/await pattern to unblock the main event loop while I/O is being performed. As mentioned above:

Concretely, you need to create a separate thread (or process), or use
asynchronous I/O (functions whose name starts with aio_). Whether you
like it or not, and even if you think multiple threads suck, there are
no other options.

The .NET thread pool will spawn additional threads as needed (ref: why is .NET spawning multiple processes on Linux). So, ideally, when the .NET File.ReadAsync(...) or File.WriteAsync(...) overloads are called, the current thread (from the thread pool) initiates the I/O operation and then gives up control, freeing it to do other work. But before it does, a continuation is placed on the I/O operation, so when the I/O device signals that the operation has finished, the thread-pool scheduler knows the next free thread can pick up the continuation.

To be sure, this is all about responsiveness: all code that requires the I/O to complete will still have to wait, although it won't "block" the application.

Back to OS

On Windows, the thread can give up control (and eventually be freed up) thanks to the OS's asynchronous disk I/O support:

https://docs.microsoft.com/en-us/troubleshoot/windows/win32/asynchronous-disk-io-synchronous

Asynchronous file I/O hasn't been part of Linux for very long; the flow we have here is described at:

https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/#unix

Unix-like systems don’t expose async file IO APIs (except of the new
io_uring which we talk about later). Anytime user asks FileStream to
perform async file IO operation, a synchronous IO operation is being
scheduled to Thread Pool. Once it’s dequeued, the blocking operation
is performed on a dedicated thread.

A similar flow is suggested by Python's asyncio documentation:

asyncio does not support asynchronous operations on the filesystem.
Even if files are opened with O_NONBLOCK, read and write will block.

...

The Linux kernel provides asynchronous operations on the filesystem
(aio), but it requires a library and it doesn't scale with many
concurrent operations. See aio.

...

For now, the workaround is to use aiofiles that uses threads to handle
files.

Closing thoughts

The concept behind Linux's non-blocking descriptors (and their polling mechanism) is not what makes async I/O tick on Windows.

As mentioned by @Damien_The_Unbeliever, there is a relatively new io_uring Linux kernel interface that allows a continuation flow similar to the one on Windows. However, the following links confirm this is not yet implemented in .NET 6:

  • https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/#whats-next
  • https://github.com/dotnet/runtime/issues/12650

