How, Exactly, Do I Render Metal on a Background Thread

Doing UI on a background thread

While I'm not quite up to date on the latest releases of macOS/iOS, as of 2020 Apple's UIKit and AppKit were not thread-safe. Only one thread can safely change UI objects, and unless you go to a lot of trouble, that's going to be the main thread. Even if you do go to all the trouble of closing the window-server connection and so on, you'll still end up with only one thread doing UI. So the limitation still applies on at least one major system.

While it's possibly unsafe to directly modify the contents of a window from any other thread, you can do software rendering to an offscreen bitmap image from any thread you like, taking as long as you like, and then hand the finished image over to the main thread for display. (That "possibly" is why cross-platform toolkits disallow it or tell you not to do it: sometimes it might work, but you can't say why, or even that it will keep working.)
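On iOS, that hand-off might look something like the sketch below. The drawing code is a placeholder; drawing into a graphics context off the main thread has been supported since UIKit's image drawing became thread-safe in iOS 4.

import UIKit

// Render into an offscreen bitmap on a background queue, then hand the
// finished image to the main thread. Only the main thread touches UIKit views.
func renderInBackground(size: CGSize, into imageView: UIImageView) {
    DispatchQueue.global(qos: .userInitiated).async {
        let renderer = UIGraphicsImageRenderer(size: size)
        let image = renderer.image { context in
            // Placeholder drawing; take as long as you like here.
            UIColor.systemTeal.setFill()
            context.fill(CGRect(origin: .zero, size: size))
        }
        DispatchQueue.main.async {
            imageView.image = image // UI mutation happens on the main thread
        }
    }
}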

With Vulkan, DirectX 12, and Metal, you can render from multiple threads. Woohoo! Of course, now you have to figure out how to do all the coordination, locking, and cross-synchronization without making the whole thing slower than single-threaded, but at least you have the option to try.

Adding to the excellent answer by Matt: with Qt programs you can use QMetaObject::invokeMethod and QCoreApplication::postEvent to have background threads update the UI safely.

Rendering to CAMetalLayer from dedicated render thread / loop

At some level, you're going to be throttled by the availability of drawables. A CAMetalLayer has a fixed pool of drawables available, and calling nextDrawable will block the current thread until a drawable becomes available. This doesn't imply you have to call nextDrawable at the top of your render loop, though.

If you want to draw on your own schedule without getting blocked waiting on a drawable, render to an off-screen renderbuffer (i.e., an MTLTexture with dimensions matching your drawable size), and then blit from the most-recently-drawn texture to a drawable's texture and present on whatever cadence you prefer. This can be useful for getting frame timings, but every frame you draw and then don't display is wasted work, and it increases the risk of judder.
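A minimal sketch of that blit-and-present step, assuming an already-populated offscreenTexture whose size and pixel format match the layer's drawables:

import Metal
import QuartzCore

// Blit the most recently rendered offscreen frame into a drawable and present.
func presentLatestFrame(commandQueue: MTLCommandQueue,
                        metalLayer: CAMetalLayer,
                        offscreenTexture: MTLTexture) {
    // nextDrawable() blocks this thread until the layer has a free drawable.
    guard let drawable = metalLayer.nextDrawable(),
          let commandBuffer = commandQueue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return }

    blit.copy(from: offscreenTexture,
              sourceSlice: 0, sourceLevel: 0,
              sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
              sourceSize: MTLSize(width: offscreenTexture.width,
                                  height: offscreenTexture.height,
                                  depth: 1),
              to: drawable.texture,
              destinationSlice: 0, destinationLevel: 0,
              destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
    blit.endEncoding()

    commandBuffer.present(drawable)
    commandBuffer.commit()
}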

Your options are limited when it comes to getting callbacks that match the v-sync cadence. Your best bet is almost certainly a CVDisplayLink scheduled in the default and tracking run loop modes, though this has caveats.

You could use something like a counting semaphore in concert with a display link if you want to free-run without getting too far ahead.
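Here's roughly how that pairing might look on macOS (a sketch; renderAndPresentFrame is a hypothetical placeholder):

import CoreVideo
import Dispatch

// Pace a free-running render thread to the display with a CVDisplayLink and
// a counting semaphore. An initial value of 0 makes the render thread wait
// for the next tick; a small positive value would let it run slightly ahead.
let frameSemaphore = DispatchSemaphore(value: 0)
var displayLink: CVDisplayLink?

CVDisplayLinkCreateWithActiveCGDisplays(&displayLink)
if let link = displayLink {
    CVDisplayLinkSetOutputHandler(link) { _, _, _, _, _ in
        frameSemaphore.signal() // fires at (roughly) v-sync cadence
        return kCVReturnSuccess
    }
    CVDisplayLinkStart(link)
}

// Meanwhile, on the dedicated render thread:
// while running {
//     frameSemaphore.wait()     // block until the next display tick
//     renderAndPresentFrame()   // hypothetical: encode, present, commit
// }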

If your application is able to maintain a real-time framerate, you'll normally be rendering a frame or two ahead of what's going on the glass, so you don't want to literally block on v-sync; you just want to inform the window server that you'd like presentation to match v-sync. On macOS, you do this by setting the layer's displaySyncEnabled to true (the default). Turning this off may cause tearing on certain displays.

Is drawing to an MTKView or CAMetalLayer required to take place on the main thread?

It is safe to draw on background threads. The docs for -nextDrawable say:

Calling this method blocks the current CPU thread until a new drawable is available.

(Note the generic phrasing "the current CPU thread.") If nextDrawable could only be called on the main thread, the docs probably wouldn't be so generalized. Also, Apple's general advice is to avoid blocking the main thread, so you'd think they would call that out in some way here, such as advising you not to call it unless you're pretty sure it won't block.

For how the drawable is used (rather than obtained), note that a typical use case is to call the command buffer's -presentDrawable: method. That method is a convenience for adding a scheduled handler block (as via -addScheduledHandler:) which will then call -present on the drawable. It is unspecified what thread or queue the handler blocks will be called on, which suggests that there's no promise that the -present call on the drawable will happen on the main thread.
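In code, the two forms should behave equivalently (a sketch; the command buffer and drawable come from your render loop):

import Metal

// Convenience form:
func present(_ drawable: MTLDrawable, with commandBuffer: MTLCommandBuffer) {
    commandBuffer.present(drawable)
}

// Roughly what -presentDrawable: does under the hood: register a scheduled
// handler that presents once the command buffer is scheduled on the GPU.
func presentManually(_ drawable: MTLDrawable, with commandBuffer: MTLCommandBuffer) {
    commandBuffer.addScheduledHandler { _ in
        drawable.present()
    }
}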

And even after that, the actual presentation of the drawable to the screen is not synchronous within the call to -present. The drawable waits until any commands that render or write to its texture are completed and only then presents to the screen. It's not specified how that asynchronicity is achieved, but it further suggests that it doesn't matter what thread -present is called on.

There's a bit of discussion about multi-threading in the Metal Programming Guide, although it's not quite as direct as one might hope. See especially the section on Multiple Threads, Command Buffers, and Command Encoders. Note that it discusses command buffers being filled by background threads, with no specific warning about working with drawables. Again, it's sort of an argument from silence, but I think the intent is clear. The guide does call out that only a single thread may act on a given command buffer at a time, so thread-safety questions were clearly considered.
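A sketch of the multi-threaded encoding pattern that section describes, with per-thread command buffers enqueued up front so execution order stays deterministic (encodePass is a hypothetical stand-in):

import Metal
import Dispatch

func encodeInParallel(commandQueue: MTLCommandQueue, passCount: Int) {
    let buffers: [MTLCommandBuffer] = (0..<passCount).compactMap { _ in
        commandQueue.makeCommandBuffer()
    }
    // Reserve each buffer's place in the queue before any encoding begins.
    buffers.forEach { $0.enqueue() }

    // One thread per command buffer, per the rule the guide calls out.
    DispatchQueue.concurrentPerform(iterations: buffers.count) { i in
        let buffer = buffers[i]
        // encodePass(i, into: buffer) // hypothetical per-thread encoding
        buffer.commit() // safe: only this thread ever touches this buffer
    }
}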

Off Screen Rendering Metal

Instead of setting the off-screen texture's usage to .renderTarget alone, you should use [.renderTarget, .shaderRead], since a texture you render to and then sample or read from needs both usages declared up front.
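For instance, creating such a texture might look like this (a sketch; the dimensions and pixel format are placeholders):

import Metal

func makeOffscreenTarget(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm,
        width: width,
        height: height,
        mipmapped: false)
    // Declare both usages up front: we render to it, then read it in a shader.
    descriptor.usage = [.renderTarget, .shaderRead]
    descriptor.storageMode = .private // GPU-only is fine for a render target
    return device.makeTexture(descriptor: descriptor)
}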

How to render/export frames offline in Metal?

I adapted my answer here and the Apple Metal game template to create this sample, which demonstrates how to record a video file directly from a sequence of frames rendered by Metal.

Since all rendering in Metal draws to a texture, it's not too hard to adapt normal Metal code so that it's suitable for rendering offline into a movie file. To recap the core recording process (a condensed sketch follows the list):

  • Create an AVAssetWriter that targets your URL of choice.
  • Create an AVAssetWriterInput of type .video so you can write video frames.
  • Wrap an AVAssetWriterInputPixelBufferAdaptor around the input so you can append CVPixelBuffers as frames to the video.
  • After you start recording, for each frame, copy the pixels from your rendered frame texture into a pixel buffer obtained from the adaptor's pixel buffer pool.
  • When you're done, mark the input as finished and finish writing to the asset writer.
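A condensed sketch of those steps, assuming a BGRA8 source texture that's CPU-readable (.shared storage, or .managed after a blit synchronize); names like outputURL are placeholders:

import AVFoundation
import Metal

final class FrameRecorder {
    let writer: AVAssetWriter
    let input: AVAssetWriterInput
    let adaptor: AVAssetWriterInputPixelBufferAdaptor

    init(outputURL: URL, width: Int, height: Int) throws {
        writer = try AVAssetWriter(outputURL: outputURL, fileType: .mp4)
        input = AVAssetWriterInput(mediaType: .video, outputSettings: [
            AVVideoCodecKey: AVVideoCodecType.h264,
            AVVideoWidthKey: width,
            AVVideoHeightKey: height,
        ])
        adaptor = AVAssetWriterInputPixelBufferAdaptor(
            assetWriterInput: input,
            sourcePixelBufferAttributes: [
                kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA,
                kCVPixelBufferWidthKey as String: width,
                kCVPixelBufferHeightKey as String: height,
            ])
        writer.add(input)
        writer.startWriting() // returns false on failure; check writer.status in production
        writer.startSession(atSourceTime: .zero)
    }

    func writeFrame(forTexture texture: MTLTexture, time: TimeInterval) {
        // Production code should also wait for input.isReadyForMoreMediaData.
        guard let pool = adaptor.pixelBufferPool else { return }
        var pixelBuffer: CVPixelBuffer?
        CVPixelBufferPoolCreatePixelBuffer(nil, pool, &pixelBuffer)
        guard let buffer = pixelBuffer else { return }

        // Copy the rendered pixels out of the texture into the pixel buffer.
        CVPixelBufferLockBaseAddress(buffer, [])
        texture.getBytes(CVPixelBufferGetBaseAddress(buffer)!,
                         bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                         from: MTLRegionMake2D(0, 0, texture.width, texture.height),
                         mipmapLevel: 0)
        CVPixelBufferUnlockBaseAddress(buffer, [])

        adaptor.append(buffer, withPresentationTime: CMTime(seconds: time, preferredTimescale: 600))
    }

    func finish(completion: @escaping () -> Void) {
        input.markAsFinished()
        writer.finishWriting(completionHandler: completion)
    }
}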

As for driving the recording, since you aren't getting delegate callbacks from an MTKView or CADisplayLink, you need to do it yourself. The basic pattern looks like this:

for t in stride(from: 0, through: duration, by: frameDelta) {
    // Render the frame for time t, then hand the finished texture to the recorder.
    draw(in: renderBuffer, depthTexture: depthBuffer, time: t) { texture in
        recorder.writeFrame(forTexture: texture, time: t)
    }
}

If your rendering and recording code is asynchronous and thread-safe, you can throw this on a background queue to keep your interface responsive. You could also throw in a progress callback to update your UI if your rendering takes a long time.
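For example (a sketch; renderAllFrames and updateProgressBar are hypothetical stand-ins for your rendering loop and your main-thread UI update):

import Dispatch

DispatchQueue.global(qos: .userInitiated).async {
    renderAllFrames { fractionComplete in       // reports progress per frame
        DispatchQueue.main.async {
            updateProgressBar(fractionComplete) // UI work stays on the main thread
        }
    }
}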

Note that since you're not running in real-time, you'll need to ensure that any animation takes into account the current frame time (or the timestep between frames) so things run at the proper rate when played back. In my sample, I do this by just having the rotation of the cube depend directly on the frame's presentation time.


