Most efficient/realtime way to get pixel values from iOS camera feed in Swift

You don't want an AVCaptureVideoPreviewLayer - that's the right choice if you want to display the video. Instead, you want a different output: AVCaptureVideoDataOutput:

https://developer.apple.com/library/ios/documentation/AVFoundation/Reference/AVCaptureVideoDataOutput_Class/index.html#//apple_ref/occ/cl/AVCaptureVideoDataOutput

This gives you direct access to the stream of sample buffers, which you can then get into pixel-space.
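For illustration, attaching such an output to an already-configured session might look like this (a sketch; the session, delegate, BGRA format, and queue label are all assumptions, not anything mandated by the API):

import AVFoundation

// Deliver each camera frame as a CMSampleBuffer instead of only previewing it.
func addVideoDataOutput(to session: AVCaptureSession,
                        delegate: AVCaptureVideoDataOutputSampleBufferDelegate) {
    let videoOutput = AVCaptureVideoDataOutput()
    videoOutput.videoSettings = [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
    ]
    videoOutput.alwaysDiscardsLateVideoFrames = true
    videoOutput.setSampleBufferDelegate(delegate, queue: DispatchQueue(label: "camera.frames"))

    if session.canAddOutput(videoOutput) {
        session.addOutput(videoOutput)
    }
}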

Just a note: I don't know what the throughput on current devices is, but I was unable to get a live stream at the highest quality from the iPhone 4S because the GPU<-->CPU pipeline was too slow.

iOS: Get pixel-by-pixel data from camera

AV Foundation can give you back the raw bytes for an image captured by either the video or still camera. You need to set up an AVCaptureSession with an appropriate AVCaptureDevice, a corresponding AVCaptureDeviceInput, and an AVCaptureOutput subclass (AVCaptureVideoDataOutput or AVCaptureStillImageOutput). Apple has some examples of this process in their documentation, and it requires some boilerplate code to configure.

Once you have your capture session configured and you are capturing data from the camera, you will set up a -captureOutput:didOutputSampleBuffer:fromConnection: delegate method, where one of the parameters will be a CMSampleBufferRef. That will have a CVImageBufferRef within it that you access via CMSampleBufferGetImageBuffer(). Using CVPixelBufferGetBaseAddress() on that pixel buffer will return the base address of the byte array for the raw pixel data representing your camera frame. This can be in a few different formats, but the most common are BGRA and planar YUV.
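In Swift, that delegate callback might look like this (a sketch; FrameHandler is an illustrative class name, and the buffer is assumed to have been configured as BGRA):

import AVFoundation
import CoreVideo

// Illustrative delegate class; register an instance of it with setSampleBufferDelegate(_:queue:).
final class FrameHandler: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Lock the buffer before touching its base address.
        CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

        let width = CVPixelBufferGetWidth(pixelBuffer)
        let height = CVPixelBufferGetHeight(pixelBuffer)
        let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
        guard let base = CVPixelBufferGetBaseAddress(pixelBuffer) else { return }

        // With a BGRA output, each pixel is 4 bytes (B, G, R, A) in row-major order.
        let bytes = base.assumingMemoryBound(to: UInt8.self)
        let blueOfTopLeftPixel = bytes[0]
        _ = (width, height, bytesPerRow, blueOfTopLeftPixel)   // use the values as needed
    }
}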

I have an example application that uses this here, but I'd recommend that you also take a look at my open source framework which wraps the standard AV Foundation boilerplate and makes it easy to perform image processing on the GPU. Depending on what you want to do with these raw camera bytes, I may already have something you can use there or a means of doing it much faster than with on-CPU processing.

What is the most efficient way to show a live video preview with a CIFilter applied?

I think this (rendering the filtered frames with Core Image into an MTKView) is the most efficient way.

Core Image (in its default config) uses Metal to perform the image filtering operations on the GPU. An MTKView is meant to be used for displaying results of a Metal pipeline, so that fits.

As for the @Published property: I think it doesn't really matter how the pixel buffers are propagated to your filter chain and ultimately to the view as long as the method doesn't add too much overhead. Using Combine should be totally fine.
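For example, the hand-off could be as simple as this (a sketch; CameraFrameSource and applyFilterAndDisplay are illustrative names, not part of any framework):

import Combine
import CoreVideo

// Illustrative publisher of camera frames; the capture delegate would assign to `latestFrame`.
final class CameraFrameSource: ObservableObject {
    @Published var latestFrame: CVPixelBuffer?
}

// Hypothetical sink: run the CIFilter chain and hand the result to the view.
func applyFilterAndDisplay(_ buffer: CVPixelBuffer) { /* ... */ }

var cancellables = Set<AnyCancellable>()

func bind(_ source: CameraFrameSource) {
    source.$latestFrame
        .compactMap { $0 }                    // drop the initial nil
        .sink { applyFilterAndDisplay($0) }
        .store(in: &cancellables)
}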

There is, unfortunately, no convenient way for applying CIFilters to the camera feed as there is for video playback and export (using AVVideoComposition).
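So you end up driving the MTKView yourself. A minimal sketch of a Metal-backed Core Image renderer for an MTKView (class, property, and filter names are illustrative; scaling and orientation handling are omitted):

import CoreImage
import CoreVideo
import MetalKit

final class FilteredPreviewRenderer {
    private let device = MTLCreateSystemDefaultDevice()!
    private lazy var commandQueue = device.makeCommandQueue()!
    private lazy var ciContext = CIContext(mtlDevice: device)
    private let filter = CIFilter(name: "CISepiaTone")!

    // One-time view setup: Core Image needs write access to the drawable texture.
    func configure(_ view: MTKView) {
        view.device = device
        view.framebufferOnly = false
        view.isPaused = true                  // we draw once per delivered camera frame
        view.enableSetNeedsDisplay = false
    }

    // Call from the capture delegate with each camera pixel buffer.
    func render(pixelBuffer: CVPixelBuffer, in view: MTKView) {
        guard let drawable = view.currentDrawable,
              let commandBuffer = commandQueue.makeCommandBuffer() else { return }

        filter.setValue(CIImage(cvPixelBuffer: pixelBuffer), forKey: kCIInputImageKey)
        let image = filter.outputImage ?? CIImage(cvPixelBuffer: pixelBuffer)

        // Queue the Core Image render on the same command buffer that presents the drawable.
        let destination = CIRenderDestination(mtlTexture: drawable.texture,
                                              commandBuffer: commandBuffer)
        try? ciContext.startTask(toRender: image, to: destination)
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}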

Slow down the ARSession without slowing down the camera feed

About ARSession

There is no need to change the speed of the ARSession, as this would not only spoil the desired effect but also ruin the user's AR experience. The session must keep running at 60 fps, it must continue to track all the anchors in the scene, and it mustn't stop.


Freezing animation in RealityKit 2.0

A robust solution would be to use two different animation speeds: normal animation speed when you aren't recording, and bullet-time animation speed (or even a frozen animation) during screen recording.

var speed: Float { get set }               // Default value is 1.0

The "freezing" effect can be achieved using AnimationPlaybackController:

import UIKit
import RealityKit
import ReplayKit

var ctrl: AnimationPlaybackController!

let neo = try ModelEntity.load(named: "Neo_with_Animation")

ctrl = neo.playAnimation(neo.availableAnimations[0].repeat(count: 50),
                         transitionDuration: 2,
                         startsPaused: false)

func startRecording(sender: UIButton!) {
    ctrl.speed = 0.02           // animation speed is 2%
    // some code to start recording...
}

func stopRecording(sender: UIButton!) {
    ctrl.speed = 1.0            // animation speed is 100%
    // or: ctrl.speed = -1.0    // reverse playback at 100% speed
    // some code to stop recording...
}

If you need more info on asset animation, read this post.


Freezing physics in RealityKit 2.0

When you're simulating physics, you can stop the simulation using the .static case of the PhysicsBodyMode enum and resume it using the .dynamic case. However, freezing the physics does not look as impressive as freezing the animation.

let neoScene = try! Experience.loadNeoWithPhysics()

let neo = neoScene.developer!.children[0] as? (ModelEntity &
                                               HasCollision &
                                               HasPhysicsBody &
                                               HasPhysicsMotion)

func startRecording(sender: UIButton!) {
    neo?.physicsBody?.mode = .static        // freeze the simulation
    // some code to start recording...
}

func stopRecording(sender: UIButton!) {
    neo?.physicsBody?.mode = .dynamic       // resume the simulation

    // it's quite possible that after unfreezing
    // an additional impulse will be required...
    neo?.physicsMotion?.linearVelocity.x = 1.0
    neo?.physicsMotion?.linearVelocity.z = 0.5

    // some code to stop recording...
}

As far as I know, there's no parameter in RealityKit 2.0 that lets you merely slow down (not freeze!) the physics simulation without breaking it, because RealityKit's engine calculates physics in real time.

Read this post to find out how to quickly set up physics in RealityKit.


Freezing physics in SceneKit

Nonetheless, SceneKit does have a property that changes the simulation's rate. Its name is speed.

var speed: CGFloat { get set }
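For example, to run SceneKit's physics at one tenth of normal speed (a sketch; the property shown above lives on SCNPhysicsWorld, and the SCNView parameter is assumed to come from your own code):

import SceneKit

func slowDownPhysics(in sceneView: SCNView) {
    // Run the simulation at 10% of real time without pausing it;
    // set it back to 1.0 to resume normal speed.
    sceneView.scene?.physicsWorld.speed = 0.1
}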

Capture Metal MTKView as Movie in realtime?

Here's a small class that performs the essential functions of writing out a movie file that captures the contents of a Metal view:

import AVFoundation
import CoreVideo
import Metal
import QuartzCore

class MetalVideoRecorder {
    var isRecording = false
    var recordingStartTime = TimeInterval(0)

    private var assetWriter: AVAssetWriter
    private var assetWriterVideoInput: AVAssetWriterInput
    private var assetWriterPixelBufferInput: AVAssetWriterInputPixelBufferAdaptor

    init?(outputURL url: URL, size: CGSize) {
        do {
            assetWriter = try AVAssetWriter(outputURL: url, fileType: .m4v)
        } catch {
            return nil
        }

        let outputSettings: [String: Any] = [ AVVideoCodecKey : AVVideoCodecType.h264,
                                              AVVideoWidthKey : size.width,
                                              AVVideoHeightKey : size.height ]

        assetWriterVideoInput = AVAssetWriterInput(mediaType: .video, outputSettings: outputSettings)
        assetWriterVideoInput.expectsMediaDataInRealTime = true

        let sourcePixelBufferAttributes: [String: Any] = [
            kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_32BGRA,
            kCVPixelBufferWidthKey as String : size.width,
            kCVPixelBufferHeightKey as String : size.height ]

        assetWriterPixelBufferInput = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: assetWriterVideoInput,
                                                                           sourcePixelBufferAttributes: sourcePixelBufferAttributes)

        assetWriter.add(assetWriterVideoInput)
    }

    func startRecording() {
        assetWriter.startWriting()
        assetWriter.startSession(atSourceTime: .zero)

        recordingStartTime = CACurrentMediaTime()
        isRecording = true
    }

    func endRecording(_ completionHandler: @escaping () -> ()) {
        isRecording = false

        assetWriterVideoInput.markAsFinished()
        assetWriter.finishWriting(completionHandler: completionHandler)
    }

    func writeFrame(forTexture texture: MTLTexture) {
        if !isRecording {
            return
        }

        while !assetWriterVideoInput.isReadyForMoreMediaData {}

        guard let pixelBufferPool = assetWriterPixelBufferInput.pixelBufferPool else {
            print("Pixel buffer asset writer input did not have a pixel buffer pool available; cannot retrieve frame")
            return
        }

        var maybePixelBuffer: CVPixelBuffer? = nil
        let status = CVPixelBufferPoolCreatePixelBuffer(nil, pixelBufferPool, &maybePixelBuffer)
        if status != kCVReturnSuccess {
            print("Could not get pixel buffer from asset writer input; dropping frame...")
            return
        }

        guard let pixelBuffer = maybePixelBuffer else { return }

        CVPixelBufferLockBaseAddress(pixelBuffer, [])
        let pixelBufferBytes = CVPixelBufferGetBaseAddress(pixelBuffer)!

        // Use the bytes-per-row value from the pixel buffer, since its stride may be rounded up to be 16-byte aligned
        let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
        let region = MTLRegionMake2D(0, 0, texture.width, texture.height)

        texture.getBytes(pixelBufferBytes, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)

        let frameTime = CACurrentMediaTime() - recordingStartTime
        let presentationTime = CMTimeMakeWithSeconds(frameTime, preferredTimescale: 240)
        assetWriterPixelBufferInput.append(pixelBuffer, withPresentationTime: presentationTime)

        CVPixelBufferUnlockBaseAddress(pixelBuffer, [])
    }
}

After initializing one of these and calling startRecording(), you can add a completion handler to the command buffer containing your rendering commands and call writeFrame from it (register the handler after you end encoding, but before presenting the drawable or committing the buffer):

let texture = currentDrawable.texture
commandBuffer.addCompletedHandler { commandBuffer in
    self.recorder.writeFrame(forTexture: texture)
}

When you're done recording, just call endRecording, and the video file will be finalized and closed.
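For example (recorder being the MetalVideoRecorder instance from above):

recorder.endRecording {
    // The movie at the output URL is now finalized and closed;
    // hand it off to Photos, a share sheet, etc. from here.
}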

Caveats:

This class assumes the source texture is in the default format, .bgra8Unorm. If it isn't, you'll get crashes or corruption. If necessary, convert the texture with a compute or fragment shader, or use Accelerate.

This class also assumes that the texture is the same size as the video frame. If this isn't the case (if the drawable size changes, or your screen autorotates), the output will be corrupted and you may see crashes. Mitigate this by scaling or cropping the source texture as your application requires.
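A check along these lines, called at the top of writeFrame(forTexture:), would surface both problems early instead of silently producing corrupted output (a sketch; canRecord and expectedSize are illustrative, not part of the class above):

import CoreGraphics
import Metal

// Hypothetical guard: returns true only if `texture` matches what the recorder was configured for.
// `expectedSize` stands in for a stored copy of the `size` passed to the recorder's initializer.
func canRecord(_ texture: MTLTexture, expectedSize: CGSize) -> Bool {
    guard texture.pixelFormat == .bgra8Unorm else {
        print("Unsupported texture format \(texture.pixelFormat); expected .bgra8Unorm")
        return false
    }
    guard texture.width == Int(expectedSize.width),
          texture.height == Int(expectedSize.height) else {
        print("Texture size \(texture.width)x\(texture.height) doesn't match the configured video size")
        return false
    }
    return true
}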


