Metal: limit MTLRenderCommandEncoder texture loading to only part of texture

One approach to avoid loading and storing the entire texture in the render pass, assuming your scissor rectangle is constant between draw calls, is the following:

  1. Blit (MTLBlitCommandEncoder) the region of interest from the large texture to a smaller intermediate texture (e.g. the size of your scissor rectangle).

  2. Load and store, and draw/operate only on the smaller intermediate texture.

  3. When done encoding, blit back the result to the original source region of the larger texture.

This way you load and store only the region of interest in your pipeline, at the constant added memory cost of maintaining a smaller intermediate texture (assuming the region of interest is constant between draw calls).

Blitting is a fast operation, so this method should optimize your current pipeline.
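
A minimal sketch of steps 1 and 3, assuming you already have a command buffer and a pre-allocated intermediate texture the size of your scissor rectangle; the names (largeTexture, intermediateTexture, scissorRegion, commandBuffer) are purely illustrative:

import Metal

func copyRegion(from source: MTLTexture, region: MTLRegion,
                to destination: MTLTexture, destinationOrigin: MTLOrigin,
                using commandBuffer: MTLCommandBuffer) {
    guard let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    blit.copy(from: source,
              sourceSlice: 0,
              sourceLevel: 0,
              sourceOrigin: region.origin,
              sourceSize: region.size,
              to: destination,
              destinationSlice: 0,
              destinationLevel: 0,
              destinationOrigin: destinationOrigin)
    blit.endEncoding()
}

// Step 1: copy the scissored region into the small intermediate texture.
copyRegion(from: largeTexture, region: scissorRegion,
           to: intermediateTexture, destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0),
           using: commandBuffer)

// ... encode the render pass against intermediateTexture here ...

// Step 3: copy the result back to the original location in the large texture.
copyRegion(from: intermediateTexture,
           region: MTLRegion(origin: MTLOrigin(x: 0, y: 0, z: 0), size: scissorRegion.size),
           to: largeTexture, destinationOrigin: scissorRegion.origin,
           using: commandBuffer)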

How do I use MetalKit texture loader with Metal heaps?

Well, the method I outlined above seems to work. As predicted, it's pretty long-winded. I'd be very interested to know if anyone has anything more elegant.

enum MetalError: Error {
    case anErrorOccurred
}

extension MTLTexture {
    var descriptor: MTLTextureDescriptor {
        let descriptor = MTLTextureDescriptor()
        descriptor.width = width
        descriptor.height = height
        descriptor.depth = depth
        descriptor.textureType = textureType
        descriptor.cpuCacheMode = cpuCacheMode
        descriptor.storageMode = storageMode
        descriptor.pixelFormat = pixelFormat
        descriptor.arrayLength = arrayLength
        descriptor.mipmapLevelCount = mipmapLevelCount
        descriptor.sampleCount = sampleCount
        descriptor.usage = usage
        return descriptor
    }

    var size: MTLSize {
        return MTLSize(width: width, height: height, depth: depth)
    }
}

extension MTKTextureLoader {
    func newHeap(withTexturesNamed names: [String], queue: MTLCommandQueue, scaleFactor: CGFloat, bundle: Bundle?, options: [MTKTextureLoader.Option : Any]?, onCompletion: (([MTLTexture]) -> Void)?) throws -> MTLHeap {
        let device = queue.device

        // Load the source textures normally (not heap-backed).
        let sourceTextures = try names.map { name in
            return try newTexture(name: name, scaleFactor: scaleFactor, bundle: bundle, options: options)
        }

        // Build matching descriptors for private, heap-backed copies.
        let storageMode: MTLStorageMode = .private
        let descriptors: [MTLTextureDescriptor] = sourceTextures.map { source in
            let desc = source.descriptor
            desc.storageMode = storageMode
            return desc
        }

        // Size the heap, rounding each texture's size up to its required alignment.
        let sizeAligns = descriptors.map { device.heapTextureSizeAndAlign(descriptor: $0) }
        let heapDescriptor = MTLHeapDescriptor()
        heapDescriptor.size = sizeAligns.reduce(0) { total, sizeAlign in
            let alignedSize = (sizeAlign.size + sizeAlign.align - 1) / sizeAlign.align * sizeAlign.align
            return total + alignedSize
        }
        heapDescriptor.cpuCacheMode = descriptors[0].cpuCacheMode
        heapDescriptor.storageMode = storageMode

        guard let heap = device.makeHeap(descriptor: heapDescriptor),
              let buffer = queue.makeCommandBuffer(),
              let blit = buffer.makeBlitCommandEncoder() else {
            throw MetalError.anErrorOccurred
        }

        // Allocate the destination textures from the heap.
        let destTextures: [MTLTexture] = try descriptors.map { descriptor in
            guard let texture = heap.makeTexture(descriptor: descriptor) else {
                throw MetalError.anErrorOccurred
            }
            return texture
        }

        // Copy each source texture into its heap-backed counterpart and regenerate mipmaps.
        let origin = MTLOrigin()
        zip(sourceTextures, destTextures).forEach { (source, dest) in
            blit.copy(from: source, sourceSlice: 0, sourceLevel: 0, sourceOrigin: origin, sourceSize: source.size, to: dest, destinationSlice: 0, destinationLevel: 0, destinationOrigin: origin)
            blit.generateMipmaps(for: dest)
        }
        blit.endEncoding()

        buffer.addCompletedHandler { _ in
            onCompletion?(destTextures)
        }
        buffer.commit()
        return heap
    }
}
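
For reference, a hypothetical call site might look like this; the texture names, device, and commandQueue are placeholders, not from the original question:

// Call from a throwing context, or wrap in do/catch.
let loader = MTKTextureLoader(device: device)
let heap = try loader.newHeap(withTexturesNamed: ["grass", "rock"],
                              queue: commandQueue,
                              scaleFactor: 1.0,
                              bundle: nil,
                              options: nil) { heapTextures in
    // Called once the blit into the heap has completed on the GPU.
    print("Loaded \(heapTextures.count) textures into the heap")
}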

Chunk Rendering in Metal

You note:

I've read that the point of the MTLParallelRenderCommandEncoder is to create each MTLRenderCommandEncoder in a separate thread...

And you're correct. What you're doing is sequentially creating, encoding with, and ending command encoders — there's nothing parallel going on here, so MTLParallelRenderCommandEncoder is doing nothing for you. You'd have roughly the same performance if you eliminated the parallel encoder and just created encoders with renderCommandEncoderWithDescriptor(_:) on each pass through your for loop... which is to say, you'd still have the same performance problem due to the overhead of creating all those encoders.

So, if you're going to encode sequentially, just reuse the same encoder. Also, you should reuse as much of your other shared state as possible. Here's a quick pass at a possible refactoring (untested):

let passDescriptor = MTLRenderPassDescriptor()

// call this once before your render loop
func setup() {
    makeDepthTexture()

    passDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0.2, 0.2, 0.2, 1)
    passDescriptor.colorAttachments[0].storeAction = .Store
    passDescriptor.colorAttachments[0].loadAction = .Clear

    passDescriptor.depthAttachment.texture = depthTexture
    passDescriptor.depthAttachment.clearDepth = 1
    passDescriptor.depthAttachment.loadAction = .Clear
    passDescriptor.depthAttachment.storeAction = .Store

    // set up render pipeline state and depthStencil state
}

func drawNodes(nodes: [OctreeNode], inView view: AHMetalView) {

    updateUniformsForView(view, duration: view.frameDuration)

    // Set up completed handler ahead of time
    let commandBuffer = commandQueue.commandBuffer()
    commandBuffer.addCompletedHandler { _ in // unused parameter
        self.uniformBufferIndex = (self.uniformBufferIndex + 1) % AHInFlightBufferCount
        dispatch_semaphore_signal(self.displaySemaphore)
    }

    // Semaphore should be tied to drawable acquisition
    dispatch_semaphore_wait(displaySemaphore, DISPATCH_TIME_FOREVER)
    guard let drawable = layer.nextDrawable()
        else { return }

    // Set up the one part of the pass descriptor that changes per-frame
    passDescriptor.colorAttachments[0].texture = drawable.texture

    // Create one render command encoder and reuse it for every node
    let renderPass = commandBuffer.renderCommandEncoderWithDescriptor(passDescriptor)
    renderPass.setTriangleFillMode(.Lines)
    renderPass.setRenderPipelineState(renderPipelineState)
    renderPass.setDepthStencilState(depthStencilState)

    for node in nodes {
        // Update offsets and draw
        let uniformBufferOffset = sizeof(AHUniforms) * uniformBufferIndex
        renderPass.setVertexBuffer(node.vertexBuffer, offset: 0, atIndex: 0)
        renderPass.setVertexBuffer(uniformBuffer, offset: uniformBufferOffset, atIndex: 1)
        renderPass.drawIndexedPrimitives(.Triangle, indexCount: AHMaxIndicesPerChunk, indexType: AHIndexType, indexBuffer: node.indexBuffer, indexBufferOffset: 0)
    }
    renderPass.endEncoding()

    commandBuffer.presentDrawable(drawable)
    commandBuffer.commit()
}

Then, profile with Instruments to see what, if any, further performance issues you might have. There's a great WWDC 2015 session on exactly that, showing several of the common "gotchas", how to diagnose them while profiling, and how to fix them.

Second-pass rendering in Metal -- is it this easy?

It's not really clear which part you were worried about, but, yes, that works. At any point you can change any property of a render command encoder that has a "set" method and then do some more drawing. You are not limited to a single draw per encoder, nor even to a single draw configuration. The only things fixed for the lifetime of a render command encoder are the properties described by the render pass descriptor you use to create it.

You can even change which render pipeline state is used. However, remember that the render pass descriptor is fixed and the render pipeline state's attachment pixel formats must match the render pass descriptor's attachment textures.

Of course, if you need to, you can use multiple command encoders and it's really not hard to set up the load and store actions.
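
As an example, here is a minimal sketch of reconfiguring a single encoder between two sets of draws; the pipeline states and buffers are assumed to exist already and are purely illustrative:

// Both pipeline states must have attachment pixel formats that match
// the render pass descriptor used to create `encoder`.
encoder.setRenderPipelineState(firstPassPipelineState)
encoder.setVertexBuffer(sceneVertexBuffer, offset: 0, index: 0)
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: sceneVertexCount)

// Reconfigure the same encoder and keep drawing -- no new encoder needed.
encoder.setRenderPipelineState(secondPassPipelineState)
encoder.setVertexBuffer(overlayVertexBuffer, offset: 0, index: 0)
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: overlayVertexCount)

encoder.endEncoding()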

Loading data into a texture defined as a MTLTextureType1DArray

You need to use multiple calls to -replaceRegion:mipmapLevel:slice:withBytes:bytesPerRow:bytesPerImage:, once for each element of the array. You specify the array index with the slice parameter.

The region parameter needs to be 1-dimensional.
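
A minimal sketch in Swift, assuming `texture` is the 1D array texture with a .rgba8Unorm pixel format and `sliceContents` is an array of Data blobs, one per array element (all of these names are illustrative):

let bytesPerPixel = 4  // assumes .rgba8Unorm

for (sliceIndex, sliceData) in sliceContents.enumerated() {
    sliceData.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) in
        // The region is 1-dimensional: height and depth are 1.
        let region = MTLRegionMake1D(0, texture.width)
        texture.replace(region: region,
                        mipmapLevel: 0,
                        slice: sliceIndex,  // the array element to fill
                        withBytes: bytes.baseAddress!,
                        bytesPerRow: texture.width * bytesPerPixel,
                        bytesPerImage: 0)
    }
}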

Error when using Metal Indirect Command Buffer: Fragment shader cannot be used with indirect command buffers

Looking through the code, the Metal documentation, and the Metal Shading Language specification, I think I know why you get this error.

If you look at the render_command interface in Metal's metal_command_buffer header, you'll find that the only functions for passing parameters to indirect render commands are set_vertex_buffer and set_fragment_buffer; there is no set_vertex_texture or set_vertex_sampler like there is on MTLRenderCommandEncoder.

But since your pipeline uses a shader that takes textures as arguments, and you indicate by setting supportIndirectCommandBuffers that you would like to use this pipeline in indirect commands, Metal has no choice but to fail pipeline creation.

Instead, if you want to pass textures or samplers to indirect render commands, you should use argument buffers: you pass the argument buffer to the shader that issues the indirect render commands, and that shader binds it with set_vertex_buffer or set_fragment_buffer for each render_command.

Specification: Metal Shading Language Specification (Section 5.16)
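
On the CPU side, encoding the textures into such an argument buffer might look roughly like this sketch; `device`, `texture`, the function name, and the buffer indices are assumptions, and the resulting buffer still has to be bound to the pass that writes the indirect commands:

// `encodingFunction` is assumed to be the MTLFunction that receives
// the argument buffer at buffer index 0.
let argumentEncoder = encodingFunction.makeArgumentEncoder(bufferIndex: 0)
let argumentBuffer = device.makeBuffer(length: argumentEncoder.encodedLength,
                                       options: .storageModeShared)!
argumentBuffer.label = "Indirect draw resources"

// Write the texture reference into the argument buffer.
argumentEncoder.setArgumentBuffer(argumentBuffer, offset: 0)
argumentEncoder.setTexture(texture, index: 0)

Remember that resources referenced only through an argument buffer also need to be declared with useResource(_:usage:) on the encoder that executes the indirect commands.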


