How do I use CoreAudio's AudioConverter to encode AAC in real-time?
AudioConverterFillComplexBuffer does not actually mean "fill the encoder with my input buffer that I have here". It means "fill this output buffer here with encoded data from the encoder". With this perspective, the callback suddenly makes sense: it is used to fetch source data to satisfy the "fill this output buffer for me" request. Maybe this is obvious to others, but it took me a long time to understand this (and from all the AudioConverter sample code I see floating around where people send input data through inInputDataProcUserData, I'm guessing I'm not the only one).
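To make the pull model concrete, here is a toy version in plain C that doesn't touch CoreAudio at all. ToyInputProc, toyFillBuffer and ArraySource are hypothetical names, and the "conversion" is just a copy; the point is only the shape of the API: the caller asks for an output buffer to be filled, and the converter pulls input through the callback until it has enough or the source reports 0 (end of data). The real AudioConverterFillComplexBuffer works in packets and AudioBufferLists, so treat this as an analogy, not a model of its internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input callback: on entry *ioCount is how many samples the
   converter wants; on exit it is how many were supplied (0 = no more data). */
typedef void (*ToyInputProc)(void *userData, const int16_t **outData, size_t *ioCount);

/* Toy "converter": the caller asks it to FILL an output buffer, and it pulls
   source data through the callback until the buffer is full or the source
   runs dry. Returns the number of samples written. */
size_t toyFillBuffer(ToyInputProc proc, void *userData, int16_t *out, size_t outCapacity) {
    size_t written = 0;
    while (written < outCapacity) {
        const int16_t *chunk = NULL;
        size_t count = outCapacity - written; /* request only what is still missing */
        proc(userData, &chunk, &count);
        assert(count <= outCapacity - written); /* callback must not over-deliver */
        if (count == 0) break;                  /* end of input */
        for (size_t i = 0; i < count; i++) out[written + i] = chunk[i]; /* "convert" = copy */
        written += count;
    }
    return written;
}

/* Example source that serves a fixed array, at most 2 samples per callback. */
typedef struct { const int16_t *samples; size_t length, position, calls; } ArraySource;

void arraySourceProc(void *userData, const int16_t **outData, size_t *ioCount) {
    ArraySource *src = (ArraySource *)userData;
    size_t available = src->length - src->position;
    size_t n = *ioCount < available ? *ioCount : available;
    if (n > 2) n = 2;
    *outData = src->samples + src->position;
    src->position += n;
    src->calls++;
    *ioCount = n;
}
```

Filling a 5-sample output from this source takes three callbacks (2 + 2 + 1 samples), which is exactly the "fill this output buffer for me" direction described above.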
The AudioConverterFillComplexBuffer call is blocking, and expects you to deliver data to it synchronously from the callback. If you are encoding in real time, you will thus need to call FillComplexBuffer on a separate thread that you set up yourself. In the callback, you can then check for available input data, and if none is available, block (on a semaphore or condition variable) until some arrives. Using an NSCondition, the encoder thread would then look something like this:
- (void)startEncoder
{
    OSStatus creationStatus = AudioConverterNew(&_fromFormat, &_toFormat, &_converter);
    if (creationStatus != noErr) {
        NSLog(@"AudioConverterNew failed: %d", (int)creationStatus);
        return;
    }
    _running = YES;
    _condition = [[NSCondition alloc] init];
    [self performSelectorInBackground:@selector(_encoderThread) withObject:nil];
}
- (void)_encoderThread
{
    while (_running) {
        // Make quarter-second buffers.
        size_t bufferSize = (size_t)((_outputBitrate / 8) * 0.25);
        NSMutableData *outAudioBuffer = [NSMutableData dataWithLength:bufferSize];
        AudioBufferList outAudioBufferList;
        outAudioBufferList.mNumberBuffers = 1;
        outAudioBufferList.mBuffers[0].mNumberChannels = _toFormat.mChannelsPerFrame;
        outAudioBufferList.mBuffers[0].mDataByteSize = (UInt32)bufferSize;
        outAudioBufferList.mBuffers[0].mData = [outAudioBuffer mutableBytes];
        // Request one AAC packet; its byte size comes back in outPacketDescription.
        UInt32 ioOutputDataPacketSize = 1;
        AudioStreamPacketDescription outPacketDescription;
        _currentPresentationTime = kCMTimeInvalid; // you need to fill this in during FillComplexBuffer
        const OSStatus conversionResult = AudioConverterFillComplexBuffer(_converter, FillBufferTrampoline, (__bridge void *)self, &ioOutputDataPacketSize, &outAudioBufferList, &outPacketDescription);
        if (conversionResult != noErr || ioOutputDataPacketSize == 0) {
            // An error, or end of stream after stopEncoding: leave the loop.
            break;
        }
        // here I convert the AudioBufferList (plus outPacketDescription) into a
        // CMSampleBuffer named outSampleBuffer, which I've omitted for brevity.
        // Ping me if you need it.
        [self.delegate encoder:self encodedSampleBuffer:outSampleBuffer];
    }
}
And the callback could look like this. (I normally use this trampoline to immediately forward to a method on my instance, which I pass in via inUserData; that step is omitted for brevity, so the body refers to the instance variables directly.)
static OSStatus FillBufferTrampoline(AudioConverterRef inAudioConverter,
                                     UInt32* ioNumberDataPackets,
                                     AudioBufferList* ioData,
                                     AudioStreamPacketDescription** outDataPacketDescription,
                                     void* inUserData)
{
    [_condition lock];
    UInt32 countOfPacketsWritten = 0;
    while (true) {
        // If the condition fires and we have shut down the encoder, just pretend like we have written 0 bytes and are done.
        if (!_running) break;
        // Out of input data? Wait on the condition.
        if (_inputBuffer.length == 0) {
            [_condition wait];
            continue;
        }
        // We have data! Fill ioData from your _inputBuffer here, and set
        // countOfPacketsWritten to the number of packets you supplied.
        // The memory ioData points to must stay valid until the converter
        // calls back again, so don't hand it a temporary buffer.
        // Also save the input buffer's start presentationTime here.
        // Exit out of the loop, since we're done waiting for data.
        break;
    }
    [_condition unlock];
    // Set ioNumberDataPackets to the number of packets actually supplied.
    // If _running is false, this will be 0, indicating end of stream.
    *ioNumberDataPackets = countOfPacketsWritten;
    return noErr;
}
And for completeness, here's how you would then feed this encoder with data, and how to shut it down properly:
- (void)appendSampleBuffer:(CMSampleBufferRef)sampleBuffer
{
    [_condition lock];
    // Convert sampleBuffer and put it into _inputBuffer here
    [_condition broadcast];
    [_condition unlock];
}
- (void)stopEncoding
{
    [_condition lock];
    _running = NO;
    [_condition broadcast];
    [_condition unlock];
}
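The NSCondition handshake above is an ordinary condition-variable pattern, so for readers outside Objective-C here is the same idea as a minimal C sketch with POSIX threads. InputQueue and the queue* functions are hypothetical names, the fixed 4096-byte storage is an arbitrary assumption, and, like the original callback, it reports end of stream (0 bytes) as soon as it is stopped, discarding anything still queued.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical PCM byte queue shared by the capture side and the
   converter's input callback (which runs on the encoder thread). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  condition;
    unsigned char   data[4096];
    size_t          length;
    bool            running;
} InputQueue;

void queueInit(InputQueue *q) {
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->condition, NULL);
    q->length = 0;
    q->running = true;
}

/* Producer side, like appendSampleBuffer: append bytes and wake the waiter. */
void queueAppend(InputQueue *q, const void *bytes, size_t count) {
    pthread_mutex_lock(&q->lock);
    size_t room = sizeof q->data - q->length;
    if (count > room) count = room; /* this sketch simply drops overflow */
    memcpy(q->data + q->length, bytes, count);
    q->length += count;
    pthread_cond_broadcast(&q->condition);
    pthread_mutex_unlock(&q->lock);
}

/* Shutdown, like stopEncoding: flip the flag and wake any waiter. */
void queueStop(InputQueue *q) {
    pthread_mutex_lock(&q->lock);
    q->running = false;
    pthread_cond_broadcast(&q->condition);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side, called from the input callback: blocks until data arrives
   or the queue is stopped. Returns bytes copied; 0 means end of stream. */
size_t queueTake(InputQueue *q, void *out, size_t maxCount) {
    assert(out != NULL || maxCount == 0);
    pthread_mutex_lock(&q->lock);
    while (q->running && q->length == 0) {
        pthread_cond_wait(&q->condition, &q->lock); /* loop guards against spurious wakeups */
    }
    size_t n = 0;
    if (q->running) {
        n = q->length < maxCount ? q->length : maxCount;
        memcpy(out, q->data, n);
        memmove(q->data, q->data + n, q->length - n); /* keep the remainder for the next call */
        q->length -= n;
    }
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

The capture side calls queueAppend, the converter's input callback calls queueTake on the encoder thread, and shutdown calls queueStop, which unblocks a waiting queueTake so the converter sees 0 packets and finishes.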
AAC encoding using AudioConverter and writing to AVAssetWriter
Turns out there were a variety of things that I was doing wrong. Instead of posting a garble of code, I'm going to try to organize this into bite-sized pieces of things that I discovered.
Samples vs Packets vs Frames
This had been a huge source of confusion for me:
- Each CMSampleBuffer can contain 1 or more samples (discovered via CMSampleBufferGetNumSamples).
- Each sample in an audio CMSampleBuffer represents a single audio packet.
- Therefore, CMSampleBufferGetNumSamples(sample) will return the number of packets contained in the given buffer.
- Packets contain frames. This is governed by the mFramesPerPacket property of the buffer's AudioStreamBasicDescription. For linear PCM buffers, the total size of each sample buffer is frames * bytes per frame. For compressed buffers (like AAC), there is no fixed relationship between the total size and frame count.
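The packet/frame/byte relationships in the list above boil down to a couple of multiplications. Here is a small C sketch; FormatInfo is a hypothetical stand-in for the two AudioStreamBasicDescription fields involved (a real ASBD has many more), and the AAC values assume AAC-LC with 1024 frames per packet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for two AudioStreamBasicDescription fields. */
typedef struct {
    uint32_t framesPerPacket; /* 1 for linear PCM, 1024 for AAC-LC */
    uint32_t bytesPerFrame;   /* 0 for compressed (variable-size) formats */
} FormatInfo;

/* Frames carried by `packets` packets (one packet per CMSampleBuffer sample). */
uint64_t framesForPackets(FormatInfo fmt, uint64_t packets) {
    return packets * fmt.framesPerPacket;
}

/* Byte size of linear PCM data: frames * bytes per frame. Only valid for
   uncompressed formats, because AAC packets have no fixed byte size. */
uint64_t pcmBytesForPackets(FormatInfo fmt, uint64_t packets) {
    assert(fmt.bytesPerFrame != 0);
    return framesForPackets(fmt, packets) * fmt.bytesPerFrame;
}
```

For 16-bit stereo PCM ({1, 4}), 1024 samples are 1024 frames and 4096 bytes; for AAC ({1024, 0}), 3 samples are 3072 frames, and the byte size can only come from the packet descriptions.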
AudioConverterComplexInputDataProc
This callback is used to retrieve more linear PCM audio data for encoding. It's imperative that you supply at least the number of packets specified by ioNumberDataPackets. Since I've been using the converter for real-time push-style encoding, I needed to ensure that each data push contains the minimum number of packets. Something like this (pseudo-code):
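let minimumPackets = Int(outputFramesPerPacket / inputFramesPerPacket) // e.g. 1024 / 1 for PCM -> AAC
var buffers: [CMSampleBuffer] = []
var packetCount = 0
while packetCount < minimumPackets {
    let next = getNextBuffer()
    buffers.append(next)
    packetCount += CMSampleBufferGetNumSamples(next) // one sample == one packet
}
AudioConverterFillComplexBuffer(...)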
Slicing CMSampleBuffers
You can actually slice CMSampleBuffers if they contain multiple samples. The tool to do this is CMSampleBufferCopySampleBufferForRange. This is nice because it lets you provide the AudioConverterComplexInputDataProc with the exact number of packets that it asks for, which makes handling timing information for the resulting encoded buffer easier: if you give the converter 1500 frames of data when it expects 1024, the resulting sample buffer will have a duration of 1024/sampleRate as opposed to 1500/sampleRate.
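The slicing rule amounts to never handing the converter more frames than it asked for and carrying the rest over to the next callback. A minimal C sketch of that bookkeeping, assuming linear PCM input (one frame per packet); SliceResult, sliceForRequest and durationSeconds are hypothetical helpers, and in real code the split itself would be done with CMSampleBufferCopySampleBufferForRange.

```c
#include <assert.h>
#include <stdint.h>

/* How a chunk of input is split for one input-callback request. */
typedef struct {
    uint32_t suppliedFrames; /* frames handed to the converter now */
    uint32_t leftoverFrames; /* frames kept for the next callback */
} SliceResult;

/* Never supply more than requested; keep the remainder for later. */
SliceResult sliceForRequest(uint32_t availableFrames, uint32_t requestedFrames) {
    SliceResult r;
    r.suppliedFrames = availableFrames < requestedFrames ? availableFrames : requestedFrames;
    r.leftoverFrames = availableFrames - r.suppliedFrames;
    return r;
}

/* Duration in seconds of `frames` frames at `sampleRate` Hz. */
double durationSeconds(uint32_t frames, double sampleRate) {
    assert(sampleRate > 0);
    return frames / sampleRate;
}
```

With 1500 frames available and 1024 requested, 1024 frames (1024/44100 s at 44.1 kHz) go to the converter and 476 wait for the next request, so the timestamps you attach match what the encoder actually consumed.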
Priming and trim duration
When doing AAC encoding, you must set the trim duration like so:
CMSetAttachment(buffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
kCMAttachmentMode_ShouldNotPropagate)
One thing I did wrong was that I added the trim duration at encode time. This should be handled by your writer so that it can guarantee the information gets added to your leading audio frames.
Also, the value of kCMSampleBufferAttachmentKey_TrimDurationAtStart should never be greater than the duration of the sample buffer. An example of priming:
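- Priming frames: 2112
- Sample rate: 44100
- Priming duration: 2112 / 44100 = ~0.0479s
- First buffer, frames: 1024, trim duration: 1024 / 44100
- Second buffer, frames: 1024, trim duration: 1024 / 44100 (1088 priming frames remain, but the trim is capped at the buffer's duration)
- Third buffer, frames: 1024, trim duration: 64 / 44100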
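The priming example follows from one rule: trim whatever priming is left, but never more than the buffer holds. A minimal C sketch of that per-buffer calculation; trimFramesForBuffer is a hypothetical helper, and the frame count it returns would become the trim attachment's CMTime, for example CMTimeMake(trim, 44100).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frames to trim from the start of the next encoded buffer. Decrements the
   running count of untrimmed priming frames; the trim is capped at the
   buffer's own frame count, so leftover priming spills into later buffers. */
uint32_t trimFramesForBuffer(uint32_t *remainingPrimingFrames, uint32_t bufferFrames) {
    assert(remainingPrimingFrames != NULL);
    uint32_t trim = *remainingPrimingFrames < bufferFrames ? *remainingPrimingFrames : bufferFrames;
    *remainingPrimingFrames -= trim;
    return trim;
}
```

Starting from 2112 priming frames and 1024-frame buffers, successive calls return 1024, 1024, 64 and then 0, which is the sequence in the list above.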
Creating the new CMSampleBuffer
AudioConverterFillComplexBuffer has an optional outputPacketDescriptionsPtr argument. You should use it: pass an array with room for as many packet descriptions as you request in ioOutputDataPacketSize, and the converter fills in one description (start offset and byte size) per encoded packet. You need this sample size information to construct the new compressed sample buffer:
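let bufferList: AudioBufferList                         // filled by AudioConverterFillComplexBuffer
let packetDescriptions: [AudioStreamPacketDescription]  // one entry per encoded packet
var newBuffer: CMSampleBuffer?
CMAudioSampleBufferCreateWithPacketDescriptions(
    kCFAllocatorDefault,                            // allocator
    nil,                                            // dataBuffer (attached afterwards)
    false,                                          // dataReady
    nil,                                            // makeDataReadyCallback
    nil,                                            // makeDataReadyRefCon
    formatDescription,                              // formatDescription
    packetDescriptions.count,                       // numSamples: one sample per packet, not mNumberBuffers
    CMSampleBufferGetPresentationTimeStamp(buffer), // sbufPTS (first PTS)
    packetDescriptions,                             // packetDescriptions
    &newBuffer)
// With dataBuffer == nil, attach the encoded bytes afterwards,
// e.g. with CMSampleBufferSetDataBufferFromAudioBufferList.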
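The packet descriptions are also what tells you the sample count: one sample per packet, regardless of how many AudioBuffers the list has. As a sanity check before building the CMSampleBuffer, here is a C sketch that verifies the descriptions account for exactly the bytes the converter wrote. PacketDesc is a hypothetical mirror of AudioStreamPacketDescription's three fields so the sketch compiles without CoreAudio, and the assumption that packets are packed back to back from offset 0 is mine, not something the answer states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of AudioStreamPacketDescription. */
typedef struct {
    int64_t  mStartOffset;
    uint32_t mVariableFramesInPacket;
    uint32_t mDataByteSize;
} PacketDesc;

/* Returns the number of samples (= packets) the new CMSampleBuffer should
   declare, or -1 if the packets don't tile `bufferByteSize` back to back. */
long sampleCountForPackets(const PacketDesc *descs, size_t count, uint32_t bufferByteSize) {
    assert(descs != NULL || count == 0);
    int64_t expectedOffset = 0;
    for (size_t i = 0; i < count; i++) {
        if (descs[i].mStartOffset != expectedOffset) return -1;
        expectedOffset += descs[i].mDataByteSize;
    }
    return expectedOffset == (int64_t)bufferByteSize ? (long)count : -1;
}
```

Here bufferByteSize would be the output AudioBuffer's mDataByteSize after the converter returns, since the converter updates it to the number of bytes actually written.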
OS X / iOS - Sample rate conversion for a buffer using AudioConverterFillComplexBuffer
Working code for Core Audio sample rate conversion and channel count conversion, using Audio Converter Services (now available as a part of the BSD-licensed XAL audio library):
void CoreAudio_AudioManager::_convertStream(Buffer* buffer, unsigned char** stream, int *streamSize)
{
    if (buffer->getBitsPerSample() != unitDescription.mBitsPerChannel ||
        buffer->getChannels() != unitDescription.mChannelsPerFrame ||
        buffer->getSamplingRate() != unitDescription.mSampleRate)
    {
        // describe the input format's description
        AudioStreamBasicDescription inputDescription;
        memset(&inputDescription, 0, sizeof(inputDescription));
        inputDescription.mFormatID = kAudioFormatLinearPCM;
        inputDescription.mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger;
        inputDescription.mChannelsPerFrame = buffer->getChannels();
        inputDescription.mSampleRate = buffer->getSamplingRate();
        inputDescription.mBitsPerChannel = buffer->getBitsPerSample();
        inputDescription.mBytesPerFrame = (inputDescription.mBitsPerChannel * inputDescription.mChannelsPerFrame) / 8;
        inputDescription.mFramesPerPacket = 1; // linear PCM: one frame per packet
        inputDescription.mBytesPerPacket = inputDescription.mBytesPerFrame * inputDescription.mFramesPerPacket;
        // copy conversion output format's description from the
        // output audio unit's description, with one frame per packet.
        // the output has a different number of bytes per frame and,
        // if the sample rates differ, a different number of frames.
        AudioStreamBasicDescription outputDescription = unitDescription;
        outputDescription.mFramesPerPacket = 1;
        outputDescription.mBytesPerPacket = outputDescription.mBytesPerFrame * outputDescription.mFramesPerPacket;
        // create an audio converter
        AudioConverterRef audioConverter = NULL;
        OSStatus acCreationResult = AudioConverterNew(&inputDescription, &outputDescription, &audioConverter);
        if (acCreationResult != noErr || !audioConverter)
        {
            // bail out
            free(*stream);
            *streamSize = 0;
            *stream = (unsigned char*)malloc(0);
            return;
        }
        // calculate number of bytes required for output of input stream:
        // the frame count scales with the sample rate ratio (plus one frame of slack).
        // allocate buffer of adequate size.
        UInt32 inputFrames = *streamSize / inputDescription.mBytesPerPacket;
        UInt32 outputFrames = (UInt32)(inputFrames * (outputDescription.mSampleRate / inputDescription.mSampleRate)) + 1;
        UInt32 outputBytes = outputFrames * outputDescription.mBytesPerPacket;
        unsigned char *outputBuffer = (unsigned char*)malloc(outputBytes);
        memset(outputBuffer, 0, outputBytes);
        // describe input data we'll pass into converter
        AudioBuffer inputBuffer;
        inputBuffer.mNumberChannels = inputDescription.mChannelsPerFrame;
        inputBuffer.mDataByteSize = *streamSize;
        inputBuffer.mData = *stream;
        // describe output data buffers into which we can receive data.
        AudioBufferList outputBufferList;
        outputBufferList.mNumberBuffers = 1;
        outputBufferList.mBuffers[0].mNumberChannels = outputDescription.mChannelsPerFrame;
        outputBufferList.mBuffers[0].mDataByteSize = outputBytes;
        outputBufferList.mBuffers[0].mData = outputBuffer;
        // set output data packet size
        UInt32 outputDataPacketSize = outputBytes / outputDescription.mBytesPerPacket;
        // fill class members with data that we'll pass into
        // the InputDataProc
        _converter_currentBuffer = &inputBuffer;
        _converter_currentInputDescription = inputDescription;
        // convert
        OSStatus result = AudioConverterFillComplexBuffer(audioConverter, /* AudioConverterRef inAudioConverter */
            CoreAudio_AudioManager::_converterComplexInputDataProc, /* AudioConverterComplexInputDataProc inInputDataProc */
            this, /* void *inInputDataProcUserData */
            &outputDataPacketSize, /* UInt32 *ioOutputDataPacketSize */
            &outputBufferList, /* AudioBufferList *outOutputData */
            NULL /* AudioStreamPacketDescription *outPacketDescription */
            );
        // inputBuffer lives on this stack frame; don't leave a dangling pointer behind
        _converter_currentBuffer = NULL;
        // change "stream" to describe our output buffer, using the number of
        // bytes the converter actually produced.
        // even if an error occurred, we'd rather have silence than unconverted audio.
        free(*stream);
        *stream = outputBuffer;
        *streamSize = (result == noErr) ? outputBufferList.mBuffers[0].mDataByteSize : outputBytes;
        // dispose of the audio converter
        AudioConverterDispose(audioConverter);
    }
}
OSStatus CoreAudio_AudioManager::_converterComplexInputDataProc(AudioConverterRef inAudioConverter,
    UInt32* ioNumberDataPackets,
    AudioBufferList* ioData,
    AudioStreamPacketDescription** ioDataPacketDescription,
    void* inUserData)
{
    if (ioDataPacketDescription)
    {
        xal::log("_converterComplexInputDataProc cannot provide input data; it doesn't know how to provide packet descriptions");
        *ioDataPacketDescription = NULL;
        *ioNumberDataPackets = 0;
        ioData->mNumberBuffers = 0;
        return 501;
    }
    CoreAudio_AudioManager *self = (CoreAudio_AudioManager*)inUserData;
    if (self->_converter_currentBuffer == NULL)
    {
        // all input was already handed over; 0 packets with noErr signals end of stream
        *ioNumberDataPackets = 0;
        return noErr;
    }
    ioData->mNumberBuffers = 1;
    ioData->mBuffers[0] = *(self->_converter_currentBuffer);
    *ioNumberDataPackets = ioData->mBuffers[0].mDataByteSize / self->_converter_currentInputDescription.mBytesPerPacket;
    // hand the buffer over only once; a second callback must not feed the same audio again
    self->_converter_currentBuffer = NULL;
    return noErr;
}
In the header, as part of the CoreAudio_AudioManager class, here are the relevant instance variables:
AudioStreamBasicDescription unitDescription;
AudioBuffer *_converter_currentBuffer;
AudioStreamBasicDescription _converter_currentInputDescription;
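When the input and output sample rates differ, the output frame count is the input frame count scaled by the rate ratio, and the byte count additionally depends on the output's bytes per frame. A minimal C sketch of that sizing, assuming packed PCM and integral sample rates (true for the usual 22050/44100/48000 Hz); outputBytesForConversion is a hypothetical helper that rounds up so the buffer is never too short.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on output bytes when converting `inputBytes` of packed PCM. */
uint32_t outputBytesForConversion(uint32_t inputBytes,
                                  uint32_t inBytesPerFrame, uint32_t inSampleRate,
                                  uint32_t outBytesPerFrame, uint32_t outSampleRate) {
    assert(inBytesPerFrame > 0 && inSampleRate > 0);
    uint64_t inputFrames = inputBytes / inBytesPerFrame;
    /* ceil(inputFrames * outSampleRate / inSampleRate) using integer arithmetic */
    uint64_t outputFrames = (inputFrames * outSampleRate + inSampleRate - 1) / inSampleRate;
    return (uint32_t)(outputFrames * outBytesPerFrame);
}
```

For example, 1000 bytes of 22050 Hz mono 16-bit audio (500 frames) becomes at most 4000 bytes of 44100 Hz stereo 16-bit audio (1000 frames). The sample rate converter may emit slightly fewer frames than this bound, so the converted size should still be read back from the output buffer's mDataByteSize.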
A few months later, I'm looking at this and I've realized that I didn't document the changes. If you are interested in what the changes were:
- look at the callback function CoreAudio_AudioManager::_converterComplexInputDataProc
- one has to properly specify the number of output packets into ioNumberDataPackets
- this has required introduction of new instance variables to hold both the buffer (the previous inUserData) and the input description (used to calculate the number of packets to be fed into Core Audio's converter)
- this calculation of "output" packets (those fed into the converter) is done based on the amount of data that our callback received, and the number of bytes per packet that the input format contains
Hopefully this edit will help a future reader (myself included)!