Clipping Sound with Opus on Android, Sent from iOS


Your code is doing Swift memory allocation (Array concatenation) and Swift method calls (your recording delegate) inside the audio callback. Apple (in a WWDC session on audio) recommends against doing any memory allocation or method calls inside the real-time audio callback context, especially when requesting short preferred IO buffer durations. Stick to C function calls such as memcpy, and C data structures such as TPCircularBuffer.

Added: also, don't discard samples. If the callback gives you 680 samples but a packet only needs 640, keep the 40 "left over" samples and prepend them to a later packet; the circular buffer will hold them for you. Rinse and repeat: whenever you have accumulated enough samples for a packet, send one, and send yet another packet whenever you end up accumulating 1280 (2 × 640) or more.
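For concreteness, here is a minimal sketch of that pattern in Swift, assuming the C TPCircularBuffer library is exposed through a bridging header (argument widths can differ slightly between library versions) and that packets are 640 Int16 samples as in the question; handleAudioInput, drainPackets, and encodeAndSend are illustrative names, not part of the original code:

    // Pre-allocate the FIFO once, outside the callback.
    var ringBuffer = TPCircularBuffer()
    let packetBytes = 640 * MemoryLayout<Int16>.size

    func setUpRingBuffer() {
        TPCircularBufferInit(&ringBuffer, 256 * 1024)
    }

    // Inside the audio callback: a single C call, no Swift allocation or method calls.
    func handleAudioInput(_ samples: UnsafeRawPointer, frameCount: Int) {
        TPCircularBufferProduceBytes(&ringBuffer, samples,
                                     UInt32(frameCount * MemoryLayout<Int16>.size))
    }

    // Hypothetical stand-in for the Opus encode + UDP send step.
    func encodeAndSend(_ pcm: UnsafeRawPointer, _ byteCount: Int) {
        // encode 640 samples and hand the packet to the socket
    }

    // Outside the callback: send a packet whenever at least 640 samples have accumulated.
    // Left-over samples simply stay in the buffer and lead the next packet.
    func drainPackets() {
        var availableBytes: UInt32 = 0
        while let tail = TPCircularBufferTail(&ringBuffer, &availableBytes),
              Int(availableBytes) >= packetBytes {
            encodeAndSend(tail, packetBytes)
            TPCircularBufferConsume(&ringBuffer, UInt32(packetBytes))
        }
    }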

AudioUnit + Opus codec = crackle issue

Your audio callback time may need to be increased. Try increasing your session's setPreferredIOBufferDuration value. I have used Opus on iOS and measured the decoding time: it takes 2 to 3 ms to decode about 240 frames of data. There is a good chance you are missing subsequent callbacks because decoding the audio takes too long.
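For reference, a minimal sketch of requesting a larger I/O buffer through AVAudioSession; the system treats the value as a hint, so check ioBufferDuration for what was actually granted:

    import AVFoundation

    // Request ~10 ms of audio per callback so there is headroom for Opus decoding.
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.playAndRecord, mode: .voiceChat)
        try session.setPreferredIOBufferDuration(0.01)   // seconds, not frames
        try session.setActive(true)
        print("granted IO buffer duration: \(session.ioBufferDuration)")
    } catch {
        print("AVAudioSession configuration failed: \(error)")
    }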

Processing data with Audio Unit recording callback [iOS][Swift]

Apple recommends against using semaphores or calling Swift methods (such as encoders) inside any real-time Audio Unit callback. Just copy the data into a pre-allocated circular buffer inside the audio unit callback. Period. Do everything else outside the callback. Semaphores and signals included.

So, you need to create a polling thread.

Do everything inside a polling loop, timer callback, or network ready callback. Do your work anytime there is enough data in the FIFO. Call (poll) often enough (high enough polling frequency or timer callback rate) that you do not lose data. Handle all the data you can (perhaps multiple buffers at a time, if available) in each iteration of the polling loop.

You may need to pre-fill the circular buffer a bit (perhaps a few multiples of your 640 UDP frame size) before starting to send, to account for network and timer jitter.
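A rough sketch of that polling thread, reusing ringBuffer, packetBytes, and encodeAndSend from the earlier TPCircularBuffer sketch; keepRunning, started, and the pre-fill size are illustrative, not from the question:

    import Foundation

    let preFillBytes = 3 * packetBytes        // a few packets of slack for timer/network jitter
    var keepRunning = true
    var started = false

    Thread.detachNewThread {
        while keepRunning {
            var availableBytes: UInt32 = 0
            // Drain everything available; hold off sending until the pre-fill is reached.
            while let tail = TPCircularBufferTail(&ringBuffer, &availableBytes),
                  Int(availableBytes) >= (started ? packetBytes : preFillBytes) {
                started = true
                encodeAndSend(tail, packetBytes)          // off the audio thread, so Swift calls are fine here
                TPCircularBufferConsume(&ringBuffer, UInt32(packetBytes))
            }
            usleep(2_000)                                 // poll every ~2 ms, well under one 640-sample packet period
        }
    }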

Can I send sound buffers received using AVAudioSinkNode to be rendered in real-time using AVAudioSourceNode?

The way to do this in real time is to use Audio Unit callbacks (whose buffers can be as small as a few milliseconds). They will almost always be the same size (except perhaps at device power-state changes), so just save each one, process it as needed, and have it ready for the next output a few ms later. Or use a circular/ring FIFO buffer. The RemoteIO Audio Unit on iOS has synchronized input and output.
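One way to wire this up with AVAudioEngine is a sink node that only copies into a FIFO and a source node that only copies back out of it. This is a rough sketch assuming input and output share one mono Float32 format (a real app would convert if the hardware formats differ), microphone permission has been granted, and the same bridged TPCircularBuffer as above is available:

    import AVFoundation

    let engine = AVAudioEngine()
    var ring = TPCircularBuffer()
    TPCircularBufferInit(&ring, 1 << 18)

    let ioFormat = engine.inputNode.outputFormat(forBus: 0)

    let sink = AVAudioSinkNode { _, _, audioBufferList in
        // Capture side: copy the incoming samples into the FIFO, nothing else.
        let buffer = audioBufferList.pointee.mBuffers
        TPCircularBufferProduceBytes(&ring, buffer.mData, buffer.mDataByteSize)
        return noErr
    }

    let source = AVAudioSourceNode { _, _, frameCount, audioBufferList in
        // Playback side: pull the same number of bytes back out, or output silence on underrun.
        let abl = UnsafeMutableAudioBufferListPointer(audioBufferList)
        let wantedBytes = Int(frameCount) * MemoryLayout<Float32>.size
        var availableBytes: UInt32 = 0
        if let tail = TPCircularBufferTail(&ring, &availableBytes), Int(availableBytes) >= wantedBytes {
            memcpy(abl[0].mData, tail, wantedBytes)
            TPCircularBufferConsume(&ring, UInt32(wantedBytes))
        } else {
            memset(abl[0].mData, 0, wantedBytes)
        }
        return noErr
    }

    engine.attach(sink)
    engine.attach(source)
    engine.connect(engine.inputNode, to: sink, format: ioFormat)
    engine.connect(source, to: engine.mainMixerNode, format: ioFormat)
    do { try engine.start() } catch { print("engine start failed: \(error)") }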

iOS - WebRTC one way audio only with Opus Codec

For anybody else who stumbles across this: I wasn't using the audio session provided by CallKit in the didActivate method of the CXProviderDelegate protocol.

Here's my amended configureAudioSession

private func configureAudioSession() {
    self.rtcAudioSession.lockForConfiguration()
    do {
        self.rtcAudioSession.useManualAudio = true
        self.rtcAudioSession.isAudioEnabled = false
        try self.rtcAudioSession.setCategory(AVAudioSession.Category.playAndRecord.rawValue)
        try self.rtcAudioSession.setMode(AVAudioSession.Mode.voiceChat.rawValue)
    } catch let error {
        debugPrint("Error changing AVAudioSession category: \(error)")
    }
    self.rtcAudioSession.unlockForConfiguration()
}

And then in my provider delegate

func provider(_ provider: CXProvider, didActivate audioSession: AVAudioSession) {
    print("Received \(#function)")

    webRTCClient.rtcAudioSession.audioSessionDidActivate(audioSession)
    webRTCClient.rtcAudioSession.isAudioEnabled = true
    // Start call audio media, now that the audio session has been activated after having its priority boosted.
    // startAudio()
}

offline rendering with a lowpass filter causes aliasing and clipping

I am using converters at both ends: the input converter's input and the output converter's output use an ASBD of 8000 Hz mono floats, while the low-pass unit's input and output use 44100 Hz stereo. For the offline render I call AudioUnitRender on the final converter, with no I/O unit. For the online render I put a converter unit before the I/O unit so the render callback pulls from the 8 kHz buffers for playback too. It appears that the lower sample rate on the output ASBD requires a higher maximum frames per slice and a smaller slice (the AudioUnitRender inNumberFrames), which is why it wouldn't render before.

#import "ViewController.h"
#import <AudioToolbox/AudioToolbox.h>

@implementation ViewController{

int sampleCount;
int renderBufferHead;
float *renderBuffer;
}

- (void)viewDidLoad {

[super viewDidLoad];
float sampleRate = 8000;

int bufferSeconds = 3;
sampleCount = sampleRate * bufferSeconds;//seconds
float *originalSaw = generateSawWaveBuffer(440, sampleRate, sampleCount);

renderBuffer = originalSaw;
renderBufferHead = 0;

AURenderCallbackStruct cbStruct = {renderCallback,(__bridge void *)self};

//this will do offline render using the render callback, callback just reads from renderBuffer at samplerate
float *processedBuffer = offlineRender(sampleCount, sampleRate, &cbStruct);

renderBufferHead = 0;//rewind render buffer after processing

//set up audio units to do live render using the render callback at sample rate then self destruct after delay
//it will play originalSaw for bufferSeconds, then after delay will switch renderBuffer to point at processedBuffer
float secondsToPlayAudio = (bufferSeconds + 1) * 2;
onlineRender(sampleRate, &cbStruct,secondsToPlayAudio);

//wait for original to finish playing, then change render callback source buffer to processed buffer
dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)((secondsToPlayAudio / 2) * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
renderBuffer = processedBuffer;
renderBufferHead = 0;//rewind render buffer
});

//destroy after all rendering done
dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(secondsToPlayAudio * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
free(originalSaw);
free(processedBuffer);
});
}

float * offlineRender(int count, double sampleRate, AURenderCallbackStruct *cbStruct){

AudioComponentInstance inConverter = getComponentInstance(kAudioUnitType_FormatConverter, kAudioUnitSubType_AUConverter);
AudioComponentInstance lowPass = getComponentInstance(kAudioUnitType_Effect, kAudioUnitSubType_LowPassFilter);
AudioComponentInstance outConverter = getComponentInstance(kAudioUnitType_FormatConverter, kAudioUnitSubType_AUConverter);

AudioStreamBasicDescription asbd = getMonoFloatASBD(sampleRate);
AudioUnitSetProperty(inConverter, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 0, &asbd, sizeof(AudioStreamBasicDescription));
AudioUnitSetProperty(outConverter, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 0, &asbd, sizeof(AudioStreamBasicDescription));

AudioUnitSetProperty(inConverter, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, cbStruct, sizeof(AURenderCallbackStruct));

formatAndConnect(inConverter, lowPass);
formatAndConnect(lowPass, outConverter);

UInt32 maxFramesPerSlice = 4096;
AudioUnitSetProperty(inConverter, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maxFramesPerSlice, sizeof(UInt32));
AudioUnitSetProperty(lowPass, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maxFramesPerSlice, sizeof(UInt32));
AudioUnitSetProperty(outConverter, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maxFramesPerSlice, sizeof(UInt32));

AudioUnitInitialize(inConverter);
AudioUnitInitialize(lowPass);
AudioUnitInitialize(outConverter);

AudioUnitSetParameter(lowPass, kLowPassParam_CutoffFrequency, kAudioUnitScope_Global, 0, 500, 0);

AudioBufferList *bufferlist = malloc(sizeof(AudioBufferList) + sizeof(AudioBufferList));//stereo bufferlist + sizeof(AudioBuffer)
float *left = malloc(sizeof(float) * 4096);
bufferlist->mBuffers[0].mData = left;
bufferlist->mNumberBuffers = 1;

AudioTimeStamp inTimeStamp;
memset(&inTimeStamp, 0, sizeof(AudioTimeStamp));
inTimeStamp.mFlags = kAudioTimeStampSampleTimeValid;
inTimeStamp.mSampleTime = 0;

float *buffer = malloc(sizeof(float) * count);
int inNumberframes = 512;
AudioUnitRenderActionFlags flag = 0;
int framesRead = 0;
while (count){
inNumberframes = MIN(inNumberframes, count);
bufferlist->mBuffers[0].mDataByteSize = sizeof(float) * inNumberframes;
printf("Offline Render %i frames\n",inNumberframes);
AudioUnitRender(outConverter, &flag, &inTimeStamp, 0, inNumberframes, bufferlist);
memcpy(buffer + framesRead, left, sizeof(float) * inNumberframes);
inTimeStamp.mSampleTime += inNumberframes;
count -= inNumberframes;
framesRead += inNumberframes;

}
free(left);
// free(right);
free(bufferlist);
AudioUnitUninitialize(inConverter);
AudioUnitUninitialize(lowPass);
AudioUnitUninitialize(outConverter);
return buffer;
}

OSStatus renderCallback(void * inRefCon,
AudioUnitRenderActionFlags * ioActionFlags,
const AudioTimeStamp * inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList * ioData){

ViewController *self = (__bridge ViewController*)inRefCon;
float *left = ioData->mBuffers[0].mData;

for (int i = 0; i < inNumberFrames; i++) {
if (self->renderBufferHead >= self->sampleCount) {
left[i] = 0;
}
else{
left[i] = self->renderBuffer[self->renderBufferHead++];
}
}
if(ioData->mNumberBuffers == 2){
memcpy(ioData->mBuffers[1].mData, left, sizeof(float) * inNumberFrames);
}
printf("render %f to %f\n",inTimeStamp->mSampleTime,inTimeStamp->mSampleTime + inNumberFrames);
return noErr;
}

void onlineRender(double sampleRate, AURenderCallbackStruct *cbStruct,float duration){
AudioComponentInstance converter = getComponentInstance(kAudioUnitType_FormatConverter, kAudioUnitSubType_AUConverter);
AudioComponentInstance ioUnit = getComponentInstance(kAudioUnitType_Output, kAudioUnitSubType_DefaultOutput);

AudioStreamBasicDescription asbd = getMonoFloatASBD(sampleRate);
AudioUnitSetProperty(converter, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 0, &asbd, sizeof(AudioStreamBasicDescription));
AudioUnitSetProperty(converter, kAudioUnitProperty_SetRenderCallback, kAudioUnitScope_Input, 0, cbStruct, sizeof(AURenderCallbackStruct));

formatAndConnect(converter, ioUnit);

AudioUnitInitialize(converter);
AudioUnitInitialize(ioUnit);
AudioOutputUnitStart(ioUnit);

dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(duration * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
AudioOutputUnitStop(ioUnit);
AudioUnitUninitialize(ioUnit);
AudioUnitUninitialize(converter);
});

}

float * generateSawWaveBuffer(float frequency,float sampleRate, int sampleCount){
float *buffer = malloc(sizeof(float) * sampleCount);
float increment = (frequency / sampleRate) * 2;
int increasing = 1;
float sample = 0;
for (int i = 0; i < sampleCount; i++) {
if (increasing) {
sample += increment;
if (sample >= 1) {
increasing = 0;
}
}
else{
sample -= increment;
if (sample < -1) {
increasing = 1;
}
}
buffer[i] = sample;
}
return buffer;
}
AudioComponentInstance getComponentInstance(OSType type,OSType subType){
AudioComponentDescription desc = {0};
desc.componentFlags = 0;
desc.componentFlagsMask = 0;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;
desc.componentSubType = subType;
desc.componentType = type;
AudioComponent ioComponent = AudioComponentFindNext(NULL, &desc);
AudioComponentInstance unit;
AudioComponentInstanceNew(ioComponent, &unit);
return unit;
}

AudioStreamBasicDescription getMonoFloatASBD(double sampleRate){
AudioStreamBasicDescription asbd = {0};
asbd.mSampleRate = sampleRate;
asbd.mFormatID = kAudioFormatLinearPCM;
asbd.mFormatFlags = kAudioFormatFlagIsFloat | kAudioFormatFlagIsNonInterleaved | kAudioFormatFlagIsPacked;
asbd.mFramesPerPacket = 1;
asbd.mChannelsPerFrame = 1;
asbd.mBitsPerChannel = 32;
asbd.mBytesPerPacket = 4;
asbd.mBytesPerFrame = 4;
return asbd;
}

void formatAndConnect(AudioComponentInstance src,AudioComponentInstance dst){

AudioStreamBasicDescription asbd;
UInt32 propsize = sizeof(AudioStreamBasicDescription);
AudioUnitGetProperty(dst, kAudioUnitProperty_StreamFormat,kAudioUnitScope_Input,0,&asbd,&propsize);
AudioUnitSetProperty(src, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 0, &asbd, sizeof(AudioStreamBasicDescription));

AudioUnitConnection connection = {0};
connection.destInputNumber = 0;
connection.sourceAudioUnit = src;
connection.sourceOutputNumber = 0;
AudioUnitSetProperty(dst, kAudioUnitProperty_MakeConnection, kAudioUnitScope_Input, 0, &connection, sizeof(AudioUnitConnection));
}
@end

