Recording iOS conversations. Where to start?
"Audio Recorder" is indeed a very simple tweak. The author tried to obfuscate important parts of his tweak (which function is being hooked), but here is what I found.
The tweak basically hooks just one function, AudioConverterConvertComplexBuffer from AudioToolbox.framework. The tweak is loaded into the mediaserverd daemon at startup.
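For a Substrate tweak to be loaded into a daemon rather than an app, its filter plist can target the executable by name. A minimal sketch, assuming the standard MobileSubstrate filter keys (this plist is my illustration, not taken from the tweak itself):
{
    Filter = {
        Executables = ( "mediaserverd" );
    };
}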
First, we need to find out when we should start recording, because AudioConverterConvertComplexBuffer is called even when you're just playing regular audio files. To achieve that, the tweak listens for the kCTCallStatusChangeNotification notification from CTTelephonyCenter.
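As an illustration, registering such an observer might look like this. CoreTelephony is private API, so the extern declarations below are assumptions, mirroring the ones used in the complete example further down this page:
#import <CoreFoundation/CoreFoundation.h>
//CoreTelephony.framework private API (declarations assumed)
extern "C" CFStringRef const kCTCallStatusChangeNotification;
extern "C" id CTTelephonyCenterGetDefault();
extern "C" void CTTelephonyCenterAddObserver(id ct, void* observer, CFNotificationCallback callBack, CFStringRef name, void *object, CFNotificationSuspensionBehavior sb);
static void CallStatusChanged(CFNotificationCenterRef center, void *observer, CFStringRef name, const void *object, CFDictionaryRef userInfo)
{
    //Call status changed - decide here whether recording should start or stop
}
__attribute__((constructor))
static void RegisterForCallStatusNotifications()
{
    CTTelephonyCenterAddObserver(CTTelephonyCenterGetDefault(), NULL, CallStatusChanged, kCTCallStatusChangeNotification, NULL, CFNotificationSuspensionBehaviorHold);
}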
Second, the AudioConverterConvertComplexBuffer implementation. I haven't finished it yet, so I will post what I have so far. Here is a somewhat working example that will get you started.
Helper class to keep track of AudioConverterRef - ExtAudioFileRef pairs
@interface ConverterFile : NSObject
@property (nonatomic, assign) AudioConverterRef converter;
@property (nonatomic, assign) ExtAudioFileRef file;
@property (nonatomic, assign) BOOL failedToOpenFile;
@end
@implementation ConverterFile
@end
ConverterFile objects container
NSMutableArray* callConvertersFiles = [[NSMutableArray alloc] init];
AudioConverterConvertComplexBuffer original implementation
OSStatus(*AudioConverterConvertComplexBuffer_orig)(AudioConverterRef, UInt32, const AudioBufferList*, AudioBufferList*);
AudioConverterConvertComplexBuffer hook declaration
OSStatus AudioConverterConvertComplexBuffer_hook(AudioConverterRef inAudioConverter, UInt32 inNumberPCMFrames, const AudioBufferList *inInputData, AudioBufferList *outOutputData);
Hooking
MSHookFunction(AudioConverterConvertComplexBuffer, AudioConverterConvertComplexBuffer_hook, &AudioConverterConvertComplexBuffer_orig);
AudioConverterConvertComplexBuffer hook definition
OSStatus AudioConverterConvertComplexBuffer_hook(AudioConverterRef inAudioConverter, UInt32 inNumberPCMFrames, const AudioBufferList *inInputData, AudioBufferList *outOutputData)
{
//Searching for existing AudioConverterRef-ExtAudioFileRef pair
__block ConverterFile* cf = nil;
[callConvertersFiles enumerateObjectsUsingBlock:^(ConverterFile* obj, NSUInteger idx, BOOL *stop){
if (obj.converter == inAudioConverter)
{
cf = obj;
*stop = YES;
}
}];
//Inserting new AudioConverterRef
if (!cf)
{
cf = [[[ConverterFile alloc] init] autorelease];
cf.converter = inAudioConverter;
[callConvertersFiles addObject:cf];
}
//Opening new audio file
if (!cf.file && !cf.failedToOpenFile)
{
//Obtaining input audio format
AudioStreamBasicDescription desc;
UInt32 descSize = sizeof(desc);
AudioConverterGetProperty(cf.converter, kAudioConverterCurrentInputStreamDescription, &descSize, &desc);
//Opening audio file
CFURLRef url = CFURLCreateWithFileSystemPath(NULL, (CFStringRef)[NSString stringWithFormat:@"/var/mobile/Media/DCIM/Call%u.caf", [callConvertersFiles indexOfObject:cf]], kCFURLPOSIXPathStyle, false);
ExtAudioFileRef audioFile = NULL;
OSStatus result = ExtAudioFileCreateWithURL(url, kAudioFileCAFType, &desc, NULL, kAudioFileFlags_EraseFile, &audioFile);
if (result != 0)
{
cf.failedToOpenFile = YES;
cf.file = NULL;
}
else
{
cf.failedToOpenFile = NO;
cf.file = audioFile;
//Writing audio format
ExtAudioFileSetProperty(cf.file, kExtAudioFileProperty_ClientDataFormat, sizeof(desc), &desc);
}
CFRelease(url);
}
//Writing audio buffer
if (cf.file)
{
ExtAudioFileWrite(cf.file, inNumberPCMFrames, inInputData);
}
return AudioConverterConvertComplexBuffer_orig(inAudioConverter, inNumberPCMFrames, inInputData, outOutputData);
}
This is roughly how it's done in the tweak. But why is it done like that? When a phone call is in progress, AudioConverterConvertComplexBuffer_hook will be called continuously, but the inAudioConverter argument will differ between calls. I've found that more than nine different inAudioConverter objects can be passed to our hook during a single phone call. They have different audio formats, so we can't write everything into one file. That is why we build an array of AudioConverterRef-ExtAudioFileRef pairs: to keep track of what is being saved where. This code will create as many files as there are AudioConverterRef objects. Each file will contain different audio: one or two will hold the speaker sound, the others the microphone. I've tested this code on an iPhone 4S with iOS 6.1 and it works. Unfortunately, call recording on the 4S can be done only when the speaker is turned on; there is no such limitation on the iPhone 5. This is mentioned in the tweak's description.
The only thing left is to figure out how to find just the two specific inAudioConverter objects we care about: one for the speaker audio and one for the microphone. Everything else is not a problem.
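Purely as a debugging aid (my own speculation, not something the tweak does), you could log each converter's input format from inside the hook and look for patterns that tell the microphone and speaker converters apart:
#import <Foundation/Foundation.h>
#import <AudioToolbox/AudioToolbox.h>
//Speculative helper: log a converter's input format so converters can be told apart by inspection
static void LogConverterFormat(AudioConverterRef converter, NSUInteger index)
{
    AudioStreamBasicDescription desc;
    UInt32 descSize = sizeof(desc);
    if (AudioConverterGetProperty(converter, kAudioConverterCurrentInputStreamDescription, &descSize, &desc) == noErr)
    {
        NSLog(@"Converter %lu: %.0f Hz, %u channel(s), format '%c%c%c%c'", (unsigned long)index, desc.mSampleRate, (unsigned)desc.mChannelsPerFrame, (char)(desc.mFormatID >> 24), (char)(desc.mFormatID >> 16), (char)(desc.mFormatID >> 8), (char)desc.mFormatID);
    }
}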
And one last thing: the mediaserverd process is sandboxed, and so is our tweak. We can't save files anywhere we want. That is why I chose that file path: it can be written to even from inside the sandbox.
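If you want to check other candidate directories, a simple probe (a hypothetical helper of mine, not part of the tweak) is to try creating and deleting a file there:
#import <Foundation/Foundation.h>
//Hypothetical helper: returns YES if the sandbox allows writing into the given directory
static BOOL DirectoryIsWritable(NSString* directory)
{
    NSString* probe = [directory stringByAppendingPathComponent:@".write-probe"];
    BOOL ok = [[NSFileManager defaultManager] createFileAtPath:probe contents:[NSData data] attributes:nil];
    if (ok)
    {
        [[NSFileManager defaultManager] removeItemAtPath:probe error:NULL];
    }
    return ok;
}
//Example: DirectoryIsWritable(@"/var/mobile/Media/DCIM") should return YES from inside mediaserverd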
P.S. Even though I've posted this code, credit has to go to Elias Limneos. He's the one who actually did it.
Record a phone conversation - iOS
There is no public API for recording the calls made (or received) by the built-in Phone app.
You will have to implement your own phone calling mechanism. You'll probably want to use VoIP. (That is what Google Voice uses, for example.) You'll need to run your own server on the Internet, or contract with an existing VoIP service. You'll want to use in-app purchase to let the user buy minutes, because it costs money to run your own server or use a third-party service.
How can I record a conversation / phone call on iOS?
Here you go. Complete working example. The tweak should be loaded into the mediaserverd daemon. It will record every phone call to /var/mobile/Media/DCIM/result.m4a. The audio file has two channels: left is the microphone, right is the speaker. On the iPhone 4S, a call is recorded only when the speaker is turned on. On the iPhone 5, 5C and 5S, a call is recorded either way. There might be small hiccups when switching to/from the speaker, but recording will continue.
#import <AudioToolbox/AudioToolbox.h>
#import <libkern/OSAtomic.h>
//CoreTelephony.framework
extern "C" CFStringRef const kCTCallStatusChangeNotification;
extern "C" CFStringRef const kCTCallStatus;
extern "C" id CTTelephonyCenterGetDefault();
extern "C" void CTTelephonyCenterAddObserver(id ct, void* observer, CFNotificationCallback callBack, CFStringRef name, void *object, CFNotificationSuspensionBehavior sb);
extern "C" int CTGetCurrentCallCount();
enum
{
kCTCallStatusActive = 1,
kCTCallStatusHeld = 2,
kCTCallStatusOutgoing = 3,
kCTCallStatusIncoming = 4,
kCTCallStatusHanged = 5
};
NSString* kMicFilePath = @"/var/mobile/Media/DCIM/mic.caf";
NSString* kSpeakerFilePath = @"/var/mobile/Media/DCIM/speaker.caf";
NSString* kResultFilePath = @"/var/mobile/Media/DCIM/result.m4a";
OSSpinLock phoneCallIsActiveLock = 0;
OSSpinLock speakerLock = 0;
OSSpinLock micLock = 0;
ExtAudioFileRef micFile = NULL;
ExtAudioFileRef speakerFile = NULL;
BOOL phoneCallIsActive = NO;
void Convert()
{
//File URLs
CFURLRef micUrl = CFURLCreateWithFileSystemPath(NULL, (CFStringRef)kMicFilePath, kCFURLPOSIXPathStyle, false);
CFURLRef speakerUrl = CFURLCreateWithFileSystemPath(NULL, (CFStringRef)kSpeakerFilePath, kCFURLPOSIXPathStyle, false);
CFURLRef mixUrl = CFURLCreateWithFileSystemPath(NULL, (CFStringRef)kResultFilePath, kCFURLPOSIXPathStyle, false);
ExtAudioFileRef micFile = NULL;
ExtAudioFileRef speakerFile = NULL;
ExtAudioFileRef mixFile = NULL;
//Opening input files (speaker and mic)
ExtAudioFileOpenURL(micUrl, &micFile);
ExtAudioFileOpenURL(speakerUrl, &speakerFile);
//Reading input file audio format (mono LPCM)
AudioStreamBasicDescription inputFormat, outputFormat;
UInt32 descSize = sizeof(inputFormat);
ExtAudioFileGetProperty(micFile, kExtAudioFileProperty_FileDataFormat, &descSize, &inputFormat);
int sampleSize = inputFormat.mBytesPerFrame;
//Filling input stream format for output file (stereo LPCM)
FillOutASBDForLPCM(inputFormat, inputFormat.mSampleRate, 2, inputFormat.mBitsPerChannel, inputFormat.mBitsPerChannel, true, false, false);
//Filling output file audio format (AAC)
memset(&outputFormat, 0, sizeof(outputFormat));
outputFormat.mFormatID = kAudioFormatMPEG4AAC;
outputFormat.mSampleRate = 8000;
outputFormat.mFormatFlags = kMPEG4Object_AAC_Main;
outputFormat.mChannelsPerFrame = 2;
//Opening output file
ExtAudioFileCreateWithURL(mixUrl, kAudioFileM4AType, &outputFormat, NULL, kAudioFileFlags_EraseFile, &mixFile);
ExtAudioFileSetProperty(mixFile, kExtAudioFileProperty_ClientDataFormat, sizeof(inputFormat), &inputFormat);
//Freeing URLs
CFRelease(micUrl);
CFRelease(speakerUrl);
CFRelease(mixUrl);
//Setting up audio buffers
int bufferSizeInSamples = 64 * 1024;
AudioBufferList micBuffer;
micBuffer.mNumberBuffers = 1;
micBuffer.mBuffers[0].mNumberChannels = 1;
micBuffer.mBuffers[0].mDataByteSize = sampleSize * bufferSizeInSamples;
micBuffer.mBuffers[0].mData = malloc(micBuffer.mBuffers[0].mDataByteSize);
AudioBufferList speakerBuffer;
speakerBuffer.mNumberBuffers = 1;
speakerBuffer.mBuffers[0].mNumberChannels = 1;
speakerBuffer.mBuffers[0].mDataByteSize = sampleSize * bufferSizeInSamples;
speakerBuffer.mBuffers[0].mData = malloc(speakerBuffer.mBuffers[0].mDataByteSize);
AudioBufferList mixBuffer;
mixBuffer.mNumberBuffers = 1;
mixBuffer.mBuffers[0].mNumberChannels = 2;
mixBuffer.mBuffers[0].mDataByteSize = sampleSize * bufferSizeInSamples * 2;
mixBuffer.mBuffers[0].mData = malloc(mixBuffer.mBuffers[0].mDataByteSize);
//Converting
while (true)
{
//Reading data from input files
UInt32 framesToRead = bufferSizeInSamples;
ExtAudioFileRead(micFile, &framesToRead, &micBuffer);
ExtAudioFileRead(speakerFile, &framesToRead, &speakerBuffer);
if (framesToRead == 0)
{
break;
}
//Building interleaved stereo buffer - left channel is mic, right - speaker
for (int i = 0; i < framesToRead; i++)
{
memcpy((char*)mixBuffer.mBuffers[0].mData + i * sampleSize * 2, (char*)micBuffer.mBuffers[0].mData + i * sampleSize, sampleSize);
memcpy((char*)mixBuffer.mBuffers[0].mData + i * sampleSize * 2 + sampleSize, (char*)speakerBuffer.mBuffers[0].mData + i * sampleSize, sampleSize);
}
//Writing to output file - LPCM will be converted to AAC
ExtAudioFileWrite(mixFile, framesToRead, &mixBuffer);
}
//Closing files
ExtAudioFileDispose(micFile);
ExtAudioFileDispose(speakerFile);
ExtAudioFileDispose(mixFile);
//Freeing audio buffers
free(micBuffer.mBuffers[0].mData);
free(speakerBuffer.mBuffers[0].mData);
free(mixBuffer.mBuffers[0].mData);
}
void Cleanup()
{
[[NSFileManager defaultManager] removeItemAtPath:kMicFilePath error:NULL];
[[NSFileManager defaultManager] removeItemAtPath:kSpeakerFilePath error:NULL];
}
void CoreTelephonyNotificationCallback(CFNotificationCenterRef center, void *observer, CFStringRef name, const void *object, CFDictionaryRef userInfo)
{
NSDictionary* data = (NSDictionary*)userInfo;
if ([(NSString*)name isEqualToString:(NSString*)kCTCallStatusChangeNotification])
{
int currentCallStatus = [data[(NSString*)kCTCallStatus] integerValue];
if (currentCallStatus == kCTCallStatusActive)
{
OSSpinLockLock(&phoneCallIsActiveLock);
phoneCallIsActive = YES;
OSSpinLockUnlock(&phoneCallIsActiveLock);
}
else if (currentCallStatus == kCTCallStatusHanged)
{
if (CTGetCurrentCallCount() > 0)
{
return;
}
OSSpinLockLock(&phoneCallIsActiveLock);
phoneCallIsActive = NO;
OSSpinLockUnlock(&phoneCallIsActiveLock);
//Closing mic file
OSSpinLockLock(&micLock);
if (micFile != NULL)
{
ExtAudioFileDispose(micFile);
}
micFile = NULL;
OSSpinLockUnlock(&micLock);
//Closing speaker file
OSSpinLockLock(&speakerLock);
if (speakerFile != NULL)
{
ExtAudioFileDispose(speakerFile);
}
speakerFile = NULL;
OSSpinLockUnlock(&speakerLock);
Convert();
Cleanup();
}
}
}
OSStatus(*AudioUnitProcess_orig)(AudioUnit unit, AudioUnitRenderActionFlags *ioActionFlags, const AudioTimeStamp *inTimeStamp, UInt32 inNumberFrames, AudioBufferList *ioData);
OSStatus AudioUnitProcess_hook(AudioUnit unit, AudioUnitRenderActionFlags *ioActionFlags, const AudioTimeStamp *inTimeStamp, UInt32 inNumberFrames, AudioBufferList *ioData)
{
OSSpinLockLock(&phoneCallIsActiveLock);
if (phoneCallIsActive == NO)
{
OSSpinLockUnlock(&phoneCallIsActiveLock);
return AudioUnitProcess_orig(unit, ioActionFlags, inTimeStamp, inNumberFrames, ioData);
}
OSSpinLockUnlock(&phoneCallIsActiveLock);
ExtAudioFileRef* currentFile = NULL;
OSSpinLock* currentLock = NULL;
AudioComponentDescription unitDescription = {0};
AudioComponentGetDescription(AudioComponentInstanceGetComponent(unit), &unitDescription);
//'agcc', 'mbdp' - iPhone 4S, iPhone 5
//'agc2', 'vrq2' - iPhone 5C, iPhone 5S
if (unitDescription.componentSubType == 'agcc' || unitDescription.componentSubType == 'agc2')
{
currentFile = &micFile;
currentLock = &micLock;
}
else if (unitDescription.componentSubType == 'mbdp' || unitDescription.componentSubType == 'vrq2')
{
currentFile = &speakerFile;
currentLock = &speakerLock;
}
if (currentFile != NULL)
{
OSSpinLockLock(currentLock);
//Opening file
if (*currentFile == NULL)
{
//Obtaining input audio format
AudioStreamBasicDescription desc;
UInt32 descSize = sizeof(desc);
AudioUnitGetProperty(unit, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Input, 0, &desc, &descSize);
//Opening audio file
CFURLRef url = CFURLCreateWithFileSystemPath(NULL, (CFStringRef)((currentFile == &micFile) ? kMicFilePath : kSpeakerFilePath), kCFURLPOSIXPathStyle, false);
ExtAudioFileRef audioFile = NULL;
OSStatus result = ExtAudioFileCreateWithURL(url, kAudioFileCAFType, &desc, NULL, kAudioFileFlags_EraseFile, &audioFile);
if (result != 0)
{
*currentFile = NULL;
}
else
{
*currentFile = audioFile;
//Writing audio format
ExtAudioFileSetProperty(*currentFile, kExtAudioFileProperty_ClientDataFormat, sizeof(desc), &desc);
}
CFRelease(url);
}
else
{
//Writing audio buffer
ExtAudioFileWrite(*currentFile, inNumberFrames, ioData);
}
OSSpinLockUnlock(currentLock);
}
return AudioUnitProcess_orig(unit, ioActionFlags, inTimeStamp, inNumberFrames, ioData);
}
__attribute__((constructor))
static void initialize()
{
CTTelephonyCenterAddObserver(CTTelephonyCenterGetDefault(), NULL, CoreTelephonyNotificationCallback, NULL, NULL, CFNotificationSuspensionBehaviorHold);
MSHookFunction(AudioUnitProcess, AudioUnitProcess_hook, &AudioUnitProcess_orig);
}
A few words about what's going on. The AudioUnitProcess function is used for processing audio streams: applying effects, mixing, converting and so on. We hook AudioUnitProcess in order to access the phone call's audio streams. While a phone call is active, these streams are processed in various ways.
We listen for CoreTelephony notifications in order to get phone call status changes. When we receive audio samples, we need to determine where they come from: the microphone or the speaker. This is done using the componentSubType field of the AudioComponentDescription structure. Now, you might think: why don't we store the AudioUnit objects so that we don't need to check componentSubType every time? I did that, but it breaks everything when you switch the speaker on/off on the iPhone 5, because the AudioUnit objects change; they are recreated. So now we open audio files (one for the microphone and one for the speaker) and write samples into them, simple as that. When the phone call ends, we receive the appropriate CoreTelephony notification and close the files. We then have two separate files with audio from the microphone and the speaker that we need to merge. This is what Convert() is for. It's pretty simple if you know the API; I don't think I need to explain it, the comments are enough.
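The subtype codes hard-coded above were found on the 4S/5/5C/5S. If you need to discover them for another model, one option (my own debugging sketch, not part of the original answer) is to log every unit's component description from inside the hook:
#import <Foundation/Foundation.h>
#import <AudioToolbox/AudioToolbox.h>
//Debugging sketch: turn an OSType such as 'agcc' back into readable characters
static NSString* FourCCString(OSType code)
{
    char chars[5] = { (char)(code >> 24), (char)(code >> 16), (char)(code >> 8), (char)code, 0 };
    return [NSString stringWithUTF8String:chars];
}
//Inside AudioUnitProcess_hook:
//AudioComponentDescription d = {0};
//AudioComponentGetDescription(AudioComponentInstanceGetComponent(unit), &d);
//NSLog(@"AudioUnit type=%@ subtype=%@", FourCCString(d.componentType), FourCCString(d.componentSubType));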
About the locks. There are many threads in mediaserverd. Audio processing and CoreTelephony notifications run on different threads, so we need some kind of synchronization. I chose spin locks because they are fast and because the chance of lock contention is small in our case. On the iPhone 4S, and even the iPhone 5, all the work in AudioUnitProcess should be done as fast as possible, otherwise you will hear hiccups from the device speaker, which is obviously not good.
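As an aside, OSSpinLock has since been deprecated; on newer systems the same guarded-access pattern can be written with os_unfair_lock (my substitution, not in the original answer):
#import <os/lock.h>
#import <AudioToolbox/AudioToolbox.h>
static os_unfair_lock fileLock = OS_UNFAIR_LOCK_INIT;
static ExtAudioFileRef guardedFile = NULL;
//Same pattern as the spin locks above: hold the lock only while touching the shared file reference
static void CloseGuardedFile()
{
    os_unfair_lock_lock(&fileLock);
    if (guardedFile != NULL)
    {
        ExtAudioFileDispose(guardedFile);
        guardedFile = NULL;
    }
    os_unfair_lock_unlock(&fileLock);
}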
How can I access phone call audio stream while calling on iOS 11?
Apple does not currently allow it: it is not possible to record conversations on a non-jailbroken iPhone, even with the user's consent.
iOS: record phone call audio on a jailbroken device
The voice memo recorder only picks up audio from the microphone; therefore, unless you are using speakerphone, it won't record the other person.
How to record a call in iOS?
All active audio sessions are put on hold when a call becomes active on the iOS platform. Recording calls is not supported by Apple by design, for security and performance reasons. In short, it's not possible to achieve what you mentioned without a jailbreak.
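For completeness, here is a small sketch (my addition, using the public AVAudioSession API) showing how a normal app can at least detect the moment a call interrupts its audio session:
#import <AVFoundation/AVFoundation.h>
//Sketch: observe the interruption an app's audio session receives when a phone call starts
static void ObserveCallInterruptions()
{
    [[NSNotificationCenter defaultCenter] addObserverForName:AVAudioSessionInterruptionNotification
                                                      object:[AVAudioSession sharedInstance]
                                                       queue:[NSOperationQueue mainQueue]
                                                  usingBlock:^(NSNotification* note)
    {
        NSUInteger type = [note.userInfo[AVAudioSessionInterruptionTypeKey] unsignedIntegerValue];
        if (type == AVAudioSessionInterruptionTypeBegan)
        {
            //A phone call (or another interruption) has paused the session; recording stops here
        }
    }];
}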