How to Get Reliable Timing for My Audio App

Accurate (Music-grade) Timing of Audio Playback

NSTimer is accurate enough for most purposes, as long as you don't keep the thread the timer is scheduled on busy for so long that the timer fires late. You need to return to the run loop in time for the timer's next fire date. Fortunately, NSSound plays asynchronously, so this shouldn't be a problem.

What usually causes problems with NSTimer is setting the interval really low (one question I saw had it at 1 centisecond, i.e. 10 ms). If whatever you do in the timer method takes longer than that interval (and taking longer than 10 ms is really easy), you will return to the run loop after the timer was supposed to fire again, which makes the timer late and throws off your timing.

So make your timer method's implementation as fast as possible. If you can't make it fast enough, create an NSOperation subclass to do the job, and have the timer method just instantiate the operation, set it up, and add it to an operation queue. (The operation will run on another thread, so it won't tie up the thread whose run loop your timer is scheduled on.) If the timer fires very frequently, this is one case where micro-optimization (guided by Instruments and Shark, of course) may be warranted.
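
To make that concrete, here is a minimal Swift sketch of the pattern, using the modern Timer/OperationQueue names rather than NSTimer/NSOperation; the MetronomeTickOperation class and the 0.5 s interval are illustrative placeholders, not part of the original answer:

import Foundation

// Hypothetical Operation subclass that does the heavy per-tick work
// off the timer's thread; the class name is illustrative only.
final class MetronomeTickOperation: Operation {
    override func main() {
        guard !isCancelled else { return }
        // ... expensive per-tick work goes here ...
    }
}

let workQueue = OperationQueue()

// The timer handler does almost nothing itself, so control returns to the
// run loop well before the next fire date and the timer never fires late.
let timer = Timer.scheduledTimer(withTimeInterval: 0.5, repeats: true) { _ in
    workQueue.addOperation(MetronomeTickOperation())
}

// Keep the run loop alive if this code is not running on the main thread.
RunLoop.current.run()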

How do I achieve very accurate timing in Swift?

For musically acceptable rhythmic accuracy, the only suitable timing source is Core Audio or AVFoundation.
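
For example, with AVFoundation you can let the audio hardware clock do the timing by scheduling each click at a sample-accurate position on an AVAudioPlayerNode, instead of firing a timer. A minimal sketch, assuming a mono 44.1 kHz format and an empty placeholder click buffer (a real metronome would fill it with an actual click sound):

import AVFoundation

let engine = AVAudioEngine()
let player = AVAudioPlayerNode()

let format = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 1)!
// Placeholder 50 ms buffer; fill with a real click in practice.
let clickBuffer = AVAudioPCMBuffer(pcmFormat: format,
                                   frameCapacity: AVAudioFrameCount(format.sampleRate * 0.05))!
clickBuffer.frameLength = clickBuffer.frameCapacity

engine.attach(player)
engine.connect(player, to: engine.mainMixerNode, format: format)
try! engine.start()
player.play()

let bpm = 60.0
let framesPerBeat = AVAudioFramePosition(format.sampleRate * 60.0 / bpm)

// Schedule the next 16 beats at exact sample positions; the audio clock,
// not a run-loop timer, decides when each one plays.
for beat in 0..<16 {
    let when = AVAudioTime(sampleTime: AVAudioFramePosition(beat) * framesPerBeat,
                           atRate: format.sampleRate)
    player.scheduleBuffer(clickBuffer, at: when, options: [], completionHandler: nil)
}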

How do I get the most accurate audio frequency data possible from a real-time FFT on Android?

if there is a third option I'm overlooking

Yes: doing both at the same time, reducing the FFT size as well as using a larger step size. In a comment you pointed out that you want to detect "sniffling/chewing with mouth". So what you want to do is similar to the typical task of speech recognition. There, you typically extract a feature vector in steps of 10 ms (with Fs = 44.1 kHz, that is every 441 samples), and the signal window to transform is roughly double the step size, so about 20 ms, or 882 samples; rounding up to the nearest power of two gives an FFT size of 1024 samples (make sure you choose a power-of-two FFT size, because it is faster).

Any increase in window size or reduction in step size increases the data but mainly adds redundancy.
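
The arithmetic behind those numbers, written out as a small sketch (Swift is used here only for illustration; the same values apply on Android):

let sampleRate = 44_100.0
let stepSeconds = 0.010                              // 10 ms hop between feature vectors
let windowSeconds = 0.020                            // ~20 ms analysis window

let hopSize = Int(sampleRate * stepSeconds)          // 441 samples
let rawWindowSize = Int(sampleRate * windowSeconds)  // 882 samples

// Round the window up to the next power of two for a fast FFT.
var fftSize = 1
while fftSize < rawWindowSize { fftSize *= 2 }

print(hopSize, rawWindowSize, fftSize)               // 441 882 1024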

Additional hints:

  • @SztupY correctly pointed out that you need to "window" your signal prior to the FFT, typically with a Hamming window. (But this is not "filtering". It is just multiplying each sample by the corresponding window value, without accumulating the result; see the sketch after this list.)

  • The raw FFT output is hardly suited to recognizing "sniffling/chewing with mouth"; a classical recognizer consists of HMMs or ANNs that process sequences of MFCCs and their deltas.
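
As a concrete illustration of the first hint, here is a small Swift sketch of applying a Hamming window to one frame before the FFT (the function name and Float sample type are my own choices):

import Foundation

// Each sample is simply multiplied by its Hamming window coefficient;
// nothing is accumulated, so this is not a filter.
func hammingWindowed(_ frame: [Float]) -> [Float] {
    let n = frame.count
    var windowed = [Float](repeating: 0, count: n)
    for i in 0..<n {
        let w = 0.54 - 0.46 * cos(2.0 * Double.pi * Double(i) / Double(max(n - 1, 1)))
        windowed[i] = frame[i] * Float(w)
    }
    return windowed
}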

Could the performance I'm currently getting just be the best I'm going to get? Or does it seem like I must be doing something stupid, because much faster speeds are possible?

It's close to the best, but you are spending all of your CPU power estimating highly redundant data, leaving none for the recognizer.

Is my approach to this at least fundamentally correct or am I barking entirely up the wrong tree?

After considering my answer you might re-think your approach.

Accurate timing in iOS

OK, I have some answers after doing some more tests, so I am sharing them with anyone who is interested.

I placed a variable inside the play method (the method that actually sends the play message to the AVAudioPlayer object) to measure the time intervals between ticks. As my simple compare-against-an-external-watch experiment showed, 60 BPM came out too slow; I got these time intervals (in seconds):

1.004915
1.009982
1.010014
1.010013
1.010028
1.010105
1.010095
1.010105

My conclusion was that some overhead time elapses after each one-second interval is counted, and that this extra time (about 10 ms) accumulates to a noticeable amount after a few tens of seconds: quite bad for a metronome. So instead of measuring the interval between calls, I decided to measure the total interval from the first call, so that the error would not accumulate. In other words, I replaced this condition:

while (continuePlaying && ((currentTime0 + [duration doubleValue]) >= currentTime1))

with this condition:

while (continuePlaying && ((_currentTime0 + _cnt * [duration doubleValue]) >= currentTime1))

where _currentTime0 and _cnt are now instance variables (sorry if that's C++ jargon, I am quite new to Obj-C): the former holds the timestamp of the first call to the method, and the latter is an int counting the number of ticks (i.e. function calls). This resulted in the following measured time intervals:

1.003942
0.999754
0.999959
1.000213
0.999974
0.999451
1.000581
0.999470
1.000370
0.999723
1.000244
1.000222
0.999869

and it is evident, even without calculating the average, that these values fluctuate around 1.0 second (and the average is within about a millisecond of 1.0).
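
For anyone doing the same thing in Swift, here is a minimal sketch of that drift-free condition: each tick's deadline is computed from the first timestamp and the tick count, so the ~10 ms per-tick overhead can no longer accumulate (the names and the playClick() call are placeholders, not the original code):

import QuartzCore   // CACurrentMediaTime()

let beatDuration = 1.0                    // 60 BPM
let startTime = CACurrentMediaTime()
var tickCount = 0

// The n-th tick is due at startTime + n * beatDuration, measured from the
// first call, so per-tick overhead does not accumulate.
func isNextTickDue() -> Bool {
    let deadline = startTime + Double(tickCount) * beatDuration
    return CACurrentMediaTime() >= deadline
}

// Schematic use inside the playback loop:
// while continuePlaying {
//     if isNextTickDue() {
//         tickCount += 1
//         playClick()   // hypothetical: sends play to the AVAudioPlayer
//     }
// }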

I will be happy to hear more insights about what causes the extra time to elapse; 10 ms sounds like an eternity for a modern CPU. I am not familiar with the specs of the iPod's processor, though (it's an iPod touch 4G, and the figure Wikipedia gives, a PowerVR SGX 535 @ 200 MHz, is actually the GPU).


