How to Generate the Audio Spectrum Using FFT in C++

How to generate the audio spectrum using FFT in C++?

There are quite a few similar/related questions on SO already which are well worth reading, as the answers contain a lot of useful information and advice, but in essence you need to do this (see the code sketch below):

  • Convert the audio data to the format required by FFT (e.g. int -> float, with separate L/R channels);
  • Apply suitable window function (e.g. Hann aka Hanning window)
  • Apply FFT (NB: if using typical complex-to-complex FFT then set all imaginary parts in the input array to zero);
  • Calculate the magnitude of the first N/2 FFT output bins (sqrt(re*re + im*im));
  • Optionally convert magnitude to dB (log) scale (20 * log10(magnitude) or 10 * log10(re*re + im*im));
  • Plot N/2 (log) magnitude values.

Note that while FFTW is a very good and very fast FFT it may be a little overwhelming for a beginner - it's also very expensive if you want to include it as part of a commercial product. I recommend starting with KissFFT instead.
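For a concrete starting point, here is a minimal sketch of the steps above using KissFFT's complex-to-complex transform. The function name, buffer handling and window choice (Hann) are illustrative assumptions, not taken from any particular answer:

```cpp
// Minimal sketch of the steps above using KissFFT (complex-to-complex).
// Assumes `samples` already holds N mono float samples in [-1, 1].
#include "kiss_fft.h"
#include <cmath>
#include <vector>

std::vector<float> spectrum_db(const std::vector<float>& samples)
{
    const int N = (int)samples.size();           // FFT size (power of 2 works best)
    const float PI = 3.14159265358979f;
    std::vector<kiss_fft_cpx> in(N), out(N);

    // 1. Copy input and apply a Hann window; imaginary parts stay zero.
    for (int n = 0; n < N; ++n) {
        float w = 0.5f * (1.0f - std::cos(2.0f * PI * n / (N - 1)));
        in[n].r = samples[n] * w;
        in[n].i = 0.0f;
    }

    // 2. Run the forward FFT.
    kiss_fft_cfg cfg = kiss_fft_alloc(N, 0 /* forward */, nullptr, nullptr);
    kiss_fft(cfg, in.data(), out.data());
    kiss_fft_free(cfg);

    // 3. Magnitude of the first N/2 bins, converted to a dB scale.
    std::vector<float> db(N / 2);
    for (int k = 0; k < N / 2; ++k) {
        float mag2 = out[k].r * out[k].r + out[k].i * out[k].i;
        db[k] = 10.0f * std::log10(mag2 + 1e-12f);   // epsilon avoids log(0)
    }
    return db;   // plot these N/2 values
}
```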

How to perform FFT on WAV file data?

You are likely measuring the difference between interleaved samples of the two stereo channels, which can look like high-frequency content when the mix and pan of the channels are unequal. Try again with the channels separated or mixed down to mono (see the sketch below), and use a smooth window function to reduce FFT aperture edge artifacts; your current rectangular window can also introduce a small amount of high-frequency noise.
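As an illustration, here is one way to do that mix-down in C++, assuming 16-bit interleaved stereo WAV data (the names are mine, not from the answer):

```cpp
// Sketch: split interleaved 16-bit stereo samples into mono floats
// (average of L and R), ready to feed into the FFT.
#include <cstdint>
#include <vector>

std::vector<float> to_mono(const std::vector<int16_t>& interleaved)
{
    std::vector<float> mono(interleaved.size() / 2);
    for (size_t i = 0; i < mono.size(); ++i) {
        float left  = interleaved[2 * i]     / 32768.0f;  // L sample, scaled to [-1, 1)
        float right = interleaved[2 * i + 1] / 32768.0f;  // R sample
        mono[i] = 0.5f * (left + right);                  // mix down to mono
    }
    return mono;
}
```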

Calculate values for spectrum analyser

You're probably looking for FFTW.

Edit:

To elaborate on your question:

"calculating these values can be done by using an FFT - however, I'm not exactly sure how to calculate those, given a buffer of input data": yes, you're right; that's exactly how it's done. You take a (necessarily small, due to the time-frequency uncertainty principle) sample segment out of the currently playing audio data and feed it to a (typically) discrete, real-only FFT (among the best known, most widely used and fastest are the DCT family of DFTs - in fact there are highly optimized versions of most DCTs in FFTW). Then you take out the next sample segment and repeat the process.
The output of the FFT will be the frequency decomposition of the audio signal that has been fed in - you then need to decide how to display it (i.e. which function to apply to the FFT outputs, common candidates being f(x) = x, f(x) = sqrt(x) and f(x) = log(x)) and how to present/animate successive readings (e.g. you could average each band in the temporal direction, or have the maximums "fall off" slowly).
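A speculative sketch of one iteration of that loop, using FFTW's single-precision real-to-complex transform (the buffer names and the magnitude mapping are my own choices for illustration):

```cpp
// Sketch: one analysis segment through FFTW's real-to-complex transform.
#include <fftw3.h>
#include <cmath>

void analyse_segment(const float* segment, float* magnitudes, int N)
{
    float* in = fftwf_alloc_real(N);
    fftwf_complex* out = fftwf_alloc_complex(N / 2 + 1);   // r2c output size

    for (int n = 0; n < N; ++n)
        in[n] = segment[n];                                // window here if desired

    fftwf_plan plan = fftwf_plan_dft_r2c_1d(N, in, out, FFTW_ESTIMATE);
    fftwf_execute(plan);

    for (int k = 0; k < N / 2; ++k)                        // pick a display mapping,
        magnitudes[k] = std::sqrt(out[k][0] * out[k][0]    // e.g. the magnitude
                                + out[k][1] * out[k][1]);

    fftwf_destroy_plan(plan);
    fftwf_free(in);
    fftwf_free(out);
}
```

In a real-time display you would create the plan once and reuse it for every segment, since planning is the expensive part of FFTW.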


Edit 2:
Additional links:

  • http://en.wikipedia.org/wiki/FFTW
  • http://webcache.googleusercontent.com/search?q=cache:m6ou54tn_soJ:www.fftw.org/+&cd=1&hl=en&ct=clnk&gl=it&client=firefox-a
  • http://web.archive.org/web/20130123131356/http://fftw.org/

Implementing a real-time frequency spectrum for a beginner

There are already many libraries to do FFTs for you. No reason to reinvent the wheel. DirectX has an implementation but it might only be in the most recent version. Here's an open source C library for it.

If you want to understand the math behind it, here's a simple explanation and here's a complicated explanation.

Doing FFT in realtime

If you need amplitude, frequency and time in one graph, then the transform is known as a Time-Frequency decomposition. The most popular one is called the Short Time Fourier Transform. It works as follows:

1. Take a small portion of the signal (say 1 second)

2. Window it with a small window (say 5 ms)

3. Compute the 1D Fourier transform of the windowed signal.

4. Move the window by a small amount (2.5 ms)

5. Repeat above steps until end of signal.

6. All of this data is entered into a matrix that is then used to create the kind of 3D representation of the signal that shows its decomposition along frequency, amplitude and time.

The length of the window determines the resolution you can obtain in the frequency and time domains. Check here for more details on STFT, and search for Robi Polikar's tutorials on wavelet transforms for a layman's introduction to the above.
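For a concrete illustration of that trade-off: the frequency resolution of an N-point FFT at sample rate fs is fs / N, so at 44.1 kHz a 5 ms window (about 220 samples) resolves only about 200 Hz per bin, while a 100 ms window (4410 samples) resolves about 10 Hz per bin, at the cost of time resolution.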

Edit 1:
You take a windowing function (there are innumerable functions out there; here is a list). The most intuitive is a rectangular window, but the most commonly used are the Hamming/Hanning window functions. You can follow the steps below with pen and paper in hand and sketch along.

Assume that the signal you have obtained is 1 second long and is named x[n]. The windowing function is 5 ms long and is named w[n]. Place the window at the start of the signal (so the end of the window coincides with the 5 ms point of the signal) and multiply x[n] and w[n] like so:

y[n] = x[n] * w[n] - point by point multiplication of the signals.

Take an FFT of y[n].

Then you shift the window by a small amount (say 2.5 ms), so that it now covers 2.5 ms to 7.5 ms of the signal x[n], and repeat the multiplication and FFT steps. In other words, consecutive windows overlap by 2.5 ms. You will see that changing the length of the window and the overlap gives you different resolutions on the time and frequency axes.

Once you do this, you need to feed all the data into a matrix and then have it displayed. The overlap is for minimising the errors that might arise at boundaries and also to get more consistent measurements over such short time frames.
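Putting the pieces together, here is a rough C++ sketch of that loop, again using KissFFT for the per-frame transform; the function signature, window choice (Hann) and variable names are illustrative:

```cpp
// Sketch of the STFT loop described above. Each hop position produces one
// column of the spectrogram matrix.
#include "kiss_fft.h"
#include <cmath>
#include <vector>

std::vector<std::vector<float>> stft(const std::vector<float>& x,
                                     int winLen /* e.g. 5 ms of samples */,
                                     int hop    /* e.g. winLen / 2 */)
{
    const float PI = 3.14159265f;
    std::vector<float> w(winLen);                    // the window function w[n]
    for (int n = 0; n < winLen; ++n)
        w[n] = 0.5f * (1.0f - std::cos(2.0f * PI * n / (winLen - 1)));

    kiss_fft_cfg cfg = kiss_fft_alloc(winLen, 0, nullptr, nullptr);
    std::vector<kiss_fft_cpx> in(winLen), out(winLen);
    std::vector<std::vector<float>> rows;

    for (size_t start = 0; start + winLen <= x.size(); start += hop) {
        for (int n = 0; n < winLen; ++n) {           // y[n] = x[n] * w[n]
            in[n].r = x[start + n] * w[n];
            in[n].i = 0.0f;
        }
        kiss_fft(cfg, in.data(), out.data());        // FFT of the windowed frame

        std::vector<float> mags(winLen / 2);
        for (int k = 0; k < winLen / 2; ++k)
            mags[k] = std::sqrt(out[k].r * out[k].r + out[k].i * out[k].i);
        rows.push_back(std::move(mags));             // one column of the spectrogram
    }
    kiss_fft_free(cfg);
    return rows;                                     // feed this matrix to your display
}
```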

P.S.: If steps 2 and 4 were unclear, it is worth revisiting time-frequency decompositions of a signal in general before implementing the STFT.

Result from audio FFT function makes it near impossible to inspect low/mid frequencies

What you are seeing is indeed the expected outcome of an FFT (Fourier transform). The logarithmic frequency axis that you're expecting is achieved by the Constant-Q transform.

Now, the implementation of the Constant-Q transform is non-trivial. The Fourier Transform has become popular precisely because there is a fast implementation (the FFT). In practice, the constant-Q transform is often implemented by using an FFT, and combining multiple high-frequency bins. This discards resolution in the higher bins; it doesn't give you more resolution in the lower bins.

To get more frequency resolution in the lower bins of the FFT, just use a longer window. But if you also want to keep the time resolution, you'll have to use a hop size that's smaller than the window size. In other words, your FFT windows will overlap.
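For illustration, here is a sketch of the bin-combining approach mentioned above: geometrically spaced band edges with a plain average per band. This is my own simplification, not a full constant-Q implementation:

```cpp
// Sketch: group linear FFT bins into logarithmically spaced bands by
// averaging. This approximates a constant-Q display; as noted above, it
// does not add resolution at low frequencies.
#include <algorithm>
#include <cmath>
#include <vector>

std::vector<float> to_log_bands(const std::vector<float>& binMags, // N/2 magnitudes
                                float sampleRate, int numBands,
                                float fMin = 40.0f)
{
    const int nBins = (int)binMags.size();
    const float fMax = sampleRate / 2.0f;
    const float ratio = std::pow(fMax / fMin, 1.0f / numBands);  // geometric step

    std::vector<float> bands(numBands, 0.0f);
    float lo = fMin;
    for (int b = 0; b < numBands; ++b) {
        float hi = lo * ratio;
        int kLo = (int)(lo / fMax * nBins);
        int kHi = std::max(kLo + 1, (int)(hi / fMax * nBins));   // at least one bin
        kHi = std::min(kHi, nBins);

        float sum = 0.0f;
        for (int k = kLo; k < kHi; ++k)
            sum += binMags[k];
        bands[b] = sum / (kHi - kLo);                            // average of the bins
        lo = hi;
    }
    return bands;
}
```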

How to synchronize audio with the power spectrum and choose frame length N (to do FFT)?

Start by deciding how often you want the visualizer to update. Let's say we want it to update 25 times per second (similar to TV or movie frame rates). That means every 1/25 seconds, or every 40 ms. At a sample rate of 44.1 kHz this translates to 44100 / 25 = 1764 samples. Since we typically want a power-of-2 FFT size, let's go for N = 2048.

This gives a resolution on the frequency axis of 44100 / 2048 ≈ 21.5 Hz. If you want higher resolution you can overlap successive FFT windows, e.g. keeping the same update rate with 50% overlap lets you use N = 4096 for a resolution of about 10.8 Hz.
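The same arithmetic expressed as code, with all values taken from the example above:

```cpp
// Framing arithmetic from the example above: 44.1 kHz sample rate,
// 25 visualizer updates per second.
const double sampleRate = 44100.0;
const double updatesPerSec = 25.0;

const int samplesPerUpdate = (int)(sampleRate / updatesPerSec); // 1764 samples
const int N = 2048;                        // next power of 2 above 1764
const double binWidthHz = sampleRate / N;  // ~21.5 Hz per bin

// 50% overlap: same update rate (hop of 1764 samples), double the window.
const int N2 = 4096;
const double binWidthHz2 = sampleRate / N2; // ~10.8 Hz per bin
```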

Signal Processing/FFT On Live Input Audio

To do this you have a couple of options, depending mainly on your preferred language/framework. I'm not sure how new you are to signal processing, so I'll suggest a few options.

Visual Programming

These are all visual programming environments which don't require writing any code; however, Simulink and Pure Data both require a runtime for the user to run the program.

Simulink (Paid)

MathWorks' visual programming tool, which works really well in real time (in my opinion). Using the Audio System Toolbox, you can easily capture microphone input from your system in real time, carry out the FFT processing, plot the spectrum and, as you said, carry out further processing when certain FFT conditions are met.

This isn't free software and it requires the Matlab/Simulink runtime to be installed. You can also script your processing in Matlab's .m language as desired (a cross between Java, JS and C).

Max MSP (Paid)

Similar to Simulink, but developed as a standalone visual programming tool. It will allow you similar freedom to Simulink, but I think it will be easier for redistribution.

You can compile Max MSP projects into executables to give to someone straight away. Here is a reference to get you started on using the FFT in Max. Again, this isn't free, but if you want to learn more about it then I think it's worth the money (if I recall, it's not too expensive).

If you need more custom processing than the built-in modules offer, I believe you can design custom Max modules using C or JavaScript. Max is designed to easily get at system audio inputs/outputs, and here's a link to get you started.

Bonus: you can design your own Ableton Live plugins with the Max4Live add-on, which lets your Max MSP projects get compiled into .VST format, so you can build custom FX if you are into music production.

Pure Data (PD) (Free)

A very bland, open source version of Max MSP, but completely free. It may look dull at first, but a lot of researchers I know use it to build fairly complex systems that can do some serious data processing. There are also lots of community-built extras for PD if you ever need a custom module. Here is a link to get you started on the FFT in PD. You cannot compile applications with PD, but since it's completely free to install, anyone can run your program after installing PD. Another link for troubleshooting audio I/O in PD (if it isn't working right out of the box).

Programming Languages

The visual tools above are a really good way to get started if you haven't already been introduced to DSP or audio programming. Otherwise, here are just a few options, with links to get started, and where I would recommend beginning.

Matlab & Octave

As before, the Audio System Toolbox supports real-time audio I/O within a Matlab script. Combined with Matlab's built-in FFT function, this can have you programming real-time FFTs and plotting the response in no time at all (less than 10 lines of code or so).

Octave has its own version of the FFT function and different backends for rendering plots, but no Audio System Toolbox. However, Playrec is an open-source alternative for audio I/O in Matlab/Octave that supports real-time audio input and output.

(Octave is an open source equivalent to Matlab, which requires a paid license to develop a program, but Octave does not support all Matlab features.)

Python

Thanks to the PyAudio module, real-time audio I/O and DSP are possible in Python! I would definitely recommend Python if you are just starting out, since it's a nice introduction to programming and can help with teaching the fundamentals of DSP before attempting lower-level languages.

Here's where you can get started with real-time, non-blocking audio I/O in Python with PyAudio. To plot your data you can use a library such as matplotlib (designed with easy plotting functionality similar to Matlab's).

For your FFT there are multiple libraries out there, but I'd start with the SciPy/NumPy one.

C

One of the classic (and sometimes most daunting) programming languages. With no objects (unless you want to make them yourself) or other high-level abstractions, C is one of the few languages that still feels like you're building a lot from the ground up (which I personally like).

To get started with audio, I'd look at what is, in my opinion, the most widely used cross-platform audio I/O library: PortAudio. This will let you access the sound card's inputs and outputs in real time on Mac, Linux and Windows.

Once you get this up and running, the FFT library I'd use to get started is KissFFT, purely because of its simplicity. If you want to plot the data, I would look at gnuplot, but that isn't a very pretty route in terms of development.
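To make the capture side concrete, here is a minimal sketch of grabbing mic input with PortAudio's default-stream API; the FFT would go inside the callback. Error handling is omitted and the parameter values are illustrative:

```cpp
// Minimal PortAudio input sketch: opens the default mic and receives
// blocks of samples in a callback, where the FFT would be applied.
#include <portaudio.h>

static int onAudio(const void* input, void* /*output*/,
                   unsigned long frameCount,
                   const PaStreamCallbackTimeInfo*, PaStreamCallbackFlags,
                   void* /*userData*/)
{
    const float* samples = static_cast<const float*>(input);
    // ... window `samples` (frameCount of them) and run your FFT here ...
    (void)samples; (void)frameCount;
    return paContinue;
}

int main()
{
    Pa_Initialize();
    PaStream* stream = nullptr;
    Pa_OpenDefaultStream(&stream,
                         1, 0,          // 1 input channel (mono mic), no output
                         paFloat32,     // 32-bit float samples
                         44100,         // sample rate
                         2048,          // frames per callback (one FFT frame)
                         onAudio, nullptr);
    Pa_StartStream(stream);
    Pa_Sleep(10000);                    // capture for ~10 seconds
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
}
```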

If you are very new to programming I would not recommend this unless you really want to get stuck in.

C++

Both KissFFT and PortAudio will also compile as C++ code, but here are a couple of higher-level alternatives.

One of my favourites is the JUCE framework/development environment. It has built-in cross-platform audio I/O and a custom FFT function as part of the framework. You can also build custom VSTs for your music DAW if you want to. It comes with 'easy' (if you know C++) access to graphics windows, with higher-level access to OpenGL, so you can get fancy when plotting your data in real time. If I remember correctly, one of the demo projects on first installation is a real-time FFT plot that you can compile to see the input from your laptop mic. JUCE is free for personal use, but comes with a small licence fee as an indie developer.
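From memory, using JUCE's built-in FFT looks roughly like the sketch below; treat the exact class and method names as assumptions and check the current JUCE documentation:

```cpp
// Rough sketch of JUCE's built-in FFT (juce::dsp::FFT), from memory;
// verify against the current JUCE docs.
#include <JuceHeader.h>   // in a JUCE project
#include <algorithm>

void computeSpectrum(const float* windowedSamples)
{
    constexpr int fftOrder = 11;              // 2^11 = 2048-point FFT
    constexpr int fftSize  = 1 << fftOrder;

    static juce::dsp::FFT fft(fftOrder);
    float fftData[2 * fftSize] = {};          // JUCE wants 2x fftSize of workspace

    std::copy(windowedSamples, windowedSamples + fftSize, fftData);
    fft.performFrequencyOnlyForwardTransform(fftData);
    // fftData[0 .. fftSize/2 - 1] now holds the magnitude spectrum.
}
```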

Another one that comes to mind is the Qt C++ library/framework, mainly for UI design. It is a cross-platform, easy-to-use GUI designer that also has high-level classes for obtaining audio input from a Mac/Windows/Linux mic. Here is just one example I came across that uses Qt's multimedia classes and FFTReal to plot a real-time FFT spectrum.

Summary

I've suggested a lot of options, but have also missed out a few that other people may recommend, such as R, C#, Java, Rust, etc. There are so many possibilities that it's impossible to cover them all, but I think this should be enough to get started. If it were me, in terms of experience:

  • Complete Beginner to Programming: Max MSP
  • Novice / Knows their way around a little bit: Python (with PyAudio)
  • Programmed in other languages maybe looking to gain more programming skills : C++ with JUCE

Whichever of these languages you pick will serve you well as future reference for software positions, and many companies and researchers use them to prototype and develop real-time audio processing software.

This is just my opinion but hopefully this gets you well along your way!


