Learning to Work with Audio in C++

It really depends on what kind of audio work you want to do. If you want to implement audio for a game, C++ is certainly the right language. There are many libraries around: OpenAL is great, free, and cross-platform. I have also used DirectSound and FMOD with great success. Check them out; it all depends on your needs.

Absolute beginners guide to working with audio in C/C++?

Thanks everyone for the responses! I sort of cobbled them together to successfully make a small utility that converts an AIFF/WAV/etc. file to an MP3 file. There seems to be some interest in this question, so here is what I did, step by step:

Step 1:
Download and install the libsndfile library, as suggested by James Morris. This library is very easy to use – its only shortcoming is that it won't work with MP3 files.

Step 2:
Look inside the 'examples' folder that comes with libsndfile and find generate.c. This gives a nice working example of converting any non-mp3 file to various file formats. It also gives a glimpse of the power behind libsndfile.

Step 3:
Borrowing code from generate.c, I created a C file that just converts an audio file to a .wav file. Here is my code: http://pastie.org/719546

Step 4:
Download and install the LAME encoder. This will install both the libmp3lame library and the lame command-line utility.

Step 5:
Now you can peruse LAME's API, or just fork & exec a process running the lame command to convert your WAV file to an MP3 file.
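The fork-and-exec route can be as simple as shelling out to the lame binary. Here is a minimal sketch, assuming lame is installed and on your PATH; the -V2 VBR-quality flag is just one reasonable choice, not something mandated by LAME:

```cpp
#include <cstdlib>
#include <string>

// Build the shell command for the LAME command-line encoder.
// -V2 selects a high-quality variable-bitrate preset.
std::string lameCommand(const std::string& wavPath, const std::string& mp3Path)
{
    return "lame -V2 \"" + wavPath + "\" \"" + mp3Path + "\"";
}
```

Pass the resulting string to std::system (or exec it directly) to run the encoder; quoting the paths keeps filenames with spaces intact.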

Step 6: Bring out the champagne and caviar!

If there is a better way (I'm sure there is) to do this, please let me know. I personally have never seen a step-by-step roadmap like this so I thought I'd put it out there.

I want to learn audio programming

"Sound programming" is a very broad field. First of all, it is definitely a feasible subject, but since you need to cram stuff into a single semester you will need to limit your scope. I can see that you're looking for a place to start, so here are some ideas to get you thinking.

Since you have mentioned both "how sound works in computer science" and "synthesizers", it's worth pointing out the difference between analogue sound, sampled sound and synthesized sound, as they are different concepts. I'll explain them briefly here.

Analogue sound is sound as we humans typically interpret it -- vibrations of air sensed by the human ear. You can think of sound as a one-dimensional signal, where the independent variable is time and the dependent variable is the amplitude of vibration. Analogue sound is continuous in both the time and amplitude domains. Older sound recording methods (e.g. magnetic tape) used an analogue sound representation. Analogue sound is not frequently used with computers (computers aren't good at storing continuous-domain data), but understanding analogue signals is important nevertheless. Expect to see plenty of math (e.g. complex numbers, Fourier transforms) if you go down this path.

Sampled sound is the sound representation that lends itself well to processing with a computer. People are most familiar with sampled sound through CDs and other musical recordings. An analogue signal is sampled at some frequency (e.g. 44.1 kHz for CD recording), so a sampled sound signal is discrete in the time domain. If the signal is quantized then it will be discrete in the amplitude domain as well. Formats like MP3 are sampled formats. There are lots of things to study in this field if you're interested, such as restoration (removing static, etc.) and compression (again, codecs like MP3 and Ogg Vorbis). It's a lot of fun because there's lots to experiment with and code.
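The sampling and quantization described above can be sketched in a few lines of C++, assuming CD-style parameters (44.1 kHz sampling rate, 16-bit depth):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Sample a sine tone at a given rate and quantize each sample to 16 bits --
// discrete in time (one entry per sample period) and in amplitude (int16_t).
std::vector<int16_t> sampleSine(double frequency, double sampleRate, int numSamples)
{
    const double pi = 3.141592653589793;
    std::vector<int16_t> samples(numSamples);
    for (int i = 0; i < numSamples; ++i)
    {
        double analogue = std::sin(2.0 * pi * frequency * i / sampleRate); // continuous model
        samples[i] = static_cast<int16_t>(analogue * 32767.0);             // quantized amplitude
    }
    return samples;
}
```

One second of a 440 Hz tone at CD quality is then sampleSine(440.0, 44100.0, 44100).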

Both analogue and sampled sound dig deeply into a field called Digital Signal Processing. Google around for that to get a feel of what it's like. It's often taught as a course at universities, so if you're really keen you can have a look at some lecture slides or even try some of the earlier, simpler projects.
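To get a feel for the kind of exercise an introductory DSP course sets, here is a textbook discrete Fourier transform (the naive O(N²) sum, not a fast FFT). Feeding it a pure sine produces a magnitude peak at the corresponding frequency bin:

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Textbook DFT: X[k] = sum over n of x[n] * e^(-2*pi*i*k*n/N).
std::vector<std::complex<double>> dft(const std::vector<double>& x)
{
    const double pi = 3.141592653589793;
    const std::size_t N = x.size();
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::polar(1.0, -2.0 * pi * double(k) * double(n) / double(N));
    return X;
}
```

A typical first project is exactly this: transform a known signal, plot the magnitudes, and confirm the energy lands where theory says it should.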

Synthesized sound is a representation that is suited for reproduction of a music track, where the instruments playing the track are known beforehand. Think of it as sheet music for the computer. Somebody has to write the sheet music -- you can't just record it like analogue or sampled sound. This makes synthesized sound a completely different representation from analogue and sampled sound. Also, the computer needs to know what the instruments are (e.g. piano) so that it can play (synthesize) the track. If it doesn't know the instrument, it either gives up or picks a close match (e.g. replaces the piano with an electric keyboard). I have never worked with synthesizers before, so I can't comment on the learning curve for them.
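As a toy illustration of the "sheet music" idea, a score can be nothing more than a list of (frequency, duration) pairs, rendered into samples by a trivial "instrument" -- here a plain sine, standing in for a real synthesizer voice. The Note struct and renderScore function are hypothetical names for this sketch:

```cpp
#include <cmath>
#include <vector>

// One entry of the "sheet music": which pitch, and for how long.
struct Note { double frequency; double seconds; };

// Render a score into samples using a sine-wave "instrument".
std::vector<float> renderScore(const std::vector<Note>& score, double sampleRate)
{
    const double pi = 3.141592653589793;
    std::vector<float> out;
    for (const Note& note : score)
    {
        const int n = static_cast<int>(note.seconds * sampleRate);
        for (int i = 0; i < n; ++i)
            out.push_back(static_cast<float>(std::sin(2.0 * pi * note.frequency * i / sampleRate)));
    }
    return out;
}
```

The score is tiny compared to the rendered audio, which is exactly why formats like MIDI store the former and let the synthesizer produce the latter.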

So, based on what I wrote -- pick a direction that interests you more, Google around and then refine your question.

EDIT

A good book to read is this. You can probably look around related titles in Amazon and find something newer, but it's been a while since I did my audio processing shopping.

And if you have half an hour to spare, then have a look at this video tutorial. It covers sound, image and video processing -- they're actually closely related fields.

Starting with the Core Audio framework

A preview of a book on Core Audio just came out. I've started reading it and as a beginner myself I find it helpful.

It has a tutorial style teaching method and is very clear in its explanations. I highly recommend it.

Where can I find low level Sound Programming Theory Tutorials

Before getting your hands dirty with the very low levels (C/C++), I'd suggest playing around with higher-level tools such as Octave (a free MATLAB clone). You might need to install the signal processing toolkit too. This should give you a good testbed for playing around with FFTs, convolution, filtering and the like, and it also lets you graph the results. I'd suggest finding a good book on signal processing to get familiar with the concepts; then, if you want to get into DSP algorithms, MusicDSP.org is worth a look.
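For a taste of the filtering experiments mentioned above, here is a plain convolution in C++ -- the sort of thing you would prototype in Octave with conv() before porting. This is a sketch of the full ("same as Octave") convolution, not an optimized DSP routine:

```cpp
#include <vector>

// Full convolution y = x * h, of length |x| + |h| - 1 (like Octave's conv()).
std::vector<double> convolve(const std::vector<double>& x, const std::vector<double>& h)
{
    std::vector<double> y(x.size() + h.size() - 1, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < h.size(); ++j)
            y[i + j] += x[i] * h[j];
    return y;
}
```

Convolving a signal with the kernel {0.25, 0.25, 0.25, 0.25} averages each sample with its three predecessors -- the simplest low-pass FIR filter.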

If you want an existing framework to work with then look at CLAM.

A pixel in graphics programming is analogous to a single sampled point in audio. A digitised image comprises a 2D array of pixels; a digitised audio signal comprises a sequence of sample points, each point corresponding to an amplitude. The rest you'll find in the books...

How do I create/play audio in C++?

The problem here is that C++ is just a programming language. The same can be said for Python, though Python lives in a different ecosystem of modules and package management which gets conflated (rightly or wrongly) as part of the language.

C++ doesn’t have the same history or the same ecosystem, and this is part of the battle you will have when learning it. You don’t have pip; you have a nebulous series of frameworks, headers and libraries (some standard, some which need installation), all of which need to be linked, path-ed or compiled. It is an ecosystem that is unfriendly if you try to approach it like a novice Python programmer. If you approach it agnostically, it is simultaneously very powerful and exceptionally tedious, a combination that tends to polarise developers.

This means that simple answers like Use SFML!, SDL, NSOUND, OpenAL, CoreAudio, AVFoundation, JUCE, &c... are all technically "correct" but massively gloss over large parts of setup, nomenclature and workflow that are just a pip install away with python.

Pontificating aside, if you want to simply

  • create an array of floating point values
  • that represent the samples of sine tone
  • then play those samples
  • on macOS

Then you are probably best just

  • creating your array
  • writing a .wav
  • opening the .wav with afplay

Is that the most open, versatile, DSP orientated, play-from-RAM solution? No, of course not, but it is a solution to the problem you pose here. The alternative and correct answer is an exhaustive list of every major media library, cross-platform and macOS specific, their setup, quirks and minimum working example, which would result in an answer so obtusely long I hope you can sympathise with why it is not best addressed on Stack Overflow.

A Simple CLI App

You can find all the constituent parts of this on SO, but I have tallied off so many "how do I play a sound in C++" questions that it has made me realise they are not going away.

The setup for Xcode is to create a Command Line Tool project (Console App for Visual Studio).

Here is a header that will wrap up everything into a playSound function:

audio.h

#pragma once

//------------------------------------------------------------------------------
#include <iostream>
#include <fstream>
#include <string>
#include <cstdint>
#include <cstddef>
#include <cstdlib>
#if defined _WIN32 || defined _WIN64
#pragma comment(lib, "Winmm")
#include <windows.h>
#endif
//------------------------------------------------------------------------------

/// The 44-byte canonical WAVE file header for uncompressed PCM data.
struct WaveHeader
{
    /** waveFormatHeader: The first 4 bytes of a wav file should be the characters "RIFF" */
    char chunkID[4] = { 'R', 'I', 'F', 'F' };
    /** waveFormatHeader: This is the size of the entire file in bytes minus 8 bytes */
    uint32_t chunkSize;
    /** waveFormatHeader: These should be the characters "WAVE" */
    char format[4] = { 'W', 'A', 'V', 'E' };
    /** waveFormatHeader: This should be the letters "fmt ", note the space character */
    char subChunk1ID[4] = { 'f', 'm', 't', ' ' };
    /** waveFormatHeader: For PCM this is 16, the size in bytes of the rest of this sub-chunk */
    uint32_t subChunk1Size = 16;
    /** waveFormatHeader: For PCM this is 1, other values indicate compression */
    uint16_t audioFormat = 1;
    /** waveFormatHeader: Mono = 1, Stereo = 2, etc. */
    uint16_t numChannels = 1;
    /** waveFormatHeader: Sample Rate of file */
    uint32_t sampleRate = 44100;
    /** waveFormatHeader: SampleRate * NumChannels * BitsPerSample/8 */
    uint32_t byteRate = 44100 * 2;
    /** waveFormatHeader: The number of bytes for one sample including all channels */
    uint16_t blockAlign = 2;
    /** waveFormatHeader: 8 bits = 8, 16 bits = 16 */
    uint16_t bitsPerSample = 16;
    /** waveFormatHeader: Contains the letters "data" */
    char subChunk2ID[4] = { 'd', 'a', 't', 'a' };
    /** waveFormatHeader: == NumberOfFrames * NumChannels * BitsPerSample/8
        This is the number of bytes in the data.
     */
    uint32_t subChunk2Size;

    WaveHeader(uint32_t samplingFrequency = 44100, uint16_t bitDepth = 16, uint16_t numberOfChannels = 1)
    {
        numChannels = numberOfChannels;
        sampleRate = samplingFrequency;
        bitsPerSample = bitDepth;

        byteRate = sampleRate * numChannels * bitsPerSample / 8;
        blockAlign = numChannels * bitsPerSample / 8;
    };

    /// sets the fields that refer to how large the wave file is
    /// @warning This MUST be set before writing a file, or the file will be unplayable.
    /// @param numberOfFrames total number of audio frames, i.e. total number of samples / number of channels
    void setFileSize(uint32_t numberOfFrames)
    {
        subChunk2Size = numberOfFrames * numChannels * bitsPerSample / 8;
        chunkSize = 36 + subChunk2Size;
    }
};

/// write an array of float data to a 16-bit, 44100 Hz mono wav file in the current working directory and then play it
/// @param audio audio samples, assumed to be at a 44100 Hz sampling rate
/// @param numberOfSamples total number of samples in audio
/// @param filename output filename; ".wav" is appended if missing
void playSound(float* audio,
               uint32_t numberOfSamples,
               const char* filename)
{
    std::ofstream fs;
    std::string filepath {filename};

    if (filepath.size() < 4 || filepath.substr(filepath.size() - 4, 4) != std::string(".wav"))
        filepath += std::string(".wav");

    fs.open(filepath, std::fstream::out | std::ios::binary);

    WaveHeader header {};
    header.setFileSize(numberOfSamples);

    fs.write(reinterpret_cast<const char*>(&header), sizeof(WaveHeader));

    int16_t* audioData = new int16_t[numberOfSamples];
    constexpr int max16BitValue = 32768;

    for (uint32_t i = 0; i < numberOfSamples; ++i)
    {
        int pcm = int(audio[i] * float(max16BitValue));

        if (pcm >= max16BitValue)
            pcm = max16BitValue - 1;
        else if (pcm < -max16BitValue)
            pcm = -max16BitValue;

        audioData[i] = int16_t(pcm);
    }

    fs.write(reinterpret_cast<const char*>(audioData), header.subChunk2Size);
    fs.close();
    delete[] audioData;

    std::cout << filename << " written to:\n" << filepath << std::endl;

#if defined _WIN32 || defined _WIN64
    // don't forget to add 'Winmm.lib' in Properties > Linker > Input > Additional Dependencies
    PlaySound(std::wstring(filepath.begin(), filepath.end()).c_str(), NULL, SND_FILENAME);
#else
    std::system((std::string("afplay ") + filepath).c_str());
#endif
}

main.cpp

Your main function could then be something like:

#include <iostream>
#include <cmath>
#include "audio.h"

int main(int argc, const char * argv[])
{
    const int numSamples = 44100;
    const float sampleRate = 44100.0f;
    const float frequency = 440.0f;
    float* sineWave = new float[numSamples];

    const float radsPerSamp = 2.0f * 3.1415926536f * frequency / sampleRate;

    for (int i = 0; i < numSamples; i++)
    {
        sineWave[i] = std::sin(radsPerSamp * (float) i);
    }

    playSound(sineWave, numSamples, "test.wav");

    delete[] sineWave;
    return 0;
}


