Get a Spectrum of Frequencies from Wav/Riff Using Linux Command Line

How do I plot the spectrum of a wav file using FFT?

The signature is

public static void FFT( float[] data, int length, FourierDirection direction )

You pass an array of complex numbers, represented as (real, imaginary) pairs. Since you only have real numbers (the samples), put your samples in the even locations in the array: data[0], data[2], data[4] and so on. The odd locations should be 0: data[1] = data[3] = ... = 0.
The length is the number of samples you want to calculate your FFT on; it should be exactly half of the length of the data array. You can FFT your entire WAV or parts of it, depending on what you wish to see. Audacity plots the power spectrum of the selected part of the file; to do the same, pass the entire WAV or the selected part.
FFT will only show you frequencies up to half of your sampling rate, so you should get values between 0 and half your sampling rate. The number of values depends on the number of samples you have (more samples give a finer frequency resolution).
Audacity plots the power spectrum. Take each complex number pair in the array you receive and calculate its ABS, defined as sqrt(r^2+i^2). Each ABS value corresponds to a single frequency.
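The layout and the ABS step described above can be sketched in a few lines of Ruby. This is only an illustration of the bookkeeping, not the library's code: the helper names interleave_real, magnitudes and bin_frequencies are made up for this example, and the bin-to-frequency mapping assumes the usual convention that bin k of an n-point transform sits at k * sample_rate / n.

```ruby
# Pack real samples into the interleaved [re, im, re, im, ...] layout:
# samples go to the even indices, zeros to the odd ones.
def interleave_real(samples)
  samples.flat_map { |s| [s.to_f, 0.0] }
end

# ABS = sqrt(r^2 + i^2) for each complex pair of an interleaved array.
def magnitudes(interleaved)
  interleaved.each_slice(2).map { |re, im| Math.hypot(re, im) }
end

# Frequency in Hz of bins 0..n/2 (0 Hz up to half the sampling rate).
def bin_frequencies(n, sample_rate)
  (0..n / 2).map { |k| k * sample_rate.to_f / n }
end

p interleave_real([1, 2])            # => [1.0, 0.0, 2.0, 0.0]
p magnitudes([3.0, 4.0, 0.0, 0.0])   # => [5.0, 0.0]
p bin_frequencies(8, 8000)           # => [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
```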

Here's an example of working code:

float[] data = new float[8];
data[0] = 1; data[2] = 1; data[4] = 1; data[6] = 1;
Fourier.FFT(data, data.Length/2, FourierDirection.Forward);

I'm giving it 4 samples, all the same. So I expect to get something only at frequency 0. And indeed, after running it, I get

data[0] == 4

And others are 0.
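As a cross-check that does not depend on any FFT library, a naive DFT written directly from the definition gives the same picture: four equal samples put everything into bin 0. The sketch below assumes the unnormalized forward convention (no division by the number of samples), so the bin-0 value is simply the sum of the samples; naive_dft is a name invented for this example.

```ruby
# Naive O(n^2) discrete Fourier transform, unnormalized forward convention.
def naive_dft(samples)
  n = samples.length
  (0...n).map do |k|
    samples.each_with_index
           .map { |x, t| x * Complex.polar(1.0, -2.0 * Math::PI * k * t / n) }
           .sum(Complex(0, 0))
  end
end

spectrum = naive_dft([1.0, 1.0, 1.0, 1.0])
# Round away floating-point noise before printing the magnitudes.
p spectrum.map { |c| c.abs.round(6) }   # => [4.0, 0.0, 0.0, 0.0]
```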

If I want to use the Complex array overload:

Complex[] data2 = new Complex[4];
data2[0] = new Complex(1,0);
data2[1] = new Complex(1, 0);
data2[2] = new Complex(1, 0);
data2[3] = new Complex(1, 0);
Fourier.FFT(data2,data2.Length,FourierDirection.Forward);

Please note that here the second parameter equals the length of the array, since each array member is a complex number. I get the same result as before.

I think I missed the complex overload before. It seems less error-prone and more natural to use, unless your data already comes in pairs.

sox convert to spectrogram parameters meaning

The official sox manual describes the parameters in full, and the source code is in spectrogram.c.

But briefly:

-X num:

X-axis pixels/second; the default is auto-calculated to fit the given
or known audio duration to the X-axis size, or 100 otherwise. If given
in conjunction with -d, this option affects the width of the
spectrogram; otherwise, it affects the duration of the spectrogram.
num can be from 1 (low time resolution) to 5000 (high time resolution)
and need not be an integer.

and

-Y num:

Sets the target total height of the spectrogram(s). The default value is 550
pixels. Using this option (and by default), SoX will
choose a height for individual spectrogram channels that is one more
than a power of two, so the actual total height may fall short of the
given number.

For -X 50, the horizontal time resolution is:

dt = 1000/50 = 20 ms/pixel

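For -Y 200, the largest power of 2 less than 200 is 128, so a channel gets 129 rows spanning 0 Hz up to half the sampling rate. Assuming a sampling rate of 44.1 kHz (so the rows span 0 to 22050 Hz), the frequency resolution is:

bin_size = 22050/128 = 172.3 Hz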
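The same arithmetic can be written as two small helpers for trying other -X and -Y values. This is a sketch of the rule as described in the manual excerpt, not SoX's actual selection code; the function names are hypothetical, and the multi-channel case is ignored.

```ruby
# Milliseconds of audio covered by one horizontal pixel for a given -X value.
def ms_per_pixel(pixels_per_second)
  1000.0 / pixels_per_second
end

# Largest power of two strictly below the target height given to -Y.
def largest_power_of_two_below(target)
  power = 1
  power *= 2 while power * 2 < target
  power
end

p ms_per_pixel(50)                 # => 20.0
p largest_power_of_two_below(200)  # => 128
```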

Any way I can get SoX to just print the amplitude values from a wav file?

If you want the data specifically for use in C++, it's very easy to use something like libsndfile. It's a pretty mature C library that also comes with a convenient C++ wrapper (sndfile.hh).

Here's example usage lifted from something I wrote recently where I needed easy access to audio data.

#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
#include <sndfile.hh>

int main()
{
    std::string infile_name = "/path/to/vocal2.wav";

    // Open input file.
    SndfileHandle infile_handle( infile_name );
    if( !infile_handle || infile_handle.error() != 0 )
    {
        std::cerr << "Unable to read " << infile_name << std::endl;
        std::cerr << infile_handle.strError() << std::endl;
        return 1;
    }

    // Show file stats
    int64_t in_frames = infile_handle.frames();
    int in_channels = infile_handle.channels();
    int in_samplerate = infile_handle.samplerate();
    std::cerr << "Input file: " << infile_name << std::endl;
    std::cerr << " * Frames      : " << std::setw(6) << in_frames << std::endl;
    std::cerr << " * Channels    : " << std::setw(6) << in_channels << std::endl;
    std::cerr << " * Sample Rate : " << std::setw(6) << in_samplerate << std::endl;

    // Read audio data as float (channels are interleaved: frames * channels values)
    std::vector<float> in_data( in_frames * in_channels );
    infile_handle.read( in_data.data(), in_data.size() );

    return 0;
}

If you just want to use SoX on the command line and get text output, you can do something like this:

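sox vocal2.wav -t f32 - | od -v -An -t f4 | more

Here I've specified an output of raw 32-bit float, and run it through GNU od, with -t f4 telling od to read the stream as 4-byte floats. By default od decides how many values go on each line; with GNU od you can set the number of input bytes per output line with -w (for example, -w4 prints one sample per line), or you can clean that up with other simple tools. Have a look at the manpage for od if you want different sample encodings.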
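If the text output is only a stepping stone to a script, the raw float stream can also be decoded directly. A minimal Ruby sketch, assuming the stream is in the machine's native byte order (the same interpretation od applies by default); decode_f32 is a name made up for this example, and the packed string below merely stands in for real SoX output.

```ruby
# Decode a binary string of native-endian 32-bit floats into Ruby Floats.
def decode_f32(bytes)
  bytes.unpack("f*")
end

# Stand-in for SoX output: pack a few values, then decode them again.
raw = [0.5, -0.25, 1.0].pack("f*")
p decode_f32(raw)   # => [0.5, -0.25, 1.0]

# In a pipeline, one would read standard input in binary mode instead:
#   samples = decode_f32($stdin.binmode.read)
```

Saved as a script (say decode.rb, a hypothetical name) with the stdin line enabled, it would sit at the end of the pipeline in place of od.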

How do I get an audio file sample rate using sox?

Just use:

soxi <filename>

or

sox --i <filename>

to produce output such as:

Input File     : 'final.flac'
Channels       : 4
Sample Rate    : 44100
Precision      : 16-bit
Duration       : 00:00:11.48 = 506179 samples = 860.849 CDDA sectors
File Size      : 2.44M
Bit Rate       : 1.70M
Sample Encoding: 16-bit FLAC
Comment        : 'Comment=Processed by SoX'

The latter is for the case where you're using the win32 version, which doesn't include soxi by default. To grab the sample rate only, just use:

soxi -r <filename>

or

sox --i -r <filename>

which will return the sample rate alone.
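When the full report has already been captured, the fields can be pulled out of the text as well. The sketch below parses a shortened copy of the sample output shown earlier; parse_soxi is a hypothetical helper, and scraping human-readable output is brittle, so the -r form is the safer choice when only the sample rate is needed.

```ruby
# Turn "Key : value" lines into a Hash, splitting on the first colon only
# so that values containing colons (such as the duration) stay intact.
def parse_soxi(text)
  text.each_line.each_with_object({}) do |line, info|
    key, value = line.split(":", 2)
    next if value.nil?
    info[key.strip] = value.strip
  end
end

sample_output = <<~TEXT
  Input File     : 'final.flac'
  Channels       : 4
  Sample Rate    : 44100
  Duration       : 00:00:11.48 = 506179 samples = 860.849 CDDA sectors
TEXT

info = parse_soxi(sample_output)
rate = info["Sample Rate"].to_i
samples = info["Duration"][/(\d+) samples/, 1].to_i
duration = (samples / rate.to_f).round(2)
p rate       # => 44100
p duration   # => 11.48
```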

Extract Fast Fourier Transform data from file

Here's the final solution to what I was trying to achieve, thanks a lot to Randall Cook's helpful advice. Here is the code to extract the sound wave and FFT of a wav file in Ruby:

require "ruby-audio"
require "fftw3"

fname = ARGV[0]
window_size = 1024
wave = Array.new
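# Use the block form so every bin gets its own array;
# Array.new(n, []) would make all bins share a single array object.
fft = Array.new(window_size/2) { [] }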

begin
    buf = RubyAudio::Buffer.float(window_size)
    RubyAudio::Sound.open(fname) do |snd|
        while snd.read(buf) != 0
            wave.concat(buf.to_a)
            na = NArray.to_na(buf.to_a)
            fft_slice = FFTW3.fft(na).to_a[0, window_size/2]
            fft_slice.each_with_index { |x, j| fft[j] << x }
        end
    end

rescue => err
    # Report on stderr; no logger is set up in this script
    warn "error reading audio file: #{err.message}"
    exit 1
end

# now I can work on analyzing the "fft" and "wave" arrays...
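As one possible next step, the per-bin lists of complex values can be reduced to an average magnitude per bin, and the loudest bin mapped back to a frequency. This is a sketch on synthetic data shaped like the "fft" array above (one list of complex values per bin), not part of the original solution; the helper names, the window size of 8 and the 44100 Hz sample rate are assumptions chosen to keep the example small.

```ruby
# Average magnitude of each frequency bin across all analysis windows.
def average_magnitudes(fft)
  fft.map { |bin| bin.sum(0.0) { |c| c.abs } / bin.length }
end

# Frequency in Hz at the centre of bin j.
def bin_frequency(j, sample_rate, window_size)
  j * sample_rate.to_f / window_size
end

# Synthetic stand-in: window_size 8 gives 4 bins, 2 windows each,
# with bin 2 clearly the loudest.
fft = [
  [Complex(0.1, 0.0), Complex(0.2, 0.0)],
  [Complex(0.0, 0.5), Complex(0.5, 0.0)],
  [Complex(3.0, 4.0), Complex(0.0, 5.0)],
  [Complex(0.3, 0.0), Complex(0.0, 0.1)]
]

mags = average_magnitudes(fft)
peak = mags.each_with_index.max[1]   # index of the largest average magnitude
p peak                               # => 2
p bin_frequency(peak, 44_100, 8)     # => 11025.0
```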