How to Create a Waveform Image of an Mp3 in Linux

Are there any libraries that can create a waveform image out of an audio file (mp3)?

I found this library, and it works well.

Generate visual (waveform) from MP3/WAV file in Windows 2008 Server?

Sox, "the Swiss Army knife of audio manipulation", can generate accurate PNG spectrograms from sound files. It plays pretty much anything, and binaries are available for Windows. At the most basic level, you'd use something like this:

sox my.wav -n spectrogram

If you want a raw spectrogram (no axes, titles, or legends) on a light background with a target total height of 130px:

sox "Me, London.mp3" -n spectrogram -Y 130 -l -r -o "Me, London.png"

Sox accepts a lot of options, for example if you only want to analyze a single channel. If you need your visuals to be even cooler, you could post-process the resulting PNG.
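To run the same command over several files, a small wrapper is enough. This is only a sketch: the function names png_name and spectrograms are made up for illustration, and it assumes sox is on the PATH and was built with support for the input format (MP3 support is optional in some builds).

```shell
# Derive the PNG name for an audio file by swapping its extension.
png_name() {
    printf '%s\n' "${1%.*}.png"
}

# Render one raw, light-background spectrogram per file given as an argument.
# Arguments that are not regular files are skipped.
spectrograms() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        sox "$f" -n spectrogram -Y 130 -l -r -o "$(png_name "$f")"
    done
}
```

Call it as spectrograms *.mp3; each PNG is written next to its source file.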

Here is a short overview of all available parameters, as printed on the command line. The manpage has more details:

-x num   X-axis size in pixels; default derived or 800
-X num   X-axis pixels/second; default derived or 100
-y num   Y-axis size in pixels (per channel); slow if not 1 + 2^n
-Y num   Y-height total (i.e. not per channel); default 550
-z num   Z-axis range in dB; default 120
-Z num   Z-axis maximum in dBFS; default 0
-q num   Z-axis quantisation (0 - 249); default 249
-w name  Window: Hann (default), Hamming, Bartlett, Rectangular, Kaiser
-W num   Window adjust parameter (-10 - 10); applies only to Kaiser
-s       Slack overlap of windows
-a       Suppress axis lines
-r       Raw spectrogram; no axes or legends
-l       Light background
-m       Monochrome
-h       High colour
-p num   Permute colours (1 - 6); default 1
-A       Alternative, inferior, fixed colour-set (for compatibility only)
-t text  Title text
-c text  Comment text
-o text  Output file name; default `spectrogram.png'
-d time  Audio duration to fit to X-axis; e.g. 1:00, 48
-S time  Start the spectrogram at the given time through the input
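The "slow if not 1 + 2^n" note on -y means per-channel heights such as 65, 129, 257 or 513 are the efficient ones. As a rough sketch (fast_height is a hypothetical helper, not part of sox), the largest such height that fits a target can be computed like this:

```shell
# Print the largest value of the form 2^n + 1 that does not exceed the
# target height (never less than 3). Intended as an argument for sox's -y.
fast_height() {
    target=$1
    h=2
    while [ $((h * 2 + 1)) -le "$target" ]; do
        h=$((h * 2))
    done
    echo $((h + 1))
}
```

For example, sox my.wav -n spectrogram -y "$(fast_height 200)" asks for 129 rows per channel.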

Generating a waveform using ffmpeg

Default waveform

ffmpeg -i input.wav -filter_complex showwavespic -frames:v 1 output.png


  • Notice the segment of silent audio in the middle (see "Fancy waveform" below if you want to see how to add a line).

  • The background is transparent.

  • Default colors are red (left channel) and green (right channel) for a stereo input. The color is mixed where the channels overlap.

  • You can change the channel colors with the colors option, such as "showwavespic=colors=blue|yellow". See a list of valid color names or use hexadecimal notation, such as #ffcc99.

  • See the showwavespic filter documentation for additional options.

  • If you want a video instead of an image use the showwaves filter.
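The size and colors options mentioned above can be combined in one filter string. The wrapper below is a sketch under assumptions: wave_filter and waveform_png are illustrative names, and ffmpeg is assumed to be on the PATH.

```shell
# Build a showwavespic filter string from a size (WxH) and a colors value.
wave_filter() {
    printf 'showwavespic=s=%s:colors=%s\n' "$1" "$2"
}

# Usage: waveform_png INPUT OUTPUT SIZE COLORS
waveform_png() {
    ffmpeg -i "$1" -filter_complex "$(wave_filter "$3" "$4")" -frames:v 1 "$2"
}
```

For example: waveform_png input.wav output.png 640x120 'blue|yellow'. Quote the colors argument so the shell does not treat | as a pipe or # as a comment.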

Fancy waveform

ffmpeg -i input.mp4 -filter_complex \
"[0:a]aformat=channel_layouts=mono, \
compand=gain=-6, \
showwavespic=s=600x120:colors=#9cf42f[fg]; \
color=s=600x120:color=#44582c, \
drawgrid=width=iw/10:height=ih/5:color=#9cf42f@0.1[bg]; \
[bg][fg]overlay=format=auto,drawbox=x=(iw-w)/2:y=(ih-h)/2:w=iw:h=1:color=#9cf42f" \
-frames:v 1 output.png

Explanation of options

  1. aformat downmixes the audio to mono. Otherwise, by default, a stereo input would result in a waveform with a different color for each channel (see Default waveform example above).

  2. compand modifies the dynamic range of the audio to make the waveform look less flat. It makes a less accurate representation of the actual audio, but can be more visually appealing for some inputs.

  3. showwavespic makes the actual waveform.

  4. color source filter is used to make a colored background that is the same size as the waveform.

  5. drawgrid adds a grid over the background. The grid does not represent anything, but is just for looks. The grid color is the same as the waveform color (#9cf42f), but opacity is set to 10% (@0.1).

  6. overlay will place [bg] (the label I gave to the output of the background filterchain) behind [fg] (the waveform).

  7. Finally, drawbox will make the horizontal line so any silent areas are not blank.

Gradient example

Using the gradients filter:

ffmpeg -i input.mp3 -filter_complex "gradients=s=1920x1080:c0=000000:c1=434343:x0=0:x1=0:y0=0:y1=1080,drawbox=x=(iw-w)/2:y=(ih-h)/2:w=iw:h=1:color=#0000ff[bg];[0:a]aformat=channel_layouts=mono,showwavespic=s=1920x1080:colors=#0068ff[fg];[bg][fg]overlay=format=auto" -frames:v 1 output.png

Color background

ffmpeg -i input.opus -filter_complex "color=c=blue[color];aformat=channel_layouts=mono,showwavespic=s=1280x720:colors=white[wave];[color][wave]scale2ref[bg][fg];[bg][fg]overlay=format=auto" -frames:v 1 output.png

The scale2ref filter automatically makes the background the same size as the waveform.

Image background

Of course you can use an image or video instead for the background:

ffmpeg -i audio.flac -i background.jpg -filter_complex \
"[1:v]scale=600:-1,crop=iw:120[bg]; \
[0:a]showwavespic=s=600x120:colors=cyan|aqua[fg]; \
[bg][fg]overlay=format=auto" \
-q:v 3 showwavespic_bg.jpg

Getting waveform stats and data

Use the astats filter. Many stats are available: RMS, peak, min, max, difference, etc.

RMS level per audio frame

Example to get standard RMS level measured in dBFS per audio frame:

ffprobe -v error -f lavfi -i "amovie=input.wav,astats=metadata=1:reset=1" -show_entries frame_tags=lavfi.astats.Overall.RMS_level -of csv=p=0 > rms.log
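Once the values are in rms.log, ordinary text tools can summarize them. The following is a minimal sketch, assuming the log holds one RMS value in dBFS per line and that silent frames are reported as -inf; loudest_rms is a hypothetical helper name.

```shell
# Print the highest (loudest) RMS value found in the given file(s) or on
# stdin, skipping empty lines and "-inf" entries from silent frames.
loudest_rms() {
    awk 'NF && $1 != "-inf" {
             v = $1 + 0
             if (!seen || v > max) { max = v; seen = 1 }
         }
         END { if (seen) print max }' "$@"
}
```

Run it as loudest_rms rms.log. Note that awk prints the result with its default number formatting, so long fractions may be shortened.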

Peak level per second

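Add the asetnsamples filter so that each audio frame holds one second of samples. The value 44100 assumes a 44.1 kHz sample rate; adjust it to match your input.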

ffprobe -v error -f lavfi -i "amovie=input.wav,asetnsamples=44100,astats=metadata=1:reset=1" -show_entries frame_tags=lavfi.astats.Overall.Peak_level -of csv=p=0

Same as above but with timestamps

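ffprobe -v error -f lavfi -i "amovie=input.wav,asetnsamples=44100,astats=metadata=1:reset=1" -show_entries frame=pkt_pts_time:frame_tags=lavfi.astats.Overall.Peak_level -of csv=p=0

Newer FFmpeg releases may no longer provide pkt_pts_time. If the timestamp column comes out empty, try frame=pts_time instead.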
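With timestamps included, the output can be filtered to find where the audio gets loud. This sketch assumes each line has the form timestamp,peak (for example 1.000000,-3.2) and that silent frames show up as -inf; frames_above is a hypothetical helper name.

```shell
# Usage: frames_above THRESHOLD_DB [FILE...]
# Print the timestamp of every frame whose peak level exceeds the threshold.
frames_above() {
    threshold=$1
    shift
    awk -F, -v t="$threshold" \
        'NF >= 2 && $2 != "-inf" && $2 + 0 > t + 0 { print $1 }' "$@"
}
```

For example, pipe the ffprobe command above into frames_above -6 to list the seconds that peak above -6 dBFS.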

Output to file

Just append > output.log to the end of your command:

ffprobe -v error -f lavfi -i "amovie=input.wav,asetnsamples=44100,astats=metadata=1:reset=1" -show_entries frame_tags=lavfi.astats.Overall.RMS_level -of csv=p=0 > output.log

For JSON instead of CSV, change the output format to -of json:

ffprobe -v error -f lavfi -i "amovie=input.wav,asetnsamples=44100,astats=metadata=1:reset=1" -show_entries frame_tags=lavfi.astats.Overall.RMS_level -of json > output.json

Generate .WAV sound frequency?

A friend wrote this one:

You need Linux (I have used it successfully on CentOS and Ubuntu).

If I remember correctly, that was enough. It generates a .png from an .mp3 file using libmad. The code is quite simple to understand; as always, feel free to submit improvements!

It generates a waveform pretty close to what you can find on SoundCloud, for example.

