Detect Silence When Recording

Detect silence in audio recording

With this solution, inspired by Visualizations with Web Audio API, you can set the minimum required decibel level and detect whether anything was recorded at all.

const MIN_DECIBELS = -45;

navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const mediaRecorder = new MediaRecorder(stream);
    mediaRecorder.start();

    const audioChunks = [];
    mediaRecorder.addEventListener("dataavailable", event => {
      audioChunks.push(event.data);
    });

    // Route the same stream into an AnalyserNode so the levels can be inspected
    const audioContext = new AudioContext();
    const audioStreamSource = audioContext.createMediaStreamSource(stream);
    const analyser = audioContext.createAnalyser();
    analyser.minDecibels = MIN_DECIBELS;
    audioStreamSource.connect(analyser);

    const bufferLength = analyser.frequencyBinCount;
    const domainData = new Uint8Array(bufferLength);

    let soundDetected = false;

    const detectSound = () => {
      if (soundDetected) {
        return;
      }

      analyser.getByteFrequencyData(domainData);

      // Any non-zero bin means the signal exceeded MIN_DECIBELS
      for (let i = 0; i < bufferLength; i++) {
        if (domainData[i] > 0) {
          soundDetected = true;
          break;
        }
      }

      window.requestAnimationFrame(detectSound);
    };

    window.requestAnimationFrame(detectSound);

    mediaRecorder.addEventListener("stop", () => {
      const audioBlob = new Blob(audioChunks);
      const audioUrl = URL.createObjectURL(audioBlob);
      const audio = new Audio(audioUrl);
      audio.play();

      console.log({ soundDetected });
    });
  });

Detect silence when recording

How can I detect silence when a recording operation is started in Java?

Calculate the dB or RMS value for a group of sound frames and decide at what level it is considered to be 'silence'.
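
As a rough sketch of that idea, assuming 16-bit mono PCM samples already unpacked into a short[] and a hypothetical threshold of -50 dBFS (the class and threshold names are made up for illustration):

/** Minimal sketch: decide whether a frame of 16-bit PCM samples counts as silence. */
public class SilenceCheck {

    // Hypothetical threshold; tune it for your microphone and environment.
    private static final double SILENCE_THRESHOLD_DB = -50.0;

    /** Returns the RMS level of the frame in dBFS (0 dBFS = full scale). */
    public static double rmsDbfs(short[] frame) {
        if (frame.length == 0) {
            return Double.NEGATIVE_INFINITY;
        }
        double sumOfSquares = 0.0;
        for (short sample : frame) {
            double normalized = sample / 32768.0;   // scale to -1..1
            sumOfSquares += normalized * normalized;
        }
        double rms = Math.sqrt(sumOfSquares / frame.length);
        return 20.0 * Math.log10(rms + 1e-12);      // small offset avoids log(0)
    }

    public static boolean isSilence(short[] frame) {
        return rmsDbfs(frame) < SILENCE_THRESHOLD_DB;
    }
}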

What is PCM data?

Data that is in Pulse-code modulation format.

How can I calculate PCM data in Java?

I do not understand that question. But guessing it has something to do with the speech-recognition tag, I have some bad news. This might theoretically be done using the Java Speech API. But there are apparently no 'speech to text' implementations available for the API (only 'text to speech').


I have to calculate the RMS for a speech-recognition project, but I do not know how to calculate it in Java.

For a single channel represented by signal values as doubles ranging from -1 to 1, you might use this method.

/** Computes the RMS volume of a group of signal values ranging from -1 to 1. */
public double volumeRMS(double[] raw) {
    double sum = 0d;
    if (raw.length == 0) {
        return sum;
    }
    for (int ii = 0; ii < raw.length; ii++) {
        sum += raw[ii];
    }
    double average = sum / raw.length;

    // Sum of squared deviations from the mean
    double sumMeanSquare = 0d;
    for (int ii = 0; ii < raw.length; ii++) {
        sumMeanSquare += Math.pow(raw[ii] - average, 2d);
    }
    double averageMeanSquare = sumMeanSquare / raw.length;

    return Math.sqrt(averageMeanSquare);
}

There is a byte buffer that saves the input values from the line. What should I do with this buffer?

If using the volumeRMS(double[]) method, convert the byte values to an array of double values ranging from -1 to 1. ;)
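
For example, assuming the line delivers 16-bit signed little-endian PCM (a common TargetDataLine format), the conversion might look like this sketch; the method name is made up for illustration:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: unpack 16-bit signed little-endian PCM bytes into doubles ranging from -1 to 1. */
public static double[] toDoubleSamples(byte[] buffer, int bytesRead) {
    ByteBuffer bb = ByteBuffer.wrap(buffer, 0, bytesRead).order(ByteOrder.LITTLE_ENDIAN);
    double[] samples = new double[bytesRead / 2];
    for (int i = 0; i < samples.length; i++) {
        samples[i] = bb.getShort() / 32768.0;   // scale the 16-bit range to -1..1
    }
    return samples;
}

The result can then be passed straight to volumeRMS(double[]).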

Can I use MediaRecorder to record audio with silence detection in Android?

No, in order to detect silence you need to inspect the audio data.

MediaRecorder just writes the audio to a file as can be seen in MediaRecorder's state machine diagram.

The solution would be to use AudioRecord and analyze the data before writing the bytes to a file. Since you want to resume recording, you'll need to keep the mic open and process the incoming audio to detect when the user is speaking.

The subject of analyzing digital audio data is known as signal processing, and there are a number of resources on it:

Simplest way of detecting where audio envelopes start and stop

What dBFS threshold should I set for differentiation between silence and NOT silence

But for a first attempt, I would use AudioRecord in Android with an FFT and assume a quiet room. You'll have to come up with an appropriate trigger level (see the links above).

Be aware that processing does add some lag and you'll have to compensate: buffer 0 contains silence, in buffer 1 the user starts to speak but doesn't trigger the threshold, and in buffer 2 the user is speaking and the threshold is triggered. You may want to save buffer 1 in addition to buffer 2.
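
As a minimal sketch of that approach, assuming 16 kHz mono 16-bit PCM, an arbitrary RMS trigger level, the RECORD_AUDIO permission already granted, and hypothetical isRecording() and writeToFile() helpers standing in for your own stop flag and file writing:

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

// Sketch: read raw PCM with AudioRecord and only keep buffers above an RMS threshold.
void captureWithSilenceDetection() {
    final int sampleRate = 16000;
    final double threshold = 0.02;   // hypothetical trigger level; tune for your mic and room
    int minBufBytes = AudioRecord.getMinBufferSize(sampleRate,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
    AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBufBytes * 2);

    short[] current = new short[minBufBytes / 2];
    short[] previous = new short[minBufBytes / 2];   // pre-roll: speech may start mid-buffer

    record.startRecording();
    while (isRecording()) {
        int read = record.read(current, 0, current.length);

        // RMS of this buffer, with samples scaled to -1..1
        double sumOfSquares = 0;
        for (int i = 0; i < read; i++) {
            double s = current[i] / 32768.0;
            sumOfSquares += s * s;
        }
        double rms = Math.sqrt(sumOfSquares / Math.max(read, 1));

        if (rms > threshold) {
            writeToFile(previous);   // keep the previous buffer too, to compensate for lag
            writeToFile(current);
        }

        // swap buffers so 'previous' always holds the last chunk read
        short[] tmp = previous; previous = current; current = tmp;
    }
    record.stop();
    record.release();
}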

Find silence in AVAudioRecorder Session

Here I have created a function that detects silence for 5 seconds; when that condition is satisfied, you can stop the recording at that point.

I used a recording Manager NSObject class, so take the code from the function below and adapt it to your own setup.

Code

// Start a new recording and stop it after 5 seconds of silence
func newSessionIfSilence() {

    // Audio file URL to record into
    let AudioFileName = getDocumentsDirectory().appendingPathComponent("\(getUniqueName()).wav")

    // Accumulates how long the silence has lasted (in seconds)
    var statusForDetection = Float()

    // Recorder settings
    let settings: [String: Any] = [
        AVFormatIDKey: Int(kAudioFormatLinearPCM),
        AVSampleRateKey: 16000,
        AVNumberOfChannelsKey: 1,
        AVLinearPCMBitDepthKey: 16,
        AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsFloatKey: false,
    ]

    do {
        // Start recording into the audio file
        Manager.recorder = try AVAudioRecorder(url: AudioFileName, settings: settings)
        Manager.recorder?.delegate = self
        Manager.recorder?.isMeteringEnabled = true
        Manager.recorder?.prepareToRecord()
        Manager.recorder?.record()

        // Poll the metering values every 0.1 seconds
        Manager.meterTimer = Timer.scheduledTimer(withTimeInterval: 0.10, repeats: true, block: { (timer: Timer) in

            // The recorder is kept on the Manager class
            if let recorder = Manager.recorder {

                // Refresh the metering values so we can track loudness
                recorder.updateMeters()

                // Average and peak power for channel 0 (in dB)
                Manager.recorderApc0 = recorder.averagePower(forChannel: 0)
                Manager.recorderPeak0 = recorder.peakPower(forChannel: 0)

                // Convert the peak power to a 0-1 scale, where 0 is complete quiet and 1 is full volume
                let ALPHA: Double = 0.05
                let peakPowerForChannel = pow(Double(10), (0.05 * Double(Manager.recorderPeak0)))

                // Low-pass filter the level to smooth out momentary spikes
                RecordingManager.lowPassResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * RecordingManager.lowPassResults

                if RecordingManager.lowPassResults > 0 {
                    print("Mic blow detected")
                    // Do whatever you need to do here

                    // Sound was detected, so reset the silence counter
                    statusForDetection = 0.0
                } else {
                    // No sound detected: the timer fires every 0.1 s, so add 0.1 to the silence counter
                    statusForDetection += 0.1

                    // Silence has lasted more than 5 seconds
                    if statusForDetection > 5.0 {
                        // Reset the counter and stop the recording
                        statusForDetection = 0.0
                        recorder.stop()
                    }
                }
            }
        })

    } catch {
        // Finish recording with an error
        print("Error Handling: \(error.localizedDescription)")
        self.finishRecording(success: false)
    }
}

Detect & Record Audio in Python - trim beginning silence

The issue was due to a quick, inaudible flash of sound when the mic was started. Adding del r[0:8000] before r = normalize(r) eliminated the first moments of audio containing the inaudible sound data, solving my issue.


