Detect & Record Audio in Python

Detect & Record Audio in Python - trim beginning silence

The issue was due to a quick, inaudible flash of sound when the mic was started. Adding del r[0:8000] before r = normalize(r) eliminated the first moments of audio containing the inaudible sound data, solving my issue.

pyaudio - Listen until voice is detected and then record to a .wav file

Look here:

https://github.com/jeysonmc/python-google-speech-scripts/blob/master/stt_google.py

It even converts Wav to flac and sends it to the google Speech api , just delete the stt_google_wav function if you dont need it ;)

How to record audio each time user presses a key?

As to the primary concern of your computer seeming to work terribly hard, that's because you use a while loop to constantly check for when the record key is released. Within this loop, the computer will loop around as fast as it can without ever taking a break.

A better solution is to use event driven programming where you let the OS inform you of events periodically, and check if you want to do anything when they happen. This may sound complicated, but fortunately pynput does most of the hard work for you.

If you keep track of the state of the recording or playback, it is also fairly simple to start a new recording the next time a control key down event happens without needing the "hack" of calling an entire new process recursively for each new recording. The event loop inside the keyboard listener will keep on going until one of the callback functions returns False or raises self.stopException().

I have created a simple listener class similar to your initial attempt that calls on a recorder or player instance (which I'll get to later) to start and stop. I also have to agree with Anwarvic that <ctl-c> is supposed to be reserved as an emergency way of stopping a script, so I have changed the stop command to the letter q.

class listener(keyboard.Listener):
    def __init__(self, recorder, player):
        super().__init__(on_press = self.on_press, on_release = self.on_release)
        self.recorder = recorder
        self.player = player
    
    def on_press(self, key):
        if key is None: #unknown event
            pass
        elif isinstance(key, keyboard.Key): #special key event
            if key.ctrl and self.player.playing == 0:
                self.recorder.start()
        elif isinstance(key, keyboard.KeyCode): #alphanumeric key event
            if key.char == 'q': #press q to quit
                if self.recorder.recording:
                    self.recorder.stop()
                return False #this is how you stop the listener thread
            if key.char == 'p' and not self.recorder.recording:
                self.player.start()
                
    def on_release(self, key):
        if key is None: #unknown event
            pass
        elif isinstance(key, keyboard.Key): #special key event
            if key.ctrl:
                self.recorder.stop()
        elif isinstance(key, keyboard.KeyCode): #alphanumeric key event
            pass

if __name__ == '__main__':
    r = recorder("mic.wav")
    p = player("mic.wav")
    l = listener(r, p)
    print('hold ctrl to record, press p to playback, press q to quit')
    l.start() #keyboard listener is a thread so we start it here
    l.join() #wait for the tread to terminate so the program doesn't instantly close

With that structure, we then need a recorder class with a start and stop function which will not block (asynchronous) the listener thread from continuing to receive key events. The documentation for PyAudio gives a pretty good example for asynchronous output, so I simply applied it to an input. A little bit of re-arranging, and a flag to let our listener know when we're recording later, and we have a recorder class:

class recorder:
    def __init__(self, 
                 wavfile, 
                 chunksize=8192, 
                 dataformat=pyaudio.paInt16, 
                 channels=2, 
                 rate=44100):
        self.filename = wavfile
        self.chunksize = chunksize
        self.dataformat = dataformat
        self.channels = channels
        self.rate = rate
        self.recording = False
        self.pa = pyaudio.PyAudio()

    def start(self):
        #we call start and stop from the keyboard listener, so we use the asynchronous 
        # version of pyaudio streaming. The keyboard listener must regain control to 
        # begin listening again for the key release.
        if not self.recording:
            self.wf = wave.open(self.filename, 'wb')
            self.wf.setnchannels(self.channels)
            self.wf.setsampwidth(self.pa.get_sample_size(self.dataformat))
            self.wf.setframerate(self.rate)
            
            def callback(in_data, frame_count, time_info, status):
                #file write should be able to keep up with audio data stream (about 1378 Kbps)
                self.wf.writeframes(in_data) 
                return (in_data, pyaudio.paContinue)
            
            self.stream = self.pa.open(format = self.dataformat,
                                       channels = self.channels,
                                       rate = self.rate,
                                       input = True,
                                       stream_callback = callback)
            self.stream.start_stream()
            self.recording = True
            print('recording started')
    
    def stop(self):
        if self.recording:         
            self.stream.stop_stream()
            self.stream.close()
            self.wf.close()
            
            self.recording = False
            print('recording finished')

Finally, we create an audio player for audio playback when you press p. I threw the PyAudio example into a thread which is created everytime you press the button so that multiple players could be created which overlap eachother. We also keep track of how many players are playing so we don't try to record while the file is already in use by a player. (I also have included my imports at the top)

from threading import Thread, Lock
from pynput import keyboard
import pyaudio
import wave

class player:
    def __init__(self, wavfile):
        self.wavfile = wavfile
        self.playing = 0 #flag so we don't try to record while the wav file is in use
        self.lock = Lock() #muutex so incrementing and decrementing self.playing is safe
    
    #contents of the run function are processed in another thread so we use the blocking
    # version of pyaudio play file example: http://people.csail.mit.edu/hubert/pyaudio/#play-wave-example
    def run(self):
        with self.lock:
            self.playing += 1
        with wave.open(self.wavfile, 'rb') as wf:
            p = pyaudio.PyAudio()
            stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                            channels=wf.getnchannels(),
                            rate=wf.getframerate(),
                            output=True)
            data = wf.readframes(8192)
            while data != b'':
                stream.write(data)
                data = wf.readframes(8192)

            stream.stop_stream()
            stream.close()
            p.terminate()
            wf.close()
        with self.lock:
            self.playing -= 1
        
    def start(self):
        Thread(target=self.run).start()

I can't guarantee this is perfectly free of bugs, but if you have any questions on how it works / how to get it working, feel free to comment.

Detect & Record Audio in Python