Google Speech Recognition Timeout

Google speech API timeout time

By default there is a system timeout of 10 minutes.
This is a known issue for other Google Cloud services as well, but the fix suggested there did not work for me; I assume something else has to be set when you run your code and open your connection.

Anyway, there is a workaround: you get the long-running operation name and then stop your program. The operation will continue on the Google server, and you can fetch the result later.

As written in the docs:

Asynchronous Speech Recognition starts a long running audio processing operation.

I will refer to the Node.js sample here; similar concepts apply to the other clients.
So, when you get your response (do not use the promise version), pass it a callback, as explained here, and instead of

operation
  .on('error', function(err) {})
  .on('complete', function(transcript) {
    // transcript = "how old is the Brooklyn Bridge"
  });

just do something like

console.log(operation)

Take note of the operation name, and later on use the operations method to fetch the result.

You can test these calls on the Google OAuth playground.
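For illustration, here is a minimal sketch of that two-step workflow with the current @google-cloud/speech client. The checkLongRunningRecognizeProgress call, the latestResponse.name property and the gs:// URI are assumptions on my part (based on the generated client), so verify them against your library version:

// Sketch only: start a long-running recognition, print the operation name,
// and exit; on a later run, pass that name back in to resume the operation
// and read the transcript.
// Assumes @google-cloud/speech v4+; checkLongRunningRecognizeProgress and
// latestResponse.name should be checked against your client version.
import speech from '@google-cloud/speech';

const client = new speech.SpeechClient();
const savedName = process.argv[2]; // operation name from a previous run, if any

if (!savedName) {
  // First run: start the operation and note its name (the gs:// URI is hypothetical).
  const [operation] = await client.longRunningRecognize({
    config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'en-US' },
    audio: { uri: 'gs://my-bucket/sample.raw' }
  });
  console.log(`Operation name: ${operation.latestResponse.name}`);
  // The program can now exit; recognition keeps running on Google's side.
} else {
  // Later run: resume the operation by name and wait for the transcript.
  const operation = await client.checkLongRunningRecognizeProgress(savedName);
  const [response] = await operation.promise(); // resolves once processing is done
  for (const result of response.results) {
    console.log(result.alternatives[0].transcript);
  }
}

The same operation can also be polled over REST, which is what the OAuth playground lets you try: a GET on https://speech.googleapis.com/v1/operations/OPERATION_NAME returns the operation status and, once it is done, the transcription result.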

Google Speech Recognition timeout

EDIT - This has apparently been fixed in the upcoming August 2016 release. You can test the beta to confirm.

This is a bug with the release of Google 'Now' V6.0.23.* and persists in the latest V6.1.28.*

Since the release of V5.11.34.* Google's implementation of the SpeechRecognizer has been plagued with bugs.

You can use this gist to replicate many of them.

You can use this BugRecognitionListener to work around some of them.

I have reported these directly to the Now team, so they are aware, but as yet, nothing has been fixed. There is no external bug tracker for Google Now, as it's not part of AOSP, so there is nothing you can star, I'm afraid.

The most recent bug you detail pretty much makes their implementation unusable; as you correctly point out, the parameters to control the speech input timings are ignored. Which, according to the documentation:

Additionally, depending on the recognizer implementation, these values
may have no effect.

is something we should expect...

The recognition will continue indefinitely if you don't speak or make any detectable sound.

I'm currently creating a project to replicate this new bug and all of the others, which I'll forward on and link here shortly.

EDIT - I was hoping I could create a workaround that used the detection of partial or unstable results as the trigger to know that the user was still speaking. Once they stopped, I could manually call recognizer.stopListening() after a set period of time.

Unfortunately, stopListening() is broken too and doesn't actually stop the recognition, so there is no workaround for this.

Attempts to work around this by destroying the recognizer and relying only on the partial results received up to that point (onResults() is not called when the recognizer is destroyed) failed to produce a reliable implementation, unless you're simply keyword spotting.

There is nothing we can do until Google fixes this. Your only outlet is to email apps-help@google.com to report the problem and hope that the volume they receive gives them a nudge...

How to end Google Speech-to-Text streamingRecognize gracefully and get back the pending text results?

My bad: unsurprisingly, this turned out to be an obscure race condition in my code.

I've put together a self-contained sample that works as expected (gist). It helped me track down the issue. Hopefully, it may help others and my future self:

// A simple streamingRecognize workflow,
// tested with Node v15.0.1, by @noseratio

import fs from 'fs';
import path from "path";
import url from 'url';
import util from "util";
import timers from 'timers/promises';
import speech from '@google-cloud/speech';

export {}

// need a 16-bit, 16KHz raw PCM audio
const filename = path.join(path.dirname(url.fileURLToPath(import.meta.url)), "sample.raw");
const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';

const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false // If you want interim results, set this to true
};

// init SpeechClient
const client = new speech.v1p1beta1.SpeechClient();
await client.initialize();

// Stream the audio to the Google Cloud Speech API
const stream = client.streamingRecognize(request);

// log all data
stream.on('data', data => {
  const result = data.results[0];
  console.log(`SR results, final: ${result.isFinal}, text: ${result.alternatives[0].transcript}`);
});

// log all errors
stream.on('error', error => {
  console.warn(`SR error: ${error.message}`);
});

// observe data event
const dataPromise = new Promise(resolve => stream.once('data', resolve));

// observe error event
const errorPromise = new Promise((resolve, reject) => stream.once('error', reject));

// observe finish event
const finishPromise = new Promise(resolve => stream.once('finish', resolve));

// observe close event
const closePromise = new Promise(resolve => stream.once('close', resolve));

// we could just pipe it:
// fs.createReadStream(filename).pipe(stream);
// but we want to simulate the web socket data

// read RAW audio as Buffer
const data = await fs.promises.readFile(filename, null);

// simulate multiple audio chunks
console.log("Writing...");
const chunkSize = 4096;
for (let i = 0; i < data.length; i += chunkSize) {
  stream.write(data.slice(i, i + chunkSize));
  await timers.setTimeout(50);
}
console.log("Done writing.");

console.log("Before ending...");
await util.promisify(c => stream.end(c))();
console.log("After ending.");

// race for events
await Promise.race([
  errorPromise.catch(() => console.log("error")),
  dataPromise.then(() => console.log("data")),
  closePromise.then(() => console.log("close")),
  finishPromise.then(() => console.log("finish"))
]);

console.log("Destroying...");
stream.destroy();
console.log("Final timeout...");
await timers.setTimeout(1000);
console.log("Exiting.");

The output:


Writing...
Done writing.
Before ending...
SR results, final: true, text: this is a test I'm testing voice recognition This Is the End
After ending.
data
finish
Destroying...
Final timeout...
close
Exiting.

To test it, a 16-bit/16KHz raw PCM audio file is required. An arbitrary WAV file wouldn't work as is because it contains a header with metadata.
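If all you have is a WAV recording, a quick and admittedly naive way to get raw PCM in Node is to strip the header. The sketch below assumes a canonical 44-byte RIFF header and 16-bit/16KHz mono data, and the file name sample.wav is hypothetical; for anything else, convert properly (for example with sox or ffmpeg) instead:

// Naive WAV -> raw PCM conversion for testing only.
// Assumes a canonical 44-byte RIFF/WAVE header and LINEAR16 mono 16KHz data;
// real-world WAV files may contain extra chunks, so prefer sox/ffmpeg for conversion.
import fs from 'fs';

const wav = await fs.promises.readFile('sample.wav'); // hypothetical input file
const pcm = wav.subarray(44);                         // drop the 44-byte header
await fs.promises.writeFile('sample.raw', pcm);
console.log(`Wrote ${pcm.length} bytes of raw PCM.`);

The resulting sample.raw can then be placed next to the script above.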

Google speech to text API Timeout

How to abort a long-running method?
The original code can be found here.

The thread will run for your set time and then abort; you can put your exception handling or logging in the if statement. The long-running method is only for demonstration purposes. (Note that Thread.Abort is only supported on the .NET Framework.)

class Program
{
    static void Main(string[] args)
    {
        // Method will keep on printing forever, simulating a long-running method.
        void LongRunningMethod()
        {
            while (true)
            {
                Console.WriteLine("Test");
            }
        }

        // New thread runs for a set amount of time, then aborts the operation (here 1 second).
        void StartThread()
        {
            Thread t = new Thread(LongRunningMethod);
            t.Start();
            if (!t.Join(1000)) // give the operation 1s to complete
            {
                Console.WriteLine("Aborted");
                // the thread did not complete on its own, so we will abort it now
                t.Abort();
            }
        }

        // Calling the start thread method.
        StartThread();
    }
}

MAUI-Android: How to keep Google Speech Recognizer from timeout

After struggling for a few days without any success, I found a new way to do this by using the SpeechRecognizer class instead of the Google service. With this, I have better control over the process.

To use SpeechRecognizer, I copied the permission-handling code in "Create platform microphone services" from this Microsoft page: https://docs.microsoft.com/en-us/xamarin/xamarin-forms/data-cloud/azure-cognitive-services/speech-recognition

I have updated my code as below:

1. Login page (currently named Prototype2):
using MauiDemo.Common;
using MauiDemo.Speech;
using Microsoft.Maui.Controls;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace MauiDemo.View
{
    public partial class Prototype2 : ContentPage
    {
        private string _field = string.Empty;
        private int _waitTime = 2000;

        public List<Language> Languages { get; }

        private SpeechToTextImplementation2 _speechRecognizer;

        //private BackgroundWorker worker = new BackgroundWorker();

        private struct VoiceMode
        {
            int Username = 1;
            int Password = 2;
        }

        public Prototype2()
        {
            InitializeComponent();
            this.lblTitle.Text = "Prototype2" + App.Status;

            CheckMicrophone();

            CommonData.CurrentField = string.Empty;

            try
            {
                _speechRecognizer = new SpeechToTextImplementation2();
                _speechRecognizer.Language = DefaultData.SettingLanguage;
            }
            catch (Exception ex)
            {
                DisplayAlert("Error", ex.Message, "OK");
            }

            MessagingCenter.Subscribe<ISpeechToText, string>(this, "STT", (sender, args) =>
            {
                ReceivedUsernameAsync(args);
            });

            MessagingCenter.Subscribe<ISpeechToText>(this, "Final", (sender) =>
            {
                btnSpeak.IsEnabled = true;
            });

            MessagingCenter.Subscribe<IMessageSender, string>(this, "STT", (sender, args) =>
            {
                SpeechToTextRecievedAsync(args);
            });

            isReceiveUsername = false;
            isReceivePassword = false;
            RequestUsername(true);
        }

        protected override void OnDisappearing()
        {
            CommonData.CurrentField = string.Empty;
            base.OnDisappearing();
        }

        private async void btnSpeak_Clicked(Object sender, EventArgs e)
        {
            isReceiveUsername = false;
            isReceivePassword = false;
            await RequestUsername(true);
        }

        private async void SpeechToTextRecievedAsync(string args)
        {
            switch (_field)
            {
                case "Username":
                    await this.ReceivedUsernameAsync(args);
                    break;

                case "Password":
                    await this.ReceivedPasswordAsync(args);
                    break;
            }
        }

        bool isReceiveUsername = false;
        bool isReceivePassword = false;

        private async Task ReceivedUsernameAsync(string args)
        {
            txtUsername.Text = args.Replace(" ", string.Empty);
            lblMessage.Text = string.Empty;

            if (string.IsNullOrWhiteSpace(txtUsername.Text))
            {
                isReceiveUsername = false;
            }
            else
            {
                isReceiveUsername = true;
                var checkUser = DefaultData.Users.Where(x => x.Username.ToLower().Equals(txtUsername.Text.ToLower()));
                if (checkUser.Any())
                {
                    await RequestPassword(true);
                }
                else
                {
                    string message = CommonData.GetMessage(MessageCode.WrongUsername);
                    lblMessage.Text = message;
                    isReceiveUsername = false;
                    await RequestUsername(false, message);
                }
            }
        }

        private async Task ReceivedPasswordAsync(string args)
        {
            txtPassword.Text = args.Replace(" ", string.Empty);
            lblMessage.Text = string.Empty;

            if (string.IsNullOrWhiteSpace(txtPassword.Text))
            {
                isReceivePassword = false;
            }
            else
            {
                isReceivePassword = true;
                var checkUser = DefaultData.Users.Where(x => x.Username.ToLower().Equals(txtUsername.Text.ToLower()) && x.Password.Equals(txtPassword.Text));
                if (checkUser.Any())
                {
                    _field = "";
                    lblDisplayname.Text = checkUser.FirstOrDefault().Displayname;

                    string msg = CommonData.GetMessage(MessageCode.LoginSuccess);
                    await Plugin.TextToSpeech.CrossTextToSpeech.Current.Speak(
                        msg
                        , crossLocale: CommonData.GetCrossLocale(DefaultData.SettingLanguage)
                        , speakRate: DefaultData.SettingSpeed
                        , pitch: DefaultData.SettingPitch
                    );

                    await Navigation.PushAsync(new MainPage());
                }
                else
                {
                    string message = CommonData.GetMessage(MessageCode.WrongPassword);
                    lblMessage.Text = message;
                    isReceivePassword = false;
                    await RequestPassword(false, message);
                }
            }
        }

        private async Task RepeatVoiceUsername(string message)
        {
            do
            {
                await Plugin.TextToSpeech.CrossTextToSpeech.Current.Speak(
                    message
                    , crossLocale: CommonData.GetCrossLocale(DefaultData.SettingLanguage)
                    , speakRate: DefaultData.SettingSpeed
                    , pitch: DefaultData.SettingPitch
                );
                Thread.Sleep(_waitTime);
            }
            while (!isReceiveUsername);
        }

        private async Task RepeatVoicePassword(string message)
        {
            do
            {
                await Plugin.TextToSpeech.CrossTextToSpeech.Current.Speak(
                    message
                    , crossLocale: CommonData.GetCrossLocale(DefaultData.SettingLanguage)
                    , speakRate: DefaultData.SettingSpeed
                    , pitch: DefaultData.SettingPitch
                );
                Thread.Sleep(_waitTime);
            }
            while (!isReceivePassword);
        }

        private bool CheckMicrophone()
        {
            string rec = Android.Content.PM.PackageManager.FeatureMicrophone;
            if (rec != "android.hardware.microphone")
            {
                // no microphone, no recording. Disable the button and output an alert
                DisplayAlert("Error", CommonData.GetMessage(MessageCode.SettingSaveSuccess), "OK");
                btnSpeak.IsEnabled = false;
                return false;
            }
            return true;
        }

        private async Task RequestUsername(bool isRepeat, string message = "")
        {
            _field = "Username";
            isReceiveUsername = false;
            //txtUsername.Text = string.Empty;
            //lblDisplayname.Text = string.Empty;
            txtUsername.Focus();
            message = (message.IsNullOrWhiteSpace() ? CommonData.GetMessage(MessageCode.InputUsername) : message);
            if (isRepeat)
            {
                Task.Run(() => RepeatVoiceUsername(message));
            }
            else
            {
                await Plugin.TextToSpeech.CrossTextToSpeech.Current.Speak(
                    message
                    , crossLocale: CommonData.GetCrossLocale(DefaultData.SettingLanguage)
                    , speakRate: DefaultData.SettingSpeed
                    , pitch: DefaultData.SettingPitch
                );
            }
            _speechRecognizer.StartListening();
        }

        private async Task RequestPassword(bool isRepeat, string message = "")
        {
            _field = "Password";
            isReceivePassword = false;
            //txtPassword.Text = string.Empty;
            //lblDisplayname.Text = string.Empty;
            txtPassword.Focus();
            message = (message.IsNullOrWhiteSpace() ? CommonData.GetMessage(MessageCode.InputPassword) : message);
            if (isRepeat)
            {
                Task.Run(() => RepeatVoicePassword(message));
            }
            else
            {
                await Plugin.TextToSpeech.CrossTextToSpeech.Current.Speak(
                    message
                    , crossLocale: CommonData.GetCrossLocale(DefaultData.SettingLanguage)
                    , speakRate: DefaultData.SettingSpeed
                    , pitch: DefaultData.SettingPitch
                );
            }
            _speechRecognizer.StartListening();
        }
    }
}

2. New MicrophoneService class to handle the permission:
using Android;
using Android.App;
using Android.Content.PM;
using Android.OS;
using AndroidX.Core.App;
using Google.Android.Material.Snackbar;
using System.Threading.Tasks;

namespace MauiDemo.Speech
{
    public class MicrophoneService
    {
        public const int RecordAudioPermissionCode = 1;
        private TaskCompletionSource<bool> tcsPermissions;
        string[] permissions = new string[] { Manifest.Permission.RecordAudio };

        public MicrophoneService()
        {
            tcsPermissions = new TaskCompletionSource<bool>();
        }

        public Task<bool> GetPermissionAsync()
        {
            if ((int)Build.VERSION.SdkInt < 23)
            {
                tcsPermissions.TrySetResult(true);
            }
            else
            {
                var currentActivity = MainActivity.Instance;
                if (ActivityCompat.CheckSelfPermission(currentActivity, Manifest.Permission.RecordAudio) != (int)Permission.Granted)
                {
                    RequestMicPermissions();
                }
                else
                {
                    tcsPermissions.TrySetResult(true);
                }
            }

            return tcsPermissions.Task;
        }

        public void OnRequestPermissionResult(bool isGranted)
        {
            tcsPermissions.TrySetResult(isGranted);
        }

        void RequestMicPermissions()
        {
            if (ActivityCompat.ShouldShowRequestPermissionRationale(MainActivity.Instance, Manifest.Permission.RecordAudio))
            {
                Snackbar.Make(MainActivity.Instance.FindViewById(Android.Resource.Id.Content),
                        "Microphone permissions are required for speech transcription!",
                        Snackbar.LengthIndefinite)
                    .SetAction("Ok", v =>
                    {
                        ((Activity)MainActivity.Instance).RequestPermissions(permissions, RecordAudioPermissionCode);
                    })
                    .Show();
            }
            else
            {
                ActivityCompat.RequestPermissions((Activity)MainActivity.Instance, permissions, RecordAudioPermissionCode);
            }
        }
    }
}

3. New speech-to-text class using SpeechRecognizer (mostly taken from this question: How to increase the voice listen time in Google Recognizer Intent (Speech Recognition) Android):
using Android;
using Android.App;
using Android.Content;
using Android.OS;
using Android.Runtime;
using Android.Speech;
using AndroidX.Core.App;
using Java.Util;
using MauiDemo.Common;
using Microsoft.Maui.Controls;
using Plugin.CurrentActivity;
using System.Threading;

namespace MauiDemo.Speech
{
    public class SpeechToTextImplementation2 : Java.Lang.Object, IRecognitionListener, IMessageSender
    {
        public static AutoResetEvent autoEvent = new AutoResetEvent(false);
        private readonly int VOICE = 10;
        private Activity _activity;
        private float _timeOut = 3;
        private SpeechRecognizer _speech;
        private Intent _speechIntent;
        public string Words;
        public string Language;
        private MicrophoneService micService;

        public SpeechToTextImplementation2()
        {
            micService = new MicrophoneService();
            _activity = CrossCurrentActivity.Current.Activity;
            var locale = Locale.Default;
            if (!string.IsNullOrWhiteSpace(Language))
            {
                locale = new Locale(Language);
            }
            _speech = SpeechRecognizer.CreateSpeechRecognizer(this._activity);
            _speech.SetRecognitionListener(this);
            _speechIntent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
            _speechIntent.PutExtra(RecognizerIntent.ExtraLanguageModel, RecognizerIntent.LanguageModelFreeForm);

            _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, _timeOut * 1000);
            _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, _timeOut * 1000);
            _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputMinimumLengthMillis, _timeOut * 1000);
            _speechIntent.PutExtra(RecognizerIntent.ExtraMaxResults, 1);

            _speechIntent.PutExtra(RecognizerIntent.ExtraLanguage, locale.ToString());
        }

        void RestartListening()
        {
            var locale = Locale.Default;
            if (!string.IsNullOrWhiteSpace(Language))
            {
                locale = new Locale(Language);
            }

            _speech.Destroy();
            _speech = SpeechRecognizer.CreateSpeechRecognizer(this._activity);
            _speech.SetRecognitionListener(this);
            _speechIntent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
            _speechIntent.PutExtra(RecognizerIntent.ExtraLanguageModel, RecognizerIntent.LanguageModelFreeForm);
            _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, _timeOut * 1000);
            _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, _timeOut * 1000);
            _speechIntent.PutExtra(RecognizerIntent.ExtraSpeechInputMinimumLengthMillis, _timeOut * 1000);
            _speechIntent.PutExtra(RecognizerIntent.ExtraMaxResults, 1);
            _speechIntent.PutExtra(RecognizerIntent.ExtraLanguage, locale.ToString());
            StartListening();
        }

        public async void StartListening()
        {
            bool isMicEnabled = await micService.GetPermissionAsync();
            if (!isMicEnabled)
            {
                Words = "Please grant access to the microphone!";
                return;
            }
            _speech.StartListening(_speechIntent);
        }

        public void StopListening()
        {
            _speech.StopListening();
        }

        public void OnBeginningOfSpeech()
        {
        }

        public void OnBufferReceived(byte[] buffer)
        {
        }

        public void OnEndOfSpeech()
        {
        }

        public void OnError([GeneratedEnum] SpeechRecognizerError error)
        {
            Words = error.ToString();
            MessagingCenter.Send<IMessageSender, string>(this, "Error", Words);
            RestartListening();
        }

        public void OnEvent(int eventType, Bundle @params)
        {
        }

        public void OnPartialResults(Bundle partialResults)
        {
        }

        public void OnReadyForSpeech(Bundle @params)
        {
        }

        public void OnResults(Bundle results)
        {
            var matches = results.GetStringArrayList(SpeechRecognizer.ResultsRecognition);
            if (matches == null)
                Words = "Null";
            else if (matches.Count != 0)
                Words = matches[0];
            else
                Words = "";

            MessagingCenter.Send<IMessageSender, string>(this, "STT", Words);

            RestartListening();
        }

        public void OnRmsChanged(float rmsdB)
        {
        }
    }
}

4. Update MainActivity for the permission callback:
using Android.App;
using Android.Content;
using Android.Content.PM;
using Android.OS;
using Android.Runtime;
using Android.Speech;
using MauiDemo.Common;
using MauiDemo.Speech;
using Microsoft.Maui;
using Microsoft.Maui.Controls;

namespace MauiDemo
{
    [Activity(Label = "Maui Demo", Theme = "@style/Maui.SplashTheme", MainLauncher = true, ConfigurationChanges = ConfigChanges.ScreenSize | ConfigChanges.Orientation | ConfigChanges.UiMode | ConfigChanges.ScreenLayout | ConfigChanges.SmallestScreenSize)]
    public class MainActivity : MauiAppCompatActivity, IMessageSender
    {
        protected override void OnCreate(Bundle savedInstanceState)
        {
            base.OnCreate(savedInstanceState);
            Instance = this;
            micService = new MicrophoneService();
        }

        private readonly int VOICE = 10;

        protected override void OnActivityResult(int requestCode, Result resultCode, Intent data)
        {
            if (requestCode == VOICE)
            {
                if (resultCode == Result.Ok)
                {
                    var matches = data.GetStringArrayListExtra(RecognizerIntent.ExtraResults);
                    if (matches.Count != 0)
                    {
                        string textInput = matches[0];
                        MessagingCenter.Send<IMessageSender, string>(this, "STT", textInput);
                    }
                    else
                    {
                        MessagingCenter.Send<IMessageSender, string>(this, "STT", "");
                    }

                    //SpeechToTextImplementation.autoEvent.Set();
                }
            }
            base.OnActivityResult(requestCode, resultCode, data);
        }

        MicrophoneService micService;
        internal static MainActivity Instance { get; private set; }

        public override void OnRequestPermissionsResult(int requestCode, string[] permissions, [GeneratedEnum] Android.Content.PM.Permission[] grantResults)
        {
            // ...
            switch (requestCode)
            {
                case MicrophoneService.RecordAudioPermissionCode:
                    if (grantResults[0] == Permission.Granted)
                    {
                        micService.OnRequestPermissionResult(true);
                    }
                    else
                    {
                        micService.OnRequestPermissionResult(false);
                    }
                    break;
            }
        }
    }
}

Feel free to check out the code, but I will not use it for anything serious because it does not run properly yet.

Any suggestions for improving the code would be really appreciated, as I really want to get good with the MAUI platform.


