Save Google Cloud Speech API Operation(Job) Object to Retrieve Results Later

You can monkey-patch this functionality into the version you are using, but I would advise upgrading to google-cloud-speech 0.24.0 or later. With those newer versions you can use Operation#id and Project#operation to accomplish this.

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :linear16,
                     language: "en-US",
                     sample_rate: 16000

op = audio.process
# get the operation's id
id = op.id #=> "1234567890"

# construct a new operation object from the id
op2 = speech.operation id

# verify both objects refer to the same operation
op.id == op2.id #=> true

op2.done? #=> false
op2.wait_until_done!
op2.done? #=> true

results = op2.results
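
The id is a plain string, so it can be stored anywhere (a file, a database row, a queue message) and passed to speech.operation by a later process. Here is a minimal sketch of that round trip, written in Python for brevity and using only the standard library. The file name and the helpers save_operation_id and load_operation_id are made up for illustration and are not part of any Google library:

```python
import json
import tempfile
from pathlib import Path


def save_operation_id(path: Path, operation_id: str) -> None:
    """Write the operation id to a small JSON file."""
    path.write_text(json.dumps({"operation_id": operation_id}))


def load_operation_id(path: Path) -> str:
    """Read back the operation id written by save_operation_id."""
    return json.loads(path.read_text())["operation_id"]


# Round trip through a temporary directory; "1234567890" is the sample id used above.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "speech_operation.json"  # hypothetical file name
    save_operation_id(store, "1234567890")
    restored_id = load_operation_id(store)
```

The restored string is what you would hand to speech.operation in the later process.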

Update: Since you can't upgrade, you can monkey-patch this functionality into an older version using the workaround described in GoogleCloudPlatform/google-cloud-ruby#1214:

require "google/cloud/speech"
require "ostruct" # OpenStruct is used in Project#job below

# Add monkey-patches
module Google
  module Cloud
    module Speech
      class Job
        def id
          @grpc.name
        end
      end
      class Project
        def job id
          # use this Project's own service; the top-level `speech` variable is not in scope inside a method
          Job.from_grpc(OpenStruct.new(name: id), service).refresh!
        end
      end
    end
  end
end

# Use the new monkey-patched methods
speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :linear16,
                     language: "en-US",
                     sample_rate: 16000

job = audio.recognize_job
# get the job's id
id = job.id #=> "1234567890"

# construct a new job object from the id
job2 = speech.job id

# verify the jobs are the same
job.id == job2.id #=> true

job2.done? #=> false
job2.wait_until_done!
job2.done? #=> true

results = job2.results

How to get the result of a long-running Google Cloud Speech API operation later?

After reading the source, I found that gRPC has a 10-minute timeout. If you submit a large file, transcription can take longer than that. The trick is to use the HTTP backend, which doesn't hold a connection open the way gRPC does; instead, every time you poll, it sends a new HTTP request. To use HTTP, create the client like this:

speech_client = speech.Client(_use_grpc=False)
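
Because each poll over HTTP is its own request, waiting for a long transcription reduces to a plain loop with a deadline. The following is a rough sketch of that idea, not the library's actual implementation. fetch_status is a hypothetical callable standing in for one status request, and the interval and timeout defaults are arbitrary:

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_status: Callable[[], Optional[dict]],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until it returns a result or time runs out.

    fetch_status is a hypothetical stand-in for one HTTP status request:
    it returns None while the job is still running and a dict once it is done.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish before the deadline")
        sleep(interval_s)


# Demo with a fake job that finishes on the third poll (no network involved).
_responses = iter([None, None, {"done": True}])
final = poll_until_done(lambda: next(_responses), interval_s=0.0)
```

No connection is held open between polls, so the overall wait can exceed ten minutes without hitting a single-request timeout.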

How to resume Google Cloud Speech API (longRunningRecognize) timeout on Cloud Functions

If your job is going to take more than 540 seconds, Cloud Functions is not really the best solution for this problem. Instead, consider using Cloud Functions as just a triggering mechanism, then offload the work to App Engine or Compute Engine, using Pub/Sub to send it the relevant data (e.g. the location of the file in Cloud Storage and any other metadata needed to make the speech recognition request).
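
As a rough illustration of the hand-off, the function only needs to serialize the file location and metadata into a message, and the worker decodes it on the other side. The sketch below uses only the standard library and does not call Pub/Sub itself. The field names gcs_uri and language_code, the helper names, and the bucket and object names are all assumptions made for the example:

```python
import json


def build_transcription_request(bucket: str, name: str, language_code: str = "en-US") -> bytes:
    """Encode the hand-off message as UTF-8 JSON (Pub/Sub payloads are bytes)."""
    payload = {
        "gcs_uri": f"gs://{bucket}/{name}",  # location of the audio in Cloud Storage
        "language_code": language_code,      # extra metadata for the recognize request
    }
    return json.dumps(payload).encode("utf-8")


def parse_transcription_request(data: bytes) -> dict:
    """Decode the message on the worker side (App Engine or Compute Engine)."""
    return json.loads(data.decode("utf-8"))


# Hypothetical bucket and object names, for illustration only.
message = build_transcription_request("my-bucket", "audio/interview.flac")
request = parse_transcription_request(message)
```

The worker, which is not bound by the 540-second limit, can then start the long-running recognition and wait for it.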

Google Cloud Speech API returns null result

Solved it by using the ffmpeg library to encode the audio to FLAC with a mono channel.
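
For reference, here is one way such a conversion might be scripted. This is a sketch that assumes the ffmpeg command-line tool is installed and on the PATH. The helper names and file names are placeholders, and the original answer does not say which exact options were used:

```python
import subprocess
from typing import List


def flac_mono_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that writes src as single-channel FLAC to dst."""
    return [
        "ffmpeg",
        "-i", src,       # input file
        "-ac", "1",      # downmix to one audio channel (mono)
        "-c:a", "flac",  # encode the audio stream with the FLAC codec
        dst,
    ]


def convert_to_flac_mono(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(flac_mono_command(src, dst), check=True)


# Only builds the command here; call convert_to_flac_mono to actually run ffmpeg.
cmd = flac_mono_command("input.wav", "output.flac")
```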

Why is the Speech REST API response different from the Go SDK API response?

The JSON-marshaled Go structs are protobuf messages (the fields are snake_case and the times are google.protobuf.Timestamp).

Try using the Go protobuf protojson package instead of encoding/json; it should map bijectively between JSON and Go protobuf structs.
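
The naming difference comes from the proto3 JSON mapping, which renders field names in lowerCamelCase, whereas the generated struct tags that encoding/json reads keep the snake_case proto names. Here is a small sketch of that name conversion, written in Python only to show the relationship. The helper to_lower_camel is made up for this example and is not part of any protobuf library:

```python
def to_lower_camel(name: str) -> str:
    """Convert a snake_case proto field name to its lowerCamelCase JSON name."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


# Field names as they appear in the snake_case output.
snake_names = ["start_time", "end_time", "language_code", "transcript"]
camel_names = [to_lower_camel(n) for n in snake_names]
```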

How to work with result from google speech to text API

MessageToJson converts the RecognizeResponse from a protobuf message to JSON, but returns it as a string.

You can work directly with the RecognizeResponse in the following way:

response: RecognizeResponse = client.recognize(config=your_config, audio=your_audio)
final_transcripts = []
final_transcripts_confidence = []
for result in response.results:
    alternative = result.alternatives[0]
    final_transcripts_confidence.append(alternative.confidence)
    final_transcripts.append(alternative.transcript)

If you would still like to use MessageToJson and convert the result to a dictionary, you can do the following:

import json
from google.protobuf.json_format import MessageToJson

response: RecognizeResponse = client.recognize(config=your_config, audio=your_audio)
response_json_str = MessageToJson(response, indent=0)
response_dict = json.loads(response_json_str)

Alternatively, use MessageToDict to convert directly to a dictionary.
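
Once you have a dictionary, the same loop as above works with key lookups instead of attribute access. The sketch below runs on a made-up sample string, so the transcripts and confidence values are illustrative rather than real API output. It uses .get because the JSON conversion can omit fields that hold default values:

```python
import json

# Made-up sample in the shape MessageToJson produces for a RecognizeResponse.
response_json_str = """
{
  "results": [
    {"alternatives": [{"transcript": "hello world", "confidence": 0.92}]},
    {"alternatives": [{"transcript": "how are you", "confidence": 0.88}]}
  ]
}
"""

response_dict = json.loads(response_json_str)

# Keep the first alternative of each result, as in the loop above.
final_transcripts = []
final_transcripts_confidence = []
for result in response_dict.get("results", []):
    alternatives = result.get("alternatives", [])
    if not alternatives:
        continue  # skip results that carry no alternatives
    best = alternatives[0]
    final_transcripts.append(best.get("transcript", ""))
    final_transcripts_confidence.append(best.get("confidence", 0.0))
```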

NOTE:

Starting with some version, the proto conversion changed, and the approach above raises an error: AttributeError: 'DESCRIPTOR'

To solve this you should use:

RecognizeResponse.to_json(response)

or alternatively:

RecognizeResponse.to_dict(response)
