Executing Text-To-Speech in Order

Just speak one utterance; when the delegate method tells you it has finished, wait the desired interval and then go on to the next utterance and interval.

Here's a complete example. It uses UIKit rather than SwiftUI, but you can easily adapt it.

import UIKit
import AVFoundation

// Calls `closure` on the main queue after `delay` seconds.
func delay(_ delay: Double, closure: @escaping () -> ()) {
    let when = DispatchTime.now() + delay
    DispatchQueue.main.asyncAfter(deadline: when, execute: closure)
}

class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    var synth: AVSpeechSynthesizer!
    var sentences = [String]()
    var intervals = [Double]()

    func start(_ sentences: [String], _ intervals: [Double]) {
        self.sentences = sentences
        self.intervals = intervals
        self.synth = AVSpeechSynthesizer()
        synth.delegate = self
        self.sayOne()
    }

    // Speak the next sentence, if there is one.
    func sayOne() {
        if let sentence = sentences.first {
            sentences.removeFirst()
            let utter = AVSpeechUtterance(string: sentence)
            self.synth.speak(utter)
        }
    }

    // When an utterance finishes, wait the next interval, then speak again.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        if let interval = intervals.first {
            intervals.removeFirst()
            delay(interval) {
                self.sayOne()
            }
        }
    }
}

class ViewController: UIViewController {
    let speaker = Speaker()
    override func viewDidLoad() {
        super.viewDidLoad()
        let sentences = [
            "I will speak again in one second",
            "I will speak again in five seconds",
            "I will speak again in one second",
            "Done"]
        let intervals = [1.0, 5.0, 1.0]
        self.speaker.start(sentences, intervals)
    }
}

Controlling the loop execution

You could always use Combine for this:

import Combine

let speaker = Speaker()
let capitals = ["Canberra is the capital of Australia", "Seoul is the capital of South Korea", "Tokyo is the capital of Japan", "Berlin is the capital of Germany"]
var playerCancellable: AnyCancellable? = nil

Button("Play Sound") {
playSound()
}

func playSound() {
// Fairly standard timer publisher. The call to .autoconnect() tells the timer to start publishing when subscribed to.
let timer = Timer.publish(every: 20, on: .main, in: .default)
.autoconnect()

// Publishers.Zip takes two publishers.
// It will only publish when there is a "symmetrical" output. It behaves in a similar manner as `zip` on sequences.
// So, in this scenario, you will not get the next element of your array until the timer emits another event.
// In the call to sink, we ignore the first element of the tuple relating to the timer
playerCancellable = Publishers.Zip(timer, capitals.publisher)
.sink { _, item in
speaker.speak(item)
}
}

Edit

You mentioned in the comments that you want to vary the delay between utterances. That's not really something a Timer can do. I found it an interesting problem, so I hacked around a bit and got the behavior you describe in the comments:

class Speaker: NSObject {
    let synth = AVSpeechSynthesizer()

    private var timedPhrases: [(phrase: String, delay: TimeInterval)]
    // This is so you don't potentially block the main queue
    private let queue = DispatchQueue(label: "Phrase Queue")

    override init() {
        timedPhrases = []
        super.init()
        synth.delegate = self
    }

    init(_ timedPhrases: [(phrase: String, delay: TimeInterval)]) {
        self.timedPhrases = timedPhrases
        super.init()
        synth.delegate = self
    }

    private func speak(_ string: String) {
        let utterance = AVSpeechUtterance(string: string)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
        utterance.rate = 0.5
        synth.speak(utterance)
    }

    // Speak the next phrase in the list and drop it from the queue.
    func speak() {
        guard let first = timedPhrases.first else { return }
        speak(first.phrase)
        timedPhrases = Array(timedPhrases.dropFirst())
    }
}

extension Speaker: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        if !timedPhrases.isEmpty {
            queue.sync {
                // Blocks the serial queue for the next phrase's delay, then speaks it.
                Thread.sleep(forTimeInterval: TimeInterval(timedPhrases.first!.delay))
                self.speak()
            }
        } else {
            print("all done")
        }
    }
}

let speaker = Speaker([
    (phrase: "1", delay: 0),
    (phrase: "2", delay: 3),
    (phrase: "3", delay: 1),
    (phrase: "4", delay: 5),
    (phrase: "5", delay: 10)
])

speaker.speak()

Take this with a huge grain of salt. I don't really consider using Thread.sleep to be a very good practice, but maybe this will give you some ideas on how to approach it. If you want variable timing, a Timer instance is not going to give you that.
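If you'd rather avoid the blocking sleep, one option (a minimal sketch, not part of the original answer) is to replace the delegate extension above with one that schedules the next utterance via DispatchQueue.main.asyncAfter; the variable delay then comes from the scheduled deadline rather than from sleeping a thread.

// Sketch: a non-blocking replacement for the delegate extension above.
// Uses the same timedPhrases array and speak() method; nothing sleeps,
// the next speak() call is simply scheduled after the phrase's delay.
extension Speaker: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        guard let next = timedPhrases.first else {
            print("all done")
            return
        }
        DispatchQueue.main.asyncAfter(deadline: .now() + next.delay) {
            self.speak()
        }
    }
}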

Run TTS in NSArray sequentially

The best way to highlight the word being vocalized is to use the speechSynthesizer:willSpeakRangeOfSpeechString:utterance: method of the AVSpeechSynthesizerDelegate protocol.

If you don't use this delegate method, you won't be able to reach your goal.

Take a look at this complete and useful example (ObjC and Swift) that displays each vocalized word in a bold font as the speech synthesis proceeds.
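As a quick illustration (not taken from the linked example), here is a minimal Swift sketch of that delegate method; the HighlightingSpeaker class name and the textLabel property are hypothetical stand-ins for your own UI:

import UIKit
import AVFoundation

// Minimal sketch: bold each word as it is about to be spoken.
// HighlightingSpeaker and textLabel are hypothetical names for illustration.
class HighlightingSpeaker: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()
    let textLabel = UILabel()

    override init() {
        super.init()
        synth.delegate = self
    }

    func speak(_ text: String) {
        textLabel.text = text
        synth.speak(AVSpeechUtterance(string: text))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        // Rebuild the attributed string on each callback, bolding only the current word.
        let attributed = NSMutableAttributedString(string: utterance.speechString)
        attributed.addAttribute(.font,
                                value: UIFont.boldSystemFont(ofSize: 17),
                                range: characterRange)
        textLabel.attributedText = attributed
    }
}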

How to text-to-speech a PHP table

JavaScript will be easier, for example:

var msg = new SpeechSynthesisUtterance();
msg.text = "Hello World";
window.speechSynthesis.speak(msg);

How to make something say in python

First you need to install the module named pyttsx3:

pip install pyttsx3 (or pip3 install pyttsx3)

Then the code is:

import pyttsx3 as speaker
tts = speaker.init()

def say(text):
    tts.say(text)
    tts.runAndWait()

a = "Hello Boss I am your program"
say(a)

This will read aloud 'Hello Boss I am your program'
;)

How to change Azure text to speech silence timeout in JavaScript

Based on what you're describing, you'd need to set the segmentation silence timeout. Unfortunately, there is a bug in the JS SDK at the moment and the PropertyId.Speech_SegmentationSilenceTimeoutMs is not being set correctly.

As a workaround, you can instead set the segmentation timeout as follows:

const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, subscriptionRegion);
speechConfig.speechRecognitionLanguage = "en-US";

const reco = new SpeechRecognizer(speechConfig);
const conn = Connection.fromRecognizer(reco);
conn.setMessageProperty("speech.context", "phraseDetection", {
    "INTERACTIVE": {
        "segmentation": {
            "mode": "custom",
            "segmentationSilenceTimeoutMs": 5000
        }
    },
    mode: "Interactive"
});

reco.recognizeOnceAsync(
    (result) =>
    {
        console.log("Recognition done!!!");
        // do something with the recognition
    },
    (error) =>
    {
        console.log("Recognition failed. Error:" + error);
    });

Please note that the allowed range for the segmentation timeout is 100-5000 ms (inclusive).

How to recognize and execute multiple commands from a phrase in python?

Your program stops after the chrome command because you are using if/elif: once one branch matches, the remaining ones are skipped. Use separate if statements instead, so that every command in the phrase is checked.


