Executing Text-To-Speech in Order

Just speak one utterance; when the delegate method tells you it has finished, wait the desired interval and then go on to the next utterance and interval.

Here's a complete example. It uses UIKit rather than SwiftUI, but you can easily adapt it.

import UIKit
import AVFoundation

// Calls `closure` on the main queue after `delay` seconds.
func delay(_ delay: Double, closure: @escaping () -> ()) {
    let when = DispatchTime.now() + delay
    DispatchQueue.main.asyncAfter(deadline: when, execute: closure)
}

class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    var synth: AVSpeechSynthesizer!
    var sentences = [String]()
    var intervals = [Double]()

    func start(_ sentences: [String], _ intervals: [Double]) {
        self.sentences = sentences
        self.intervals = intervals
        self.synth = AVSpeechSynthesizer()
        synth.delegate = self
        self.sayOne()
    }

    // Speak the next sentence, if there is one.
    func sayOne() {
        if let sentence = sentences.first {
            sentences.removeFirst()
            let utter = AVSpeechUtterance(string: sentence)
            self.synth.speak(utter)
        }
    }

    // When an utterance finishes, wait the next interval, then speak again.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        if let interval = intervals.first {
            intervals.removeFirst()
            delay(interval) {
                self.sayOne()
            }
        }
    }
}

class ViewController: UIViewController {
    let speaker = Speaker()
    override func viewDidLoad() {
        super.viewDidLoad()
        let sentences = [
            "I will speak again in one second",
            "I will speak again in five seconds",
            "I will speak again in one second",
            "Done"]
        let intervals = [1.0, 5.0, 1.0]
        self.speaker.start(sentences, intervals)
    }
}

Controlling the loop execution

You could always use Combine for this:

import Combine

let speaker = Speaker()
let capitals = ["Canberra is the capital of Australia", "Seoul is the capital of South Korea", "Tokyo is the capital of Japan", "Berlin is the capital of Germany"]
var playerCancellable: AnyCancellable? = nil

Button("Play Sound") {
playSound()
}

func playSound() {
// Fairly standard timer publisher. The call to .autoconnect() tells the timer to start publishing when subscribed to.
let timer = Timer.publish(every: 20, on: .main, in: .default)
.autoconnect()

// Publishers.Zip takes two publishers.
// It will only publish when there is a "symmetrical" output. It behaves in a similar manner as `zip` on sequences.
// So, in this scenario, you will not get the next element of your array until the timer emits another event.
// In the call to sink, we ignore the first element of the tuple relating to the timer
playerCancellable = Publishers.Zip(timer, capitals.publisher)
.sink { _, item in
speaker.speak(item)
}
}

Edit

You mentioned in the comments that you want to vary the delay between utterances. That's not really something a Timer can do. I found it an interesting problem, so I hacked around a bit and got the behavior you describe in the comments:

class Speaker: NSObject {
    let synth = AVSpeechSynthesizer()

    private var timedPhrases: [(phrase: String, delay: TimeInterval)]
    // This is so you don't potentially block the main queue
    private let queue = DispatchQueue(label: "Phrase Queue")

    override init() {
        timedPhrases = []
        super.init()
        synth.delegate = self
    }

    init(_ timedPhrases: [(phrase: String, delay: TimeInterval)]) {
        self.timedPhrases = timedPhrases
        super.init()
        synth.delegate = self
    }

    private func speak(_ string: String) {
        let utterance = AVSpeechUtterance(string: string)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")
        utterance.rate = 0.5
        synth.speak(utterance)
    }

    // Speak the next phrase in the list and drop it from the queue.
    func speak() {
        guard let first = timedPhrases.first else { return }
        speak(first.phrase)
        timedPhrases = Array(timedPhrases.dropFirst())
    }
}

extension Speaker: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        if !timedPhrases.isEmpty {
            queue.sync {
                // Blocks the serial queue for the next phrase's delay, then speaks it.
                Thread.sleep(forTimeInterval: TimeInterval(timedPhrases.first!.delay))
                self.speak()
            }
        } else {
            print("all done")
        }
    }
}

let speaker = Speaker([
    (phrase: "1", delay: 0),
    (phrase: "2", delay: 3),
    (phrase: "3", delay: 1),
    (phrase: "4", delay: 5),
    (phrase: "5", delay: 10)
])

speaker.speak()

Take this with a huge grain of salt. I don't really consider using Thread.sleep to be a very good practice, but maybe this will give you some ideas on how to approach it. If you want variable timing, a Timer instance is not going to give you that.
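If you'd rather avoid the blocking sleep, one option (a minimal sketch, not part of the original answer) is to replace the delegate extension above with one that schedules the next utterance via DispatchQueue.main.asyncAfter; the variable delay then comes from the scheduled deadline rather than from sleeping a thread.

// Sketch: a non-blocking replacement for the delegate extension above.
// Uses the same timedPhrases array and speak() method; nothing sleeps,
// the next speak() call is simply scheduled after the phrase's delay.
extension Speaker: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        guard let next = timedPhrases.first else {
            print("all done")
            return
        }
        DispatchQueue.main.asyncAfter(deadline: .now() + next.delay) {
            self.speak()
        }
    }
}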

Run TTS in NSArray sequentially

The best way to highlight the word being vocalized is to use the speechSynthesizer:willSpeakRangeOfSpeechString:utterance: method of the AVSpeechSynthesizerDelegate protocol.

If you don't use this delegate method, you won't be able to reach your goal.

Take a look at this complete and useful example (ObjC and Swift) that displays each vocalized word in a bold font as the speech synthesis proceeds.
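As a quick illustration (not taken from the linked example), here is a minimal Swift sketch of that delegate method; the HighlightingSpeaker class name and the textLabel property are hypothetical stand-ins for your own UI:

import UIKit
import AVFoundation

// Minimal sketch: bold each word as it is about to be spoken.
// HighlightingSpeaker and textLabel are hypothetical names for illustration.
class HighlightingSpeaker: NSObject, AVSpeechSynthesizerDelegate {
    let synth = AVSpeechSynthesizer()
    let textLabel = UILabel()

    override init() {
        super.init()
        synth.delegate = self
    }

    func speak(_ text: String) {
        textLabel.text = text
        synth.speak(AVSpeechUtterance(string: text))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        // Rebuild the attributed string on each callback, bolding only the current word.
        let attributed = NSMutableAttributedString(string: utterance.speechString)
        attributed.addAttribute(.font,
                                value: UIFont.boldSystemFont(ofSize: 17),
                                range: characterRange)
        textLabel.attributedText = attributed
    }
}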

How to text-to-speech a PHP table

JavaScript will be easier, for example:

var msg = new SpeechSynthesisUtterance();
msg.text = "Hello World";
window.speechSynthesis.speak(msg);

How to make something say in python

First you need to install the module named pyttsx3:

pip install pyttsx3 (or pip3 install pyttsx3)

Then the code is:

import pyttsx3 as speaker
tts = speaker.init()

def say(text):
    tts.say(text)
    tts.runAndWait()

a = "Hello Boss I am your program"
say(a)

This will read aloud 'Hello Boss I am your program'
;)

How to change Azure text to speech silence timeout in JavaScript

Based on what you're describing, you'd need to set the segmentation silence timeout. Unfortunately, there is a bug in the JS SDK at the moment and the PropertyId.Speech_SegmentationSilenceTimeoutMs is not being set correctly.

As a workaround, you can instead set the segmentation timeout as follows:

const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, subscriptionRegion);
speechConfig.speechRecognitionLanguage = "en-US";

const reco = new SpeechRecognizer(speechConfig);
const conn = Connection.fromRecognizer(reco);
conn.setMessageProperty("speech.context", "phraseDetection", {
    "INTERACTIVE": {
        "segmentation": {
            "mode": "custom",
            "segmentationSilenceTimeoutMs": 5000
        }
    },
    mode: "Interactive"
});

reco.recognizeOnceAsync(
    (result) =>
    {
        console.log("Recognition done!!!");
        // do something with the recognition
    },
    (error) =>
    {
        console.log("Recognition failed. Error:" + error);
    });

Please note that the allowed range for the segmentation timeout is 100-5000 ms (inclusive).

How to recognize and execute multiple commands from a phrase in python?

Your program stops after the chrome command because you are using if/elif: once one branch matches, the remaining ones are skipped. Use separate if statements instead, so that every command in the phrase is checked.


