AVSpeechUtterance - Swift - Initializing with a Phrase

The code is fine; the speech string is set correctly. The issue is that AVSpeechUtterance is not working as expected on the iOS 8 beta. I suggest filing a bug report with Apple.

The code works fine on iOS 7.1 device and simulator.
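
For reference, a minimal initialization with a phrase looks like this (the voice and rate values here are illustrative, not required):

import AVFoundation

// Keep a strong reference to the synthesizer, or speech stops
// as soon as it is deallocated.
let synthesizer = AVSpeechSynthesizer()

let utterance = AVSpeechUtterance(string: "Hello, world")
utterance.voice = AVSpeechSynthesisVoice(language: "en-US") // illustrative voice
utterance.rate = AVSpeechUtteranceDefaultSpeechRate
synthesizer.speak(utterance)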

How to know when an AVSpeechUtterance has finished, so as to continue app activity?

The delegate pattern is one of the most commonly used design patterns in object-oriented programming, and it's not as hard as it seems. In your case, you can simply let your class (a game scene) be a delegate of the CanSpeak class.

protocol CanSpeakDelegate: AnyObject {
    func speechDidFinish()
}

Next, make your CanSpeak class conform to AVSpeechSynthesizerDelegate, give it a CanSpeakDelegate property, and implement the AVSpeechSynthesizerDelegate callback.

class CanSpeak: NSObject, AVSpeechSynthesizerDelegate {

    let voices = AVSpeechSynthesisVoice.speechVoices()
    let voiceSynth = AVSpeechSynthesizer()
    var voiceToUse: AVSpeechSynthesisVoice?

    // Weak, to avoid a retain cycle between CanSpeak and its delegate
    weak var delegate: CanSpeakDelegate?

    override init() {
        voiceToUse = AVSpeechSynthesisVoice.speechVoices().first(where: { $0.name == "Karen" })
        super.init()
        self.voiceSynth.delegate = self
    }

    func sayThis(_ phrase: String) {
        let utterance = AVSpeechUtterance(string: phrase)
        utterance.voice = voiceToUse
        utterance.rate = 0.5
        voiceSynth.speak(utterance)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        self.delegate?.speechDidFinish()
    }
}

Lastly, in your game scene class, simply conform to CanSpeakDelegate and set the scene as the delegate of your CanSpeak instance.

class GameScene: NSObject, CanSpeakDelegate {

    let canSpeak = CanSpeak()

    override init() {
        super.init()
        self.canSpeak.delegate = self
    }

    // This function will be called every time a speech finishes
    func speechDidFinish() {
        // Do something
    }
}
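
For completeness, a hypothetical call site (the phrase is just an example):

let scene = GameScene()
scene.canSpeak.sayThis("Hello, world")
// speechDidFinish() will fire on the scene when the utterance ends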

Executing text-to-speech in order

Just speak an utterance, receive the delegate callback when it finishes, and in that callback wait the desired interval before going on to the next utterance and interval.

Here's a complete example. It uses a UIKit project, not SwiftUI, but you can easily adapt it.

import UIKit
import AVFoundation

// Run a closure on the main queue after the given delay (in seconds).
func delay(_ delay: Double, closure: @escaping () -> Void) {
    let when = DispatchTime.now() + delay
    DispatchQueue.main.asyncAfter(deadline: when, execute: closure)
}

class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    var synth: AVSpeechSynthesizer!
    var sentences = [String]()
    var intervals = [Double]()

    func start(_ sentences: [String], _ intervals: [Double]) {
        self.sentences = sentences
        self.intervals = intervals
        self.synth = AVSpeechSynthesizer()
        synth.delegate = self
        self.sayOne()
    }

    func sayOne() {
        if let sentence = sentences.first {
            sentences.removeFirst()
            let utter = AVSpeechUtterance(string: sentence)
            self.synth.speak(utter)
        }
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        if let interval = intervals.first {
            intervals.removeFirst()
            delay(interval) {
                self.sayOne()
            }
        }
    }
}

class ViewController: UIViewController {
    let speaker = Speaker()

    override func viewDidLoad() {
        super.viewDidLoad()
        let sentences = [
            "I will speak again in one second",
            "I will speak again in five seconds",
            "I will speak again in one second",
            "Done"]
        let intervals = [1.0, 5.0, 1.0]
        self.speaker.start(sentences, intervals)
    }
}
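
If you would rather drive it from SwiftUI, a minimal adaptation might look like this (reusing the Speaker class above; the view name and button label are arbitrary):

import SwiftUI

struct SpeechView: View {
    // Hold one Speaker for the life of the view; in a real app you would
    // own this in a model object so view updates don't recreate it.
    let speaker = Speaker()

    var body: some View {
        Button("Start speaking") {
            speaker.start(
                ["I will speak again in one second", "Done"],
                [1.0])
        }
    }
}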

AVSpeechSynthesizer iOS text to speech

I threw together an AVSpeechSynthesizer class to handle flipping from one language to another. There's an AVSpeechSynthesizer tutorial on NSHipster that's a good starting point for learning about this. I haven't fiddled with translation, but you can figure that part out. I also created a basic translator class that'll translate "hello" to "مرحبا". You can see the project here:

TranslateDemo

To use the translator, you'd probably want to tie an action to a button like so:

@IBAction func translateToArabicAction(_ sender: UIButton) {
    // Check that there is text entered in the text field
    if let text = textToTranslateTextField.text, !text.isEmpty {
        let translatedText = translator.translate(word: text.lowercased())
        speechSynthesizer.speak(translatedText, in: Language.arabic.rawValue)
    }
}

@IBAction func translateToEnglishAction(_ sender: UIButton) {
    // Check that there is text entered in the text field
    if let text = textToTranslateTextField.text, !text.isEmpty {
        let translatedText = translator.translate(word: text.lowercased())
        speechSynthesizer.speak(translatedText, in: Language.english.rawValue)
    }
}
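
The translator class itself isn't shown here; a minimal dictionary-based sketch consistent with those call sites might look like this (the Translator name and translate(word:) signature are assumptions inferred from the calls above):

class Translator {
    // A tiny hard-coded lookup; the real project presumably does more.
    private let englishToArabic = ["hello": "مرحبا"]

    func translate(word: String) -> String {
        // Try English -> Arabic first, then the reverse; fall back to the input.
        if let arabic = englishToArabic[word] {
            return arabic
        }
        if let english = englishToArabic.first(where: { $0.value == word })?.key {
            return english
        }
        return word
    }
}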

The speech synthesizer looks like this:

import AVFoundation

// Use an enum so you don't have to manually type out language code strings.
// Look them up once, stick them in an enum, and from there set the language
// with the enum rather than typing out the string each time.
enum Language: String {
    case english = "en-US"
    case arabic = "ar-SA"
}

class Speaker: NSObject {

    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
        synth.delegate = self
    }

    func speak(_ announcement: String, in language: String) {
        print("speak announcement in language \(language) called")
        prepareAudioSession()
        let utterance = AVSpeechUtterance(string: announcement.lowercased())
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        synth.speak(utterance)
    }

    private func prepareAudioSession() {
        do {
            try AVAudioSession.sharedInstance().setCategory(.ambient, options: .mixWithOthers)
        } catch {
            print(error)
        }

        do {
            try AVAudioSession.sharedInstance().setActive(true)
        } catch {
            print(error)
        }
    }

    func stop() {
        if synth.isSpeaking {
            synth.stopSpeaking(at: .immediate)
        }
    }
}

extension Speaker: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        print("Speaker class started")
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("Speaker class finished")
    }
}
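
Usage is then as simple as this (the phrase and language are examples):

let speaker = Speaker()
speaker.speak("hello", in: Language.english.rawValue)

// And to cut speech off mid-utterance:
speaker.stop()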

iOS/Swift/AVSpeechSynthesizer: Control Speed of Enqueued Utterances


Is there any way to speed up the queue so that the utterances are spoken one after the other with no delay?

As you did, the only way is to set the postUtteranceDelay and preUtteranceDelay properties to 0, which is the default value by the way.

As recommended here, I implemented the code snippet hereafter (Xcode 10, Swift 5.0 and iOS 12.3.1) to check the impact of different utterance-delay values: 0 is the best choice for improving the speed of enqueued utterances.

var synthesizer = AVSpeechSynthesizer()
var playQueue = [AVSpeechUtterance]()

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)

    for i in 1...10 {

        let stringNb = "Sentence number " + String(i) + " of the speech synthesizer."

        let utterance = AVSpeechUtterance(string: stringNb)
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        utterance.pitchMultiplier = 1.0
        utterance.volume = 1.0

        utterance.postUtteranceDelay = 0.0
        utterance.preUtteranceDelay = 0.0

        playQueue.append(utterance)
    }

    synthesizer.delegate = self

    for utterance in playQueue {
        synthesizer.speak(utterance)
    }
}

If the delay is still too long with a value of 0 in your code, the incoming string may be the problem (adapt the code snippet above to your needs).
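
For instance, to rule out the pre/post delays entirely and see whether the pauses come from the text itself, you can split a long string into sentences and enqueue them with zero delays (a rough sketch; real sentence splitting needs more care):

let text = "First sentence. Second sentence. Third sentence."
let sentences = text.split(separator: ".")
    .map { $0.trimmingCharacters(in: .whitespaces) }

for sentence in sentences where !sentence.isEmpty {
    let utterance = AVSpeechUtterance(string: sentence)
    utterance.preUtteranceDelay = 0.0
    utterance.postUtteranceDelay = 0.0
    synthesizer.speak(utterance)
}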

AVSpeechSynthesizer detect when the speech is finished

A "does not conform to protocol 'NSObjectProtocol'" error means that your class must inherit from NSObject; you can read more about it here.

Now I don't know how you've structured your code, but this little example seems to work for me. First a dead simple class that holds the AVSpeechSynthesizer:

class Speaker: NSObject {
    let synth = AVSpeechSynthesizer()

    override init() {
        super.init()
        synth.delegate = self
    }

    func speak(_ string: String) {
        let utterance = AVSpeechUtterance(string: string)
        synth.speak(utterance)
    }
}

Notice that I set the delegate here (in the init method), and notice that the class must inherit from NSObject to keep the compiler happy (very important!).

And then the actual delegate method:

extension Speaker: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("all done")
    }
}

And finally, I can use that class here, like so:

class ViewController: UIViewController {
    let speaker = Speaker()

    @IBAction func buttonTapped(_ sender: UIButton) {
        speaker.speak("Hello world")
    }
}

Which rewards me with

all done

in my console when the AVSpeechSynthesizer has stopped speaking.

Hope that helps you.

Update

So, time passes, and in the comments below @case-silva asked if there was a practical example and @dima-gershman suggested just using the AVSpeechSynthesizer directly in the ViewController.

To accommodate both, I've made a simple ViewController example here with a UITextField and a UIButton.

The flow is:

  1. You enter some text in the textfield (if not, a default value will be set)
  2. You press the button
  3. The button is disabled and the background color is changed (sorry, it was the best I could come up with :))
  4. Once speech is done, the button is enabled, the textfield is cleared and the background color is changed again.

A Simple UIViewController Example

import UIKit
import AVFoundation

class ViewController: UIViewController {

    //MARK: Outlets
    @IBOutlet weak var textField: UITextField!
    @IBOutlet weak var speakButton: UIButton!

    let synth = AVSpeechSynthesizer()

    override func viewDidLoad() {
        super.viewDidLoad()
        synth.delegate = self
    }

    @IBAction func speakButtonTapped(_ sender: UIButton) {
        // We're ready to start speaking; disable the UI while we're speaking
        view.backgroundColor = .darkGray
        speakButton.isEnabled = false
        let inputText = textField.text ?? ""
        let textToSpeak = inputText.isEmpty ? "Please enter some text" : inputText

        let speakUtterance = AVSpeechUtterance(string: textToSpeak)
        synth.speak(speakUtterance)
    }
}

extension ViewController: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        // Speaking is done; enable the speech UI for the next round
        speakButton.isEnabled = true
        view.backgroundColor = .lightGray
        textField.text = ""
    }
}

Hope that gives you a clue, Case.


