Converting a Vision VNTextObservation to a String

Apple finally updated Vision to do OCR. Open a playground and drop a couple of test images into its Resources folder. In my case, I called them "demoDocument.jpg" and "demoLicensePlate.jpg".

The new class is called VNRecognizeTextRequest. Dump this in a playground and give it a whirl:

import Vision

enum DemoImage: String {
    case document = "demoDocument"
    case licensePlate = "demoLicensePlate"
}

class OCRReader {
    func performOCR(on url: URL?, recognitionLevel: VNRequestTextRecognitionLevel) {
        guard let url = url else { return }
        let requestHandler = VNImageRequestHandler(url: url, options: [:])

        let request = VNRecognizeTextRequest { (request, error) in
            if let error = error {
                print(error)
                return
            }

            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }

            for currentObservation in observations {
                let topCandidate = currentObservation.topCandidates(1)
                if let recognizedText = topCandidate.first {
                    print(recognizedText.string)
                }
            }
        }
        request.recognitionLevel = recognitionLevel

        try? requestHandler.perform([request])
    }
}

func url(for image: DemoImage) -> URL? {
    return Bundle.main.url(forResource: image.rawValue, withExtension: "jpg")
}

let ocrReader = OCRReader()
ocrReader.performOCR(on: url(for: .document), recognitionLevel: .fast)
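
The DemoImage enum already includes a second test image, and the recognition level is a parameter, so it is easy to compare the two levels; .accurate is slower but tends to do much better on small text such as a license plate. A usage sketch reusing the playground code above:

// Same OCRReader as above; .accurate trades speed for recognition quality.
ocrReader.performOCR(on: url(for: .licensePlate), recognitionLevel: .accurate)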

There's an in-depth discussion of this in the WWDC19 session on text recognition in the Vision framework.

How do I extract specific text from an image using a UITextField in Swift?

You should watch the latest WWDC session on the Vision framework. Basically, from iOS 13,
VNRecognizeTextRequest returns the recognized text along with the bounding box of that text in the image.
The code can be something like this:

func startTextDetection() {
    let request = VNRecognizeTextRequest(completionHandler: self.detectTextHandler)
    request.recognitionLevel = .fast
    self.requests = [request]
}

private func detectTextHandler(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNRecognizedTextObservation] else {
        fatalError("Received invalid observations")
    }

    for lineObservation in observations {
        guard let textLine = lineObservation.topCandidates(1).first else {
            continue
        }

        let words = textLine.string.split { $0.isWhitespace }.map { String($0) }
        for word in words {
            if let wordRange = textLine.string.range(of: word) {
                if let rect = try? textLine.boundingBox(for: wordRange)?.boundingBox {
                    // here you can check if word == textField.text
                    // rect is in image coordinate space, normalized with origin in the bottom left corner
                }
            }
        }
    }
}
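
The code above only builds the request and stashes it in a requests property; nothing in the snippet actually runs it. A minimal sketch of the missing piece, written as another method on the same class and assuming you have a CGImage to analyze (the method name and the CGImage source are assumptions, not part of the original answer):

// Hypothetical driver for the stored requests; assumes `requests: [VNRequest]`
// is the property populated by startTextDetection() above.
func detectText(in cgImage: CGImage) {
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        // The completion handler (detectTextHandler) runs as part of perform(_:).
        try handler.perform(self.requests)
    } catch {
        print("Text detection failed: \(error)")
    }
}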

Detect if face is within a circle

I have also done a similar project for fun: https://github.com/sawin0/FaceDetection

For those who don't want to dive into the repo, I have a quick suggestion: if you have both the circle and the face as CGPath values, you can compare their bounding boxes (which are CGRects) using CGRect's contains(_:).

Here is a code snippet:

let circleBox = circleCGPath.boundingBox
let faceBox = faceRectanglePath.boundingBox

if circleBox.contains(faceBox) {
    print("face is inside the circle")
} else {
    print("face is outside the circle")
}
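
For context, here is a hypothetical sketch of where those two paths might come from: the circle path from the overlay's frame in view coordinates, and the face path from a face bounding box that has already been converted into the same coordinate space (overlayRect and faceRectInView are made-up names, not part of the snippet above):

// Assumed inputs: both rects are in the same (view) coordinate space.
let overlayRect = CGRect(x: 60, y: 200, width: 240, height: 240)
let circleCGPath = CGPath(ellipseIn: overlayRect, transform: nil)

let faceRectInView = CGRect(x: 110, y: 260, width: 120, height: 120)
let faceRectanglePath = CGPath(rect: faceRectInView, transform: nil)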

I hope this helps you and others too.

P.S. If there is any better way to do this then please feel free to share.

Convert Vision boundingBox from VNFaceObservation to rect to draw on image

You have to translate and scale the normalized bounding box according to the size of the view (or image) you are drawing into.
Example

func drawVisionRequestResults(_ results: [VNFaceObservation]) {
    print("face count = \(results.count)")
    previewView.removeMask()

    let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -self.view.frame.height)

    let translate = CGAffineTransform.identity.scaledBy(x: self.view.frame.width, y: self.view.frame.height)

    for face in results {
        // The coordinates are normalized to the dimensions of the processed image, with the origin at the image's lower-left corner.
        let facebounds = face.boundingBox.applying(translate).applying(transform)
        previewView.drawLayer(in: facebounds)
    }
}
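
For completeness, here is a sketch of how drawVisionRequestResults(_:) might be driven by a VNDetectFaceRectanglesRequest; the request setup and the detectFaces(in:) name are assumptions, only the drawing method comes from the answer above:

// Hypothetical caller in the same view controller as drawVisionRequestResults(_:).
func detectFaces(in cgImage: CGImage) {
    let request = VNDetectFaceRectanglesRequest { [weak self] request, error in
        guard error == nil,
              let results = request.results as? [VNFaceObservation] else { return }
        DispatchQueue.main.async {
            self?.drawVisionRequestResults(results)
        }
    }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}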

