ARKit – Viewport Size vs Real Screen Resolution

Why is there a difference?

Let's explore some important display characteristics of your iPhone 7:

  • a resolution of 750 (W) x 1,334 (H) pixels (16 : 9)
  • a viewport resolution of 375 (W) x 667 (H) points (16 : 9)

Because mobile devices with the same screen size can have very different resolutions, developers often use viewports when creating 3D scenes or mobile-friendly webpages. In the VR and AR fields, the lower the resolution is, the faster the renderer works and the considerably lighter the CPU/GPU load is. The idea of viewports is mainly used for mobile devices; on macOS, screen resolution and viewport resolution are identical.
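
You can check both values on a device with plain UIKit – a minimal sketch, nothing ARKit-specific:

import UIKit

// Viewport size in points vs native screen resolution in pixels on the current device
let viewportSize = UIScreen.main.bounds.size        // e.g. 375 x 667 on iPhone 7
let screenSize = UIScreen.main.nativeBounds.size    // e.g. 750 x 1334 on iPhone 7
print(viewportSize, screenSize)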


On an iPhone, as on other mobile devices, the viewport is a scaled-down version of the resolution (usually 2 or 3 times smaller along each axis) that allows 3D scene viewports or websites to be displayed more consistently across different devices and (very important!) with lower energy consumption. Viewports are often more standardized and smaller than resolution sizes.

Snapshots almost always reflect the real screen resolution:

let screenSize = sceneView.snapshot().size

/* 750 x 1,334 */
/* iPhone 7 rez */

The SceneView's size often reflects the standardized viewport resolution (four times fewer pixels than the spec resolution):

let viewportSize = sceneView.bounds.size 

/* 375 x 667 */
/* ViewPort rez */

Viewport resolution (1/4 of the pixel count) vs screen resolution in iPhone 7 – schematic depiction.

Viewport size and its real layout on a mobile device – real depiction.

Additional reference: iPhone X has a viewport resolution (375 x 812) that is three times smaller along each axis – nine times fewer pixels – than its screen resolution (1125 x 2436).


What coordinates are used in Hit-Testing?

Hit-testing and ray-casting use viewport coordinates.

Let's make three taps using a hit-testing method – the first tap in the upper-left corner (near x=0 and y=0), the second in the center of the screen, and the third in the lower-right corner (near x=667 and y=375):

let point: CGPoint = gestureRecognize.location(in: sceneView)

print(point)


The iPhone 7 viewport coordinates are printed in the console.

Quod Erat Demonstrandum!
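
For completeness, here's a minimal sketch of a tap handler that feeds those viewport coordinates into a raycast – the handleTap selector name is a placeholder, and it assumes the sceneView outlet from the snippets above:

import ARKit

// Inside your view controller:
@objc func handleTap(_ gestureRecognize: UITapGestureRecognizer) {

    // The tap location is expressed in viewport (point) coordinates, not pixels
    let point: CGPoint = gestureRecognize.location(in: sceneView)
    print(point)

    // Raycast from that viewport point against an estimated horizontal plane
    if let query = sceneView.raycastQuery(from: point,
                                      allowing: .estimatedPlane,
                                     alignment: .horizontal),
       let result = sceneView.session.raycast(query).first {
        print(result.worldTransform.columns.3)    // hit position in world space
    }
}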

Camera Intrinsics Resolution vs Real Screen Resolution

Intrinsic 3x3 matrix

The intrinsic camera matrix converts between the 2D camera plane and 3D world coordinate space. Here's a decomposition of an intrinsic matrix, where:

[fx,  s, xO]
[ 0, fy, yO]
[ 0,  0,  1]

  • fx and fy are the focal length in pixels
  • xO and yO are the principal point offset in pixels
  • s is the axis skew

According to Apple Documentation:

The values fx and fy are the pixel focal length, and are identical for square pixels. The values ox and oy are the offsets of the principal point from the top-left corner of the image frame. All values are expressed in pixels.

So let's examine what your data is:

[1569,    0,  931]
[   0, 1569,  723]
[   0,    0,    1]
  • fx=1569, fy=1569
  • xO=931, yO=723
  • s=0
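
As a side note, a pixel focal length like this also gives you the camera's field of view – a minimal sketch, assuming a 1920-pixel-wide capturedImage (the width shown later in this answer):

import Foundation

let fx = 1569.0          // pixel focal length from the matrix above
let imageWidth = 1920.0  // assumed capturedImage width in pixels

// Standard pinhole-camera relation: FOV = 2 * atan(w / 2f)
let horizontalFOV = 2 * atan(imageWidth / (2 * fx)) * 180 / Double.pi
print(horizontalFOV)     // ≈ 63 degrees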

To convert a known focal length in pixels to mm use the following expression:

F(mm) = F(pixels) * SensorWidth(mm) / ImageWidth(pixels)
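
For example – a minimal sketch where the sensor width is an assumed, approximate value used purely for illustration (look up the real one for your device):

let focalLengthInPixels = 1569.0
let sensorWidthInMM = 4.8           // assumption, not an official Apple value
let imageWidthInPixels = 1920.0     // assumed capturedImage width

let focalLengthInMM = focalLengthInPixels * sensorWidthInMM / imageWidthInPixels
print(focalLengthInMM)              // ≈ 3.9 mm – the physical focal length, not the 35mm-equivalent one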


Points Resolution vs Pixels Resolution

Look at this post to find out what point resolution and pixel resolution are.

Let's explore what is what using iPhone X data.

@IBOutlet var arView: ARSCNView!

DispatchQueue.main.asyncAfter(deadline: .now() + 1.0) {

    let imageRez = (self.arView.session.currentFrame?.camera.imageResolution)!
    let intrinsics = (self.arView.session.currentFrame?.camera.intrinsics)!
    let viewportSize = self.arView.frame.size
    let screenSize = self.arView.snapshot().size

    print(imageRez as Any)
    print(intrinsics as Any)
    print(viewportSize as Any)
    print(screenSize as Any)
}

Apple Documentation:

The imageResolution instance property describes the image in the capturedImage buffer, which contains image data in the camera device's native sensor orientation. To convert image coordinates to match a specific display orientation of that image, use the viewMatrix(for:) or projectPoint(_:orientation:viewportSize:) method.

iPhone X imageRez (aspect ratio is 4:3) – these values correspond to the camera sensor:

(1920.0, 1440.0)

iPhone X intrinsics:

simd_float3x3([[1665.0, 0.0, 0.0],     // first column
[0.0, 1665.0, 0.0], // second column
[963.8, 718.3, 1.0]]) // third column

iPhone X viewportSize (one ninth of screenSize's pixel count):

(375.0, 812.0)

iPhone X screenSize (resolution declared in tech spec):

(1125.0, 2436.0)

Note that there's no snapshot() method for RealityKit's ARView.

ARKit on different iPhones

Of course, different iPhone models have different resolutions, and there's a big difference between an iPhone's screen size and its viewport size. Look at the table below: in some cases the viewport has 1/9 of the screen's pixel count, sometimes 1/4, and some models have identical screen size and viewport size. A runtime check follows the table.

Device               Screen Size      Viewport Size
iPhone 12 Pro Max    1284 x 2778      428 x 926
iPhone X             1125 x 2436      375 x 812
iPhone SE 2           750 x 1334      375 x 667
iPhone 8 Plus        1080 x 1920      414 x 736
iPhone 6s             750 x 1334      375 x 667
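
To verify the per-axis ratio from the table above at runtime, you can use UIKit's scale factor – a minimal sketch:

import UIKit

// nativeScale is the pixels-per-point factor, i.e. the per-axis ratio in the table above
let perAxisRatio = UIScreen.main.nativeScale       // 3.0 on iPhone X, 2.0 on iPhone SE 2
let pixelCountRatio = perAxisRatio * perAxisRatio  // 9 or 4, respectively
print(perAxisRatio, pixelCountRatio)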

What is the real Focal Length of the camera used in RealityKit?

ARKit and RealityKit definitely have identical focal length values, because these two frameworks are supposed to work together. And although there's no focal length instance property for ARView at the moment, you can easily print the focal length of an ARSCNView or SCNView in the console.

@IBOutlet var sceneView: ARSCNView!

print(sceneView.pointOfView?.camera?.focalLength as Any)

However, take into account that the ARKit, RealityKit and SceneKit frameworks don't use the screen resolution – they use the viewport size instead. The scale factor between an iPhone's viewport and its screen is usually 1/2 or 1/3.


Intrinsic Camera Matrix


As you said, in ARKit there's a 3x3 camera matrix that allows you to convert between the 2D camera plane and 3D world coordinate space.

var intrinsics: simd_float3x3 { get }

Using this matrix you can print 4 important parameters: fx, fy, ox and oy. Let's print them all:

DispatchQueue.main.asyncAfter(deadline: .now() + 2.0) {

    print(" Focal Length: \(self.sceneView.pointOfView?.camera?.focalLength)")
    print("Sensor Height: \(self.sceneView.pointOfView?.camera?.sensorHeight)")
    // SENSOR HEIGHT IS IN mm

    let frame = self.sceneView.session.currentFrame

    // INTRINSICS MATRIX
    print("Intrinsics fx: \(frame?.camera.intrinsics.columns.0.x)")
    print("Intrinsics fy: \(frame?.camera.intrinsics.columns.1.y)")
    print("Intrinsics ox: \(frame?.camera.intrinsics.columns.2.x)")
    print("Intrinsics oy: \(frame?.camera.intrinsics.columns.2.y)")
}

For iPhone X the following values are printed (the focal length comes out at 20.78 mm, and the intrinsics match the matrix shown earlier).

When you apply your formulas you'll get an implausible result (read on to find out why).


About Wide-Angle Lens and OIS

The iPhone X has two rear camera sensors, and both those modules are equipped with an optical image stabilizer (OIS). The wide-angle lens offers a 28-millimeter focal length and an aperture of f/1.8, while the telephoto lens is 56 millimeters and f/2.4.

ARKit and RealityKit use the wide-angle rear module. In the iPhone X case it's a 28 mm lens. But what about the printed value of focal length = 20.78 mm, huh? I believe that the discrepancy between 28 mm and 20.78 mm is due to the fact that video stabilization eats up about 25% of the total image area. This is done in order to eventually get a focal length of 28 mm for the final image.
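
A quick back-of-the-envelope check of that hypothesis – the numbers come from this answer, nothing official:

let finalFocalLength = 28.0        // 35mm-equivalent value of the wide-angle module
let reportedFocalLength = 20.78    // value printed by SceneKit on iPhone X

// The ratio of the two focal lengths equals the fraction of the frame kept after cropping
print(reportedFocalLength / finalFocalLength)    // ≈ 0.74, so roughly a quarter is cropped away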

The red frame in the illustration marks the cropping margin at the stabilization stage.


Conclusion

This is my own conclusion. I didn't find any reference materials on that subject, so do not judge me strictly if my opinion is wrong (I admit it may be).

We all know that camera shake is magnified as focal length increases. So, the lower the focal length, the less camera shake there is. This is very important for jitter-free, high-quality world tracking in an AR app. Also, I firmly believe that optical image stabilizers work much better with lower focal length values. Hence, it's no surprise that ARKit engineers chose a lower focal length for the AR experience (capturing a wider image area), and then, after stabilization, we get a modified version of the image, as if it had a focal length of 28 mm.

So, in my humble opinion, it makes no sense to calculate a REAL focal length for RealityKit and ARKit 'cause there is a "FAKE" focal length already implemented by Apple engineers for a robust AR experience.

Project Point method: converting rawFeaturePoint to Screen Space

Theory

func projectPoint(_ point: simd_float3, 
                  orientation: UIInterfaceOrientation,
                  viewportSize: CGSize) -> CGPoint

The Xcode tip says the instance method projectPoint(...) returns the projection of the specified point into a 2D pixel coordinate space whose origin is in the upper left corner and whose size matches that of the viewportSize parameter.

The difference between screen size and viewport size is described here and here (I see you said you know about that).

Solution

The trick is that the 2D point is projected correctly only when the 3D point is inside the camera's frustum coverage area – it's no secret that the distance is calculated according to the Pythagorean theorem...

import ARKit

extension ViewController: ARSessionDelegate {

    func session(_ session: ARSession, didUpdate frame: ARFrame) {

        let point = simd_float3(0.3, 0.5, -2.0)

        if self.sceneView.isNode(self.sphere,
                 insideFrustumOf: self.sceneView.pointOfView!) {

            let pp = frame.camera.projectPoint(point,
                                  orientation: .portrait,
                                 viewportSize: CGSize(width: 375, height: 812))

            self.label_A.text = String(format: "%.2f", pp.x / 375)
            self.label_B.text = String(format: "%.2f", pp.y / 812)
        }
    }
}

As you can see, outputting values in normalized coordinates (0.00 ... 1.00) is very simple:

class ViewController: UIViewController {

    @IBOutlet var sceneView: ARSCNView!
    @IBOutlet var label_A: UILabel!
    @IBOutlet var label_B: UILabel!
    let sphere = SCNNode(geometry: SCNSphere(radius: 0.1))

    override func viewDidLoad() {
        super.viewDidLoad()

        sceneView.session.delegate = self
        sceneView.scene = SCNScene()

        sphere.geometry?.firstMaterial?.diffuse.contents = UIColor.green
        sphere.position = SCNVector3(0.3, 0.5, -2.0)
        sceneView.scene.rootNode.addChildNode(sphere)

        let config = ARWorldTrackingConfiguration()
        sceneView.session.run(config)
    }
}

I used iPhone X parameters – vertical viewportSize is 375 x 812.

Transforming ARFrame#capturedImage to view size

This turned out to be quite complicated: displayTransform(for:viewportSize:) expects normalized image coordinates, it seems you have to flip the coordinates only in portrait mode, and the image needs to be not only transformed but also cropped. The following code does the trick for me. Suggestions on how to improve this would be appreciated.

guard let frame = session.currentFrame else { return }
let imageBuffer = frame.capturedImage

let imageSize = CGSize(width: CVPixelBufferGetWidth(imageBuffer), height: CVPixelBufferGetHeight(imageBuffer))
let viewPort = sceneView.bounds
let viewPortSize = sceneView.bounds.size

let interfaceOrientation : UIInterfaceOrientation
if #available(iOS 13.0, *) {
interfaceOrientation = self.sceneView.window!.windowScene!.interfaceOrientation
} else {
interfaceOrientation = UIApplication.shared.statusBarOrientation
}

let image = CIImage(cvImageBuffer: imageBuffer)

// The camera image doesn't match the view rotation and aspect ratio
// Transform the image:

// 1) Convert to "normalized image coordinates"
let normalizeTransform = CGAffineTransform(scaleX: 1.0/imageSize.width, y: 1.0/imageSize.height)

// 2) Flip the Y axis (for some mysterious reason this is only necessary in portrait mode)
let flipTransform = (interfaceOrientation.isPortrait) ? CGAffineTransform(scaleX: -1, y: -1).translatedBy(x: -1, y: -1) : .identity

// 3) Apply the transformation provided by ARFrame
// This transformation converts:
// - From Normalized image coordinates (Normalized image coordinates range from (0,0) in the upper left corner of the image to (1,1) in the lower right corner)
// - To view coordinates ("a coordinate space appropriate for rendering the camera image onscreen")
// See also: https://developer.apple.com/documentation/arkit/arframe/2923543-displaytransform

let displayTransform = frame.displayTransform(for: interfaceOrientation, viewportSize: viewPortSize)

// 4) Convert to view size
let toViewPortTransform = CGAffineTransform(scaleX: viewPortSize.width, y: viewPortSize.height)

// Transform the image and crop it to the viewport
let transformedImage = image.transformed(by: normalizeTransform.concatenating(flipTransform).concatenating(displayTransform).concatenating(toViewPortTransform)).cropped(to: viewPort)
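
To actually display or save the result, the CIImage still has to be rendered – a minimal sketch, assuming you create and reuse a CIContext somewhere in your class:

import CoreImage
import UIKit

let context = CIContext()    // create once and reuse in real code

if let cgImage = context.createCGImage(transformedImage, from: transformedImage.extent) {
    let uiImage = UIImage(cgImage: cgImage)
    // e.g. assign uiImage to a UIImageView that overlays or replaces the ARSCNView
}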

