OpenGL - Mouse coordinates to Space coordinates

In a rendering, each mesh of the scene is usually transformed by the model matrix, the view matrix and the projection matrix.

  • Projection matrix:

    The projection matrix describes the mapping from 3D points of a scene to 2D points of the viewport. The projection matrix transforms from view space to clip space, and the coordinates in clip space are transformed to normalized device coordinates (NDC) in the range (-1, -1, -1) to (1, 1, 1) by dividing by the w component of the clip coordinates.

  • View matrix:

    The view matrix describes the direction and position from which the scene is looked at. The view matrix transforms from world space to view (eye) space. In the coordinate system on the viewport, the X-axis points to the right, the Y-axis up and the Z-axis out of the view (note that in a right-handed system the Z-axis is the cross product of the X-axis and the Y-axis).

  • Model matrix:

    The model matrix defines the location, orientation and the relative size of a mesh in the scene. The model matrix transforms the vertex positions of the mesh to world space.

The model matrix looks like this:

( X-axis.x, X-axis.y, X-axis.z, 0 )
( Y-axis.x, Y-axis.y, Y-axis.z, 0 )
( Z-axis.x, Z-axis.y, Z-axis.z, 0 )
( trans.x,  trans.y,  trans.z,  1 )
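As a sketch of how such a matrix can be built in code (the function name and the use of std::array are assumptions, chosen to match the helper types used later in this text): a model matrix from a uniform scale, a rotation around the Z-axis and a translation, in the row layout above.

```cpp
#include <array>
#include <cmath>

using TVec4  = std::array< float, 4 >;
using TMat44 = std::array< TVec4, 4 >;

// Model matrix from a uniform scale, a rotation around the Z-axis by
// `angle` (radians) and a translation, laid out row by row as in the
// scheme above: the first three rows are the scaled, rotated axes,
// the last row is the translation.
TMat44 ModelMatrix( float scale, float angle, float tx, float ty, float tz )
{
    float c = cos( angle ), s = sin( angle );
    return TMat44{
        TVec4{  c * scale, s * scale, 0.0f,  0.0f },  // X-axis
        TVec4{ -s * scale, c * scale, 0.0f,  0.0f },  // Y-axis
        TVec4{  0.0f,      0.0f,      scale, 0.0f },  // Z-axis
        TVec4{  tx,        ty,        tz,    1.0f }   // translation
    };
}
```

Each row holds one axis of the oriented, scaled coordinate system of the mesh; the last row places it in the world.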


View

On the viewport the X-axis points to the right, the Y-axis up and the Z-axis out of the view (note that in a right-handed system the Z-axis is the cross product of the X-axis and the Y-axis).


The code below defines a matrix that encapsulates the steps necessary to look at the scene:

  • converting model coordinates into viewport coordinates,
  • rotating to look in the direction of the view,
  • moving to the eye position.

The following code does the same as gluLookAt or glm::lookAt does:

#include <array>
#include <cmath>

using TVec3  = std::array< float, 3 >;
using TVec4  = std::array< float, 4 >;
using TMat44 = std::array< TVec4, 4 >;

TVec3 Cross( TVec3 a, TVec3 b ) { return { a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0] }; }
float Dot( TVec3 a, TVec3 b ) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
void Normalize( TVec3 &v )
{
    float len = sqrt( v[0]*v[0] + v[1]*v[1] + v[2]*v[2] );
    v[0] /= len; v[1] /= len; v[2] /= len;
}

TMat44 Camera::LookAt( const TVec3 &pos, const TVec3 &target, const TVec3 &up )
{
    TVec3 mz = { pos[0] - target[0], pos[1] - target[1], pos[2] - target[2] };
    Normalize( mz );
    TVec3 my = { up[0], up[1], up[2] };
    TVec3 mx = Cross( my, mz );
    Normalize( mx );
    my = Cross( mz, mx );

    TMat44 v{
        TVec4{ mx[0], my[0], mz[0], 0.0f },
        TVec4{ mx[1], my[1], mz[1], 0.0f },
        TVec4{ mx[2], my[2], mz[2], 0.0f },
        TVec4{ -Dot( mx, pos ), -Dot( my, pos ), -Dot( mz, pos ), 1.0f }
    };

    return v;
}


Projection

The projection matrix describes the mapping from 3D points of a scene to 2D points of the viewport. It transforms from eye space to clip space, and the coordinates in clip space are transformed to normalized device coordinates (NDC) by dividing by the w component of the clip coordinates. The NDC are in the range (-1, -1, -1) to (1, 1, 1).
All geometry outside of the NDC volume is clipped.

The objects between the near plane and the far plane of the camera frustum are mapped to the range (-1, 1) of the NDC.



Orthographic Projection

With orthographic projection, the coordinates in eye space are linearly mapped to normalized device coordinates.


Orthographic Projection Matrix:

r = right, l = left, b = bottom, t = top, n = near, f = far 

2/(r-l)       0             0             0
0             2/(t-b)       0             0
0             0             -2/(f-n)      0
-(r+l)/(r-l)  -(t+b)/(t-b)  -(f+n)/(f-n)  1
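The matrix above can be set up directly in code. A minimal sketch of what glOrtho computes (the function name and the std::array types are assumptions; the row layout matches the scheme above):

```cpp
#include <array>

using TVec4  = std::array< float, 4 >;
using TMat44 = std::array< TVec4, 4 >;

// Orthographic projection matrix as computed by glOrtho, in the row
// layout used throughout this text.
TMat44 Ortho( float l, float r, float b, float t, float n, float f )
{
    return TMat44{
        TVec4{ 2.0f/(r-l),    0.0f,          0.0f,          0.0f },
        TVec4{ 0.0f,          2.0f/(t-b),    0.0f,          0.0f },
        TVec4{ 0.0f,          0.0f,          -2.0f/(f-n),   0.0f },
        TVec4{ -(r+l)/(r-l),  -(t+b)/(t-b),  -(f+n)/(f-n),  1.0f }
    };
}
```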



Perspective Projection

At Perspective Projection the projection matrix describes the mapping from 3D points in the world, as they are seen from a pinhole camera, to 2D points of the viewport.
The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).


Perspective Projection Matrix:

r = right, l = left, b = bottom, t = top, n = near, f = far

2*n/(r-l)    0            0             0
0            2*n/(t-b)    0             0
(r+l)/(r-l)  (t+b)/(t-b)  -(f+n)/(f-n)  -1
0            0            -2*f*n/(f-n)  0

where:

a  = w / h
ta = tan( fov_y / 2 )

2 * n / (r-l) = 1 / (ta * a)
2 * n / (t-b) = 1 / ta

If the projection is symmetric, where the line of sight is in the center of the view port and the field of view is not displaced, then the matrix can be simplified:

1/(ta*a)  0     0             0
0         1/ta  0             0
0         0     -(f+n)/(f-n)  -1
0         0     -2*f*n/(f-n)  0
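As a quick sanity check, the two diagonal scale terms of this simplified matrix can be evaluated directly from the field of view and the aspect ratio (the helper names are assumptions): for fov_y = 90°, ta = tan(45°) = 1, so the y scale is 1.

```cpp
#include <cmath>

// The two non-trivial diagonal entries of the symmetric perspective
// matrix, computed directly from the field of view (in degrees) and
// the aspect ratio, as in the formulas above.
float PerspectiveScaleY( float fov_y_deg )
{
    // 1 / tan( fov_y / 2 )
    return 1.0f / tan( fov_y_deg * 3.14159265f / 180.0f / 2.0f );
}

float PerspectiveScaleX( float fov_y_deg, float aspect )
{
    // 1 / (tan( fov_y / 2 ) * aspect)
    return PerspectiveScaleY( fov_y_deg ) / aspect;
}
```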



The following function will calculate the same projection matrix as gluPerspective does:

#include <array>
#include <cmath>

const float cPI = 3.14159265f;
float ToRad( float deg ) { return deg * cPI / 180.0f; }

using TVec4  = std::array< float, 4 >;
using TMat44 = std::array< TVec4, 4 >;

TMat44 Perspective( float fov_y, float aspect, float near_plane, float far_plane )
{
    float fn  = far_plane + near_plane;
    float f_n = far_plane - near_plane;
    float r   = aspect;
    float t   = 1.0f / tan( ToRad( fov_y ) / 2.0f );

    return TMat44{
        TVec4{ t / r, 0.0f,  0.0f,                                  0.0f },
        TVec4{ 0.0f,  t,     0.0f,                                  0.0f },
        TVec4{ 0.0f,  0.0f, -fn / f_n,                             -1.0f },
        TVec4{ 0.0f,  0.0f, -2.0f * far_plane * near_plane / f_n,   0.0f }
    };
}


3 Solutions to recover view space position in perspective projection

  1. With field of view and aspect

Since the projection matrix is defined by the field of view and the aspect ratio, the view space position can be recovered from the field of view and the aspect ratio, provided that it is a symmetric perspective projection and the normalized device coordinates, the depth, and the near and far planes are known.

Recover the Z distance in view space:

z_ndc = 2.0 * depth - 1.0;
z_eye = 2.0 * n * f / (f + n - z_ndc * (f - n));

Recover the view space position by the XY normalized device coordinates:

ndc_x, ndc_y = xy normalized device coordinates in range from (-1, -1) to (1, 1):

viewPos.x = z_eye * ndc_x * aspect * tanFov;
viewPos.y = z_eye * ndc_y * tanFov;
viewPos.z = -z_eye;
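Put together, the formulas above can be sketched as a small helper (the function and struct names are assumptions; tanFov is tan(fov_y / 2)):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Recover the view space position from the NDC xy, the depth buffer
// value and the parameters of a symmetric perspective projection:
// near plane n, far plane f, tanFov = tan(fov_y / 2) and the aspect ratio.
Vec3 ViewPosFromFov( float ndc_x, float ndc_y, float depth,
                     float n, float f, float tanFov, float aspect )
{
    float z_ndc = 2.0f * depth - 1.0f;
    float z_eye = 2.0f * n * f / (f + n - z_ndc * (f - n));
    return { z_eye * ndc_x * aspect * tanFov,
             z_eye * ndc_y * tanFov,
             -z_eye };
}
```

At depth 0 the recovered point lies on the near plane, at depth 1 on the far plane.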



2. With the projection matrix

The projection parameters, defined by the field of view and the aspect ratio, are stored in the projection matrix. Therefore, for a symmetric perspective projection, the view space position can be recovered using the values from the projection matrix.

Note the relation between projection matrix, field of view and aspect ratio:

prjMat[0][0] = 2*n/(r-l) = 1.0 / (tanFov * aspect)
prjMat[1][1] = 2*n/(t-b) = 1.0 / tanFov

prjMat[2][2] = -(f+n)/(f-n)
prjMat[3][2] = -2*f*n/(f-n)

Recover the Z distance in view space:

A     = prj_mat[2][2];
B     = prj_mat[3][2];
z_ndc = 2.0 * depth - 1.0;
z_eye = B / (A + z_ndc);

Recover the view space position by the XY normalized device coordinates:

viewPos.x = z_eye * ndc_x / prjMat[0][0];
viewPos.y = z_eye * ndc_y / prjMat[1][1];
viewPos.z = -z_eye;
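A corresponding sketch for this variant (the names are assumptions; A and B are the matrix entries noted above, prj00 and prj11 are prjMat[0][0] and prjMat[1][1]):

```cpp
#include <cmath>

struct Vec3f { float x, y, z; };

// Recover the view space position from the NDC xy, the depth buffer
// value and the relevant entries of a symmetric perspective
// projection matrix:
// prj00 = prjMat[0][0], prj11 = prjMat[1][1],
// A = prjMat[2][2], B = prjMat[3][2].
Vec3f ViewPosFromPrjMat( float ndc_x, float ndc_y, float depth,
                         float prj00, float prj11, float A, float B )
{
    float z_ndc = 2.0f * depth - 1.0f;
    float z_eye = B / (A + z_ndc);
    return { z_eye * ndc_x / prj00, z_eye * ndc_y / prj11, -z_eye };
}
```

For n = 1 and f = 100 this gives A = -101/99 and B = -200/99, and the recovered z distance again runs from the near plane at depth 0 to the far plane at depth 1.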



3. With the inverse projection matrix

Of course, the view space position can be recovered with the inverse projection matrix.

mat4 inversePrjMat = inverse( prjMat );
vec4 viewPosH      = inversePrjMat * vec4( ndc_x, ndc_y, 2.0 * depth - 1.0, 1.0 );
vec3 viewPos       = viewPosH.xyz / viewPosH.w;



See further:

  • How to render depth linearly in modern OpenGL with gl_FragCoord.z in fragment shader?
  • Transform the modelMatrix
  • Perspective projection and view matrix: Both depth buffer and triangle face orientation are reversed in OpenGL
  • How to compute the size of the rectangle that is visible to the camera at a given coordinate?
  • How to recover view space position given view space depth value and ndc xy
  • Is it possible to get which surface of a cube will be clicked in OpenGL?

Converting mouse coordinates to OpenGL is erroneously offset

At Perspective Projection the projection matrix describes the mapping from 3D points in the world, as they are seen from a pinhole camera, to 2D points of the viewport.

The viewing volume is a Frustum (a truncated pyramid), where the top of the pyramid is the viewer's position.

If you start a ray from the camera position, then all the points on the ray have the same xy window coordinate (and xy normalized device coordinate), the points have just a different "depth" (z coordinate). The projection of the view ray onto the viewport is a point.

This means your assumption is wrong, the direction "into the screen" is not (0, 0, -1):

// Create directional vector pointing into the screen.
glm::vec3 into_screen = glm::vec3(0, 0, -1);

Note, that would be correct for Orthographic Projection, but it is wrong for Perspective Projection.

The direction depends on the window coordinate. Luckily it does not depend on the depth, and the window coordinate is given by the mouse position.

To find the ray which "hits" the current mouse position, you have to compute 2 points on the ray.

Find the intersection of the ray with the near plane (depth = 0.0) and the far plane (depth = 1.0).

This means the window coordinates of 2 points on a ray that starts at the camera position and goes through the mouse cursor are:

point 1: (mouseX, height-mouseY, 0.0)
point 2: (mouseX, height-mouseY, 1.0)

Adapt GetOGLPos:

glm::vec3 GetOGLPos( int x, int y, float depth )
{
    GLint    viewport[4];
    GLdouble modelview[16];
    GLdouble projection[16];
    GLfloat  winX, winY, winZ;
    GLdouble posX, posY, posZ;

    glGetDoublev( GL_MODELVIEW_MATRIX, modelview );
    glGetDoublev( GL_PROJECTION_MATRIX, projection );
    glGetIntegerv( GL_VIEWPORT, viewport );

    winX = (float)x;
    winY = (float)viewport[3] - (float)y;
    winZ = depth;
    //glReadPixels(x, int(winY), 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &winZ);

    gluUnProject( winX, winY, winZ, modelview, projection, viewport, &posX, &posY, &posZ );

    return glm::vec3( posX, posY, posZ );
}

Define the ray

void passiveMotion( int x, int y )
{
    glm::vec3 clickPositionNear = GetOGLPos( x, y, 0.0 );
    glm::vec3 clickPositionFar  = GetOGLPos( x, y, 1.0 );

    glm::vec3 into_screen = glm::normalize( clickPositionFar - clickPositionNear );

    ray r = ray( clickPositionNear, into_screen );

    // [...]
}


Computing Mouse Position to 3d Space - OpenGL

The computation of normalised_x and normalised_y is wrong. Normalized device coordinates are in range [-1.0, 1.0]:

float normalised_x = 2.0f * (float)mousePosition.x / (float)window.getWidth() - 1.0f;
float normalised_y = 1.0f - 2.0f * (float)mousePosition.y / (float)window.getHeight();

How to get world coordinates from the screen coordinates

I assume that screenPoint.xy is the position of the mouse in window coordinates (pixels), and screenPoint.z is the depth from the depth buffer. You must transform the position to normalized device coordinates. NDC are in the range (-1, -1, -1) to (1, 1, 1):

glm::vec3 fC = screenPoint;
glm::vec3 ndc = glm::vec3(fC.x / 1920.0, 1.0 - fC.y / 1080.0, fC.z) * 2.0 - 1.0;
glm::vec4 worldPosition = finalMatrix * glm::vec4(ndc, 1);

worldPosition is a homogeneous coordinate. You must divide it by its w component to get a Cartesian coordinate (see Perspective divide):

glm::vec3 p = glm::vec3(worldPosition) / worldPosition.w;

See also OpenGL - Mouse coordinates to Space coordinates.

Translating mouse coordinates to model coordinates in OpenGL when rotations are involved

It is not really wrong; you are reading the depth buffer to figure out the window space Z value to use for the reverse projection.

The problem is that there is limited precision available in the depth buffer, and that introduces a bit of inaccuracy. In reality you cannot expect the range of unprojected values to be perfectly [-0.5,0.5]. You are going to have to introduce a small epsilon here, so your effective range would then be something like [-0.5015,0.5015].

You could probably lessen the impact by increasing the precision of your depth buffer and/or decreasing the range between the near and far clip planes. The depth buffer is generally 24-bit fixed-point by default, but a 32-bit fixed-point or floating-point depth buffer might slightly improve your situation. However, you are never going to completely eliminate this problem.
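A sketch of the kind of tolerance check described above (the function name and the epsilon value are assumptions):

```cpp
#include <cmath>

// Compare an unprojected coordinate against the expected model space
// range [-0.5, 0.5] with a small tolerance, to absorb the inaccuracy
// introduced by the limited precision of the depth buffer.
bool InUnitRange( float value, float epsilon = 0.0015f )
{
    return fabs( value ) <= 0.5f + epsilon;
}
```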


