How to Convert a 3D Point into 2D Perspective Projection

How to convert a 3D point into 2D perspective projection?

The standard way to represent 2D/3D transformations nowadays is by using homogeneous coordinates. [x,y,w] for 2D, and [x,y,z,w] for 3D. Since you have three axes in 3D as well as translation, that information fits perfectly in a 4x4 transformation matrix. I will use column-major matrix notation in this explanation. All matrices are 4x4 unless noted otherwise.

The stages from 3D points to a rasterized point, line or polygon look like this:

  1. Transform your 3D points with the inverse camera matrix, followed by whatever transformations they need. If you have surface normals, transform them as well, but with w set to zero, as you don't want to translate normals. The matrix you transform normals with must be isotropic; scaling and shearing make the normals malformed.
  2. Transform the point with a clip space matrix. This matrix scales x and y with the field-of-view and aspect ratio, scales z by the near and far clipping planes, and plugs the 'old' z into w. After the transformation, you should divide x, y and z by w. This is called the perspective divide.
  3. Now your vertices are in clip space, and you want to perform clipping so you don't render any pixels outside the viewport bounds. Sutherland-Hodgman clipping is the most widespread clipping algorithm in use.
  4. Transform x and y with respect to w and the half-width and half-height. Your x and y coordinates are now in viewport coordinates. w is discarded, but 1/w and z are usually saved, because 1/w is required for perspective-correct interpolation across the polygon surface, and z is stored in the z-buffer and used for depth testing.

This stage is the actual projection, because z isn't used as a component in the position any more.

The algorithms:

Calculation of field-of-view

This calculates the field-of-view scale factor. Whether tan takes radians or degrees is irrelevant, but angle must match. Notice that tan(angle/2) diverges as angle nears 180 degrees, so the result collapses toward zero there; this is a singularity, as it is impossible to have a focal point that wide. If you want numerical stability, keep angle less than or equal to 179 degrees.

fov = 1.0 / tan(angle/2.0)

Also notice that 1.0 / tan(45°) = 1. Someone else here suggested simply dividing by z; the result here is clear: you would get a 90-degree FOV and an aspect ratio of 1:1. Using homogeneous coordinates like this has several other advantages as well; for example, we can perform clipping against the near and far planes without treating it as a special case.
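To see those numbers concretely, here is a tiny standalone sketch (my own code, not part of the answer above) that prints the scale for a range of angles:

#include <cmath>
#include <cstdio>

int main()
{
    const float deg2rad = 3.14159265f / 180.0f;
    /* 90 degrees gives a scale of exactly 1; the scale collapses toward
       zero as the angle approaches the 180-degree singularity. */
    for(float angle = 30.0f; angle <= 170.0f; angle += 20.0f){
        float scale = 1.0f / std::tan(angle * 0.5f * deg2rad);
        std::printf("angle = %5.1f deg -> scale = %f\n", angle, scale);
    }
    return 0;
}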

Calculation of the clip matrix

This is the layout of the clip matrix. aspectRatio is Width/Height, so the scale for the x component is the y scale divided by the aspect ratio (this keeps pixels square). far and near are the distances to the far and near clipping planes.

[ fov / aspectRatio ][  0  ][           0             ][ 0 ]
[         0         ][ fov ][           0             ][ 0 ]
[         0         ][  0  ][  (far+near)/(far-near)  ][ 1 ]
[         0         ][  0  ][ (2*near*far)/(near-far) ][ 0 ]

Screen Projection

After clipping, this is the final transformation to get our screen coordinates.

new_x = (x * Width ) / (2.0 * w) + halfWidth;
new_y = (y * Height) / (2.0 * w) + halfHeight;

Trivial example implementation in C++

#include <vector>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <algorithm>

struct Vector
{
    Vector() : x(0), y(0), z(0), w(1) {}
    Vector(float a, float b, float c) : x(a), y(b), z(c), w(1) {}

    /* Assume proper operator overloads here, with vectors and scalars */

    float Length() const
    {
        return std::sqrt(x*x + y*y + z*z);
    }

    Vector Unit() const
    {
        const float epsilon = 1e-6f;
        float mag = Length();
        if(mag < epsilon){
            throw std::out_of_range("Vector::Unit: near-zero length");
        }
        return *this / mag;
    }

    float x, y, z, w;
};

inline float Dot(const Vector& v1, const Vector& v2)
{
    return v1.x*v2.x + v1.y*v2.y + v1.z*v2.z;
}

class Matrix
{
public:
    Matrix() : data(16)
    {
        Identity();
    }

    void Identity()
    {
        std::fill(data.begin(), data.end(), 0.0f);
        data[0] = data[5] = data[10] = data[15] = 1.0f;
    }

    float& operator[](std::size_t index)
    {
        if(index >= 16){
            throw std::out_of_range("Matrix: index out of range");
        }
        return data[index];
    }

    /* const overload, needed because operator* reads from a const Matrix */
    float operator[](std::size_t index) const
    {
        if(index >= 16){
            throw std::out_of_range("Matrix: index out of range");
        }
        return data[index];
    }

    Matrix operator*(const Matrix& m) const
    {
        Matrix dst;
        /* dst is identity-constructed; clear it so we accumulate from zero */
        std::fill(dst.data.begin(), dst.data.end(), 0.0f);
        for(int y=0; y<4; ++y){
            const int col = y*4;
            for(int x=0; x<4; ++x){
                for(int i=0; i<4; ++i){
                    dst[x+col] += m[i+col] * data[x+i*4];
                }
            }
        }
        return dst;
    }

    Matrix& operator*=(const Matrix& m)
    {
        *this = (*this) * m;
        return *this;
    }

    /* The interesting stuff */
    void SetupClipMatrix(float fov, float aspectRatio, float near, float far)
    {
        Identity();
        float f = 1.0f / std::tan(fov * 0.5f);
        data[0] = f / aspectRatio; /* x scale: the fov scale divided by aspect */
        data[5] = f;
        data[10] = (far+near) / (far-near);
        data[11] = 1.0f; /* this 'plugs' the old z into w */
        data[14] = (2.0f*near*far) / (near-far);
        data[15] = 0.0f;
    }

    std::vector<float> data;
};

inline Vector operator*(const Vector& v, const Matrix& m)
{
    Vector dst;
    dst.x = v.x*m[0] + v.y*m[4] + v.z*m[8]  + v.w*m[12];
    dst.y = v.x*m[1] + v.y*m[5] + v.z*m[9]  + v.w*m[13];
    dst.z = v.x*m[2] + v.y*m[6] + v.z*m[10] + v.w*m[14];
    dst.w = v.x*m[3] + v.y*m[7] + v.z*m[11] + v.w*m[15];
    return dst;
}

typedef std::vector<Vector> VecArr;

VecArr ProjectAndClip(int width, int height, float near, float far, const VecArr& vertex)
{
    const float halfWidth  = (float)width  * 0.5f;
    const float halfHeight = (float)height * 0.5f;
    const float aspect     = (float)width / (float)height;
    const float pi         = 3.14159265358979f; /* M_PI is not guaranteed by <cmath> */
    Matrix clipMatrix;
    VecArr dst;
    clipMatrix.SetupClipMatrix(60.0f * (pi / 180.0f), aspect, near, far);
    /* Here, in clip space, you perform Sutherland-Hodgman clipping by
       checking if the x, y and z components are inside the range of [-w, w].
       One checks each vector component separately against each plane. Per-vertex
       data like colours, normals and texture coordinates need to be linearly
       interpolated for clipped edges to reflect the change. If the edge (v0,v1)
       is tested against the positive x plane, and v1 is outside, the interpolant
       becomes: (v1.x - w) / (v1.x - v0.x)
       I skip this stage altogether to be brief. */
    for(VecArr::const_iterator i = vertex.begin(); i != vertex.end(); ++i){
        dst.push_back((*i) * clipMatrix);
    }

    /* TODO: Clipping here */

    /* The perspective divide and the viewport mapping happen together here;
       w is left intact, as 1/w is needed later for perspective-correct
       interpolation. */
    for(VecArr::iterator i = dst.begin(); i != dst.end(); ++i){
        i->x = (i->x * (float)width)  / (2.0f * i->w) + halfWidth;
        i->y = (i->y * (float)height) / (2.0f * i->w) + halfHeight;
    }
    return dst;
}
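And a minimal usage sketch, assuming the elided operator overloads above are filled in; the viewport size, plane distances and vertex positions here are made up:

#include <cstdio>

int main()
{
    VecArr points;
    points.push_back(Vector(0.0f, 0.0f, 5.0f));
    points.push_back(Vector(1.0f, 1.0f, 5.0f));

    /* 640x480 viewport; near/far plane distances picked arbitrarily */
    VecArr screen = ProjectAndClip(640, 480, 0.1f, 100.0f, points);

    for(VecArr::const_iterator i = screen.begin(); i != screen.end(); ++i)
        std::printf("screen: (%f, %f)\n", i->x, i->y);
    return 0;
}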

If you are still pondering this, the OpenGL specification is a really nice reference for the maths involved.
The DevMaster forums at http://www.devmaster.net/ have a lot of nice articles related to software rasterizers as well.

Graphics - equation to convert 3d point to 2d projection

It is called the Perspective Projection, and the formula you seek is just the matrix multiplication found here:

http://en.wikipedia.org/wiki/3D_projection#Perspective_projection

Projecting 3D Points to 2D Points

Old tutorials are great, mostly from the days before T&L and shaders, even before hardware rendering. Start with an understanding of them, e.g. http://www.gamedev.net/page/resources//technical/math-and-physics/3d-matrix-math-demystified-r695 . Then go on to application for a camera: http://www.codeguru.com/cpp/misc/misc/graphics/article.php/c10123/Deriving-Projection-Matrices.htm

Basically, you translate and then rotate the entire 'universe' around the camera. So you have your point: you define a translation (movement) for the camera, then a rotation matrix. You apply them to any 'world content,' i.e. your point. You can then "cheat" and simply divide the x and y values by z to project to the 2d plane, but you should really do calculations that correct for field-of-view properly.
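A minimal sketch of that divide-by-z "cheat" (the names here are mine, purely for illustration):

#include <cstdio>

struct Point2D { float x, y; };

/* Naive perspective: project (x, y, z) onto the z = 1 plane by dividing
   by z. This is equivalent to a 90-degree FOV with a 1:1 aspect ratio. */
Point2D NaiveProject(float x, float y, float z)
{
    Point2D p;
    p.x = x / z; /* assumes z > 0, i.e. the point is in front of the camera */
    p.y = y / z;
    return p;
}

int main()
{
    Point2D p = NaiveProject(4.0f, 2.0f, 10.0f);
    std::printf("(%f, %f)\n", p.x, p.y); /* prints (0.4, 0.2) */
    return 0;
}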

Projecting 3D points to 2D plane

If you have your target point P with coordinates r_P = (x,y,z) and a plane with normal n = (nx,ny,nz), you need to define an origin on the plane, as well as two orthogonal directions for x and y. For example, if your origin is at r_O = (ox,oy,oz) and your two coordinate axes in the plane are defined by e_1 = (ex_1,ey_1,ez_1) and e_2 = (ex_2,ey_2,ez_2), then orthogonality requires that Dot(n,e_1)=0, Dot(n,e_2)=0 and Dot(e_1,e_2)=0 (vector dot product). Note that all the direction vectors should be normalized (magnitude should be one).

Your target point P must obey the equation:

r_P = r_O + t_1*e_1 + t_2*e_2 + s*n

where t_1 and t_2 are your 2D coordinates along e_1 and e_2 and s the normal separation (distance) between the plane and the point.

These scalars are found by projection:

s = Dot(n, r_P-r_O)
t_1 = Dot(e_1, r_P-r_O)
t_2 = Dot(e_2, r_P-r_O)

Example with a plane origin r_O = (-1,3,1) and normal:

n = r_O/|r_O| = (-1/√11, 3/√11, 1/√11)

You have to pick orthogonal directions for the 2D coordinates, for example:

e_1 = (1/√2, 0 ,1/√2)
e_2 = (-3/√22, -2/√22, 3/√22)

such that Dot(n,e_1) = 0 and Dot(n,e_2) = 0 and Dot(e_1, e_2) = 0.

The 2D coordinates of a point P r_P=(1,7,-3) are:

t_1 = Dot(e_1, r_P-r_O) = ( 1/√2,0,1/√2)·( (1,7,-3)-(-1,3,1) ) =  -√2
t_2 = Dot(e_2, r_P-r_O) = (-3/√22, -2/√22, 3/√22)·( (1,7,-3)-(-1,3,1) ) = -26/√22

and the out of plane separation:

s = Dot(n, r_P-r_O) = 6/√11
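If you want to check the arithmetic, here is a small self-contained sketch (my own code, not from the answer) that reproduces the worked example numerically:

#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b)
{
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

Vec3 Sub(const Vec3& a, const Vec3& b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

int main()
{
    const float s11 = std::sqrt(11.0f);
    const float s2  = std::sqrt(2.0f);
    const float s22 = std::sqrt(22.0f);
    Vec3 rO = { -1.0f, 3.0f, 1.0f };
    Vec3 n  = { -1.0f/s11, 3.0f/s11, 1.0f/s11 };
    Vec3 e1 = { 1.0f/s2, 0.0f, 1.0f/s2 };
    Vec3 e2 = { -3.0f/s22, -2.0f/s22, 3.0f/s22 };
    Vec3 rP = { 1.0f, 7.0f, -3.0f };

    Vec3 d = Sub(rP, rO);
    /* Expected: t_1 = -sqrt(2) ~ -1.414, t_2 = -26/sqrt(22) ~ -5.543,
       s = 6/sqrt(11) ~ 1.809 */
    std::printf("t_1 = %f, t_2 = %f, s = %f\n", Dot(e1, d), Dot(e2, d), Dot(n, d));
    return 0;
}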

Projecting a 3D point to a 2D screen position issue

It turns out the 4th method was right; all that was missing was retrieving the value returned by Vector3.Project().

The 3 other methods still gave significantly different results, and I still don't know why they didn't work. If someone knows, I'd appreciate knowing.

3d to 2d point conversion

NOTE:
This is a big wall of text and I completely gloss over a lot of important stuff - but my intention here is just an overview... hopefully some of the terms/concepts here will lead you to better Googling for appropriate chunks on the web.

It helps if you walk your way through "Life as a point":

Here we are, a nice little 3-dimensional point:

var happyPoint = new Point(0, 0, 0);

And here is its buddy, defined in relation to his friend:

var friendlyPoint = new Point(1, 0, 0);

For now, let's call these two points our "model" - we'll use the term "model space" to talk about points within a single three-dimensional structure (like a house, monster, etc).

Models don't live in a vacuum, however... it's usually easier to separate "model space" from "world space" for the sake of model tweaking (otherwise, all your models would need to be in the same scale, have the same orientation, etc, etc, plus trying to work on them in a 3D modelling program would be friggin impossible).

So we'll define a "World Transform" for our "Model" (ok, 2 points is a lame model, but a model it remains).

What is a "World Transform"? Simply put:

  • A world transform W = T X R X S, where
  • T = translation - that is, sliding it along the X, Y, or Z axes
  • R = rotation - turning the model with respect to an axis
  • S = scaling - resizing a model (maintaining all the relative points within) along an axis

We'll take the easy out here, and just define our world transform as the Identity matrix - basically, this means we don't want it to translate, rotate, or scale:

world = [
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
];

I highly recommend you brush up on your matrix math, especially multiplication and Vector->Matrix operations; it's used ALL THE FREAKING TIME in 3D graphics.

So cleverly skipping over the actual matrix multiplication, I'll just tell you that multiplying our "world transform" and our model points just ends up with our model points again (albeit in this fun new 4-dimensional vector representation, which I won't touch here).

So we've got our points, and we've absolutely located them in "space"...now what?

Well, where are we looking at it from? This leads to the concept of View Transformations or Camera Projection - basically, it's just another matrix multiplication - observe:

Say we've got a point X, at...oh, (4 2) or so:

|
|
|
|
|    X
|
------------------------

From the perspective of the origin (0 0), X is at (4 2) - but say we put our camera off to the right?

|
|
|
|
|    X     >-camera
|
------------------------

What is the "position" of X, relative to the camera? Probably something closer to either (0 9) or (9 0), depending on what your camera's "up" and "right" directions are. This is what View transformations are - mapping one set of 3D points to another set of 3D points such that they are "correct" from the perspective of an observer. In your case of a top-down fixed camera, your observer would be some fixed position in the sky, and all the models would be transformed accordingly.
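As a minimal sketch of the idea for a camera that only translates (my own illustrative code; a rotating camera would additionally multiply by the inverse of its rotation matrix):

struct Vec3 { float x, y, z; };

/* View transform for a translation-only camera: expressing a world-space
   point relative to the camera is just a subtraction. */
Vec3 WorldToView(const Vec3& worldPoint, const Vec3& cameraPos)
{
    Vec3 v = { worldPoint.x - cameraPos.x,
               worldPoint.y - cameraPos.y,
               worldPoint.z - cameraPos.z };
    return v;
}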

So let's draw!

Unfortunately, our screen isn't 3D (yet), so first we need to "project" this point onto a 2D surface. Projection is... well, it's basically a mapping that looks like:

(x, y, z) => (x, y)

The number of possible projections is nigh-infinite: for example, we could just shift over the X and Y coordinates by Z:

func(x, y, z) => new point2d(x + z, y + z);

Usually, you want this projection to mimic the projection the human retina does when looking at 3D scenes, so we bring in the concept of a View Projection. There are a few different view projections, like Orthographic, YawPitchRoll-defined, and Perspective/FOV-defined; each of these has a couple of key bits of data you need to properly build the projection.

A Perspective/FOV based projection, for example, needs:

  • The position of your "eyeball" (i.e., the screen)
  • How far into the distance your "eyeball" is capable of focusing (the "far clipping plane")
  • Your angular field of view (i.e., hold your arms out, just at the edges of your peripheral vision)
  • The ratio of width to height for the "lens" you're looking through (typically your screen resolution)

Once you've got these numbers, you create something called a "bounding frustum", which looks a bit like a pyramid with the top lopped off:

\-----------------/
 \               /
  \             /
   \           /
    \         /
     \-------/

Or from the front:

 ___________________
|   _____________   |
|  |             |  |
|  |             |  |
|  |             |  |
|  |             |  |
|  |             |  |
|  |_____________|  |
|___________________|

I won't do the matrix calculations here, since that's all well defined elsewhere - in fact, most libraries have helper methods that'll generate the corresponding matrices for you - but here's roughly how it works:

Let's say your happy little point lies in this frustum:

\-----------------/
 \               /
  \   o<-pt     /
   \           /
    \         /
     \-------/

 ___________________
|   _____________   |
|  |             |  |
|  |             |  |
|o |             |  |
|^---- pt        |  |
|  |             |  |
|  |_____________|  |
|___________________|

Notice it's way off to the side, so far that it's out of the "near clip plane" rectangle - what would it look like if you "looked into" the smaller end of the pyramid?

Much like looking into a Prism (or a lens), the point would be "bent" into view:

 ___________________
|   _____________   |
|  |             |  |
|  |             |  |
|>>>o <- pt is   |  |
|  |   shifted   |  |
|  |             |  |
|  |_____________|  |
|___________________|

Put another way, if you had a bright light behind the frustum, where would the shadows from your points be "cast" upon the near clipping field? (the smaller rectangle) That's all projection is - a mapping of one point to another, in this case, removing the Z component and altering the X and Y accordingly in a way that "makes sense" to our eyes.
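To tie it all together, here is a minimal end-to-end sketch for a single point, assuming a 90-degree FOV, a square 640x640 viewport and made-up coordinates:

#include <cstdio>

int main()
{
    /* A point already in view space (relative to the camera). */
    float x = 4.0f, y = 2.0f, z = 10.0f;

    /* Perspective projection with a fov scale of 1 (90-degree FOV, 1:1
       aspect), i.e. the plain divide-by-z; results land in [-1, 1]. */
    float ndcX = x / z;
    float ndcY = y / z;

    /* Viewport mapping to 640x640; note many APIs also flip y, since
       screen y usually grows downward. */
    float screenX = ndcX * 320.0f + 320.0f;
    float screenY = ndcY * 320.0f + 320.0f;

    std::printf("screen = (%f, %f)\n", screenX, screenY); /* (448, 384) */
    return 0;
}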


