How to make perspective projection matrix in OpenGL manually?

Introduction
The problem with negative W coordinates
Aspect ratio
Vertical field of view
Z buffering issue
Near and far clipping planes

Introduction

As of OpenGL3 all the old stuff: glBegin*, gluPerspective, glVertex, glColor, anything like this, is deprecated or removed. From now, you'll need to roll your own rendering pipeline, and set up your own matrixes. Many tutorials treat projection matrices as black boxes, and usually create these kind of matrixes using external libraries like GLM . If you want to understand how perpective projection matrices work, and want to understand what it means and how to set them up, keep reading. There a lot of things are encoded in this matrix so it's good to know how to set it up correctly.

I assume you are familiar with this vector, matrix stuff and know what homogenous coordinates mean.

Basic perspective

The basic idea of perspective is: if something 2 or 3 times farther it appears 2 or 3 times smaller, that's it. So you need a matrix that calculates $x/z$ and $y/z$. It can be done by simply exchanging $z$ and $w$ coordinates, with a simple matrix like this:

$$ \left( x, y, z, 1 \right) \left( \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array} \right) = \left(x, y, 1, z\right) $$

By turning these homogenous coordinates into regular coordinates, you have $\left(x/z, y/z, 1/z\right)$. These coordinates are the clipping coordinates. That's something like what we want. But it's not the best yet.

The problem with negative W coordinates

But setting the matrix above, your scene may appear upside down or nothing would appear at all. The problem is that by convention $z$ coordinates are negative in the forward direction (in front of the camera). So using the matrix above, it's turned into negative $w$. So it flips the sign of your $x$ and $y$ coordinates and your scene appears upside down.

The other case is that nothing appears at all. That's because how clipping works in OpenGL. To make a fragment appear on screen, its coordinates must be between $-1$ and $1$ in all 3 dimensions. So the following relation needs to be tested:

$$ -1 \le \frac{x}{w} \le 1 $$

Since division is slow, the designers assumed $w$ always positive and multiplied with it:

$$ -w \le x \le w $$

So there is no need to need to do the division, but in this case this test always fails if $w$ is negative, so nothing will appear.

In order to fix this, just change the sign in the fourth column:

$$ \left( x, y, z, 1 \right) \left( \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{array} \right) = \left(x, y, 1, -z\right) $$

This will fix it. In regular coordinates it will be $(-x/z, -y/z, -1/z)$. You can use this matrix for perspective right away, but we are still not there yet.

Aspect ratio

Your application probably runs in a horizontal rectangular window, and everything is elongated along the horizontal axis. Since vertical coordinates run from -1 to 1, from bottom to top, horizontal ones run from -1 to 1 left to right. In order to fix it, you can scale down the $x$ component of the coordinates with the aspect ratio to get correct squares and circles. So our matrix so far is:

$$ \left( \begin{array}{cccc} 1/a & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{array} \right) $$

Vertical field of view

Another thing these perspective setting functions offer is setting the vertical field of view. In our matrix, transforming $(0, 1, -1)$ and $(0, -1, -1)$, result in $(0, 1, 1)$ and $(0, -1, 1)$, not so much change. But these to points will be at the top and bottom of the window, and if you imagine drawing a line to them you see that the angle between them is 90 degrees, so our matrix defines a 90° field of field. In order to alter the field of view, just scale the $x$ and $y$ coordinates. If you scale them up, you reduce the field of view, if you scale it down you increase your field of view.

In terms of angles the scaling factor you need is $cot(\alpha / 2)$. So the cotangent of half of the the FOV. So our matrix so far is:

$$ \left( \begin{array}{cccc} \frac{\mathrm{cot}(\alpha/2)}{a} & 0 & 0 & 0 \\ 0 & \mathrm{cot}(\alpha/2) & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{array} \right) $$

Z buffering issue

If you set this matrix and enable depth buffering, you will see it behaves in a weird way. It does the exact opposite it should do: if you draw a distant object, overdrawing it with a nearby object won't overwrite it. The reason is due to how does the depth buffer work: the initial values of the depth buffers are all 1's. If an incoming fragment has smaller $z$ coordinate than the stored value, it will be drawn and the depth value will be updated. So if you draw something at $z = -4$ the mapped $z$ coordinate will be $1/4 = 0.25$ (as the default $w$ coordinate is 1). If you then draw something nearby at $z = -2$, the corresponding $z$ value will be $0.5$, it's higher so it won't update it. Exact opposite what you look for.

A quick fix is flipping the sign in the last row:

$$ \left( \begin{array}{cccc} \frac{\mathrm{cot}(\alpha/2)}{a} & 0 & 0 & 0 \\ 0 & \mathrm{cot}(\alpha/2) & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \end{array} \right) $$

So with this matrix, the two $z$ values will be $-0.25$ and $-0.5$ respectively, with these values the Z buffer will work.

Near and far clipping planes

Our matrix defines a frustum whose near plane at $z = -1$ and the far plane is at infinite. The depth values range between $-1$ and $0$. That's not very effective distribution, but it's already reasonable enough, if you render objects that are nearby. But you may experience severe Z-fighting even at moderate distances (beyond 100 units).

In practice you probably wouldn't render features beyond a certain distance eg. beyond 1000 units. So it would be a good idea set up the matrix that distributes all depth values inside this region and cull everything that's beyond it.

So we have $1/-z$ at the $z$ clip coordinate, it's $1$ at the near plane, and $0$ at infinite. We can modify the near plane by scaling it, eg.: $2/-z$ is $1$ at $z = -2$. The far plane can be modified by adding or subtracting a constant, eg. $1/-z$ is $0.001$ at $z = -1000$. So subtracting $0.001$ from it will result in $0$.

Since we set $-z$ as $w$ coordinate, we need an expression that's divided by $-z$, and makes scaling a biasing possible. The expression that does it is:

$$ \frac{A + Bz}{-z} = -\frac{A}{z} - B \label{eq:basic_expr} $$

We can use this expression to construct a system of equations, but for desired result: z values at the far clipping plane should evaluate to $1$, z values at the near plane should evaluate to $-1$.

$$ \left\{ \begin{array}{rcl} - \frac{A + Bn}{n} & = & -1 \\ - \frac{A + Bf}{f} & = & 1 \end{array} \right. $$

The solution of these equations is left as an exercise for the reader (or any computer algebra system). The solutions are:

$$ \begin{array}{rcl} A & = & - \frac{2nf}{n-f} \\ B & = & \frac{n+f}{n-f} \end{array} $$

From $\eqref{eq:basic_expr}$ we can see if we change $z$ the value changes monotonously between $-1$ and $1$, so we have the correct solution.

These solutions assume you specify near and far plane with negative values. If you want to use positive values as gluPerspective does, you should simply remove the sign from the $A$ so our final matrix is:

$$ \left( \begin{array}{cccc} \frac{\mathrm{cot}(\alpha/2)}{a} & 0 & 0 & 0 \\ 0 & \mathrm{cot}(\alpha/2) & 0 & 0 \\ 0 & 0 & \frac{n+f}{n-f} & -1 \\ 0 & 0 & \frac{2nf}{n-f} & 0 \end{array} \right) $$

You can put the elements of this matrix in row mayor order into the array in C:

    const GLfloat projection[] =
    {
        FOV/ASPECT, 0, 0, 0,
        0, FOV, 0, 0,
        0, 0, (NEAR+FAR)/(NEAR-FAR), -1,
        0, 0, (2*NEAR*FAR)/(NEAR-FAR), 0 
    };