Pre-amble

Terms to parse differently than expected (maybe)

Space - a set of co-ordinate axes. A point will need a change-of-basis transformation to move from one distinct space to another.
Origin - (0,0,0) in a 3-dimensional co-ordinate space
Creator - the entity that made the design choices discussed in this write-up

Translation - could be used to mean moving between spaces, but in this write-up it only means moving an object’s position, done through matrix multiplication ONLY by using homogeneous co-ordinates
NDC - abbreviation for normalised device co-ordinates
Clip - the space where we decide what’s inside/outside of our field of view. When used for its common meaning e.g. clipping a hedge, the word will be accompanied by quotation marks
View - used interchangeably to mean view space, the space titled “view”, and view as in what’s in your vision
Components - components of a vector e.g. $x,y,z$ of a 3-dimensional vector
Position - used interchangeably with vector; a point in space

Pre

I recommend 3blue1brown’s videos on linear algebra; your goal is to get an intuitive understanding of matrices, vectors and common transformations so you can imagine some of the transformations below in your head. Then have a look at the below:

Skim the three chapters of https://learnopengl.com/Getting-started from Transformations to Camera to see some of the pictures

https://www.songho.ca/opengl/files/gl_camera02.gif is a visualisation of world-to-view; I haven’t found one for projection yet

Post

Implement something that touches every stage of this imaginary-world-to-monitor pipeline, e.g. the camera implementation from the learnopengl.com book. Alternatively, use a library like SDL or raylib to skip learning the OpenGL GPU API, as it’s quite verbose and isn’t relevant to the world-to-screen pipeline ideas. Here’s mine, https://github.com/jsbaasi/camcpu, which is buggy (doesn’t clip) at the time of writing.

Look at other explanations of the world-to-screen pipeline to form your own perspective, e.g. the overall pipeline as described by Song Ho Ahn; brilliant resources from them.

Intro

I learnt some of this stuff as part of an ESP I made for AssaultCube; the source can be found at achack. I thought it would be helpful to condense some of my linear algebra learnings into this write-up, targeting the 3d-space-to-2d-screen transformations that a game developer may use.

Hopefully others find this useful but, disclaimer, I am not a mathematician or a graphics programmer, just some dude. If you have any questions or would like to discuss further, please reach me through jsbaasi at stormblessed dot fr or jsbaasi on Discord. If there are any corrections to make, please open an MR on GitHub

Model space

Our journey begins with objects in their own “space”. Each object lives in a space whose axes place the object at the origin point (0,0,0).

Simple so far.

Model space to World space

Each object goes through a change-of-basis + translation to be put into the world space. It is now in a space with other objects.

World space to View space

A camera now comes into the mix; it’s also an object in world space. We need to pick a point to view the space from; this point is where we’ll put the camera. Next we ask ourselves: how do we note where the camera is looking?

We can do this with 3 basis vectors: one to describe what up and down mean for the camera, one to describe what left and right mean for the camera, and one to describe what forwards and backwards mean for the camera. $\hat{i}$, $\hat{j}$, $\hat{k}$.

For example, I have placed my camera at (3,3,3), looking at the origin. Its forward vector will be origin - (3,3,3), which is (-3,-3,-3). These forward, up, right vectors are special and known as basis vectors. By convention they are stored as unit vectors, so they must be normalised. Forward becomes
$\vec{forward} = \frac{1}{\sqrt{27}} \begin{pmatrix} -3 \\ -3 \\ -3 \end{pmatrix}$
If we don’t care about roll (like fps games), then we set $\vec{up}$ to $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$, calculate the cross product between $\vec{forward}$ and $\vec{up}$ to get $\vec{right}$ if we’re in the right-handed convention (or $\vec{left}$ otherwise), then calculate the cross product between $\vec{right}$ and $\vec{forward}$ to get the real $\vec{up}$ and not just some vector that points vaguely upwards. We can generate these bases from the camera’s position and the point it is looking at.

Deriving the transformation matrix for world-to-view

We have the basis vectors of the camera and its position within the world space.

In the view space the camera is at the centre. So to move the camera itself from world space to view space, we can imagine that we first translated it to the origin, then rotated it to face whichever axis we determined to be the forward axis (this is sometimes the opposite, where the camera is rotated to face away from where it’s looking)

We need homogeneous co-ordinates to describe a translation as a matrix multiplication; otherwise we’d need to do two operations. It’s your choice whether you want to go to the fourth dimension or do more maths.

So the final matrix will be the opposite of the camera’s current position, THEN the opposite of the rotation the camera had. When other objects are transformed by this matrix, they are translated away by the same distance and rotated so they keep the same orientation relative to whatever rotations the camera had.

View space to Clip space

We now have to decide:

  1. what’s in our field of view
  2. if you’re doing perspective projection (objects further away are smaller) as opposed to orthographic projection (objects are the same size everywhere), what the dimensions of each object are once we’ve applied our perspective projection to it
  3. what co-ordinate conventions we are living by e.g. if it’s OpenGL we look down $-z$, but conventionally $z_{near}$ and $z_{far}$ are given as positive values, so we must negate $z$ at some point

1.

We do this by geometrically describing a box (if orthographic) or a frustum (if perspective) in the direction the camera is facing. Objects outside of this volume are “clipped”. Practically this means that, after the projection matrix multiplication, we check whether the position components of the object meet the criteria $-w \le x, y, z \le w$

2.

We do this by geometrically describing a frustum. Objects within this frustum get mapped to a cube co-ordinate space constrained to [-1, 1]. Thus, objects at the back of the frustum undergo a greater compression compared to objects closer to the front.

Deriving the transformation matrix for view-to-clip

This one is a bit trickier, but we can arrive at the optimised version that everyone has agreed on by summarising our motivations for each of the components, working row by row of the final matrix and setting 0s where there’s no relationship between the components. Obviously you can decide your own route from view to clip to NDC. Such an exercise may help you understand the decisions that were made, as opposed to e.g. dropping to 3 dimensions after the translation in the world-to-view transformation, or never being in 4 dimensions at all and having to do translation as a separate operation.

X and Y

Z

W
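For reference, the standard result this derivation should arrive at (the commonly quoted OpenGL perspective matrix, with vertical field of view $\theta$, aspect ratio $a$, and positive $z_{near}$, $z_{far}$; not derived here) is:

$$ \begin{pmatrix} \frac{1}{a\tan(\theta/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\theta/2)} & 0 & 0 \\ 0 & 0 & -\frac{z_{far}+z_{near}}{z_{far}-z_{near}} & -\frac{2\,z_{far}\,z_{near}}{z_{far}-z_{near}} \\ 0 & 0 & -1 & 0 \end{pmatrix} $$

Note the bottom row copies $-z$ into $w$, which is what turns the later de-homogenize step into a perspective divide.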

Conclusion

| Model space |
| model matrix

| World space |
| view matrix

| View space |
| projection matrix

| Clip space |
| de-homogenize

| NDC space |
| range map

| Screen space |

Misc

Row-major vs Column-major

Column-major (the OpenGL and Vulkan default) is apparently better for shaders, and row-major (the DirectX default) is better for CPU cache lines

$$ \begin{pmatrix} a&b&c&d \\ e&f&g&h \\ i&j&k&l \\ m&n&o&p \end{pmatrix} $$

It’s a way to decide how we flatten our matrix into a 1d memory array: keeping the columns contiguous
$a,e,i,m || b,f,j,n || c,g,k,o || d,h,l,p$
OR keeping the rows contiguous
$a,b,c,d || e,f,g,h || i,j,k,l || m,n,o,p$
The glm maths library mimics glsl vec/matrix maths (some info here), and thus the API for manipulating objects is column-major, e.g. mat[0][3] is the same as my_matrix.first_column.w_component

Radians and degrees

GLM functions expect angles to be given in radians, so wrap your degree values in glm::radians()

Pre- vs post- multiply

GLM functions like glm::translate(input_matrix, translation_vector) post-multiply, meaning the new transformation matrix is multiplied on the right of input_matrix, as in input_matrix * translation_matrix (they also return the result rather than modifying input_matrix in place). So if you wanted to apply a rotation to an object around the object’s $up$ axis, I would build translation_from_origin * rotation_around_up * translation_to_origin, and because each call appends on the right, the GLM calls go in the same left-to-right order:

input_matrix = glm::translate(input_matrix, translation_from_origin);
input_matrix = glm::rotate(input_matrix, angle_around_up, up_axis);
input_matrix = glm::translate(input_matrix, translation_to_origin);

Something to stay aware of.