# Projection

Projection equations relate objects in 3D space to their images in 2D space. They form the basis of any camera calibration process.

## Pinhole Projection

One of the most common projection models for perception is the pinhole projection model. The mathematics roughly take the form:

$\begin{bmatrix} x - c_x \\ y - c_y \\ f \end{bmatrix} = \Gamma_o^i \begin{bmatrix} X_o \\ Y_o \\ Z_o \end{bmatrix}$

Where:

• $$f$$ is the focal length
• $$c_x$$ and $$c_y$$ are the principal point offsets from the origin of our image coordinate system
• $$\Gamma_o^i$$ is the extrinsic transformation between object and image space
• $$x$$ and $$y$$ are the image space coordinates (in pixels), and
• $$X_o$$, $$Y_o$$, and $$Z_o$$ are the object space coordinates of our point in the "world"

Often, this equation is collapsed for brevity, and we get something akin to the following:

$\begin{bmatrix} x \\ y \end{bmatrix} = f \begin{bmatrix} X_c/ Z_c \\ Y_c / Z_c \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix}$

Where:

$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \Gamma_o^i \begin{bmatrix} X_o \\ Y_o \\ Z_o \end{bmatrix}$

All the above parameters are included in our calibration model by default.
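
As a rough illustration, the collapsed form above can be sketched in a few lines of Python. This is a minimal sketch and not the calibration model's actual implementation. It assumes that $$\Gamma_o^i$$ can be represented as a 3x3 rotation matrix followed by a translation, and that points in front of the camera have $$Z_c > 0$$. The function names `transform_to_camera` and `project_pinhole` are hypothetical and chosen only for this example.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def transform_to_camera(
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    point_o: Sequence[float],
) -> Vec3:
    """Map an object space point into the camera frame.

    Assumption: Gamma_o^i is modeled as a rotation (3x3, row-major)
    followed by a translation. The source does not specify this form.
    """
    x, y, z = (
        sum(rotation[r][c] * point_o[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    return (x, y, z)


def project_pinhole(point_c: Vec3, f: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a camera-frame point to pixel coordinates with a single focal length f."""
    X_c, Y_c, Z_c = point_c
    # Assumption: the camera looks along +Z, so visible points have Z_c > 0.
    if Z_c <= 0:
        raise ValueError("point must lie in front of the camera (Z_c > 0)")
    return (f * X_c / Z_c + cx, f * Y_c / Z_c + cy)


# Illustrative values only: identity rotation, camera 2 units back along Z.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
point_c = transform_to_camera(identity, (0.0, 0.0, 2.0), (0.5, -0.25, 2.0))
pixel = project_pinhole(point_c, f=800.0, cx=320.0, cy=240.0)
print(pixel)  # (420.0, 190.0)
```

Note that the same $$f$$ scales both image axes, which is the single-focal-length choice this page describes.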

## What Happened to $$f_x$$ and $$f_y$$?

Many computer vision calibration pipelines will model the projection relationship with two focal distances, namely $$f_x$$ and $$f_y$$. Here, we choose not to model focal length in this way. This has a few advantages:

1. Geometrically, there is no justification for multiple focal distances.
2. $$f_x$$ and $$f_y$$ are poorly observable when estimated separately. The two values are highly correlated with each other and tightly coupled to errors in our image observations. We can avoid this problem entirely by estimating only $$f$$.
3. The low observability couples the estimated values of $$f_x$$ and $$f_y$$ to the data set used to estimate them, so they do not generalize as well as a single focal length does. In other words, errors in the estimation of $$f_x$$ and $$f_y$$ make our model inconsistent; we can repeat a calibration multiple times and get different values every time.

If you remain concerned about how we address scale differences in the $$x$$ and $$y$$ dimensions of the image plane, read the documentation on Affinity.