The projection model is the set of equations that relates objects in 3D space to their images in 2D space. It serves as the basis for any camera calibration process.

Pinhole Projection

One of the most common projection models for perception is the pinhole projection model. The mathematics roughly take the form:

\[ \begin{bmatrix} x - c_x \\ y - c_y \\ f \end{bmatrix} = \Gamma_o^i \begin{bmatrix} X_o \\ Y_o \\ Z_o \end{bmatrix} \]


  • \(f\) is the focal length
  • \(c_x\) and \(c_y\) are the principal point offsets from our image coordinate system
  • \(\Gamma_o^i\) is the extrinsic transformation between object and image space
  • \(x\) and \(y\) are the image space coordinates (in pixels), and
  • \(X_o\), \(Y_o\), and \(Z_o\) are the object space coordinates of our point in the "world"

Often, this equation is collapsed for brevity, and we get something akin to the following:

\[ \begin{bmatrix} x \\ y \end{bmatrix} = f \begin{bmatrix} X_c/ Z_c \\ Y_c / Z_c \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix} \]


\[ \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \Gamma_o^i \begin{bmatrix} X_o \\ Y_o \\ Z_o \end{bmatrix} \]
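As a rough illustration, the collapsed form above can be sketched in a few lines of Python. This is a minimal sketch, not the calibration software's implementation: the function names are hypothetical, and it assumes \(\Gamma_o^i\) can be represented as a 3×3 rotation matrix plus a translation vector, which the text above does not specify. The numeric values are made up for the example.

```python
def transform_to_camera(rotation, translation, point_o):
    """Map an object-space point (X_o, Y_o, Z_o) to (X_c, Y_c, Z_c).

    Assumption: the extrinsic transformation is a rigid transform given as
    a 3x3 rotation matrix (list of rows) and a 3-element translation.
    """
    return tuple(
        sum(r * p for r, p in zip(row, point_o)) + t
        for row, t in zip(rotation, translation)
    )


def project_pinhole(point_c, f, cx, cy):
    """Project (X_c, Y_c, Z_c) to pixel coordinates (x, y) with a single focal length f."""
    X_c, Y_c, Z_c = point_c
    if Z_c <= 0:
        # Points at or behind the projection center cannot be imaged.
        raise ValueError("point must lie in front of the camera (Z_c > 0)")
    return (f * X_c / Z_c + cx, f * Y_c / Z_c + cy)


# Illustrative values only: identity rotation, object frame 2 units in front of the camera.
rotation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translation = [0.0, 0.0, 2.0]

point_c = transform_to_camera(rotation, translation, (0.5, -0.25, 0.0))
pixel = project_pinhole(point_c, f=800.0, cx=320.0, cy=240.0)
print(pixel)  # (520.0, 140.0)
```

Splitting the extrinsic transform from the projection mirrors the two equations above: the first function produces \(X_c\), \(Y_c\), \(Z_c\), and the second applies \(f\), \(c_x\), and \(c_y\).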

All the above parameters are included in our calibration model by default.

What Happened to \(f_x\) and \(f_y\)?

Many computer vision calibration pipelines will model the projection relationship with two focal distances, namely \(f_x\) and \(f_y\). Here, we choose not to model focal length in this way. This has a few advantages:

  1. Geometrically, there is no justification for multiple focal distances.
  2. \(f_x\) and \(f_y\) have low observability when estimated separately. The two values are highly correlated with each other and are tightly coupled to errors in our image observations. We can avoid this problem entirely by estimating only \(f\).
  3. The low observability will couple the values estimated for \(f_x\) and \(f_y\) to the data set used to estimate them, which means that they do not generalize as well as a single focal length will. In other words, errors in the estimation of \(f_x\) and \(f_y\) make our model inconsistent; we can repeat a calibration multiple times and get different values every time.
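
For comparison, the two-focal-length formulation discussed in this section is commonly written as shown below. This is the conventional form seen in many computer vision pipelines, given here only as a sketch for contrast; it is not part of the model described on this page.

\[ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} f_x X_c / Z_c \\ f_y Y_c / Z_c \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix} \]

Setting \(f_x = f_y = f\) recovers the single-focal-length equation given earlier.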

Further Reading

Check out our post on projection and focal length on the Tangram Vision Blog. It provides a more detailed historical background, as well as context for our decision to use a single focal length in our projective model.


If you remain concerned about how we address scale differences in the \(x\) and \(y\) dimensions of the image plane, read the documentation on Affinity.