
Camera Overview

Cameras are a fundamental visual component. In addition to the common fields that every component contains, they are defined by the following types:

| Field | Type | Description |
| --- | --- | --- |
| Intrinsics | CameraIntrinsics | Intrinsic parameters that describe the camera model. |
| Covariance | DMatrix<f64> | An n×n covariance matrix describing the variance-covariance of the intrinsic parameters. |
| Pixel pitch | f64 | The metric size of a pixel in real space. If unknown, this should be set to 1.0. |

Pixel pitch units

It is common practice to leave most observations and arithmetic in units of pixels when dealing with image data. However, this practice can get confusing when trying to compare two different camera types, as "1 pixel" may not equate to the same metric error between cameras. Pixel pitch allows us to compare cameras using a common unit, i.e. units-per-pixel.

Note that we leave the unit ambiguous here, as this field is primarily for making analysis easier on human eyes. Common units include microns-per-pixel (μm / pixel) and meters-per-pixel (m / pixel).
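As a sketch of why this matters, converting a per-pixel error to metric units is just a multiplication by pixel pitch. The helper below is hypothetical, and the pixel pitch values are illustrative:

```rust
/// Convert an error measured in pixels to metric units using pixel pitch
/// (units-per-pixel). A hypothetical helper for illustration only.
fn pixel_error_to_metric(error_px: f64, pixel_pitch: f64) -> f64 {
    error_px * pixel_pitch
}

fn main() {
    // Two cameras with different (illustrative) pixel pitches in µm/pixel.
    let err_a = pixel_error_to_metric(0.5, 3.45); // 0.5 px on 3.45 µm pixels
    let err_b = pixel_error_to_metric(0.5, 1.4); // 0.5 px on 1.4 µm pixels
    // The same "half pixel" error is a different metric error on each camera.
    println!("A: {:.3} µm, B: {:.3} µm", err_a, err_b);
}
```

The same 0.5 px error works out to 1.725 µm on one camera but 0.7 µm on the other, which is exactly the kind of apples-to-apples comparison pixel pitch enables.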

Image Coordinate System

Our image coordinate system is a right-handed coordinate system with the \(x\)-axis extending right along columns of pixels, the \(y\)-axis extending downwards along rows of pixels, and the "depth" axis (also known as the planar normal) extending into the plane itself.

(Figure: the image coordinate system)

In this system, we choose to place the origin at the upper-left corner of the upper-left-most pixel, so that corner has coordinate \((0, 0)\). Note that this is different from some computer vision libraries, which treat the origin of the image coordinate system as the center of the upper-left-most pixel. If you're not careful, this can result in a 0.5 pixel coordinate offset that goes unaccounted for.
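When interoperating with a library that uses the pixel-center convention, the fix is a constant 0.5 px shift on both axes. The helpers below are hypothetical, for illustration only:

```rust
/// Convert a coordinate expressed with the origin at the *center* of the
/// upper-left-most pixel into this convention (origin at its upper-left
/// *corner*). Hypothetical helpers for illustration.
fn center_origin_to_corner_origin((x, y): (f64, f64)) -> (f64, f64) {
    (x + 0.5, y + 0.5)
}

fn corner_origin_to_center_origin((x, y): (f64, f64)) -> (f64, f64) {
    (x - 0.5, y - 0.5)
}

fn main() {
    // The center of the upper-left-most pixel is (0.0, 0.0) in the
    // center-origin convention, but (0.5, 0.5) in this corner-origin one.
    assert_eq!(center_origin_to_corner_origin((0.0, 0.0)), (0.5, 0.5));
    assert_eq!(corner_origin_to_center_origin((0.5, 0.5)), (0.0, 0.0));
    println!("conventions round-trip correctly");
}
```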

Intrinsic Modeling

A camera can be a complex thing to model.

"Intrinsics" can refer to different models depending on the lens type. Read below for more information on how Tangram Vision handles these within the Tangram Vision Platform.


Projection

Projection equations relate objects in 3D space to their images in 2D space. They serve as the base for any camera calibration process.
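As a concrete sketch, a basic pinhole projection (one common projection model; the focal length and principal point values here are assumed for illustration, and this is not necessarily the exact model the Platform uses):

```rust
/// Project a 3D point in the camera frame onto the image plane with a
/// basic pinhole model. `f` is the focal length in pixels and (cx, cy)
/// is the principal point. Illustrative only.
fn project(point: [f64; 3], f: f64, cx: f64, cy: f64) -> (f64, f64) {
    let [x, y, z] = point;
    // Perspective divide, then shift by the principal point.
    (f * x / z + cx, f * y / z + cy)
}

fn main() {
    // A point 2 m in front of the camera, slightly right of and above center
    // (recall that +y points *down* in the image coordinate system).
    let (u, v) = project([0.1, -0.05, 2.0], 800.0, 320.0, 240.0);
    println!("({u}, {v})"); // lands right of and above the principal point
}
```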



Distortion

Distortion is a way of describing how a camera's calibration deviates from a perfect projection model. Distortion effects can be mild (e.g. most narrow FOV cameras) or extreme (e.g. a fisheye lens).
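For example, a common radial distortion polynomial (Brown-Conrady-style \(k_1\), \(k_2\) terms; a sketch of one widely used parameterization, not necessarily the Platform's) warps normalized image coordinates as a function of their distance from the distortion center:

```rust
/// Apply radial distortion to normalized image coordinates (x, y).
/// k1 and k2 are radial coefficients; illustrative parameterization only.
fn distort(x: f64, y: f64, k1: f64, k2: f64) -> (f64, f64) {
    let r2 = x * x + y * y;
    let scale = 1.0 + k1 * r2 + k2 * r2 * r2;
    (x * scale, y * scale)
}

fn main() {
    // Mild barrel distortion (negative k1) pulls points toward the center,
    // and the effect grows with distance from the center.
    let near = distort(0.1, 0.0, -0.2, 0.0);
    let far = distort(0.5, 0.0, -0.2, 0.0);
    println!("near: {:?}, far: {:?}", near, far);
}
```

Note how the point near the center barely moves, while the point near the edge is pulled noticeably inward: distortion grows with radial distance.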



Affinity

Affinity models aberrations or defects in the image plane itself. Affinity artifacts can be caused by an image plane that is angled with respect to the lens, rectangular pixel wells, or poorly-manufactured hardware. Its effects are rarely seen in commodity cameras, but they are still worth knowing about.
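A minimal sketch of how such effects are often modeled: a small differential scale between the axes (non-square pixels) plus a shear, or skew, term. The parameter names and form here are hypothetical, not the Platform's:

```rust
/// Apply affinity effects to image-plane coordinates. `a1` is a
/// differential x-axis scale (e.g. rectangular pixel wells); `a2` is a
/// skew/shear term (e.g. a tilted image plane). Hypothetical
/// parameterization for illustration only.
fn apply_affinity(x: f64, y: f64, a1: f64, a2: f64) -> (f64, f64) {
    (x * (1.0 + a1) + a2 * y, y)
}

fn main() {
    // With typically tiny coefficients, the correction is sub-pixel even
    // far from the image center, which is one reason affinity effects go
    // unnoticed on most commodity cameras.
    let (x, y) = apply_affinity(100.0, 100.0, 1e-4, 1e-5);
    println!("({x}, {y})");
}
```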