Version: 8.0

MetriCal Results

When calibrating with MetriCal, the metrical calibrate command can produce an immense amount of data. This can include the following outputs:

Logs printed to stderr
Logs converted into an HTML document, if the --report-path argument is provided.
A collection of all calibration results serialized into a JSON file. By default, this is named results.json.

MetriCal Logs

The logs output by MetriCal contain much of the same information contained in the results.json, but often presented in the form of tables, graphs, or charts. This can be a convenient representation to get an over-arching summary of what happened during the calibration.

Interpreting Pre-Calibration Metrics

MetriCal will output several tables before running the full calibration. Many of these tables can help determine how useful a dataset will be for calibration. When the --interactive flag is used during a metrical calibrate run, the process will pause completely and wait for user feedback before running the calibration itself.

Frames With Detections

This table shows a count of total observations provided to MetriCal vs. the number of observations left after data filtering.

For cameras: the number of frames with detections is the number of frames with features in view of the camera.
For LiDAR: the number of individual "point clouds"

>> # frames with detections <<
+-----------------------+---------------+-----------------------+----------------------+
| Component             | # Frames read | # Frames after filter | Frames with features |
+-----------------------+---------------+-----------------------+----------------------+
|      color (b10329e4) |            15 |                    15 |                   15 |
+-----------------------+---------------+-----------------------+----------------------+
| infrared-1 (4f3fb515) |            15 |                    15 |                   15 |
+-----------------------+---------------+-----------------------+----------------------+
| infrared-2 (6d31496f) |            15 |                    15 |                   15 |
+-----------------------+---------------+-----------------------+----------------------+

This can be a useful heuristic to check if the motion filter is filtering too aggressively, or if the real problem is due to the object space not being in view of the cameras / LiDAR.
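As a rough illustration of this check, the sketch below compares the three counts from the table for a single component. The function name, the 0.5 threshold, and the returned messages are assumptions made for this example; none of them come from MetriCal itself.

```python
def diagnose_frame_counts(read, after_filter, with_features, min_ratio=0.5):
    """Hint at where frames are being lost for one component.

    min_ratio is an arbitrary, illustrative threshold (not a MetriCal default).
    """
    if read == 0:
        return "no frames read"
    # Most frames dropped between "read" and "after filter": suspect the filter.
    if after_filter / read < min_ratio:
        return "motion filter may be too aggressive"
    # Frames survive the filter but rarely contain features: suspect visibility.
    if with_features / after_filter < min_ratio:
        return "object space is often out of view"
    return "ok"


# Counts taken from the example table above (color, b10329e4).
print(diagnose_frame_counts(15, 15, 15))
```

In the example table every component keeps all 15 frames at each stage, so neither cause is indicated.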

Binned Coverage Count

This chart demonstrates how many features were detected in each "bin" in a 10×10 grid representing the image extent. As is shown in the example below, the colors of the feature counts shift from red (bad feature coverage) to blue (excellent feature coverage). In general when capturing data, these feature coverage charts should ideally be green to blue in every bin.

>> % coverage, by feature count <<
+-----------------------+---------------------------------------------------------------------+
|       Component       |          Binned Feature Coverage (10x10 bins across image)          |
|                       |                                                                     |
|                       | █ no features   █ 1-15 features   █ 16-50 features   █ >50 features |
+-----------------------+---------------------------------------------------------------------+
| color (b10329e4)      |                                                                     |
|                       |      0     0     0     0     0     1     1     0     0     0        |
|                       |                                                                     |
|                       |      0     0     6     9     15    9     8     7     3     0        |
|                       |                                                                     |
|                       |      0     2     10    15    15    21    13    9     2     0        |
|                       |                                                                     |
|                       |      0     2     6     18    24    21    18    8     3     0        |
|                       |                                                                     |
|                       |      0     4     8     18    26    22    15    7     3     0        |
|                       |                                                                     |
|                       |      0     2     6     16    29    28    18    7     5     0        |
|                       |                                                                     |
|                       |      0     4     9     19    24    20    20    6     4     0        |
|                       |                                                                     |
|                       |      0     2     6     16    21    22    19    7     2     0        |
|                       |                                                                     |
|                       |      0     1     4     10    9     9     8     7     1     0        |
|                       |                                                                     |
|                       |      0     0     0     0     0     0     0     0     0     0        |
|                       |                                                                     |
+-----------------------+---------------------------------------------------------------------+

It may not always be possible to achieve the "best" possible coverage. In such cases, it is recommended that you fill the image extent as much as is practical.
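To put a number on how much of the image extent is filled, the following sketch tallies the example grid above by the legend's four ranges. The grid values are copied from the example chart; the helper names are hypothetical and the tally is not something MetriCal reports in this form.

```python
from collections import Counter

# Feature counts copied from the example chart above (color, b10329e4).
coverage = [
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 6, 9, 15, 9, 8, 7, 3, 0],
    [0, 2, 10, 15, 15, 21, 13, 9, 2, 0],
    [0, 2, 6, 18, 24, 21, 18, 8, 3, 0],
    [0, 4, 8, 18, 26, 22, 15, 7, 3, 0],
    [0, 2, 6, 16, 29, 28, 18, 7, 5, 0],
    [0, 4, 9, 19, 24, 20, 20, 6, 4, 0],
    [0, 2, 6, 16, 21, 22, 19, 7, 2, 0],
    [0, 1, 4, 10, 9, 9, 8, 7, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def legend_bucket(count):
    """Map a bin's feature count to the ranges shown in the chart legend."""
    if count == 0:
        return "no features"
    if count <= 15:
        return "1-15 features"
    if count <= 50:
        return "16-50 features"
    return ">50 features"


def tally_bins(grid):
    """Count how many bins fall into each legend range."""
    return Counter(legend_bucket(c) for row in grid for c in row)


tally = tally_bins(coverage)
total_bins = sum(tally.values())
for bucket, n in tally.items():
    print(f"{bucket}: {n} of {total_bins} bins")
```

For this example, the empty bins sit in the outer rows and columns of the grid, which suggests the target never reached the edges of the image.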

Interpreting Post-Calibration Metrics

Binned Reprojection Errors

Like the binned coverage charts shown before calibration, the binned reprojection error chart divides the image into bins, but it prints the collective reprojection RMSE (root-mean-square error) for each bin instead of the total feature count.

Like the coverage charts, this is color-coded from red (bad, large reprojection errors) to blue (excellent reprojection errors). In most calibrations, the aim will probably be to get this number as low as possible.

warning
Our color coding is merely a guideline. You should set your own internal tolerances for this sort of metric.

>> Binned Reprojection Errors <<
+-----------------------+----------------------------------------------------------------------------------+
|       Component       |                            Binned reprojection error                             |
|                       |               computed as √([SSE in u and v] ÷ [2 × num points])                |
|                       |                                                                                  |
|                       |            █ <0.1px   █ 0.1-0.25 px   █ 0.25-1.0px   █ >1px / no data            |
+-----------------------+----------------------------------------------------------------------------------+
| color (b10329e4)      |                                                                                  |
|                       | -       -       -       -       -       0.220   0.101   -       -       -        |
|                       |                                                                                  |
|                       | -       -       0.097   0.125   0.154   0.125   0.140   0.159   0.200   -        |
|                       |                                                                                  |
|                       | -       0.081   0.136   0.109   0.101   0.093   0.139   0.174   0.063   -        |
|                       |                                                                                  |
|                       | -       0.239   0.125   0.095   0.108   0.078   0.117   0.079   0.242   -        |
|                       |                                                                                  |
|                       | -       0.265   0.187   0.122   0.123   0.124   0.080   0.080   0.137   -        |
|                       |                                                                                  |
|                       | -       0.225   0.139   0.185   0.125   0.109   0.100   0.092   0.160   -        |
|                       |                                                                                  |
|                       | -       0.295   0.156   0.212   0.106   0.451   0.087   0.110   0.210   -        |
|                       |                                                                                  |
|                       | -       0.310   0.122   0.118   0.134   0.141   0.100   0.136   0.120   -        |
|                       |                                                                                  |
|                       | -       0.238   0.128   0.081   0.113   0.097   0.095   0.164   0.106   -        |
|                       |                                                                                  |
|                       | -       -       -       -       -       -       -       -       -       -        |
|                       |                                                                                  |
+-----------------------+----------------------------------------------------------------------------------+
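The chart header gives the per-bin formula as √([SSE in u and v] ÷ [2 × num points]). A minimal sketch of that computation for one bin follows; the function name and the input layout (a list of (du, dv) residual pairs in pixels) are assumptions for illustration, not MetriCal's internal representation.

```python
import math


def binned_rmse(residuals_uv):
    """RMSE for one bin, from (du, dv) reprojection residuals in pixels.

    Returns None for an empty bin, which the chart shows as "-".
    """
    if not residuals_uv:
        return None
    # Sum of squared errors over both image axes.
    sse = sum(du * du + dv * dv for du, dv in residuals_uv)
    # Each point contributes two residual components (u and v).
    return math.sqrt(sse / (2 * len(residuals_uv)))


# Hypothetical residuals for a bin containing two detected features.
print(binned_rmse([(0.1, -0.1), (0.2, 0.0)]))
```

Dividing by 2 × the number of points treats the u and v residuals as separate samples, so the result stays in pixels per axis rather than pixels of radial distance.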

Other results produced at the end of calibration are described in the Serialized Results and Metrics section below.

Serialized Results and Metrics

Every calibration outputs a comprehensive JSON of metrics, by default named results.json. This file contains:

The optimized plex representing the calibrated system

The optimized object space (with any updated spatial constraints for a given object space)

Metrics derived over the dataset that was calibrated:

Summary statistics for the adjustment

Optimized object space features (e.g. 3D target or corner locations)

Residual metrics for each observation used in the adjustment

Optimized Plex

The optimized Plex is a description of the now-calibrated System. This Plex is typically more "complete" and information-rich than the input Plex, since it is based on the real data used to calibrate the System.

The optimized plex can be pulled out of the results.json by using jq:

jq .plex results.json > plex.json

The schema for plexes is described on the plex overview page.

Optimized Object Space

MetriCal will optimize over the object spaces used in every calibration. For example, if your object space consists of a checkerboard, MetriCal will directly estimate how flat (or not) the checkerboard actually is using the calibration data.

This comes in two forms in the results.json file:

An optimized object space definition that can be re-used in future calls to metrical calibrate.

A collection of optimized object space features (i.e. the actual 3D feature or point data) optimized using the calibration data.

The former can be extracted from the results.json by using jq:

jq .object_space results.json > object_space.json

The schema for object spaces is described on the object space overview page.

Conversely, the latter is embedded in the metrics themselves:

jq .metrics.optimized_object_spaces results.json > optimized_object_spaces.json

The latter is interesting insofar as it can be plotted in 3D to visually see how object features such as targets or the positions of corner points were estimated:

{
  "1c22b1c6-4d5a-4058-a71d-c9716a099d48": {
    "ids": [40, 53, 34, 3, 43],
    "xs": [0.3838, 0.7679, 0.6717, 0.2883, 0.6717],
    "ys": [0.1917, 0.096, 0.2879, 0.5761, 0.192],
    "zs": [0.00245, -0.00159, 0.00122, -0.00155, 0.00085]
  }
}

This is a JSON object whose keys are the UUIDs of each object space, and whose values are objects containing the feature identifiers (ids) along with Cartesian coordinate data (xs, ys, zs) for each feature.
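As a sketch of how this structure might be consumed, the snippet below parses the example payload and reports the spread of z values per object space. Using the z spread as a crude flatness indicator is an assumption of this example (it presumes a planar target lying nominally in the z = 0 plane); the helper name is hypothetical.

```python
import json

# Example payload copied from the text above. In practice, read the file
# written by the jq command (optimized_object_spaces.json) instead.
raw = """
{
  "1c22b1c6-4d5a-4058-a71d-c9716a099d48": {
    "ids": [40, 53, 34, 3, 43],
    "xs": [0.3838, 0.7679, 0.6717, 0.2883, 0.6717],
    "ys": [0.1917, 0.096, 0.2879, 0.5761, 0.192],
    "zs": [0.00245, -0.00159, 0.00122, -0.00155, 0.00085]
  }
}
"""


def z_spread_per_space(spaces):
    """Map each object space UUID to (max z - min z) over its features."""
    return {
        uuid: max(feats["zs"]) - min(feats["zs"])
        for uuid, feats in spaces.items()
    }


spaces = json.loads(raw)
for uuid, spread in z_spread_per_space(spaces).items():
    count = len(spaces[uuid]["ids"])
    print(f"{uuid}: {count} features, z spread = {spread:.5f}")
```

The same parallel arrays (xs, ys, zs) can be passed directly to a 3D scatter plot in the plotting library of your choice.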

Summary Statistics

The main entrypoint into the metrics contained in the results.json is the collection of summary statistics. Of all the metrics output in a results.json file, the summary statistics run the greatest risk of being misinterpreted. Always bear in mind that these figures represent broad, global mathematical strokes, and should be interpreted holistically along with the rest of the metrics of a calibration. These summary statistics can be extracted from the metrics using jq:

jq .metrics.summary_statistics results.json > summary_statistics.json

In total, this looks something like:

{
  "per_component_rmse": [
    { "uuid": "f70140d5-48e3-4617-9ccb-b5664ef19542", "rmse": 0.2069233539915885 },
    { "uuid": "09bb8a44-ea8a-47cc-aab0-1023ea0205c7", "rmse": 0.18624513108850418 },
    { "uuid": "ff8330ae-8bce-46f8-9628-5c7ef2c34a29", "rmse": 0.20874344157641334 }
  ],
  "optimized_object_rmse": 0.20092021837150498,
  "posterior_variance": 0.6812943730598403
}
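For quick inspection, the summary can also be read programmatically. The sketch below parses the example above and picks out the component with the largest RMSE; the helper name is hypothetical, and ranking components this way is only meaningful when they report RMSE in the same units.

```python
import json

# Example summary copied from the text above. In practice, read the
# summary_statistics.json file written by the jq command instead.
raw = """
{
  "per_component_rmse": [
    { "uuid": "f70140d5-48e3-4617-9ccb-b5664ef19542", "rmse": 0.2069233539915885 },
    { "uuid": "09bb8a44-ea8a-47cc-aab0-1023ea0205c7", "rmse": 0.18624513108850418 },
    { "uuid": "ff8330ae-8bce-46f8-9628-5c7ef2c34a29", "rmse": 0.20874344157641334 }
  ],
  "optimized_object_rmse": 0.20092021837150498,
  "posterior_variance": 0.6812943730598403
}
"""


def worst_component(summary):
    """Return the per-component entry with the largest RMSE."""
    return max(summary["per_component_rmse"], key=lambda c: c["rmse"])


summary = json.loads(raw)
for entry in summary["per_component_rmse"]:
    print(f"{entry['uuid']}: rmse = {entry['rmse']:.4f}")
print("largest RMSE:", worst_component(summary)["uuid"])
print("posterior variance:", summary["posterior_variance"])
```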

Per-Component RMSE

Per-Component RMSE is the Root Mean Square Error for each component in the calibration. For a component that has been appropriately modeled (i.e. there are no un-modeled systematic error sources present), this represents the mean quantity of error from observations taken by a single component.

Units for RMSE are specific to the component in question, and should not necessarily be compared directly. For example, camera components will be making observations in units of pixels in image space, which means our RMSE is in units of pixels as well. For other modalities such as LiDAR, the units are typically in meters (or whatever unit the raw measurements are expressed in).

Comparing Camera RMSE
If two cameras have pixels of different sizes, it is important to first convert their RMSEs to a metric size so that they can be compared on equal terms. This is what pixel_pitch in the Plex API is for: the pixel size of two cameras is not always equal, and accounting for it lets the cameras be compared fairly.
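A small worked example of that conversion is sketched below. The two cameras, their RMSE values, and their pixel pitches are invented for illustration, and expressing pixel pitch in micrometers per pixel is an assumption of this example rather than a statement about the Plex API's units.

```python
def rmse_in_micrometers(rmse_px, pixel_pitch_um):
    """Convert an image-space RMSE from pixels to micrometers on the sensor."""
    return rmse_px * pixel_pitch_um


# Hypothetical cameras: (RMSE in pixels, pixel pitch in micrometers per pixel).
cameras = {
    "camera_a": (0.20, 3.0),
    "camera_b": (0.15, 6.0),
}

for name, (rmse_px, pitch_um) in cameras.items():
    metric = rmse_in_micrometers(rmse_px, pitch_um)
    print(f"{name}: {rmse_px:.2f} px -> {metric:.2f} um")
```

Here camera_b has the lower RMSE in pixels, yet its error is larger once expressed on the sensor, because each of its pixels covers twice the physical distance.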

Posterior Variance

Also known as "a-posteriori variance factor" or "normalized cost," the posterior variance is a relative measure of the gain/loss of information from the calibration.

Precision, Not Accuracy
Uncertainty is necessarily a measure of precision, not accuracy. Prior and posterior variance tell us about the data that we observed and its relation to the model we chose for our calibration, but they do not say anything about the accuracy of the model itself.

Posterior variance doesn't make sense without discussing prior variance, or the "a-priori variance factor." Prior variance in MetriCal is a global scale on the uncertainty of our input data. This could be considered a relative measure of confidence in a given "geometric network" of data input into our calibration.

MetriCal always starts with a prior variance of 1.0 in the adjustment — in other words, no calibration is considered special with regards to its input uncertainty. MetriCal will just use default uncertainties for any given observed quantity and scale the whole "network" with 1.0.

This means that the posterior variance is only useful when compared to the prior, or 1.0. Posterior variance $\hat{\sigma}_0^2$ can be computed in any least-squares adjustment by using the following formula:
$\hat{\sigma}_0^2 = \frac{r^T \cdot C_l^{-1} \cdot r}{\mathsf{D.o.F}}$
where $r$ is the residuals vector, $C_l$ is the covariance matrix of the observations in the calibration, and D.o.F. refers to the total degrees of freedom in the entire adjustment. The upper part of the above fraction is the cost function of a least-squares process (the weighted square sum of residuals), which is why this is sometimes referred to as "normalized cost."
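The formula can be sketched numerically as follows. This example assumes uncorrelated observations, so that $C_l$ is diagonal and holds each observation's variance, and it takes the degrees of freedom to be the number of observations minus the number of estimated parameters, which is the usual least-squares definition. The function name and the sample numbers are hypothetical, and this is not how MetriCal exposes the computation.

```python
def posterior_variance(residuals, sigmas, num_parameters):
    """Weighted sum of squared residuals divided by degrees of freedom.

    Assumes a diagonal observation covariance: C_l = diag(sigma_i ** 2).
    """
    dof = len(residuals) - num_parameters
    if dof <= 0:
        raise ValueError("need more observations than estimated parameters")
    # r^T C_l^{-1} r reduces to a sum of (r_i / sigma_i)^2 for diagonal C_l.
    weighted_sse = sum((r / s) ** 2 for r, s in zip(residuals, sigmas))
    return weighted_sse / dof


# Hypothetical adjustment: four observations, one estimated parameter.
residuals = [1.0, -1.0, 2.0, 0.0]
sigmas = [1.0, 1.0, 2.0, 1.0]
print(posterior_variance(residuals, sigmas, num_parameters=1))
```

In this toy case the residuals are about as large as their assumed standard deviations, so the result lands at the prior value of 1.0.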

Posterior vs Prior Variance

The trick here is in interpreting this value relative to our prior variance of 1.0. There are three possible scenarios that can occur:

Posterior variance is approximately 1.0 ( $\hat{\sigma}_0^2$ ≈ 1.0)

Posterior variance is less than the prior variance ( $\hat{\sigma}_0^2$ < 1.0)

Posterior variance is greater than the prior variance ( $\hat{\sigma}_0^2$ > 1.0)

The first scenario is the simplest, but also the least interesting. If the posterior variance matches the prior variance well, then our uncertainty has been correctly quantified, and the final variances of our estimated parameters match expectations.

In the second, the residual error across the data set is now smaller than what was expected. This could mean the problem was pessimistic in its initial estimate of uncertainty in the problem definition. Taking a more Bayesian approach, it can be interpreted as having more information or certainty in the results of the calibration using this data set than it had going in.

In the third, the posterior variance is larger than what was expected at the outset. This implies the opposite of the second scenario (posterior < prior): the problem was optimistic in its initial estimate of uncertainty. In other words, we now have more uncertainty (less certainty) in the results using this data set than we thought we ought to have, after considering the data.

What's Best?

From the latter two scenarios, it might be tempting to say that posterior variance should always be less than or equal to 1.0. After all, it should be better to remain pessimistic or realistic with regards to our uncertainty than it is to be optimistic and have more error, right?

Unfortunately, this is a very broad brush with which to explain our posterior variance. This kind of naive explanation may lead to some biased inferences; in particular, there's a good number of reasons why posterior variance might be smaller than prior variance:

We set our prior variances to be very large, and that was unrealistic.

The data set contained much more data than was necessary to estimate the parameters to the appropriate level of significance. This relates to the observability of our parameters as well as the number of parameters we are observing.

Conversely, there's a number of good reasons for why posterior variance may be larger than prior variance:

The prior variance was set to be very small, and that was unrealistic. This can occur if the data set is good, but observations from the data are qualitatively bad for some reason (e.g. a blurry lens that was installed incorrectly). The model and data would not agree, so residual error increases.

The data set did not contain enough degrees of freedom (D.o.F) to be able to minimize residuals to the level of desired significance. This can occur when individual frames in a camera do not detect enough points to account for the number of parameters we have to estimate for that pose / frame / intrinsics model / etc.

The data actually disagrees with prior estimates of our parameters. This can occur if parameters are "fixed" to incorrect values, and the data demonstrates this through larger residual error. This can also occur when there are large projective compensations in our model, and our data set does not contain frames or observations that would help discriminate correlations across parameters.

It is easy to misattribute any one of these causes to a problem in the calibration; for instance, if the model and correspondent covariances in the plex are acceptable and the other calibration outputs don't show any signs that the calibration is invalid in some way, then posterior variance likely will not reveal any new insight into the calibration.

When should I worry about posterior variance?
Generally speaking, posterior variance needs to differ by quite a large margin before it is worth worrying about, and you'll likely see other problems in the calibration process that will lead to more fruitful investigations if something is "wrong" or can be improved upon.
As a rule of thumb, if posterior variance isn't less than $\frac{1}{3}$ or greater than 3.0 (a factor of 3 between posterior and prior variance), then you shouldn't worry about it.
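The rule of thumb can be written as a small check against the prior variance of 1.0. The function name and the returned labels are hypothetical; only the factor of 3 comes from the text above.

```python
def classify_posterior_variance(post_var, factor=3.0):
    """Apply the factor-of-3 rule of thumb relative to a prior variance of 1.0."""
    if post_var < 1.0 / factor:
        return "much smaller than prior: worth a closer look"
    if post_var > factor:
        return "much larger than prior: worth a closer look"
    return "within rule of thumb: no need to worry"


# Posterior variance from the example summary statistics above.
print(classify_posterior_variance(0.6812943730598403))
```

The example value of roughly 0.68 falls inside the range from 1/3 to 3.0, so by this rule it would not warrant further investigation on its own.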

Residual Metrics

Residual metrics are generated for each and every cost or observation added to the calibration. The most immediately familiar residual metric might be reprojection error, but similar metrics can be derived for other modalities and observations as well. A full list of these is linked below:

Image reprojection

Circle misalignment

IMU preintegration error

Object inertial extrinsics error

Composed relative extrinsics error

Paired 3D point error

Paired plane normal error
