Modules

Scorers

class grand_challenge_metrics.scorers.DetectionScore(true_positives, false_negatives, false_positives)
false_negatives

Alias for field number 1

false_positives

Alias for field number 2

true_positives

Alias for field number 0

grand_challenge_metrics.scorers.find_hits_for_targets(*, targets, predictions, radius)[source]

Generates a list of the predicted points that are within a radius r of the targets. The indices are returned in sorted order, from the closest to the farthest point.

Parameters:
  • targets (list[tuple[float, ...]]) – A list of target points

  • predictions (list[tuple[float, ...]]) – A list of predicted points

  • radius (float) – The maximum distance that two points can be apart for them to be considered a hit

Return type:

list[tuple[int, ...]]

Returns:

A list which has the same length as the targets list. Each element within this list contains another list that contains the indices of the predictions that are considered hits.
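As a rough illustration of the behaviour described above, the following is a minimal pure-Python sketch and not the library's implementation. The helper name `find_hits_sketch` is hypothetical, and both the Euclidean distance and the inclusive radius comparison are assumptions.

```python
import math


def find_hits_sketch(*, targets, predictions, radius):
    """Hypothetical re-implementation, for illustration only."""
    hits_per_target = []
    for target in targets:
        # Distance from this target to every prediction, paired with its index.
        distances = [
            (math.dist(target, prediction), index)
            for index, prediction in enumerate(predictions)
        ]
        # Keep predictions within the radius (inclusive: an assumption),
        # ordered from closest to farthest.
        within = sorted(pair for pair in distances if pair[0] <= radius)
        hits_per_target.append(tuple(index for _, index in within))
    return hits_per_target


hits = find_hits_sketch(
    targets=[(0.0, 0.0), (10.0, 10.0)],
    predictions=[(0.5, 0.0), (0.1, 0.0), (5.0, 5.0)],
    radius=1.0,
)
print(hits)  # [(1, 0), ()]
```

The first target is hit by predictions 1 and 0 (closest first); the second target has no hits, so its entry is empty.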

grand_challenge_metrics.scorers.score_detection(*, ground_truth, predictions, radius=1.0)[source]

Generates the number of true positives, false positives and false negatives for the ground truth points given the predicted points.

If multiple predicted points hit one ground truth point then this is considered as 1 true positive, and 0 false negatives.

If one predicted point is a hit for N ground truth points then this is considered as 1 true positive, and N-1 false negatives.

Parameters:
  • ground_truth (list[tuple[float, ...]]) – A list of the ground truth points

  • predictions (list[tuple[float, ...]]) – A list of the predicted points

  • radius (float) – The maximum distance that two points can be separated by in order to be considered a hit

Return type:

DetectionScore

Returns:

A named tuple containing the number of true positives, false negatives and false positives, in that field order.
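The two counting rules above can be reproduced with a greedy matcher, sketched below under stated assumptions. The names `Score` and `score_detection_sketch` are hypothetical, the matching strategy is a guess rather than the library's code, and counting every unmatched prediction as a false positive is an assumption that the text above does not state.

```python
import math
from collections import namedtuple

# Field order mirrors the documented DetectionScore tuple.
Score = namedtuple("Score", ("true_positives", "false_negatives", "false_positives"))


def score_detection_sketch(*, ground_truth, predictions, radius=1.0):
    """Hypothetical greedy matcher; not the library's implementation."""
    used = [False] * len(predictions)
    true_positives = 0
    false_negatives = 0
    for target in ground_truth:
        # Candidate predictions for this target, closest first.
        candidates = sorted(
            (math.dist(target, p), i)
            for i, p in enumerate(predictions)
            if math.dist(target, p) <= radius
        )
        for _, index in candidates:
            if not used[index]:
                # Each prediction may be matched to at most one target.
                used[index] = True
                true_positives += 1
                break
        else:
            # No unused prediction lies within the radius of this target.
            false_negatives += 1
    # Assumption: predictions never matched to a target are false positives.
    return Score(true_positives, false_negatives, used.count(False))


# Two predictions hit one target: 1 true positive, 0 false negatives.
many_to_one = score_detection_sketch(
    ground_truth=[(0.0, 0.0)], predictions=[(0.0, 0.1), (0.0, 0.2)]
)
# One prediction hits two targets: 1 true positive, N-1 = 1 false negative.
one_to_many = score_detection_sketch(
    ground_truth=[(0.0, 0.0), (0.0, 1.0)], predictions=[(0.0, 0.5)]
)
print(many_to_one, one_to_many)
```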

Annotations

class grand_challenge_metrics.annotations.BoundingBox(*, x1, x2, y1, y2)[source]
__init__(*, x1, x2, y1, y2)[source]

A bounding box is a face defined by 4 edges on a 2D plane. It must have a non-zero width and height.

Parameters:
  • x1 (float) – Left edge of the bounding box

  • x2 (float) – Right edge of the bounding box

  • y1 (float) – Bottom edge of the bounding box

  • y2 (float) – Top edge of the bounding box

Raises:

ValueError – If the bounding box has zero width or height

property area: float

Return the area of the bounding box in natural units

intersection(*, other)[source]

Calculates the intersection area between this bounding box and another, axis aligned, bounding box.

Parameters:

other (BoundingBox) – The other bounding box

Returns:

The intersection area in natural units if the two bounding boxes overlap, zero otherwise.

Return type:

float

jaccard_index(*, other)[source]

Calculates the intersection over union between this bounding box and a second, axis aligned, bounding box.

Parameters:

other (BoundingBox) – The other bounding box

Returns:

The intersection over union in natural units

Return type:

float

union(*, other)[source]

Calculates the union between this bounding box and another, axis aligned, bounding box.

Parameters:

other (BoundingBox) – The other bounding box

Returns:

The union area in natural units

Return type:

float
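The relationship between area, intersection, union and Jaccard index for axis-aligned boxes can be sketched as follows. `BoxSketch` is a hypothetical stand-in written for this example, not the library class; accepting edges in either order is an assumption.

```python
class BoxSketch:
    """Hypothetical stand-in for an axis-aligned bounding box."""

    def __init__(self, *, x1, x2, y1, y2):
        # Assumption: edges may be given in either order.
        self.left, self.right = min(x1, x2), max(x1, x2)
        self.bottom, self.top = min(y1, y2), max(y1, y2)
        if self.left == self.right or self.bottom == self.top:
            raise ValueError("Bounding box has zero width or height")

    @property
    def area(self):
        return (self.right - self.left) * (self.top - self.bottom)

    def intersection(self, *, other):
        # Overlap along each axis; a non-positive value means no overlap.
        width = min(self.right, other.right) - max(self.left, other.left)
        height = min(self.top, other.top) - max(self.bottom, other.bottom)
        if width <= 0 or height <= 0:
            return 0.0
        return width * height

    def union(self, *, other):
        # Inclusion-exclusion: do not count the overlap twice.
        return self.area + other.area - self.intersection(other=other)

    def jaccard_index(self, *, other):
        return self.intersection(other=other) / self.union(other=other)


a = BoxSketch(x1=0.0, x2=2.0, y1=0.0, y2=2.0)
b = BoxSketch(x1=1.0, x2=3.0, y1=1.0, y2=3.0)
c = BoxSketch(x1=5.0, x2=6.0, y1=5.0, y2=6.0)
print(a.intersection(other=b), a.union(other=b), a.jaccard_index(other=b))
```

Here `a` and `b` overlap in a unit square, so the intersection is 1.0, the union is 7.0 and the Jaccard index is 1/7; `a` and `c` are disjoint, so their intersection is zero.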

Stats

class grand_challenge_metrics.stats.HausdorffMeasures(distance, modified_distance, percentile_distance)
distance

Alias for field number 0

modified_distance

Alias for field number 1

percentile_distance

Alias for field number 2

grand_challenge_metrics.stats.absolute_volume_difference(s1, s2, voxelspacing=None)[source]

Calculate absolute volume difference from s2 to s1

Parameters:
  • s1 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else. s1 is taken to be the reference.

  • s2 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • voxelspacing (Union[tuple[Union[float, int], ...], list[Union[float, int]], float, int, None]) – The voxelspacing in a distance unit i.e. spacing of elements along each dimension. If a sequence, must be of length equal to the input rank; if a single number, this is used for all axes. If not specified, a grid spacing of unity is implied.

Return type:

float

Returns:

The absolute volume difference between the object(s) in `s1` and the object(s) in `s2`. This is a percentage value in the range \([0, +\infty)\) for which \(0\) denotes an ideal score.

Notes

This is a real metric

grand_challenge_metrics.stats.accuracies_from_confusion_matrix(cm)[source]

Computes accuracy scores from a confusion matrix

Parameters:

cm (ndarray) – N x N Input confusion matrix

Return type:

1d ndarray containing accuracy scores for all N classes

grand_challenge_metrics.stats.calculate_confusion_matrix(y_true, y_pred, labels)[source]

Efficient confusion matrix calculation, based on sklearn interface

Parameters:
  • y_true (ndarray) – Target multi-object segmentation mask

  • y_pred (ndarray) – Predicted multi-object segmentation mask

  • labels (list[int]) – Inclusive list of N labels to compute the confusion matrix for.

Return type:

N x N confusion matrix for Y_pred w.r.t. Y_true

Notes

By definition a confusion matrix \(C\) is such that \(C_{i, j}\) is equal to the number of observations known to be in group \(i\) but predicted to be in group \(j\).
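The definition above can be illustrated with a small pure-Python sketch that works on flat sequences instead of ndarray masks. The function name `confusion_matrix_sketch` is hypothetical, and ignoring values outside `labels` is an assumption.

```python
def confusion_matrix_sketch(y_true, y_pred, labels):
    """Hypothetical pure-Python version operating on flat sequences."""
    position = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for truth, prediction in zip(y_true, y_pred):
        # Assumption: values outside `labels` are ignored.
        if truth in position and prediction in position:
            # Row = known group i, column = predicted group j.
            matrix[position[truth]][position[prediction]] += 1
    return matrix


cm = confusion_matrix_sketch(
    y_true=[0, 0, 1, 1, 2, 2],
    y_pred=[0, 1, 1, 1, 2, 0],
    labels=[0, 1, 2],
)
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
```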

grand_challenge_metrics.stats.dice_from_confusion_matrix(cm)[source]

Computes Dice scores from a confusion matrix

Parameters:

cm (ndarray) – N x N Input confusion matrix

Return type:

1d ndarray containing Dice scores for all N classes
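For reference, the per-class Dice score is commonly defined as 2TP / (2TP + FP + FN), where the counts for class k are read from row k and column k of the confusion matrix. The sketch below applies that definition to a nested list; `dice_from_cm_sketch` is a hypothetical name, and returning 0.0 for a class that is absent from both masks is an assumption.

```python
def dice_from_cm_sketch(cm):
    """Hypothetical per-class Dice computation from an N x N nested list."""
    scores = []
    for k in range(len(cm)):
        true_positives = cm[k][k]
        # Row k off the diagonal: class k observations predicted as something else.
        false_negatives = sum(cm[k]) - true_positives
        # Column k off the diagonal: other classes predicted as class k.
        false_positives = sum(row[k] for row in cm) - true_positives
        denominator = 2 * true_positives + false_positives + false_negatives
        # Assumption: a class absent from both masks scores 0.0 here.
        scores.append(2 * true_positives / denominator if denominator else 0.0)
    return scores


dice = dice_from_cm_sketch([[1, 1, 0], [0, 2, 0], [1, 0, 1]])
print(dice)
```

For this matrix the three classes score 0.5, 0.8 and 2/3 respectively.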

grand_challenge_metrics.stats.dice_to_jaccard(dice)[source]

Conversion computation from Dice to Jaccard

Parameters:

dice (ndarray) – 1 or N Dice values within [0 .. 1]

Return type:

1 or N Jaccard values within [0 .. 1]
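The standard relations between the two scores are J = D / (2 - D) and D = 2J / (1 + J). The scalar sketch below shows that they are inverses of each other; the function names are hypothetical and chosen for this example only.

```python
def dice_to_jaccard_sketch(dice):
    """J = D / (2 - D), valid for D in [0, 1]."""
    return dice / (2.0 - dice)


def jaccard_to_dice_sketch(jaccard):
    """D = 2J / (1 + J), valid for J in [0, 1]."""
    return 2.0 * jaccard / (1.0 + jaccard)


jaccard = dice_to_jaccard_sketch(0.8)
round_trip = jaccard_to_dice_sketch(jaccard)
print(jaccard, round_trip)
```

A Dice score of 0.8 corresponds to a Jaccard score of 2/3, and converting back recovers 0.8 up to floating-point error.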

grand_challenge_metrics.stats.hausdorff_distance(s1, s2, voxelspacing=None, connectivity=1)[source]

Computes the (symmetric) Hausdorff Distance (HD) between the binary objects in two images. It is defined as the maximum surface distance between the objects.

Parameters:
  • s1 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • s2 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • voxelspacing (Union[tuple[Union[float, int], ...], list[Union[float, int]], float, int, None]) – The voxelspacing in a distance unit i.e. spacing of elements along each dimension. If a sequence, must be of length equal to the input rank; if a single number, this is used for all axes. If not specified, a grid spacing of unity is implied.

  • connectivity (int) – The neighbourhood/connectivity considered when determining the surface of the binary objects. This value is passed to scipy.ndimage.generate_binary_structure and should usually be \(> 1\).

Return type:

float

Returns:

The symmetric Hausdorff Distance between the object(s) in `s1` and the object(s) in `s2`. The distance unit is the same as for the spacing of elements along each dimension, which is usually given in mm.

Notes

This is a real metric. Implementation inspired by medpy.metric.binary http://pythonhosted.org/MedPy/_modules/medpy/metric/binary.html
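To make the definition concrete, here is a minimal sketch of the symmetric Hausdorff distance on two small point sets. It assumes the surface points have already been extracted and scaled by the voxel spacing, which the library function does internally from the binary images; the helper names are hypothetical.

```python
import math


def directed_distances(source, target):
    """Distance from each point in `source` to its nearest point in `target`."""
    return [min(math.dist(p, q) for q in target) for p in source]


def hausdorff_sketch(a, b):
    # Symmetric: take the larger of the two directed maxima.
    return max(max(directed_distances(a, b)), max(directed_distances(b, a)))


surface_a = [(0.0, 0.0), (1.0, 0.0)]
surface_b = [(0.0, 0.0), (4.0, 0.0)]
hd = hausdorff_sketch(surface_a, surface_b)
print(hd)  # 3.0
```

The point (4, 0) in the second set is 3 units from its nearest neighbour in the first set, and that worst case determines the result.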

grand_challenge_metrics.stats.hausdorff_distance_measures(s1, s2, voxelspacing=None, connectivity=1, percentile=0.95)[source]

Returns multiple Hausdorff measures: (hd, modified_hd, percentile_hd). Since the measures share common calculations, they can be calculated more efficiently together.

Parameters:
  • s1 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • s2 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • voxelspacing (Union[tuple[Union[float, int], ...], list[Union[float, int]], float, int, None]) – The voxelspacing in a distance unit i.e. spacing of elements along each dimension. If a sequence, must be of length equal to the input rank; if a single number, this is used for all axes. If not specified, a grid spacing of unity is implied.

  • connectivity (int) – The neighbourhood/connectivity considered when determining the surface of the binary objects. This value is passed to scipy.ndimage.generate_binary_structure and should usually be \(> 1\).

  • percentile (float) – The percentile at which to calculate the Hausdorff Distance

Return type:

HausdorffMeasures

Returns:

The Hausdorff distance, modified Hausdorff distance and percentile Hausdorff distance.

Notes

This returns real metrics.

grand_challenge_metrics.stats.jaccard_from_confusion_matrix(cm)[source]

Computes Jaccard scores from a confusion matrix a.k.a. intersection over union (IoU)

Parameters:

cm (ndarray) – N x N Input confusion matrix

Return type:

1d ndarray containing Jaccard scores for all N classes

grand_challenge_metrics.stats.jaccard_to_dice(jacc)[source]

Conversion computation from Jaccard to Dice

Parameters:

jacc (ndarray) – 1 or N Jaccard values within [0 .. 1]

Return type:

1 or N Dice values within [0 .. 1]

grand_challenge_metrics.stats.mean_contour_distance(s1, s2, voxelspacing=None)[source]

Computes the (symmetric) Mean Contour Distance between the binary objects in two images. It is defined as the maximum average surface distance between the objects.

Parameters:
  • s1 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • s2 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • voxelspacing (Union[tuple[Union[float, int], ...], list[Union[float, int]], float, int, None]) – The voxelspacing in a distance unit i.e. spacing of elements along each dimension. If a sequence, must be of length equal to the input rank; if a single number, this is used for all axes. If not specified, a grid spacing of unity is implied.

Return type:

float

Returns:

The symmetric Mean Contour Distance between the object(s) in `s1` and the object(s) in `s2`. The distance unit is the same as for the spacing of elements along each dimension, which is usually given in mm.

Notes

This is a real metric that mimics the ITK MeanContourDistanceFilter.

grand_challenge_metrics.stats.modified_hausdorff_distance(s1, s2, voxelspacing=None, connectivity=1)[source]

Computes the (symmetric) Modified Hausdorff Distance (MHD) between the binary objects in two images. It is defined as the maximum average surface distance between the objects.

Parameters:
  • s1 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • s2 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • voxelspacing (Union[tuple[Union[float, int], ...], list[Union[float, int]], float, int, None]) – The voxelspacing in a distance unit i.e. spacing of elements along each dimension. If a sequence, must be of length equal to the input rank; if a single number, this is used for all axes. If not specified, a grid spacing of unity is implied.

  • connectivity (int) – The neighbourhood/connectivity considered when determining the surface of the binary objects. This value is passed to scipy.ndimage.generate_binary_structure and should usually be \(> 1\).

Return type:

float

Returns:

The symmetric Modified Hausdorff Distance between the object(s) in `s1` and the object(s) in `s2`. The distance unit is the same as for the spacing of elements along each dimension, which is usually given in mm.

Notes

This is a real metric.
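The "maximum average surface distance" can be sketched on point sets as the larger of the two directed mean nearest-neighbour distances. As before, this assumes pre-extracted surface points and uses hypothetical helper names; it is an illustration of the definition, not the library's code.

```python
import math
from statistics import mean


def directed_distances(source, target):
    """Distance from each point in `source` to its nearest point in `target`."""
    return [min(math.dist(p, q) for q in target) for p in source]


def modified_hausdorff_sketch(a, b):
    # Average each directed distance set, then take the larger average.
    return max(mean(directed_distances(a, b)), mean(directed_distances(b, a)))


surface_a = [(0.0, 0.0), (1.0, 0.0)]
surface_b = [(0.0, 0.0), (4.0, 0.0)]
mhd = modified_hausdorff_sketch(surface_a, surface_b)
print(mhd)  # 1.5
```

The directed averages are 0.5 and 1.5, so the result is 1.5. Averaging makes this measure less sensitive to a single outlying surface point than the plain Hausdorff distance, which is 3.0 for the same sets.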

grand_challenge_metrics.stats.percentile_hausdorff_distance(s1, s2, percentile=0.95, voxelspacing=None, connectivity=1)[source]

Nth Percentile Hausdorff Distance.

Computes a percentile for the (symmetric) Hausdorff Distance between the binary objects in two images. It is defined as the maximum surface distance between the objects at the nth percentile.

Parameters:
  • s1 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • s2 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

  • percentile (int | float) – The percentile to perform the comparison on the two sorted distance sets

  • voxelspacing (Union[tuple[Union[float, int], ...], list[Union[float, int]], float, int, None]) – The voxelspacing in a distance unit i.e. spacing of elements along each dimension. If a sequence, must be of length equal to the input rank; if a single number, this is used for all axes. If not specified, a grid spacing of unity is implied.

  • connectivity (int) – The neighbourhood/connectivity considered when determining the surface of the binary objects. This value is passed to scipy.ndimage.generate_binary_structure and should usually be \(> 1\).

Return type:

float

Returns:

The maximum Percentile Hausdorff Distance between the object(s) in `s1` and the object(s) in `s2` at the `percentile` percentile. The distance unit is the same as for the spacing of elements along each dimension, which is usually given in mm.

See also

hausdorff_distance()

Notes

This is a real metric.

grand_challenge_metrics.stats.relative_absolute_volume_difference(s1, s2)[source]

Calculate relative absolute volume difference from s2 to s1

Parameters:
  • s1 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else. s1 is taken to be the reference.

  • s2 (ndarray) – Input data containing objects. Can be any type but will be converted into binary: background where 0, object everywhere else.

Return type:

float

Returns:

The relative absolute volume difference between the object(s) in `s1` and the object(s) in `s2`. This is a percentage value in the range \([0, +\infty)\) for which \(0\) denotes an ideal score.

Notes

This is not a real metric: it is asymmetric.

ROC

class grand_challenge_metrics.roc.BootstrappedCIPointError(mean_fprs, mean_tprs, low_tpr_vals, high_tpr_vals, low_fpr_vals, high_fpr_vals)[source]
high_fpr_vals: ndarray

Alias for field number 5

high_tpr_vals: ndarray

Alias for field number 3

low_fpr_vals: ndarray

Alias for field number 4

low_tpr_vals: ndarray

Alias for field number 2

mean_fprs: ndarray

Alias for field number 0

mean_tprs: ndarray

Alias for field number 1

class grand_challenge_metrics.roc.BootstrappedROCCICurves(fpr_vals, mean_tpr_vals, low_tpr_vals, high_tpr_vals, low_az_val, high_az_val)[source]
fpr_vals: ndarray

Alias for field number 0

high_az_val: ndarray

Alias for field number 5

high_tpr_vals: ndarray

Alias for field number 3

low_az_val: ndarray

Alias for field number 4

low_tpr_vals: ndarray

Alias for field number 2

mean_tpr_vals: ndarray

Alias for field number 1

grand_challenge_metrics.roc.average_roc_curves(roc_curves, bins=200)[source]

Averages ROC curves using vertical averaging (fixed FP rates), which gives a 1D measure of variability.

Parameters:
  • roc_curves – List of BootstrappedROCCICurves to be averaged

  • bins (optional) – Number of false-positive rates to iterate over. (Default: 200)

Returns:

ROC class containing the average over all ROCs.

Return type:

BootstrappedROCCICurves
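Vertical averaging can be sketched as follows: each curve is interpolated at a shared grid of false-positive rates, and the true-positive rates are averaged at each grid point. This is a simplified illustration that takes plain `(fpr, tpr)` list pairs instead of BootstrappedROCCICurves; the helper names, the evenly spaced grid and the linear interpolation are all assumptions.

```python
def interpolate_tpr(fpr, tpr, x):
    """Linearly interpolate the TPR of one curve at false-positive rate x."""
    for (x0, y0), (x1, y1) in zip(zip(fpr, tpr), zip(fpr[1:], tpr[1:])):
        if x0 <= x <= x1:
            if x1 == x0:
                # Vertical segment: take the upper end.
                return max(y0, y1)
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the curve's FPR range")


def vertical_average_sketch(curves, bins=200):
    """Average several (fpr, tpr) curves on a shared grid of FPR values."""
    # Assumption: bins >= 2 and every curve spans FPR values from 0 to 1.
    grid = [k / (bins - 1) for k in range(bins)]
    mean_tpr = [
        sum(interpolate_tpr(fpr, tpr, x) for fpr, tpr in curves) / len(curves)
        for x in grid
    ]
    return grid, mean_tpr


diagonal = ([0.0, 1.0], [0.0, 1.0])
steep = ([0.0, 0.5, 1.0], [0.0, 1.0, 1.0])
grid, mean_tpr = vertical_average_sketch([diagonal, steep], bins=3)
print(grid, mean_tpr)  # [0.0, 0.5, 1.0] [0.0, 0.75, 1.0]
```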

grand_challenge_metrics.roc.get_bootstrapped_ci_point_error(y_score, y_true, num_bootstraps=100, ci_to_use=0.95, exclude_first_last=True)[source]

Produces confidence-interval errors for individual points of a ROC curve. Useful when only a few ROC points exist, so that they will be plotted individually, e.g. when the range of score values in y_score is very small (such as manual observer scores).

Note that this method works only by analysing the cloud of bootstrapped points generated for a particular threshold value. A fixed number of threshold values is essential. Therefore the scores in y_score must be from a fixed discrete set of values, e.g. [1, 2, 3, 4, 5].

Bootstrapping is done by selecting len(y_score) samples randomly (with replacement) from y_score and y_true. This is done num_bootstraps times.

Parameters:
  • y_score (ndarray) – The scores produced by the system being evaluated. A discrete set of possible scores must be used.

  • y_true (ndarray) – The true labels (1 or 0) which are the reference standard being used

  • num_bootstraps (int) – How many times to make a random sample with replacement

  • ci_to_use (float) – Which confidence interval is required.

  • exclude_first_last (bool) – The first and last ROC point (0,0 and 1,1) are usually irrelevant in these scenarios where only a few ROC points will be individually plotted. Set this to true to ignore these first and last points.

Return type:

BootstrappedCIPointError

Returns:

  • mean_fprs – The array of mean fpr values (1 per possible ROC point)

  • mean_tprs – The array of mean tpr values (1 per possible ROC point)

  • low_tpr_vals – The tpr vals (one per ROC point) representing the lowest val in the CI

  • high_tpr_vals – The tpr vals (one per ROC point) representing the highest val in the CI

  • low_fpr_vals – The fpr vals (one per ROC point) representing the lowest val in the CI

  • high_fpr_vals – The fpr vals (one per ROC point) representing the highest val in the CI
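The resampling procedure described above can be sketched for a single threshold as follows. This is an illustration under stated assumptions and not the library's implementation: the function name, the `threshold` and `seed` parameters, the `>=` decision rule and the simple index-based percentiles are all hypothetical choices made for this example.

```python
import random


def bootstrap_tpr_ci_sketch(y_score, y_true, threshold, num_bootstraps=100,
                            ci_to_use=0.95, seed=0):
    """Percentile bootstrap CI for the TPR at one threshold (illustrative)."""
    rng = random.Random(seed)
    n = len(y_score)
    tprs = []
    for _ in range(num_bootstraps):
        # Draw n sample indices with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        positives = [i for i in sample if y_true[i] == 1]
        if not positives:
            continue  # TPR is undefined without positives; skip this resample
        hits = sum(1 for i in positives if y_score[i] >= threshold)
        tprs.append(hits / len(positives))
    tprs.sort()
    # Assumption: simple index-based percentiles of the sorted bootstrap values.
    tail = (1.0 - ci_to_use) / 2.0
    low = tprs[int(tail * (len(tprs) - 1))]
    high = tprs[int((1.0 - tail) * (len(tprs) - 1))]
    return sum(tprs) / len(tprs), low, high


mean_tpr, low_tpr, high_tpr = bootstrap_tpr_ci_sketch(
    y_score=[1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    y_true=[0, 0, 0, 1, 1, 0, 0, 1, 1, 1],
    threshold=3,
)
print(mean_tpr, low_tpr, high_tpr)
```

Repeating this for every discrete score value, and for the false-positive rate as well as the true-positive rate, would give one error bar pair per ROC point.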

grand_challenge_metrics.roc.get_bootstrapped_roc_ci_curves(y_pred, y_true, num_bootstraps=100, ci_to_use=0.95)[source]

Produces confidence-interval curves to go alongside a regular ROC curve. This is done by bootstrapping: len(y_pred) samples are selected randomly (with replacement) from y_pred and y_true, and this is repeated num_bootstraps times.

Parameters:
  • y_pred (ndarray) – The predictions (scores) produced by the system being evaluated

  • y_true (ndarray) – The true labels (1 or 0) which are the reference standard being used

  • num_bootstraps (int) – How many times to make a random sample with replacement

  • ci_to_use (float) – Which confidence interval is required.

Return type:

BootstrappedROCCICurves

Returns:

  • fpr_vals – An equally spaced set of fpr vals between 0 and 1

  • mean_tpr_vals – The mean tpr vals (one per fpr_val) obtained by bootstrapping

  • low_tpr_vals – The tpr vals (one per fpr_val) representing lower curve for CI

  • high_tpr_vals – The tpr vals (one per fpr_val) representing the upper curve for CI

  • low_az_val – The lower Az (AUC) val for the given ci_to_use

  • high_az_val – The higher Az (AUC) val for the given ci_to_use