Prediction¶

The prediction module consumes EcoTaxa archive files and produces different kinds of output depending on the task.

Semantic segmentation¶

The semantic segmentation task applies a PyTorch semantic segmentation model to each input image and produces an EcoTaxa archive containing the original metadata, the measurements and (optionally) the detected segments overlaid onto the original images.

maze-ipp predict semantic_segmentation.yaml

semantic_segmentation.yaml is configured in the following way:

model.model_fn must point to a trained semantic segmentation model. model.tiling must be enabled so that the model is applied to all image regions. If segmentation.draw is true, the detected segments will be stored alongside the measurements.
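A minimal sketch of such a file, using only key names from the configuration schema on this page (model.tiling, segmentation.draw). All paths are hypothetical placeholders and need to be replaced; the tiling values are simply the schema defaults written out.

```yaml
# Sketch of semantic_segmentation.yaml (paths are hypothetical placeholders)
input:
  # One or more EcoTaxa archives; wildcards are allowed
  path: "path/to/input/archive_*"

model:
  # TorchScript file of the trained segmentation model
  model_fn: "path/to/segmentation_model"
  # Providing a tiling section enables tiling
  tiling:
    size: 1024    # edge length of one tile
    stride: 896   # overlap between tiles = size - stride = 128

segmentation:
  # Store the detected segments alongside the measurements
  draw: true

target_dir: "path/to/output"
```

Setting segmentation to false (the default) would skip the measurement step entirely, so the section has to be present for this task.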

Feature calculation¶

maze-ipp predict extract_features.yaml

extract_features.yaml is configured in the following way:

model.model_fn must point to a trained feature extractor model. save_raw_h5 must be true so that an HDF5 file is created. This file contains two datasets: object_id and predictions.
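A minimal sketch of such a file, using only key names from the configuration schema on this page (save_raw_h5 is the top-level option that enables the HDF5 output). All paths are hypothetical placeholders.

```yaml
# Sketch of extract_features.yaml (paths are hypothetical placeholders)
input:
  # One or more EcoTaxa archives; wildcards are allowed
  path: "path/to/input/archive_*"

model:
  # TorchScript file of the trained feature extractor
  model_fn: "path/to/feature_extractor_model"
  # Optional: run on a GPU instead of the default "cpu"
  # device: "cuda:0"

# Write the raw model outputs to an HDF5 file
save_raw_h5: true

target_dir: "path/to/output"
```

The segmentation and polytaxo sections are left out here because both default to false.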

Polyhierarchical classification¶

Polyhierarchical classification works by predicting scores for each image, generating polyhierarchical descriptions and mapping them back to EcoTaxa categories.

maze-ipp predict polytaxo.yaml

polytaxo.yaml is configured in the following way:

model.model_fn must point to a trained polytaxo classifier model. The polytaxo section must be configured; within it, poly_taxonomy_fn and ecotaxa_taxonomy_fn are required.
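A minimal sketch of such a file, using only key names from the configuration schema on this page. All paths are hypothetical placeholders, and the threshold values shown are the schema defaults written out for illustration.

```yaml
# Sketch of polytaxo.yaml (paths are hypothetical placeholders)
input:
  # One or more EcoTaxa archives; wildcards are allowed
  path: "path/to/input/archive_*"

model:
  # TorchScript file of the trained polytaxo classifier
  model_fn: "path/to/polytaxo_model"

polytaxo:
  # Both taxonomy files are required
  poly_taxonomy_fn: "path/to/poly_taxonomy_file"
  ecotaxa_taxonomy_fn: "path/to/ecotaxa_taxonomy_file"
  # Accept a prediction only if its score exceeds 0.9; scores
  # below 1 - 0.9 = 0.1 add a negative descriptor instead
  threshold: 0.9
  # Required margin over the next-best prediction's score
  threshold_relative: 0.0

target_dir: "path/to/output"
```

The remaining polytaxo options (for example compatible_predictions_only or skip_unchanged_objects) keep their defaults when omitted.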

For more information see the polytaxo GitHub page.

Configuration Example¶

Here is a general example configuration:

## [required] Configuration of the input.
input:
  ## [required] Path to an input EcoTaxa archive. May contain wildcard
  ## characters ('?', '*').
  path: ...

  ## [optional] Ignore these directories. May contain wildcard characters
  ## ('?', '*').
  # ignore_patterns: []

## [required] Configuration of the model.
model:
  ## [required] A file containing a ScriptModule (or ScriptFunction)
  ## previously saved with torch.jit.save.
  model_fn: ...

  ## [optional] A device to load and execute the model (e.g. 'cpu' or
  ## 'cuda:0').
  # device: "cpu"

  ## [optional] Number of threads that each execute an instance of the
  ## model.
  # n_threads: 0

  ## [optional] Batch size
  # batch_size: 0

  ## [optional] Enable automatic mixed precision inference to improve
  ## performance.
  # autocast: false

  ## [optional] Datatype to use for the processing (e.g. 'float32')
  # dtype: "float32"

  ## [optional] Model metadata.
  # meta:
  #   ## [required] Ordered mapping of output names to output descriptions,
  #   ## e.g. {"pred": {"channel_names": ["Prosoma", "Oilsack"]}}. Only a
  #   ## single output is supported.
  #   outputs: ...

  ## [optional] Apply the model to square tiles on each input image.
  ## Required for semantic segmentation.
  # tiling:
  #   ## [optional] Edge length of one tile
  #   # size: 1024

  #   ## [optional] Stride of the tiling. `size - stride` is the overlap of two
  #   ## consecutive tiles.
  #   # stride: 896
  # ## OR ##
  # tiling: false

## [optional] Save raw predictions into an HDF5 file, e.g. for feature
## extraction.
# save_raw_h5: false

## [optional] Measure predicted segments and store into EcoTaxa archive.
## (Only applies for semantic segmentation.)
# segmentation:
#   ## [optional] Draw segments.
#   # draw: false

#   ## [optional] Fill holes in segments. Can be boolean or a list of channel
#   ## names.
#   # fill_holes: ...
# ## OR ##
# segmentation: false

## [optional] Predict object properties using a PolyTaxo classifier and
## store into an EcoTaxa archive.
# polytaxo:
#   ## [required] PolyTaxonomy filename.
#   poly_taxonomy_fn: ...

#   ## [required] EcoTaxa project taxonomy filename.
#   ecotaxa_taxonomy_fn: ...

#   ## [optional] Update validated object_annotation_category with compatible
#   ## predictions. Incompatible predictions will not be added, even if they
#   ## obtain higher scores than any compatible prediction.
#   ## If false, the prediction only depends on the model output.
#   # compatible_predictions_only: true

#   ## [optional] Save only objects with updated annotations and skip
#   ## unchanged objects.
#   # skip_unchanged_objects: true

#   ## [optional] Filter expression to apply to validated objects.
#   ## Objects not matching this filter are skipped.
#   # filter_validated: ...

#   ## [optional] Save raw description as meta-data.
#   # save_raw_descriptions: false

#   ## [optional] Strip metadata unrelated to annotation.
#   # strip_metadata: true

#   ## [optional] Absolute threshold to apply to prediction scores. Any
#   ## accepted prediction must obtain a higher score than `threshold`. If a
#   ## score is below 1-threshold, a negative descriptor will be added to the
#   ## description.
#   # threshold: 0.9

#   ## [optional] Relative threshold to apply to prediction scores. Any
#   ## accepted prediction must obtain a higher score than the next-best
#   ## prediction's score + `threshold_relative`.
#   # threshold_relative: 0.0

#   ## [optional] Augmentation rules to apply to previously validated
#   ## annotations.
#   ## These rules enrich already validated annotations by incorporating
#   ## implicit defaults or taxonomic knowledge that could not be represented
#   ## in EcoTaxa.
#   ## These rules have the form `<query>: <update>`: If the query expression
#   ## matches the description, the update expression is applied.
#   # taxonomy_augmentation_rules: ...

#   ## [optional] Constraint rules to apply to predicted annotations.
#   ## These rules limit or exclude certain predictions based on contextual
#   ## factors or known exceptions within the taxonomy. The purpose is to
#   ## prevent inaccurate or inappropriate predictions that do not align with
#   ## known biological or taxonomic constraints.
#   ## These rules have the form `<query>: <update>`: If the query expression
#   ## matches the description, the update expression is applied.
#   # prediction_constraint_rules: ...
# ## OR ##
# polytaxo: false

## [required] Directory where the output files are created.
target_dir: ...

## [optional] The interval at which progress is logged, e.g. 10s or 1m.
# log_interval: ...

The example configuration can be generated using the maze-ipp config predict command.

Configuration Schema¶

This is the complete documentation for the configuration of the pipeline:

pydantic model DataDescriptorSchema¶

field channel_names: Sequence[str] | None = None¶: List of channel names

pydantic model EcoTaxaInputConfig¶

field ignore_patterns: List[str] = []¶: Ignore these directories. May contain wildcard characters (‘?’, ‘*’).

field max_n_objects: int | None = None¶: Maximum number of objects. (For debugging.)

field path: str [Required]¶: Path to an input EcoTaxa archive. May contain wildcard characters (‘?’, ‘*’).

pydantic model ModelConfig¶

field autocast: bool = False¶: Enable automatic mixed precision inference to improve performance.

field batch_size: int = 0¶: Batch size

field device: str = 'cpu'¶: A device to load and execute the model (e.g. ‘cpu’ or ‘cuda:0’).

field dtype: str = 'float32'¶: Datatype to use for the processing (e.g. ‘float32’)

field meta: ModelMetaSchema | None = None¶: Model metadata.

field model_fn: str [Required]¶: A file containing a ScriptModule (or ScriptFunction) previously saved with torch.jit.save

field n_threads: int = 0¶: Number of threads that each execute an instance of the model.

field tiling: TilingConfig | Literal[False] = False¶: Apply the model to square tiles on each input image. Required for semantic segmentation.

pydantic model ModelMetaSchema¶

field outputs: OrderedDict[str, DataDescriptorSchema] [Required]¶: Ordered mapping of output names to output descriptions, e.g. {"pred": {"channel_names": ["Prosoma", "Oilsack"]}}. Only a single output is supported.

pydantic model PolyTaxoConfig¶

field compatible_predictions_only: bool = True¶: Update validated object_annotation_category with compatible predictions. Incompatible predictions will not be added, even if they obtain higher scores than any compatible prediction. If false, the prediction only depends on the model output.

field ecotaxa_taxonomy_fn: str [Required]¶: EcoTaxa project taxonomy filename.

field filter_validated: str | None = None¶: Filter expression to apply to validated objects. Objects not matching this filter are skipped.

field poly_taxonomy_fn: str [Required]¶: PolyTaxonomy filename.

field prediction_constraint_rules: OrderedDict[str, str] | None = None¶: Constraint rules to apply to predicted annotations. These rules limit or exclude certain predictions based on contextual factors or known exceptions within the taxonomy. The purpose is to prevent inaccurate or inappropriate predictions that do not align with known biological or taxonomic constraints. These rules have the form <query>: <update>: If the query expression matches the description, the update expression is applied.

field save_raw_descriptions: bool = False¶: Save raw description as meta-data.

field skip_unchanged_objects: bool = True¶: Save only objects with updated annotations and skip unchanged objects.

field strip_metadata: bool = True¶: Strip metadata unrelated to annotation.

field taxonomy_augmentation_rules: OrderedDict[str, str] | None = None¶: Augmentation rules to apply to previously validated annotations. These rules enrich already validated annotations by incorporating implicit defaults or taxonomic knowledge that could not be represented in EcoTaxa. These rules have the form <query>: <update>: If the query expression matches the description, the update expression is applied.

field threshold: float = 0.9¶: Absolute threshold to apply to prediction scores. Any accepted prediction must obtain a higher score than threshold. If a score is below 1-threshold, a negative descriptor will be added to the description.

field threshold_relative: float = 0.0¶: Relative threshold to apply to prediction scores. Any accepted prediction must obtain a higher score than the next-best prediction’s score + threshold_relative.

pydantic model PredictionPipelineConfig¶

field input: EcoTaxaInputConfig [Required]¶: Configuration of the input.

field log_interval: str | float = '60s'¶: The interval at which progress is logged, e.g. 10s or 1m.

field model: ModelConfig [Required]¶: Configuration of the model.

field polytaxo: PolyTaxoConfig | Literal[False] = False¶: Predict object properties using a PolyTaxo classifier and store into an EcoTaxa archive.

field save_raw_h5: bool = False¶: Save raw predictions into an HDF5 file, e.g. for feature extraction.

field segmentation: SegmentationConfig | Literal[False] = False¶: Measure predicted segments and store into EcoTaxa archive. (Only applies for semantic segmentation.)

field target_dir: str [Required]¶: Directory where the output files are created.

pydantic model SegmentationConfig¶

field draw: bool = False¶: Draw segments.

field fill_holes: bool | Tuple[str, ...] = False¶: Fill holes in segments. Can be boolean or a list of channel names.

pydantic model TilingConfig¶

field size: int = 1024¶: Edge length of one tile

field stride: int = 896¶: Stride of the tiling. size - stride is the overlap of two consecutive tiles.
