LOKI Resegmentation

The maze-ipp loki command implements an image processing pipeline for the resegmentation of raw data captured by the LOKI imaging system.

It provides the following features:

  • Sample folder discovery

  • Merging of telemetry metadata

  • Segmentation using thresholding or a deep learning model.

  • Duplicate detection

  • Merging of existing EcoTaxa annotations

  • Generation of import-ready EcoTaxa archives

  • Logging and error handling

  • Progress reporting

  • YAML configuration

Sample folder discovery

By default, the input path is searched for valid sample folders. A folder is recognized as a sample folder if it contains the subfolders "Telemetrie" and "Pictures". To disable sample folder discovery, set discover to false; the path pattern must then point to the exact locations of the sample folders.
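The recognition rule can be illustrated with a short Python sketch. This is not the pipeline's own code: the function name discover_sample_folders and the recursive search via rglob are assumptions made for illustration; only the two required subfolder names come from the description above.

```python
from pathlib import Path
from typing import Iterator

# Subfolder names taken from the description above.
REQUIRED_SUBFOLDERS = ("Telemetrie", "Pictures")


def discover_sample_folders(root: Path) -> Iterator[Path]:
    """Yield every directory at or below `root` that looks like a sample folder.

    Hypothetical helper: searching recursively is an assumption of this sketch.
    """
    candidates = [root, *sorted(p for p in root.rglob("*") if p.is_dir())]
    for candidate in candidates:
        # A candidate qualifies only if both required subfolders exist.
        if all((candidate / name).is_dir() for name in REQUIRED_SUBFOLDERS):
            yield candidate
```

A folder that contains only one of the two subfolders is not yielded, which corresponds to the case where discover would have to be set to false.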

Configuration

Here is the example configuration:

## [required] Configuration of the input.
input:
  ## [required] Path to a LOKI input directory. May contain wildcard
  ## characters ('?', '*').
  path: ...

  ## [optional] Try to discover all LOKI samples ('LOKI_XXXXX.XX') inside
  ## the specified path by looking for directories that contain 'Pictures'
  ## and 'Telemetrie' folders.
  ## This should only be set to False if 'Pictures' or 'Telemetrie' are
  ## missing.
  ## If False, the `path` pattern needs to point to exact locations of
  ## 'LOKI_XXXXX.XX' directories (containing 'Log', 'Pictures' and
  ## 'Telemetrie').
  # discover: true

  ## [optional] Ignore these directories. May contain wildcard characters
  ## ('?', '*').
  # ignore_patterns: []

  ## [optional] Filter input objects by Python expression.
  # filter_expr: ...

  ## [optional] Default metadata for all objects.
  # default_meta: {}

  ## [optional] EcoTaxa TSV file containing valid frame IDs.
  ## Input frames with no corresponding objects in this file will be
  ## skipped.
  ## For LOKI data, object_frame_id is usually the 'DDDDDDDD TTTTTT  ttt'
  ## part of the object_id.
  ## If not present, object_frame_id is extracted from object_id.
  # valid_frames_fn: ...

  ## [optional] Merge telemetry. (Default: true)
  # merge_telemetry:
  #   ## [optional] Maximum delta between object time and telemetry time.
  #   # tolerance: ...
  # ## OR ##
  # merge_telemetry: false

  ## [optional] Detect duplicates. (Default: false)
  # detect_duplicates:
  #   ## [optional] Minimum similarity of two objects.
  #   # min_similarity: 0.98

  #   ## [optional] Maximum age of a previous object.
  #   # max_age: 1
  # ## OR ##
  # detect_duplicates: false

## [required] Configuration of the segmentation.
segmentation:
  ## [optional] Use thresholding for segmentation.
  # threshold:
  #   ## [required] Extract objects brighter than this threshold.
  #   threshold_brighter: ...

  ## [optional] Use a PyTorch model for segmentation.
  # pytorch:
  #   ## [optional] Stitch objects to reconstruct frames.
  #   # stitch:

  #   # ## OR ##
  #   # stitch: false

  #   ## [required] A file containing a ScriptModule (or ScriptFunction)
  #   ## previously saved with `torch.jit.save`.
  #   model_fn: ...

  #   ## [optional] A device to load and execute the model (e.g. 'cpu' or
  #   ## 'cuda:0').
  #   # device: "cpu"

  #   ## [optional] Number of threads that each execute an instance of the
  #   ## model.
  #   # n_threads: 0

  #   ## [optional] Batch size
  #   # batch_size: 0

  #   ## [optional] Enable automatic mixed precision inference to improve
  #   ## performance.
  #   # autocast: false

  #   ## [optional] Datatype to use for the processing (e.g. 'float32')
  #   # dtype: "float32"

  #   ## [optional] Perform full-frame post-processing steps.
  #   # postprocess:
  #   #   ## [optional] Apply morphological closing (close small gaps) using this
  #   #   ## radius.
  #   #   # closing_radius: 0

  #   #   ## [optional] Apply morphological opening (remove small objects) using
  #   #   ## this radius.
  #   #   # opening_radius: 0

  #   #   ## [optional] Merge segments closer than the specified distance.
  #   #   # merge_segments_distance: 0

  #   #   ## [optional] Remove objects with an area below the specified threshold.
  #   #   # min_area: 0

  #   #   ## [optional] Use multiple threads to perform the post-processing.
  #   #   # n_threads: 0

  #   #   ## [optional] Clear objects touching the image border.
  #   #   # clear_border: false
  #   # ## OR ##
  #   # postprocess: false

  #   ## [optional] Pad extracted regions with this number of pixels on each
  #   ## border.
  #   # padding: 75

  #   ## [optional] Minimum intensity of extracted regions.
  #   # min_intensity: null

  #   ## [optional] Hide everything in a vignette that is not part of the
  #   ## current object.
  #   # apply_mask: false

  #   ## [optional] Color for the background when hiding foreign object parts.
  #   ## Can be a scalar (`0`), a tuple (`(r,g,b)=(255,0,0)`), a color name
  #   ## (`'black'`) or a quantile (`'quantile:0.25'`).
  #   # background_color: 0

  #   ## [optional] When hiding non-object image regions, keep background.
  #   # keep_background: true

  ## [optional] Filter objects by Python expression.
  # filter_expr: ...

## [required] Configuration of the post-processing.
postprocess:
  ## [optional] Draw a scalebar on each object image.
  # scalebar:
  #   ## [required] Pixels per millimeter.
  #   px_per_mm: ...

  ## [optional] Filter objects by Python expression.
  # filter_expr: ...

  ## [optional] Detect duplicates.
  # detect_duplicates:
  #   ## [optional] Minimum similarity of two objects.
  #   # min_similarity: 0.98

  #   ## [optional] Maximum age of a previous object.
  #   # max_age: 1
  # ## OR ##
  # detect_duplicates: false

  ## [optional] Merge annotations.
  # merge_annotations:
  #   ## [required] EcoTaxa TSV file containing annotations for objects.
  #   ## Required columns are object_width, object_height, object_posx and
  #   ## object_posy (identifying the bounding box of an object) and
  #   ## object_frame_id (identifying the frame that an object is part of).
  #   ## For LOKI data, object_frame_id is usually the 'DDDDDDDD TTTTTT  ttt'
  #   ## part of the object_id.
  #   ## If not present, object_frame_id is extracted from object_id.
  #   annotations_fn: ...

  #   ## [optional] Minimum overlap of object and annotation bounding box in
  #   ## IoU.
  #   # min_overlap: 0.5

  #   ## [optional] Minimum overlap of object and annotation bounding box so
  #   ## that the resulting annotation_status remains 'validated'.
  #   # min_validated_overlap: 0.8

  ## [optional] Rescale the image intensities so that the brightest value
  ## is white.
  # rescale_max_intensity: false

## [required] Configuration of the output.
output:
  ## [required] Directory where the EcoTaxa archives are created.
  target_dir: ...

  ## [optional] Skip if archive already exists.
  # skip_existing: false

  ## [optional] Format string for the names of image files inside the
  ## archive. All fields in metadata can be used.
  # image_fn: "{object_id}.jpg"

  ## [optional] Store the mask of each object alongside its image.
  # store_mask: false

  ## [optional] Include a type header in the produced TSV file. (Required
  ## for successful import into EcoTaxa.)
  # type_header: true

## [optional] The interval at which progress is logged, e.g. 10s or 1m.
# log_interval: ...

The example configuration can be generated using the maze-ipp config loki command.

Configuration Schema

This is the complete documentation for the configuration of the pipeline:

pydantic model DetectDuplicatesConfig
field max_age: int = 1

Maximum age of a previous object.

field min_similarity: float = 0.98

Minimum similarity of two objects.
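How min_similarity and max_age might interact can be sketched as follows. The documentation does not state which similarity measure the pipeline uses or in which unit the age is counted, so this sketch assumes cosine similarity on feature vectors and an age counted in frames. All names are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (an assumed stand-in for the real measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_duplicates(
    frames: List[List[Sequence[float]]],
    min_similarity: float = 0.98,
    max_age: int = 1,
) -> List[List[bool]]:
    """Flag objects that resemble an object from one of the last `max_age` frames."""
    history: deque = deque(maxlen=max_age)  # keeps only the most recent frames
    flags = []
    for frame in frames:
        previous = [vector for old_frame in history for vector in old_frame]
        flags.append(
            [
                any(cosine_similarity(vector, p) >= min_similarity for p in previous)
                for vector in frame
            ]
        )
        history.append(frame)
    return flags
```

With max_age set to 1, an object is only compared against the objects of the immediately preceding frame.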

pydantic model EcoTaxaOutputConfig
field image_fn: str = '{object_id}.jpg'

Format string for the names of image files inside the archive. All fields in metadata can be used.

field skip_existing: bool = False

Skip if archive already exists.

field store_mask: bool = False

Store the mask of each object alongside its image.

field target_dir: str [Required]

Directory where the EcoTaxa archives are created.

field type_header: bool = True

Include a type header in the produced TSV file. (Required for successful import into EcoTaxa.)
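As an illustration of the image_fn option, the following sketch expands a format string with an object's metadata. It assumes that the format string follows Python's str.format syntax; the helper name render_image_fn and the metadata keys other than object_id are hypothetical.

```python
from typing import Any, Dict


def render_image_fn(image_fn: str, meta: Dict[str, Any]) -> str:
    """Expand a format string such as '{object_id}.jpg' with metadata fields.

    Hypothetical helper; a field missing from `meta` raises KeyError.
    """
    return image_fn.format(**meta)


# Example metadata; 'sample_id' is an illustrative key, not a documented one.
example_meta = {"object_id": "obj_0001", "sample_id": "sample_A"}
print(render_image_fn("{object_id}.jpg", example_meta))
print(render_image_fn("{sample_id}/{object_id}.jpg", example_meta))
```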

pydantic model LokiInputConfig
field default_meta: Dict = {}

Default metadata for all objects.

field detect_duplicates: DetectDuplicatesConfig | Literal[False] = False

Detect duplicates. (Default: false)

field discover: bool = True

Try to discover all LOKI samples (‘LOKI_XXXXX.XX’) inside the specified path by looking for directories that contain ‘Pictures’ and ‘Telemetrie’ folders. This should only be set to False if ‘Pictures’ or ‘Telemetrie’ are missing. If False, the path pattern needs to point to exact locations of ‘LOKI_XXXXX.XX’ directories (containing ‘Log’, ‘Pictures’ and ‘Telemetrie’).

field filter_expr: str | None = None

Filter input objects by Python expression.

field ignore_patterns: List[str] = []

Ignore these directories. May contain wildcard characters (‘?’, ‘*’).

field merge_telemetry: MergeTelemetryConfig | Literal[False] [Optional]

Merge telemetry. (Default: true)

field path: str [Required]

Path to a LOKI input directory. May contain wildcard characters (‘?’, ‘*’).

field save_meta: bool = False

Save calculated input metadata in the target directory (for debugging).

field slice: int | None = None

Process only this many objects (for debugging).

field valid_frames_fn: str | None = None

EcoTaxa TSV file containing valid frame IDs. Input frames with no corresponding objects in this file will be skipped. For LOKI data, object_frame_id is usually the ‘DDDDDDDD TTTTTT ttt’ part of the object_id. If not present, object_frame_id is extracted from object_id.
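The fallback described for valid_frames_fn, deriving object_frame_id from object_id, might look like the sketch below. It assumes that each letter in the 'DDDDDDDD TTTTTT ttt' pattern stands for one digit and that the frame ID forms the start of the object_id. The function names and the sample IDs in the usage note are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Set

# Assumed reading of 'DDDDDDDD TTTTTT ttt': 8 digits, 6 digits, 3 digits,
# separated by whitespace, located at the start of the object_id.
FRAME_ID_PATTERN = re.compile(r"\d{8} \d{6}\s+\d{3}")


def frame_id_from_object_id(object_id: str) -> Optional[str]:
    """Return the frame ID prefix of an object_id, or None if it does not match."""
    match = FRAME_ID_PATTERN.match(object_id)
    return match.group(0) if match else None


def keep_valid_frames(object_ids: Iterable[str], valid_frame_ids: Set[str]) -> List[str]:
    """Keep only the objects whose frame ID is listed as valid."""
    return [
        object_id
        for object_id in object_ids
        if frame_id_from_object_id(object_id) in valid_frame_ids
    ]
```

For a made-up ID such as "20130529 181558  593  000000 0668 0534", the sketch yields the frame ID "20130529 181558  593".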

pydantic model MergeAnnotationsConfig
field annotations_fn: str [Required]

EcoTaxa TSV file containing annotations for objects. Required columns are object_width, object_height, object_posx and object_posy (identifying the bounding box of an object) and object_frame_id (identifying the frame that an object is part of). For LOKI data, object_frame_id is usually the ‘DDDDDDDD TTTTTT ttt’ part of the object_id. If not present, object_frame_id is extracted from object_id.

field min_overlap: float = 0.5

Minimum overlap of object and annotation bounding box in IoU.

field min_validated_overlap: float = 0.8

Minimum overlap of object and annotation bounding box so that the resulting annotation_status remains ‘validated’.
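Both overlap thresholds are expressed as IoU (intersection over union) of two bounding boxes. The following sketch shows the standard IoU computation for axis-aligned boxes given as position plus size, matching the object_posx, object_posy, object_width and object_height columns. The Box type and the function name are hypothetical, and how the pipeline applies the thresholds internally is not shown here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box: top-left corner plus size."""

    x: float
    y: float
    width: float
    height: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    # Width and height of the overlapping rectangle (zero if disjoint).
    overlap_w = max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    overlap_h = max(0.0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))
    intersection = overlap_w * overlap_h
    union = a.width * a.height + b.width * b.height - intersection
    return intersection / union if union > 0 else 0.0
```

Two 10x10 boxes shifted by 5 pixels along one axis share an area of 50 out of a union of 150, giving an IoU of 1/3, which is below the default min_overlap of 0.5.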

pydantic model MergeTelemetryConfig
field tolerance: str | None = None

Maximum delta between object time and telemetry time.
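The effect of tolerance can be pictured with a small matching sketch. It assumes that each object is matched to the nearest telemetry record in time and that a match is rejected when the time difference exceeds the tolerance. The documentation does not say whether the pipeline matches to the nearest or to the preceding record, so treat this as one possible reading. The function name is hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import Optional, Sequence


def nearest_telemetry_index(
    telemetry_times: Sequence[datetime],
    object_time: datetime,
    tolerance: Optional[timedelta] = None,
) -> Optional[int]:
    """Index of the telemetry record closest to `object_time`.

    `telemetry_times` must be sorted in ascending order. Returns None if
    there is no record, or if the closest one is further away than `tolerance`.
    """
    if not telemetry_times:
        return None
    pos = bisect_left(telemetry_times, object_time)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(telemetry_times)]
    best = min(candidates, key=lambda i: abs(telemetry_times[i] - object_time))
    if tolerance is not None and abs(telemetry_times[best] - object_time) > tolerance:
        return None
    return best
```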

pydantic model PostprocessingConfig
field detect_duplicates: DetectDuplicatesConfig | Literal[False] = False

Detect duplicates.

field filter_expr: str | None = None

Filter objects by Python expression.

field merge_annotations: MergeAnnotationsConfig | None = None

Merge annotations.

field rescale_max_intensity: bool = False

Rescale the image intensities so that the brightest value is white.

field scalebar: ScalebarConfig | None = None

Draw a scalebar on each object image.

field slice: int | None = None

Process only this many objects (for debugging).
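The rescale_max_intensity option listed above can be pictured as a linear scaling of the pixel values. The sketch below assumes 8-bit grayscale images stored as nested lists, with white equal to 255. The function name and the handling of an all-black image are assumptions.

```python
from typing import List


def rescale_max_intensity(pixels: List[List[int]], white: int = 255) -> List[List[int]]:
    """Scale intensities linearly so that the brightest pixel becomes `white`."""
    peak = max(max(row) for row in pixels)
    if peak == 0:
        # An all-black image cannot be rescaled; return an unchanged copy.
        return [list(row) for row in pixels]
    return [[round(value * white / peak) for value in row] for row in pixels]
```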

pydantic model PytorchSegmentationConfig
field apply_mask: bool = False

Hide everything in a vignette that is not part of the current object.

field autocast: bool = False

Enable automatic mixed precision inference to improve performance.

field background_color: Any = 0

Color for the background when hiding foreign object parts. Can be a scalar (0), a tuple ((r,g,b)=(255,0,0)), a color name ('black') or a quantile ('quantile:0.25').

field batch_size: int = 0

Batch size

field device: str = 'cpu'

A device to load and execute the model (e.g. ‘cpu’ or ‘cuda:0’).

field dtype: str = 'float32'

Datatype to use for the processing (e.g. ‘float32’)

field full_frame_archive_fn: str | None = None

Write segmented full-frames to this file in the target directory (debug).

field keep_background: bool = True

When hiding non-object image regions, keep background.

field min_intensity: int = None

Minimum intensity of extracted regions.

field model_fn: str [Required]

A file containing a ScriptModule (or ScriptFunction) previously saved with torch.jit.save.

field n_threads: int = 0

Number of threads that each execute an instance of the model.

field padding: int = 75

Pad extracted regions with this number of pixels on each border.

field postprocess: SegmentationPostprocessingConfig | Literal[False] = False

Perform full-frame post-processing steps.

field stitch: StitchConfig | Literal[False] = True

Stitch objects to reconstruct frames.

pydantic model ScalebarConfig
field px_per_mm: float [Required]

Pixels per millimeter.

pydantic model SegmentationConfig
field filter_expr: str | None = None

Filter objects by Python expression.

field pytorch: PytorchSegmentationConfig | None = None

Use a PyTorch model for segmentation.

field threshold: ThresholdSegmentationConfig | None = None

Use thresholding for segmentation.

validator parse_shortform  »  all fields

pydantic model SegmentationPipelineConfig
field input: LokiInputConfig [Required]

Configuration of the input.

field log_interval: str | float = '60s'

The interval at which progress is logged, e.g. 10s or 1m.

field output: EcoTaxaOutputConfig [Required]

Configuration of the output.

field postprocess: PostprocessingConfig [Required]

Configuration of the post-processing.

field segmentation: SegmentationConfig [Required]

Configuration of the segmentation.
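The log_interval field accepts either a string such as 10s or 1m or a plain number. One way to normalize such values to seconds is sketched below. It assumes that a bare number means seconds and that only the suffixes s, m and h occur; the h suffix is not mentioned in the documentation and is included as an assumption. The function name is hypothetical.

```python
import re
from typing import Union

# Seconds per unit; 'h' is an assumed extension beyond the documented examples.
_UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0}


def parse_interval(value: Union[str, int, float]) -> float:
    """Convert an interval such as '10s', '1m' or 60 to seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([smh])\s*", value)
    if match is None:
        raise ValueError(f"Unrecognized interval: {value!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```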

pydantic model SegmentationPostprocessingConfig
field clear_border: bool = False

Clear objects touching the image border.

field closing_radius: int = 0

Apply morphological closing (close small gaps) using this radius.

field merge_segments_distance: int = 0

Merge segments closer than the specified distance.

field min_area: int = 0

Remove objects with an area below the specified threshold.

field n_threads: int = 0

Use multiple threads to perform the post-processing.

field opening_radius: int = 0

Apply morphological opening (remove small objects) using this radius.

pydantic model StitchConfig
field skip_single: bool = False

Remove stitched frames with only one object (debug).

pydantic model ThresholdSegmentationConfig
field threshold_brighter: float [Required]

Extract objects brighter than this threshold.
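Threshold segmentation in general can be sketched as follows: pixels brighter than the threshold are foreground, and connected foreground pixels form one object. This is a generic illustration rather than the pipeline's implementation. The strict comparison, the 4-connectivity and the function name are assumptions.

```python
from collections import deque
from typing import List, Tuple


def threshold_segments(
    image: List[List[float]], threshold_brighter: float
) -> List[Tuple[int, int, int, int]]:
    """Return bounding boxes (x, y, width, height) of connected bright regions."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] <= threshold_brighter:
                continue
            # Breadth-first search over the 4-connected bright neighbours.
            queue = deque([(y, x)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cy, cx = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (
                        0 <= ny < height
                        and 0 <= nx < width
                        and not seen[ny][nx]
                        and image[ny][nx] > threshold_brighter
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes
```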

Merging existing annotations

In the case of reprocessing, it is sometimes necessary to merge existing annotations into the new dataset. These can be extracted from an EcoTaxa export using the pyecotaxa extract-meta helper command:

pyecotaxa extract-meta --fix-bbox LOKI <INPUT_ARCH_FN> <OUTPUT_TSV_FN>
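Before passing the extracted TSV file to merge_annotations, it can be useful to check that it contains the columns listed for annotations_fn. The sketch below reads only the header row with Python's csv module. The function name is hypothetical, and treating object_id as an acceptable substitute for object_frame_id follows the fallback described in the configuration schema.

```python
import csv
from typing import IO, List

# Columns identifying the bounding box, as listed for annotations_fn.
BBOX_COLUMNS = ("object_width", "object_height", "object_posx", "object_posy")


def missing_annotation_columns(tsv_file: IO[str]) -> List[str]:
    """Return the required columns that are absent from the TSV header row."""
    header = next(csv.reader(tsv_file, delimiter="\t"), [])
    missing = [column for column in BBOX_COLUMNS if column not in header]
    # object_frame_id may be derived from object_id, so either one suffices.
    if "object_frame_id" not in header and "object_id" not in header:
        missing.append("object_frame_id")
    return missing
```

The function takes an open text file, for example the result of open(path, newline="", encoding="utf-8"), and an empty result means that all required columns are present.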