LOKI Resegmentation
The maze-ipp loki command implements an image processing pipeline for the resegmentation of raw data captured by the LOKI imaging system.
It provides the following features:
- Sample folder discovery
- Merging of telemetry metadata
- Segmentation using thresholding or a deep learning model
- Duplicate detection
- Merging of existing EcoTaxa annotations
- Generation of import-ready EcoTaxa archives
- Logging and error handling
- Progress reporting
- YAML configuration
Sample folder discovery
By default, the input path is searched for valid sample folders.
A folder is recognized as a sample folder if it contains the subfolders "Telemetrie" and "Pictures".
Discovery can be disabled by setting input.discover to false. In that case, input.path must point directly to the LOKI_XXXXX.XX sample directories.
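The discovery rule above can be sketched in a few lines of Python. This is a minimal illustration, not maze-ipp's implementation: discover_samples is a hypothetical helper, and matching ignore_patterns against directory names (rather than full paths) is an assumption.

```python
import fnmatch
import glob
import os


def discover_samples(path_pattern, ignore_patterns=()):
    """Hypothetical helper: return sample folders below all paths matching path_pattern.

    A folder counts as a sample folder if it contains both a 'Pictures'
    and a 'Telemetrie' subfolder.
    """
    samples = []
    for root_path in sorted(glob.glob(path_pattern)):  # expands '?' and '*'
        for dirpath, dirnames, _ in os.walk(root_path):
            # Assumption: ignore patterns are matched against directory names.
            # Pruning dirnames in place stops os.walk from descending into them.
            dirnames[:] = sorted(
                d for d in dirnames
                if not any(fnmatch.fnmatch(d, pat) for pat in ignore_patterns)
            )
            if "Pictures" in dirnames and "Telemetrie" in dirnames:
                samples.append(dirpath)
                dirnames[:] = []  # do not search inside a sample folder
    return samples
```

Pruning in place relies on os.walk running top-down (its default), so ignored directories and the contents of already-found samples are never visited.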
Configuration
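Here is the example configuration. Settings marked [required] must be provided; settings marked [optional] are commented out, in most cases showing their default value: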
## [required] Configuration of the input.
input:
  ## [required] Path to a LOKI input directory. May contain wildcard
  ## characters ('?', '*').
  path: ...
  ## [optional] Try to discover all LOKI samples ('LOKI_XXXXX.XX') inside
  ## the specified path by looking for directories that contain 'Pictures'
  ## and 'Telemetrie' folders.
  ## This should only be set to False if 'Pictures' or 'Telemetrie' are
  ## missing.
  ## If False, the `path` pattern needs to point to exact locations of
  ## 'LOKI_XXXXX.XX' directories (containing 'Log', 'Pictures' and
  ## 'Telemetrie').
  # discover: true
  ## [optional] Ignore these directories. May contain wildcard characters
  ## ('?', '*').
  # ignore_patterns: []
  ## [optional] Filter input objects by Python expression.
  # filter_expr: ...
  ## [optional] Default metadata for all objects.
  # default_meta: {}
  ## [optional] EcoTaxa TSV file containing valid frame IDs.
  ## Input frames with no corresponding objects in this file will be
  ## skipped.
  ## For LOKI data, object_frame_id is usually the 'DDDDDDDD TTTTTT ttt'
  ## part of the object_id.
  ## If the object_frame_id column is not present, it is extracted from
  ## object_id.
  # valid_frames_fn: ...
  ## [optional] Merge telemetry. (Default: true)
  # merge_telemetry:
  #   ## [optional] Maximum delta between object time and telemetry time.
  #   # tolerance: ...
  # ## OR ##
  # merge_telemetry: false
  ## [optional] Detect duplicates. (Default: false)
  # detect_duplicates:
  #   ## [optional] Minimum similarity of two objects.
  #   # min_similarity: 0.98
  #   ## [optional] Maximum age of a previous object.
  #   # max_age: 1
  # ## OR ##
  # detect_duplicates: false
## [required] Configuration of the segmentation.
segmentation:
  ## [optional] Use thresholding for segmentation.
  # threshold:
  #   ## [required] Extract objects brighter than this threshold.
  #   threshold_brighter: ...
  ## [optional] Use a PyTorch model for segmentation.
  # pytorch:
  #   ## [optional] Stitch objects to reconstruct frames.
  #   # stitch:
  #   # ## OR ##
  #   # stitch: false
  #   ## [required] A file containing a ScriptModule (or ScriptFunction)
  #   ## previously saved with torch.jit.save.
  #   model_fn: ...
  #   ## [optional] A device to load and execute the model (e.g. 'cpu' or
  #   ## 'cuda:0').
  #   # device: "cpu"
  #   ## [optional] Number of threads that each execute an instance of the
  #   ## model.
  #   # n_threads: 0
  #   ## [optional] Batch size.
  #   # batch_size: 0
  #   ## [optional] Enable automatic mixed precision inference to improve
  #   ## performance.
  #   # autocast: false
  #   ## [optional] Datatype to use for the processing (e.g. 'float32').
  #   # dtype: "float32"
  #   ## [optional] Perform full-frame post-processing steps.
  #   # postprocess:
  #   #   ## [optional] Apply morphological closing (close small gaps) using this
  #   #   ## radius.
  #   #   # closing_radius: 0
  #   #   ## [optional] Apply morphological opening (remove small objects) using
  #   #   ## this radius.
  #   #   # opening_radius: 0
  #   #   ## [optional] Merge segments closer than the specified distance.
  #   #   # merge_segments_distance: 0
  #   #   ## [optional] Remove objects with an area below the specified threshold.
  #   #   # min_area: 0
  #   #   ## [optional] Use multiple threads to perform the post-processing.
  #   #   # n_threads: 0
  #   #   ## [optional] Clear objects touching the image border.
  #   #   # clear_border: false
  #   # ## OR ##
  #   # postprocess: false
  #   ## [optional] Pad extracted regions with this number of pixels on each
  #   ## border.
  #   # padding: 75
  #   ## [optional] Minimum intensity of extracted regions.
  #   # min_intensity: null
  #   ## [optional] Hide everything in a vignette that is not part of the
  #   ## current object.
  #   # apply_mask: false
  #   ## [optional] Color for the background when hiding foreign object parts.
  #   ## Can be a scalar (`0`), a tuple (`(r,g,b)=(255,0,0)`), a color name
  #   ## (`'black'`) or a quantile (`'quantile:0.25'`).
  #   # background_color: 0
  #   ## [optional] When hiding non-object image regions, keep background.
  #   # keep_background: true
  ## [optional] Filter objects by Python expression.
  # filter_expr: ...
## [required] Configuration of the post-processing.
postprocess:
  ## [optional] Draw a scalebar on each object image.
  # scalebar:
  #   ## [required] Pixels per millimeter.
  #   px_per_mm: ...
  ## [optional] Filter objects by Python expression.
  # filter_expr: ...
  ## [optional] Detect duplicates.
  # detect_duplicates:
  #   ## [optional] Minimum similarity of two objects.
  #   # min_similarity: 0.98
  #   ## [optional] Maximum age of a previous object.
  #   # max_age: 1
  # ## OR ##
  # detect_duplicates: false
  ## [optional] Merge annotations.
  # merge_annotations:
  #   ## [required] EcoTaxa TSV file containing annotations for objects.
  #   ## Required columns are object_width, object_height, object_posx and
  #   ## object_posy (identifying the bounding box of an object) and
  #   ## object_frame_id (identifying the frame that an object is part of).
  #   ## For LOKI data, object_frame_id is usually the 'DDDDDDDD TTTTTT ttt'
  #   ## part of the object_id.
  #   ## If the object_frame_id column is not present, it is extracted from
  #   ## object_id.
  #   annotations_fn: ...
  #   ## [optional] Minimum overlap (IoU) of the object and annotation
  #   ## bounding boxes.
  #   # min_overlap: 0.5
  #   ## [optional] Minimum overlap of the object and annotation bounding
  #   ## boxes so that the resulting annotation_status remains 'validated'.
  #   # min_validated_overlap: 0.8
  ## [optional] Rescale the image intensities so that the brightest value
  ## is white.
  # rescale_max_intensity: false
## [required] Configuration of the output.
output:
  ## [required] Directory where the EcoTaxa archives are created.
  target_dir: ...
  ## [optional] Skip if the archive already exists.
  # skip_existing: false
  ## [optional] Format string for the names of image files inside the
  ## archive. All fields in metadata can be used.
  # image_fn: "{object_id}.jpg"
  ## [optional] Store the mask of each object alongside its image.
  # store_mask: false
  ## [optional] Include a type header in the produced TSV file. (Required
  ## for successful import into EcoTaxa.)
  # type_header: true
## [optional] The interval at which progress is logged, e.g. 10s or 1m.
# log_interval: ...
The example configuration can be generated using the maze-ipp config loki command.
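Both input.valid_frames_fn and postprocess.merge_annotations.annotations_fn rely on object_frame_id, which for LOKI data is usually the 'DDDDDDDD TTTTTT ttt' part of the object_id. The following is a minimal sketch of deriving it and collecting the valid frames. It assumes the frame part is the first group of 8, 6 and 3 digits separated by single spaces; the helper names and the sample object_id in the usage are hypothetical, not taken from maze-ipp.

```python
import re

# Assumption: the frame ID is the first "8 digits, 6 digits, 3 digits" group,
# separated by single spaces, that appears in the object_id.
FRAME_ID_RE = re.compile(r"\b\d{8} \d{6} \d{3}\b")

# EcoTaxa TSV files may contain a type header row with '[t]' / '[f]' markers.
TYPE_MARKERS = {"[t]", "[f]"}


def frame_id_from_object_id(object_id):
    """Hypothetical helper: return the 'DDDDDDDD TTTTTT ttt' part of an object_id."""
    match = FRAME_ID_RE.search(object_id)
    if match is None:
        raise ValueError(f"no frame ID found in {object_id!r}")
    return match.group(0)


def valid_frame_ids(rows):
    """Hypothetical helper: collect frame IDs from EcoTaxa TSV rows (dicts).

    Uses the object_frame_id column when it is filled in and falls back to
    extracting the frame ID from object_id otherwise.
    """
    frame_ids = set()
    for row in rows:
        if row.get("object_id") in TYPE_MARKERS or row.get("object_frame_id") in TYPE_MARKERS:
            continue  # skip the type header row
        frame_id = row.get("object_frame_id") or frame_id_from_object_id(row["object_id"])
        frame_ids.add(frame_id)
    return frame_ids
```

In practice, the rows could come from csv.DictReader(f, delimiter="\t") over the TSV file; input frames whose ID is not in the returned set would then be skipped.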
Configuration Schema
This is the complete documentation for the configuration of the pipeline:
- pydantic model DetectDuplicatesConfig
  - field max_age: int = 1
    Maximum age of a previous object.
  - field min_similarity: float = 0.98
    Minimum similarity of two objects.
- pydantic model EcoTaxaOutputConfig
  - field image_fn: str = '{object_id}.jpg'
    Format string for the names of image files inside the archive. All fields in metadata can be used.
  - field skip_existing: bool = False
    Skip if the archive already exists.
  - field store_mask: bool = False
    Store the mask of each object alongside its image.
  - field target_dir: str [Required]
    Directory where the EcoTaxa archives are created.
  - field type_header: bool = True
    Include a type header in the produced TSV file. (Required for successful import into EcoTaxa.)
- pydantic model LokiInputConfig
  - field default_meta: Dict = {}
    Default metadata for all objects.
  - field detect_duplicates: DetectDuplicatesConfig | Literal[False] = False
    Detect duplicates. (Default: false)
  - field discover: bool = True
    Try to discover all LOKI samples (‘LOKI_XXXXX.XX’) inside the specified path by looking for directories that contain ‘Pictures’ and ‘Telemetrie’ folders. This should only be set to False if ‘Pictures’ or ‘Telemetrie’ are missing. If False, the path pattern needs to point to exact locations of ‘LOKI_XXXXX.XX’ directories (containing ‘Log’, ‘Pictures’ and ‘Telemetrie’).
  - field filter_expr: str | None = None
    Filter input objects by Python expression.
  - field ignore_patterns: List[str] = []
    Ignore these directories. May contain wildcard characters (‘?’, ‘*’).
  - field merge_telemetry: MergeTelemetryConfig | Literal[False] [Optional]
    Merge telemetry. (Default: true)
  - field path: str [Required]
    Path to a LOKI input directory. May contain wildcard characters (‘?’, ‘*’).
  - field save_meta: bool = False
    Save calculated input metadata in the target directory (for debugging).
  - field slice: int | None = None
    Process only this many objects (for debugging).
  - field valid_frames_fn: str | None = None
    EcoTaxa TSV file containing valid frame IDs. Input frames with no corresponding objects in this file will be skipped. For LOKI data, object_frame_id is usually the ‘DDDDDDDD TTTTTT ttt’ part of the object_id. If the object_frame_id column is not present, it is extracted from object_id.
- pydantic model MergeAnnotationsConfig
  - field annotations_fn: str [Required]
    EcoTaxa TSV file containing annotations for objects. Required columns are object_width, object_height, object_posx and object_posy (identifying the bounding box of an object) and object_frame_id (identifying the frame that an object is part of). For LOKI data, object_frame_id is usually the ‘DDDDDDDD TTTTTT ttt’ part of the object_id. If the object_frame_id column is not present, it is extracted from object_id.
  - field min_overlap: float = 0.5
    Minimum overlap (IoU) of the object and annotation bounding boxes.
  - field min_validated_overlap: float = 0.8
    Minimum overlap of the object and annotation bounding boxes so that the resulting annotation_status remains ‘validated’.
- pydantic model MergeTelemetryConfig
  - field tolerance: str | None = None
    Maximum delta between object time and telemetry time.
- pydantic model PostprocessingConfig
  - field detect_duplicates: DetectDuplicatesConfig | Literal[False] = False
    Detect duplicates.
  - field filter_expr: str | None = None
    Filter objects by Python expression.
  - field merge_annotations: MergeAnnotationsConfig | None = None
    Merge annotations.
  - field rescale_max_intensity: bool = False
    Rescale the image intensities so that the brightest value is white.
  - field scalebar: ScalebarConfig | None = None
    Draw a scalebar on each object image.
  - field slice: int | None = None
    Process only this many objects (for debugging).
- pydantic model PytorchSegmentationConfig
  - field apply_mask: bool = False
    Hide everything in a vignette that is not part of the current object.
  - field autocast: bool = False
    Enable automatic mixed precision inference to improve performance.
  - field background_color: Any = 0
    Color for the background when hiding foreign object parts. Can be a scalar (0), a tuple ((r,g,b)=(255,0,0)), a color name ('black') or a quantile ('quantile:0.25').
  - field batch_size: int = 0
    Batch size.
  - field device: str = 'cpu'
    A device to load and execute the model (e.g. ‘cpu’ or ‘cuda:0’).
  - field dtype: str = 'float32'
    Datatype to use for the processing (e.g. ‘float32’).
  - field full_frame_archive_fn: str | None = None
    Write segmented full-frames to this file in the target directory (debug).
  - field keep_background: bool = True
    When hiding non-object image regions, keep background.
  - field min_intensity: int = None
    Minimum intensity of extracted regions.
  - field model_fn: str [Required]
    A file containing a ScriptModule (or ScriptFunction) previously saved with torch.jit.save.
  - field n_threads: int = 0
    Number of threads that each execute an instance of the model.
  - field padding: int = 75
    Pad extracted regions with this number of pixels on each border.
  - field postprocess: SegmentationPostprocessingConfig | Literal[False] = False
    Perform full-frame post-processing steps.
  - field stitch: StitchConfig | Literal[False] = True
    Stitch objects to reconstruct frames.
- pydantic model SegmentationConfig
  - field filter_expr: str | None = None
    Filter objects by Python expression.
  - field pytorch: PytorchSegmentationConfig | None = None
    Use a PyTorch model for segmentation.
  - field threshold: ThresholdSegmentationConfig | None = None
    Use thresholding for segmentation.
  - validator parse_shortform » all fields
- pydantic model SegmentationPipelineConfig
  - field input: LokiInputConfig [Required]
    Configuration of the input.
  - field log_interval: str | float = '60s'
    The interval at which progress is logged, e.g. 10s or 1m.
  - field output: EcoTaxaOutputConfig [Required]
    Configuration of the output.
  - field postprocess: PostprocessingConfig [Required]
    Configuration of the post-processing.
  - field segmentation: SegmentationConfig [Required]
    Configuration of the segmentation.
- pydantic model SegmentationPostprocessingConfig
  - field clear_border: bool = False
    Clear objects touching the image border.
  - field closing_radius: int = 0
    Apply morphological closing (close small gaps) using this radius.
  - field merge_segments_distance: int = 0
    Merge segments closer than the specified distance.
  - field min_area: int = 0
    Remove objects with an area below the specified threshold.
  - field n_threads: int = 0
    Use multiple threads to perform the post-processing.
  - field opening_radius: int = 0
    Apply morphological opening (remove small objects) using this radius.
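MergeTelemetryConfig.tolerance limits how far apart an object's timestamp and its telemetry record may be. Below is a minimal nearest-neighbour sketch under stated assumptions: timestamps are already parsed into datetime objects, the tolerance is given as a timedelta (the real field is a string whose format is not documented here), and nearest_telemetry is a hypothetical helper rather than maze-ipp's implementation.

```python
import bisect


def nearest_telemetry(object_times, telemetry, tolerance=None):
    """Hypothetical sketch: attach the nearest telemetry record to each object.

    object_times: list of datetime objects, one per object.
    telemetry: list of (datetime, dict) tuples, sorted by time.
    tolerance: timedelta or None; records further away are not merged.
    Returns one telemetry dict (or None) per object.
    """
    times = [t for t, _ in telemetry]
    merged = []
    for obj_time in object_times:
        i = bisect.bisect_left(times, obj_time)
        # Only the records directly before and after obj_time can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - obj_time), default=None)
        if best is None or (tolerance is not None and abs(times[best] - obj_time) > tolerance):
            merged.append(None)  # no telemetry record close enough
        else:
            merged.append(telemetry[best][1])
    return merged
```

Binary search keeps each lookup logarithmic in the number of telemetry records, which matters because telemetry is usually sampled far more densely than objects occur.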
Merging existing annotations
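When data is reprocessed, it is sometimes necessary to merge existing annotations into the new dataset.
These annotations can be extracted from an EcoTaxa export archive with the pyecotaxa extract-meta helper command. The resulting TSV file can then be used as postprocess.merge_annotations.annotations_fn: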
pyecotaxa extract-meta --fix-bbox LOKI <INPUT_ARCH_FN> <OUTPUT_TSV_FN>
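The options min_overlap and min_validated_overlap control how an annotation is matched to a newly segmented object. The following is a minimal sketch of this matching, assuming each object is compared only with the annotations of the same object_frame_id and the best IoU match wins. The helper names and the downgrade of weak matches from 'validated' to 'predicted' are assumptions, not maze-ipp's documented behaviour.

```python
def bbox_iou(a, b):
    """Intersection over union of two boxes given as (posx, posy, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis; zero if the boxes do not intersect.
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def box_of(row):
    """Bounding box from the EcoTaxa columns required by annotations_fn."""
    return (row["object_posx"], row["object_posy"], row["object_width"], row["object_height"])


def match_annotation(obj, annotations, min_overlap=0.5, min_validated_overlap=0.8):
    """Hypothetical sketch: return the best-matching annotation for obj, or None.

    `annotations` are assumed to be pre-filtered to obj's object_frame_id.
    """
    best, best_iou = None, 0.0
    for ann in annotations:
        iou = bbox_iou(box_of(obj), box_of(ann))
        if iou > best_iou:
            best, best_iou = ann, iou
    if best is None or best_iou < min_overlap:
        return None
    status = best["object_annotation_status"]
    # Assumption: matches below min_validated_overlap lose 'validated' status.
    if status == "validated" and best_iou < min_validated_overlap:
        status = "predicted"
    return {
        "object_annotation_category": best["object_annotation_category"],
        "object_annotation_status": status,
    }
```

With the defaults, a 10×10 object shifted by 2 px against a validated 10×10 annotation has an IoU of 80/120 ≈ 0.67: it is matched, but no longer counts as validated.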