nirs4all.analysis.transfer_utils module

Transfer Selection Utilities.

This module provides utility functions for preprocessing application, pipeline generation, and dataset handling in transfer learning scenarios.

Supports both object-based and string-based preprocessing definitions:

- Object-based (recommended): Pass transformer instances directly.
- String-based (legacy): Use string names that resolve to base preprocessings.

nirs4all.analysis.transfer_utils.apply_augmentation(X: ndarray, pipelines: List[str | List[Any] | Any], preprocessings: Dict[str, Any] | None = None) → ndarray

Apply multiple pipelines and concatenate their outputs.

Supports both object-based and string-based pipeline definitions.

Parameters:

X – Input data matrix (n_samples, n_features).
pipelines – List of pipelines. Each can be:
    - A transformer object
    - A list of transformer objects (stacked)
    - A string name (legacy, resolved from preprocessings)
preprocessings – Optional dictionary of transforms (for string resolution).

Returns:

Horizontally stacked transformed features.

Example

>>> # Object-based (recommended)
>>> apply_augmentation(X, [StandardNormalVariate(), [MSC(), FirstDerivative()]])
>>> # String-based (legacy)
>>> apply_augmentation(X, ["snv", "msc>d1"])
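The concatenation described above can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: `CenterRows`, `ScaleBy`, and `augment_sketch` are hypothetical stand-ins, and the sketch assumes that every pipeline is applied to the original X via `fit_transform` before the outputs are stacked column-wise.

```python
import numpy as np


class CenterRows:
    """Hypothetical stand-in transformer: subtracts each row's mean."""

    def fit_transform(self, X):
        return X - X.mean(axis=1, keepdims=True)


class ScaleBy:
    """Hypothetical stand-in transformer: multiplies by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def fit_transform(self, X):
        return X * self.factor


def augment_sketch(X, pipelines):
    """Run each pipeline on the original X, then stack outputs column-wise."""
    blocks = []
    for pipeline in pipelines:
        # A bare transformer is treated as a one-step pipeline (assumption).
        steps = pipeline if isinstance(pipeline, list) else [pipeline]
        Xt = X
        for step in steps:
            Xt = step.fit_transform(Xt)
        blocks.append(Xt)
    return np.hstack(blocks)


X = np.arange(12, dtype=float).reshape(3, 4)
X_aug = augment_sketch(X, [CenterRows(), [CenterRows(), ScaleBy(2.0)]])
print(X_aug.shape)  # (3, 8): two blocks of 4 features each
```

The number of samples is unchanged, while the feature count is the sum of the feature counts produced by the individual pipelines.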

nirs4all.analysis.transfer_utils.apply_pipeline(X: ndarray, transforms: List[Any]) → ndarray

Apply a sequence of transforms to X.

Parameters:

X – Input data matrix (n_samples, n_features).
transforms – List of transformer instances.

Returns:

Transformed data matrix.

Example

>>> from nirs4all.operators.transforms import StandardNormalVariate, FirstDerivative
>>> transforms = [StandardNormalVariate(), FirstDerivative()]
>>> X_transformed = apply_pipeline(X, transforms)
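Because the transforms are applied in sequence, their order generally changes the result. The sketch below illustrates this with two hypothetical stand-in transformers (`RowStandardize` as an SNV-like step, `Diff` as a finite-difference step); it assumes each step is applied with `fit_transform`, which may differ from what the library does internally.

```python
import numpy as np


class RowStandardize:
    """Stand-in for an SNV-like step: centre and scale each row."""

    def fit_transform(self, X):
        mean = X.mean(axis=1, keepdims=True)
        std = X.std(axis=1, keepdims=True)
        return (X - mean) / std


class Diff:
    """Stand-in for a first-derivative step: finite differences along features."""

    def fit_transform(self, X):
        return np.diff(X, axis=1)


def apply_in_sequence(X, transforms):
    """Feed the output of each transform into the next one."""
    for t in transforms:
        X = t.fit_transform(X)
    return X


X = np.array([[1.0, 2.0, 4.0, 8.0],
              [2.0, 3.0, 5.0, 9.0]])

a = apply_in_sequence(X, [RowStandardize(), Diff()])  # standardize, then differentiate
b = apply_in_sequence(X, [Diff(), RowStandardize()])  # differentiate, then standardize

print(a.shape, b.shape)   # (2, 3) (2, 3)
print(np.allclose(a, b))  # False: the two orderings give different features
```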

nirs4all.analysis.transfer_utils.apply_preprocessing_objects(X: ndarray, transforms: Any | List[Any]) → ndarray

Apply preprocessing object(s) to X.

This is the primary function for object-based preprocessing.

Parameters:

X – Input data matrix (n_samples, n_features).
transforms – Single transformer or list of transformers.

Returns:

Transformed data matrix.

Example

>>> from nirs4all.operators.transforms import StandardNormalVariate, FirstDerivative
>>> X_t = apply_preprocessing_objects(X, [StandardNormalVariate(), FirstDerivative()])
>>> X_t = apply_preprocessing_objects(X, StandardNormalVariate())  # single

nirs4all.analysis.transfer_utils.apply_single_preprocessing(X: ndarray, pp_name: str, preprocessings: Dict[str, Any] | None = None) → ndarray

Apply a single preprocessing by name.

Parameters:

X – Input data matrix (n_samples, n_features).
pp_name – Name of the preprocessing (e.g., “snv”, “d1”).
preprocessings – Optional dictionary of transforms. If None, the base preprocessings are used (see get_base_preprocessings).

Returns:

Transformed data matrix.

nirs4all.analysis.transfer_utils.apply_stacked_pipeline(X: ndarray, pipeline: str | List[Any], preprocessings: Dict[str, Any] | None = None) → ndarray

Apply a stacked pipeline to X.

Supports both:

- Object-based: List of transformer instances
- String-based (legacy): Pipeline name with “>” separator (e.g., “snv>d1>msc”)

Parameters:

X – Input data matrix (n_samples, n_features).
pipeline – Either a list of transformer objects or a string name.
preprocessings – Optional dictionary of transforms (for string resolution).

Returns:

Transformed data matrix.

Example

>>> # Object-based (recommended)
>>> apply_stacked_pipeline(X, [StandardNormalVariate(), FirstDerivative()])
>>> # String-based (legacy)
>>> apply_stacked_pipeline(X, "snv>d1")
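For the string-based form, the pipeline name has to be split on “>” and each part resolved against a name-to-transform dictionary. A minimal sketch of that resolution step follows; `resolve_pipeline_string` is a hypothetical helper, the registry values are placeholder strings rather than real transformers, and raising `ValueError` for unknown names is an assumption about error handling.

```python
def resolve_pipeline_string(pipeline, registry):
    """Split a '>'-separated pipeline name and look up each step by name."""
    names = [part.strip() for part in pipeline.split(">")]
    missing = [name for name in names if name not in registry]
    if missing:
        # Assumed behaviour: fail loudly rather than skip unknown steps.
        raise ValueError(f"Unknown preprocessing name(s): {missing}")
    return [registry[name] for name in names]


# Placeholder registry: real code would map names to transformer instances.
registry = {"snv": "SNV-object", "d1": "D1-object", "msc": "MSC-object"}

steps = resolve_pipeline_string("snv>d1", registry)
print(steps)  # ['SNV-object', 'D1-object']
```

The resolved list can then be treated exactly like the object-based form.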

nirs4all.analysis.transfer_utils.format_pipeline_name(name: str, max_length: int = 40) → str

Format a pipeline name for display.

Parameters:

name – Pipeline name (e.g., “snv>d1>msc”).
max_length – Maximum length before truncation.

Returns:

Formatted name with potential truncation.
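The exact truncation format is not specified here. One plausible scheme, shown purely as an illustration, keeps the result within max_length characters and ends truncated names with an ellipsis marker; `truncate_name` and the "..." marker are assumptions, not the library's documented behaviour.

```python
def truncate_name(name, max_length=40, ellipsis="..."):
    """Return name unchanged if short enough, else cut it and append a marker."""
    if len(name) <= max_length:
        return name
    # Reserve room for the marker so the result is exactly max_length long.
    return name[: max_length - len(ellipsis)] + ellipsis


short = truncate_name("snv>d1>msc")
long_name = ">".join(["savgol"] * 10)  # 69 characters
cut = truncate_name(long_name)

print(short)     # snv>d1>msc
print(len(cut))  # 40
```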

nirs4all.analysis.transfer_utils.generate_augmentation_combinations(top_k_names: List[str], max_order: int = 2) → List[Tuple[str, List[str]]]

Generate feature augmentation combinations from top-K pipelines.

Feature augmentation concatenates outputs from multiple preprocessings.

Parameters:

top_k_names – List of pipeline names from top-K selection.
max_order – Maximum number of pipelines to combine (2 or 3).

Returns:

List of (name, component_names) tuples.

Example

>>> names = ["snv", "d1", "msc"]
>>> combos = generate_augmentation_combinations(names, max_order=2)
>>> # Returns 2-way combinations like ("snv+d1", ["snv", "d1"])
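The enumeration can be sketched with `itertools.combinations`. The sketch assumes that the order of components does not matter (since their outputs are concatenated), that names are joined with “+” as in the example above, and that only orders from 2 up to max_order are produced; `augmentation_combos` is a hypothetical name.

```python
from itertools import combinations


def augmentation_combos(names, max_order=2):
    """Enumerate unordered groups of 2..max_order pipeline names."""
    combos = []
    for order in range(2, max_order + 1):
        for group in combinations(names, order):
            combos.append(("+".join(group), list(group)))
    return combos


combos = augmentation_combos(["snv", "d1", "msc"], max_order=2)
for name, components in combos:
    print(name, components)
# snv+d1 ['snv', 'd1']
# snv+msc ['snv', 'msc']
# d1+msc ['d1', 'msc']
```

With max_order=3 the same three names would yield one additional entry, "snv+d1+msc".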

nirs4all.analysis.transfer_utils.generate_object_augmentation_combinations(transforms: List[Any], max_order: int = 2) → List[Tuple[str, List[Any]]]

Generate augmentation combinations from transformer objects.

Object-based alternative to generate_augmentation_combinations.

Parameters:

transforms – List of transformer objects or stacked lists.
max_order – Maximum number of transforms to combine.

Returns:

List of (display_name, transforms_list) tuples.

nirs4all.analysis.transfer_utils.generate_object_stacked_pipelines(transforms: List[Any], max_depth: int = 2) → List[Tuple[str, List[Any]]]

Generate stacked pipeline combinations from transformer objects.

Object-based alternative to generate_stacked_pipelines.

Parameters:

transforms – List of transformer objects.
max_depth – Maximum pipeline depth.

Returns:

List of (display_name, transforms_list) tuples.

Example

>>> transforms = [StandardNormalVariate(), FirstDerivative()]
>>> pipelines = generate_object_stacked_pipelines(transforms, max_depth=2)
>>> # Returns: [("StandardNormalVariate", [StandardNormalVariate()]),
>>> #           ("FirstDerivative", [FirstDerivative()]),
>>> #           ("StandardNormalVariate>FirstDerivative",
>>> #            [StandardNormalVariate(), FirstDerivative()]),
>>> #           ...]

nirs4all.analysis.transfer_utils.generate_stacked_pipelines(preprocessings: Dict[str, Any], max_depth: int = 2, exclude: List[str] | None = None) → List[Tuple[str, List[str], List[Any]]]

Generate stacked pipeline combinations.

Parameters:

preprocessings – Dictionary of available transforms.
max_depth – Maximum pipeline depth. Pipelines of every depth from 1 to max_depth are generated.
exclude – List of preprocessing names to exclude.

Returns:

List of (name, component_names, transforms) tuples.

Example

>>> pp = {"snv": snv_transform, "d1": d1_transform}
>>> pipelines = generate_stacked_pipelines(pp, max_depth=2)
>>> # Returns: [("snv", ["snv"], [snv]), ("d1", ["d1"], [d1]),
>>> #           ("snv>d1", ["snv", "d1"], [snv, d1]),
>>> #           ("d1>snv", ["d1", "snv"], [d1, snv])]
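The output shown in the example is consistent with enumerating ordered selections without repetition at each depth. A minimal sketch under that assumption uses `itertools.permutations`; `stacked_pipelines` is a hypothetical name, the registry values are placeholder strings, and any further filtering the library may apply is not modelled.

```python
from itertools import permutations


def stacked_pipelines(preprocessings, max_depth=2, exclude=None):
    """Enumerate ordered pipelines of depth 1..max_depth without repeated steps."""
    excluded = set(exclude or [])
    names = [name for name in preprocessings if name not in excluded]
    result = []
    for depth in range(1, max_depth + 1):
        for combo in permutations(names, depth):
            transforms = [preprocessings[name] for name in combo]
            result.append((">".join(combo), list(combo), transforms))
    return result


# Placeholder values stand in for transformer instances.
pp = {"snv": "SNV", "d1": "D1"}
pipelines = stacked_pipelines(pp, max_depth=2)
print([name for name, _, _ in pipelines])
# ['snv', 'd1', 'snv>d1', 'd1>snv']
```

Unlike feature augmentation, order matters here, so "snv>d1" and "d1>snv" are distinct candidates.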

nirs4all.analysis.transfer_utils.generate_top_k_stacked_pipelines(top_k_names: List[str], preprocessings: Dict[str, Any], max_depth: int = 2) → List[Tuple[str, List[str], List[Any]]]

Generate stacked pipeline combinations from top-K selected preprocessings.

More efficient than generate_stacked_pipelines when starting from a reduced set of candidates.

Parameters:

top_k_names – List of preprocessing names from top-K selection.
preprocessings – Dictionary of available transforms.
max_depth – Maximum pipeline depth.

Returns:

List of (name, component_names, transforms) tuples.
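The efficiency gain comes from the size of the search space. Assuming pipelines are ordered selections without repeated steps, the number of candidates for n preprocessings at depths 1 to max_depth is the sum of n!/(n-d)! over d. The sketch below works through that arithmetic; the candidate counts (10 and 3) are illustrative values, not taken from the library.

```python
from math import perm


def count_stacked(n_candidates, max_depth):
    """Number of ordered, repetition-free pipelines of depth 1..max_depth."""
    return sum(perm(n_candidates, depth) for depth in range(1, max_depth + 1))


full_depth2 = count_stacked(10, 2)   # 10 + 90
top_k_depth2 = count_stacked(3, 2)   # 3 + 6
full_depth3 = count_stacked(10, 3)   # 10 + 90 + 720
top_k_depth3 = count_stacked(3, 3)   # 3 + 6 + 6

print(full_depth2, top_k_depth2)  # 100 9
print(full_depth3, top_k_depth3)  # 820 15
```

The gap widens quickly with depth, which is why restricting stacking to a top-K subset pays off.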

nirs4all.analysis.transfer_utils.get_base_preprocessings() → Dict[str, Any]

Get the base set of preprocessing transforms.

Returns:

Dictionary mapping names to transformer instances.

Example

>>> preprocessings = get_base_preprocessings()
>>> snv = preprocessings["snv"]
>>> X_transformed = snv.fit_transform(X)

nirs4all.analysis.transfer_utils.get_transform_name(obj: Any) → str

Get a readable name from a transformer object.

Parameters:

obj – Transformer instance or string.

Returns:

Human-readable name for the transform.

Example

>>> get_transform_name(StandardNormalVariate())
'StandardNormalVariate'
>>> get_transform_name(SavitzkyGolay(window_length=15))
'SavitzkyGolay'

nirs4all.analysis.transfer_utils.get_transform_signature(obj: Any) → str

Get a unique signature for a transformer (for deduplication).

Includes class name and parameters if available.

Parameters:

obj – Transformer instance.

Returns:

Unique signature string.

Example

>>> get_transform_signature(SavitzkyGolay(window_length=15))
'SavitzkyGolay(polyorder=3,window_length=15)'
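A signature of this shape can be built from the class name plus alphabetically sorted parameters, which makes two equally configured instances compare equal as strings. The sketch below assumes the transformer exposes a scikit-learn-style `get_params()` method; `FakeSavGol` and `signature_sketch` are hypothetical, and the library's handling of objects without parameters may differ.

```python
class FakeSavGol:
    """Hypothetical transformer exposing a scikit-learn-style get_params()."""

    def __init__(self, window_length=11, polyorder=3):
        self.window_length = window_length
        self.polyorder = polyorder

    def get_params(self, deep=True):
        return {"window_length": self.window_length, "polyorder": self.polyorder}


def signature_sketch(obj):
    """Class name plus sorted 'key=value' pairs, or just the class name."""
    name = type(obj).__name__
    get_params = getattr(obj, "get_params", None)
    if get_params is None:
        return name  # no parameters available (assumed fallback)
    params = get_params()
    inner = ",".join(f"{key}={params[key]!r}" for key in sorted(params))
    return f"{name}({inner})"


candidates = [FakeSavGol(window_length=15), FakeSavGol(15, 3), FakeSavGol(21)]

# Deduplicate by signature, keeping the first instance seen for each one.
unique = {}
for transform in candidates:
    unique.setdefault(signature_sketch(transform), transform)

print(list(unique))
# ['FakeSavGol(polyorder=3,window_length=15)', 'FakeSavGol(polyorder=3,window_length=21)']
```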

nirs4all.analysis.transfer_utils.normalize_preprocessing(item: Any | str | None, registry: Dict[str, Any] | None = None) → Any

Normalize a preprocessing item to a transformer object.

Handles both object instances and string names (for backward compatibility).

Parameters:

item – Transformer instance, string name, or None.
registry – Optional name->object mapping for string resolution.

Returns:

Transformer instance (or None).

Raises:

ValueError – If string name not found in registry.

Example

>>> normalize_preprocessing(StandardNormalVariate())
StandardNormalVariate()
>>> normalize_preprocessing("snv")  # looks up in base preprocessings
StandardNormalVariate()

nirs4all.analysis.transfer_utils.normalize_preprocessing_list(items: List[Any | str | None], registry: Dict[str, Any] | None = None) → List[Any]

Normalize a list of preprocessing items to transformer objects.

Parameters:

items – List of transformer instances or string names.
registry – Optional name->object mapping.

Returns:

List of transformer instances (None values filtered out).
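The normalization rules for single items and lists can be summarized in a few lines. This is a sketch of the documented behaviour only (objects pass through, strings are looked up, unknown names raise ValueError, None is dropped from lists); `normalize_item` and `normalize_items` are hypothetical names, and the registry here is passed explicitly rather than defaulting to the base preprocessings.

```python
def normalize_item(item, registry):
    """Pass objects and None through; resolve strings via the registry."""
    if item is None or not isinstance(item, str):
        return item
    if item not in registry:
        raise ValueError(f"Unknown preprocessing name: {item!r}")
    return registry[item]


def normalize_items(items, registry):
    """Normalize every item and drop the None entries."""
    normalized = (normalize_item(item, registry) for item in items)
    return [obj for obj in normalized if obj is not None]


class Dummy:
    """Hypothetical placeholder for a transformer class."""


snv, other = Dummy(), Dummy()
registry = {"snv": snv}

result = normalize_items(["snv", None, other], registry)
print(len(result))  # 2: the None entry is filtered out
```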

nirs4all.analysis.transfer_utils.validate_datasets(X_source: ndarray, X_target: ndarray, require_same_features: bool = True) → Tuple[ndarray, ndarray]

Validate and prepare source/target datasets for transfer analysis.

Parameters:

X_source – Source dataset.
X_target – Target dataset.
require_same_features – If True, require same number of features.

Returns:

Tuple of validated (X_source, X_target) arrays.

Raises:

ValueError – If datasets have incompatible shapes.
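The kind of check involved can be sketched as follows. The exact validation rules are not documented here, so this sketch makes two assumptions: both inputs must be 2-D arrays of shape (n_samples, n_features), and only the feature count (not the sample count) has to match when require_same_features is True. `validate_sketch` is a hypothetical name.

```python
import numpy as np


def validate_sketch(X_source, X_target, require_same_features=True):
    """Convert to float arrays and check dimensionality and feature counts."""
    X_source = np.asarray(X_source, dtype=float)
    X_target = np.asarray(X_target, dtype=float)
    for label, X in (("X_source", X_source), ("X_target", X_target)):
        if X.ndim != 2:
            raise ValueError(f"{label} must be 2-D, got shape {X.shape}")
    if require_same_features and X_source.shape[1] != X_target.shape[1]:
        raise ValueError(
            f"Feature mismatch: {X_source.shape[1]} vs {X_target.shape[1]}"
        )
    return X_source, X_target


# Different sample counts are fine; the feature axis is what must agree.
Xs, Xt = validate_sketch(np.zeros((5, 4)), np.ones((3, 4)))
print(Xs.shape, Xt.shape)  # (5, 4) (3, 4)
```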