nirs4all.analysis.transfer_utils module
Transfer Selection Utilities.
This module provides utility functions for preprocessing application, pipeline generation, and dataset handling in transfer learning scenarios.
Supports both object-based and string-based preprocessing definitions:
  - Object-based (recommended): Pass transformer instances directly
  - String-based (legacy): Use string names that resolve to base preprocessings
- nirs4all.analysis.transfer_utils.apply_augmentation(X: ndarray, pipelines: List[str | List[Any] | Any], preprocessings: Dict[str, Any] | None = None) → ndarray
Apply multiple pipelines and concatenate their outputs.
Supports both object-based and string-based pipeline definitions.
- Parameters:
X – Input data matrix (n_samples, n_features).
pipelines – List of pipelines. Each can be:
  - A transformer object
  - A list of transformer objects (stacked)
  - A string name (legacy, resolved from preprocessings)
preprocessings – Optional dictionary of transforms (for string resolution).
- Returns:
Horizontally stacked transformed features.
Example
>>> # Object-based (recommended)
>>> apply_augmentation(X, [StandardNormalVariate(), [MSC(), FirstDerivative()]])
>>> # String-based (legacy)
>>> apply_augmentation(X, ["snv", "msc>d1"])
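The concatenation step can be pictured with a small NumPy sketch. This is not the nirs4all implementation: `augment_sketch`, `RowCenter`, and `FirstDiff` are hypothetical stand-ins, and the sketch assumes only that each transformer exposes a scikit-learn-style `fit_transform`.

```python
import numpy as np


class RowCenter:
    """Hypothetical stand-in transformer: subtracts each row's mean."""

    def fit_transform(self, X):
        return X - X.mean(axis=1, keepdims=True)


class FirstDiff:
    """Hypothetical stand-in transformer: first difference along features."""

    def fit_transform(self, X):
        return np.diff(X, axis=1)


def augment_sketch(X, pipelines):
    """Run each pipeline on X and stack the outputs column-wise."""
    blocks = []
    for pipeline in pipelines:
        # A bare transformer is treated as a one-step pipeline.
        steps = pipeline if isinstance(pipeline, list) else [pipeline]
        out = X
        for step in steps:
            out = step.fit_transform(out)
        blocks.append(out)
    return np.hstack(blocks)


X = np.arange(12, dtype=float).reshape(3, 4)
X_aug = augment_sketch(X, [RowCenter(), [RowCenter(), FirstDiff()]])
print(X_aug.shape)  # (3, 7): 4 columns from the first pipeline, 3 from the second
```

Note that the output width is the sum of the widths produced by each pipeline, so transforms that shorten the feature axis (as the differencing stand-in does here) contribute fewer columns.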
- nirs4all.analysis.transfer_utils.apply_pipeline(X: ndarray, transforms: List[Any]) → ndarray
Apply a sequence of transforms to X.
- Parameters:
X – Input data matrix (n_samples, n_features).
transforms – List of transformer instances.
- Returns:
Transformed data matrix.
Example
>>> from nirs4all.operators.transforms import StandardNormalVariate, FirstDerivative
>>> transforms = [StandardNormalVariate(), FirstDerivative()]
>>> X_transformed = apply_pipeline(X, transforms)
- nirs4all.analysis.transfer_utils.apply_preprocessing_objects(X: ndarray, transforms: Any | List[Any]) → ndarray
Apply preprocessing object(s) to X.
This is the primary function for object-based preprocessing.
- Parameters:
X – Input data matrix (n_samples, n_features).
transforms – Single transformer or list of transformers.
- Returns:
Transformed data matrix.
Example
>>> from nirs4all.operators.transforms import StandardNormalVariate, FirstDerivative
>>> X_t = apply_preprocessing_objects(X, [StandardNormalVariate(), FirstDerivative()])
>>> X_t = apply_preprocessing_objects(X, StandardNormalVariate())  # single
- nirs4all.analysis.transfer_utils.apply_single_preprocessing(X: ndarray, pp_name: str, preprocessings: Dict[str, Any] | None = None) → ndarray
Apply a single preprocessing by name.
- Parameters:
X – Input data matrix (n_samples, n_features).
pp_name – Name of the preprocessing (e.g., “snv”, “d1”).
preprocessings – Optional dictionary of transforms. If None, the base preprocessings (see get_base_preprocessings) are used.
- Returns:
Transformed data matrix.
- nirs4all.analysis.transfer_utils.apply_stacked_pipeline(X: ndarray, pipeline: str | List[Any], preprocessings: Dict[str, Any] | None = None) → ndarray
Apply a stacked pipeline to X.
Supports both:
  - Object-based: List of transformer instances
  - String-based (legacy): Pipeline name with “>” separator (e.g., “snv>d1>msc”)
- Parameters:
X – Input data matrix (n_samples, n_features).
pipeline – Either a list of transformer objects or a string name.
preprocessings – Optional dictionary of transforms (for string resolution).
- Returns:
Transformed data matrix.
Example
>>> # Object-based (recommended)
>>> apply_stacked_pipeline(X, [StandardNormalVariate(), FirstDerivative()])
>>> # String-based (legacy)
>>> apply_stacked_pipeline(X, "snv>d1")
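As a rough illustration of the legacy string form, the name can be split on ">" and each part looked up in a name-to-transform dictionary. The helper `resolve_pipeline_sketch` and the placeholder registry below are hypothetical; how nirs4all handles whitespace or unknown names is an assumption here.

```python
def resolve_pipeline_sketch(name, registry):
    """Split an 'a>b>c' pipeline name and look up each step in registry."""
    steps = []
    for part in name.split(">"):
        key = part.strip()
        if key not in registry:
            raise ValueError(f"Unknown preprocessing: {key!r}")
        steps.append(registry[key])
    return steps


# Placeholder registry: strings stand in for transformer instances.
registry = {"snv": "SNV-object", "d1": "D1-object", "msc": "MSC-object"}
print(resolve_pipeline_sketch("snv>d1", registry))  # ['SNV-object', 'D1-object']
```

The resulting list has the same shape as the object-based argument, so both forms can share a single code path once the string is resolved.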
- nirs4all.analysis.transfer_utils.format_pipeline_name(name: str, max_length: int = 40) → str
Format a pipeline name for display.
- Parameters:
name – Pipeline name (e.g., “snv>d1>msc”).
max_length – Maximum length before truncation.
- Returns:
Formatted name with potential truncation.
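The exact truncation rule is not documented here. One plausible reading, shown purely as an assumption, is to cut the name and append an ellipsis so that the result never exceeds max_length; `truncate_name_sketch` is a hypothetical helper, not the library function.

```python
def truncate_name_sketch(name, max_length=40):
    """Return name unchanged if it fits, else cut it and append '...'."""
    if len(name) <= max_length:
        return name
    # Reserve three characters for the ellipsis marker (assumed convention).
    return name[: max_length - 3] + "..."


short = truncate_name_sketch("snv>d1>msc")
long_name = truncate_name_sketch(">".join(["snv"] * 20))
print(short)           # snv>d1>msc
print(len(long_name))  # 40
```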
- nirs4all.analysis.transfer_utils.generate_augmentation_combinations(top_k_names: List[str], max_order: int = 2) → List[Tuple[str, List[str]]]
Generate feature augmentation combinations from top-K pipelines.
Feature augmentation concatenates outputs from multiple preprocessings.
- Parameters:
top_k_names – List of pipeline names from top-K selection.
max_order – Maximum number of pipelines to combine (2 or 3).
- Returns:
List of (name, component_names) tuples.
Example
>>> names = ["snv", "d1", "msc"]
>>> combos = generate_augmentation_combinations(names, max_order=2)
>>> # Returns 2-way combinations like ("snv+d1", ["snv", "d1"])
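A minimal sketch of how such combinations could be enumerated with `itertools.combinations` follows. It assumes that combinations are unordered, start at order 2, and are named by joining components with "+", which matches the example above but is otherwise a guess; `augmentation_combos_sketch` is a hypothetical name.

```python
from itertools import combinations


def augmentation_combos_sketch(names, max_order=2):
    """List every unordered group of 2..max_order names, joined with '+'."""
    combos = []
    for order in range(2, max_order + 1):
        for group in combinations(names, order):
            combos.append(("+".join(group), list(group)))
    return combos


combos = augmentation_combos_sketch(["snv", "d1", "msc"], max_order=2)
print(combos)
# [('snv+d1', ['snv', 'd1']), ('snv+msc', ['snv', 'msc']), ('d1+msc', ['d1', 'msc'])]
```

Under these assumptions, raising max_order to 3 adds the single 3-way group "snv+d1+msc" for this input, giving four combinations in total.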
- nirs4all.analysis.transfer_utils.generate_object_augmentation_combinations(transforms: List[Any], max_order: int = 2) → List[Tuple[str, List[Any]]]
Generate augmentation combinations from transformer objects.
Object-based alternative to generate_augmentation_combinations.
- Parameters:
transforms – List of transformer objects or stacked lists.
max_order – Maximum number of transforms to combine.
- Returns:
List of (display_name, transforms_list) tuples.
- nirs4all.analysis.transfer_utils.generate_object_stacked_pipelines(transforms: List[Any], max_depth: int = 2) → List[Tuple[str, List[Any]]]
Generate stacked pipeline combinations from transformer objects.
Object-based alternative to generate_stacked_pipelines.
- Parameters:
transforms – List of transformer objects.
max_depth – Maximum pipeline depth.
- Returns:
List of (display_name, transforms_list) tuples.
Example
>>> transforms = [StandardNormalVariate(), FirstDerivative()]
>>> pipelines = generate_object_stacked_pipelines(transforms, max_depth=2)
>>> # Returns: [("StandardNormalVariate", [SNV()]),
>>> #           ("FirstDerivative", [D1()]),
>>> #           ("StandardNormalVariate>FirstDerivative", [SNV(), D1()]),
>>> #           ...]
- nirs4all.analysis.transfer_utils.generate_stacked_pipelines(preprocessings: Dict[str, Any], max_depth: int = 2, exclude: List[str] | None = None) → List[Tuple[str, List[str], List[Any]]]
Generate stacked pipeline combinations.
- Parameters:
preprocessings – Dictionary of available transforms.
max_depth – Maximum pipeline depth; pipelines of every depth from 1 up to this value are generated.
exclude – List of preprocessing names to exclude.
- Returns:
List of (name, component_names, transforms) tuples.
Example
>>> pp = {"snv": snv_transform, "d1": d1_transform}
>>> pipelines = generate_stacked_pipelines(pp, max_depth=2)
>>> # Returns: [("snv", ["snv"], [snv]), ("d1", ["d1"], [d1]),
>>> #           ("snv>d1", ["snv", "d1"], [snv, d1]),
>>> #           ("d1>snv", ["d1", "snv"], [d1, snv])]
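The output in the example is consistent with enumerating ordered selections without repeated steps, which `itertools.permutations` produces directly. The sketch below rests on that assumption; `stacked_pipelines_sketch` is a hypothetical stand-in and the dictionary values are placeholders for transformer instances.

```python
from itertools import permutations


def stacked_pipelines_sketch(preprocessings, max_depth=2, exclude=None):
    """Enumerate ordered pipelines of depth 1..max_depth without repeated steps."""
    skipped = set(exclude or [])
    names = [n for n in preprocessings if n not in skipped]
    result = []
    for depth in range(1, max_depth + 1):
        for combo in permutations(names, depth):
            transforms = [preprocessings[n] for n in combo]
            result.append((">".join(combo), list(combo), transforms))
    return result


pp = {"snv": "SNV-object", "d1": "D1-object"}  # placeholder values
pipeline_names = [name for name, _, _ in stacked_pipelines_sketch(pp, max_depth=2)]
print(pipeline_names)  # ['snv', 'd1', 'snv>d1', 'd1>snv']
```

Under this enumeration the candidate count grows quickly: 10 preprocessings at max_depth=3 give 10 + 90 + 720 = 820 pipelines, which is why restricting the input to a top-K subset first can matter.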
- nirs4all.analysis.transfer_utils.generate_top_k_stacked_pipelines(top_k_names: List[str], preprocessings: Dict[str, Any], max_depth: int = 2) → List[Tuple[str, List[str], List[Any]]]
Generate stacked pipeline combinations from top-K selected preprocessings.
More efficient than generate_stacked_pipelines because only the top-K names are combined, rather than every entry in preprocessings.
- Parameters:
top_k_names – List of preprocessing names from top-K selection.
preprocessings – Dictionary of available transforms.
max_depth – Maximum pipeline depth.
- Returns:
List of (name, component_names, transforms) tuples.
- nirs4all.analysis.transfer_utils.get_base_preprocessings() → Dict[str, Any]
Get the base set of preprocessing transforms.
- Returns:
Dictionary mapping names to transformer instances.
Example
>>> preprocessings = get_base_preprocessings()
>>> snv = preprocessings["snv"]
>>> X_transformed = snv.fit_transform(X)
- nirs4all.analysis.transfer_utils.get_transform_name(obj: Any) → str
Get a readable name from a transformer object.
- Parameters:
obj – Transformer instance or string.
- Returns:
Human-readable name for the transform.
Example
>>> get_transform_name(StandardNormalVariate())
'StandardNormalVariate'
>>> get_transform_name(SavitzkyGolay(window_length=15))
'SavitzkyGolay'
- nirs4all.analysis.transfer_utils.get_transform_signature(obj: Any) → str
Get a unique signature for a transformer (for deduplication).
Includes class name and parameters if available.
- Parameters:
obj – Transformer instance.
- Returns:
Unique signature string.
Example
>>> get_transform_signature(SavitzkyGolay(window_length=15))
'SavitzkyGolay(polyorder=3,window_length=15)'
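One way such a signature could be built, and then used for deduplication, is sketched below. It assumes transformers expose a scikit-learn-style `get_params()` and that parameters are sorted by name, as the example output suggests; `SavGolStub`, `signature_sketch`, and `dedupe_sketch` are hypothetical names.

```python
class SavGolStub:
    """Hypothetical stand-in exposing scikit-learn-style get_params()."""

    def __init__(self, window_length=11, polyorder=3):
        self.window_length = window_length
        self.polyorder = polyorder

    def get_params(self, deep=True):
        return {"window_length": self.window_length, "polyorder": self.polyorder}


def signature_sketch(obj):
    """Build 'ClassName(k1=v1,k2=v2)' with parameters sorted by name."""
    name = type(obj).__name__
    if not hasattr(obj, "get_params"):
        return name
    params = obj.get_params()
    body = ",".join(f"{key}={params[key]!r}" for key in sorted(params))
    return f"{name}({body})"


def dedupe_sketch(transforms):
    """Keep the first transformer seen for each signature."""
    seen = {}
    for t in transforms:
        seen.setdefault(signature_sketch(t), t)
    return list(seen.values())


sig = signature_sketch(SavGolStub(window_length=15))
print(sig)  # SavGolStub(polyorder=3,window_length=15)
```

Sorting the parameter names makes the signature independent of the order in which `get_params()` returns them, so two instances configured identically map to the same string.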
- nirs4all.analysis.transfer_utils.normalize_preprocessing(item: Any | str | None, registry: Dict[str, Any] | None = None) → Any
Normalize a preprocessing item to a transformer object.
Handles both object instances and string names (for backward compatibility).
- Parameters:
item – Transformer instance, string name, or None.
registry – Optional name->object mapping for string resolution.
- Returns:
Transformer instance (or None).
- Raises:
ValueError – If string name not found in registry.
Example
>>> normalize_preprocessing(StandardNormalVariate())
StandardNormalVariate()
>>> normalize_preprocessing("snv")  # looks up in base preprocessings
StandardNormalVariate()
- nirs4all.analysis.transfer_utils.normalize_preprocessing_list(items: List[Any | str | None], registry: Dict[str, Any] | None = None) → List[Any]
Normalize a list of preprocessing items to transformer objects.
- Parameters:
items – List of transformer instances or string names.
registry – Optional name->object mapping.
- Returns:
List of transformer instances (None values filtered out).
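The behaviour described for the two normalization helpers (pass objects through, resolve strings, raise on unknown names, drop None from lists) can be sketched as follows. The function names and the `Marker` placeholder class are hypothetical, and the sketch takes an explicit registry rather than falling back to the base preprocessings.

```python
def normalize_sketch(item, registry):
    """Return a transformer object for item, resolving strings via registry."""
    if item is None:
        return None
    if isinstance(item, str):
        if item not in registry:
            raise ValueError(f"Unknown preprocessing name: {item!r}")
        return registry[item]
    # Anything else is assumed to already be a transformer instance.
    return item


def normalize_list_sketch(items, registry):
    """Normalize each item and drop the None results."""
    normalized = (normalize_sketch(item, registry) for item in items)
    return [obj for obj in normalized if obj is not None]


class Marker:
    """Hypothetical placeholder for a transformer class."""


registry = {"snv": Marker()}
obj = Marker()
result = normalize_list_sketch([obj, "snv", None], registry)
print(len(result))  # 2
```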
- nirs4all.analysis.transfer_utils.validate_datasets(X_source: ndarray, X_target: ndarray, require_same_features: bool = True) → Tuple[ndarray, ndarray]
Validate and prepare source/target datasets for transfer analysis.
- Parameters:
X_source – Source dataset.
X_target – Target dataset.
require_same_features – If True, require same number of features.
- Returns:
Tuple of validated (X_source, X_target) arrays.
- Raises:
ValueError – If datasets have incompatible shapes.
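The kind of check this function performs can be approximated with NumPy. The sketch below is an assumption about what "validate and prepare" involves (coercion to 2-D float arrays and a feature-count comparison); `validate_sketch` is a hypothetical name and the error messages are illustrative.

```python
import numpy as np


def validate_sketch(X_source, X_target, require_same_features=True):
    """Coerce both inputs to 2-D float arrays and compare feature counts."""
    X_source = np.asarray(X_source, dtype=float)
    X_target = np.asarray(X_target, dtype=float)
    for label, X in (("X_source", X_source), ("X_target", X_target)):
        if X.ndim != 2:
            raise ValueError(f"{label} must be 2-D, got shape {X.shape}")
    if require_same_features and X_source.shape[1] != X_target.shape[1]:
        raise ValueError(
            f"Feature mismatch: {X_source.shape[1]} vs {X_target.shape[1]}"
        )
    return X_source, X_target


Xs, Xt = validate_sketch([[1, 2, 3], [4, 5, 6]], np.zeros((5, 3)))
print(Xs.shape, Xt.shape)  # (2, 3) (5, 3)
```

Sample counts are allowed to differ in this sketch, since source and target sets in transfer analysis are typically measured on different samples; only the feature axis is compared.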