nirs4all.analysis package

Module contents

Transfer Analysis Module.

This module provides tools for transfer-optimized preprocessing selection in spectroscopy applications. It helps find preprocessing methods that minimize distributional distance between source and target datasets while preserving predictive information.

Main Components:

TransferPreprocessingSelector: Main class for preprocessing selection.
TransferResult: Result from evaluating a single preprocessing.
TransferSelectionResults: Full results with ranking and visualization.
TransferMetrics: Container for computed transfer metrics.
TransferMetricsComputer: Fast computation of transfer metrics.

Presets:

PRESETS: Dictionary of preset configurations.
get_preset: Get a preset by name.
list_presets: List available presets with descriptions.

Utilities:

compute_transfer_score: Compute composite transfer score.
get_base_preprocessings: Get default preprocessing transforms.
apply_pipeline: Apply a sequence of transforms.
apply_stacked_pipeline: Apply stacked preprocessing by name.
apply_augmentation: Apply and concatenate multiple preprocessings.

Example

>>> from nirs4all.analysis import TransferPreprocessingSelector
>>> selector = TransferPreprocessingSelector(preset='balanced')
>>> results = selector.fit(X_source, X_target)
>>> results.best.name
'snv>d1'
>>> results.to_pipeline_spec()
'snv>d1'

>>> # With visualization
>>> results.plot_ranking()
>>> results.plot_metrics_comparison()

>>> # Export to pipeline spec
>>> spec = results.to_pipeline_spec(top_k=3, use_augmentation=True)
>>> # {'feature_augmentation': ['snv', 'd1', 'msc']}

class nirs4all.analysis.TransferMetrics(centroid_distance: float, cka_similarity: float, grassmann_distance: float, rv_coefficient: float, procrustes_disparity: float, trustworthiness: float, spread_distance: float, evr_source: float, evr_target: float)[source]

Bases: object

Container for transfer metrics between two datasets.

centroid_distance: float

cka_similarity: float

evr_source: float

evr_target: float

grassmann_distance: float

procrustes_disparity: float

rv_coefficient: float

spread_distance: float

to_dict() → Dict[str, float][source]: Convert to dictionary.

trustworthiness: float
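The container can be pictured as a plain dataclass whose to_dict() returns one entry per field. The sketch below is a stand-in that mirrors the documented signature; the class name TransferMetricsSketch and the use of dataclasses.asdict are assumptions, not the library's source.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class TransferMetricsSketch:
    """Hypothetical stand-in with the same nine fields as TransferMetrics."""

    centroid_distance: float
    cka_similarity: float
    grassmann_distance: float
    rv_coefficient: float
    procrustes_disparity: float
    trustworthiness: float
    spread_distance: float
    evr_source: float
    evr_target: float

    def to_dict(self) -> Dict[str, float]:
        # One key per field, in declaration order.
        return asdict(self)


# Made-up values, used only to show the shape of the output.
metrics = TransferMetricsSketch(1.2, 0.9, 0.3, 0.8, 0.1, 0.95, 0.4, 0.99, 0.98)
as_dict = metrics.to_dict()
```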

class nirs4all.analysis.TransferMetricsComputer(n_components: int = 10, k_neighbors: int = 10, random_state: int = 0)[source]

Bases: object

Fast computation of transfer metrics between two datasets.

Key optimization: PCA is computed once per dataset, and the resulting projection is reused for every metric computation.

Parameters:

n_components – Number of PCA components for projection.
k_neighbors – Number of neighbors for trustworthiness computation.
random_state – Random state for reproducibility.
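The "compute PCA once, reuse it" idea can be sketched with NumPy alone. The helper names below (pca_once, grassmann_distance) are hypothetical, and the formulas are textbook definitions (Euclidean distance between means, geodesic distance from principal angles); the library's exact formulas may differ.

```python
import numpy as np


def pca_once(X, n_components):
    """Centre X and return (mean, components, explained variance ratio)."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    var = s ** 2
    k = min(n_components, vt.shape[0])
    return mean, vt[:k], float(var[:k].sum() / var.sum())


def grassmann_distance(comp_a, comp_b):
    """Geodesic distance between two subspaces from their principal angles."""
    cosines = np.linalg.svd(comp_a @ comp_b.T, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.sqrt(np.sum(angles ** 2)))


rng = np.random.default_rng(0)
X_source = rng.normal(size=(40, 12))
X_target = X_source + 0.5  # same spectra shifted by a constant offset

# Each decomposition is computed once and feeds several metrics.
mean_s, comp_s, evr_s = pca_once(X_source, n_components=3)
mean_t, comp_t, evr_t = pca_once(X_target, n_components=3)

centroid = float(np.linalg.norm(mean_s - mean_t))
grassmann = grassmann_distance(comp_s, comp_t)
```

In this toy case the constant offset moves the centroid but leaves the principal subspace unchanged, so the centroid distance is large while the Grassmann distance is close to zero.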

compute(X_source: ndarray, X_target: ndarray, compute_trust: bool = True) → TransferMetrics[source]

Compute all transfer metrics between two datasets.

Parameters:

X_source – Source dataset (n_samples_src, n_features).
X_target – Target dataset (n_samples_tgt, n_features).
compute_trust – Whether to compute trustworthiness (slower).

Returns:

TransferMetrics containing all computed metrics.

compute_raw_and_preprocessed(X_source_raw: ndarray, X_target_raw: ndarray, X_source_pp: ndarray, X_target_pp: ndarray, compute_trust: bool = True) → Tuple[TransferMetrics, TransferMetrics, Dict[str, float]][source]

Compute metrics for both raw and preprocessed data, plus improvement.

Parameters:

X_source_raw – Raw source dataset.
X_target_raw – Raw target dataset.
X_source_pp – Preprocessed source dataset.
X_target_pp – Preprocessed target dataset.
compute_trust – Whether to compute trustworthiness.

Returns:

Tuple of (raw_metrics, pp_metrics, improvements_dict)

class nirs4all.analysis.TransferPreprocessingSelector(preset: str | None = 'fast', preprocessings: Dict[str, Any] | None = None, n_components: int = 10, k_neighbors: int = 10, run_stage2: bool = False, stage2_top_k: int | None = 5, stage2_max_depth: int = 2, stage2_exhaustive: bool = False, run_stage3: bool = False, stage3_top_k: int = 5, stage3_max_order: int = 2, run_stage4: bool = False, stage4_top_k: int = 10, stage4_cv_folds: int = 3, stage4_models: List[str] | None = None, metric_weights: Dict[str, float] | None = None, preprocessing_spec: Dict[str, Any] | None = None, use_generator: bool | None = None, n_jobs: int = -1, verbose: int = 1, random_state: int = 0)[source]

Bases: object

Select preprocessing for optimal transfer between datasets.

This class evaluates preprocessing methods to find those that best minimize distributional distance between source and target datasets while preserving predictive information.

Supports two modes for preprocessing generation:

Combinatoric mode (default): Uses simple permutations from base preprocessings. Stage 1 evaluates all singles, Stage 2 generates permutations from top-K candidates.
Generator mode: Uses nirs4all’s generator DSL for flexible, constraint-based specification. Enable by providing preprocessing_spec.

Stages:

1. Single Preprocessing (required): Evaluate all base preprocessings
1b. Generator Stacked (optional): If using generator with stacked specs
2. Stacking (optional): Evaluate depth-2+ combinations of top-K
2b. Generator Augmented (optional): If using generator with augmentation specs
3. Augmentation (optional): Evaluate feature concatenation
4. Validation (optional): Supervised validation with proxy models

Parameters:

preset – Preset configuration (‘fast’, ‘balanced’, ‘thorough’, ‘full’, ‘exhaustive’) or None for manual configuration. Default is ‘fast’.
preprocessings – Custom preprocessings dict or None for base set.
n_components – PCA components for metric computation.
k_neighbors – Neighbors for trustworthiness metric.
run_stage2 – Enable stacking evaluation (Stage 2).
stage2_top_k – Number of top candidates for stacking.
stage2_max_depth – Maximum stacking depth.
run_stage3 – Enable augmentation evaluation (Stage 3).
stage3_top_k – Number of top candidates for augmentation.
stage3_max_order – Maximum augmentation order (2 or 3).
run_stage4 – Enable supervised validation (Stage 4).
stage4_top_k – Number of candidates to validate.
stage4_cv_folds – Cross-validation folds.
preprocessing_spec – Generator specification dict for flexible preprocessing definition. Uses nirs4all.pipeline.config.generator. Supports keywords like _or_, arrange, pick, _mutex_, etc.
use_generator – Enable generator mode. Auto-detected if preprocessing_spec is provided. Set to False to disable even with preprocessing_spec.
n_jobs – Number of parallel jobs for preprocessing evaluation.
- n_jobs=-1: Use all available CPU cores (default)
- n_jobs=1: Sequential execution (useful for debugging)
- n_jobs=N: Use N cores
verbose – Verbosity level (0=silent, 1=progress, 2=detailed).
random_state – Random seed for reproducibility.

Example

>>> # Quick usage with default fast preset
>>> selector = TransferPreprocessingSelector()
>>> results = selector.fit(X_source, X_target)
>>> results.best.name
'snv'

>>> # With balanced preset for stacking
>>> selector = TransferPreprocessingSelector(preset='balanced')
>>> results = selector.fit(X_source, X_target)
>>> results.to_pipeline_spec()
'snv>d1'

>>> # Generator mode: constrained stacking
>>> selector = TransferPreprocessingSelector(
...     preprocessing_spec={
...         "_or_": ["snv", "msc", "d1", "d2", "savgol"],
...         "arrange": 2,
...         "_mutex_": [["d1", "d2"]],  # Don't stack derivatives
...     },
... )
>>> results = selector.fit(X_source, X_target)

>>> # Custom configuration
>>> selector = TransferPreprocessingSelector(
...     preset=None,
...     run_stage2=True,
...     stage2_top_k=10,
...     stage2_max_depth=2,
...     n_components=20,
... )

fit(X_source_or_config, X_target: ndarray | None = None, y_source: ndarray | None = None, y_target: ndarray | None = None) → TransferSelectionResults[source]

Run transfer-optimized preprocessing selection.

Supports two calling conventions:

Raw arrays (original API):
selector.fit(X_source, X_target, y_source, y_target)
DatasetConfigs (nirs4all-native API):
selector.fit(dataset_config)
- Single dataset: Uses train as source, test as target
- Multiple datasets: Combines X(“all”) from all datasets

Parameters:

X_source_or_config – Either:
- np.ndarray: Source dataset (n_samples_src, n_features)
- DatasetConfigs: nirs4all dataset configuration
X_target – Target dataset (required if X_source_or_config is array).
y_source – Optional target values for the source dataset, used for supervised validation (Stage 4).
y_target – Optional target values for the target dataset, used for supervised validation (Stage 4).

Returns:

TransferSelectionResults with ranked recommendations.

Example

>>> # Using DatasetConfigs (recommended nirs4all way)
>>> from nirs4all.data.config import DatasetConfigs
>>> selector = TransferPreprocessingSelector(preset="balanced")
>>> results = selector.fit(DatasetConfigs(data_path))
>>> pp_list = results.to_preprocessing_list(top_k=10)

>>> # Using raw arrays
>>> results = selector.fit(X_train, X_test, y_train)

fit_from_configs(config_source, config_target, partition: str = 'train') → TransferSelectionResults[source]

Fit from DatasetConfigs or SpectroDataset.

Parameters:

config_source – DatasetConfigs or SpectroDataset for source dataset.
config_target – DatasetConfigs or SpectroDataset for target dataset.
partition – Which partition to use (‘train’ or ‘test’).

Returns:

TransferSelectionResults with ranked recommendations.

Example

>>> from nirs4all.data.config import DatasetConfigs
>>> config_src = DatasetConfigs("path/to/source.json")
>>> config_tgt = DatasetConfigs("path/to/target.json")
>>> selector = TransferPreprocessingSelector()
>>> results = selector.fit_from_configs(config_src, config_tgt)

get_preprocessing_by_name(name: str) → Any[source]

Get a preprocessing transform by name.

Parameters:: name – Preprocessing name (e.g., “snv”, “snv>d1”).
Returns:: Transformer or list of transformers for stacked pipelines.

class nirs4all.analysis.TransferResult(name: str, pipeline_type: str, components: List[str], transfer_score: float, metrics: Dict[str, float], improvement_pct: float, signal_score: float | None = None, transforms: List[Any] | None = None)[source]

Bases: object

Result from evaluating a single preprocessing for transfer.

name

Pipeline display name (e.g., ‘StandardNormalVariate>FirstDerivative’).

Type:: str

pipeline_type

Type of pipeline (‘single’, ‘stacked’, or ‘augmented’).

Type:: str

components

List of component names (e.g., [‘StandardNormalVariate’, ‘FirstDerivative’]).

Type:: List[str]

transfer_score

Combined transfer metric score (higher is better).

Type:: float

metrics

Dictionary of individual metric values.

Type:: Dict[str, float]

improvement_pct

Percentage improvement over raw baseline.

Type:: float

signal_score

Optional supervised validation score (Stage 4).

Type:: float | None

transforms

Optional list of actual transformer objects (for object-based results).

Type:: List[Any] | None

__post_init__()[source]: Validate fields after initialization.

components: List[str]

get_transforms(preprocessings: Dict[str, Any] | None = None) → List[Any][source]

Get the transformer objects for this result.

If transforms are already stored, returns them directly. Otherwise, resolves component names from the preprocessings dict.

Parameters:: preprocessings – Optional name->object mapping for resolution.
Returns:: List of transformer instances.

improvement_pct: float

metrics: Dict[str, float]

name: str

pipeline_type: str

signal_score: float | None = None

to_dict() → Dict[str, Any][source]: Convert to dictionary representation.

transfer_score: float

transforms: List[Any] | None = None

class nirs4all.analysis.TransferSelectionResults(ranking: List[TransferResult], raw_metrics: Dict[str, float], timing: Dict[str, float] = <factory>)[source]

Bases: object

Full results from transfer preprocessing selection.

Provides access to ranked recommendations, timing information, and various output formats for integration with nirs4all pipelines.

ranking

List of TransferResult sorted by transfer_score (best first).

Type:: List[nirs4all.analysis.results.TransferResult]

raw_metrics

Baseline metrics computed on raw (unprocessed) data.

Type:: Dict[str, float]

timing

Dictionary of execution time per stage.

Type:: Dict[str, float]

property best: TransferResult

Get the best recommendation.

Returns:: TransferResult with highest transfer score.
Raises:: ValueError – If no results are available.

plot_improvement_heatmap(top_k: int = 15, figsize: Tuple[int, int] = (12, 10))[source]

Plot heatmap of metric improvements vs raw data.

Parameters:

top_k – Number of top results to display.
figsize – Figure size as (width, height).

Returns:

matplotlib Figure object.

plot_metrics_comparison(top_k: int = 10, metrics: List[str] | None = None, figsize: Tuple[int, int] = (16, 10))[source]

Plot comparison of all metrics for top-K preprocessings.

Parameters:

top_k – Number of top results to display.
metrics – Specific metrics to plot. Default: all available.
figsize – Figure size as (width, height).

Returns:

matplotlib Figure object.

plot_ranking(top_k: int = 15, show_signal_score: bool = True, figsize: Tuple[int, int] = (14, 8))[source]

Plot ranked bar chart of preprocessing recommendations.

Parameters:

top_k – Number of top results to display.
show_signal_score – Include signal score if available.
figsize – Figure size as (width, height).

Returns:

matplotlib Figure object.

ranking: List[TransferResult]

raw_metrics: Dict[str, float]

summary(top_k: int = 5) → str[source]

Generate human-readable summary.

Parameters:: top_k – Number of top results to include in summary.
Returns:: Formatted summary string.

timing: Dict[str, float]

to_dataframe()[source]

Convert results to pandas DataFrame.

Returns:: DataFrame with columns for name, type, scores, and metrics.
Raises:: ImportError – If pandas is not available.

to_pipeline_spec(top_k: int = 1, use_augmentation: bool = False) → str | List[str] | Dict[str, List[str]][source]

Convert results to nirs4all pipeline specification.

Parameters:

top_k – Number of top recommendations to include.
use_augmentation – If True and top_k > 1, return augmentation spec.

Returns:

Pipeline specification usable in nirs4all:
- Single string for top_k=1: “snv>d1”
- List for multiple without augmentation: [“snv”, “d1”]
- Dict for augmentation: {“feature_augmentation”: [“snv”, “d1>msc”]}

Example

>>> results.to_pipeline_spec()
'snv'
>>> results.to_pipeline_spec(top_k=3, use_augmentation=True)
{'feature_augmentation': ['snv', 'd1', 'msc']}
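The three return shapes can be summarised with a small stand-alone function. This is a sketch of the documented behaviour only: pipeline_spec_sketch is a hypothetical name, it takes an already ranked list of pipeline names rather than a results object, and the empty-input check is an assumption.

```python
from typing import Dict, List, Union

Spec = Union[str, List[str], Dict[str, List[str]]]


def pipeline_spec_sketch(ranked_names: List[str], top_k: int = 1,
                         use_augmentation: bool = False) -> Spec:
    """Return a str, list, or dict depending on top_k and use_augmentation."""
    if not ranked_names:
        raise ValueError("no results available")
    top = ranked_names[:top_k]
    if top_k == 1:
        return top[0]                          # single best pipeline
    if use_augmentation:
        return {"feature_augmentation": top}   # concatenate the top-K outputs
    return top                                 # plain list of alternatives


ranked = ["snv", "d1", "msc", "snv>d1"]
single = pipeline_spec_sketch(ranked)
several = pipeline_spec_sketch(ranked, top_k=2)
augmented = pipeline_spec_sketch(ranked, top_k=3, use_augmentation=True)
```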

to_preprocessing_list(top_k: int = 10, preprocessings: Dict[str, Any] | None = None) → List[List[Any]][source]

Convert top-K results to a list of preprocessing transform pipelines.

Each result is converted to a list of transformer instances that can be directly used in nirs4all pipeline’s feature_augmentation.

Parameters:

top_k – Number of top results to convert.
preprocessings – Optional dict mapping names to transformers. Uses get_base_preprocessings() if not provided. Not needed if results already store transform objects.

Returns:

List of preprocessing pipelines, where each pipeline is a list of transformer instances. For stacked pipelines like SNV>D1, returns [[SNV(), D1()], …].

Example

>>> results = selector.fit(X_train, X_test)
>>> pp_list = results.to_preprocessing_list(top_k=5)
>>> # pp_list = [[SNV()], [MSC()], [SNV(), D1()], ...]
>>>
>>> # Use in pipeline:
>>> pipeline = [
...     {"feature_augmentation": {"_or_": pp_list, "pick": 1}},
...     {"model": PLSRegression()},
... ]

top_k(k: int = 5) → List[TransferResult][source]

Get top-K recommendations.

Parameters:: k – Number of top results to return.
Returns:: List of top-K TransferResult objects.

nirs4all.analysis.apply_augmentation(X: ndarray, pipelines: List[str | List[Any] | Any], preprocessings: Dict[str, Any] | None = None) → ndarray[source]

Apply multiple pipelines and concatenate their outputs.

Supports both object-based and string-based pipeline definitions.

Parameters:

X – Input data matrix (n_samples, n_features).
pipelines – List of pipelines. Each can be:
- A transformer object
- A list of transformer objects (stacked)
- A string name (legacy, resolved from preprocessings)
preprocessings – Optional dictionary of transforms (for string resolution).

Returns:

Horizontally stacked transformed features.

Example

>>> # Object-based (recommended)
>>> apply_augmentation(X, [StandardNormalVariate(), [MSC(), FirstDerivative()]])
>>> # String-based (legacy)
>>> apply_augmentation(X, ["snv", "msc>d1"])
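Feature augmentation amounts to running each pipeline on the same input and stacking the outputs column-wise. The sketch below covers only the object-based form; RowCenter and RowDiff are hypothetical stand-in transformers, and the use of fit_transform is an assumption about how transformers are called.

```python
import numpy as np


class RowCenter:
    """Stand-in transformer: subtract each row's mean."""

    def fit_transform(self, X):
        return X - X.mean(axis=1, keepdims=True)


class RowDiff:
    """Stand-in transformer: first difference along the feature axis."""

    def fit_transform(self, X):
        return np.diff(X, axis=1)


def apply_augmentation_sketch(X, pipelines):
    blocks = []
    for pipeline in pipelines:
        # A bare transformer is treated as a one-step pipeline.
        steps = pipeline if isinstance(pipeline, list) else [pipeline]
        out = X
        for step in steps:
            out = step.fit_transform(out)
        blocks.append(out)
    return np.hstack(blocks)  # concatenate along the feature axis


X = np.arange(15, dtype=float).reshape(3, 5)
X_aug = apply_augmentation_sketch(X, [RowCenter(), [RowCenter(), RowDiff()]])
```

Here the first pipeline keeps 5 features and the second yields 4, so the augmented matrix has 9 columns for the same 3 samples.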

nirs4all.analysis.apply_pipeline(X: ndarray, transforms: List[Any]) → ndarray[source]

Apply a sequence of transforms to X.

Parameters:

X – Input data matrix (n_samples, n_features).
transforms – List of transformer instances.

Returns:

Transformed data matrix.

Example

>>> from nirs4all.operators.transforms import StandardNormalVariate, FirstDerivative
>>> transforms = [StandardNormalVariate(), FirstDerivative()]
>>> X_transformed = apply_pipeline(X, transforms)
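Applying a sequence of transforms is a simple fold: the output of one step is the input of the next. The sketch below uses two hypothetical stand-ins (RowSNV, RowDiff) instead of the nirs4all operators and assumes each step exposes fit_transform.

```python
import numpy as np


class RowSNV:
    """Stand-in for standard normal variate: standardise each row."""

    def fit_transform(self, X):
        mean = X.mean(axis=1, keepdims=True)
        std = X.std(axis=1, keepdims=True)
        return (X - mean) / std


class RowDiff:
    """Stand-in derivative: first difference along the feature axis."""

    def fit_transform(self, X):
        return np.diff(X, axis=1)


def apply_pipeline_sketch(X, transforms):
    out = X
    for transform in transforms:
        out = transform.fit_transform(out)
    return out


# The second row is the first one scaled by 2.
X = np.array([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]])
X_t = apply_pipeline_sketch(X, [RowSNV(), RowDiff()])
```

Because the row-wise standardisation removes the multiplicative scaling, both rows end up identical after the pipeline.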

nirs4all.analysis.apply_preprocessing_objects(X: ndarray, transforms: Any | List[Any]) → ndarray[source]

Apply preprocessing object(s) to X.

This is the primary function for object-based preprocessing.

Parameters:

X – Input data matrix (n_samples, n_features).
transforms – Single transformer or list of transformers.

Returns:

Transformed data matrix.

Example

>>> from nirs4all.operators.transforms import StandardNormalVariate, FirstDerivative
>>> X_t = apply_preprocessing_objects(X, [StandardNormalVariate(), FirstDerivative()])
>>> X_t = apply_preprocessing_objects(X, StandardNormalVariate())  # single

nirs4all.analysis.apply_single_preprocessing(X: ndarray, pp_name: str, preprocessings: Dict[str, Any] | None = None) → ndarray[source]

Apply a single preprocessing by name.

Parameters:

X – Input data matrix (n_samples, n_features).
pp_name – Name of the preprocessing (e.g., “snv”, “d1”).
preprocessings – Optional dictionary of transforms. Uses the base set from get_base_preprocessings() if None.

Returns:

Transformed data matrix.

nirs4all.analysis.apply_stacked_pipeline(X: ndarray, pipeline: str | List[Any], preprocessings: Dict[str, Any] | None = None) → ndarray[source]

Apply a stacked pipeline to X.

Supports both:
- Object-based: List of transformer instances
- String-based (legacy): Pipeline name with “>” separator (e.g., “snv>d1>msc”)

Parameters:

X – Input data matrix (n_samples, n_features).
pipeline – Either a list of transformer objects or a string name.
preprocessings – Optional dictionary of transforms (for string resolution).

Returns:

Transformed data matrix.

Example

>>> # Object-based (recommended)
>>> apply_stacked_pipeline(X, [StandardNormalVariate(), FirstDerivative()])
>>> # String-based (legacy)
>>> apply_stacked_pipeline(X, "snv>d1")

nirs4all.analysis.compute_transfer_score(metrics: TransferMetrics, raw_metrics: TransferMetrics | None = None, weights: Dict[str, float] | None = None) → float[source]

Compute a composite transfer score from metrics.

Higher scores indicate better transfer potential.

Parameters:

metrics – TransferMetrics from preprocessed data.
raw_metrics – Optional baseline metrics for computing improvements.
weights – Optional custom weights for metric combination.

Returns:

Composite transfer score (0-1 scale, higher is better). Returns NaN if critical metrics are invalid.
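The exact formula is not given here. As a rough illustration of a weighted composite with the documented properties (0-1 scale, higher is better, NaN on invalid input), the sketch below takes per-metric scores that are assumed to be already scaled to [0, 1]. The function name, the equal default weights, and the weighted-mean rule are all assumptions, not the library's implementation.

```python
import math
from typing import Dict, Optional


def composite_score_sketch(scores: Dict[str, float],
                           weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-metric scores already scaled to [0, 1]."""
    if not scores or any(math.isnan(v) for v in scores.values()):
        return float("nan")                       # invalid input propagates as NaN
    if weights is None:
        weights = {name: 1.0 for name in scores}  # equal weights by default
    total = sum(weights.get(name, 0.0) for name in scores)
    if total <= 0:
        return float("nan")
    weighted = sum(weights.get(name, 0.0) * value for name, value in scores.items())
    return weighted / total


scores = {"cka_similarity": 0.8, "rv_coefficient": 0.6}
equal = composite_score_sketch(scores)
custom = composite_score_sketch(scores, {"cka_similarity": 3.0, "rv_coefficient": 1.0})
invalid = composite_score_sketch({"cka_similarity": float("nan")})
```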

nirs4all.analysis.format_pipeline_name(name: str, max_length: int = 40) → str[source]

Format a pipeline name for display.

Parameters:

name – Pipeline name (e.g., “snv>d1>msc”).
max_length – Maximum length before truncation.

Returns:

Formatted name with potential truncation.

nirs4all.analysis.generate_augmentation_combinations(top_k_names: List[str], max_order: int = 2) → List[Tuple[str, List[str]]][source]

Generate feature augmentation combinations from top-K pipelines.

Feature augmentation concatenates outputs from multiple preprocessings.

Parameters:

top_k_names – List of pipeline names from top-K selection.
max_order – Maximum number of pipelines to combine (2 or 3).

Returns:

List of (name, component_names) tuples.

Example

>>> names = ["snv", "d1", "msc"]
>>> combos = generate_augmentation_combinations(names, max_order=2)
>>> # Returns 2-way combinations like ("snv+d1", ["snv", "d1"])
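A minimal sketch of this enumeration uses itertools.combinations, since the order of concatenated blocks does not matter. The function name is hypothetical, and generating every order from 2 up to max_order is an assumption; only the "snv+d1" naming is taken from the example above.

```python
from itertools import combinations
from typing import List, Tuple


def augmentation_combinations_sketch(top_k_names: List[str],
                                     max_order: int = 2) -> List[Tuple[str, List[str]]]:
    combos = []
    for order in range(2, max_order + 1):
        for group in combinations(top_k_names, order):
            # Name the combination by joining its members with "+".
            combos.append(("+".join(group), list(group)))
    return combos


combos = augmentation_combinations_sketch(["snv", "d1", "msc"], max_order=2)
combo_names = [name for name, _ in combos]
```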

nirs4all.analysis.generate_object_augmentation_combinations(transforms: List[Any], max_order: int = 2) → List[Tuple[str, List[Any]]][source]

Generate augmentation combinations from transformer objects.

Object-based alternative to generate_augmentation_combinations.

Parameters:

transforms – List of transformer objects or stacked lists.
max_order – Maximum number of transforms to combine.

Returns:

List of (display_name, transforms_list) tuples.

nirs4all.analysis.generate_object_stacked_pipelines(transforms: List[Any], max_depth: int = 2) → List[Tuple[str, List[Any]]][source]

Generate stacked pipeline combinations from transformer objects.

Object-based alternative to generate_stacked_pipelines.

Parameters:

transforms – List of transformer objects.
max_depth – Maximum pipeline depth.

Returns:

List of (display_name, transforms_list) tuples.

Example

>>> transforms = [StandardNormalVariate(), FirstDerivative()]
>>> pipelines = generate_object_stacked_pipelines(transforms, max_depth=2)
>>> # Returns: [("StandardNormalVariate", [SNV()]),
>>> #           ("FirstDerivative", [D1()]),
>>> #           ("StandardNormalVariate>FirstDerivative", [SNV(), D1()]),
>>> #           ...]

nirs4all.analysis.generate_stacked_pipelines(preprocessings: Dict[str, Any], max_depth: int = 2, exclude: List[str] | None = None) → List[Tuple[str, List[str], List[Any]]][source]

Generate stacked pipeline combinations.

Parameters:

preprocessings – Dictionary of available transforms.
max_depth – Maximum pipeline depth. Pipelines of every depth from 1 to max_depth are generated.
exclude – List of preprocessing names to exclude.

Returns:

List of (name, component_names, transforms) tuples.

Example

>>> pp = {"snv": snv_transform, "d1": d1_transform}
>>> pipelines = generate_stacked_pipelines(pp, max_depth=2)
>>> # Returns: [("snv", ["snv"], [snv]), ("d1", ["d1"], [d1]),
>>> #           ("snv>d1", ["snv", "d1"], [snv, d1]),
>>> #           ("d1>snv", ["d1", "snv"], [d1, snv])]
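The output listed in the example is consistent with enumerating ordered permutations at each depth. The sketch below reproduces that listing with itertools.permutations; the function name is hypothetical, and plain strings stand in for transformer objects.

```python
from itertools import permutations
from typing import Any, Dict, List, Optional, Tuple


def stacked_pipelines_sketch(
    preprocessings: Dict[str, Any],
    max_depth: int = 2,
    exclude: Optional[List[str]] = None,
) -> List[Tuple[str, List[str], List[Any]]]:
    skipped = set(exclude or [])
    names = [name for name in preprocessings if name not in skipped]
    pipelines = []
    for depth in range(1, max_depth + 1):
        # Order matters when stacking, so permutations rather than combinations.
        for combo in permutations(names, depth):
            transforms = [preprocessings[name] for name in combo]
            pipelines.append((">".join(combo), list(combo), transforms))
    return pipelines


pp = {"snv": "snv_transform", "d1": "d1_transform"}
pipelines = stacked_pipelines_sketch(pp, max_depth=2)
pipeline_names = [name for name, _, _ in pipelines]
```

The number of candidates grows quickly with the size of the dictionary: n singles plus n(n-1) depth-2 pipelines, which is why restricting stacking to a top-K subset keeps the search tractable.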

nirs4all.analysis.generate_top_k_stacked_pipelines(top_k_names: List[str], preprocessings: Dict[str, Any], max_depth: int = 2) → List[Tuple[str, List[str], List[Any]]][source]

Generate stacked pipeline combinations from top-K selected preprocessings.

More efficient than generate_stacked_pipelines when starting from a reduced set of candidates.

Parameters:

top_k_names – List of preprocessing names from top-K selection.
preprocessings – Dictionary of available transforms.
max_depth – Maximum pipeline depth.

Returns:

List of (name, component_names, transforms) tuples.

nirs4all.analysis.get_base_preprocessings() → Dict[str, Any][source]

Get the base set of preprocessing transforms.

Returns:: Dictionary mapping names to transformer instances.

Example

>>> preprocessings = get_base_preprocessings()
>>> snv = preprocessings["snv"]
>>> X_transformed = snv.fit_transform(X)

nirs4all.analysis.get_preset(name: str) → Dict[str, Any][source]

Get a preset configuration by name.

Parameters:: name – Preset name (‘fast’, ‘balanced’, ‘thorough’, ‘full’, ‘exhaustive’).
Returns:: Dictionary of configuration parameters.
Raises:: ValueError – If preset name is unknown.

nirs4all.analysis.get_transform_name(obj: Any) → str[source]

Get a readable name from a transformer object.

Parameters:: obj – Transformer instance or string.
Returns:: Human-readable name for the transform.

Example

>>> get_transform_name(StandardNormalVariate())
'StandardNormalVariate'
>>> get_transform_name(SavitzkyGolay(window_length=15))
'SavitzkyGolay'

nirs4all.analysis.get_transform_signature(obj: Any) → str[source]

Get a unique signature for a transformer (for deduplication).

Includes class name and parameters if available.

Parameters:: obj – Transformer instance.
Returns:: Unique signature string.

Example

>>> get_transform_signature(SavitzkyGolay(window_length=15))
'SavitzkyGolay(polyorder=3,window_length=15)'
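One way to build such a signature is to combine the class name with the parameters sorted by name. The sketch below assumes a scikit-learn-style get_params() method; SavitzkyGolayStandIn and signature_sketch are hypothetical names used only for illustration.

```python
class SavitzkyGolayStandIn:
    """Hypothetical transformer exposing scikit-learn-style get_params()."""

    def __init__(self, window_length=11, polyorder=3):
        self.window_length = window_length
        self.polyorder = polyorder

    def get_params(self, deep=True):
        return {"window_length": self.window_length, "polyorder": self.polyorder}


def signature_sketch(obj) -> str:
    name = type(obj).__name__
    get_params = getattr(obj, "get_params", None)
    if get_params is None:
        return name  # no parameters available: fall back to the class name
    params = get_params()
    inner = ",".join(f"{key}={params[key]!r}" for key in sorted(params))
    return f"{name}({inner})"


sig_a = signature_sketch(SavitzkyGolayStandIn(window_length=15))
sig_b = signature_sketch(SavitzkyGolayStandIn(polyorder=3, window_length=15))
sig_c = signature_sketch(SavitzkyGolayStandIn(window_length=21))
```

Two instances configured identically share a signature, which is what makes the string usable as a deduplication key.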

nirs4all.analysis.list_presets() → Dict[str, str][source]

List available presets with descriptions.

Returns:: Dictionary mapping preset names to descriptions.

nirs4all.analysis.normalize_preprocessing(item: Any | str | None, registry: Dict[str, Any] | None = None) → Any[source]

Normalize a preprocessing item to a transformer object.

Handles both object instances and string names (for backward compatibility).

Parameters:

item – Transformer instance, string name, or None.
registry – Optional name->object mapping for string resolution.

Returns:

Transformer instance (or None).

Raises:

ValueError – If string name not found in registry.

Example

>>> normalize_preprocessing(StandardNormalVariate())
StandardNormalVariate()
>>> normalize_preprocessing("snv")  # looks up in base preprocessings
StandardNormalVariate()

nirs4all.analysis.normalize_preprocessing_list(items: List[Any | str | None], registry: Dict[str, Any] | None = None) → List[Any][source]

Normalize a list of preprocessing items to transformer objects.

Parameters:

items – List of transformer instances or string names.
registry – Optional name->object mapping.

Returns:

List of transformer instances (None values filtered out).

nirs4all.analysis.validate_datasets(X_source: ndarray, X_target: ndarray, require_same_features: bool = True) → Tuple[ndarray, ndarray][source]

Validate and prepare source/target datasets for transfer analysis.

Parameters:

X_source – Source dataset.
X_target – Target dataset.
require_same_features – If True, require same number of features.

Returns:

Tuple of validated (X_source, X_target) arrays.

Raises:

ValueError – If datasets have incompatible shapes.
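A minimal sketch of this kind of validation is shown below. Only the feature-count check and the ValueError come from the description above; the conversion to float arrays and the 2-D requirement are assumptions, and validate_datasets_sketch is a hypothetical name.

```python
import numpy as np


def validate_datasets_sketch(X_source, X_target, require_same_features=True):
    X_source = np.asarray(X_source, dtype=float)
    X_target = np.asarray(X_target, dtype=float)
    for label, X in (("X_source", X_source), ("X_target", X_target)):
        if X.ndim != 2:
            raise ValueError(f"{label} must be 2-D, got shape {X.shape}")
    if require_same_features and X_source.shape[1] != X_target.shape[1]:
        raise ValueError(
            f"feature mismatch: {X_source.shape[1]} vs {X_target.shape[1]}"
        )
    return X_source, X_target


# Sample counts may differ; only the feature axis has to match.
Xs, Xt = validate_datasets_sketch(np.zeros((5, 4)), np.ones((3, 4)))

try:
    validate_datasets_sketch(np.zeros((5, 4)), np.ones((3, 6)))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```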