nirs4all.pipeline.config.generator module

Generator module for pipeline configuration expansion.

This module expands pipeline configuration specifications into concrete pipeline variants. It handles combinatorial keywords (_or_, _range_, size, count, pick, arrange) and generates all possible combinations.

This is the public API module. The implementation is in the _generator subpackage, which uses a Strategy pattern for modular node handling.

Main Functions:

expand_spec(node, seed): Expand a configuration node into all variants
expand_spec_iter(node, seed): Lazy iterator version for large spaces
count_combinations(node): Count variants without generating them

Keywords:

_or_: Choice between alternatives
_range_: Numeric sequence generation
size: Number of items to select (legacy, uses combinations)
pick: Unordered selection (combinations), explicit intent
arrange: Ordered arrangement (permutations), explicit intent
then_pick: Second-order combination selection
then_arrange: Second-order permutation selection
count: Limit number of generated variants

_log_range_: Logarithmic sequence generation
_grid_: Grid-search style Cartesian product
_zip_: Parallel iteration (like Python's zip)
_chain_: Sequential ordered choices
_sample_: Statistical sampling (uniform, log-uniform, normal)
_tags_: Configuration tagging for filtering
_metadata_: Arbitrary metadata attachment

Constraints: _mutex_, _requires_, _exclude_ for filtering combinations
Presets: _preset_ for named configuration templates
Iterator: expand_spec_iter for memory-efficient lazy expansion
Export: to_dataframe, diff_configs, print_expansion_tree utilities

Examples

Basic choice expansion:

>>> expand_spec({"_or_": ["A", "B", "C"]})
['A', 'B', 'C']

Pick (combinations):

>>> expand_spec({"_or_": ["A", "B", "C"], "pick": 2})
[['A', 'B'], ['A', 'C'], ['B', 'C']]

Arrange (permutations):

>>> expand_spec({"_or_": ["A", "B", "C"], "arrange": 2})
[['A', 'B'], ['B', 'A'], ['A', 'C'], ['C', 'A'], ['B', 'C'], ['C', 'B']]

Mutual exclusion constraint (Phase 4):

>>> expand_spec({"_or_": ["A", "B", "C"], "pick": 2, "_mutex_": [["A", "B"]]})
[['A', 'C'], ['B', 'C']]  # ["A", "B"] excluded

Lazy iteration for large spaces (Phase 4):

>>> for config in expand_spec_iter({"_range_": [1, 1000000]}):
...     process(config)  # Memory efficient

Numeric range:

>>> expand_spec({"_range_": [1, 5]})
[1, 2, 3, 4, 5]

Logarithmic range:

>>> expand_spec({"_log_range_": [0.001, 1, 4]})
[0.001, 0.01, 0.1, 1.0]

Grid search:

>>> expand_spec({"_grid_": {"x": [1, 2], "y": ["A", "B"]}})
[{'x': 1, 'y': 'A'}, {'x': 1, 'y': 'B'}, {'x': 2, 'y': 'A'}, {'x': 2, 'y': 'B'}]

Parallel zip:

>>> expand_spec({"_zip_": {"x": [1, 2], "y": ["A", "B"]}})
[{'x': 1, 'y': 'A'}, {'x': 2, 'y': 'B'}]

Nested dict expansion:

>>> expand_spec({"x": {"_or_": [1, 2]}, "y": 3})
[{'x': 1, 'y': 3}, {'x': 2, 'y': 3}]

Architecture:

The _generator subpackage uses the Strategy pattern:

- strategies/base.py: ExpansionStrategy abstract base class
- strategies/registry.py: Strategy registration and dispatch
- strategies/range_strategy.py: Handles _range_ nodes
- strategies/or_strategy.py: Handles _or_ nodes with pick/arrange/constraints
- strategies/log_range_strategy.py: Handles _log_range_ nodes (Phase 3)
- strategies/grid_strategy.py: Handles _grid_ nodes (Phase 3)
- strategies/zip_strategy.py: Handles _zip_ nodes (Phase 3)
- strategies/chain_strategy.py: Handles _chain_ nodes (Phase 3)
- strategies/sample_strategy.py: Handles _sample_ nodes (Phase 3)
- validators/schema.py: Specification and config validation (Phase 3)
- iterator.py: Lazy expansion with expand_spec_iter (Phase 4)
- constraints.py: Constraint evaluation (_mutex_, _requires_) (Phase 4)
- presets.py: Preset registry and resolution (Phase 4)
- core.py: Main expansion logic using strategy dispatch
- keywords.py: Keyword constants and detection utilities
- utils/: Helper functions (sampling, combinatorics, export)
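The registry's dispatch can be pictured as trying strategies from highest to lowest priority and using the first one whose handles() accepts the node. The sketch below illustrates that idea under this assumption; it is not the code in strategies/registry.py. The Strategy base class, make_strategy() and find_strategy() are hypothetical, only the class names and priority values come from the strategy listings on this page, and each toy handles() checks a single trigger key, whereas the documented handles() methods also reject nodes carrying unrelated keys.

```python
from typing import Any, Dict, List, Optional, Type


class Strategy:
    """Hypothetical stand-in for ExpansionStrategy: one trigger key and a priority."""

    key: str = ""
    priority: int = 0

    @classmethod
    def handles(cls, node: Dict[str, Any]) -> bool:
        # Toy check only: the documented strategies also require a "pure" node.
        return cls.key in node


def make_strategy(name: str, key: str, priority: int) -> Type[Strategy]:
    """Create a Strategy subclass with its own trigger key and priority."""
    return type(name, (Strategy,), {"key": key, "priority": priority})


# Priorities as listed for each strategy on this page.
REGISTRY: List[Type[Strategy]] = [
    make_strategy("OrStrategy", "_or_", 10),
    make_strategy("RangeStrategy", "_range_", 20),
    make_strategy("SampleStrategy", "_sample_", 24),
    make_strategy("LogRangeStrategy", "_log_range_", 25),
    make_strategy("ChainStrategy", "_chain_", 26),
    make_strategy("ZipStrategy", "_zip_", 28),
    make_strategy("GridStrategy", "_grid_", 30),
    make_strategy("CartesianStrategy", "_cartesian_", 35),
]


def find_strategy(node: Dict[str, Any]) -> Optional[Type[Strategy]]:
    """Return the highest-priority strategy that accepts the node, or None."""
    for strategy in sorted(REGISTRY, key=lambda s: s.priority, reverse=True):
        if strategy.handles(node):
            return strategy
    return None  # No generator keyword: the caller would fall back to plain expansion.


print(find_strategy({"_grid_": {"x": [1, 2]}}).__name__)  # GridStrategy
print(find_strategy({"x": 1}))                             # None
```

In this toy version the candidates are tried in the order 35, 30, 28, 26, 25, 24, 20, 10, so the first matching strategy in that order wins.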

class nirs4all.pipeline.config.generator.CartesianStrategy

Bases: ExpansionStrategy

Strategy for handling _cartesian_ nodes.

Generates the Cartesian product of all stages first (each stage being an _or_ node or list of options), then applies pick or arrange selection to the complete pipelines.

Unlike _grid_, which produces dicts, _cartesian_ produces lists of ordered stages, which makes it well suited to preprocessing pipelines.

Supported formats:

Array of stages: [stage1, stage2, …]
With pick: Select N combinations of complete pipelines
With arrange: Select N permutations of complete pipelines
With count: Limit number of results
With constraints: Filter invalid combinations

keywords

{_cartesian_, pick, arrange, count, …}

Type:: FrozenSet[str]

priority

35 (high priority, checked before grid)

Type:: int

count(node: Dict[str, Any], count_nested: callable | None = None) → int

Count cartesian combinations without generating them.

Parameters:

node – Cartesian specification node.
count_nested – Callback to count nested nodes.

Returns:

Number of pipeline combinations.

expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) → List[Any]

Expand a cartesian node to list of pipeline combinations.

The process:

1. Expand each stage to get its options.
2. Compute the Cartesian product of all stages to obtain complete pipelines.
3. If pick or arrange is specified, select from the complete pipelines.
4. Apply constraints, if specified.
5. Apply the count limit, if specified.

Parameters:

node – Cartesian specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Callback to expand nested generator nodes.

Returns:

List of pipeline combinations.

Examples

>>> strategy.expand({
...     "_cartesian_": [
...         {"_or_": ["A", "B"]},
...         {"_or_": ["X", "Y"]}
...     ],
...     "pick": 2
... })
[[["A", "X"], ["A", "Y"]], [["A", "X"], ["B", "X"]], ...]
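The example above can be reproduced in miniature with itertools: first build every complete pipeline with product(), then choose pick of them with combinations(). This is a simplified sketch, not CartesianStrategy's code. It assumes each stage is already a flat list of options (no nested _or_ nodes to expand) and skips constraints and count; the function name cartesian_pick is hypothetical.

```python
from itertools import combinations, product
from typing import Any, List, Sequence


def cartesian_pick(stages: Sequence[Sequence[Any]], pick: int) -> List[List[List[Any]]]:
    """Build every complete pipeline, then choose `pick` pipelines at a time."""
    # Cartesian product: one complete pipeline per combination of stage options.
    pipelines = [list(p) for p in product(*stages)]
    # Unordered selection (combinations) over the complete pipelines.
    return [list(c) for c in combinations(pipelines, pick)]


result = cartesian_pick([["A", "B"], ["X", "Y"]], pick=2)
print(len(result))  # 4 pipelines taken 2 at a time -> 6 selections
print(result[:2])   # [[['A', 'X'], ['A', 'Y']], [['A', 'X'], ['B', 'X']]]
```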

classmethod handles(node: Dict[str, Any]) → bool

Check if node is a pure cartesian node.

Parameters:: node – Dictionary node to check.
Returns:: True if node contains _cartesian_ and only cartesian-related keys.

keywords: FrozenSet[str] = frozenset({'_cartesian_', '_exclude_', '_metadata_', '_mutex_', '_requires_', '_seed_', '_tags_', 'arrange', 'count', 'pick'})

priority: int = 35

validate(node: Dict[str, Any]) → List[str]

Validate cartesian node specification.

Parameters:: node – Cartesian node to validate.
Returns:: List of error messages. Empty if valid.

class nirs4all.pipeline.config.generator.ChainStrategy

Bases: ExpansionStrategy

Strategy for handling _chain_ nodes.

Generates configurations in sequential order. Each item in the chain is expanded and added to the result list in order.

Supported formats:

Array: [config1, config2, …]
With count: Limits output to first n items (not random)

keywords

{_chain_, count}

Type:: FrozenSet[str]

priority

26 (between zip and log_range)

Type:: int

count(node: Dict[str, Any], count_nested: callable | None = None) → int

Count chain items without generating them.

Parameters:

node – Chain specification node.
count_nested – Callback to count nested nodes.

Returns:

Number of items in the chain.

expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) → List[Any]

Expand a chain node to list of sequential configurations.

Parameters:

node – Chain specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Callback to expand nested generator nodes.

Returns:

List of configurations in order.

Examples

>>> strategy.expand({"_chain_": [{"x": 1}, {"x": 2}, {"x": 3}]})
[{"x": 1}, {"x": 2}, {"x": 3}]

classmethod handles(node: Dict[str, Any]) → bool

Check if node is a pure chain node.

Parameters:: node – Dictionary node to check.
Returns:: True if node contains _chain_ and only chain-related keys.

keywords: FrozenSet[str] = frozenset({'_chain_', '_metadata_', '_seed_', '_tags_', 'count'})

priority: int = 26

validate(node: Dict[str, Any]) → List[str]

Validate chain node specification.

Parameters:: node – Chain node to validate.
Returns:: List of error messages. Empty if valid.

class nirs4all.pipeline.config.generator.ExpansionStrategy

Bases: ABC

Abstract base class for generator expansion strategies.

Each strategy is responsible for:

1. Detecting whether it can handle a specific node type
2. Expanding the node into all possible variants
3. Counting the variants without generating them

Subclasses must implement:

handles(node): Check if strategy can handle this node
expand(node, seed): Expand node to list of variants
count(node): Count variants without generating

keywords

Set of keywords this strategy recognizes.

Type:: FrozenSet[str]

priority

Higher priority strategies are checked first.

Type:: int

abstractmethod count(node: Dict[str, Any], count_nested: callable | None = None) → int

Count the number of variants without generating them.

Parameters:

node – A dictionary node to count.
count_nested – Callback to count nested nodes recursively.

Returns:

Number of variants that would be generated.

abstractmethod expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) → List[Any]

Expand a node into all possible variants.

Parameters:

node – A dictionary node to expand.
seed – Optional random seed for reproducible generation.
expand_nested – Callback to expand nested nodes recursively. This allows strategies to delegate back to the main expansion logic for nested structures.

Returns:

List of expanded variants.

abstractmethod classmethod handles(node: Dict[str, Any]) → bool

Check if this strategy can handle the given node.

Parameters:: node – A dictionary node from the configuration.
Returns:: True if this strategy can expand the node, False otherwise.

keywords: FrozenSet[str] = frozenset({})

priority: int = 0

validate(node: Dict[str, Any]) → List[str]

Validate a node and return any errors.

Parameters:: node – A dictionary node to validate.
Returns:: List of error messages. Empty list if valid.

class nirs4all.pipeline.config.generator.ExpansionTreeNode(key: str, node_type: str, count: int, children: List[ExpansionTreeNode] | None = None, details: Dict[str, Any] | None = None)

Bases: object

Node in an expansion tree visualization.

to_dict() → Dict[str, Any]: Convert tree to dict representation.

class nirs4all.pipeline.config.generator.GridStrategy

Bases: ExpansionStrategy

Strategy for handling _grid_ nodes.

Generates all combinations (Cartesian product) of parameter values. Similar to sklearn’s ParameterGrid.

Supported formats:

Dict: {"param1": [v1, v2], "param2": [v3, v4]}
With count: Limits output to n random samples

keywords

{_grid_, count}

Type:: FrozenSet[str]

priority

30 (checked early due to specific structure)

Type:: int

count(node: Dict[str, Any], count_nested: callable | None = None) → int

Count grid combinations without generating them.

Parameters:

node – Grid specification node.
count_nested – Callback to count nested nodes.

Returns:

Number of parameter combinations.

expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) → List[Any]

Expand a grid node to list of parameter combinations.

Parameters:

node – Grid specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Callback to expand nested generator nodes.

Returns:

List of dicts with all parameter combinations.

Examples

>>> strategy.expand({"_grid_": {"x": [1, 2], "y": ["A", "B"]}})
[{"x": 1, "y": "A"}, {"x": 1, "y": "B"}, {"x": 2, "y": "A"}, {"x": 2, "y": "B"}]
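The grid expansion above amounts to itertools.product over the value lists, with the first key varying slowest. A minimal sketch, assuming the values are plain lists (no nested generator nodes) and ignoring count sampling; the function name grid is hypothetical and this is not GridStrategy's code, although it reproduces the example output.

```python
from itertools import product
from typing import Any, Dict, List


def grid(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Cartesian product of the parameter lists; the first key varies slowest."""
    keys = list(params)  # Preserve the dict's insertion order.
    return [dict(zip(keys, values)) for values in product(*(params[k] for k in keys))]


print(grid({"x": [1, 2], "y": ["A", "B"]}))
# [{'x': 1, 'y': 'A'}, {'x': 1, 'y': 'B'}, {'x': 2, 'y': 'A'}, {'x': 2, 'y': 'B'}]
```

The number of results is the product of the list lengths, which is the quantity count() reports.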

classmethod handles(node: Dict[str, Any]) → bool

Check if node is a pure grid node.

Parameters:: node – Dictionary node to check.
Returns:: True if node contains _grid_ and only grid-related keys.

keywords: FrozenSet[str] = frozenset({'_grid_', '_metadata_', '_seed_', '_tags_', 'count'})

priority: int = 30

validate(node: Dict[str, Any]) → List[str]

Validate grid node specification.

Parameters:: node – Grid node to validate.
Returns:: List of error messages. Empty if valid.

class nirs4all.pipeline.config.generator.LogRangeStrategy

Bases: ExpansionStrategy

Strategy for handling _log_range_ nodes.

Generates logarithmically-spaced numeric sequences. Useful for hyperparameter search over values that span multiple orders of magnitude.

Supported formats:

Array: [from, to, num]: num values spaced logarithmically between from and to, both ends included
Dict: {"from": start, "to": end, "num": n}
Dict: {"from": start, "to": end, "base": b}: explicit base
With count: Limits output to n random samples

keywords

{_log_range_, count}

Type:: FrozenSet[str]

priority

25 (checked before range and or strategies)

Type:: int

count(node: Dict[str, Any], count_nested: callable | None = None) → int

Count log range elements without generating them.

Parameters:

node – Log range specification node.
count_nested – Not used for log range nodes.

Returns:

Number of values in the log range.

expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) → List[Any]

Expand a log range node to list of numeric values.

Parameters:

node – Log range specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Not used for log range nodes (no nesting).

Returns:

List of logarithmically-spaced numeric values.

Raises:

ValueError – If log range specification is invalid.

Examples

>>> strategy.expand({"_log_range_": [0.001, 1, 4]})
[0.001, 0.01, 0.1, 1.0]
>>> strategy.expand({"_log_range_": [1, 1000, 4]})
[1.0, 10.0, 100.0, 1000.0]
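Both examples above place num points evenly in log space, with the end points included. A minimal sketch of that arithmetic, assuming base 10 when no base is given; log_range is a hypothetical name, not LogRangeStrategy's code, and floating-point results may differ from the values shown above in the last digits.

```python
import math
from typing import List


def log_range(start: float, stop: float, num: int, base: float = 10.0) -> List[float]:
    """Return num values evenly spaced in log space from start to stop, inclusive."""
    if start <= 0 or stop <= 0:
        raise ValueError("log range bounds must be positive")
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        return [float(start)]
    # Work with exponents: evenly spaced exponents give geometrically spaced values.
    lo, hi = math.log(start, base), math.log(stop, base)
    step = (hi - lo) / (num - 1)
    return [base ** (lo + i * step) for i in range(num)]


print(log_range(0.001, 1, 4))  # approximately [0.001, 0.01, 0.1, 1.0]
```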

classmethod handles(node: Dict[str, Any]) → bool

Check if node is a pure log range node.

A pure log range node contains _log_range_ and, optionally, the modifiers count, _seed_, _tags_ and _metadata_.

Parameters:: node – Dictionary node to check.
Returns:: True if node contains _log_range_ and only log-range-related keys.

keywords: FrozenSet[str] = frozenset({'_log_range_', '_metadata_', '_seed_', '_tags_', 'count'})

priority: int = 25

validate(node: Dict[str, Any]) → List[str]

Validate log range node specification.

Parameters:: node – Log range node to validate.
Returns:: List of error messages. Empty if valid.

class nirs4all.pipeline.config.generator.OrStrategy

Bases: ExpansionStrategy

Strategy for handling _or_ nodes with selection semantics.

Supports:

Basic choice expansion (each alternative becomes a variant)
pick: Unordered selection using combinations
arrange: Ordered arrangement using permutations
size: Legacy alias for pick (backward compatibility)
Second-order selection via then_pick/then_arrange or [outer, inner]
count: Limit number of generated variants
Constraints: _mutex_, _requires_, _exclude_ for filtering (Phase 4)
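pick and arrange correspond to combinations and permutations. The sketch below reproduces the pick and arrange outputs from the module overview. Note that the arrange example lists the orderings of each subset together (AB, BA, AC, CA, ...), which is what iterating permutations inside combinations gives, rather than the order of itertools.permutations(choices, 2). This illustrates the semantics based on that observation; it is not OrStrategy's code, the function names are hypothetical, and count, constraints and second-order selection are ignored.

```python
from itertools import combinations, permutations
from typing import Any, List, Sequence


def pick(choices: Sequence[Any], k: int) -> List[List[Any]]:
    """Unordered selection: each subset of size k appears once."""
    return [list(c) for c in combinations(choices, k)]


def arrange(choices: Sequence[Any], k: int) -> List[List[Any]]:
    """Ordered selection: every ordering of every size-k subset."""
    # Subsets first, then their orderings, matching the overview's arrange example.
    return [list(p) for c in combinations(choices, k) for p in permutations(c)]


print(pick(["A", "B", "C"], 2))     # [['A', 'B'], ['A', 'C'], ['B', 'C']]
print(arrange(["A", "B", "C"], 2))  # [['A', 'B'], ['B', 'A'], ['A', 'C'], ['C', 'A'], ['B', 'C'], ['C', 'B']]
```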

keywords

{_or_, size, count, pick, arrange, then_pick, then_arrange, _mutex_, _requires_, _exclude_}

Type:: FrozenSet[str]

priority

10 (standard priority)

Type:: int

count(node: Dict[str, Any], count_nested: callable | None = None) → int

Count OR node variants without generating them.

Parameters:

node – OR specification node.
count_nested – Callback to count nested nodes.

Returns:

Number of variants.

expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) → List[Any]

Expand an OR node to list of variants.

Parameters:

node – OR specification node.
seed – Optional seed for random sampling.
expand_nested – Callback to expand nested generator nodes.

Returns:

List of expanded variants.

classmethod handles(node: Dict[str, Any]) → bool

Check if node is a pure OR node.

A pure OR node contains _or_ and only OR-related modifier keys.

Parameters:: node – Dictionary node to check.
Returns:: True if node is a pure OR node.

keywords: FrozenSet[str] = frozenset({'_exclude_', '_metadata_', '_mutex_', '_or_', '_requires_', '_seed_', '_tags_', '_weights_', 'arrange', 'count', 'pick', 'size', 'then_arrange', 'then_pick'})

priority: int = 10

validate(node: Dict[str, Any]) → List[str]

Validate OR node specification.

Parameters:: node – OR node to validate.
Returns:: List of error messages. Empty if valid.

class nirs4all.pipeline.config.generator.RangeStrategy

Bases: ExpansionStrategy

Strategy for handling _range_ nodes.

Generates numeric sequences based on range specifications.

Supported formats:

Array: [from, to] or [from, to, step]
Dict: {"from": start, "to": end, "step": step}
With count: Limits output to n random samples

keywords

{_range_, count}

Type:: FrozenSet[str]

priority

20 (checked before OrStrategy)

Type:: int

count(node: Dict[str, Any], count_nested: callable | None = None) → int

Count range elements without generating them.

Parameters:

node – Range specification node.
count_nested – Not used for range nodes.

Returns:

Number of values in the range.

Raises:

ValueError – If range specification is invalid.

expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) → List[Any]

Expand a range node to list of numeric values.

Parameters:

node – Range specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Not used for range nodes (no nesting).

Returns:

List of numeric values.

Raises:

ValueError – If range specification is invalid.

Examples

>>> strategy.expand({"_range_": [1, 5]})
[1, 2, 3, 4, 5]
>>> strategy.expand({"_range_": [0, 10, 2]})
[0, 2, 4, 6, 8, 10]
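Both examples treat the end value as inclusive, unlike Python's built-in range(). A minimal sketch of that behavior and of counting the values without building the list, assuming integer bounds and a positive integer step; the function names are hypothetical and this is not RangeStrategy's code.

```python
from typing import List


def inclusive_range(start: int, stop: int, step: int = 1) -> List[int]:
    """Integers from start to stop, including stop when the step lands on it."""
    if step <= 0:
        raise ValueError("this sketch only supports a positive step")
    # range() excludes its end point, so extend the end by one.
    return list(range(start, stop + 1, step))


def inclusive_range_count(start: int, stop: int, step: int = 1) -> int:
    """Length of inclusive_range(), computed arithmetically."""
    return max(0, (stop - start) // step + 1)


print(inclusive_range(1, 5))      # [1, 2, 3, 4, 5]
print(inclusive_range(0, 10, 2))  # [0, 2, 4, 6, 8, 10]
```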

classmethod handles(node: Dict[str, Any]) → bool

Check if node is a pure range node.

A pure range node contains _range_ and, optionally, the modifiers count, _seed_, _tags_ and _metadata_.

Parameters:: node – Dictionary node to check.
Returns:: True if node contains _range_ and only range-related keys.

keywords: FrozenSet[str] = frozenset({'_metadata_', '_range_', '_seed_', '_tags_', 'count'})

priority: int = 20

validate(node: Dict[str, Any]) → List[str]

Validate range node specification.

Parameters:: node – Range node to validate.
Returns:: List of error messages. Empty if valid.

class nirs4all.pipeline.config.generator.SampleStrategy

Bases: ExpansionStrategy

Strategy for handling _sample_ nodes.

Generates values using statistical sampling from various distributions. Supports uniform, log-uniform, normal, and choice distributions.

Supported distributions:

uniform: Uniform distribution between from and to
log_uniform: Log-uniform distribution (common for learning rates)
normal/gaussian: Normal distribution with mean and std
choice: Random selection from a list of values

keywords

{_sample_, count, _seed_}

Type:: FrozenSet[str]

priority

24 (between log_range and range)

Type:: int

SUPPORTED_DISTRIBUTIONS = {'choice', 'gaussian', 'log_uniform', 'normal', 'uniform'}

count(node: Dict[str, Any], count_nested: callable | None = None) → int

Count sample results (simply returns num).

Parameters:

node – Sample specification node.
count_nested – Not used.

Returns:

Number of samples to generate.

expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) → List[Any]

Expand a sample node to list of sampled values.

Parameters:

node – Sample specification node.
seed – Optional seed for reproducible sampling.
expand_nested – Not typically used for sample nodes.

Returns:

List of sampled values.

Examples

>>> strategy.expand({"_sample_": {"distribution": "uniform", "from": 0, "to": 1, "num": 3}}, seed=42)
[0.6394267984578837, 0.025010755222666936, 0.27502931836911926]
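The four distributions can be sketched with Python's random.Random, which keeps the seed local to one generator. With seed 42 the uniform branch below returns the same three values as the example above, but that does not establish that SampleStrategy is implemented this way. The function name sample, the defaults for distribution and num, and the key names mean, std and values are assumptions made for this sketch.

```python
import math
import random
from typing import Any, Dict, List, Optional


def sample(spec: Dict[str, Any], seed: Optional[int] = None) -> List[Any]:
    """Draw spec["num"] values from the named distribution (illustrative only)."""
    rng = random.Random(seed)  # Private generator: the seed does not touch global state.
    dist = spec.get("distribution", "uniform")
    num = spec.get("num", 1)
    if dist == "uniform":
        return [rng.uniform(spec["from"], spec["to"]) for _ in range(num)]
    if dist == "log_uniform":
        # Uniform over the logarithm, mapped back: spreads draws across magnitudes.
        lo, hi = math.log(spec["from"]), math.log(spec["to"])
        return [math.exp(rng.uniform(lo, hi)) for _ in range(num)]
    if dist in ("normal", "gaussian"):
        return [rng.gauss(spec["mean"], spec["std"]) for _ in range(num)]
    if dist == "choice":
        return [rng.choice(spec["values"]) for _ in range(num)]
    raise ValueError(f"unsupported distribution: {dist!r}")


print(sample({"distribution": "uniform", "from": 0, "to": 1, "num": 3}, seed=42))
```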

classmethod handles(node: Dict[str, Any]) → bool

Check if node is a pure sample node.

Parameters:: node – Dictionary node to check.
Returns:: True if node contains _sample_ and only sample-related keys.

keywords: FrozenSet[str] = frozenset({'_metadata_', '_sample_', '_seed_', '_tags_', 'count'})

priority: int = 24

validate(node: Dict[str, Any]) → List[str]

Validate sample node specification.

Parameters:: node – Sample node to validate.
Returns:: List of error messages. Empty if valid.

exception nirs4all.pipeline.config.generator.ValidationError(message: str, path: str = '', severity: ValidationSeverity = ValidationSeverity.ERROR, code: str = '', suggestion: str | None = None)

Bases: Exception

Exception for validation failures with detailed context.

message

Human-readable error description

Type:: str

path

JSONPath-like location of the error (e.g., "root._or_[0]")

Type:: str

severity

Error severity level

Type:: nirs4all.pipeline.config._generator.validators.schema.ValidationSeverity

code

Machine-readable error code

Type:: str

suggestion

Optional suggestion for fixing the error

Type:: str | None

__str__() → str[source]: Format error message with path.

code: str = ''

message: str

path: str = ''

severity: ValidationSeverity = ValidationSeverity.ERROR

suggestion: str | None = None

class nirs4all.pipeline.config.generator.ValidationResult(is_valid: bool = True, errors: List[ValidationError] = <factory>, warnings: List[ValidationError] = <factory>, info: List[ValidationError] = <factory>, node_count: int = 0, generator_count: int = 0)

Bases: object

Result of configuration validation.

is_valid

True if no errors (warnings allowed)

Type:: bool

errors

List of validation errors

Type:: List[nirs4all.pipeline.config._generator.validators.schema.ValidationError]

warnings

List of validation warnings

Type:: List[nirs4all.pipeline.config._generator.validators.schema.ValidationError]

info

List of informational messages

Type:: List[nirs4all.pipeline.config._generator.validators.schema.ValidationError]

node_count

Number of nodes validated

Type:: int

generator_count

Number of generator nodes found

Type:: int

__str__() → str[source]: Format validation result summary.

add_error(error: ValidationError) → None: Add a validation error.

errors: List[ValidationError]

generator_count: int = 0

info: List[ValidationError]

is_valid: bool = True

merge(other: ValidationResult) → ValidationResult: Merge another validation result into this one.

node_count: int = 0

warnings: List[ValidationError]

class nirs4all.pipeline.config.generator.ValidationSeverity(value)

Bases: Enum

Severity levels for validation issues.

ERROR = 'error'

INFO = 'info'

WARNING = 'warning'

class nirs4all.pipeline.config.generator.ZipStrategy

Bases: ExpansionStrategy

Strategy for handling _zip_ nodes.

Generates configurations by pairing values at the same index from multiple parameter lists (like Python’s zip).

Supported formats:

Dict: {"param1": [v1, v2], "param2": [v3, v4]}
With count: Limits output to n random samples

keywords

{_zip_, count}

Type:: FrozenSet[str]

priority

28 (between grid and log_range)

Type:: int

count(node: Dict[str, Any], count_nested: callable | None = None) → int

Count zip pairs without generating them.

Parameters:

node – Zip specification node.
count_nested – Callback to count nested nodes.

Returns:

Number of zipped pairs (minimum list length).

expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) → List[Any]

Expand a zip node to list of paired parameter values.

Parameters:

node – Zip specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Callback to expand nested generator nodes.

Returns:

List of dicts with paired parameter values.

Examples

>>> strategy.expand({"_zip_": {"x": [1, 2, 3], "y": ["A", "B", "C"]}})
[{"x": 1, "y": "A"}, {"x": 2, "y": "B"}, {"x": 3, "y": "C"}]
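Zipping pairs the i-th value of every list into one dict. A minimal sketch using the built-in zip(), which stops at the shortest list; that truncation is inferred from count() being described as the minimum list length, and whether validate() rejects uneven lists instead is not stated here. The function name zip_params is hypothetical and this is not ZipStrategy's code.

```python
from typing import Any, Dict, List


def zip_params(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Pair the i-th value of every list into one dict per index."""
    keys = list(params)
    # zip() stops at the shortest list, matching the "minimum list length" count.
    return [dict(zip(keys, values)) for values in zip(*(params[k] for k in keys))]


print(zip_params({"x": [1, 2, 3], "y": ["A", "B", "C"]}))
# [{'x': 1, 'y': 'A'}, {'x': 2, 'y': 'B'}, {'x': 3, 'y': 'C'}]
```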

classmethod handles(node: Dict[str, Any]) → bool

Check if node is a pure zip node.

Parameters:: node – Dictionary node to check.
Returns:: True if node contains _zip_ and only zip-related keys.

keywords: FrozenSet[str] = frozenset({'_metadata_', '_seed_', '_tags_', '_zip_', 'count'})

priority: int = 28

validate(node: Dict[str, Any]) → List[str]

Validate zip node specification.

Parameters:: node – Zip node to validate.
Returns:: List of error messages. Empty if valid.

nirs4all.pipeline.config.generator.apply_all_constraints(combinations: List[List[Any]], mutex_groups: List[List[Any]] | None = None, requires_groups: List[List[Any]] | None = None, exclude_combos: List[List[Any]] | None = None) → List[List[Any]]

Apply all constraints in sequence.

Parameters:

combinations – List of combinations to filter.
mutex_groups – Mutual exclusion groups.
requires_groups – Dependency requirement pairs.
exclude_combos – Specific combinations to exclude.

Returns:

Filtered list satisfying all constraints.

nirs4all.pipeline.config.generator.apply_exclude_constraint(combinations: List[List[Any]], exclude_combos: List[List[Any]]) → List[List[Any]]

Filter specific combinations from results.

Parameters:

combinations – List of combinations to filter.
exclude_combos – Specific combinations to exclude.

Returns:

Filtered list excluding specified combinations.

Examples

>>> combos = [["A", "B"], ["A", "C"], ["B", "C"]]
>>> apply_exclude_constraint(combos, [["A", "B"]])
[['A', 'C'], ['B', 'C']]

nirs4all.pipeline.config.generator.apply_mutex_constraint(combinations: List[List[Any]], mutex_groups: List[List[Any]]) → List[List[Any]]

Filter combinations that violate mutual exclusion constraints.

A mutex constraint [A, B] means A and B cannot both be present in the same combination.

Parameters:

combinations – List of combinations to filter.
mutex_groups – List of mutex groups. Each group is a list of items that cannot appear together in the same combination.

Returns:

Filtered list of combinations that satisfy all mutex constraints.

Examples

>>> combos = [["A", "B"], ["A", "C"], ["B", "C"]]
>>> apply_mutex_constraint(combos, [["A", "B"]])
[['A', 'C'], ['B', 'C']]

>>> apply_mutex_constraint(combos, [["A", "B"], ["B", "C"]])
[['A', 'C']]

nirs4all.pipeline.config.generator.apply_requires_constraint(combinations: List[List[Any]], requires_groups: List[List[Any]]) → List[List[Any]]

Filter combinations that violate dependency requirements.

A requires constraint [A, B] means if A is present, B must also be present. This is a one-directional dependency from A to B.

Parameters:

combinations – List of combinations to filter.
requires_groups – List of requirement pairs. Each pair [A, B] means if A is selected, B must also be selected.

Returns:

Filtered list of combinations that satisfy all requires constraints.

Examples

>>> combos = [["A", "B"], ["A", "C"], ["B", "C"]]
>>> apply_requires_constraint(combos, [["A", "B"]])
[['A', 'B'], ['B', 'C']]  # "A, C" removed because A requires B

>>> # ['B', 'C'] is kept: A is absent, so the A-requires-B rule does not apply
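The three filters described above (mutex, requires, exclude) can be sketched as predicates applied in sequence. This is an illustration, not the library's constraints.py; filter_combos and its helpers are hypothetical names. Two points are assumptions: a mutex group is treated as violated when two or more of its members co-occur (identical to "both present" for the two-item groups in the examples), and exclusion compares whole combinations, order included. Membership tests use lists rather than sets so unhashable items such as dicts still work.

```python
from typing import Any, List, Optional

Combo = List[Any]


def violates_mutex(combo: Combo, group: List[Any]) -> bool:
    # Assumption: a group is violated when two or more of its members co-occur.
    return sum(1 for item in group if item in combo) >= 2


def violates_requires(combo: Combo, pair: List[Any]) -> bool:
    # [A, B]: A present without B is a violation; B alone is fine.
    trigger, needed = pair
    return trigger in combo and needed not in combo


def filter_combos(
    combos: List[Combo],
    mutex: Optional[List[List[Any]]] = None,
    requires: Optional[List[List[Any]]] = None,
    exclude: Optional[List[Combo]] = None,
) -> List[Combo]:
    """Keep only the combinations that satisfy every constraint."""
    kept = []
    for combo in combos:
        if any(violates_mutex(combo, g) for g in mutex or []):
            continue
        if any(violates_requires(combo, p) for p in requires or []):
            continue
        # Assumption: exclusion compares whole combinations, order included.
        if combo in (exclude or []):
            continue
        kept.append(combo)
    return kept


combos = [["A", "B"], ["A", "C"], ["B", "C"]]
print(filter_combos(combos, requires=[["A", "B"]]))  # [['A', 'B'], ['B', 'C']]
```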

Iterate in batches for chunk processing.

Parameters:

spec – Specification to expand.
batch_size – Number of configs per batch.
seed – Random seed.

Yields:

Lists of up to batch_size configurations.
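Batching a lazy stream, for example one produced by expand_spec_iter, can be sketched with itertools.islice: take up to batch_size items at a time until the iterator is exhausted, so only one batch is held in memory. iter_batches is a hypothetical name used only for this sketch; it is not the function documented above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items, consuming the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))  # Pull at most batch_size items.
        if not batch:
            return  # Input exhausted.
        yield batch


print(list(iter_batches(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```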

nirs4all.pipeline.config.generator.clear_presets() → int

Clear all registered presets.

Returns:: Number of presets cleared.

nirs4all.pipeline.config.generator.count_combinations(node)

Calculate the total number of combinations without generating them.

This is more efficient than generating all combinations when you only need to know the count.

Parameters:: node – Configuration node to count.
Returns:: Number of variants that expand_spec would produce.

Examples

>>> count_combinations({"_or_": ["A", "B", "C"]})
3
>>> count_combinations({"_or_": ["A", "B", "C"], "pick": 2})
3  # C(3,2)
>>> count_combinations({"_range_": [1, 10]})
10
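The counts above follow from standard combinatorics: picking k of n choices gives C(n, k) = n! / (k! (n - k)!), arranging k of n gives n! / (n - k)!, an inclusive range gives (to - from) // step + 1 values, and independent dict keys multiply. The sketch below is a simplified, hypothetical count_variants covering only _or_ (with pick or arrange), _range_, dicts, lists and scalars. It ignores constraints, count and the other generator keywords, does not account for nested generators inside picked choices, and is not the library's count_combinations.

```python
import math
from typing import Any


def count_variants(node: Any) -> int:
    """Count variants arithmetically for a small subset of the spec language."""
    if isinstance(node, dict):
        if "_or_" in node:
            n = len(node["_or_"])
            if "pick" in node:
                return math.comb(n, node["pick"])     # C(n, k)
            if "arrange" in node:
                return math.perm(n, node["arrange"])  # n! / (n - k)!
            # Plain choice: each alternative contributes its own variants.
            return sum(count_variants(choice) for choice in node["_or_"])
        if "_range_" in node:
            start, stop, *rest = node["_range_"]
            step = rest[0] if rest else 1
            return max(0, (stop - start) // step + 1)  # Inclusive end point.
        # Ordinary dict: keys vary independently, so their counts multiply.
        return math.prod(count_variants(value) for value in node.values())
    if isinstance(node, list):
        return math.prod(count_variants(item) for item in node)
    return 1  # A scalar expands to exactly one variant.


print(count_variants({"_or_": ["A", "B", "C"], "pick": 2}))  # 3
```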

nirs4all.pipeline.config.generator.diff_configs(config1: Any, config2: Any, path: str = '') → Dict[str, Tuple[Any, Any]]

Find differences between two configurations.

Parameters:

config1 – First configuration.
config2 – Second configuration.
path – Current path (for nested diff reporting).

Returns:

Dict mapping paths to (value1, value2) tuples where values differ.

Examples

>>> config1 = {"model": "PLS", "n_components": 5}
>>> config2 = {"model": "PLS", "n_components": 10}
>>> diff_configs(config1, config2)
{'n_components': (5, 10)}

>>> config1 = {"a": {"b": 1}}
>>> config2 = {"a": {"b": 2}}
>>> diff_configs(config1, config2)
{'a.b': (1, 2)}
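The dotted paths in these examples come from recursing into nested dicts and joining keys with ".". A minimal sketch that reproduces both examples; diff is a hypothetical name, not diff_configs itself, and treating a key that is missing on one side as None is an assumption of this sketch.

```python
from typing import Any, Dict, Tuple


def diff(a: Any, b: Any, path: str = "") -> Dict[str, Tuple[Any, Any]]:
    """Map dotted paths to (a_value, b_value) wherever the two configs differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        out: Dict[str, Tuple[Any, Any]] = {}
        # Visit the keys of a first, then the keys that only exist in b.
        for key in list(a) + [k for k in b if k not in a]:
            sub = f"{path}.{key}" if path else str(key)
            # Assumption: a key missing on one side is compared as None.
            out.update(diff(a.get(key), b.get(key), sub))
        return out
    return {} if a == b else {path: (a, b)}


print(diff({"model": "PLS", "n_components": 5}, {"model": "PLS", "n_components": 10}))
# {'n_components': (5, 10)}
print(diff({"a": {"b": 1}}, {"a": {"b": 2}}))  # {'a.b': (1, 2)}
```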

nirs4all.pipeline.config.generator.expand_spec(node, seed)

Expand a specification node to all possible combinations.

This is the main entry point for configuration expansion. It handles all node types and delegates to appropriate strategies for special generator nodes.

Parameters:

node – Configuration node to expand. Can be:
- dict: Expanded based on keys (strategies or Cartesian product)
- list: Cartesian product of expanded elements
- scalar: Wrapped in a list
seed – Optional random seed for reproducible generation when using "count" to limit results.

Returns:

List of expanded variants.

Examples

>>> expand_spec({"_or_": ["A", "B"]})
['A', 'B']
>>> expand_spec({"_range_": [1, 3]})
[1, 2, 3]
>>> expand_spec({"x": {"_or_": [1, 2]}, "y": "fixed"})
[{'x': 1, 'y': 'fixed'}, {'x': 2, 'y': 'fixed'}]
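The recursion described above (dispatch generator keywords, otherwise take the Cartesian product of a dict's or list's expanded parts, and wrap scalars) can be sketched as a toy expander. The hypothetical expand below supports only _or_ without modifiers, _range_ with a positive step, dicts, lists and scalars; it reproduces the three examples but is not the logic in core.py.

```python
from itertools import product
from typing import Any, List


def expand(node: Any) -> List[Any]:
    """Toy recursive expander for _or_, _range_, dicts, lists and scalars."""
    if isinstance(node, dict):
        if "_or_" in node:
            # Each alternative may itself contain generators, so expand it too.
            return [v for choice in node["_or_"] for v in expand(choice)]
        if "_range_" in node:
            start, stop, *rest = node["_range_"]
            return list(range(start, stop + 1, rest[0] if rest else 1))  # Inclusive end.
        # Plain dict: Cartesian product of every key's expansions.
        keys = list(node)
        options = [expand(node[k]) for k in keys]
        return [dict(zip(keys, combo)) for combo in product(*options)]
    if isinstance(node, list):
        # List: Cartesian product of the expanded elements, kept as lists.
        return [list(combo) for combo in product(*(expand(item) for item in node))]
    return [node]  # Scalar: wrapped in a list.


print(expand({"x": {"_or_": [1, 2]}, "y": "fixed"}))
# [{'x': 1, 'y': 'fixed'}, {'x': 2, 'y': 'fixed'}]
```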

nirs4all.pipeline.config.generator.expand_spec_iter(node, seed, sample_size)

Lazily expand a specification node to all possible combinations.

This is the memory-efficient version of expand_spec that yields configurations one at a time instead of building a complete list.

Parameters:

node – Configuration node to expand. Can be:
- dict: Expanded based on keys (strategies or Cartesian product)
- list: Cartesian product of expanded elements
- scalar: Yielded as a single item
seed – Optional random seed for reproducible generation when sample_size is used to limit results.
sample_size – If provided, yield at most this many items, chosen by reservoir sampling so that every item has an equal chance of being selected.

Yields:

Expanded configuration variants one at a time.

Examples

>>> list(expand_spec_iter({"_or_": ["A", "B"]}))
['A', 'B']

>>> from itertools import islice
>>> large_spec = {"_range_": [1, 1000000]}
>>> list(islice(expand_spec_iter(large_spec), 5))
[1, 2, 3, 4, 5]

>>> # With sampling
>>> list(expand_spec_iter({"_range_": [1, 100]}, seed=42, sample_size=5))
[23, 45, 67, 12, 89]  # illustrative output: 5 items sampled at random
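The sample_size option relies on reservoir sampling. Below is the textbook Algorithm R, which keeps a uniform random sample of k items while reading the stream once and holding only k items in memory. It is a generic sketch with a hypothetical name, not expand_spec_iter's implementation, and it will not reproduce the exact items shown above for a given seed.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Algorithm R: a uniform sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # Fill the reservoir with the first k items.
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


print(reservoir_sample(range(1, 101), 5, seed=42))
```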

nirs4all.pipeline.config.generator.expand_spec_with_choices(node, seed)

Expand a specification node and track generator choices.

Like expand_spec, but also returns the choices made at each generator node (_or_, _range_, etc.) for each expanded variant. This is useful for tracking which specific values were selected to produce each pipeline configuration.

Parameters:

node – Configuration node to expand.
seed – Optional random seed for reproducible generation.

Returns:

List of (expanded_config, generator_choices) tuples. Each generator_choices entry is a list of dicts such as [{"_or_": selected_value}, {"_range_": 18}, …], in the order the generator nodes were encountered during expansion.

Examples

>>> results = expand_spec_with_choices({"_or_": ["A", "B"]})
>>> results
[('A', [{'_or_': 'A'}]), ('B', [{'_or_': 'B'}])]

>>> results = expand_spec_with_choices({"x": {"_or_": [1, 2]}, "y": 3})
>>> results
[({'x': 1, 'y': 3}, [{'_or_': 1}]), ({'x': 2, 'y': 3}, [{'_or_': 2}])]
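Choice tracking can be sketched by having each recursive call return (value, choices) pairs and concatenating the choice lists when a dict's keys are combined. The hypothetical expand_with_choices below supports only _or_ and plain dicts (lists and other scalars are treated as fixed values), records each _or_ alternative as written, and is not the library's implementation; it reproduces both examples above.

```python
from itertools import product
from typing import Any, Dict, List, Tuple

Choice = Dict[str, Any]


def expand_with_choices(node: Any) -> List[Tuple[Any, List[Choice]]]:
    """Expand _or_ and dict nodes, pairing each variant with the choices made."""
    if isinstance(node, dict) and "_or_" in node:
        results = []
        for choice in node["_or_"]:
            for value, inner in expand_with_choices(choice):
                # Record this selection before any choices made inside it.
                results.append((value, [{"_or_": choice}] + inner))
        return results
    if isinstance(node, dict):
        keys = list(node)
        per_key = [expand_with_choices(node[k]) for k in keys]
        results = []
        for combo in product(*per_key):
            config = {k: value for k, (value, _) in zip(keys, combo)}
            # Concatenate choices in key order, i.e. the order they were encountered.
            choices = [c for _, recorded in combo for c in recorded]
            results.append((config, choices))
        return results
    return [(node, [])]  # Fixed value: no generator choice involved.


print(expand_with_choices({"x": {"_or_": [1, 2]}, "y": 3}))
# [({'x': 1, 'y': 3}, [{'_or_': 1}]), ({'x': 2, 'y': 3}, [{'_or_': 2}])]
```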

nirs4all.pipeline.config.generator.export_presets() → Dict[str, Any]

Export all presets for serialization.

Returns:: Dict of all presets with metadata.

nirs4all.pipeline.config.generator.extract_base_node(node: Dict[str, Any]) → Dict[str, Any]

Extract non-keyword keys from a node.

Returns a copy of the node with all generator and modifier keywords removed.

Parameters:: node – A dictionary node from the configuration.
Returns:: A dictionary containing only the non-keyword key-value pairs.

Examples

>>> extract_base_node({"_or_": ["A", "B"], "class": "MyClass", "size": 2})
{'class': 'MyClass'}
>>> extract_base_node({"class": "MyClass", "params": {"n": 5}})
{'class': 'MyClass', 'params': {'n': 5}}

nirs4all.pipeline.config.generator.extract_constraints(node: Dict[str, Any]) → Dict[str, Any][source]

Extract constraint specifications from a node.

Parameters:: node – A dictionary node.
Returns:: Dict containing constraint specifications (_mutex_, _requires_, etc.)

Examples

>>> extract_constraints({"_mutex_": [["A", "B"]], "_requires_": [["C", "D"]]})
{'_mutex_': [['A', 'B']], '_requires_': [['C', 'D']]}

nirs4all.pipeline.config.generator.extract_metadata(node: Dict[str, Any]) → Dict[str, Any][source]

Extract metadata from a node.

Parameters:: node – A dictionary node.
Returns:: Metadata dict, or empty dict if no metadata.

Examples

>>> extract_metadata({"_metadata_": {"author": "user1"}})
{'author': 'user1'}

nirs4all.pipeline.config.generator.extract_modifiers(node: Dict[str, Any]) → Dict[str, Any][source]

Extract modifier values from a node.

Extracts all modifier keywords (size, count, _seed_, _weights_, _exclude_) from a node and returns them as a dictionary.

Parameters:: node – A dictionary node from the configuration.
Returns:: A dictionary containing only the modifier key-value pairs found in the node.

Examples

>>> extract_modifiers({"_or_": ["A", "B"], "size": 2, "count": 1})
{'size': 2, 'count': 1}
>>> extract_modifiers({"_or_": ["A", "B"]})
{}
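extract_base_node and extract_modifiers both amount to partitioning a node's keys by keyword category. The sketch below does that with dict comprehensions. The two keyword sets are assumptions assembled from the keywords listed in this module's documentation; the real sets in nirs4all may differ, and split_node is a hypothetical name.

```python
from typing import Any, Dict, Tuple

# Assumed keyword sets, collected from this module's docs; not the library's constants.
GENERATOR_KEYWORDS = frozenset({
    "_or_", "_range_", "_log_range_", "_grid_", "_zip_",
    "_chain_", "_sample_", "_cartesian_",
})
MODIFIER_KEYWORDS = frozenset({
    "size", "count", "pick", "arrange", "then_pick", "then_arrange",
    "_seed_", "_weights_", "_exclude_",
})


def split_node(node: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Partition a node into (generators, modifiers, base) dicts (hypothetical helper)."""
    generators = {k: v for k, v in node.items() if k in GENERATOR_KEYWORDS}
    modifiers = {k: v for k, v in node.items() if k in MODIFIER_KEYWORDS}
    base = {
        k: v for k, v in node.items()
        if k not in GENERATOR_KEYWORDS and k not in MODIFIER_KEYWORDS
    }
    return generators, modifiers, base


gens, mods, base = split_node({"_or_": ["A", "B"], "class": "MyClass", "size": 2})
print(base)  # {'class': 'MyClass'}
print(mods)  # {'size': 2}
```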

nirs4all.pipeline.config.generator.extract_or_choices(node: Dict[str, Any]) → list[source]

Extract the choices list from an OR node.

Parameters:: node – A dictionary node containing the _or_ keyword.
Returns:: The list of choices, or an empty list if _or_ is not present.

Examples

>>> extract_or_choices({"_or_": ["A", "B", "C"]})
['A', 'B', 'C']
>>> extract_or_choices({"class": "MyClass"})
[]

nirs4all.pipeline.config.generator.extract_range_spec(node: Dict[str, Any]) → Any[source]

Extract the range specification from a range node.

Parameters:: node – A dictionary node containing the _range_ keyword.
Returns:: The range specification (list or dict), or None if not present.

Examples

>>> extract_range_spec({"_range_": [1, 10, 2]})
[1, 10, 2]
>>> extract_range_spec({"_range_": {"from": 1, "to": 10}})
{'from': 1, 'to': 10}

nirs4all.pipeline.config.generator.extract_tags(node: Dict[str, Any]) → list[source]

Extract tags from a node.

Parameters:: node – A dictionary node.
Returns:: List of tags, or empty list if no tags.

Examples

>>> extract_tags({"_tags_": ["baseline", "v2"]})
['baseline', 'v2']

nirs4all.pipeline.config.generator.format_config_table(configs: List[Dict[str, Any]], columns: List[str] | None = None, max_rows: int = 20) → str[source]

Format configurations as an ASCII table.

Parameters:

configs – List of configuration dicts.
columns – Specific columns to show (None for auto-detect).
max_rows – Maximum rows to display.

Returns:

Formatted ASCII table string.

nirs4all.pipeline.config.generator.get_expansion_tree(spec: Any, key: str = 'root') → ExpansionTreeNode[source]

Build an expansion tree for a specification.

Parameters:

spec – Specification to analyze.
key – Key name for this node.

Returns:

ExpansionTreeNode representing the configuration space.

Examples

>>> spec = {"x": {"_or_": [1, 2]}, "y": {"_range_": [1, 3]}}
>>> tree = get_expansion_tree(spec)
>>> tree.count  # 2 choices x 3 range values
6
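The count of 6 follows from two rules: alternatives inside an _or_ add up, while independent keys of a dict multiply. The sketch below applies those rules without generating any variant. It handles only _or_, the list form of _range_, dicts, and scalars, and it assumes _range_ has an inclusive end (consistent with [1, 3] giving 3 variants). count_variants is a hypothetical name, not count_combinations itself.

```python
import math
from typing import Any


def count_variants(spec: Any) -> int:
    """Count variants without expanding them (toy: `_or_`, list-form `_range_`, dicts)."""
    if isinstance(spec, dict):
        if "_or_" in spec:
            # Alternatives are mutually exclusive, so their counts add up.
            return sum(count_variants(option) for option in spec["_or_"])
        if "_range_" in spec:
            start, end, *rest = spec["_range_"]
            step = rest[0] if rest else 1
            # Assumes an inclusive end and a positive step.
            return len(range(start, end + 1, step))
        # Independent keys combine as a Cartesian product.
        return math.prod(count_variants(v) for v in spec.values())
    return 1  # scalars (and, in this toy, lists) count as a single variant


print(count_variants({"x": {"_or_": [1, 2]}, "y": {"_range_": [1, 3]}}))  # 6
```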

nirs4all.pipeline.config.generator.get_preset(name: str) → Any[source]

Retrieve a preset specification by name.

Parameters:: name – Name of the preset.
Returns:: Deep copy of the preset specification.
Raises:: KeyError – If preset doesn’t exist.

nirs4all.pipeline.config.generator.get_preset_info(name: str) → Dict[str, Any][source]

Get full preset info including metadata.

Parameters:: name – Name of the preset.
Returns:: Dict with spec, description, tags.
Raises:: KeyError – If preset doesn’t exist.

nirs4all.pipeline.config.generator.get_strategy(node: Dict[str, Any]) → ExpansionStrategy | None[source]

Find the appropriate strategy for a node.

Iterates through registered strategies (in priority order) and returns the first one that can handle the node.

Parameters:: node – A dictionary node from the configuration.
Returns:: An ExpansionStrategy instance if one handles the node, None otherwise.

Examples

>>> node = {"_or_": ["A", "B"]}
>>> strategy = get_strategy(node)
>>> strategy
OrStrategy(priority=10)
>>> result = strategy.expand(node)

nirs4all.pipeline.config.generator.has_cartesian_keyword(node: Dict[str, Any]) → bool[source]: Check if a node contains the _cartesian_ keyword.

nirs4all.pipeline.config.generator.has_chain_keyword(node: Dict[str, Any]) → bool[source]: Check if a node contains the _chain_ keyword.

nirs4all.pipeline.config.generator.has_grid_keyword(node: Dict[str, Any]) → bool[source]: Check if a node contains the _grid_ keyword.

nirs4all.pipeline.config.generator.has_log_range_keyword(node: Dict[str, Any]) → bool[source]: Check if a node contains the _log_range_ keyword.

nirs4all.pipeline.config.generator.has_or_keyword(node: Dict[str, Any]) → bool[source]

Check if a node contains the _or_ keyword.

Parameters:: node – A dictionary node from the configuration.
Returns:: True if the node contains _or_, False otherwise.

nirs4all.pipeline.config.generator.has_preset(name: str) → bool[source]

Check if a preset exists.

Parameters:: name – Name to check.
Returns:: True if preset exists.

nirs4all.pipeline.config.generator.has_range_keyword(node: Dict[str, Any]) → bool[source]

Check if a node contains the _range_ keyword.

Parameters:: node – A dictionary node from the configuration.
Returns:: True if the node contains _range_, False otherwise.

nirs4all.pipeline.config.generator.has_sample_keyword(node: Dict[str, Any]) → bool[source]: Check if a node contains the _sample_ keyword.

nirs4all.pipeline.config.generator.has_zip_keyword(node: Dict[str, Any]) → bool[source]: Check if a node contains the _zip_ keyword.

nirs4all.pipeline.config.generator.import_presets(presets: Dict[str, Any], overwrite: bool = False) → int[source]

Import presets from a dict.

Parameters:

presets – Dict mapping preset names to info dicts or specs.
overwrite – If True, overwrite existing presets.

Returns:

Number of presets imported.

nirs4all.pipeline.config.generator.is_generator_node(node: Dict[str, Any]) → bool[source]

Check if a dict node contains any generator keywords.

Parameters:: node – A dictionary node from the configuration.
Returns:: True if the node contains any generation keywords (_or_, _range_, etc.), False otherwise.

Examples

>>> is_generator_node({"_or_": ["A", "B"]})
True
>>> is_generator_node({"class": "MyClass"})
False
>>> is_generator_node({"_range_": [1, 10]})
True

nirs4all.pipeline.config.generator.is_preset_reference(node: Any) → bool[source]

Check if a node is a preset reference.

Parameters:: node – Node to check.
Returns:: True if node is a dict with _preset_ key.

nirs4all.pipeline.config.generator.is_pure_cartesian_node(node: Dict[str, Any]) → bool[source]: Check if a node is a pure cartesian node.

nirs4all.pipeline.config.generator.is_pure_chain_node(node: Dict[str, Any]) → bool[source]: Check if a node is a pure chain node.

nirs4all.pipeline.config.generator.is_pure_grid_node(node: Dict[str, Any]) → bool[source]: Check if a node is a pure grid node.

nirs4all.pipeline.config.generator.is_pure_log_range_node(node: Dict[str, Any]) → bool[source]: Check if a node is a pure log range node.

nirs4all.pipeline.config.generator.is_pure_or_node(node: Dict[str, Any]) → bool[source]

Check if a node is a pure OR node (only _or_, size, count keys).

Parameters:: node – A dictionary node from the configuration.
Returns:: True if the node contains only OR-related keys, False otherwise.

Examples

>>> is_pure_or_node({"_or_": ["A", "B"], "size": 2})
True
>>> is_pure_or_node({"_or_": ["A", "B"], "class": "X"})
False

nirs4all.pipeline.config.generator.is_pure_range_node(node: Dict[str, Any]) → bool[source]

Check if a node is a pure range node (only _range_, count keys).

Parameters:: node – A dictionary node from the configuration.
Returns:: True if the node contains only range-related keys, False otherwise.

Examples

>>> is_pure_range_node({"_range_": [1, 10]})
True
>>> is_pure_range_node({"_range_": [1, 10], "count": 5})
True
>>> is_pure_range_node({"_range_": [1, 10], "size": 2})
False

nirs4all.pipeline.config.generator.is_pure_sample_node(node: Dict[str, Any]) → bool[source]: Check if a node is a pure sample node.

nirs4all.pipeline.config.generator.is_pure_zip_node(node: Dict[str, Any]) → bool[source]: Check if a node is a pure zip node.

Iterate with progress reporting.

Parameters:

spec – Specification to expand.
seed – Random seed.
report_every – Report progress every N items.

Yields:

Tuples of (index, config).
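A generator that yields (index, config) pairs and reports every N items can be written as a thin wrapper around enumerate. This is a sketch only. The reporting callback parameter is an assumption that makes progress output testable (the documented function takes spec and seed and expands internally), and iterate_with_progress is a hypothetical name.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def iterate_with_progress(
    configs: Iterable[Any],
    report_every: int = 100,
    report: Callable[[int], None] = lambda n: print(f"{n} configurations expanded"),
) -> Iterator[Tuple[int, Any]]:
    """Yield (index, config) pairs, calling report(n) each time n hits a multiple of report_every."""
    for index, config in enumerate(configs):
        if (index + 1) % report_every == 0:
            report(index + 1)  # fires when the Nth, 2Nth, ... item is reached
        yield index, config


seen = []
pairs = list(iterate_with_progress(range(250), report_every=100, report=seen.append))
print(seen)  # [100, 200]
```

Because it stays lazy, the wrapper composes with an iterator such as expand_spec_iter without materializing the whole space.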

nirs4all.pipeline.config.generator.list_presets(tags: List[str] | None = None) → List[str][source]

List all registered preset names.

Parameters:: tags – If provided, filter to presets with any of these tags.
Returns:: List of preset names.

nirs4all.pipeline.config.generator.parse_constraints(node: Dict[str, Any]) → Dict[str, List[List[Any]]][source]

Parse the constraint keywords (_mutex_, _requires_, _exclude_) of a node into a dict keyed by constraint type.

Parameters:: node – Node containing constraint keywords.
Returns:: Dict with 'mutex', 'requires', and 'exclude' lists.
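The constraint semantics are not spelled out here. The sketch below filters combinations using one plausible reading, which is an assumption throughout: a mutex group allows at most one of its members in a combination, a requires entry [A, B, ...] means selecting A also requires B, ..., and an exclude entry is an exact combination (order ignored) to drop. The dict keys match what parse_constraints is documented to return; satisfies is a hypothetical name.

```python
from itertools import combinations
from typing import Any, Dict, List, Sequence


def satisfies(combo: Sequence[Any], constraints: Dict[str, List[List[Any]]]) -> bool:
    """Check one combination against mutex/requires/exclude rules (assumed semantics)."""
    chosen = set(combo)
    for group in constraints.get("mutex", []):
        # Assumed: at most one member of a mutex group may be selected together.
        if len(chosen.intersection(group)) > 1:
            return False
    for trigger, *needed in constraints.get("requires", []):
        # Assumed: selecting the first item requires all the following ones.
        if trigger in chosen and not chosen.issuperset(needed):
            return False
    for excluded in constraints.get("exclude", []):
        # Assumed: drop an exact combination, ignoring order.
        if chosen == set(excluded):
            return False
    return True


choices = ["A", "B", "C", "D"]
constraints = {"mutex": [["A", "B"]], "requires": [["C", "D"]], "exclude": [["B", "D"]]}
valid = [c for c in combinations(choices, 2) if satisfies(c, constraints)]
print(valid)  # [('A', 'D'), ('C', 'D')]
```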

nirs4all.pipeline.config.generator.print_expansion_tree(spec: Any, indent: str = ' ', show_counts: bool = True, max_depth: int | None = None) → str[source]

Format expansion tree as a printable string.

Parameters:

spec – Specification to visualize.
indent – Indentation string.
show_counts – Whether to show counts in output.
max_depth – Maximum depth to display.

Returns:

Formatted tree string.

Examples

>>> spec = {"x": {"_or_": [1, 2]}, "y": {"_range_": [1, 3]}}
>>> print(print_expansion_tree(spec))
root (6 variants)
├── x: _or_ (2 variants)
│   ├── [0]: scalar
│   └── [1]: scalar
└── y: _range_ (3 variants)

nirs4all.pipeline.config.generator.register_builtin_presets() → None[source]

Register built-in preset configurations.

The built-in presets cover commonly used configuration patterns so they can be reused by name.

nirs4all.pipeline.config.generator.register_preset(name: str, spec: Any, description: str | None = None, tags: List[str] | None = None, overwrite: bool = False) → None[source]

Register a named preset configuration.

Parameters:

name – Unique name for the preset.
spec – Configuration specification (dict, list, or scalar).
description – Optional human-readable description.
tags – Optional list of tags for categorization.
overwrite – If True, overwrite existing preset with same name.

Raises:

ValueError – If preset name already exists and overwrite=False.

Examples

>>> register_preset("my_models", {"_or_": ["PLS", "RF"]})
>>> register_preset("my_models", {"_or_": ["SVM"]}, overwrite=True)
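The overwrite rule above, together with get_preset returning a deep copy, can be modeled with a module-level dict. This is a minimal sketch under those documented behaviors, not the library's registry. The names add_preset, fetch_preset and _PRESETS are hypothetical.

```python
import copy
from typing import Any, Dict, List, Optional

_PRESETS: Dict[str, Dict[str, Any]] = {}  # hypothetical in-memory registry


def add_preset(name: str, spec: Any, description: Optional[str] = None,
               tags: Optional[List[str]] = None, overwrite: bool = False) -> None:
    """Store a named spec; refuse to replace an existing one unless overwrite=True."""
    if name in _PRESETS and not overwrite:
        raise ValueError(f"Preset {name!r} already exists")
    _PRESETS[name] = {
        "spec": copy.deepcopy(spec),  # decouple the registry from the caller's object
        "description": description,
        "tags": list(tags or []),
    }


def fetch_preset(name: str) -> Any:
    """Return a deep copy of the stored spec; raises KeyError if name is unknown."""
    return copy.deepcopy(_PRESETS[name]["spec"])


add_preset("my_models", {"_or_": ["PLS", "RF"]})
add_preset("my_models", {"_or_": ["SVM"]}, overwrite=True)
spec = fetch_preset("my_models")
spec["_or_"].append("RF")  # mutating the copy leaves the registry unchanged
print(fetch_preset("my_models"))  # {'_or_': ['SVM']}
```

Returning copies means one pipeline cannot accidentally alter a preset that another pipeline later expands.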

nirs4all.pipeline.config.generator.register_strategy(strategy_cls: Type[ExpansionStrategy], priority: int | None = None) → Type[ExpansionStrategy][source]

Register a strategy class.

Can be used as a decorator or called directly.

Parameters:

strategy_cls – The strategy class to register.
priority – Optional priority override. If None, uses class priority.

Returns:

The strategy class (for decorator usage).

Examples

>>> @register_strategy
... class MyStrategy(ExpansionStrategy):
...     priority = 10
...     ...

>>> register_strategy(MyStrategy, priority=5)
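The strategy registry behind get_strategy and register_strategy can be sketched as a sorted list of classes plus a first-match lookup. Everything here is a standalone stand-in: the base class, the handles classmethod, and the names register and find_strategy are hypothetical. Trying higher priority values first is also an assumption, since the ordering direction is not stated above.

```python
from typing import Any, Dict, List, Optional, Type


class ExpansionStrategy:
    """Stand-in base class for this sketch only."""

    priority: int = 0

    @classmethod
    def handles(cls, node: Dict[str, Any]) -> bool:
        raise NotImplementedError

    def expand(self, node: Dict[str, Any]) -> List[Any]:
        raise NotImplementedError


_STRATEGIES: List[Type[ExpansionStrategy]] = []


def register(strategy_cls: Type[ExpansionStrategy],
             priority: Optional[int] = None) -> Type[ExpansionStrategy]:
    """Register a strategy; usable as a bare decorator or as a plain call."""
    if priority is not None:
        strategy_cls.priority = priority  # override the class-level default
    if strategy_cls not in _STRATEGIES:
        _STRATEGIES.append(strategy_cls)
    # Assumption: higher priority values are tried first.
    _STRATEGIES.sort(key=lambda cls: cls.priority, reverse=True)
    return strategy_cls  # returning the class is what makes decorator use work


def find_strategy(node: Dict[str, Any]) -> Optional[ExpansionStrategy]:
    """Return an instance of the first registered strategy that handles node."""
    for strategy_cls in _STRATEGIES:
        if strategy_cls.handles(node):
            return strategy_cls()
    return None


@register
class OrStrategy(ExpansionStrategy):
    priority = 10

    @classmethod
    def handles(cls, node: Dict[str, Any]) -> bool:
        return "_or_" in node

    def expand(self, node: Dict[str, Any]) -> List[Any]:
        return list(node["_or_"])


node = {"_or_": ["A", "B"]}
strategy = find_strategy(node)
print(type(strategy).__name__, strategy.expand(node))  # OrStrategy ['A', 'B']
```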

nirs4all.pipeline.config.generator.resolve_preset(node: Dict[str, Any]) → Any[source]

Resolve a single preset reference.

Parameters:

node – Dict containing _preset_ key.

Returns:

Resolved preset specification.

Raises:

KeyError – If referenced preset doesn’t exist.
ValueError – If _preset_ value is not a string.

nirs4all.pipeline.config.generator.resolve_presets_recursive(node: Any, resolved: Set[str] | None = None) → Any[source]

Recursively resolve all preset references in a configuration.

Handles circular reference detection.

Parameters:

node – Configuration node (dict, list, or scalar).
resolved – Set of already-resolved presets (for cycle detection).

Returns:

Node with all preset references resolved.

Raises:

ValueError – If circular preset reference detected.
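Recursive preset resolution with cycle detection can be sketched by carrying the set of presets currently being resolved down each branch. Using a per-path set rather than a global one means that reusing the same preset in sibling positions is not mistaken for a cycle. This is an illustrative sketch that takes the preset table as an explicit argument. The name resolve is hypothetical, and in this toy any keys next to _preset_ are dropped.

```python
from typing import Any, Dict, FrozenSet


def resolve(node: Any, presets: Dict[str, Any], active: FrozenSet[str] = frozenset()) -> Any:
    """Replace {"_preset_": name} references recursively; raise on cycles (hypothetical)."""
    if isinstance(node, dict) and "_preset_" in node:
        name = node["_preset_"]
        if not isinstance(name, str):
            raise ValueError("_preset_ value must be a string")
        if name in active:
            raise ValueError(f"Circular preset reference: {name!r}")
        # presets[name] raises KeyError for unknown presets.
        return resolve(presets[name], presets, active | {name})
    if isinstance(node, dict):
        return {k: resolve(v, presets, active) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, presets, active) for v in node]
    return node


presets = {
    "models": {"_or_": ["PLS", {"_preset_": "trees"}]},
    "trees": {"_or_": ["RF", "XGB"]},
}
print(resolve({"model": {"_preset_": "models"}}, presets))
# {'model': {'_or_': ['PLS', {'_or_': ['RF', 'XGB']}]}}
```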

nirs4all.pipeline.config.generator.sample_with_seed(population: List[T], k: int, seed: int | None = None, weights: List[float] | None = None) → List[T][source]

Sample k items from population with optional seed for reproducibility.

This function wraps Python's random sampling functions to make results reproducible when a seed is given. It uses random.sample for unweighted sampling, which draws without replacement, and random.choices for weighted sampling, which draws with replacement, so weighted results may contain duplicates.

Parameters:

population – List of items to sample from.
k – Number of items to sample.
seed – Optional random seed for reproducibility. If None, uses current random state (non-deterministic).
weights – Optional list of weights for weighted random selection. Must have the same length as population. If None, uniform sampling is used.

Returns:

List of k sampled items from population.

Raises:

ValueError – If k is larger than population size (for unweighted sampling).
ValueError – If weights length doesn’t match population length.

Examples

>>> sample_with_seed(["A", "B", "C", "D"], 2, seed=42)  # deterministic for seed=42
['D', 'A']
>>> sample_with_seed(["A", "B", "C"], 2, seed=42)  # same seed, different population
['C', 'A']
>>> sample_with_seed(["A", "B", "C"], 5, seed=42)  # k > len(population): capped at population size
['A', 'B', 'C']
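A common way to get seeded sampling without disturbing the global random state is to draw from a private random.Random instance. The sketch below does that and follows the capping behavior shown in the last example above. It is an illustration rather than the library's code, and seeded_sample is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def seeded_sample(population: Sequence[T], k: int, seed: Optional[int] = None,
                  weights: Optional[Sequence[float]] = None) -> List[T]:
    """Reproducible sampling with a private RNG (hypothetical sketch)."""
    rng = random.Random(seed)  # seed=None seeds from the OS, i.e. non-deterministic
    if weights is not None:
        if len(weights) != len(population):
            raise ValueError("weights must have the same length as population")
        # random.choices draws WITH replacement, so duplicates are possible.
        return rng.choices(population, weights=weights, k=k)
    # random.sample draws without replacement; cap k at the population size.
    return rng.sample(list(population), min(k, len(population)))


pop = ["A", "B", "C", "D"]
print(seeded_sample(pop, 2, seed=42) == seeded_sample(pop, 2, seed=42))  # True
```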

nirs4all.pipeline.config.generator.summarize_configs(configs: List[Any], max_unique: int = 10) → Dict[str, Any][source]

Summarize a list of configurations.

Parameters:

configs – List of configurations to summarize.
max_unique – Maximum unique values to show per key.

Returns:

Summary dict with statistics for each key.

Examples

>>> configs = [
...     {"model": "PLS", "n": 5},
...     {"model": "PLS", "n": 10},
...     {"model": "RF", "n": 5}
... ]
>>> summary = summarize_configs(configs)
>>> summary["model"]["unique_values"]
['PLS', 'RF']
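A per-key summary of flat configurations needs only a pass over the keys and an order-preserving de-duplication of values. The sketch below assumes flat dicts with hashable values, and its summary fields (count, n_unique, unique_values) are assumptions, apart from unique_values, which appears in the example above. summarize is a hypothetical name.

```python
from typing import Any, Dict, List


def summarize(configs: List[Dict[str, Any]], max_unique: int = 10) -> Dict[str, Dict[str, Any]]:
    """Per-key statistics for flat configs with hashable values (hypothetical sketch)."""
    summary: Dict[str, Dict[str, Any]] = {}
    keys = dict.fromkeys(k for cfg in configs for k in cfg)  # first-seen key order
    for key in keys:
        values = [cfg[key] for cfg in configs if key in cfg]
        unique = list(dict.fromkeys(values))  # de-duplicate, keeping first-seen order
        summary[key] = {
            "count": len(values),            # configs that define this key
            "n_unique": len(unique),
            "unique_values": unique[:max_unique],
        }
    return summary


configs = [{"model": "PLS", "n": 5}, {"model": "PLS", "n": 10}, {"model": "RF", "n": 5}]
print(summarize(configs)["model"]["unique_values"])  # ['PLS', 'RF']
```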

nirs4all.pipeline.config.generator.to_dataframe(configs: List[Any], flatten: bool = True, prefix_sep: str = '.', include_index: bool = True) → Any[source]

Convert expanded configurations to a pandas DataFrame.

Parameters:

configs – List of expanded configurations.
flatten – If True, flatten nested dicts with dot notation.
prefix_sep – Separator for flattened keys (default ".").
include_index – If True, include a config index column.

Returns:

pandas DataFrame with one row per configuration.

Raises:

ImportError – If pandas is not installed.

Examples

>>> configs = [
...     {"model": "PLS", "n_components": 5},
...     {"model": "PLS", "n_components": 10},
...     {"model": "RF", "n_estimators": 100}
... ]
>>> df = to_dataframe(configs)
>>> df.columns.tolist()
['config_index', 'model', 'n_components', 'n_estimators']
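The flattening step can be shown without pandas. The sketch below turns nested dicts into dot-separated keys, prepends a config_index, and collects the union of columns in first-seen order. The resulting rows could then be passed to pandas.DataFrame(rows, columns=columns). The names flatten and to_rows are hypothetical, and this is not to_dataframe's actual code.

```python
from typing import Any, Dict, List, Tuple


def flatten(config: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, sep, full_key))  # recurse into sub-dicts
        else:
            flat[full_key] = value
    return flat


def to_rows(configs: List[Dict[str, Any]], sep: str = ".",
            include_index: bool = True) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Return (columns, rows); missing keys are simply absent from a row."""
    rows = []
    for i, cfg in enumerate(configs):
        row: Dict[str, Any] = {"config_index": i} if include_index else {}
        row.update(flatten(cfg, sep))
        rows.append(row)
    columns = list(dict.fromkeys(k for row in rows for k in row))  # union, first-seen order
    return columns, rows


columns, rows = to_rows([{"model": "PLS", "params": {"n": 5}}, {"model": "RF", "params": {"trees": 100}}])
print(columns)  # ['config_index', 'model', 'params.n', 'params.trees']
```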

nirs4all.pipeline.config.generator.unregister_preset(name: str) → bool[source]

Remove a preset from the registry.

Parameters:: name – Name of preset to remove.
Returns:: True if preset was removed, False if it didn’t exist.

nirs4all.pipeline.config.generator.validate_config(config: Any, schema: Dict[str, Any] | None = None, required_keys: Set[str] | None = None, forbidden_keys: Set[str] | None = None, path: str = 'root') → ValidationResult[source]

Validate an expanded configuration.

This validates configurations after expansion, checking for structural correctness and optionally against a schema.

Parameters:

config – The expanded configuration to validate.
schema – Optional schema definition for validation.
required_keys – Optional set of keys that must be present.
forbidden_keys – Optional set of keys that must not be present.
path – JSONPath-like location for error reporting.

Returns:

ValidationResult containing validation outcome.

Examples

>>> config = {"class": "MyClass", "params": {"n": 5}}
>>> result = validate_config(config, required_keys={"class"})
>>> result.is_valid
True

nirs4all.pipeline.config.generator.validate_constraints(constraints: Dict[str, List[List[Any]]], choices: List[Any]) → List[str][source]

Validate constraint specifications against available choices.

Parameters:

constraints – Constraint dict from parse_constraints.
choices – Available choice items.

Returns:

List of validation error messages.

nirs4all.pipeline.config.generator.validate_expanded_configs(configs: List[Any], schema: Dict[str, Any] | None = None, min_count: int = 0, max_count: int | None = None) → ValidationResult[source]

Validate a list of expanded configurations.

Parameters:

configs – List of expanded configurations.
schema – Optional schema for each configuration.
min_count – Minimum number of configurations required.
max_count – Maximum number of configurations allowed.

Returns:

ValidationResult for the entire list.

nirs4all.pipeline.config.generator.validate_spec(spec: Any, path: str = 'root', strict: bool = False, custom_validators: List[Callable] | None = None) → ValidationResult[source]

Validate a generator specification before expansion.

Recursively validates the structure of a generator specification, checking for valid syntax, consistent keyword usage, and semantic correctness.

Parameters:

spec – The specification to validate (can be any type).
path – JSONPath-like location for error reporting.
strict – If True, also report warnings as errors.
custom_validators – Optional list of custom validation functions. Each function should accept (node, path) and return ValidationResult.

Returns:

ValidationResult containing validation outcome.

Examples

>>> result = validate_spec({"_or_": ["A", "B"]})
>>> result.is_valid
True

>>> result = validate_spec({"_or_": "not a list"})
>>> result.is_valid
False
>>> result.errors[0].message
'_or_ must be a list, got str'
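Pre-expansion validation is essentially a recursive walk that records (path, message) pairs. The sketch below checks only two rules: _or_ must be a list, and a list-form _range_ must have 2 or 3 items. Both rules and the message wording are modeled on the example above, but the real validator checks more. check_spec is a hypothetical name, and it returns a plain list rather than a ValidationResult.

```python
from typing import Any, List, Optional, Tuple

Error = Tuple[str, str]  # (path, message)


def check_spec(spec: Any, path: str = "root", errors: Optional[List[Error]] = None) -> List[Error]:
    """Collect structural errors in a generator spec (hypothetical, two rules only)."""
    if errors is None:
        errors = []
    if isinstance(spec, dict):
        for key, value in spec.items():
            child = f"{path}.{key}"
            if key == "_or_" and not isinstance(value, list):
                errors.append((child, f"_or_ must be a list, got {type(value).__name__}"))
            elif key == "_range_" and isinstance(value, list) and len(value) not in (2, 3):
                errors.append((child, f"_range_ list must have 2 or 3 items, got {len(value)}"))
            else:
                check_spec(value, child, errors)  # descend into nested values
    elif isinstance(spec, list):
        for i, item in enumerate(spec):
            check_spec(item, f"{path}[{i}]", errors)
    return errors


print(check_spec({"_or_": "not a list"}))
# [('root._or_', '_or_ must be a list, got str')]
```

An empty list means the spec passed these checks.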