nirs4all.pipeline.config.generator module
Generator module for pipeline configuration expansion.
This module expands pipeline configuration specifications into concrete pipeline variants. It handles combinatorial keywords (_or_, _range_, size, count, pick, arrange) and generates all possible combinations.
This is the public API module. The implementation is in the _generator subpackage, which uses a Strategy pattern for modular node handling.
- Main Functions:
expand_spec(node, seed): Expand a configuration node into all variants
expand_spec_iter(node, seed): Lazy iterator version for large spaces
count_combinations(node): Count variants without generating them
- Keywords:
_or_: Choice between alternatives
_range_: Numeric sequence generation
size: Number of items to select (legacy, uses combinations)
pick: Unordered selection (combinations) - explicit intent
arrange: Ordered arrangement (permutations) - explicit intent
then_pick: Second-order combination selection
then_arrange: Second-order permutation selection
count: Limit number of generated variants
_log_range_: Logarithmic sequence generation
_grid_: Grid search style Cartesian product
_zip_: Parallel iteration (like Python’s zip)
_chain_: Sequential ordered choices
_sample_: Statistical sampling (uniform, log-uniform, normal)
_tags_: Configuration tagging for filtering
_metadata_: Arbitrary metadata attachment
Constraints: _mutex_, _requires_, _exclude_ for filtering combinations
Presets: _preset_ for named configuration templates
Iterator: expand_spec_iter for memory-efficient lazy expansion
Export: to_dataframe, diff_configs, print_expansion_tree utilities
Examples
- Basic choice expansion:
>>> expand_spec({"_or_": ["A", "B", "C"]})
['A', 'B', 'C']
- Pick (combinations):
>>> expand_spec({"_or_": ["A", "B", "C"], "pick": 2})
[['A', 'B'], ['A', 'C'], ['B', 'C']]
- Arrange (permutations):
>>> expand_spec({"_or_": ["A", "B", "C"], "arrange": 2})
[['A', 'B'], ['B', 'A'], ['A', 'C'], ['C', 'A'], ['B', 'C'], ['C', 'B']]
- Mutual exclusion constraint (Phase 4):
>>> expand_spec({"_or_": ["A", "B", "C"], "pick": 2, "_mutex_": [["A", "B"]]})
[['A', 'C'], ['B', 'C']]  # ["A", "B"] excluded
- Lazy iteration for large spaces (Phase 4):
>>> for config in expand_spec_iter({"_range_": [1, 1000000]}):
...     process(config)  # Memory efficient
- Numeric range:
>>> expand_spec({"_range_": [1, 5]})
[1, 2, 3, 4, 5]
- Logarithmic range:
>>> expand_spec({"_log_range_": [0.001, 1, 4]})
[0.001, 0.01, 0.1, 1.0]
- Grid search:
>>> expand_spec({"_grid_": {"x": [1, 2], "y": ["A", "B"]}})
[{'x': 1, 'y': 'A'}, {'x': 1, 'y': 'B'}, {'x': 2, 'y': 'A'}, {'x': 2, 'y': 'B'}]
- Parallel zip:
>>> expand_spec({"_zip_": {"x": [1, 2], "y": ["A", "B"]}})
[{'x': 1, 'y': 'A'}, {'x': 2, 'y': 'B'}]
- Nested dict expansion:
>>> expand_spec({"x": {"_or_": [1, 2]}, "y": 3})
[{'x': 1, 'y': 3}, {'x': 2, 'y': 3}]
- Architecture:
The _generator subpackage uses the Strategy pattern:
  - strategies/base.py: ExpansionStrategy abstract base class
  - strategies/registry.py: Strategy registration and dispatch
  - strategies/range_strategy.py: Handles _range_ nodes
  - strategies/or_strategy.py: Handles _or_ nodes with pick/arrange/constraints
  - strategies/log_range_strategy.py: Handles _log_range_ nodes (Phase 3)
  - strategies/grid_strategy.py: Handles _grid_ nodes (Phase 3)
  - strategies/zip_strategy.py: Handles _zip_ nodes (Phase 3)
  - strategies/chain_strategy.py: Handles _chain_ nodes (Phase 3)
  - strategies/sample_strategy.py: Handles _sample_ nodes (Phase 3)
  - validators/schema.py: Specification and config validation (Phase 3)
  - iterator.py: Lazy expansion with expand_spec_iter (Phase 4)
  - constraints.py: Constraint evaluation (_mutex_, _requires_) (Phase 4)
  - presets.py: Preset registry and resolution (Phase 4)
  - core.py: Main expansion logic using strategy dispatch
  - keywords.py: Keyword constants and detection utilities
  - utils/: Helper functions (sampling, combinatorics, export)
- class nirs4all.pipeline.config.generator.CartesianStrategy
Bases: ExpansionStrategy
Strategy for handling _cartesian_ nodes.
Generates the Cartesian product of all stages first (each stage being an _or_ node or list of options), then applies pick or arrange selection to the complete pipelines.
This differs from _grid_ which produces dicts. _cartesian_ produces lists (ordered stages) which is ideal for preprocessing pipelines.
- Supported formats:
Array of stages: [stage1, stage2, …]
With pick: Select N combinations of complete pipelines
With arrange: Select N permutations of complete pipelines
With count: Limit number of results
With constraints: Filter invalid combinations
- count(node: Dict[str, Any], count_nested: callable | None = None) -> int
Count cartesian combinations without generating them.
- Parameters:
node – Cartesian specification node.
count_nested – Callback to count nested nodes.
- Returns:
Number of pipeline combinations.
- expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) -> List[Any]
Expand a cartesian node to list of pipeline combinations.
The process:
1. Expand each stage to get its options
2. Compute Cartesian product of all stages -> complete pipelines
3. If pick/arrange specified, select from complete pipelines
4. Apply constraints if specified
5. Apply count limit if specified
- Parameters:
node – Cartesian specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Callback to expand nested generator nodes.
- Returns:
List of pipeline combinations.
Examples
>>> strategy.expand({
...     "_cartesian_": [
...         {"_or_": ["A", "B"]},
...         {"_or_": ["X", "Y"]}
...     ],
...     "pick": 2
... })
[[["A", "X"], ["A", "Y"]], [["A", "X"], ["B", "X"]], ...]
- classmethod handles(node: Dict[str, Any]) -> bool
Check if node is a pure cartesian node.
- Parameters:
node – Dictionary node to check.
- Returns:
True if node contains _cartesian_ and only cartesian-related keys.
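The first three steps of the cartesian process (expand stages, take the product, then pick from complete pipelines) can be sketched with the standard library alone. This is only an illustration under the assumption that every stage is already a plain list of options; `cartesian_then_pick` is a hypothetical helper, not part of nirs4all, and constraints and count are left out.

```python
from itertools import combinations, product


def cartesian_then_pick(stages, pick=None):
    """Hypothetical helper: build complete pipelines, then select groups of them."""
    # Steps 1-2: take one option per stage, in stage order, to form complete pipelines.
    pipelines = [list(p) for p in product(*stages)]
    if pick is None:
        return pipelines
    # Step 3: unordered selection of `pick` complete pipelines.
    return [list(group) for group in combinations(pipelines, pick)]


result = cartesian_then_pick([["A", "B"], ["X", "Y"]], pick=2)
print(len(result))  # 6, i.e. C(4, 2)
print(result[0])    # [['A', 'X'], ['A', 'Y']]
```

Selecting after the product is what distinguishes this from applying pick inside each stage: the units being combined are whole pipelines.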
- class nirs4all.pipeline.config.generator.ChainStrategy
Bases: ExpansionStrategy
Strategy for handling _chain_ nodes.
Generates configurations in sequential order. Each item in the chain is expanded and added to the result list in order.
- Supported formats:
Array: [config1, config2, …]
With count: Limits output to first n items (not random)
- count(node: Dict[str, Any], count_nested: callable | None = None) -> int
Count chain items without generating them.
- Parameters:
node – Chain specification node.
count_nested – Callback to count nested nodes.
- Returns:
Number of items in the chain.
- expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) -> List[Any]
Expand a chain node to list of sequential configurations.
- Parameters:
node – Chain specification node.
seed – Optional seed. Per the supported formats above, count keeps the first n items of a chain rather than sampling at random.
expand_nested – Callback to expand nested generator nodes.
- Returns:
List of configurations in order.
Examples
>>> strategy.expand({"_chain_": [{"x": 1}, {"x": 2}, {"x": 3}]})
[{"x": 1}, {"x": 2}, {"x": 3}]
- class nirs4all.pipeline.config.generator.ExpansionStrategy
Bases: ABC
Abstract base class for generator expansion strategies.
Each strategy is responsible for:
1. Detecting if it can handle a specific node type
2. Expanding the node into all possible variants
3. Counting the variants without generating them
- Subclasses must implement:
handles(node): Check if strategy can handle this node
expand(node, seed): Expand node to list of variants
count(node): Count variants without generating
- abstractmethod count(node: Dict[str, Any], count_nested: callable | None = None) -> int
Count the number of variants without generating them.
- Parameters:
node – A dictionary node to count.
count_nested – Callback to count nested nodes recursively.
- Returns:
Number of variants that would be generated.
- abstractmethod expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) -> List[Any]
Expand a node into all possible variants.
- Parameters:
node – A dictionary node to expand.
seed – Optional random seed for reproducible generation.
expand_nested – Callback to expand nested nodes recursively. This allows strategies to delegate back to the main expansion logic for nested structures.
- Returns:
List of expanded variants.
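To make the three responsibilities concrete, here is a minimal, self-contained sketch of the same shape of interface. It does not import nirs4all: `SketchStrategy`, `FlagStrategy`, and the `_flag_` keyword are all made up for illustration, and the real base class may differ in details.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchStrategy(ABC):
    """Stand-in for the abstract interface described above (hypothetical name)."""

    @classmethod
    @abstractmethod
    def handles(cls, node: Dict[str, Any]) -> bool:
        """Return True if this strategy can expand the node."""

    @abstractmethod
    def expand(self, node, seed=None, expand_nested=None) -> List[Any]:
        """Return every variant of the node."""

    @abstractmethod
    def count(self, node, count_nested=None) -> int:
        """Return the number of variants without building them."""


class FlagStrategy(SketchStrategy):
    """Toy strategy for a made-up `_flag_` keyword that expands to both booleans."""

    @classmethod
    def handles(cls, node):
        return set(node) == {"_flag_"}

    def expand(self, node, seed=None, expand_nested=None):
        return [False, True]

    def count(self, node, count_nested=None):
        return 2  # known up front, so nothing has to be generated


node = {"_flag_": None}
strategy = FlagStrategy()
print(FlagStrategy.handles(node), strategy.expand(node), strategy.count(node))
```

The useful invariant to keep in any subclass is that `count(node)` equals `len(expand(node))`.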
- class nirs4all.pipeline.config.generator.ExpansionTreeNode(key: str, node_type: str, count: int, children: List[ExpansionTreeNode] | None = None, details: Dict[str, Any] | None = None)
Bases: object
Node in an expansion tree visualization.
- class nirs4all.pipeline.config.generator.GridStrategy
Bases: ExpansionStrategy
Strategy for handling _grid_ nodes.
Generates all combinations (Cartesian product) of parameter values. Similar to sklearn’s ParameterGrid.
- Supported formats:
Dict: {"param1": [v1, v2], "param2": [v3, v4]}
With count: Limits output to n random samples
- count(node: Dict[str, Any], count_nested: callable | None = None) -> int
Count grid combinations without generating them.
- Parameters:
node – Grid specification node.
count_nested – Callback to count nested nodes.
- Returns:
Number of parameter combinations.
- expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) -> List[Any]
Expand a grid node to list of parameter combinations.
- Parameters:
node – Grid specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Callback to expand nested generator nodes.
- Returns:
List of dicts with all parameter combinations.
Examples
>>> strategy.expand({"_grid_": {"x": [1, 2], "y": ["A", "B"]}})
[{"x": 1, "y": "A"}, {"x": 1, "y": "B"}, {"x": 2, "y": "A"}, {"x": 2, "y": "B"}]
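The grid behaviour can be approximated in a few lines with `itertools.product`. This is a sketch only, assuming plain value lists with no nested generator nodes; `grid_sketch` is a hypothetical name and not the library's implementation.

```python
from itertools import product


def grid_sketch(params):
    """Hypothetical helper: every combination of one value per parameter."""
    keys = list(params)
    value_lists = [params[k] for k in keys]
    # product() varies the last parameter fastest, matching the order shown above.
    return [dict(zip(keys, values)) for values in product(*value_lists)]


print(grid_sketch({"x": [1, 2], "y": ["A", "B"]}))
# [{'x': 1, 'y': 'A'}, {'x': 1, 'y': 'B'}, {'x': 2, 'y': 'A'}, {'x': 2, 'y': 'B'}]
```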
- class nirs4all.pipeline.config.generator.LogRangeStrategy
Bases: ExpansionStrategy
Strategy for handling _log_range_ nodes.
Generates logarithmically-spaced numeric sequences. Useful for hyperparameter search over values that span multiple orders of magnitude.
- Supported formats:
Array: [from, to, num] - generates num values spanning from "from" to "to"
Dict: {"from": start, "to": end, "num": n}
Dict: {"from": start, "to": end, "base": b} - explicit base
With count: Limits output to n random samples
- count(node: Dict[str, Any], count_nested: callable | None = None) -> int
Count log range elements without generating them.
- Parameters:
node – Log range specification node.
count_nested – Not used for log range nodes.
- Returns:
Number of values in the log range.
- expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) -> List[Any]
Expand a log range node to list of numeric values.
- Parameters:
node – Log range specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Not used for log range nodes (no nesting).
- Returns:
List of logarithmically-spaced numeric values.
- Raises:
ValueError – If log range specification is invalid.
Examples
>>> strategy.expand({"_log_range_": [0.001, 1, 4]})
[0.001, 0.01, 0.1, 1.0]
>>> strategy.expand({"_log_range_": [1, 1000, 4]})
[1.0, 10.0, 100.0, 1000.0]
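Logarithmic spacing means the exponents are evenly spaced, not the values. The sketch below shows one way to compute such a sequence in base 10; `log_range_sketch` is a hypothetical helper, and the library's rounding and base handling may differ, so results are compared with a tolerance.

```python
import math


def log_range_sketch(start, stop, num):
    """Hypothetical helper: `num` values from start to stop, evenly spaced in log10."""
    if start <= 0 or stop <= 0:
        raise ValueError("log spacing needs strictly positive endpoints")
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        return [float(start)]
    lo, hi = math.log10(start), math.log10(stop)
    step = (hi - lo) / (num - 1)
    # Evenly spaced exponents give a constant ratio between neighbours.
    return [10 ** (lo + i * step) for i in range(num)]


values = log_range_sketch(0.001, 1, 4)
print(values)  # approximately 0.001, 0.01, 0.1, 1.0 (up to floating-point error)
```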
- class nirs4all.pipeline.config.generator.OrStrategy
Bases: ExpansionStrategy
Strategy for handling _or_ nodes with selection semantics.
- Supports:
Basic choice expansion (each alternative becomes a variant)
pick: Unordered selection using combinations
arrange: Ordered arrangement using permutations
size: Legacy alias for pick (backward compatibility)
Second-order selection via then_pick/then_arrange or [outer, inner]
count: Limit number of generated variants
Constraints: _mutex_, _requires_, _exclude_ for filtering (Phase 4)
- keywords
{_or_, size, count, pick, arrange, then_pick, then_arrange, _mutex_, _requires_, _exclude_}
- Type:
FrozenSet[str]
- count(node: Dict[str, Any], count_nested: callable | None = None) -> int
Count OR node variants without generating them.
- Parameters:
node – OR specification node.
count_nested – Callback to count nested nodes.
- Returns:
Number of variants.
- expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) -> List[Any]
Expand an OR node to list of variants.
- Parameters:
node – OR specification node.
seed – Optional seed for random sampling.
expand_nested – Callback to expand nested generator nodes.
- Returns:
List of expanded variants.
- classmethod handles(node: Dict[str, Any]) -> bool
Check if node is a pure OR node.
A pure OR node contains _or_ and only OR-related modifier keys.
- Parameters:
node – Dictionary node to check.
- Returns:
True if node is a pure OR node.
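The difference between pick and arrange is the difference between combinations and permutations. The snippet below illustrates that with `itertools` directly; it is not the OrStrategy implementation, and the order in which the library emits permutations may differ from the `itertools` order.

```python
from itertools import combinations, permutations

choices = ["A", "B", "C"]

# pick=2: order does not matter, so ['A', 'B'] and ['B', 'A'] count once.
picked = [list(c) for c in combinations(choices, 2)]

# arrange=2: order matters, so both orderings appear.
arranged = [list(p) for p in permutations(choices, 2)]

print(picked)         # [['A', 'B'], ['A', 'C'], ['B', 'C']]
print(len(arranged))  # 6
```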
- class nirs4all.pipeline.config.generator.RangeStrategy
Bases: ExpansionStrategy
Strategy for handling _range_ nodes.
Generates numeric sequences based on range specifications.
- Supported formats:
Array: [from, to] or [from, to, step]
Dict: {"from": start, "to": end, "step": step}
With count: Limits output to n random samples
- count(node: Dict[str, Any], count_nested: callable | None = None) -> int
Count range elements without generating them.
- Parameters:
node – Range specification node.
count_nested – Not used for range nodes.
- Returns:
Number of values in the range.
- Raises:
ValueError – If range specification is invalid.
- expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) -> List[Any]
Expand a range node to list of numeric values.
- Parameters:
node – Range specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Not used for range nodes (no nesting).
- Returns:
List of numeric values.
- Raises:
ValueError – If range specification is invalid.
Examples
>>> strategy.expand({"_range_": [1, 5]})
[1, 2, 3, 4, 5]
>>> strategy.expand({"_range_": [0, 10, 2]})
[0, 2, 4, 6, 8, 10]
- class nirs4all.pipeline.config.generator.SampleStrategy
Bases: ExpansionStrategy
Strategy for handling _sample_ nodes.
Generates values using statistical sampling from various distributions. Supports uniform, log-uniform, normal, and choice distributions.
- Supported distributions:
uniform: Uniform distribution between from and to
log_uniform: Log-uniform distribution (common for learning rates)
normal/gaussian: Normal distribution with mean and std
choice: Random selection from a list of values
- SUPPORTED_DISTRIBUTIONS = {'choice', 'gaussian', 'log_uniform', 'normal', 'uniform'}
- count(node: Dict[str, Any], count_nested: callable | None = None) -> int
Count sample results (simply returns num).
- Parameters:
node – Sample specification node.
count_nested – Not used.
- Returns:
Number of samples to generate.
- expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) -> List[Any]
Expand a sample node to list of sampled values.
- Parameters:
node – Sample specification node.
seed – Optional seed for reproducible sampling.
expand_nested – Not typically used for sample nodes.
- Returns:
List of sampled values.
Examples
>>> strategy.expand({"_sample_": {"distribution": "uniform", "from": 0, "to": 1, "num": 3}}, seed=42)
[0.6394267984578837, 0.025010755222666936, 0.27502931836911926]
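Seeded sampling can be sketched with a private `random.Random` instance so that the same seed always yields the same values. This is an assumption-laden illustration: `sample_sketch` and its argument names are hypothetical, only two of the listed distributions are covered, and the library may draw its numbers differently.

```python
import math
import random


def sample_sketch(distribution, low, high, num, seed=None):
    """Hypothetical helper: draw `num` reproducible samples between low and high."""
    rng = random.Random(seed)  # private generator, global random state untouched
    if distribution == "uniform":
        return [rng.uniform(low, high) for _ in range(num)]
    if distribution == "log_uniform":
        # Sample the exponent uniformly, then map back to the original scale.
        lo, hi = math.log(low), math.log(high)
        return [math.exp(rng.uniform(lo, hi)) for _ in range(num)]
    raise ValueError(f"unsupported distribution: {distribution}")


first = sample_sketch("uniform", 0, 1, 3, seed=42)
second = sample_sketch("uniform", 0, 1, 3, seed=42)
print(first == second)  # True: same seed, same samples
```

Log-uniform sampling spreads draws evenly across orders of magnitude, which is why it is the usual choice for quantities such as learning rates.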
- exception nirs4all.pipeline.config.generator.ValidationError(message: str, path: str = '', severity: ValidationSeverity = ValidationSeverity.ERROR, code: str = '', suggestion: str | None = None)
Bases: Exception
Exception for validation failures with detailed context.
- severity
Error severity level
- severity: ValidationSeverity = 'error'
- class nirs4all.pipeline.config.generator.ValidationResult(is_valid: bool = True, errors: List[ValidationError] = <factory>, warnings: List[ValidationError] = <factory>, info: List[ValidationError] = <factory>, node_count: int = 0, generator_count: int = 0)
Bases: object
Result of configuration validation.
- errors
List of validation errors
- warnings
List of validation warnings
- info
List of informational messages
- add_error(error: ValidationError) -> None
Add a validation error.
- errors: List[ValidationError]
- info: List[ValidationError]
- merge(other: ValidationResult) -> ValidationResult
Merge another validation result into this one.
- warnings: List[ValidationError]
- class nirs4all.pipeline.config.generator.ValidationSeverity(value)
Bases: Enum
Severity levels for validation issues.
- ERROR = 'error'
- INFO = 'info'
- WARNING = 'warning'
- class nirs4all.pipeline.config.generator.ZipStrategy
Bases: ExpansionStrategy
Strategy for handling _zip_ nodes.
Generates configurations by pairing values at the same index from multiple parameter lists (like Python’s zip).
- Supported formats:
Dict: {"param1": [v1, v2], "param2": [v3, v4]}
With count: Limits output to n random samples
- count(node: Dict[str, Any], count_nested: callable | None = None) -> int
Count zip pairs without generating them.
- Parameters:
node – Zip specification node.
count_nested – Callback to count nested nodes.
- Returns:
Number of zipped pairs (minimum list length).
- expand(node: Dict[str, Any], seed: int | None = None, expand_nested: callable | None = None) -> List[Any]
Expand a zip node to list of paired parameter values.
- Parameters:
node – Zip specification node.
seed – Optional seed for random sampling when count is used.
expand_nested – Callback to expand nested generator nodes.
- Returns:
List of dicts with paired parameter values.
Examples
>>> strategy.expand({"_zip_": {"x": [1, 2, 3], "y": ["A", "B", "C"]}})
[{"x": 1, "y": "A"}, {"x": 2, "y": "B"}, {"x": 3, "y": "C"}]
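Index-wise pairing maps directly onto Python's built-in `zip`, which also explains why the count is the minimum list length. The helper below is a hypothetical sketch that assumes plain value lists and is not the library's code.

```python
def zip_sketch(params):
    """Hypothetical helper: pair the i-th value of every parameter list."""
    keys = list(params)
    # zip() stops at the shortest list, so surplus values are dropped.
    return [dict(zip(keys, values)) for values in zip(*(params[k] for k in keys))]


print(zip_sketch({"x": [1, 2, 3], "y": ["A", "B", "C"]}))
# [{'x': 1, 'y': 'A'}, {'x': 2, 'y': 'B'}, {'x': 3, 'y': 'C'}]
print(len(zip_sketch({"x": [1, 2, 3], "y": ["A", "B"]})))  # 2
```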
- nirs4all.pipeline.config.generator.apply_all_constraints(combinations: List[List[Any]], mutex_groups: List[List[Any]] | None = None, requires_groups: List[List[Any]] | None = None, exclude_combos: List[List[Any]] | None = None) -> List[List[Any]]
Apply all constraints in sequence.
- Parameters:
combinations – List of combinations to filter.
mutex_groups – Mutual exclusion groups.
requires_groups – Dependency requirement pairs.
exclude_combos – Specific combinations to exclude.
- Returns:
Filtered list satisfying all constraints.
- nirs4all.pipeline.config.generator.apply_exclude_constraint(combinations: List[List[Any]], exclude_combos: List[List[Any]]) -> List[List[Any]]
Filter specific combinations from results.
- Parameters:
combinations – List of combinations to filter.
exclude_combos – Specific combinations to exclude.
- Returns:
Filtered list excluding specified combinations.
Examples
>>> combos = [["A", "B"], ["A", "C"], ["B", "C"]]
>>> apply_exclude_constraint(combos, [["A", "B"]])
[['A', 'C'], ['B', 'C']]
- nirs4all.pipeline.config.generator.apply_mutex_constraint(combinations: List[List[Any]], mutex_groups: List[List[Any]]) -> List[List[Any]]
Filter combinations that violate mutual exclusion constraints.
A mutex constraint [A, B] means A and B cannot both be present in the same combination.
- Parameters:
combinations – List of combinations to filter.
mutex_groups – List of mutex groups. Each group is a list of items that cannot appear together in the same combination.
- Returns:
Filtered list of combinations that satisfy all mutex constraints.
Examples
>>> combos = [["A", "B"], ["A", "C"], ["B", "C"]]
>>> apply_mutex_constraint(combos, [["A", "B"]])
[['A', 'C'], ['B', 'C']]
>>> apply_mutex_constraint(combos, [["A", "B"], ["B", "C"]])
[['A', 'C']]
- nirs4all.pipeline.config.generator.apply_requires_constraint(combinations: List[List[Any]], requires_groups: List[List[Any]]) -> List[List[Any]]
Filter combinations that violate dependency requirements.
A requires constraint [A, B] means if A is present, B must also be present. This is a one-directional dependency from A to B.
- Parameters:
combinations – List of combinations to filter.
requires_groups – List of requirement pairs. Each pair [A, B] means if A is selected, B must also be selected.
- Returns:
Filtered list of combinations that satisfy all requires constraints.
Examples
>>> combos = [["A", "B"], ["A", "C"], ["B", "C"]]
>>> apply_requires_constraint(combos, [["A", "B"]])
[['A', 'B'], ['B', 'C']]  # "A, C" removed because A requires B
>>> # B and C without A is OK because no constraint on B or C
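The three constraint filters can be expressed as simple predicates over a combination. The sketch below is an independent illustration with hypothetical function names; in particular it assumes that exclusion compares combinations without regard to order, which the documentation does not state.

```python
def mutex_ok(combo, mutex_groups):
    """A combination fails if it contains every member of any mutex group."""
    return not any(all(item in combo for item in group) for group in mutex_groups)


def requires_ok(combo, requires_pairs):
    """For each pair [a, b]: if a is present, b must be present too."""
    return all(b in combo for a, b in requires_pairs if a in combo)


def exclude_ok(combo, excluded):
    """Assumption: an excluded combination matches regardless of item order."""
    return not any(set(combo) == set(e) for e in excluded)


def filter_sketch(combos, mutex=(), requires=(), exclude=()):
    """Hypothetical helper applying all three filters in sequence."""
    return [
        c for c in combos
        if mutex_ok(c, mutex) and requires_ok(c, requires) and exclude_ok(c, exclude)
    ]


combos = [["A", "B"], ["A", "C"], ["B", "C"]]
print(filter_sketch(combos, mutex=[["A", "B"]]))     # [['A', 'C'], ['B', 'C']]
print(filter_sketch(combos, requires=[["A", "B"]]))  # [['A', 'B'], ['B', 'C']]
```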
- nirs4all.pipeline.config.generator.batch_iter(spec: Dict[str, Any] | List[Any] | str | int | float | bool | None, batch_size: int, seed: int | None = None) -> Iterator[List[Any]]
Iterate in batches for chunk processing.
- Parameters:
spec – Specification to expand.
batch_size – Number of configs per batch.
seed – Random seed.
- Yields:
Lists of up to batch_size configurations.
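Batching a lazy stream is commonly done with `itertools.islice`. The generator below sketches that pattern over any iterable; `batch_sketch` is a hypothetical name, and it takes an iterable rather than a specification, so it only illustrates the chunking half of what batch_iter is described as doing.

```python
from itertools import islice


def batch_sketch(iterable, batch_size):
    """Hypothetical helper: yield lists of up to `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


print(list(batch_sketch(range(1, 8), 3)))  # [[1, 2, 3], [4, 5, 6], [7]]
```

Only one batch is held in memory at a time, and the final batch may be shorter than the rest.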
- nirs4all.pipeline.config.generator.clear_presets() -> int
Clear all registered presets.
- Returns:
Number of presets cleared.
- nirs4all.pipeline.config.generator.count_combinations(node: Dict[str, Any] | List[Any] | str | int | float | bool | None) -> int
Calculate total number of combinations without generating them.
This is more efficient than generating all combinations when you only need to know the count.
- Parameters:
node – Configuration node to count.
- Returns:
Number of variants that expand_spec would produce.
Examples
>>> count_combinations({"_or_": ["A", "B", "C"]})
3
>>> count_combinations({"_or_": ["A", "B", "C"], "pick": 2})
3  # C(3,2)
>>> count_combinations({"_range_": [1, 10]})
10
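Counting without generating relies on closed-form formulas: binomial coefficients for pick, partial permutations for arrange, and a product over independent keys. The sketch below covers only a small subset of node types and assumes scalar choices and integer ranges with a positive step; `count_sketch` is a hypothetical name.

```python
import math


def count_sketch(node):
    """Hypothetical helper: number of variants for a few simple node shapes."""
    if isinstance(node, dict):
        if "_or_" in node:
            n = len(node["_or_"])  # assumes each choice is a scalar
            if "pick" in node:
                return math.comb(n, node["pick"])
            if "arrange" in node:
                return math.perm(n, node["arrange"])
            return n
        if "_range_" in node:
            start, stop, *rest = node["_range_"]
            step = rest[0] if rest else 1
            # len(range(...)) is computed arithmetically, nothing is materialised.
            return len(range(start, stop + 1, step))
        # Plain dict: independent keys multiply.
        return math.prod(count_sketch(v) for v in node.values())
    return 1  # scalars contribute a single variant


print(count_sketch({"_or_": ["A", "B", "C"], "pick": 2}))               # 3
print(count_sketch({"x": {"_or_": [1, 2]}, "y": {"_range_": [1, 3]}}))  # 6
```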
- nirs4all.pipeline.config.generator.diff_configs(config1: Any, config2: Any, path: str = '') -> Dict[str, Tuple[Any, Any]]
Find differences between two configurations.
- Parameters:
config1 – First configuration.
config2 – Second configuration.
path – Current path (for nested diff reporting).
- Returns:
Dict mapping paths to (value1, value2) tuples where values differ.
Examples
>>> config1 = {"model": "PLS", "n_components": 5}
>>> config2 = {"model": "PLS", "n_components": 10}
>>> diff_configs(config1, config2)
{'n_components': (5, 10)}
>>> config1 = {"a": {"b": 1}}
>>> config2 = {"a": {"b": 2}}
>>> diff_configs(config1, config2)
{'a.b': (1, 2)}
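A recursive diff with dotted paths can be sketched as follows. `diff_sketch` is a hypothetical stand-in; how the real function reports keys that exist on only one side is not documented here, so representing the missing side as `None` is an assumption of this sketch.

```python
def diff_sketch(a, b, path=""):
    """Hypothetical helper: map dotted paths to (left, right) pairs that differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = {}
        for key in sorted(set(a) | set(b), key=str):
            child = f"{path}.{key}" if path else str(key)
            if key not in a:
                diffs[child] = (None, b[key])  # assumption: missing side shown as None
            elif key not in b:
                diffs[child] = (a[key], None)
            else:
                diffs.update(diff_sketch(a[key], b[key], child))
        return diffs
    # Leaves (or mismatched types) are compared by equality.
    return {} if a == b else {path: (a, b)}


print(diff_sketch({"a": {"b": 1}}, {"a": {"b": 2}}))  # {'a.b': (1, 2)}
```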
- nirs4all.pipeline.config.generator.expand_spec(node: Dict[str, Any] | List[Any] | str | int | float | bool | None, seed: int | None = None) -> List[Any]
Expand a specification node to all possible combinations.
This is the main entry point for configuration expansion. It handles all node types and delegates to appropriate strategies for special generator nodes.
- Parameters:
node – Configuration node to expand. Can be:
  - dict: Expanded based on keys (strategies or Cartesian product)
  - list: Cartesian product of expanded elements
  - scalar: Wrapped in a list
seed – Optional random seed for reproducible generation when using ‘count’ to limit results.
- Returns:
List of expanded variants.
Examples
>>> expand_spec({"_or_": ["A", "B"]})
['A', 'B']
>>> expand_spec({"_range_": [1, 3]})
[1, 2, 3]
>>> expand_spec({"x": {"_or_": [1, 2]}, "y": "fixed"})
[{'x': 1, 'y': 'fixed'}, {'x': 2, 'y': 'fixed'}]
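The dict, list, and scalar rules described for the node parameter suggest a short recursive structure. The sketch below implements just those three rules plus a bare `_or_`, to show how nested expansion composes; `expand_sketch` is hypothetical and ignores every other keyword, modifier, and the seed.

```python
from itertools import product


def expand_sketch(node):
    """Hypothetical helper: recursive expansion for dicts, lists, scalars and _or_."""
    if isinstance(node, dict):
        if "_or_" in node:
            # Each alternative may itself expand to several variants.
            return [v for choice in node["_or_"] for v in expand_sketch(choice)]
        keys = list(node)
        expanded = [expand_sketch(node[k]) for k in keys]
        return [dict(zip(keys, combo)) for combo in product(*expanded)]
    if isinstance(node, list):
        return [list(combo) for combo in product(*(expand_sketch(x) for x in node))]
    return [node]  # scalar: wrapped in a list


print(expand_sketch({"x": {"_or_": [1, 2]}, "y": "fixed"}))
# [{'x': 1, 'y': 'fixed'}, {'x': 2, 'y': 'fixed'}]
```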
- nirs4all.pipeline.config.generator.expand_spec_iter(node: Dict[str, Any] | List[Any] | str | int | float | bool | None, seed: int | None = None, sample_size: int | None = None) -> Iterator[Any]
Lazily expand a specification node to all possible combinations.
This is the memory-efficient version of expand_spec that yields configurations one at a time instead of building a complete list.
- Parameters:
node – Configuration node to expand. Can be:
  - dict: Expanded based on keys (strategies or Cartesian product)
  - list: Cartesian product of expanded elements
  - scalar: Yielded as single item
seed – Optional random seed for reproducible generation when using sample_size to limit results.
sample_size – If provided, yield at most this many items using reservoir sampling for uniform distribution.
- Yields:
Expanded configuration variants one at a time.
Examples
>>> list(expand_spec_iter({"_or_": ["A", "B"]}))
['A', 'B']
>>> from itertools import islice
>>> large_spec = {"_range_": [1, 1000000]}
>>> list(islice(expand_spec_iter(large_spec), 5))
[1, 2, 3, 4, 5]
>>> # With sampling
>>> list(expand_spec_iter({"_range_": [1, 100]}, seed=42, sample_size=5))
[23, 45, 67, 12, 89]  # Random 5 items
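Reservoir sampling, mentioned for sample_size, picks k items uniformly from a stream of unknown length in a single pass. The sketch below is the textbook Algorithm R with a hypothetical function name; it is not taken from the library, so its output for a given seed need not match expand_spec_iter.

```python
import random


def reservoir_sample(items, k, seed=None):
    """Hypothetical helper: uniform sample of up to k items from any iterable."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir


sample = reservoir_sample(range(1, 101), 5, seed=42)
print(len(sample))  # 5
```

Memory use stays at k items no matter how large the configuration space is.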
- nirs4all.pipeline.config.generator.expand_spec_with_choices(node: Dict[str, Any] | List[Any] | str | int | float | bool | None, seed: int | None = None) -> List[tuple]
Expand a specification node and track generator choices.
Like expand_spec, but also returns the choices made at each generator node (_or_, _range_, etc.) for each expanded variant. This is useful for tracking which specific values were selected to produce each pipeline configuration.
- Parameters:
node – Configuration node to expand.
seed – Optional random seed for reproducible generation.
- Returns:
List of (expanded_config, generator_choices) tuples. Each generator_choices is a list of dicts like: [{"_or_": selected_value}, {"_range_": 18}, …] in the order they were encountered during expansion.
Examples
>>> results = expand_spec_with_choices({"_or_": ["A", "B"]})
>>> results
[('A', [{'_or_': 'A'}]), ('B', [{'_or_': 'B'}])]
>>> results = expand_spec_with_choices({"x": {"_or_": [1, 2]}, "y": 3})
>>> results
[({'x': 1, 'y': 3}, [{'_or_': 1}]), ({'x': 2, 'y': 3}, [{'_or_': 2}])]
- nirs4all.pipeline.config.generator.export_presets() -> Dict[str, Any]
Export all presets for serialization.
- Returns:
Dict of all presets with metadata.
- nirs4all.pipeline.config.generator.extract_base_node(node: Dict[str, Any]) -> Dict[str, Any]
Extract non-keyword keys from a node.
Returns a copy of the node with all generator and modifier keywords removed.
- Parameters:
node – A dictionary node from the configuration.
- Returns:
A dictionary containing only the non-keyword key-value pairs.
Examples
>>> extract_base_node({"_or_": ["A", "B"], "class": "MyClass", "size": 2})
{"class": "MyClass"}
>>> extract_base_node({"class": "MyClass", "params": {"n": 5}})
{"class": "MyClass", "params": {"n": 5}}
- nirs4all.pipeline.config.generator.extract_constraints(node: Dict[str, Any]) -> Dict[str, Any]
Extract constraint specifications from a node.
- Parameters:
node – A dictionary node.
- Returns:
Dict containing constraint specifications (_mutex_, _requires_, etc.)
Examples
>>> extract_constraints({"_mutex_": [["A", "B"]], "_requires_": [["C", "D"]]})
{"_mutex_": [["A", "B"]], "_requires_": [["C", "D"]]}
- nirs4all.pipeline.config.generator.extract_metadata(node: Dict[str, Any]) -> Dict[str, Any]
Extract metadata from a node.
- Parameters:
node – A dictionary node.
- Returns:
Metadata dict, or empty dict if no metadata.
Examples
>>> extract_metadata({"_metadata_": {"author": "user1"}})
{"author": "user1"}
- nirs4all.pipeline.config.generator.extract_modifiers(node: Dict[str, Any]) -> Dict[str, Any]
Extract modifier values from a node.
Extracts all modifier keywords (size, count, _seed_, _weights_, _exclude_) from a node and returns them as a dictionary.
- Parameters:
node – A dictionary node from the configuration.
- Returns:
A dictionary containing only the modifier key-value pairs found in the node.
Examples
>>> extract_modifiers({"_or_": ["A", "B"], "size": 2, "count": 1})
{"size": 2, "count": 1}
>>> extract_modifiers({"_or_": ["A", "B"]})
{}
- nirs4all.pipeline.config.generator.extract_or_choices(node: Dict[str, Any]) -> list
Extract the choices list from an OR node.
- Parameters:
node – A dictionary node containing the _or_ keyword.
- Returns:
The list of choices, or an empty list if _or_ is not present.
Examples
>>> extract_or_choices({"_or_": ["A", "B", "C"]})
["A", "B", "C"]
>>> extract_or_choices({"class": "MyClass"})
[]
- nirs4all.pipeline.config.generator.extract_range_spec(node: Dict[str, Any]) -> Any
Extract the range specification from a range node.
- Parameters:
node – A dictionary node containing the _range_ keyword.
- Returns:
The range specification (list or dict), or None if not present.
Examples
>>> extract_range_spec({"_range_": [1, 10, 2]})
[1, 10, 2]
>>> extract_range_spec({"_range_": {"from": 1, "to": 10}})
{"from": 1, "to": 10}
- nirs4all.pipeline.config.generator.extract_tags(node: Dict[str, Any]) -> list
Extract tags from a node.
- Parameters:
node – A dictionary node.
- Returns:
List of tags, or empty list if no tags.
Examples
>>> extract_tags({"_tags_": ["baseline", "v2"]})
["baseline", "v2"]
- nirs4all.pipeline.config.generator.format_config_table(configs: List[Dict[str, Any]], columns: List[str] | None = None, max_rows: int = 20) -> str
Format configurations as an ASCII table.
- Parameters:
configs – List of configuration dicts.
columns – Specific columns to show (None for auto-detect).
max_rows – Maximum rows to display.
- Returns:
Formatted ASCII table string.
- nirs4all.pipeline.config.generator.get_expansion_tree(spec: Any, key: str = 'root') -> ExpansionTreeNode
Build an expansion tree for a specification.
- Parameters:
spec – Specification to analyze.
key – Key name for this node.
- Returns:
ExpansionTreeNode representing the configuration space.
Examples
>>> spec = {"x": {"_or_": [1, 2]}, "y": {"_range_": [1, 3]}}
>>> tree = get_expansion_tree(spec)
>>> tree.count
6  # 2 x 3
- nirs4all.pipeline.config.generator.get_preset(name: str) -> Any
Retrieve a preset specification by name.
- Parameters:
name – Name of the preset.
- Returns:
Deep copy of the preset specification.
- Raises:
KeyError – If preset doesn’t exist.
- nirs4all.pipeline.config.generator.get_preset_info(name: str) -> Dict[str, Any]
Get full preset info including metadata.
- Parameters:
name – Name of the preset.
- Returns:
Dict with spec, description, tags.
- Raises:
KeyError – If preset doesn’t exist.
- nirs4all.pipeline.config.generator.get_strategy(node: Dict[str, Any]) -> ExpansionStrategy | None
Find the appropriate strategy for a node.
Iterates through registered strategies (in priority order) and returns the first one that can handle the node.
- Parameters:
node – A dictionary node from the configuration.
- Returns:
An ExpansionStrategy instance if one handles the node, None otherwise.
Examples
>>> node = {"_or_": ["A", "B"]}
>>> strategy = get_strategy(node)
>>> strategy
OrStrategy(priority=10)
>>> result = strategy.expand(node)
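Priority-ordered dispatch of this kind can be sketched as a sorted list of classes that is scanned until one accepts the node. Everything below is hypothetical: the class names, the priority values, and the assumption that a higher priority is tried first are illustrative choices, not facts about the nirs4all registry.

```python
class OrSketch:
    """Toy strategy that claims any node containing `_or_`."""
    priority = 10

    @classmethod
    def handles(cls, node):
        return "_or_" in node


class RangeSketch:
    """Toy strategy that claims any node containing `_range_`."""
    priority = 5

    @classmethod
    def handles(cls, node):
        return "_range_" in node


# Assumption: higher priority values are consulted first.
REGISTRY = sorted([RangeSketch, OrSketch], key=lambda s: s.priority, reverse=True)


def find_strategy(node):
    """Return an instance of the first strategy that handles the node, else None."""
    for strategy_cls in REGISTRY:
        if strategy_cls.handles(node):
            return strategy_cls()
    return None


print(type(find_strategy({"_or_": ["A", "B"]})).__name__)  # OrSketch
print(find_strategy({"class": "MyClass"}))                 # None
```

Returning None for unmatched nodes lets the caller fall back to plain dict expansion instead of raising.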
- nirs4all.pipeline.config.generator.has_cartesian_keyword(node: Dict[str, Any]) -> bool
Check if a node contains the _cartesian_ keyword.
- nirs4all.pipeline.config.generator.has_chain_keyword(node: Dict[str, Any]) -> bool
Check if a node contains the _chain_ keyword.
- nirs4all.pipeline.config.generator.has_grid_keyword(node: Dict[str, Any]) -> bool
Check if a node contains the _grid_ keyword.
- nirs4all.pipeline.config.generator.has_log_range_keyword(node: Dict[str, Any]) -> bool
Check if a node contains the _log_range_ keyword.
- nirs4all.pipeline.config.generator.has_or_keyword(node: Dict[str, Any]) -> bool
Check if a node contains the _or_ keyword.
- Parameters:
node – A dictionary node from the configuration.
- Returns:
True if the node contains _or_, False otherwise.
- nirs4all.pipeline.config.generator.has_preset(name: str) -> bool
Check if a preset exists.
- Parameters:
name – Name to check.
- Returns:
True if preset exists.
- nirs4all.pipeline.config.generator.has_range_keyword(node: Dict[str, Any]) -> bool
Check if a node contains the _range_ keyword.
- Parameters:
node – A dictionary node from the configuration.
- Returns:
True if the node contains _range_, False otherwise.
- nirs4all.pipeline.config.generator.has_sample_keyword(node: Dict[str, Any]) -> bool
Check if a node contains the _sample_ keyword.
- nirs4all.pipeline.config.generator.has_zip_keyword(node: Dict[str, Any]) -> bool
Check if a node contains the _zip_ keyword.
- nirs4all.pipeline.config.generator.import_presets(presets: Dict[str, Any], overwrite: bool = False) -> int
Import presets from a dict.
- Parameters:
presets – Dict mapping preset names to info dicts or specs.
overwrite – If True, overwrite existing presets.
- Returns:
Number of presets imported.
- nirs4all.pipeline.config.generator.is_generator_node(node: Dict[str, Any]) -> bool
Check if a dict node contains any generator keywords.
- Parameters:
node – A dictionary node from the configuration.
- Returns:
True if the node contains any generation keywords (_or_, _range_, etc.), False otherwise.
Examples
>>> is_generator_node({"_or_": ["A", "B"]})
True
>>> is_generator_node({"class": "MyClass"})
False
>>> is_generator_node({"_range_": [1, 10]})
True
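A check of this kind can be pictured as a set intersection between the node's keys and the known generation keywords. The sketch below is not the library's implementation: `GENERATOR_KEYWORDS` and `looks_like_generator_node` are hypothetical names, and the keyword set is only assumed from the keywords listed in the module overview.

```python
from typing import Any, Dict

# Assumed keyword set, taken from the module overview; the real list may differ.
GENERATOR_KEYWORDS = frozenset({
    "_or_", "_range_", "_log_range_", "_grid_", "_zip_",
    "_chain_", "_sample_", "_cartesian_",
})


def looks_like_generator_node(node: Dict[str, Any]) -> bool:
    """Return True if any key of the dict is a generation keyword."""
    # isdisjoint iterates over the dict's keys, so no explicit loop is needed.
    return isinstance(node, dict) and not GENERATOR_KEYWORDS.isdisjoint(node)


print(looks_like_generator_node({"_or_": ["A", "B"]}))
print(looks_like_generator_node({"class": "MyClass"}))
```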
- nirs4all.pipeline.config.generator.is_preset_reference(node: Any) → bool
Check if a node is a preset reference.
- Parameters:
node – Node to check.
- Returns:
True if node is a dict with _preset_ key.
- nirs4all.pipeline.config.generator.is_pure_cartesian_node(node: Dict[str, Any]) → bool
Check if a node is a pure cartesian node.
- nirs4all.pipeline.config.generator.is_pure_chain_node(node: Dict[str, Any]) → bool
Check if a node is a pure chain node.
- nirs4all.pipeline.config.generator.is_pure_grid_node(node: Dict[str, Any]) → bool
Check if a node is a pure grid node.
- nirs4all.pipeline.config.generator.is_pure_log_range_node(node: Dict[str, Any]) → bool
Check if a node is a pure log range node.
- nirs4all.pipeline.config.generator.is_pure_or_node(node: Dict[str, Any]) → bool
Check if a node is a pure OR node (only _or_, size, count keys).
- Parameters:
node – A dictionary node from the configuration.
- Returns:
True if the node contains only OR-related keys, False otherwise.
Examples
>>> is_pure_or_node({"_or_": ["A", "B"], "size": 2})
True
>>> is_pure_or_node({"_or_": ["A", "B"], "class": "X"})
False
- nirs4all.pipeline.config.generator.is_pure_range_node(node: Dict[str, Any]) → bool
Check if a node is a pure range node (only _range_, count keys).
- Parameters:
node – A dictionary node from the configuration.
- Returns:
True if the node contains only range-related keys, False otherwise.
Examples
>>> is_pure_range_node({"_range_": [1, 10]})
True
>>> is_pure_range_node({"_range_": [1, 10], "count": 5})
True
>>> is_pure_range_node({"_range_": [1, 10], "size": 2})
False
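The "pure node" checks can be read as a subset test: the keyword must be present and no key may fall outside an allowed set. The following is a minimal sketch under that reading. The helper `is_pure` and the two constants are hypothetical, and the allowed sets are taken only from the docstrings above (the real functions may accept further keys).

```python
from typing import Any, Dict, FrozenSet

# Allowed key sets as stated in the docstrings above (assumption: nothing else).
PURE_OR_KEYS: FrozenSet[str] = frozenset({"_or_", "size", "count"})
PURE_RANGE_KEYS: FrozenSet[str] = frozenset({"_range_", "count"})


def is_pure(node: Dict[str, Any], keyword: str, allowed: FrozenSet[str]) -> bool:
    """True if `keyword` is present and every key of the node is in `allowed`."""
    return keyword in node and set(node) <= allowed


print(is_pure({"_or_": ["A", "B"], "size": 2}, "_or_", PURE_OR_KEYS))
print(is_pure({"_range_": [1, 10], "size": 2}, "_range_", PURE_RANGE_KEYS))
```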
- nirs4all.pipeline.config.generator.is_pure_sample_node(node: Dict[str, Any]) → bool
Check if a node is a pure sample node.
- nirs4all.pipeline.config.generator.is_pure_zip_node(node: Dict[str, Any]) → bool
Check if a node is a pure zip node.
- nirs4all.pipeline.config.generator.iter_with_progress(spec: Dict[str, Any] | List[Any] | str | int | float | bool | None, seed: int | None = None, report_every: int = 1000) → Iterator[tuple]
Iterate with progress reporting.
- Parameters:
spec – Specification to expand.
seed – Random seed.
report_every – Report progress every N items.
- Yields:
Tuples of (index, config).
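The yielded `(index, config)` pairs behave much like `enumerate` with a periodic side effect. As a rough sketch only, assuming the expanded configurations are already available as an iterable (the hypothetical `enumerate_with_progress` below does not expand a spec itself, and the progress message format is invented):

```python
from typing import Any, Iterable, Iterator, Tuple


def enumerate_with_progress(
    configs: Iterable[Any], report_every: int = 1000
) -> Iterator[Tuple[int, Any]]:
    """Yield (index, config) pairs, printing a progress line every N items."""
    for index, config in enumerate(configs):
        if report_every > 0 and index > 0 and index % report_every == 0:
            print(f"Generated {index} configurations...")
        yield index, config


# Items are produced lazily, so the loop can stop early without
# materialising the whole list.
for index, config in enumerate_with_progress(["a", "b", "c"], report_every=2):
    print(index, config)
```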
- nirs4all.pipeline.config.generator.list_presets(tags: List[str] | None = None) → List[str]
List all registered preset names.
- Parameters:
tags – If provided, filter to presets with any of these tags.
- Returns:
List of preset names.
- nirs4all.pipeline.config.generator.parse_constraints(node: Dict[str, Any]) → Dict[str, List[List[Any]]]
Extract constraint specifications from a node.
- Parameters:
node – Node containing constraint keywords.
- Returns:
Dict with ‘mutex’, ‘requires’, ‘exclude’ lists.
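To make the return shape concrete, here is a small sketch that collects the three constraint keywords and applies one of them. Everything in it is an assumption for illustration: `extract_constraints` and `violates_mutex` are hypothetical helpers, and the sketch assumes that a mutex group means its members may not all appear in the same combination.

```python
from itertools import combinations
from typing import Any, Dict, List, Sequence


def extract_constraints(node: Dict[str, Any]) -> Dict[str, List[List[Any]]]:
    """Collect _mutex_, _requires_ and _exclude_ values under unprefixed names."""
    return {
        "mutex": list(node.get("_mutex_", [])),
        "requires": list(node.get("_requires_", [])),
        "exclude": list(node.get("_exclude_", [])),
    }


def violates_mutex(combo: Sequence[Any], mutex_groups: List[List[Any]]) -> bool:
    """Assumed semantics: a combo is invalid if it contains a whole mutex group."""
    return any(set(group) <= set(combo) for group in mutex_groups)


node = {"_or_": ["A", "B", "C"], "pick": 2, "_mutex_": [["A", "B"]]}
constraints = extract_constraints(node)
kept = [
    list(combo)
    for combo in combinations(node["_or_"], node["pick"])
    if not violates_mutex(combo, constraints["mutex"])
]
print(kept)
```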
- nirs4all.pipeline.config.generator.print_expansion_tree(spec: Any, indent: str = ' ', show_counts: bool = True, max_depth: int | None = None) → str
Format expansion tree as a printable string.
- Parameters:
spec – Specification to visualize.
indent – Indentation string.
show_counts – Whether to show counts in output.
max_depth – Maximum depth to display.
- Returns:
Formatted tree string.
Examples
>>> spec = {"x": {"_or_": [1, 2]}, "y": {"_range_": [1, 3]}}
>>> print(print_expansion_tree(spec))
root (6 variants)
├── x: _or_ (2 variants)
│   ├── [0]: scalar
│   └── [1]: scalar
└── y: _range_ (3 variants)
- nirs4all.pipeline.config.generator.register_builtin_presets() → None
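The counts in the tree follow from simple arithmetic: alternatives under `_or_` add up, and independent keys multiply, so 2 × 3 gives 6 at the root. The sketch below reproduces that arithmetic for a small subset of the spec language. It is not the library's `count_combinations`: the helper name `count_variants` is hypothetical, `_range_` is assumed to be an inclusive integer `[start, end]` with step 1 (consistent with the 3 variants shown for `[1, 3]`), and modifiers such as `pick`, `arrange`, and `count` are ignored.

```python
from math import prod
from typing import Any


def count_variants(spec: Any) -> int:
    """Count variants for a tiny subset of the spec language (assumed semantics)."""
    if isinstance(spec, dict):
        if "_or_" in spec:
            # Each alternative contributes its own variants.
            return sum(count_variants(choice) for choice in spec["_or_"])
        if "_range_" in spec:
            # Assumption: inclusive integer range [start, end] with step 1.
            start, end = spec["_range_"][:2]
            return max(0, end - start + 1)
        # Plain dict: independent keys multiply.
        return prod(count_variants(value) for value in spec.values())
    if isinstance(spec, list):
        return prod(count_variants(item) for item in spec)
    return 1  # a scalar has exactly one variant


spec = {"x": {"_or_": [1, 2]}, "y": {"_range_": [1, 3]}}
print(count_variants(spec))
```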
Register built-in preset configurations.
These are common patterns that users might want to use.
- nirs4all.pipeline.config.generator.register_preset(name: str, spec: Any, description: str | None = None, tags: List[str] | None = None, overwrite: bool = False) → None
Register a named preset configuration.
- Parameters:
name – Unique name for the preset.
spec – Configuration specification (dict, list, or scalar).
description – Optional human-readable description.
tags – Optional list of tags for categorization.
overwrite – If True, overwrite existing preset with same name.
- Raises:
ValueError – If preset name already exists and overwrite=False.
Examples
>>> register_preset("my_models", {"_or_": ["PLS", "RF"]})
>>> register_preset("my_models", {"_or_": ["SVM"]}, overwrite=True)
- nirs4all.pipeline.config.generator.register_strategy(strategy_cls: Type[ExpansionStrategy], priority: int | None = None) → Type[ExpansionStrategy]
Register a strategy class.
Can be used as a decorator or called directly.
- Parameters:
strategy_cls – The strategy class to register.
priority – Optional priority override. If None, uses class priority.
- Returns:
The strategy class (for decorator usage).
Examples
>>> @register_strategy
... class MyStrategy(ExpansionStrategy):
...     priority = 10
...     ...

>>> register_strategy(MyStrategy, priority=5)
- nirs4all.pipeline.config.generator.resolve_preset(node: Dict[str, Any]) → Any
Resolve a single preset reference.
- Parameters:
node – Dict containing _preset_ key.
- Returns:
Resolved preset specification.
- Raises:
KeyError – If referenced preset doesn’t exist.
ValueError – If _preset_ value is not a string.
- nirs4all.pipeline.config.generator.resolve_presets_recursive(node: Any, resolved: Set[str] | None = None) → Any
Recursively resolve all preset references in a configuration.
Handles circular reference detection.
- Parameters:
node – Configuration node (dict, list, or scalar).
resolved – Set of already-resolved presets (for cycle detection).
- Returns:
Node with all preset references resolved.
- Raises:
ValueError – If circular preset reference detected.
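Cycle detection in a recursive resolver usually means tracking which presets are currently being expanded and failing when one reappears. The sketch below shows that idea with a hypothetical `PRESETS` dict and `resolve` function. Unlike the signature above, it passes an immutable set of in-progress names down the call chain, which is a design choice of this sketch.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical preset table used only for this illustration.
PRESETS: Dict[str, Any] = {
    "models": {"_or_": ["PLS", "RF"]},
    "pipeline": {"model": {"_preset_": "models"}, "scale": True},
}


def resolve(node: Any, active: FrozenSet[str] = frozenset()) -> Any:
    """Replace every {"_preset_": name} with its spec, recursing into the result."""
    if isinstance(node, dict):
        if "_preset_" in node:
            name = node["_preset_"]
            if not isinstance(name, str):
                raise ValueError("_preset_ value must be a string")
            if name in active:
                raise ValueError(f"Circular preset reference: {name}")
            # An unknown name raises KeyError from the lookup below.
            return resolve(PRESETS[name], active | {name})
        return {key: resolve(value, active) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, active) for item in node]
    return node  # scalars are returned unchanged


print(resolve({"_preset_": "pipeline"}))
```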
- nirs4all.pipeline.config.generator.sample_with_seed(population: List[T], k: int, seed: int | None = None, weights: List[float] | None = None) → List[T]
Sample k items from population with optional seed for reproducibility.
This function wraps Python’s random sampling functions to provide deterministic behavior when a seed is specified. The function uses random.sample for unweighted sampling and random.choices for weighted sampling.
- Parameters:
population – List of items to sample from.
k – Number of items to sample.
seed – Optional random seed for reproducibility. If None, uses current random state (non-deterministic).
weights – Optional list of weights for weighted random selection. Must have the same length as population. If None, uniform sampling is used.
- Returns:
List of k sampled items from population.
- Raises:
ValueError – If k is larger than population size (for unweighted sampling).
ValueError – If weights length doesn’t match population length.
Examples
>>> sample_with_seed(["A", "B", "C", "D"], 2, seed=42)
['D', 'A'] # Deterministic result with seed=42
>>> sample_with_seed(["A", "B", "C"], 2, seed=42)
['C', 'A'] # Same seed produces same sequence
>>> sample_with_seed(["A", "B", "C"], 5, seed=42) # k > len(population)
['A', 'B', 'C'] # Returns all items (capped at population size)
- nirs4all.pipeline.config.generator.summarize_configs(configs: List[Any], max_unique: int = 10) → Dict[str, Any]
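The split between `random.sample` (unweighted, without replacement) and `random.choices` (weighted, with replacement) can be sketched as follows. This is an illustration, not the library's code. The name `seeded_sample` is hypothetical. It uses a private `random.Random` instance instead of the global random state, and it lets `random.sample` raise when `k` exceeds the population size instead of capping `k`.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def seeded_sample(
    population: List[T],
    k: int,
    seed: Optional[int] = None,
    weights: Optional[List[float]] = None,
) -> List[T]:
    """Sample k items; a private Random instance leaves global state untouched."""
    rng = random.Random(seed)  # seed=None seeds from system entropy or time
    if weights is not None:
        if len(weights) != len(population):
            raise ValueError("weights must have the same length as population")
        # Weighted sampling draws with replacement.
        return rng.choices(population, weights=weights, k=k)
    # Unweighted sampling draws without replacement; raises ValueError if k is too large.
    return rng.sample(population, k)


first = seeded_sample(["A", "B", "C", "D"], 2, seed=42)
second = seeded_sample(["A", "B", "C", "D"], 2, seed=42)
print(first == second)
```

Because weighted draws are made with replacement, the same item can appear more than once in a weighted result.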
Summarize a list of configurations.
- Parameters:
configs – List of configurations to summarize.
max_unique – Maximum unique values to show per key.
- Returns:
Summary dict with statistics for each key.
Examples
>>> configs = [
...     {"model": "PLS", "n": 5},
...     {"model": "PLS", "n": 10},
...     {"model": "RF", "n": 5}
... ]
>>> summary = summarize_configs(configs)
>>> summary["model"]["unique_values"]
['PLS', 'RF']
- nirs4all.pipeline.config.generator.to_dataframe(configs: List[Any], flatten: bool = True, prefix_sep: str = '.', include_index: bool = True) → Any
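A per-key summary of this kind can be built in a single pass over the configurations. In the sketch below, only the `unique_values` field is taken from the example above. The function name `summarize` and the `count` and `n_unique` fields are hypothetical, and only top-level keys of dict configurations are considered.

```python
from typing import Any, Dict, List


def summarize(configs: List[Dict[str, Any]], max_unique: int = 10) -> Dict[str, Any]:
    """Per key: how many configs define it and which distinct values appear."""
    seen: Dict[str, List[Any]] = {}
    present: Dict[str, int] = {}
    for config in configs:
        for key, value in config.items():
            present[key] = present.get(key, 0) + 1
            values = seen.setdefault(key, [])
            if value not in values:  # keeps first-seen order, works for unhashables
                values.append(value)
    return {
        key: {
            "count": present[key],
            "n_unique": len(values),
            "unique_values": values[:max_unique],  # truncated for display
        }
        for key, values in seen.items()
    }


configs = [
    {"model": "PLS", "n": 5},
    {"model": "PLS", "n": 10},
    {"model": "RF", "n": 5},
]
summary = summarize(configs)
print(summary["model"]["unique_values"])
```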
Convert expanded configurations to a pandas DataFrame.
- Parameters:
configs – List of expanded configurations.
flatten – If True, flatten nested dicts with dot notation.
prefix_sep – Separator for flattened keys (default “.”).
include_index – If True, include a config index column.
- Returns:
pandas DataFrame with one row per configuration.
- Raises:
ImportError – If pandas is not installed.
Examples
>>> configs = [
...     {"model": "PLS", "n_components": 5},
...     {"model": "PLS", "n_components": 10},
...     {"model": "RF", "n_estimators": 100}
... ]
>>> df = to_dataframe(configs)
>>> df.columns.tolist()
['config_index', 'model', 'n_components', 'n_estimators']
- nirs4all.pipeline.config.generator.unregister_preset(name: str) → bool
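The `flatten` and `prefix_sep` parameters describe a dot-notation flattening of nested dicts. The sketch below shows that step in pure Python, with a hypothetical `flatten_config` helper and the `config_index` column name borrowed from the example output above. It stops short of building the DataFrame so that it runs without pandas.

```python
from typing import Any, Dict


def flatten_config(
    config: Dict[str, Any], prefix_sep: str = ".", parent: str = ""
) -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with prefix_sep."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{parent}{prefix_sep}{key}" if parent else str(key)
        if isinstance(value, dict):
            flat.update(flatten_config(value, prefix_sep, name))
        else:
            flat[name] = value
    return flat


nested_configs = [
    {"model": "PLS", "params": {"n_components": 5}},
    {"model": "RF", "params": {"n_estimators": 100}},
]
rows = [
    {"config_index": i, **flatten_config(c)} for i, c in enumerate(nested_configs)
]
print(rows)
```

A list of flat dicts like `rows` can be passed to `pandas.DataFrame(rows)`, where keys missing from a row become missing values in the corresponding column.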
Remove a preset from the registry.
- Parameters:
name – Name of preset to remove.
- Returns:
True if preset was removed, False if it didn’t exist.
- nirs4all.pipeline.config.generator.validate_config(config: Any, schema: Dict[str, Any] | None = None, required_keys: Set[str] | None = None, forbidden_keys: Set[str] | None = None, path: str = 'root') → ValidationResult
Validate an expanded configuration.
This validates configurations after expansion, checking for structural correctness and optionally against a schema.
- Parameters:
config – The expanded configuration to validate.
schema – Optional schema definition for validation.
required_keys – Optional set of keys that must be present.
forbidden_keys – Optional set of keys that must not be present.
path – JSONPath-like location for error reporting.
- Returns:
ValidationResult containing validation outcome.
Examples
>>> config = {"class": "MyClass", "params": {"n": 5}}
>>> result = validate_config(config, required_keys={"class"})
>>> result.is_valid
True
- nirs4all.pipeline.config.generator.validate_constraints(constraints: Dict[str, List[List[Any]]], choices: List[Any]) → List[str]
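The required and forbidden key checks reduce to two set operations on the configuration's keys. The sketch below is a simplified stand-in: `Result` mimics only the `is_valid` attribute of `ValidationResult` with a plain list of error strings, `check_keys` is a hypothetical name, and schema validation is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Set


@dataclass
class Result:
    """Minimal stand-in for ValidationResult; the real class is not reproduced."""
    errors: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.errors


def check_keys(
    config: Any,
    required_keys: Optional[Set[str]] = None,
    forbidden_keys: Optional[Set[str]] = None,
    path: str = "root",
) -> Result:
    """Report missing required keys and present forbidden keys."""
    result = Result()
    if not isinstance(config, dict):
        result.errors.append(f"{path}: expected a dict, got {type(config).__name__}")
        return result
    keys = set(config)
    for key in sorted((required_keys or set()) - keys):
        result.errors.append(f"{path}: missing required key '{key}'")
    for key in sorted((forbidden_keys or set()) & keys):
        result.errors.append(f"{path}: forbidden key '{key}' present")
    return result


config = {"class": "MyClass", "params": {"n": 5}}
print(check_keys(config, required_keys={"class"}).is_valid)
```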
Validate constraint specifications against available choices.
- Parameters:
constraints – Constraint dict from parse_constraints.
choices – Available choice items.
- Returns:
List of validation error messages.
- nirs4all.pipeline.config.generator.validate_expanded_configs(configs: List[Any], schema: Dict[str, Any] | None = None, min_count: int = 0, max_count: int | None = None) → ValidationResult
Validate a list of expanded configurations.
- Parameters:
configs – List of expanded configurations.
schema – Optional schema for each configuration.
min_count – Minimum number of configurations required.
max_count – Maximum number of configurations allowed.
- Returns:
ValidationResult for the entire list.
- nirs4all.pipeline.config.generator.validate_spec(spec: Any, path: str = 'root', strict: bool = False, custom_validators: List[Callable] | None = None) → ValidationResult
Validate a generator specification before expansion.
Recursively validates the structure of a generator specification, checking for valid syntax, consistent keyword usage, and semantic correctness.
- Parameters:
spec – The specification to validate (can be any type).
path – JSONPath-like location for error reporting.
strict – If True, also report warnings as errors.
custom_validators – Optional list of custom validation functions. Each function should accept (node, path) and return ValidationResult.
- Returns:
ValidationResult containing validation outcome.
Examples
>>> result = validate_spec({"_or_": ["A", "B"]})
>>> result.is_valid
True

>>> result = validate_spec({"_or_": "not a list"})
>>> result.is_valid
False
>>> result.errors[0].message
"_or_ must be a list, got str"
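Recursive validation with a JSONPath-like location can be sketched by threading a `path` string through the traversal. The example below checks a single rule, that `_or_` must be a list, and returns plain strings. The function name `find_or_errors` and the exact path format are assumptions, and the real validator covers far more than this.

```python
from typing import Any, List


def find_or_errors(spec: Any, path: str = "root") -> List[str]:
    """Recursively collect errors for _or_ values that are not lists."""
    errors: List[str] = []
    if isinstance(spec, dict):
        for key, value in spec.items():
            if key == "_or_" and not isinstance(value, list):
                errors.append(
                    f"{path}: _or_ must be a list, got {type(value).__name__}"
                )
            # Descend with the key appended to the location.
            errors.extend(find_or_errors(value, f"{path}.{key}"))
    elif isinstance(spec, list):
        for index, item in enumerate(spec):
            errors.extend(find_or_errors(item, f"{path}[{index}]"))
    return errors


print(find_or_errors({"model": {"_or_": "not a list"}}))
```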