# Generator Keywords Reference

This document provides a comprehensive reference for all generator keywords used in nirs4all pipeline configuration expansion.

## Table of Contents

1. [Overview](#overview)
2. [Phase 1-2: Core Keywords](#phase-1-2-core-keywords)
   - [_or_](#_or_)
   - [_range_](#_range_)
   - [size](#size)
   - [pick](#pick)
   - [arrange](#arrange)
   - [then_pick](#then_pick)
   - [then_arrange](#then_arrange)
   - [count](#count)
3. [Phase 3: Advanced Keywords](#phase-3-advanced-keywords)
   - [_log_range_](#_log_range_)
   - [_grid_](#_grid_)
   - [_zip_](#_zip_)
   - [_chain_](#_chain_)
   - [_sample_](#_sample_)
   - [_tags_](#_tags_)
   - [_metadata_](#_metadata_)
4. [Phase 4: Production Keywords](#phase-4-production-keywords)
   - [_cartesian_](#_cartesian_)
   - [_mutex_](#_mutex_)
   - [_requires_](#_requires_)
   - [_depends_on_](#_depends_on_)
   - [_exclude_](#_exclude_)
   - [_preset_](#_preset_)
5. [Modifier Keywords](#modifier-keywords)
   - [_seed_](#_seed_)
   - [_weights_](#_weights_)
6. [API Functions](#api-functions)
7. [Selection Semantics: pick vs arrange](#selection-semantics-pick-vs-arrange)
8. [Common Patterns and Examples](#common-patterns-and-examples)

---

## Overview

The generator module expands pipeline configuration specifications into concrete pipeline variants. It takes a single configuration with combinatorial keywords and generates all possible combinations.
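To make the idea of expansion concrete, here is a minimal sketch that handles only the `_or_` keyword in nested dicts and lists. The function `toy_expand` is a hypothetical helper written for this illustration; it is not part of nirs4all and does not claim to mirror how the library implements expansion.

```python
from itertools import product


def toy_expand(node):
    """Expand a nested spec that uses only `_or_` (toy model, not nirs4all code)."""
    if isinstance(node, dict):
        if set(node) == {"_or_"}:
            # Each alternative may itself contain nested generators.
            return [v for choice in node["_or_"] for v in toy_expand(choice)]
        keys = list(node)
        # Expand every value, then take the product across keys.
        expanded = [toy_expand(node[key]) for key in keys]
        return [dict(zip(keys, combo)) for combo in product(*expanded)]
    if isinstance(node, list):
        expanded = [toy_expand(item) for item in node]
        return [list(combo) for combo in product(*expanded)]
    # Scalars expand to themselves.
    return [node]


spec = {
    "scaler": {"_or_": ["StandardScaler", "MinMaxScaler"]},
    "n_components": {"_or_": [5, 10, 15]},
}
variants = toy_expand(spec)
print(len(variants))  # 6
print(variants[0])    # {'scaler': 'StandardScaler', 'n_components': 5}
```

Two independent `_or_` nodes with 2 and 3 choices multiply to 6 variants, which is the combinatorial behaviour the keywords below build on.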
### Basic Import

```python
from nirs4all.pipeline.config.generator import (
    # Core API
    expand_spec, expand_spec_with_choices, count_combinations,
    # Iterator API
    expand_spec_iter, batch_iter, iter_with_progress,
    # Validation
    validate_spec, validate_config, validate_expanded_configs,
    # Presets
    PRESET_KEYWORD, register_preset, unregister_preset, get_preset,
    get_preset_info, list_presets, clear_presets, has_preset,
    is_preset_reference, resolve_preset, resolve_presets_recursive,
    export_presets, import_presets, register_builtin_presets,
    # Constraints
    apply_mutex_constraint, apply_requires_constraint,
    apply_exclude_constraint, apply_all_constraints,
    parse_constraints, validate_constraints,
    # Export utilities
    to_dataframe, diff_configs, summarize_configs, get_expansion_tree,
    print_expansion_tree, format_config_table, ExpansionTreeNode,
    # Keyword constants
    OR_KEYWORD, RANGE_KEYWORD, LOG_RANGE_KEYWORD, GRID_KEYWORD,
    ZIP_KEYWORD, CHAIN_KEYWORD, SAMPLE_KEYWORD, CARTESIAN_KEYWORD,
    SIZE_KEYWORD, COUNT_KEYWORD, SEED_KEYWORD, WEIGHTS_KEYWORD,
    PICK_KEYWORD, ARRANGE_KEYWORD, THEN_PICK_KEYWORD, THEN_ARRANGE_KEYWORD,
    TAGS_KEYWORD, METADATA_KEYWORD, MUTEX_KEYWORD, REQUIRES_KEYWORD,
    DEPENDS_ON_KEYWORD, EXCLUDE_KEYWORD,
    # Detection functions
    is_generator_node, is_pure_or_node, is_pure_range_node,
    is_pure_log_range_node, is_pure_grid_node, is_pure_zip_node,
    is_pure_chain_node, is_pure_sample_node, is_pure_cartesian_node,
    # Extraction functions
    extract_modifiers, extract_base_node, extract_or_choices,
    extract_range_spec, extract_tags, extract_metadata, extract_constraints,
    # Strategies (advanced usage)
    ExpansionStrategy, get_strategy, register_strategy,
    RangeStrategy, OrStrategy, LogRangeStrategy, GridStrategy,
    ZipStrategy, ChainStrategy, SampleStrategy, CartesianStrategy,
)
```

---

## Phase 1-2: Core Keywords

### `_or_`

Select from a list of alternatives. Each choice becomes a separate configuration variant.
**Syntax:**

```python
{"_or_": [choice1, choice2, ...]}
```

**Examples:**

```python
# Simple string choices
{"_or_": ["StandardScaler", "MinMaxScaler", "RobustScaler"]}
# → ["StandardScaler", "MinMaxScaler", "RobustScaler"]

# Dictionary choices
{"_or_": [
    {"class": "PCA", "n_components": 10},
    {"class": "SVD", "n_components": 10},
]}
# → [{"class": "PCA", "n_components": 10}, {"class": "SVD", "n_components": 10}]

# Mixed types
{"_or_": [None, 5, {"window": 11}]}
# → [None, 5, {"window": 11}]
```

**Modifiers:** `size`, `pick`, `arrange`, `then_pick`, `then_arrange`, `count`

---

### `_range_`

Generate a sequence of numeric values.

**Syntax:**

```python
# Array syntax
{"_range_": [start, end]}        # Inclusive, step=1
{"_range_": [start, end, step]}  # With custom step

# Dict syntax
{"_range_": {"from": start, "to": end, "step": step}}
```

**Examples:**

```python
{"_range_": [1, 5]}
# → [1, 2, 3, 4, 5]

{"_range_": [0, 20, 5]}
# → [0, 5, 10, 15, 20]

{"_range_": {"from": 10, "to": 50, "step": 10}}
# → [10, 20, 30, 40, 50]
```

---

### `size`

**(Legacy)** Select combinations of N items from `_or_` choices. Equivalent to `pick`.

**Syntax:**

```python
{"_or_": [...], "size": n}               # Fixed size
{"_or_": [...], "size": (min, max)}      # Range of sizes
{"_or_": [...], "size": [outer, inner]}  # Second-order (nested)
```

**Examples:**

```python
# Select 2 from 4 items → C(4,2) = 6 combinations
{"_or_": ["A", "B", "C", "D"], "size": 2}
# → [["A", "B"], ["A", "C"], ["A", "D"], ["B", "C"], ["B", "D"], ["C", "D"]]

# Size range
{"_or_": ["A", "B", "C"], "size": (1, 2)}
# → [["A"], ["B"], ["C"], ["A", "B"], ["A", "C"], ["B", "C"]]
```

---

### `pick`

**(Explicit)** Unordered selection - combinations where order doesn't matter.

**Syntax:**

```python
{"_or_": [...], "pick": n}           # Fixed size
{"_or_": [...], "pick": (min, max)}  # Range of sizes
```

**Mathematical formula:** C(n, k) = n! / (k! × (n-k)!)
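The formula can be checked against Python's standard library. The sketch below assumes that `pick` enumerates the same sets as `itertools.combinations`, which is what the outputs listed in this document suggest; it illustrates the counting, not the library's internal code.

```python
from itertools import combinations
from math import comb, factorial

choices = ["A", "B", "C", "D"]
n, k = len(choices), 2

# Enumerate unordered selections of size k.
picked = [list(c) for c in combinations(choices, k)]

# C(n, k) = n! / (k! * (n-k)!)
formula = factorial(n) // (factorial(k) * factorial(n - k))
print(len(picked), formula, comb(n, k))  # 6 6 6

# A size range (min, max) sums the counts for each size in the range.
total_1_to_2 = sum(comb(3, size) for size in range(1, 3))
print(total_1_to_2)  # 6
```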
**Examples:**

```python
# Pick 2 from 3 → C(3,2) = 3
{"_or_": ["A", "B", "C"], "pick": 2}
# → [["A", "B"], ["A", "C"], ["B", "C"]]
```

**Use cases:**

- `concat_transform` where feature order doesn't matter
- `feature_augmentation` for parallel channels
- Any scenario where [A, B] and [B, A] should be treated as equivalent

---

### `arrange`

**(Explicit)** Ordered arrangement - permutations where order matters.

**Syntax:**

```python
{"_or_": [...], "arrange": n}           # Fixed size
{"_or_": [...], "arrange": (min, max)}  # Range of sizes
```

**Mathematical formula:** P(n, k) = n! / (n-k)!

**Examples:**

```python
# Arrange 2 from 3 → P(3,2) = 6
{"_or_": ["A", "B", "C"], "arrange": 2}
# → [["A", "B"], ["A", "C"], ["B", "A"], ["B", "C"], ["C", "A"], ["C", "B"]]
```

**Use cases:**

- Sequential preprocessing pipelines
- Any scenario where order of operations affects results
- When [A, B] and [B, A] should be treated as different configurations

---

### `then_pick`

Second-order operation: apply combinations to the results of a primary selection.

**Syntax:**

```python
{"_or_": [...], "pick": n1, "then_pick": n2}
{"_or_": [...], "arrange": n1, "then_pick": n2}
```

**Example:**

```python
# Pick 2, then pick 2 from those 3 results
{"_or_": ["A", "B", "C"], "pick": 2, "then_pick": 2}
# Step 1: pick=2 → C(3,2) = 3 combos: [A,B], [A,C], [B,C]
# Step 2: then_pick=2 → C(3,2) = 3 selections of those combos
```

---

### `then_arrange`

Second-order operation: apply permutations to the results of a primary selection.

**Syntax:**

```python
{"_or_": [...], "pick": n1, "then_arrange": n2}
{"_or_": [...], "arrange": n1, "then_arrange": n2}
```

**Example:**

```python
# Pick 2, then arrange 2 from those results
{"_or_": ["A", "B", "C"], "pick": 2, "then_arrange": 2}
# Step 1: pick=2 → 3 combos: [A,B], [A,C], [B,C]
# Step 2: then_arrange=2 → P(3,2) = 6 arrangements
```

---

### `count`

Limit the number of results returned. With a seed, results are deterministic.
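The role of the seed can be pictured with a small stand-in. The helper `toy_count` below is hypothetical and assumes that `count` draws a random subset of the expanded results; the sampling algorithm nirs4all actually uses may differ, so the sketch shows only why a fixed seed makes the selection repeatable.

```python
import random


def toy_count(results, count, seed=None):
    """Return at most `count` items, chosen reproducibly when a seed is given."""
    if count >= len(results):
        # Nothing to trim: keep every result.
        return list(results)
    # A dedicated Random instance avoids touching the global random state.
    return random.Random(seed).sample(results, count)


items = ["A", "B", "C", "D", "E"]
first = toy_count(items, 2, seed=42)
second = toy_count(items, 2, seed=42)
print(len(first))       # 2
print(first == second)  # True
```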
**Syntax:**

```python
{"_or_": [...], "count": n}
{"_or_": [...], "size": k, "count": n}
```

**Example:**

```python
# Get 2 random items from 5
{"_or_": ["A", "B", "C", "D", "E"], "count": 2}
# → 2 randomly selected items

# With seed for reproducibility
expand_spec({"_or_": ["A", "B", "C", "D", "E"], "count": 2}, seed=42)
# → Same 2 items every time with seed=42
```

---

## Phase 3: Advanced Keywords

### `_log_range_`

Generate logarithmically-spaced numeric sequences. Useful for hyperparameter optimization over values spanning multiple orders of magnitude.

**Syntax:**

```python
# Array syntax: [from, to, num_values]
{"_log_range_": [start, end, num]}

# Dict syntax
{"_log_range_": {"from": start, "to": end, "num": n}}
{"_log_range_": {"from": start, "to": end, "base": b}}  # Custom base
```

**Examples:**

```python
# 4 values from 0.001 to 1 (base 10)
{"_log_range_": [0.001, 1, 4]}
# → [0.001, 0.01, 0.1, 1.0]

# Learning rate search: 5 values from 0.0001 to 0.1
{"_log_range_": [0.0001, 0.1, 5]}
# → [0.0001, 0.00056, 0.0032, 0.018, 0.1] (approximately)

# Base 2 powers
{"_log_range_": {"from": 1, "to": 256, "num": 9, "base": 2}}
# → [1, 2, 4, 8, 16, 32, 64, 128, 256]
```

---

### `_grid_`

Generate Cartesian product of parameter spaces. Similar to sklearn's `ParameterGrid`.

**Syntax:**

```python
{"_grid_": {"param1": [v1, v2, ...], "param2": [v3, v4, ...]}}
```

**Examples:**

```python
{"_grid_": {"learning_rate": [0.01, 0.1], "batch_size": [16, 32, 64]}}
# → 2 × 3 = 6 configurations:
# [{"learning_rate": 0.01, "batch_size": 16},
#  {"learning_rate": 0.01, "batch_size": 32},
#  {"learning_rate": 0.01, "batch_size": 64},
#  {"learning_rate": 0.1, "batch_size": 16},
#  {"learning_rate": 0.1, "batch_size": 32},
#  {"learning_rate": 0.1, "batch_size": 64}]
```

---

### `_zip_`

Parallel iteration - pair values at the same index (like Python's `zip`).
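In plain Python, the position-wise pairing can be sketched as follows. This only illustrates the semantics described above; how nirs4all treats lists of unequal length is not stated in this document, whereas the built-in `zip` used here silently stops at the shortest list.

```python
from itertools import product

params = {"x": [1, 2, 3], "y": ["A", "B", "C"]}
keys = list(params)

# Position-wise pairing: one configuration per index.
zipped = [dict(zip(keys, values)) for values in zip(*params.values())]
print(zipped)
# [{'x': 1, 'y': 'A'}, {'x': 2, 'y': 'B'}, {'x': 3, 'y': 'C'}]

# Full cross product for comparison: len(x) * len(y) configurations.
crossed = [dict(zip(keys, values)) for values in product(*params.values())]
print(len(zipped), len(crossed))  # 3 9
```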
**Syntax:**

```python
{"_zip_": {"param1": [v1, v2, ...], "param2": [v3, v4, ...]}}
```

**Examples:**

```python
{"_zip_": {"x": [1, 2, 3], "y": ["A", "B", "C"]}}
# → 3 configurations (paired by position):
# [{"x": 1, "y": "A"}, {"x": 2, "y": "B"}, {"x": 3, "y": "C"}]
```

**Comparison with `_grid_`:**

```python
# _zip_ pairs by position
{"_zip_": {"x": [1, 2], "y": ["A", "B"]}}
# → [{"x": 1, "y": "A"}, {"x": 2, "y": "B"}]

# _grid_ generates all combinations
{"_grid_": {"x": [1, 2], "y": ["A", "B"]}}
# → [{"x": 1, "y": "A"}, {"x": 1, "y": "B"}, {"x": 2, "y": "A"}, {"x": 2, "y": "B"}]
```

---

### `_chain_`

Sequential ordered choices. Preserves order (unlike `_or_` which may be randomized).

**Syntax:**

```python
{"_chain_": [config1, config2, config3, ...]}
```

**Examples:**

```python
{"_chain_": [
    {"model": "baseline", "complexity": "low"},
    {"model": "improved", "complexity": "medium"},
    {"model": "best", "complexity": "high"}
]}
# → Configurations in that exact order
```

**Use cases:**

- Progressive experiments: baseline → improved → best
- When configuration order has meaning

---

### `_sample_`

Statistical sampling from various distributions.
**Syntax:**

```python
{"_sample_": {"distribution": "uniform|log_uniform|normal|choice", ...}}
```

**Distributions:**

| Distribution | Parameters | Description |
|-------------|------------|-------------|
| `uniform` | `from`, `to`, `num` | Uniform distribution between from and to |
| `log_uniform` | `from`, `to`, `num` | Log-uniform (common for learning rates) |
| `normal`/`gaussian` | `mean`, `std`, `num` | Normal distribution |
| `choice` | `values`, `num` | Random selection from list |

**Examples:**

```python
# Uniform sampling
{"_sample_": {"distribution": "uniform", "from": 0.1, "to": 1.0, "num": 5}}
# → 5 random values uniformly distributed between 0.1 and 1.0

# Log-uniform (learning rate search)
{"_sample_": {"distribution": "log_uniform", "from": 0.0001, "to": 0.1, "num": 5}}
# → 5 values with log-uniform distribution

# Normal distribution
{"_sample_": {"distribution": "normal", "mean": 0, "std": 1, "num": 5}}
# → 5 values from standard normal distribution

# Random choice
{"_sample_": {"distribution": "choice", "values": ["A", "B", "C", "D"], "num": 3}}
# → 3 randomly selected values (with replacement)
```

---

### `_tags_`

Add tags to configurations for filtering and categorization.

**Syntax:**

```python
{"_or_": [...], "_tags_": ["tag1", "tag2"]}
```

---

### `_metadata_`

Attach arbitrary metadata to configurations.

**Syntax:**

```python
{"_or_": [...], "_metadata_": {"key": "value", ...}}
```

---

## Phase 4: Production Keywords

### `_cartesian_`

Generate the Cartesian product of multiple stages (each with `_or_` choices), then apply pick/arrange selection on the resulting complete pipelines. This is the key pattern for preprocessing pipeline generation.
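The two steps (build every complete pipeline, then select among them) can be sketched with `itertools`. This is an illustration under the assumption that the stage product behaves like `itertools.product` and that `pick` behaves like `itertools.combinations`; it is not the library's implementation.

```python
from itertools import combinations, product
from math import comb

# Three stages with three choices each; None means "skip this stage".
stages = [
    ["MSC", "SNV", "EMSC"],
    ["SavGol", "Gaussian", None],
    [None, "Deriv1", "Deriv2"],
]

# Step 1: every complete pipeline, one choice per stage, in stage order.
pipelines = [list(p) for p in product(*stages)]
print(len(pipelines))  # 27
print(pipelines[0])    # ['MSC', 'SavGol', None]

# Step 2: unordered selection of 2 complete pipelines.
pairs = list(combinations(pipelines, 2))
print(len(pairs), comb(27, 2))  # 351 351
```

The count grows quickly (27 pipelines already give 351 pairs), which is why combining `_cartesian_` with `count` is often practical.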
**Syntax:**

```python
{"_cartesian_": [stage1, stage2, ...]}
{"_cartesian_": [stage1, stage2, ...], "pick": N}
{"_cartesian_": [stage1, stage2, ...], "arrange": N}
```

**Examples:**

```python
# Generate all pipeline combinations (3×3×3 = 27), then pick 2
{"_cartesian_": [
    {"_or_": ["MSC", "SNV", "EMSC"]},
    {"_or_": ["SavGol", "Gaussian", None]},
    {"_or_": [None, "Deriv1", "Deriv2"]}
], "pick": 2}
# → All 2-combinations of the 27 complete pipelines

# Pick 1-3 complete pipelines with count limit
{"_cartesian_": [
    {"_or_": ["A", "B"]},
    {"_or_": ["X", "Y"]}
], "pick": (1, 3), "count": 20}
```

**Difference from `_grid_`:**

- `_grid_` produces dicts (parameter combinations)
- `_cartesian_` produces lists (ordered stages), ideal for preprocessing pipelines

**Use cases:**

- Preprocessing pipeline generation
- Any staged pipeline where order matters
- When you want to select from complete pipeline variants

---

### `_mutex_`

Mutual exclusion constraint - certain items cannot appear together.

**Syntax:**

```python
{"_or_": [...], "pick": n, "_mutex_": [[item1, item2], [item3, item4]]}
```

**Example:**

```python
# A and B cannot appear together
{"_or_": ["A", "B", "C", "D"], "pick": 2, "_mutex_": [["A", "B"]]}
# All combinations: [A,B], [A,C], [A,D], [B,C], [B,D], [C,D]
# After _mutex_: [A,C], [A,D], [B,C], [B,D], [C,D] (A,B excluded)
```

---

### `_requires_`

Dependency constraint - if item A is selected, item B must also be selected.

**Syntax:**

```python
{"_or_": [...], "pick": n, "_requires_": [[trigger, required1, required2]]}
```

**Example:**

```python
# If A is selected, C must also be selected
{"_or_": ["A", "B", "C", "D"], "pick": 2, "_requires_": [["A", "C"]]}
# Valid: [A,C], [B,C], [B,D], [C,D]
# Invalid: [A,B], [A,D] (A without C)
```

---

### `_depends_on_`

Conditional expansion - expansion depends on the value of another parameter.
**Syntax:**

```python
{"_or_": [...], "_depends_on_": "other_param"}
```

**Use cases:**

- Conditional hyperparameter spaces
- Parameters that only apply when another parameter has a certain value

---

### `_exclude_`

Exclude specific combinations from results.

**Syntax:**

```python
{"_or_": [...], "pick": n, "_exclude_": [[combo1], [combo2]]}
```

**Example:**

```python
# Exclude specific combinations [A,C] and [B,D]
{"_or_": ["A", "B", "C", "D"], "pick": 2, "_exclude_": [["A", "C"], ["B", "D"]]}
# Remaining: [A,B], [A,D], [B,C], [C,D]
```

---

### `_preset_`

Reference a named preset configuration.

**Syntax:**

```python
{"_preset_": "preset_name"}
```

**Usage:**

```python
from nirs4all.pipeline.config.generator import (
    expand_spec,
    register_preset,
    resolve_presets_recursive,
)

# Register presets
register_preset(
    "spectral_transforms",
    {"_or_": ["SNV", "MSC", "Detrend"], "pick": (1, 2)},
    description="Common spectral preprocessing"
)
register_preset(
    "pls_components",
    {"_range_": [2, 15]}
)

# Use in configuration
config = {
    "transforms": {"_preset_": "spectral_transforms"},
    "model": {
        "class": "PLSRegression",
        "n_components": {"_preset_": "pls_components"}
    }
}

# Resolve presets before expansion
resolved = resolve_presets_recursive(config)
results = expand_spec(resolved)
```

---

## Modifier Keywords

### `_seed_`

Provide a deterministic seed for random operations within a node. This ensures reproducible generation when using `count` or random sampling.

**Syntax:**

```python
{"_or_": [...], "count": N, "_seed_": 42}
{"_sample_": {...}, "_seed_": 42}
```

**Examples:**

```python
# Reproducible random selection
{"_or_": ["A", "B", "C", "D", "E"], "count": 2, "_seed_": 42}
# → Same 2 items every time

# Reproducible sampling
{"_sample_": {"distribution": "uniform", "from": 0, "to": 1, "num": 5}, "_seed_": 123}
# → Same 5 values every time
```

---

### `_weights_`

Provide weights for weighted random selection when using `count`.
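As a rough picture of what proportional weights mean for a single draw, the sketch below normalises a weight list and checks the result empirically with the standard library. Whether nirs4all draws with or without replacement under `count` is not stated here, so this shows per-draw proportions only and is not a model of the library's sampler.

```python
import random
from collections import Counter

choices = ["A", "B", "C", "D"]
weights = [3, 1, 1, 1]

# Normalised selection probability of each choice for one draw.
probs = {c: w / sum(weights) for c, w in zip(choices, weights)}
print(probs["A"])  # 0.5

# Empirical check with many independent weighted draws.
rng = random.Random(0)
draws = rng.choices(choices, weights=weights, k=10_000)
freq = Counter(draws)
print(freq["A"] / len(draws))  # close to 0.5
```

With weights `[3, 1, 1, 1]`, "A" accounts for 3 of 6 weight units, so it is three times as likely as each other choice on any single draw.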
**Syntax:**

```python
{"_or_": [...], "count": N, "_weights_": [w1, w2, ...]}
```

**Examples:**

```python
# Weighted random selection (A is 3x more likely than others)
{"_or_": ["A", "B", "C", "D"], "count": 2, "_weights_": [3, 1, 1, 1]}
```

---

## API Functions

### Core Functions

```python
# Expand a specification to all variants
results = expand_spec(spec, seed=None)

# Expand with choice tracking (returns configs and choice paths)
results, choices = expand_spec_with_choices(spec, seed=None)

# Count variants without generating
count = count_combinations(spec)
```

### Iterator Functions

```python
# Lazy iteration for large spaces
for config in expand_spec_iter(spec, seed=None):
    process(config)

# With sampling (uses reservoir sampling for uniform distribution)
configs = list(expand_spec_iter(spec, seed=42, sample_size=100))

# Batch processing
for batch in batch_iter(spec, batch_size=10):
    process_batch(batch)

# With progress reporting
for i, config in iter_with_progress(spec, report_every=1000):
    process(config)
```

### Preset Functions

```python
# Register a preset
register_preset(name, spec, description=None, tags=None, overwrite=False)

# Get preset specification
spec = get_preset(name)

# Get preset info (spec, description, tags)
info = get_preset_info(name)

# List and manage presets
names = list_presets(tags=None)  # Filter by tags optionally
has_preset(name)
unregister_preset(name)
clear_presets()

# Resolve presets in a config (handles circular reference detection)
resolved = resolve_presets_recursive(config)

# Check if a node is a preset reference
is_preset_reference(node)

# Export/import presets
presets_dict = export_presets()
count = import_presets(presets_dict, overwrite=False)

# Register built-in presets (standard_scalers, pls_components, learning_rates)
register_builtin_presets()
```

### Constraint Functions

```python
# Apply individual constraints
filtered = apply_mutex_constraint(results, mutex_groups)
filtered = apply_requires_constraint(results, requires_groups)
filtered = apply_exclude_constraint(results, exclude_combos)

# Apply all constraints at once
filtered = apply_all_constraints(results, mutex_groups, requires_groups, exclude_combos)

# Parse and validate constraints
parsed = parse_constraints(constraint_spec)
errors = validate_constraints(constraint_spec)
```

### Export Functions

```python
# Convert to pandas DataFrame
df = to_dataframe(configs, flatten=True, prefix_sep=".", include_index=True)

# Compare configurations
diff = diff_configs(config1, config2)

# Summary statistics
summary = summarize_configs(configs, max_unique=10)

# Tree visualization
tree_str = print_expansion_tree(spec, indent=" ", show_counts=True, max_depth=None)
tree_node = get_expansion_tree(spec)

# ASCII table formatting
table_str = format_config_table(configs, columns=None, max_rows=20)
```

### Validation Functions

```python
# Validate a specification
result = validate_spec(spec)
if not result.is_valid:
    print(result.errors)

# Validate a config dict
result = validate_config(config, schema=None)

# Validate expanded configs
results = validate_expanded_configs(configs, schema=None)
```

### Detection Functions

```python
# Check if a node contains any generator keywords
is_generator_node(node)  # True if has _or_, _range_, etc.

# Check for specific node types
is_pure_or_node(node)     # Only OR-related keys
is_pure_range_node(node)  # Only range-related keys
is_pure_log_range_node(node)
is_pure_grid_node(node)
is_pure_zip_node(node)
is_pure_chain_node(node)
is_pure_sample_node(node)
is_pure_cartesian_node(node)

# Check for specific keywords
has_or_keyword(node)
has_range_keyword(node)
has_log_range_keyword(node)
has_grid_keyword(node)
has_zip_keyword(node)
has_chain_keyword(node)
has_sample_keyword(node)
has_cartesian_keyword(node)
```

### Extraction Functions

```python
# Extract modifiers (size, count, pick, arrange, etc.)
modifiers = extract_modifiers(node)

# Extract non-keyword keys
base = extract_base_node(node)

# Extract specific elements
choices = extract_or_choices(node)       # From _or_ node
range_spec = extract_range_spec(node)    # From _range_ node
tags = extract_tags(node)                # From _tags_
metadata = extract_metadata(node)        # From _metadata_
constraints = extract_constraints(node)  # From _mutex_, _requires_, etc.
```

---

## Selection Semantics: pick vs arrange

| Aspect | `pick` (Combinations) | `arrange` (Permutations) |
|--------|----------------------|--------------------------|
| Order matters? | No | Yes |
| [A, B] vs [B, A] | Same | Different |
| Formula | C(n,k) = n!/(k!(n-k)!) | P(n,k) = n!/(n-k)! |
| Count for 3 choose 2 | 3 | 6 |
| Use case | Feature sets | Processing pipelines |

**When to use `pick`:**

- `concat_transform` where feature order doesn't matter
- `feature_augmentation` for parallel channels
- Any unordered collection

**When to use `arrange`:**

- Sequential preprocessing steps
- When operation order affects results
- Pipeline stages with dependencies

---

## Common Patterns and Examples

### 1. Hyperparameter Grid Search

```python
{
    "_grid_": {
        "model": ["PLS", "RF", "SVR"],
        "n_components": {"_range_": [5, 20, 5]},
        "preprocessing": ["StandardScaler", "MinMaxScaler", None]
    }
}
```

### 2. Learning Rate Search

```python
{
    "optimizer": "Adam",
    "learning_rate": {"_log_range_": [0.0001, 0.1, 10]},
    "batch_size": {"_or_": [16, 32, 64, 128]}
}
```

### 3. Preprocessing Pipeline Combinations

```python
{
    "feature_augmentation": {
        "_or_": [
            {"class": "SNV"},
            {"class": "MSC"},
            {"class": "Detrend", "order": {"_or_": [1, 2]}},
            {"class": "SavitzkyGolay", "window": {"_or_": [5, 11, 21]}}
        ],
        "pick": (1, 3)  # 1 to 3 transforms
    }
}
```

### 4. Constrained Combinations

```python
{
    "_or_": ["PCA", "ICA", "NMF", "UMAP"],
    "pick": 2,
    "_mutex_": [["PCA", "ICA"]],     # PCA and ICA can't be together
    "_requires_": [["UMAP", "NMF"]]  # If UMAP selected, NMF required
}
```

### 5. Progressive Experiments with Chain

```python
{
    "_chain_": [
        {"model": "baseline", "transforms": []},
        {"model": "baseline", "transforms": ["SNV"]},
        {"model": "improved", "transforms": ["SNV", "Detrend"]},
        {"model": "best", "transforms": ["SNV", "Detrend", "SavGol"]}
    ]
}
```

### 6. Using Presets for Reusable Patterns

```python
# Define presets
register_preset("standard_preprocessing", {
    "_or_": [
        {"class": "StandardScaler"},
        {"class": "MinMaxScaler"},
        None
    ]
})

register_preset("pls_search", {
    "_grid_": {
        "class": ["PLSRegression"],
        "n_components": {"_range_": [2, 20]}
    }
})

# Use in pipeline
config = [
    {"preprocessing": {"_preset_": "standard_preprocessing"}},
    {"model": {"_preset_": "pls_search"}}
]
```

### 7. Memory-Efficient Large Space Processing

```python
from itertools import islice

large_spec = {
    "_grid_": {
        "param1": {"_range_": [1, 100]},
        "param2": {"_range_": [1, 100]},
        "param3": {"_range_": [1, 100]}
    }
}

# Don't do this! (1M configurations in memory)
# all_configs = expand_spec(large_spec)

# Do this instead (lazy iteration)
for config in expand_spec_iter(large_spec):
    process(config)

# Or sample
sample = list(expand_spec_iter(large_spec, seed=42, sample_size=1000))
```

### 8. Preprocessing Pipeline with Cartesian

```python
# Generate all stage combinations, then select complete pipelines
{
    "_cartesian_": [
        # Stage 1: Scatter correction
        {"_or_": ["MSC", "SNV", "EMSC", None]},
        # Stage 2: Smoothing
        {"_or_": [
            {"class": "SavitzkyGolay", "window": 11},
            {"class": "Gaussian", "sigma": 2},
            None
        ]},
        # Stage 3: Derivative
        {"_or_": [
            {"class": "FirstDerivative"},
            {"class": "SecondDerivative"},
            None
        ]}
    ],
    "pick": (1, 3),  # Select 1-3 complete pipelines
    "count": 50      # Limit to 50 variants
}
```

### 9. Reproducible Random Search

```python
# Use _seed_ for reproducible random selection
{
    "_or_": [
        {"class": "PLS", "n_components": {"_range_": [2, 20]}},
        {"class": "RF", "n_estimators": {"_or_": [100, 200, 500]}},
        {"class": "SVR", "C": {"_log_range_": [0.1, 100, 10]}}
    ],
    "count": 10,
    "_seed_": 42  # Same 10 configs every time
}
```

---

## See Also

- {doc}`/examples/index` - Working examples organized by topic
- {doc}`/reference/pipeline_syntax` - Pipeline syntax reference
- {doc}`/reference/combination_generator` - Combination generator syntax

---

*Document updated: December 27, 2025*

*Version: Phase 4+ Complete*