nirs4all package
Subpackages
- nirs4all.analysis package
- Submodules
- nirs4all.analysis.presets module
- nirs4all.analysis.results module
- nirs4all.analysis.selector module
- nirs4all.analysis.transfer_metrics module
- nirs4all.analysis.transfer_utils module
apply_augmentation(), apply_pipeline(), apply_preprocessing_objects(), apply_single_preprocessing(), apply_stacked_pipeline(), format_pipeline_name(), generate_augmentation_combinations(), generate_object_augmentation_combinations(), generate_object_stacked_pipelines(), generate_stacked_pipelines(), generate_top_k_stacked_pipelines(), get_base_preprocessings(), get_transform_name(), get_transform_signature(), normalize_preprocessing(), normalize_preprocessing_list(), validate_datasets()
- Module contents
TransferMetricsTransferMetrics.centroid_distanceTransferMetrics.cka_similarityTransferMetrics.evr_sourceTransferMetrics.evr_targetTransferMetrics.grassmann_distanceTransferMetrics.procrustes_disparityTransferMetrics.rv_coefficientTransferMetrics.spread_distanceTransferMetrics.to_dict()TransferMetrics.trustworthiness
TransferMetricsComputerTransferPreprocessingSelectorTransferResultTransferResult.nameTransferResult.pipeline_typeTransferResult.componentsTransferResult.transfer_scoreTransferResult.metricsTransferResult.improvement_pctTransferResult.signal_scoreTransferResult.transformsTransferResult.__post_init__()TransferResult.componentsTransferResult.get_transforms()TransferResult.improvement_pctTransferResult.metricsTransferResult.nameTransferResult.pipeline_typeTransferResult.signal_scoreTransferResult.to_dict()TransferResult.transfer_scoreTransferResult.transforms
TransferSelectionResultsTransferSelectionResults.rankingTransferSelectionResults.raw_metricsTransferSelectionResults.timingTransferSelectionResults.bestTransferSelectionResults.plot_improvement_heatmap()TransferSelectionResults.plot_metrics_comparison()TransferSelectionResults.plot_ranking()TransferSelectionResults.rankingTransferSelectionResults.raw_metricsTransferSelectionResults.summary()TransferSelectionResults.timingTransferSelectionResults.to_dataframe()TransferSelectionResults.to_pipeline_spec()TransferSelectionResults.to_preprocessing_list()TransferSelectionResults.top_k()
apply_augmentation(), apply_pipeline(), apply_preprocessing_objects(), apply_single_preprocessing(), apply_stacked_pipeline(), compute_transfer_score(), format_pipeline_name(), generate_augmentation_combinations(), generate_object_augmentation_combinations(), generate_object_stacked_pipelines(), generate_stacked_pipelines(), generate_top_k_stacked_pipelines(), get_base_preprocessings(), get_preset(), get_transform_name(), get_transform_signature(), list_presets(), normalize_preprocessing(), normalize_preprocessing_list(), validate_datasets()
- nirs4all.api package
- Submodules
- Module contents
ExplainResultExplainResult.shap_valuesExplainResult.feature_namesExplainResult.base_valueExplainResult.visualizationsExplainResult.explainer_typeExplainResult.model_nameExplainResult.n_samplesExplainResult.get_feature_importance()ExplainResult.get_sample_explanation()ExplainResult.to_dataframe()ExplainResult.__post_init__()ExplainResult.__repr__()ExplainResult.__str__()ExplainResult.base_valueExplainResult.explainer_typeExplainResult.feature_namesExplainResult.get_feature_importance()ExplainResult.get_sample_explanation()ExplainResult.mean_abs_shapExplainResult.model_nameExplainResult.n_samplesExplainResult.shap_valuesExplainResult.shapeExplainResult.to_dataframe()ExplainResult.top_featuresExplainResult.valuesExplainResult.visualizations
PredictResultPredictResult.y_predPredictResult.metadataPredictResult.sample_indicesPredictResult.model_namePredictResult.preprocessing_stepsPredictResult.to_numpy()PredictResult.to_list()PredictResult.to_dataframe()PredictResult.flatten()PredictResult.__len__()PredictResult.__post_init__()PredictResult.__repr__()PredictResult.__str__()PredictResult.flatten()PredictResult.is_multioutputPredictResult.metadataPredictResult.model_namePredictResult.preprocessing_stepsPredictResult.sample_indicesPredictResult.shapePredictResult.to_dataframe()PredictResult.to_list()PredictResult.to_numpy()PredictResult.valuesPredictResult.y_pred
RunResultRunResult.predictionsRunResult.per_datasetRunResult.top()RunResult.export()RunResult.filter()RunResult.get_datasets()RunResult.get_models()RunResult.__repr__()RunResult.__str__()RunResult.artifacts_pathRunResult.bestRunResult.best_accuracyRunResult.best_r2RunResult.best_rmseRunResult.best_scoreRunResult.export()RunResult.export_model()RunResult.filter()RunResult.get_datasets()RunResult.get_models()RunResult.num_predictionsRunResult.per_datasetRunResult.predictionsRunResult.summary()RunResult.top()RunResult.validate()
SessionSession.nameSession.pipelineSession.statusSession.is_trainedSession.runnerSession.workspace_pathSession.__enter__()Session.__exit__()Session.__repr__()Session.close()Session.historySession.is_trainedSession.nameSession.pipelineSession.predict()Session.retrain()Session.run()Session.runnerSession.save()Session.statusSession.workspace_path
explain(), load_session(), predict(), retrain(), run(), session()
- nirs4all.cli package
- nirs4all.config package
- nirs4all.controllers package
- Subpackages
- Submodules
- Module contents
AugmentationChartController, AutoTransferPreprocessingController, BaseController, BranchController, ConcatAugmentationController, CrossValidatorController, DummyController, FeatureAugmentationController, FoldChartController, JaxModelController, OperatorController, PyTorchModelController, ResamplerController, SampleAugmentationController, SampleFilterController, SklearnModelController, SpectraChartController, SpectralDistributionController, TensorFlowModelController, TransformerMixinController, YChartController, YTransformerMixinController, register_controller()
- nirs4all.core package
- nirs4all.data package
- Subpackages
- nirs4all.data.aggregation package
- nirs4all.data.detection package
- nirs4all.data.loaders package
- nirs4all.data.parsers package
- nirs4all.data.partition package
- nirs4all.data.performance package
- nirs4all.data.schema package
- nirs4all.data.selection package
- nirs4all.data.serialization package
- nirs4all.data.synthetic package
- Submodules
- nirs4all.data.binning module
- nirs4all.data.config module
- nirs4all.data.config_parser module
- nirs4all.data.dataset module
- nirs4all.data.ensemble_utils module
- nirs4all.data.features module
- nirs4all.data.indexer module
- nirs4all.data.io module
- nirs4all.data.metadata module
- nirs4all.data.predictions module
- nirs4all.data.signal_type module
- nirs4all.data.targets module
- nirs4all.data.types module
- Module contents
ColumnConfig, ColumnSelectionError, ColumnSelector, ConfigNormalizer, ConfigValidator, DatasetConfigSchema, DatasetConfigSchema.aggregate, DatasetConfigSchema.aggregate_exclude_outliers, DatasetConfigSchema.aggregate_method, DatasetConfigSchema.description, DatasetConfigSchema.files, DatasetConfigSchema.folds, DatasetConfigSchema.from_dict(), DatasetConfigSchema.get_effective_params(), DatasetConfigSchema.get_selected_variations(), DatasetConfigSchema.get_source_count(), DatasetConfigSchema.get_source_names(), DatasetConfigSchema.get_variation_count(), DatasetConfigSchema.get_variation_names(), DatasetConfigSchema.global_params, DatasetConfigSchema.is_files_format(), DatasetConfigSchema.is_legacy_format(), DatasetConfigSchema.is_multi_source(), DatasetConfigSchema.is_sources_format(), DatasetConfigSchema.is_variations_format(), DatasetConfigSchema.model_config, DatasetConfigSchema.name, DatasetConfigSchema.normalize_aggregate_method(), DatasetConfigSchema.normalize_task_type(), DatasetConfigSchema.normalize_variation_mode(), DatasetConfigSchema.parse_loading_params(), DatasetConfigSchema.parse_shared_metadata(), DatasetConfigSchema.parse_shared_targets(), DatasetConfigSchema.parse_sources(), DatasetConfigSchema.parse_variations(), DatasetConfigSchema.shared_metadata, DatasetConfigSchema.shared_targets, DatasetConfigSchema.sources, DatasetConfigSchema.task_type, DatasetConfigSchema.test_group, DatasetConfigSchema.test_group_filter, DatasetConfigSchema.test_group_params, DatasetConfigSchema.test_params, DatasetConfigSchema.test_x, DatasetConfigSchema.test_x_filter, DatasetConfigSchema.test_x_params, DatasetConfigSchema.test_y, DatasetConfigSchema.test_y_filter, DatasetConfigSchema.test_y_params, DatasetConfigSchema.to_dict(), DatasetConfigSchema.to_legacy_format(), DatasetConfigSchema.train_group, DatasetConfigSchema.train_group_filter, DatasetConfigSchema.train_group_params, DatasetConfigSchema.train_params, DatasetConfigSchema.train_x, DatasetConfigSchema.train_x_filter, DatasetConfigSchema.train_x_params, DatasetConfigSchema.train_y, DatasetConfigSchema.train_y_filter, DatasetConfigSchema.train_y_params, DatasetConfigSchema.validate_data_sources(), DatasetConfigSchema.variation_mode, DatasetConfigSchema.variation_prefix, DatasetConfigSchema.variation_select, DatasetConfigSchema.variations, DatasetConfigSchema.variations_to_legacy_format()
DatasetConfigsFeatureLayoutFeatureSourceFeatureSource.paddingFeatureSource.pad_valueFeatureSource.add_samples()FeatureSource.add_samples_batch_3d()FeatureSource.augment_samples()FeatureSource.header_unitFeatureSource.headersFeatureSource.num_2d_featuresFeatureSource.num_featuresFeatureSource.num_processingsFeatureSource.num_samplesFeatureSource.processing_idsFeatureSource.reset_features()FeatureSource.set_headers()FeatureSource.update_features()FeatureSource.x()
FileConfigHeaderUnitLinkingErrorLoadingParamsLoadingParams.categorical_modeLoadingParams.decimal_separatorLoadingParams.delimiterLoadingParams.encodingLoadingParams.has_headerLoadingParams.header_unitLoadingParams.merge_with()LoadingParams.model_configLoadingParams.na_policyLoadingParams.normalize_header_unit()LoadingParams.normalize_signal_type()LoadingParams.signal_type
PartitionAssignerPartitionConfigPartitionConfig.columnPartitionConfig.model_configPartitionConfig.predictPartitionConfig.predict_filePartitionConfig.predict_valuesPartitionConfig.random_statePartitionConfig.shufflePartitionConfig.stratifyPartitionConfig.testPartitionConfig.test_filePartitionConfig.test_valuesPartitionConfig.to_assigner_spec()PartitionConfig.trainPartitionConfig.train_filePartitionConfig.train_valuesPartitionConfig.typePartitionConfig.unknown_policyPartitionConfig.validate_partition_method()
PartitionErrorPartitionResultPartitionResult.train_indicesPartitionResult.test_indicesPartitionResult.predict_indicesPartitionResult.train_dataPartitionResult.test_dataPartitionResult.predict_dataPartitionResult.partition_columnPartitionResult.get_data()PartitionResult.get_indices()PartitionResult.has_predictPartitionResult.has_testPartitionResult.has_trainPartitionResult.partition_columnPartitionResult.predict_dataPartitionResult.predict_indicesPartitionResult.test_dataPartitionResult.test_indicesPartitionResult.train_dataPartitionResult.train_indices
PredictionAnalyzerPredictionAnalyzer.predictionsPredictionAnalyzer.dataset_name_overridePredictionAnalyzer.configPredictionAnalyzer.output_dirPredictionAnalyzer.cachePredictionAnalyzer.default_aggregatePredictionAnalyzer.branch_summary()PredictionAnalyzer.clear_cache()PredictionAnalyzer.generate_report()PredictionAnalyzer.get_branch_ids()PredictionAnalyzer.get_branches()PredictionAnalyzer.get_cache_stats()PredictionAnalyzer.get_cached_predictions()PredictionAnalyzer.plot_branch_boxplot()PredictionAnalyzer.plot_branch_comparison()PredictionAnalyzer.plot_branch_diagram()PredictionAnalyzer.plot_branch_heatmap()PredictionAnalyzer.plot_candlestick()PredictionAnalyzer.plot_confusion_matrix()PredictionAnalyzer.plot_heatmap()PredictionAnalyzer.plot_histogram()PredictionAnalyzer.plot_nested_branches()PredictionAnalyzer.plot_top_k()
PredictionResultPredictionResult.__repr__()PredictionResult.__str__()PredictionResult.config_namePredictionResult.dataset_namePredictionResult.eval_score()PredictionResult.fold_idPredictionResult.idPredictionResult.model_namePredictionResult.op_counterPredictionResult.save_to_csv()PredictionResult.step_idxPredictionResult.summary()
PredictionResultsListPredictionsPredictions.__len__()Predictions.__repr__()Predictions.__str__()Predictions.add_prediction()Predictions.add_predictions()Predictions.aggregate()Predictions.archive_to_catalog()Predictions.clear()Predictions.clear_caches()Predictions.compare_across_datasets()Predictions.filter_by_branch()Predictions.filter_by_criteria()Predictions.filter_predictions()Predictions.get_best()Predictions.get_cache_stats()Predictions.get_configs()Predictions.get_datasets()Predictions.get_entry_partitions()Predictions.get_folds()Predictions.get_models()Predictions.get_models_before_step()Predictions.get_oof_predictions()Predictions.get_partitions()Predictions.get_prediction_by_id()Predictions.get_predictions_by_step()Predictions.get_similar()Predictions.get_summary_stats()Predictions.get_unique_values()Predictions.list_runs()Predictions.load()Predictions.load_from_file()Predictions.load_from_file_cls()Predictions.load_from_parquet()Predictions.merge_parquet_files()Predictions.merge_predictions()Predictions.num_predictionsPredictions.pred_long_string()Predictions.pred_short_string()Predictions.query_best()Predictions.save_all_to_csv()Predictions.save_predictions_to_csv()Predictions.save_to_file()Predictions.save_to_parquet()Predictions.to_dataframe()Predictions.to_dicts()Predictions.to_pandas()Predictions.top()
RoleAssignerRoleAssignmentErrorRowSelectionErrorRowSelectorSampleLinkerSignalTypeSignalType.ABSORBANCESignalType.AUTOSignalType.KUBELKA_MUNKSignalType.LOG_1_RSignalType.LOG_1_TSignalType.PREPROCESSEDSignalType.REFLECTANCESignalType.REFLECTANCE_PERCENTSignalType.TRANSMITTANCESignalType.TRANSMITTANCE_PERCENTSignalType.UNKNOWNSignalType.from_string()SignalType.is_absorbance_likeSignalType.is_determinableSignalType.is_fractionSignalType.is_percentSignalType.is_reflectance_basedSignalType.is_transmittance_based
SignalTypeDetectorSpectroDatasetSpectroDataset.nameSpectroDataset.featuresSpectroDataset.targetsSpectroDataset.metadata_accessorSpectroDataset.foldsSpectroDataset.__str__()SpectroDataset.add_features()SpectroDataset.add_merged_features()SpectroDataset.add_metadata()SpectroDataset.add_metadata_column()SpectroDataset.add_processed_targets()SpectroDataset.add_samples()SpectroDataset.add_samples_batch()SpectroDataset.add_targets()SpectroDataset.aggregateSpectroDataset.aggregate_exclude_outliersSpectroDataset.aggregate_methodSpectroDataset.aggregate_outlier_thresholdSpectroDataset.augment_samples()SpectroDataset.detect_signal_type()SpectroDataset.features_processings()SpectroDataset.features_sources()SpectroDataset.float_headers()SpectroDataset.foldsSpectroDataset.get_merged_features()SpectroDataset.header_unit()SpectroDataset.headers()SpectroDataset.index_column()SpectroDataset.is_classificationSpectroDataset.is_multi_source()SpectroDataset.is_regressionSpectroDataset.keep_sources()SpectroDataset.metadata()SpectroDataset.metadata_column()SpectroDataset.metadata_columnsSpectroDataset.metadata_numeric()SpectroDataset.n_sourcesSpectroDataset.num_classesSpectroDataset.num_featuresSpectroDataset.num_foldsSpectroDataset.num_samplesSpectroDataset.print_summary()SpectroDataset.replace_features()SpectroDataset.reshape_reps_to_preprocessings()SpectroDataset.reshape_reps_to_sources()SpectroDataset.set_aggregate()SpectroDataset.set_aggregate_exclude_outliers()SpectroDataset.set_aggregate_method()SpectroDataset.set_folds()SpectroDataset.set_signal_type()SpectroDataset.set_task_type()SpectroDataset.short_preprocessings_str()SpectroDataset.signal_type()SpectroDataset.signal_typesSpectroDataset.task_typeSpectroDataset.update_features()SpectroDataset.update_metadata()SpectroDataset.wavelengths_cm1()SpectroDataset.wavelengths_nm()SpectroDataset.x()SpectroDataset.y()
TaskType, ValidationError, ValidationResult, ValidationWarning, detect_signal_type(), normalize_config(), normalize_header_unit(), normalize_layout(), normalize_signal_type()
- nirs4all.operators package
- Subpackages
- Module contents
Augmenter, Baseline, CropTransformer, Derivate, Detrend, Gaussian, Haar, IdentityAugmenter, IdentityTransformer, LocalStandardNormalVariate, MultiplicativeScatterCorrection, Normalize, Random_X_Operation, ResampleTransformer, RobustStandardNormalVariate, Rotate_Translate, SampleFilter, SavitzkyGolay, SimpleScale, Spline_Curve_Simplification, Spline_Smoothing, Spline_X_Perturbations, Spline_X_Simplification, Spline_Y_Perturbations, StandardNormalVariate, Wavelet, YOutlierFilter, baseline(), derivate(), detrend(), gaussian(), msc(), norml(), savgol(), spl_norml(), wavelet_transform()
- nirs4all.pipeline package
- Subpackages
- Submodules
- Module contents
ArtifactProviderBundleFormatBundleGeneratorBundleLoaderBundleLoader.bundle_pathBundleLoader.metadataBundleLoader.traceBundleLoader.pipeline_configBundleLoader.fold_weightsBundleLoader.artifact_providerBundleLoader.get_chain_for_artifact()BundleLoader.get_merged_chains()BundleLoader.get_partitioner_routing()BundleLoader.get_required_metadata_columns()BundleLoader.get_step_info()BundleLoader.has_partitioner_routing()BundleLoader.import_artifacts_to_registry()BundleLoader.predict()BundleLoader.predict_with_metadata()BundleLoader.to_resolved_prediction()
BundleMetadataBundleMetadata.bundle_format_versionBundleMetadata.nirs4all_versionBundleMetadata.created_atBundleMetadata.pipeline_uidBundleMetadata.source_typeBundleMetadata.model_step_indexBundleMetadata.fold_strategyBundleMetadata.preprocessing_chainBundleMetadata.trace_idBundleMetadata.original_manifestBundleMetadata.partitioner_routingBundleMetadata.bundle_format_versionBundleMetadata.created_atBundleMetadata.fold_strategyBundleMetadata.from_dict()BundleMetadata.model_step_indexBundleMetadata.nirs4all_versionBundleMetadata.original_manifestBundleMetadata.partitioner_routingBundleMetadata.pipeline_uidBundleMetadata.preprocessing_chainBundleMetadata.source_typeBundleMetadata.trace_id
ExecutionStepExecutionStep.step_indexExecutionStep.operator_typeExecutionStep.operator_classExecutionStep.operator_configExecutionStep.execution_modeExecutionStep.artifactsExecutionStep.branch_pathExecutionStep.branch_nameExecutionStep.duration_msExecutionStep.metadataExecutionStep.input_chain_pathExecutionStep.output_chain_pathsExecutionStep.source_countExecutionStep.produces_branchesExecutionStep.substep_indexExecutionStep.add_output_chain()ExecutionStep.artifactsExecutionStep.branch_nameExecutionStep.branch_pathExecutionStep.duration_msExecutionStep.execution_modeExecutionStep.from_dict()ExecutionStep.has_artifacts()ExecutionStep.input_chain_pathExecutionStep.input_features_shapeExecutionStep.input_shapeExecutionStep.metadataExecutionStep.operator_classExecutionStep.operator_configExecutionStep.operator_typeExecutionStep.output_chain_pathsExecutionStep.output_features_shapeExecutionStep.output_shapeExecutionStep.produces_branchesExecutionStep.source_countExecutionStep.step_indexExecutionStep.substep_indexExecutionStep.to_dict()
ExecutionTraceExecutionTrace.trace_idExecutionTrace.pipeline_uidExecutionTrace.created_atExecutionTrace.stepsExecutionTrace.model_step_indexExecutionTrace.fold_weightsExecutionTrace.preprocessing_chainExecutionTrace.metadataExecutionTrace.add_step()ExecutionTrace.created_atExecutionTrace.finalize()ExecutionTrace.fold_weightsExecutionTrace.from_dict()ExecutionTrace.get_artifact_ids()ExecutionTrace.get_artifacts_by_step()ExecutionTrace.get_fold_artifact_ids()ExecutionTrace.get_model_artifact_id()ExecutionTrace.get_step()ExecutionTrace.get_steps_before()ExecutionTrace.get_steps_up_to_model()ExecutionTrace.metadataExecutionTrace.model_step_indexExecutionTrace.pipeline_uidExecutionTrace.preprocessing_chainExecutionTrace.set_model_step()ExecutionTrace.stepsExecutionTrace.to_dict()ExecutionTrace.trace_id
ExplainerExtractedPipelineExtractedPipeline.stepsExtractedPipeline.traceExtractedPipeline.artifact_providerExtractedPipeline.model_step_indexExtractedPipeline.preprocessing_chainExtractedPipeline.source_pipeline_uidExtractedPipeline.metadataExtractedPipeline.artifact_providerExtractedPipeline.get_model_step()ExtractedPipeline.get_step()ExtractedPipeline.metadataExtractedPipeline.model_step_indexExtractedPipeline.preprocessing_chainExtractedPipeline.set_model()ExtractedPipeline.set_step()ExtractedPipeline.source_pipeline_uidExtractedPipeline.stepsExtractedPipeline.trace
FoldStrategyLoaderArtifactProviderLoaderArtifactProvider.loaderLoaderArtifactProvider.traceLoaderArtifactProvider.get_artifact()LoaderArtifactProvider.get_artifact_by_chain()LoaderArtifactProvider.get_artifacts_for_chain_prefix()LoaderArtifactProvider.get_artifacts_for_step()LoaderArtifactProvider.get_fold_artifacts()LoaderArtifactProvider.has_artifacts_for_step()
MapArtifactProviderMinimalArtifactProviderMinimalArtifactProvider.minimal_pipelineMinimalArtifactProvider.artifact_loaderMinimalArtifactProvider.target_sub_indexMinimalArtifactProvider.target_model_nameMinimalArtifactProvider.get_artifact()MinimalArtifactProvider.get_artifacts_for_step()MinimalArtifactProvider.get_fold_artifacts()MinimalArtifactProvider.get_fold_weights()MinimalArtifactProvider.has_artifacts_for_step()
MinimalPipelineMinimalPipeline.trace_idMinimalPipeline.pipeline_uidMinimalPipeline.stepsMinimalPipeline.artifact_mapMinimalPipeline.model_step_indexMinimalPipeline.fold_weightsMinimalPipeline.preprocessing_chainMinimalPipeline.metadataMinimalPipeline.artifact_mapMinimalPipeline.fold_weightsMinimalPipeline.get_all_chain_paths()MinimalPipeline.get_artifact_by_chain()MinimalPipeline.get_artifact_ids()MinimalPipeline.get_artifacts_for_step()MinimalPipeline.get_step()MinimalPipeline.get_step_count()MinimalPipeline.get_step_indices()MinimalPipeline.has_step()MinimalPipeline.metadataMinimalPipeline.model_step_indexMinimalPipeline.pipeline_uidMinimalPipeline.preprocessing_chainMinimalPipeline.stepsMinimalPipeline.trace_id
MinimalPipelineStepMinimalPipelineStep.step_indexMinimalPipelineStep.step_configMinimalPipelineStep.execution_modeMinimalPipelineStep.artifactsMinimalPipelineStep.operator_typeMinimalPipelineStep.operator_classMinimalPipelineStep.branch_pathMinimalPipelineStep.branch_nameMinimalPipelineStep.depends_onMinimalPipelineStep.artifactsMinimalPipelineStep.branch_nameMinimalPipelineStep.branch_pathMinimalPipelineStep.depends_onMinimalPipelineStep.execution_modeMinimalPipelineStep.get_artifact_by_chain()MinimalPipelineStep.get_artifact_ids()MinimalPipelineStep.get_artifacts_by_chain()MinimalPipelineStep.has_artifacts()MinimalPipelineStep.operator_classMinimalPipelineStep.operator_typeMinimalPipelineStep.step_configMinimalPipelineStep.step_indexMinimalPipelineStep.substep_index
MinimalPredictorPipelineConfigsPipelineLibraryPipelineRunnerPipelineRunner.workspace_pathPipelineRunner.verbosePipelineRunner.modePipelineRunner.save_artifactsPipelineRunner.save_chartsPipelineRunner.enable_tab_reportsPipelineRunner.continue_on_errorPipelineRunner.show_spinnerPipelineRunner.keep_datasetsPipelineRunner.plots_visiblePipelineRunner.orchestratorPipelineRunner.predictorPipelineRunner.explainerPipelineRunner.raw_dataPipelineRunner.pp_dataPipelineRunner.current_run_dirPipelineRunner.explain()PipelineRunner.export()PipelineRunner.export_best_for_dataset()PipelineRunner.export_model()PipelineRunner.extract()PipelineRunner.last_aggregatePipelineRunner.last_aggregate_exclude_outliersPipelineRunner.last_aggregate_methodPipelineRunner.libraryPipelineRunner.next_op()PipelineRunner.predict()PipelineRunner.retrain()PipelineRunner.run()PipelineRunner.runs_dir
PipelineWriterPredictionResolverPredictorResolvedPredictionResolvedPrediction.source_typeResolvedPrediction.minimal_pipelineResolvedPrediction.artifact_providerResolvedPrediction.traceResolvedPrediction.fold_strategyResolvedPrediction.fold_weightsResolvedPrediction.model_step_indexResolvedPrediction.target_modelResolvedPrediction.pipeline_uidResolvedPrediction.run_dirResolvedPrediction.manifestResolvedPrediction.artifact_providerResolvedPrediction.fold_strategyResolvedPrediction.fold_weightsResolvedPrediction.get_preprocessing_chain()ResolvedPrediction.has_fold_artifacts()ResolvedPrediction.has_trace()ResolvedPrediction.manifestResolvedPrediction.minimal_pipelineResolvedPrediction.model_step_indexResolvedPrediction.pipeline_uidResolvedPrediction.run_dirResolvedPrediction.source_typeResolvedPrediction.target_modelResolvedPrediction.trace
RetrainArtifactProviderRetrainConfigRetrainConfig.modeRetrainConfig.step_modesRetrainConfig.new_modelRetrainConfig.epochsRetrainConfig.learning_rateRetrainConfig.freeze_layersRetrainConfig.metadataRetrainConfig.epochsRetrainConfig.freeze_layersRetrainConfig.get_step_mode()RetrainConfig.learning_rateRetrainConfig.metadataRetrainConfig.modeRetrainConfig.new_modelRetrainConfig.should_train_step()RetrainConfig.step_modes
RetrainModeRetrainerSourceTypeSourceType.PREDICTIONSourceType.FOLDERSourceType.RUNSourceType.ARTIFACT_IDSourceType.BUNDLESourceType.TRACE_IDSourceType.MODEL_FILESourceType.UNKNOWNSourceType.ARTIFACT_IDSourceType.BUNDLESourceType.FOLDERSourceType.MODEL_FILESourceType.PREDICTIONSourceType.RUNSourceType.TRACE_IDSourceType.UNKNOWN
StepArtifactsStepArtifacts.artifact_idsStepArtifacts.primary_artifact_idStepArtifacts.fold_artifact_idsStepArtifacts.primary_artifactsStepArtifacts.by_branchStepArtifacts.by_sourceStepArtifacts.by_chainStepArtifacts.metadataStepArtifacts.add_artifact()StepArtifacts.add_fold_artifact()StepArtifacts.artifact_idsStepArtifacts.by_branchStepArtifacts.by_chainStepArtifacts.by_sourceStepArtifacts.fold_artifact_idsStepArtifacts.from_dict()StepArtifacts.get_artifact_by_chain()StepArtifacts.get_artifacts_for_branch()StepArtifacts.get_artifacts_for_source()StepArtifacts.merge()StepArtifacts.metadataStepArtifacts.primary_artifact_idStepArtifacts.primary_artifactsStepArtifacts.to_dict()
StepModeTargetResolverTraceBasedExtractorTraceBasedExtractor.include_skippedTraceBasedExtractor.preserve_orderTraceBasedExtractor.extract()TraceBasedExtractor.extract_for_branch()TraceBasedExtractor.extract_for_branch_name()TraceBasedExtractor.extract_for_step()TraceBasedExtractor.get_required_artifact_ids()TraceBasedExtractor.get_step_dependency_graph()TraceBasedExtractor.validate_trace_for_prediction()
TraceRecorderTraceRecorder.traceTraceRecorder.current_stepTraceRecorder.step_start_timeTraceRecorder.pipeline_idTraceRecorder.add_step_metadata()TraceRecorder.build_chain_for_artifact()TraceRecorder.current_branch_path()TraceRecorder.current_chain()TraceRecorder.end_step()TraceRecorder.enter_branch()TraceRecorder.exit_branch()TraceRecorder.finalize()TraceRecorder.get_current_step_index()TraceRecorder.has_model_step()TraceRecorder.in_branch()TraceRecorder.mark_step_skipped()TraceRecorder.pop_chain()TraceRecorder.push_chain()TraceRecorder.record_artifact()TraceRecorder.record_input_shapes()TraceRecorder.record_output_shapes()TraceRecorder.reset_chain_to()TraceRecorder.start_branch_step()TraceRecorder.start_branch_substep()TraceRecorder.start_step()TraceRecorder.trace_id
WorkspaceExporter
- nirs4all.sklearn package
- Submodules
- Module contents
NIRSPipelineNIRSPipeline.is_fitted_NIRSPipeline.model_NIRSPipeline.bundle_loader_NIRSPipeline.preprocessing_chainNIRSPipeline.model_step_indexNIRSPipeline.fold_weightsNIRSPipeline.predict()NIRSPipeline.score()NIRSPipeline.transform()NIRSPipeline.__repr__()NIRSPipeline.__str__()NIRSPipeline.bundle_loader_NIRSPipeline.fit()NIRSPipeline.fold_weightsNIRSPipeline.from_bundle()NIRSPipeline.from_result()NIRSPipeline.get_params()NIRSPipeline.get_transformers()NIRSPipeline.is_fitted_NIRSPipeline.model_NIRSPipeline.model_nameNIRSPipeline.model_step_indexNIRSPipeline.n_foldsNIRSPipeline.predict()NIRSPipeline.preprocessing_chainNIRSPipeline.score()NIRSPipeline.set_params()NIRSPipeline.shap_modelNIRSPipeline.transform()
NIRSPipelineClassifier
- nirs4all.utils package
- Submodules
- nirs4all.utils.backend module
BackendNotAvailableError, check_backend_available(), clear_availability_cache(), framework(), get_backend_info(), get_gpu_info(), is_available(), is_gpu_available(), is_ikpls_available(), is_jax_available(), is_keras_available(), is_tensorflow_available(), is_torch_available(), lazy_import(), print_backend_status(), require_backend()
- nirs4all.utils.header_units module
- nirs4all.utils.reproducibility module
- nirs4all.utils.spinner module
- Module contents
BackendNotAvailableError, apply_x_axis_limits(), check_backend_available(), clear_availability_cache(), framework(), get_axis_label(), get_backend_info(), get_gpu_info(), get_x_values_and_label(), is_available(), is_gpu_available(), is_jax_available(), is_keras_available(), is_tensorflow_available(), is_torch_available(), lazy_import(), print_backend_status(), require_backend(), should_invert_x_axis()
- nirs4all.visualization package
- Subpackages
- Submodules
- Module contents
BranchAnalyzerBranchDiagramBranchSummaryPipelineDiagramPredictionAnalyzerPredictionAnalyzer.predictionsPredictionAnalyzer.dataset_name_overridePredictionAnalyzer.configPredictionAnalyzer.output_dirPredictionAnalyzer.cachePredictionAnalyzer.default_aggregatePredictionAnalyzer.branch_summary()PredictionAnalyzer.clear_cache()PredictionAnalyzer.generate_report()PredictionAnalyzer.get_branch_ids()PredictionAnalyzer.get_branches()PredictionAnalyzer.get_cache_stats()PredictionAnalyzer.get_cached_predictions()PredictionAnalyzer.plot_branch_boxplot()PredictionAnalyzer.plot_branch_comparison()PredictionAnalyzer.plot_branch_diagram()PredictionAnalyzer.plot_branch_heatmap()PredictionAnalyzer.plot_candlestick()PredictionAnalyzer.plot_confusion_matrix()PredictionAnalyzer.plot_heatmap()PredictionAnalyzer.plot_histogram()PredictionAnalyzer.plot_nested_branches()PredictionAnalyzer.plot_top_k()
plot_branch_diagram(), plot_pipeline_diagram()
- nirs4all.workspace package
Module contents
NIRS4All - A comprehensive package for Near-Infrared Spectroscopy data processing and analysis.
This package provides tools for spectroscopy data handling, preprocessing, model building, and pipeline management with support for multiple ML backends.
Public API (recommended):
nirs4all.run(pipeline, dataset, ...) - Train a pipeline
nirs4all.predict(model, data, ...) - Make predictions
nirs4all.explain(model, data, ...) - Generate SHAP explanations
nirs4all.retrain(source, data, ...) - Retrain a pipeline
nirs4all.session(...) - Create execution session
nirs4all.load_session(path) - Load saved session
nirs4all.generate(n_samples, ...) - Generate synthetic NIRS data
- Classes (for advanced usage):
nirs4all.PipelineRunner - Direct runner access
nirs4all.PipelineConfigs - Pipeline configuration
nirs4all.DatasetConfigs - Dataset configuration (from nirs4all.data)
Example
>>> import nirs4all
>>> from sklearn.preprocessing import MinMaxScaler
>>> from sklearn.cross_decomposition import PLSRegression
>>>
>>> result = nirs4all.run(
...     pipeline=[MinMaxScaler(), PLSRegression(10)],
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
>>> result.export("exports/best_model.n4a")
- Synthetic Data Generation:
>>> # Generate synthetic data for testing
>>> dataset = nirs4all.generate(n_samples=1000, random_state=42)
>>>
>>> # Use convenience functions
>>> dataset = nirs4all.generate.regression(n_samples=500)
>>> dataset = nirs4all.generate.classification(n_samples=300, n_classes=3)
See examples/ for more usage examples.
- class nirs4all.ExplainResult(shap_values: Any, feature_names: List[str] | None = None, base_value: float | numpy.ndarray | None = None, visualizations: Dict[str, pathlib.Path] = <factory>, explainer_type: str = 'auto', model_name: str = '', n_samples: int = 0)
Bases: object
Result from nirs4all.explain().
Wraps SHAP explanation outputs with visualization helpers and accessors.
- shap_values
SHAP values array or Explanation object.
- Type:
Any
- base_value
Expected value (baseline prediction).
- Type:
float | numpy.ndarray | None
- visualizations
Paths to generated visualization files.
- Type:
Dict[str, pathlib.Path]
- Properties:
values: Raw SHAP values array.
shape: Shape of SHAP values array.
mean_abs_shap: Mean absolute SHAP values per feature.
top_features: Feature names sorted by importance.
Example
>>> result = nirs4all.explain(model, X_test)
>>> print(f"Top features: {result.top_features[:5]}")
>>> importance = result.get_feature_importance()
- get_feature_importance(top_n: int | None = None, normalize: bool = False) Dict[str, float][source]
Get feature importance ranking.
- Parameters:
top_n – If provided, return only top N features.
normalize – If True, normalize values to sum to 1.
- Returns:
Dictionary mapping feature names to importance values.
- get_sample_explanation(idx: int) Dict[str, float][source]
Get SHAP explanation for a single sample.
- Parameters:
idx – Sample index.
- Returns:
Dictionary mapping feature names to SHAP values for that sample.
- property mean_abs_shap: ndarray
Get mean absolute SHAP values per feature.
- Returns:
1D array of mean |SHAP| values, one per feature.
- to_dataframe(include_feature_names: bool = True)[source]
Get SHAP values as pandas DataFrame.
- Parameters:
include_feature_names – If True, use feature names as columns.
- Returns:
pandas DataFrame with SHAP values.
- Raises:
ImportError – If pandas is not available.
- property top_features: List[str]
Get feature names sorted by importance (descending).
- Returns:
List of feature names, most important first. Returns indices as strings if feature_names not available.
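The ranking described by mean_abs_shap, top_features and get_feature_importance() can be illustrated with a small standalone sketch. It is not the nirs4all implementation: the helper names mean_abs_per_feature and feature_importance are hypothetical, and the SHAP matrix is assumed to be a plain list of rows (one row per sample, one column per feature).

```python
from typing import Dict, List, Optional, Sequence


def mean_abs_per_feature(shap_values: Sequence[Sequence[float]]) -> List[float]:
    """Average |SHAP| over samples (rows) for each feature (column)."""
    n_samples = len(shap_values)
    n_features = len(shap_values[0])
    return [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(n_features)
    ]


def feature_importance(
    shap_values: Sequence[Sequence[float]],
    feature_names: Optional[Sequence[str]] = None,
    top_n: Optional[int] = None,
    normalize: bool = False,
) -> Dict[str, float]:
    """Rank features by mean |SHAP|, most important first (hypothetical helper)."""
    scores = mean_abs_per_feature(shap_values)
    # Fall back to stringified column indices when no names are given.
    if feature_names is not None:
        names = list(feature_names)
    else:
        names = [str(j) for j in range(len(scores))]
    if normalize:
        total = sum(scores)
        if total > 0:
            scores = [s / total for s in scores]
    ranked = sorted(zip(names, scores), key=lambda item: item[1], reverse=True)
    # A slice with top_n=None keeps every feature.
    return dict(ranked[:top_n])


# Two samples, three unnamed features.
shap_matrix = [
    [0.5, -2.0, 0.0],
    [-1.5, 1.0, 0.5],
]
importance = feature_importance(shap_matrix, normalize=True)
top_features = list(importance)
```

Because Python dictionaries preserve insertion order, listing the keys of the returned mapping gives the same ordering that top_features describes.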
- class nirs4all.PipelineConfigs(definition: Dict | List[Any] | str, name: str = '', description: str = 'No description provided', max_generation_count: int = 10000)[source]
Bases: object
Class to hold the configuration for a pipeline.
- static get_hash(steps) str[source]
Generate a hash for the pipeline configuration.
All objects are fully JSON-serializable (no _runtime_instance), so the default=str workaround is no longer needed when serializing the steps.
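Why JSON-serializability matters for hashing can be sketched as follows. This is only an illustration: the reference does not state which digest or canonical form get_hash() uses, so the function name config_hash, the SHA-256 digest and the 8-character truncation are all assumptions.

```python
import hashlib
import json
from typing import Any


def config_hash(steps: Any, length: int = 8) -> str:
    """Return a short, deterministic hex digest for JSON-serializable steps.

    Hypothetical helper: digest choice and length are assumptions.
    """
    # sort_keys makes the digest independent of dict insertion order;
    # json.dumps raises TypeError if a step is not JSON-serializable.
    canonical = json.dumps(steps, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


steps_a = [{"class": "MinMaxScaler"}, {"model": "PLSRegression", "n_components": 10}]
steps_b = [{"class": "MinMaxScaler"}, {"n_components": 10, "model": "PLSRegression"}]
same_digest = config_hash(steps_a) == config_hash(steps_b)
```

Serializing with a canonical key order means two configurations that differ only in dictionary ordering hash to the same value.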
- class nirs4all.PipelineRunner(workspace_path: str | Path | None = None, verbose: int = 0, mode: str = 'train', save_artifacts: bool = True, save_charts: bool = True, enable_tab_reports: bool = True, continue_on_error: bool = False, show_spinner: bool = True, keep_datasets: bool = True, plots_visible: bool = False, random_state: int | None = None, log_file: bool = True, log_format: str = 'pretty', use_unicode: bool | None = None, use_colors: bool | None = None, show_progress_bar: bool = True, json_output: bool = False)[source]
Bases: object
Main pipeline execution interface.
Orchestrates pipeline execution on datasets, providing a simplified interface for training, prediction, and explanation workflows. Delegates actual execution to PipelineOrchestrator, Predictor, and Explainer.
- workspace_path
Root workspace directory
- Type:
Path
- orchestrator
Underlying orchestrator for execution
- Type:
Example
>>> # Training workflow
>>> runner = PipelineRunner(workspace_path="./workspace", verbose=1)
>>> pipeline = [{"preprocessing": StandardScaler()}, {"model": SVC()}]
>>> X, y = load_data()
>>> predictions, dataset_preds = runner.run(pipeline, (X, y))
>>>
>>> # Prediction workflow
>>> runner = PipelineRunner(mode="predict")
>>> y_pred, preds = runner.predict(best_model, X_new)
>>>
>>> # Explanation workflow
>>> runner = PipelineRunner(mode="explain")
>>> shap_results, out_dir = runner.explain(best_model, X_test)
- property current_run_dir: Path | None
Get current run directory.
- Returns:
Path to current run directory, or None if not set
- explain(prediction_obj: Dict[str, Any] | str, dataset: DatasetConfigs | SpectroDataset | ndarray | Tuple[ndarray, ...] | Dict | List[Dict] | str | List[str], dataset_name: str = 'explain_dataset', shap_params: Dict[str, Any] | None = None, verbose: int = 0, plots_visible: bool = True) Tuple[Dict[str, Any], str][source]
Generate SHAP explanations for a saved model.
Delegates to Explainer class for actual execution.
- Parameters:
prediction_obj – Model identifier (dict with config_path or prediction ID)
dataset – Dataset to explain on
dataset_name – Name for the dataset
shap_params – SHAP configuration parameters
verbose – Verbosity level
plots_visible – Whether to display plots interactively
- Returns:
Tuple of (shap_results_dict, output_directory_path)
- export(source: Dict[str, Any] | str | Path, output_path: str | Path, format: str = 'n4a', include_metadata: bool = True, compress: bool = True) Path[source]
Export a trained pipeline to a standalone bundle.
Creates a self-contained prediction bundle that can be used for deployment, sharing, or archival without requiring the original workspace or full nirs4all installation.
- Supported formats:
‘n4a’: Full bundle (ZIP archive with artifacts and metadata)
‘n4a.py’: Portable Python script with embedded artifacts
- Phase 6 Feature:
This method enables exporting trained pipelines as standalone bundles that can be loaded and used for prediction without the original workspace structure.
- Parameters:
source – Prediction source to export. Can be:
- prediction dict: From a previous run’s Predictions object
- folder path: Path to a pipeline directory
- Run object: Best prediction from a Run
output_path – Path for the output bundle file
format – Bundle format (‘n4a’ or ‘n4a.py’)
include_metadata – Whether to include full metadata in bundle
compress – Whether to compress artifacts (for .n4a format)
- Returns:
Path to the created bundle file
- Raises:
ValueError – If format is not supported
FileNotFoundError – If source cannot be resolved
Example
>>> runner = PipelineRunner()
>>> predictions, _ = runner.run(pipeline, dataset)
>>> best_pred = predictions.top(n=1)[0]
>>>
>>> # Export to .n4a bundle
>>> runner.export(best_pred, "exports/wheat_model.n4a")
>>>
>>> # Export to portable Python script
>>> runner.export(best_pred, "exports/wheat_model.n4a.py", format='n4a.py')
>>>
>>> # Later, predict from bundle
>>> y_pred, _ = runner.predict("exports/wheat_model.n4a", X_new)
- export_best_for_dataset(dataset_name: str, mode: str = 'predictions') Path | None[source]
Export best results for a dataset to exports/ folder.
- Parameters:
dataset_name – Name of the dataset to export
mode – Export mode (‘predictions’ or other)
- Returns:
Path to exported file, or None if export failed
- export_model(source: Dict[str, Any] | str | Path, output_path: str | Path, format: str | None = None, fold: int | None = None) Path[source]
Export only the model artifact from a trained pipeline.
Unlike export() which creates a full bundle with all preprocessing artifacts and metadata, this method exports just the model binary. This is useful when you want a lightweight model file that can be loaded directly into other pipelines or used with external tools.
The output format is determined by the file extension or can be specified explicitly. The model can then be reloaded in either of two ways:
- Direct path in pipeline config: {"model": "path/to/model.joblib"}
- As prediction source: runner.predict("path/to/model.joblib", data)
- Parameters:
source – Prediction source to export from. Can be:
- prediction dict: From a previous run’s Predictions object
- folder path: Path to a pipeline directory
- bundle path: Path to a .n4a bundle
output_path – Path for the output model file. Extension determines format: .joblib, .pkl, .h5, .keras, .pt
format – Optional explicit format (‘joblib’, ‘pickle’, ‘keras_h5’). If None, determined from output_path extension.
fold – Optional fold index to export. If None, exports fold 0 or the primary model artifact.
- Returns:
Path to the created model file
- Raises:
ValueError – If no model artifact found
FileNotFoundError – If source cannot be resolved
Example
>>> runner = PipelineRunner()
>>> predictions, _ = runner.run(pipeline, dataset)
>>> best_pred = predictions.top(n=1)[0]
>>>
>>> # Export just the model
>>> runner.export_model(best_pred, "exports/pls_model.joblib")
>>>
>>> # Later, use in new pipeline
>>> new_pipeline = [
...     MinMaxScaler(),
...     {"model": "exports/pls_model.joblib", "name": "pretrained"}
... ]
- extract(source: Dict[str, Any] | str | Path) ExtractedPipeline[source]
Extract a trained pipeline for inspection or modification.
Loads a trained pipeline from a prediction source and returns an ExtractedPipeline object that can be inspected, modified, and then executed with runner.run().
- Phase 7 Feature:
This method enables extracting and modifying trained pipelines without retraining from scratch.
- Parameters:
source – Prediction source to extract. Can be:
- prediction dict: From a previous run’s Predictions object
- folder path: Path to a pipeline directory
- Run object: Best prediction from a Run
- artifact_id: Direct artifact reference
- bundle: Exported prediction bundle (.n4a)
- Returns:
ExtractedPipeline object with:
steps: List of pipeline steps (can be modified)
trace: Original execution trace (read-only)
artifact_provider: Provider for original artifacts
model_step_index: Index of the model step
preprocessing_chain: Summary of preprocessing
- Return type:
ExtractedPipeline
Example
>>> runner = PipelineRunner()
>>> predictions, _ = runner.run(pipeline, dataset)
>>> best_pred = predictions.top(n=1)[0]
>>>
>>> # Extract for inspection
>>> extracted = runner.extract(best_pred)
>>> print(f"Steps: {len(extracted.steps)}")
>>> print(f"Preprocessing: {extracted.preprocessing_chain}")
>>>
>>> # Modify and run
>>> from sklearn.ensemble import RandomForestRegressor
>>> extracted.set_model(RandomForestRegressor())
>>> new_preds, _ = runner.run(extracted.steps, new_data)
- property last_aggregate: str | None
Get aggregate column from the last executed dataset.
Returns the aggregation setting from the last dataset processed by run(). This can be used to create a PredictionAnalyzer with matching defaults.
- Returns:
Aggregate column name (‘y’ for y-based aggregation, column name for metadata-based aggregation, or None if no aggregation was set).
Example
>>> runner = PipelineRunner()
>>> predictions, _ = runner.run(pipeline, DatasetConfigs(path, aggregate='sample_id'))
>>> # Create analyzer with same aggregate setting
>>> analyzer = PredictionAnalyzer(predictions, default_aggregate=runner.last_aggregate)
- property last_aggregate_exclude_outliers: bool
Get aggregate exclude_outliers setting from the last executed dataset.
- Returns:
True if T² outlier exclusion was enabled, False otherwise.
- property last_aggregate_method: str | None
Get aggregate method from the last executed dataset.
- Returns:
Aggregate method (‘mean’, ‘median’, ‘vote’) or None for default.
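To make the three aggregate methods concrete, here is a minimal sketch of collapsing repeated measurements of the same sample into one prediction. It is an illustration only: the function aggregate_predictions is hypothetical, and reading 'vote' as a majority vote over class labels is an assumption, since the reference only lists the method names.

```python
from collections import Counter, defaultdict
from statistics import mean, median
from typing import Any, Callable, Dict, Hashable, List, Sequence


def aggregate_predictions(
    sample_ids: Sequence[Hashable],
    y_pred: Sequence[Any],
    method: str = "mean",
) -> Dict[Hashable, Any]:
    """Combine predictions that share a sample id (hypothetical helper)."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for sample_id, value in zip(sample_ids, y_pred):
        groups[sample_id].append(value)

    reducers: Dict[str, Callable[[List[Any]], Any]] = {
        "mean": mean,
        "median": median,
        # Assumed meaning of 'vote': the most frequent predicted label.
        "vote": lambda values: Counter(values).most_common(1)[0][0],
    }
    if method not in reducers:
        raise ValueError(f"Unknown aggregate method: {method!r}")
    reducer = reducers[method]
    return {sample_id: reducer(values) for sample_id, values in groups.items()}


# Three spectra measured on sample "s1", one on sample "s2".
ids = ["s1", "s1", "s1", "s2"]
regression = aggregate_predictions(ids, [10.0, 12.0, 17.0, 5.0], method="median")
classification = aggregate_predictions(ids, ["A", "B", "B", "A"], method="vote")
```

Mean and median suit regression outputs, while a vote only makes sense for discrete labels.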
- property library: PipelineLibrary
Get pipeline library for template management.
- Returns:
PipelineLibrary instance for managing pipeline templates
- next_op() int[source]
Get the next operation ID (for controller compatibility).
- Returns:
Next operation counter value
- predict(prediction_obj: Dict[str, Any] | str, dataset: DatasetConfigs | SpectroDataset | List[SpectroDataset] | ndarray | Tuple[ndarray, ...] | Dict | List[Dict] | str | List[str], dataset_name: str = 'prediction_dataset', all_predictions: bool = False, verbose: int = 0) Tuple[ndarray, Predictions] | Tuple[Dict[str, Any], Predictions][source]
Run prediction using a saved model on new dataset.
Delegates to Predictor class for actual execution.
- Parameters:
prediction_obj – Model identifier (dict with config_path or prediction ID)
dataset – New dataset to predict on
dataset_name – Name for the dataset
all_predictions – If True, return all predictions; if False, return single best
verbose – Verbosity level
- Returns:
If all_predictions=False: (y_pred, predictions)
If all_predictions=True: (predictions_dict, predictions)
- retrain(source: Dict[str, Any] | str | Path, dataset: DatasetConfigs | SpectroDataset | ndarray | Tuple[ndarray, ...] | Dict | List[Dict] | str | List[str], mode: str = 'full', dataset_name: str = 'retrain_dataset', new_model: Any | None = None, epochs: int | None = None, step_modes: List[StepMode] | None = None, verbose: int = 0, **kwargs) Tuple[Predictions, Dict[str, Any]][source]
Retrain a pipeline on new data.
Enables retraining of trained pipelines in one of three modes:
- full: Train from scratch with same pipeline structure
- transfer: Use existing preprocessing artifacts, train new model
- finetune: Continue training existing model with new data
- Phase 7 Feature:
This method enables retraining pipelines without having to reconstruct the pipeline configuration manually. It uses the resolved prediction source (from Phase 3/4) to extract the pipeline structure and optionally reuse preprocessing artifacts.
- Parameters:
source – Prediction source to retrain from. Can be:
- prediction dict: From a previous run’s Predictions object
- folder path: Path to a pipeline directory
- Run object: Best prediction from a Run
- artifact_id: Direct artifact reference
- bundle: Exported prediction bundle (.n4a)
dataset – New dataset to train on. Supports same formats as run()
mode – Retrain mode:
- ‘full’: Train everything from scratch (same pipeline structure)
- ‘transfer’: Use existing preprocessing, train new model
- ‘finetune’: Continue training existing model
dataset_name – Name for the dataset if array-based
new_model – Optional new model for transfer mode (replaces original)
epochs – Optional epochs for fine-tuning
step_modes – Optional per-step mode overrides for fine-grained control
verbose – Verbosity level
**kwargs – Additional parameters:
- learning_rate: Learning rate for fine-tuning
- freeze_layers: List of layers to freeze during fine-tuning
- Returns:
Tuple of (run_predictions, datasets_predictions)
- Raises:
ValueError – If mode is invalid or source cannot be resolved
FileNotFoundError – If source references files that don’t exist
Example
>>> runner = PipelineRunner()
>>> predictions, _ = runner.run(pipeline, dataset)
>>> best_pred = predictions.top(n=1)[0]
>>>
>>> # Full retrain on new data
>>> new_preds, _ = runner.retrain(best_pred, new_data, mode='full')
>>>
>>> # Transfer: use preprocessing from old model, train new one
>>> new_preds, _ = runner.retrain(
...     best_pred, new_data, mode='transfer',
...     new_model=XGBRegressor()
... )
>>>
>>> # Finetune: continue training existing model
>>> new_preds, _ = runner.retrain(
...     best_pred, new_data, mode='finetune', epochs=10
... )
>>>
>>> # Fine-grained control: specify per-step modes
>>> from nirs4all.pipeline import StepMode
>>> step_modes = [
...     StepMode(step_index=1, mode='predict'),  # Use existing
...     StepMode(step_index=2, mode='train'),  # Retrain
... ]
>>> new_preds, _ = runner.retrain(
...     best_pred, new_data, mode='full', step_modes=step_modes
... )
- run(pipeline: PipelineConfigs | List[Any] | Dict | str, dataset: DatasetConfigs | SpectroDataset | List[SpectroDataset] | ndarray | Tuple[ndarray, ...] | Dict | List[Dict] | str | List[str], pipeline_name: str = '', dataset_name: str = 'dataset', max_generation_count: int = 10000) Tuple[Predictions, Dict[str, Any]][source]
Execute pipeline on dataset(s).
Main entry point for training workflows. Executes one or more pipeline configurations on one or more datasets, tracking predictions and artifacts.
- Parameters:
pipeline – Pipeline definition (PipelineConfigs, list of steps, dict, or path)
dataset – Dataset definition (see DatasetConfigs for supported formats)
pipeline_name – Optional pipeline name for identification
dataset_name – Name for array-based datasets
max_generation_count – Max pipeline combinations to generate
- Returns:
Tuple of (run_predictions, datasets_predictions)
- class nirs4all.PredictResult(y_pred: ~numpy.ndarray, metadata: ~typing.Dict[str, ~typing.Any] = <factory>, sample_indices: ~numpy.ndarray | None = None, model_name: str = '', preprocessing_steps: ~typing.List[str] = <factory>)[source]
Bases: object
Result from nirs4all.predict().
Wraps prediction outputs with convenient accessors and conversion methods.
- y_pred
Predicted values array (n_samples,) or (n_samples, n_outputs).
- Type:
numpy.ndarray
- sample_indices
Optional indices of predicted samples.
- Type:
numpy.ndarray | None
- Properties:
values: Alias for y_pred (for consistency).
shape: Shape of prediction array.
is_multioutput: True if predictions have multiple outputs.
Example
>>> result = nirs4all.predict(model, X_new)
>>> print(f"Predictions shape: {result.shape}")
>>> df = result.to_dataframe()
- to_dataframe(include_indices: bool = True)[source]
Get predictions as pandas DataFrame.
- Parameters:
include_indices – If True and sample_indices available, include as column.
- Returns:
pandas DataFrame with predictions.
- Raises:
ImportError – If pandas is not available.
- class nirs4all.RunResult(predictions: Predictions, per_dataset: Dict[str, Any], _runner: PipelineRunner | None = None)[source]
Bases: object
Result from nirs4all.run().
Provides convenient access to predictions, best model, and artifacts. Wraps the raw (predictions, per_dataset) tuple returned by PipelineRunner.run().
- predictions
Predictions object containing all pipeline results.
- Type:
Predictions
- Properties:
best: Best prediction entry by default ranking.
best_score: Best model’s primary test score.
best_rmse: Best model’s RMSE (regression).
best_r2: Best model’s R² (regression).
best_accuracy: Best model’s accuracy (classification).
artifacts_path: Path to run artifacts directory.
num_predictions: Total number of predictions stored.
Example
>>> result = nirs4all.run(pipeline, dataset)
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
>>> print(f"Best R²: {result.best_r2:.4f}")
>>> result.export("exports/best_model.n4a")
- property artifacts_path: Path | None
Get path to run artifacts directory.
- Returns:
Path to the current run directory, or None if not available.
- property best: Dict[str, Any]
Get best prediction entry by default ranking.
- Returns:
Dictionary containing best model’s metrics, name, and configuration. Empty dict if no predictions available.
- property best_accuracy: float
Get best model’s accuracy score (for classification).
- Returns:
Accuracy value or NaN if unavailable.
- property best_r2: float
Get best model’s R² score.
Looks for ‘r2’ in scores dict.
- Returns:
R² value or NaN if unavailable.
- property best_rmse: float
Get best model’s RMSE score.
Looks for ‘rmse’ in scores dict, then falls back to computing from y arrays.
- Returns:
RMSE value or NaN if unavailable.
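The lookup-then-fallback behaviour described for best_rmse can be sketched in plain Python. The entry layout used here (keys "scores", "y_true" and "y_pred") and the helper names are assumptions made for illustration; the reference does not spell out the structure of a prediction entry.

```python
import math
from typing import Any, Dict, Optional, Sequence


def rmse_or_nan(
    y_true: Optional[Sequence[float]],
    y_pred: Optional[Sequence[float]],
) -> float:
    """Root mean squared error, or NaN when it cannot be computed."""
    if y_true is None or y_pred is None:
        return float("nan")
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        return float("nan")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_rmse(entry: Dict[str, Any]) -> float:
    """Prefer a stored 'rmse' score, else recompute from the y arrays.

    Hypothetical helper; the key names are assumptions.
    """
    scores = entry.get("scores") or {}
    if "rmse" in scores:
        return float(scores["rmse"])
    return rmse_or_nan(entry.get("y_true"), entry.get("y_pred"))


stored = best_rmse({"scores": {"rmse": 0.5}})
recomputed = best_rmse({"y_true": [1.0, 2.0, 3.0], "y_pred": [1.0, 2.0, 5.0]})
missing = best_rmse({})
```

Returning NaN instead of raising keeps summary code simple, at the cost of requiring callers to check with math.isnan before comparing values.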
- property best_score: float
Get best model’s primary test score.
- Returns:
The test_score value from best prediction, or NaN if unavailable.
- export(output_path: str | Path, format: str = 'n4a', source: Dict[str, Any] | None = None) Path[source]
Export a model to bundle.
- Parameters:
output_path – Path for the exported bundle file.
format – Export format (‘n4a’ or ‘n4a.py’).
source – Prediction dict to export. If None, exports best model.
- Returns:
Path to the exported bundle file.
- Raises:
RuntimeError – If runner reference is not available.
ValueError – If no predictions available and source not provided.
- export_model(output_path: str | Path, source: Dict[str, Any] | None = None, format: str | None = None, fold: int | None = None) Path[source]
Export only the model artifact (lightweight).
Unlike export() which creates a full bundle, this exports just the model.
- Parameters:
output_path – Path for the output model file.
source – Prediction dict to export. If None, exports best model.
format – Model format (inferred from extension if None).
fold – Fold index to export (default: fold 0).
- Returns:
Path to the exported model file.
- Raises:
RuntimeError – If runner reference is not available.
- filter(**kwargs) List[Dict[str, Any]][source]
Filter predictions by criteria.
- Parameters:
**kwargs – Filter criteria passed to predictions.filter_predictions(). Supported kwargs include:
- dataset_name: Filter by dataset name
- model_name: Filter by model name
- partition: Filter by partition (‘train’, ‘val’, ‘test’)
- fold_id: Filter by fold ID
- step_idx: Filter by pipeline step index
- branch_id: Filter by branch ID
- load_arrays: If True, load actual arrays (default: True)
- Returns:
List of matching prediction dictionaries.
- get_datasets() List[str][source]
Get list of unique dataset names.
- Returns:
List of dataset names in predictions.
- get_models() List[str][source]
Get list of unique model names.
- Returns:
List of model names in predictions.
- property num_predictions: int
Get total number of predictions stored.
- Returns:
Number of prediction entries.
- predictions: Predictions
- summary() str[source]
Get a summary string of the run result.
- Returns:
Multi-line summary string with key metrics.
- top(n: int = 5, **kwargs) List[Dict[str, Any]] | Dict[tuple, List[Dict[str, Any]]][source]
Get top N predictions by ranking.
- Parameters:
n – Number of top predictions to return. When group_by is used, this means top N per group (e.g., top 3 per dataset).
**kwargs – Additional arguments passed to predictions.top(). Supported kwargs include:
- rank_metric: Metric to rank by (default: uses record’s metric)
- rank_partition: Partition to rank on (default: “val”)
- display_partition: Partition for display metrics (default: “test”)
- aggregate_partitions: If True, include train/val/test data
- ascending: Sort order (None = infer from metric)
- group_by: Group predictions by column(s). Returns top N per group. Each result includes ‘group_key’ for easy filtering.
- return_grouped: If True with group_by, return dict of group->results instead of flat list. Default: False.
- Returns:
If return_grouped=False (default): List of prediction dicts, ranked by score. With group_by, returns top N per group as flat list.
If return_grouped=True: Dict mapping group keys to lists of predictions.
Examples
>>> # Top 5 overall
>>> result.top(5)
>>>
>>> # Top 3 per dataset (flat list)
>>> top_per_ds = result.top(3, group_by='dataset_name')
>>> ds1 = [r for r in top_per_ds if r['group_key'] == ('my_dataset',)]
>>>
>>> # Top 3 per dataset (grouped dict)
>>> grouped = result.top(3, group_by='dataset_name', return_grouped=True)
>>> for key, results in grouped.items():
...     print(f"{key}: {len(results)} results")
>>>
>>> # Multi-column grouping: top 2 per (dataset, model) combination
>>> top_per_combo = result.top(2, group_by=['dataset_name', 'model_name'])
>>> # Group keys are tuples: ('wheat', 'PLSRegression'), ('corn', 'RandomForest')
>>> for r in top_per_combo:
...     dataset, model = r['group_key']
...     print(f"{dataset}/{model}: {r['test_score']:.4f}")
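The grouping semantics of top() (tuple-valued group keys, a flat list by default, a dict when return_grouped=True) can be mimicked on plain dictionaries. This sketch is not the library code: top_per_group is a hypothetical function, records are assumed to carry a 'test_score' field, and sort order is passed explicitly rather than inferred from the metric.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple, Union

Record = Dict[str, Any]


def top_per_group(
    records: Sequence[Record],
    n: int,
    group_by: Union[str, Sequence[str]],
    score_key: str = "test_score",
    ascending: bool = True,
    return_grouped: bool = False,
) -> Union[List[Record], Dict[Tuple[Any, ...], List[Record]]]:
    """Keep the best n records in each group (hypothetical helper)."""
    columns = [group_by] if isinstance(group_by, str) else list(group_by)
    groups: Dict[Tuple[Any, ...], List[Record]] = defaultdict(list)
    for record in records:
        # Group keys are always tuples, even for a single column.
        key = tuple(record[column] for column in columns)
        groups[key].append({**record, "group_key": key})

    ranked = {
        key: sorted(rows, key=lambda r: r[score_key], reverse=not ascending)[:n]
        for key, rows in groups.items()
    }
    if return_grouped:
        return ranked
    # Flat list: groups in first-seen order, best record first within each.
    return [row for rows in ranked.values() for row in rows]


records = [
    {"dataset_name": "wheat", "model_name": "PLS", "test_score": 0.30},
    {"dataset_name": "wheat", "model_name": "RF", "test_score": 0.20},
    {"dataset_name": "corn", "model_name": "PLS", "test_score": 0.50},
]
flat = top_per_group(records, n=1, group_by="dataset_name")
grouped = top_per_group(records, n=1, group_by="dataset_name", return_grouped=True)
```

With ascending=True the lowest score wins, which matches error metrics such as RMSE; a higher-is-better metric would pass ascending=False.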
- validate(check_nan_metrics: bool = True, check_empty: bool = True, raise_on_failure: bool = True, nan_threshold: float = 0.0) Dict[str, Any][source]
Validate the run result for common issues.
Checks for NaN values in metrics, empty predictions, and other issues that might indicate problems with the pipeline execution.
- Parameters:
check_nan_metrics – If True, check for NaN values in metrics.
check_empty – If True, check for empty predictions.
raise_on_failure – If True, raise ValueError on validation failure.
nan_threshold – Maximum allowed ratio of predictions with NaN metrics (0.0 = none allowed).
- Returns:
Dictionary with validation results:
valid: True if all checks passed.
issues: List of issue descriptions.
nan_count: Number of predictions with NaN metrics.
total_count: Total number of predictions.
- Raises:
ValueError – If raise_on_failure=True and validation fails.
Example
>>> result = nirs4all.run(pipeline, dataset)
>>> result.validate()  # Raises if issues found
>>> # Or check without raising
>>> report = result.validate(raise_on_failure=False)
>>> if not report['valid']:
...     print(f"Issues: {report['issues']}")
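How nan_threshold interacts with the returned report can be shown with a small standalone check. This is a sketch under assumptions: validate_metrics is a hypothetical function, each prediction is assumed to be a dict with a float 'test_score', and the strict "ratio greater than threshold" comparison is a guess at the intended semantics of "maximum allowed ratio".

```python
import math
from typing import Any, Dict, List, Sequence


def validate_metrics(
    predictions: Sequence[Dict[str, Any]],
    metric_key: str = "test_score",
    nan_threshold: float = 0.0,
    raise_on_failure: bool = True,
) -> Dict[str, Any]:
    """Report empty results and NaN metrics (hypothetical helper)."""
    issues: List[str] = []
    total = len(predictions)
    nan_count = sum(
        1
        for p in predictions
        if isinstance(p.get(metric_key), float) and math.isnan(p[metric_key])
    )
    if total == 0:
        issues.append("no predictions available")
    elif nan_count / total > nan_threshold:
        # Fail only when the NaN ratio exceeds the allowed maximum.
        issues.append(f"{nan_count}/{total} predictions have NaN metrics")

    report = {
        "valid": not issues,
        "issues": issues,
        "nan_count": nan_count,
        "total_count": total,
    }
    if issues and raise_on_failure:
        raise ValueError("; ".join(issues))
    return report


preds = [{"test_score": 0.12}, {"test_score": float("nan")}]
strict = validate_metrics(preds, raise_on_failure=False)
lenient = validate_metrics(preds, nan_threshold=0.5, raise_on_failure=False)
```

Here half of the predictions are NaN, so the default threshold of 0.0 fails while a threshold of 0.5 passes.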
- class nirs4all.Session(pipeline: List[Any] | None = None, name: str = '', **runner_kwargs: Any)[source]
Bases: object
Execution session for resource reuse and stateful pipeline management.
A session can be used in two modes:
Resource sharing mode (no pipeline): Share a PipelineRunner across multiple nirs4all.run() calls.
Stateful pipeline mode (with pipeline): Manage a single pipeline’s lifecycle: train, predict, save, load.
- name
Session/pipeline name for identification.
- pipeline
Pipeline definition (if in stateful mode).
- status
Current session status (‘initialized’, ‘trained’, ‘error’).
- is_trained
Whether the pipeline has been trained.
- runner
The shared PipelineRunner instance.
- workspace_path
Path to the workspace directory.
- Example (resource sharing):
>>> with nirs4all.session(verbose=1) as s:
...     result1 = nirs4all.run(pipeline1, data1, session=s)
...     result2 = nirs4all.run(pipeline2, data2, session=s)
- Example (stateful pipeline):
>>> session = nirs4all.Session(pipeline=pipeline, name="MyModel")
>>> result = session.run("sample_data/regression")
>>> predictions = session.predict(new_data)
>>> session.save("exports/my_model.n4a")
- __exit__(exc_type: Any, exc_val: Any, exc_tb: Any) None[source]
Exit the session context and clean up resources.
- close() None[source]
Clean up session resources.
Called automatically when exiting a context manager block.
- predict(dataset: str | Path | Any, **kwargs: Any) PredictResult[source]
Make predictions using the trained pipeline.
- Parameters:
dataset – Data to predict on. Can be:
- Path to data folder
- Numpy array X
- Dict with ‘X’ key
**kwargs – Additional arguments for prediction.
- Returns:
PredictResult with predictions.
- Raises:
ValueError – If session has not been trained.
- retrain(dataset: str | Path | Any, mode: str = 'full', **kwargs: Any) RunResult[source]
Retrain the pipeline on new data.
- Parameters:
dataset – New dataset to train on.
mode – Retrain mode (‘full’, ‘transfer’, ‘finetune’).
**kwargs – Additional arguments for retraining.
- Returns:
RunResult from retraining.
- Raises:
ValueError – If session has not been trained.
- run(dataset: str | Path | Any, *, plots_visible: bool = False, **kwargs: Any) RunResult[source]
Train the session’s pipeline on a dataset.
- Parameters:
dataset – Dataset to train on. Can be:
- Path to data folder: "sample_data/regression"
- Numpy arrays: (X, y)
- Dict: {"X": X, "y": y}
plots_visible – Whether to show plots during training.
**kwargs – Additional arguments passed to runner.run().
- Returns:
RunResult with predictions and metrics.
- Raises:
ValueError – If no pipeline was provided to the session.
- property runner: PipelineRunner
Get or create the shared PipelineRunner instance.
The runner is created lazily on first access.
- Returns:
The shared PipelineRunner instance.
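The combination of a lazily created runner and context-manager cleanup is a common pattern, sketched below with a generic factory. It is an illustration rather than the Session implementation: LazySession and the factory argument are hypothetical, and dropping the cached resource on close() is an assumption about what "clean up" involves.

```python
from typing import Any, Callable, Optional


class LazySession:
    """Create an expensive resource on first access and share it afterwards."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._resource: Optional[Any] = None

    @property
    def runner(self) -> Any:
        # Build the resource only when someone actually needs it.
        if self._resource is None:
            self._resource = self._factory()
        return self._resource

    def close(self) -> None:
        # Assumed cleanup: forget the cached resource.
        self._resource = None

    def __enter__(self) -> "LazySession":
        return self

    def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        self.close()


creations = []


def make_runner() -> object:
    creations.append("created")
    return object()


with LazySession(make_runner) as session:
    created_before_access = len(creations)
    first = session.runner
    second = session.runner
```

Both accesses inside the with block return the same object, so every call made through the session shares one runner.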
- save(path: str | Path) Path[source]
Save the trained session to a bundle file.
- Parameters:
path – Output path for the .n4a bundle file.
- Returns:
Path to the saved bundle file.
- Raises:
ValueError – If session has not been trained.
- nirs4all.explain(model: Dict[str, Any] | str | Path, data: str | Path | ndarray | Dict[str, Any] | SpectroDataset | DatasetConfigs, *, name: str = 'explain_dataset', session: Session | None = None, verbose: int = 1, plots_visible: bool = True, n_samples: int | None = None, explainer_type: str = 'auto', **shap_params: Any) ExplainResult[source]
Generate SHAP explanations for a trained model.
This function provides a simple interface for computing SHAP values to explain model predictions. It supports various SHAP explainer types and generates visualizations.
- Parameters:
model – Trained model specification. Can be:
- Prediction dict from result.best or result.top()
- Path to exported bundle: "exports/model.n4a"
- Path to pipeline config directory
data – Data to explain. Can be:
- Path to data folder: "test_data/"
- Numpy array: X_test (n_samples, n_features)
- Dict: {"X": X, "metadata": meta}
- SpectroDataset instance
name – Name for the explanation dataset (for logging). Default: “explain_dataset”
session – Optional Session for resource reuse. If provided, uses the session’s runner.
verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 1
plots_visible – Whether to display plots interactively. Default: True
n_samples – Number of background samples for SHAP. If None, uses default (typically 100-200).
explainer_type – SHAP explainer type. Default: “auto”. Options:
- “auto”: Automatically select best explainer
- “tree”: TreeExplainer (for tree-based models)
- “kernel”: KernelExplainer (model-agnostic)
- “deep”: DeepExplainer (for neural networks)
- “linear”: LinearExplainer (for linear models)
**shap_params – Additional SHAP configuration parameters. Common options:
- feature_names: List of feature names
- background_samples: Number of background samples
- max_display: Max features to show in plots
- Returns:
ExplainResult containing:
shap_values: SHAP values array or Explanation object
feature_names: Names/labels of features
base_value: Expected value (baseline prediction)
visualizations: Paths to generated plots
mean_abs_shap: Mean absolute SHAP per feature
top_features: Features sorted by importance
Use result.get_feature_importance() for importance ranking, or result.to_dataframe() for pandas DataFrame output.
- Return type:
ExplainResult
- Raises:
ValueError – If model specification is invalid.
FileNotFoundError – If model bundle or data path doesn’t exist.
ImportError – If SHAP is not installed.
Examples
Explain an exported model:
>>> import nirs4all
>>>
>>> result = nirs4all.explain(
...     model="exports/wheat_model.n4a",
...     data=X_test
... )
>>> print(f"Top 5 features: {result.top_features[:5]}")
>>> importance = result.get_feature_importance(top_n=10)
Explain using a result from a previous run:
>>> # Training
>>> train_result = nirs4all.run(pipeline, train_data)
>>>
>>> # Explain best model
>>> explain_result = nirs4all.explain(
...     model=train_result.best,
...     data=X_test,
...     explainer_type="kernel"
... )
Get SHAP values as DataFrame:
>>> result = nirs4all.explain(model, data)
>>> df = result.to_dataframe()
>>> df.to_csv("shap_values.csv")
Get per-sample explanations:
>>> result = nirs4all.explain(model, data)
>>> sample_0_shap = result.get_sample_explanation(0)
>>> for feature, value in list(sample_0_shap.items())[:5]:
...     print(f"{feature}: {value:.4f}")
See also
nirs4all.run(): Train a pipeline
nirs4all.predict(): Make predictions
nirs4all.api.result.ExplainResult: Result class
- nirs4all.framework(framework_name: str) Callable[[F], F][source]
Decorator to mark a function/class with its framework.
This enables automatic framework detection in the model factory.
- Parameters:
framework_name – Name of the framework (‘tensorflow’, ‘pytorch’, ‘jax’)
- Returns:
Decorator function that adds framework attribute.
Example
>>> @framework('tensorflow')
... def build_cnn(input_shape, params):
...     import tensorflow as tf
...     # ... build model
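A decorator of this kind typically just tags the object and returns it unchanged, so a factory can later read the tag. The sketch below shows that idea under assumptions: mark_framework and detect_framework are hypothetical names, and storing the tag in an attribute called framework is a guess based on the phrase "adds framework attribute".

```python
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def mark_framework(framework_name: str) -> Callable[[F], F]:
    """Return a decorator that tags a function or class with its framework."""

    def decorator(obj: F) -> F:
        # Attribute name 'framework' is an assumption for this sketch.
        setattr(obj, "framework", framework_name)
        return obj  # The object itself is returned, not a wrapper.

    return decorator


def detect_framework(obj: Any, default: str = "unknown") -> str:
    """Read the tag back, as a model factory might (hypothetical helper)."""
    return getattr(obj, "framework", default)


@mark_framework("pytorch")
def build_model(input_shape: tuple, params: dict) -> dict:
    # Stand-in for real model construction.
    return {"input_shape": input_shape, **params}


def untagged_builder() -> None:
    return None


tagged = detect_framework(build_model)
untagged = detect_framework(untagged_builder)
```

Because the decorator returns the original object, the decorated builder keeps its name, signature and behaviour.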
- nirs4all.is_gpu_available(backend: str | None = None) bool[source]
Check if GPU is available for the specified backend or any backend.
Results are cached for performance. The first call for each backend will import the framework to check GPU availability.
- Parameters:
backend – Specific backend to check (‘tensorflow’, ‘torch’, ‘jax’), or None to check all available backends.
- Returns:
True if GPU is available for the specified backend(s).
Example
>>> if is_gpu_available('torch'):
...     device = 'cuda'
... else:
...     device = 'cpu'
- nirs4all.is_tensorflow_available() bool[source]
Check if TensorFlow is installed.
- Returns:
True if TensorFlow is available.
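An installation check with cached results, in the spirit of the availability helpers above, can be written with the standard library alone. This is a sketch, not the nirs4all code: is_package_available is a hypothetical name, and it only tests whether a top-level package can be found, not whether a GPU is usable.

```python
import importlib.util
from functools import lru_cache


@lru_cache(maxsize=None)
def is_package_available(package_name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    # find_spec returns None when no top-level module of that name exists.
    return importlib.util.find_spec(package_name) is not None


has_json = is_package_available("json")
has_fake = is_package_available("surely_not_an_installed_package_xyz")
# A repeated call is answered from the cache instead of searching again.
has_json_again = is_package_available("json")
cache_hits = is_package_available.cache_info().hits
```

Locating the package without importing it keeps the check cheap; a GPU check would still need to import the framework, which is why caching the result is worthwhile.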
- nirs4all.load_session(path: str | Path) Session[source]
Load a session from a saved bundle file.
- Parameters:
path – Path to .n4a bundle file.
- Returns:
Session ready for prediction.
Example
>>> session = nirs4all.load_session("exports/model.n4a")
>>> predictions = session.predict(new_data)
- nirs4all.predict(model: Dict[str, Any] | str | Path, data: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs, *, name: str = 'prediction_dataset', all_predictions: bool = False, session: Session | None = None, verbose: int = 0, **runner_kwargs: Any) PredictResult[source]
Make predictions with a trained model on new data.
This function provides a simple interface for running inference with trained nirs4all pipelines. The model can be specified as a prediction dict from a previous run, or as a path to an exported bundle.
- Parameters:
model – Trained model specification. Can be:
- Prediction dict from result.best or result.top()
- Path to exported bundle: "exports/model.n4a"
- Path to pipeline config directory
data – Data to predict on. Can be:
- Path to data folder: "new_data/"
- Numpy array: X_new (n_samples, n_features)
- Tuple: (X,) or (X, y) for evaluation
- Dict: {"X": X, "metadata": meta}
- SpectroDataset instance
name – Name for the prediction dataset (for logging). Default: “prediction_dataset”
all_predictions – If True, return predictions from all folds. If False (default), return single aggregated prediction.
session – Optional Session for resource reuse. If provided, uses the session’s runner.
verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 0
**runner_kwargs – Additional PipelineRunner parameters. Common options: workspace_path, plots_visible
- Returns:
PredictResult containing:
- y_pred: Predicted values array (n_samples,)
- metadata: Additional prediction metadata
- model_name: Name of the model used
- preprocessing_steps: List of preprocessing steps applied
Use result.to_dataframe() for pandas DataFrame output.
- Return type:
PredictResult
- Raises:
ValueError – If model specification is invalid.
FileNotFoundError – If model bundle or data path doesn’t exist.
Examples
Predict from an exported bundle:
>>> import nirs4all
>>>
>>> result = nirs4all.predict(
...     model="exports/wheat_model.n4a",
...     data=X_new
... )
>>> print(f"Predictions: {result.values[:5]}")
Predict using a result from a previous run:
>>> # Training
>>> train_result = nirs4all.run(pipeline, train_data)
>>>
>>> # Prediction with best model
>>> pred_result = nirs4all.predict(
...     model=train_result.best,
...     data=X_test
... )
Get all fold predictions:
>>> result = nirs4all.predict(
...     model="exports/model.n4a",
...     data=X_new,
...     all_predictions=True
... )
>>> print(f"Shape: {result.shape}")
Convert to DataFrame:
>>> result = nirs4all.predict(model, data)
>>> df = result.to_dataframe()
>>> df.to_csv("predictions.csv")
See also
nirs4all.run(): Train a pipeline
nirs4all.explain(): Generate SHAP explanations
nirs4all.api.result.PredictResult: Result class
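The difference between `all_predictions=True` and the default aggregated output can be pictured with a small standalone sketch. The aggregation rule is not specified in this reference, so the per-sample mean used below is an assumption, and `aggregate_folds` is a hypothetical helper rather than part of nirs4all.

```python
from statistics import fmean


def aggregate_folds(fold_predictions):
    """Average per-sample predictions across folds (assumed aggregation rule)."""
    # zip(*...) regroups the values so each tuple holds one sample's fold outputs.
    per_sample = zip(*fold_predictions.values())
    return [fmean(sample) for sample in per_sample]


# Two folds, three samples each: what an "all folds" result might hold.
folds = {"fold_0": [1.0, 2.0, 3.0], "fold_1": [3.0, 4.0, 5.0]}
print(aggregate_folds(folds))  # [2.0, 3.0, 4.0]
```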
- nirs4all.register_controller(operator_cls: Type[OperatorController])
Decorator to register a controller class.
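A class-registering decorator typically appends the class to a shared registry and returns it unchanged. The sketch below shows that general pattern only; `CONTROLLER_REGISTRY`, `register`, and `DemoController` are hypothetical names, and the real registry structure in nirs4all may differ.

```python
# Hypothetical module-level registry of controller classes.
CONTROLLER_REGISTRY = []


def register(cls):
    """Add a class to the registry once and hand it back untouched."""
    if cls not in CONTROLLER_REGISTRY:
        CONTROLLER_REGISTRY.append(cls)
    return cls


@register
class DemoController:
    """Stand-in for a controller class."""


print([c.__name__ for c in CONTROLLER_REGISTRY])  # ['DemoController']
```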
- nirs4all.retrain(source: Dict[str, Any] | str | Path, data: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs, *, mode: str = 'full', name: str = 'retrain_dataset', new_model: Any | None = None, epochs: int | None = None, session: Session | None = None, verbose: int = 1, save_artifacts: bool = True, **kwargs: Any) -> RunResult
Retrain a pipeline on new data.
This function retrains a previously trained pipeline in one of three modes: full retraining, transfer learning, or fine-tuning.
- Parameters:
source – Pipeline source to retrain from. Can be:
- Prediction dict from result.best or result.top()
- Path to exported bundle: "exports/model.n4a"
- Path to pipeline config directory
data – New dataset to train on. Can be:
- Path to data folder: "new_data/"
- Numpy arrays: (X, y)
- Dict: {"X": X, "y": y}
- SpectroDataset instance
mode – Retrain mode. Default: “full”. Options:
- “full”: Train everything from scratch (same pipeline structure)
- “transfer”: Use existing preprocessing, train new model
- “finetune”: Continue training existing model
name – Name for the retrain dataset (for logging). Default: “retrain_dataset”
new_model – Optional new model for transfer mode. Replaces the original model while keeping preprocessing.
epochs – Optional number of epochs for fine-tuning neural networks.
session – Optional Session for resource reuse. If provided, uses the session’s runner.
verbose – Verbosity level (0=quiet, 1=info, 2=debug). Default: 1
save_artifacts – Whether to save retrained artifacts. Default: True
**kwargs – Additional retraining parameters:
- learning_rate: Learning rate for fine-tuning
- freeze_layers: List of layers to freeze during fine-tuning
- step_modes: Per-step mode overrides (advanced)
- Returns:
RunResult containing:
- predictions: Predictions from the retrained pipeline
- per_dataset: Per-dataset execution details
- best: Best prediction entry
- best_score: Best model’s primary test score
- Return type:
RunResult
- Raises:
ValueError – If mode is invalid or source cannot be resolved.
FileNotFoundError – If source references files that don’t exist.
Examples
Full retrain on new data:
>>> import nirs4all
>>>
>>> # Original training
>>> original = nirs4all.run(pipeline, train_data)
>>>
>>> # Retrain on new data with same pipeline
>>> retrained = nirs4all.retrain(
...     source=original.best,
...     data=new_train_data,
...     mode="full"
... )
>>> print(f"Original: {original.best_rmse:.4f}")
>>> print(f"Retrained: {retrained.best_rmse:.4f}")
Transfer learning with new model:
>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> result = nirs4all.retrain(
...     source="exports/pls_model.n4a",
...     data=new_data,
...     mode="transfer",
...     new_model=RandomForestRegressor(n_estimators=100)
... )
Fine-tune a neural network:
>>> result = nirs4all.retrain(
...     source="exports/nn_model.n4a",
...     data=new_data,
...     mode="finetune",
...     epochs=10,
...     learning_rate=0.0001
... )
Retrain from an exported bundle:
>>> result = nirs4all.retrain(
...     source="exports/wheat_model.n4a",
...     data="new_wheat_data/",
...     mode="full",
...     verbose=2
... )
>>> result.export("exports/retrained_model.n4a")
See also
nirs4all.run(): Train a pipeline from scratch
nirs4all.predict(): Make predictions
nirs4all.pipeline.RetrainMode: Retrain mode enum
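The three retrain modes and the ValueError raised for an invalid mode can be summarised in a standalone sketch. It does not use `nirs4all.pipeline.RetrainMode`; the `Mode` enum and `plan_retrain` helper are hypothetical, and treating "finetune" as reusing the fitted preprocessing is an assumption, since the parameter description only says the existing model continues training.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical mirror of the three documented mode strings."""
    FULL = "full"
    TRANSFER = "transfer"
    FINETUNE = "finetune"


def plan_retrain(mode):
    """Map a mode string to what happens to preprocessing and model."""
    try:
        selected = Mode(mode)
    except ValueError:
        valid = ", ".join(m.value for m in Mode)
        raise ValueError(f"Invalid mode {mode!r}; expected one of: {valid}") from None
    if selected is Mode.FULL:
        return {"preprocessing": "refit", "model": "train from scratch"}
    if selected is Mode.TRANSFER:
        return {"preprocessing": "reuse", "model": "train new model"}
    # Assumption: fine-tuning keeps the fitted preprocessing as-is.
    return {"preprocessing": "reuse", "model": "continue training"}


print(plan_retrain("transfer"))
```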
- nirs4all.run(pipeline: List[Any] | Dict[str, Any] | str | Path | PipelineConfigs | List[List[Any] | Dict[str, Any] | str | Path | PipelineConfigs], dataset: str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs | List[str | Path | ndarray | Tuple[ndarray, ...] | Dict[str, Any] | SpectroDataset | DatasetConfigs], *, name: str = '', session: Session | None = None, verbose: int = 1, save_artifacts: bool = True, save_charts: bool = True, plots_visible: bool = False, random_state: int | None = None, **runner_kwargs: Any) -> RunResult
Execute a training pipeline on a dataset.
This is the primary entry point for training ML pipelines on NIRS data. It provides a simpler interface than creating PipelineRunner and config objects directly.
- Parameters:
pipeline – Pipeline definition. Can be:
- List of steps (most common): [MinMaxScaler(), PLSRegression(10)]
- Dict with steps: {"steps": [...], "name": "my_pipeline"}
- Path to YAML/JSON config file: "configs/my_pipeline.yaml"
- PipelineConfigs object (backward compatibility)
- List of pipelines: [pipeline1, pipeline2, ...]; each pipeline is executed independently (cartesian product with datasets)
dataset – Dataset definition. Can be:
- Path to data folder: "sample_data/regression"
- Numpy arrays: (X, y) or X alone
- Dict with arrays: {"X": X, "y": y, "metadata": meta}
- SpectroDataset instance
- List of SpectroDataset instances (multi-dataset)
- DatasetConfigs object (backward compatibility)
- List of datasets: [dataset1, dataset2, ...]; each dataset is used with each pipeline (cartesian product)
name – Optional pipeline name for identification and logging. If not provided, a name will be generated.
session – Optional Session object for resource reuse across multiple runs. When provided, shares workspace and configuration.
verbose – Verbosity level (0=quiet, 1=info, 2=debug, 3=trace). Default: 1
save_artifacts – Whether to save binary artifacts (models, transformers). Default: True
save_charts – Whether to save charts and visual outputs. Default: True
plots_visible – Whether to display plots interactively. Default: False
random_state – Random seed for reproducibility. Default: None (no seeding)
**runner_kwargs – Additional PipelineRunner parameters. See PipelineRunner.__init__ for full list. Common options:
- workspace_path: Workspace root directory
- continue_on_error: Whether to continue on step failures
- show_spinner: Whether to show progress spinners
- log_file: Whether to write logs to disk
- log_format: Output format (“pretty”, “minimal”, “json”)
- show_progress_bar: Whether to show progress bars
- max_generation_count: Max pipeline combinations (for generators)
- Returns:
RunResult containing:
- predictions: Predictions object with all pipeline results
- per_dataset: Dictionary with per-dataset execution details
- best: Best prediction entry (convenience accessor)
- best_score: Best model’s primary test score
- best_rmse, best_r2, best_accuracy: Score shortcuts
Use result.top(n=5) to get top N predictions, or result.export("path.n4a") to export the best model.
- Return type:
RunResult
- Raises:
ValueError – If pipeline or dataset format is invalid.
FileNotFoundError – If pipeline config or dataset path doesn’t exist.
Examples
Simple usage with list of steps:
>>> import nirs4all
>>> from sklearn.preprocessing import MinMaxScaler
>>> from sklearn.cross_decomposition import PLSRegression
>>>
>>> result = nirs4all.run(
...     pipeline=[MinMaxScaler(), PLSRegression(10)],
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Best RMSE: {result.best_rmse:.4f}")
With cross-validation and multiple models:
>>> from sklearn.model_selection import ShuffleSplit
>>>
>>> result = nirs4all.run(
...     pipeline=[
...         MinMaxScaler(),
...         ShuffleSplit(n_splits=3),
...         {"model": PLSRegression(10)}
...     ],
...     dataset="sample_data/regression",
...     name="PLS_experiment",
...     verbose=2,
...     save_artifacts=True
... )
Multiple pipelines executed independently:
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.ensemble import RandomForestRegressor
>>>
>>> pipeline_pls = [MinMaxScaler(), PLSRegression(10)]
>>> pipeline_rf = [StandardScaler(), RandomForestRegressor()]
>>>
>>> result = nirs4all.run(
...     pipeline=[pipeline_pls, pipeline_rf],  # Two independent pipelines
...     dataset="sample_data/regression",
...     verbose=1
... )
>>> print(f"Total configs: {result.num_predictions}")
Cartesian product of pipelines × datasets:
>>> pipelines = [pipeline1, pipeline2, pipeline3]
>>> datasets = [dataset_a, dataset_b]
>>>
>>> # Runs 6 combinations: p1×da, p1×db, p2×da, p2×db, p3×da, p3×db
>>> result = nirs4all.run(
...     pipeline=pipelines,
...     dataset=datasets,
...     verbose=1
... )
Using a session for multiple runs:
>>> with nirs4all.session(verbose=1) as s:
...     r1 = nirs4all.run(pipeline1, data, session=s)
...     r2 = nirs4all.run(pipeline2, data, session=s)
...     print(f"Pipeline 1: {r1.best_score:.4f}")
...     print(f"Pipeline 2: {r2.best_score:.4f}")
Export the best model:
>>> result = nirs4all.run(pipeline, dataset)
>>> result.export("exports/best_model.n4a")
See also
nirs4all.predict(): Make predictions with a trained model
nirs4all.explain(): Generate SHAP explanations
nirs4all.session(): Create execution session for resource reuse
nirs4all.PipelineRunner: Direct runner access for advanced use
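The cartesian product of pipelines and datasets described for nirs4all.run() can be enumerated with `itertools.product`. The sketch uses placeholder strings instead of real pipelines and datasets and only illustrates the pairing order given in the example comment; how nirs4all schedules the runs internally is not covered here.

```python
from itertools import product

# Placeholder labels standing in for pipeline and dataset objects.
pipelines = ["p1", "p2", "p3"]
datasets = ["da", "db"]

# Every pipeline is paired with every dataset, pipelines varying slowest.
combinations = list(product(pipelines, datasets))

print(len(combinations))   # 6
print(combinations[:2])    # [('p1', 'da'), ('p1', 'db')]
```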
- nirs4all.session(pipeline: List[Any] | None = None, name: str = '', **kwargs: Any) -> Generator[Session, None, None]
Create an execution session context manager.
This is a convenience function that creates a Session and yields it within a context manager block.
- Parameters:
pipeline – Optional pipeline definition for stateful mode.
name – Name for the session/pipeline.
**kwargs – Arguments passed to Session (and ultimately PipelineRunner). Common options:
- verbose (int): Verbosity level (0-3). Default: 1
- save_artifacts (bool): Save model artifacts. Default: True
- workspace_path (str|Path): Workspace directory.
- random_state (int): Random seed for reproducibility.
- Yields:
Session – The active session for use within the block.
- Example (resource sharing):
>>> with nirs4all.session(verbose=2, save_artifacts=True) as s:
...     r1 = nirs4all.run(pipeline1, data1, session=s)
...     r2 = nirs4all.run(pipeline2, data2, session=s)
...     print(f"PLS: {r1.best_score:.4f}, RF: {r2.best_score:.4f}")
- Example (stateful pipeline):
>>> with nirs4all.session(pipeline=my_pipeline, name="Demo") as s:
...     result = s.run("sample_data/regression")
...     print(f"Best score: {result.best_score:.4f}")
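A function with the return annotation Generator[Session, None, None] that is used in a `with` block is usually built with `contextlib.contextmanager`. The following sketch shows that pattern in isolation. `DemoSession` and `demo_session` are hypothetical stand-ins, and closing the session on exit is an assumption about cleanup, not documented nirs4all behaviour.

```python
from contextlib import contextmanager
from typing import Generator


class DemoSession:
    """Hypothetical stand-in for a session holding shared options."""

    def __init__(self, **options):
        self.options = options
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def demo_session(**options) -> Generator[DemoSession, None, None]:
    session = DemoSession(**options)
    try:
        # Control passes to the body of the with-block here.
        yield session
    finally:
        # Runs on normal exit and on exceptions alike.
        session.close()


with demo_session(verbose=2) as s:
    inside = s.closed

print(inside, s.closed)  # False True
```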