nirs4all.data.synthetic.exporter module
Dataset export utilities for synthetic NIRS data.
This module provides tools for exporting synthetic datasets to various file formats and folder structures compatible with nirs4all loaders.
- Key Features:
Export to CSV files (single or multi-file format)
Export to nirs4all standard folder structure (Xcal, Ycal, Xval, Yval)
Export with metadata (sample IDs, groups, etc.)
Generate CSV variations for loader testing
Example
>>> from nirs4all.data.synthetic import SyntheticDatasetBuilder, DatasetExporter
>>>
>>> builder = SyntheticDatasetBuilder(n_samples=1000, random_state=42)
>>> X, y = builder.build_arrays()
>>>
>>> exporter = DatasetExporter()
>>> path = exporter.to_folder(
... "output/synthetic_data",
... X, y,
... train_ratio=0.8,
... wavelengths=builder.state._wavelengths
... )
- class nirs4all.data.synthetic.exporter.CSVVariationGenerator
Bases: object
Generate CSV files with various format variations for loader testing.
This class creates CSV files with different delimiters, encodings, header formats, and other variations to test the robustness of CSV loaders.
- base_exporter
DatasetExporter for actual file writing.
Example
>>> generator = CSVVariationGenerator()
>>>
>>> # Generate all variations
>>> paths = generator.generate_all_variations(
... "test_data",
... X, y,
... wavelengths=wavelengths
... )
>>>
>>> # Generate specific variation
>>> path = generator.with_semicolon_delimiter(
... "data_semicolon",
... X, y
... )
- as_fragmented(path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None, train_ratio: float = 0.8, random_state: int | None = None) -> Path
Create fragmented dataset with multiple small files.
- as_single_file(path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None, train_ratio: float = 0.8, random_state: int | None = None) -> Path
Create single CSV file with all data and partition column.
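A minimal sketch of what a single file with a partition column might look like. It uses only the standard library so it runs without nirs4all installed; the column names (`f0`, `y`, `partition`), the `train`/`test` labels, and the helper `write_single_csv` are assumptions for illustration, not the library's actual output.

```python
import csv
import io
import random


def write_single_csv(X, y, train_ratio=0.8, random_state=None, sep=";"):
    """Return CSV text with one row per sample and a trailing 'partition' column."""
    n_samples = len(X)
    # Shuffle the sample indices reproducibly, then keep the first share for training.
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    n_train = int(round(n_samples * train_ratio))
    train_ids = set(indices[:n_train])

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    n_features = len(X[0])
    # Hypothetical header: feature names, then target, then partition label.
    writer.writerow([f"f{j}" for j in range(n_features)] + ["y", "partition"])
    for i, (row, target) in enumerate(zip(X, y)):
        label = "train" if i in train_ids else "test"
        writer.writerow(list(row) + [target, label])
    return buffer.getvalue()


X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
text = write_single_csv(X, y, train_ratio=0.8, random_state=42)
lines = text.strip().split("\n")
n_train_rows = sum(line.endswith(";train") for line in lines[1:])
```

With five samples and `train_ratio=0.8`, four rows carry the training label and one the test label; which rows those are depends on the seed.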
- generate_all_variations(base_path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None, train_ratio: float = 0.8, random_state: int | None = None) -> Dict[str, Path]
Generate CSV files with all format variations.
Creates multiple versions of the dataset with different CSV format options for comprehensive loader testing.
- Parameters:
base_path – Base output folder path.
X – Feature matrix.
y – Target values.
wavelengths – Optional wavelength values.
train_ratio – Proportion of samples assigned to the training set; the remainder forms the test set.
random_state – Random seed.
- Returns:
Dictionary mapping variation name to created path.
Example
>>> paths = generator.generate_all_variations(
... "test_variations",
... X, y,
... random_state=42
... )
>>> print(paths.keys())
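The name-to-path mapping returned here can be pictured with a small stdlib-only sketch. The variation names (`comma`, `semicolon`), the file naming scheme, and the helper `write_variations` are hypothetical; the real method may produce more variations and different names.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical variation table: name -> keyword arguments for csv.writer.
VARIATIONS = {
    "comma": {"delimiter": ","},
    "semicolon": {"delimiter": ";"},
}


def write_variations(base_path, rows):
    """Write one CSV per variation and return {variation name: file path}."""
    base = Path(base_path)
    base.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, options in VARIATIONS.items():
        target = base / f"data_{name}.csv"
        with open(target, "w", newline="") as fh:
            csv.writer(fh, **options).writerows(rows)
        paths[name] = target
    return paths


rows = [[1000, 1002], [0.1, 0.2]]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_variations(tmp, rows)
    names = sorted(paths)
    # Read the first line of each file while the temporary folder still exists.
    first_lines = {
        name: path.read_text().splitlines()[0] for name, path in paths.items()
    }
```

A loader test could then iterate over such a mapping and check that every variation parses to the same values.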
- with_comma_delimiter(path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None, train_ratio: float = 0.8, random_state: int | None = None) -> Path
Create CSV with comma delimiter.
- with_precision(path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None, train_ratio: float = 0.8, random_state: int | None = None, precision: int = 6) -> Path
Create CSV with specified floating point precision.
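To illustrate what a precision setting does to the written values, here is a small sketch using fixed-point formatting. The helper `format_row` is hypothetical, and fixed-point notation is an assumption; the library may format floats differently.

```python
def format_row(values, precision=6, sep=";"):
    """Join values into one CSV line using a fixed number of decimals."""
    return sep.join(f"{value:.{precision}f}" for value in values)


row = [0.123456789, 1.5, 2.0]
line_default = format_row(row)             # six decimals
line_short = format_row(row, precision=3)  # three decimals

# Reading a low-precision value back does not recover the original float.
round_trip = float(line_short.split(";")[0])
lost_digits = round_trip != row[0]
```

Lower precision gives smaller files but discards information, which is why a loader test may want to cover several precision levels.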
- with_row_index(path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None, train_ratio: float = 0.8, random_state: int | None = None) -> Path
Create CSV with row index column.
- with_semicolon_delimiter(path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None, train_ratio: float = 0.8, random_state: int | None = None) -> Path
Create CSV with semicolon delimiter (nirs4all default).
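The reason delimiter variations matter for loader testing can be shown with the standard `csv` module: parsing semicolon-delimited text with the wrong delimiter yields one column per row instead of failing loudly. The sample text and the helper `read_rows` are illustrative only.

```python
import csv
import io

semicolon_text = "1000;1002;1004\n0.11;0.12;0.13\n"


def read_rows(text, delimiter):
    """Parse CSV text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


right = read_rows(semicolon_text, ";")  # three columns per row
wrong = read_rows(semicolon_text, ",")  # one fused column per row
```

A robust loader should either detect the delimiter or report a clear error when the column count looks implausible.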
- class nirs4all.data.synthetic.exporter.DatasetExporter(config: ExportConfig | None = None)
Bases: object
Export synthetic datasets to various file formats.
This class provides methods for exporting synthetic NIRS datasets to files and folders compatible with nirs4all’s data loaders.
- config
Export configuration settings.
- Parameters:
config – Optional ExportConfig. Uses defaults if None.
Example
>>> exporter = DatasetExporter()
>>>
>>> # Export to standard folder structure
>>> path = exporter.to_folder(
... "output/data",
... X, y,
... train_ratio=0.8,
... wavelengths=wavelengths
... )
>>>
>>> # Export to single CSV
>>> path = exporter.to_csv(
... "output/all_data.csv",
... X, y,
... wavelengths=wavelengths
... )
- to_csv(path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None, metadata: Dict[str, ndarray] | None = None, include_targets: bool = True) -> Path
Export dataset to a single CSV file.
Creates a CSV file with features (and optionally targets) combined.
- Parameters:
path – Output file path.
X – Feature matrix (n_samples, n_features).
y – Target values (n_samples,) or (n_samples, n_targets).
wavelengths – Optional wavelength values for column headers.
metadata – Optional dict of metadata arrays.
include_targets – Whether to include target column(s).
- Returns:
Path to created file.
Example
>>> exporter.to_csv("data.csv", X, y, wavelengths=wavelengths)
- to_folder(path: str | Path, X: ndarray, y: ndarray, *, train_ratio: float = 0.8, wavelengths: ndarray | None = None, metadata: Dict[str, ndarray] | None = None, random_state: int | None = None, format: Literal['standard', 'single', 'fragmented'] | None = None) -> Path
Export dataset to a folder structure.
Creates a folder with CSV files compatible with nirs4all’s DatasetConfigs loader.
- Parameters:
path – Output folder path.
X – Feature matrix (n_samples, n_features).
y – Target values (n_samples,) or (n_samples, n_targets).
train_ratio – Proportion for training set.
wavelengths – Optional wavelength values for column headers.
metadata – Optional dict of metadata arrays (same length as X).
random_state – Random seed for train/test split.
format – Override config format for this export.
- Returns:
Path to created folder.
- Raises:
ValueError – If X and y have incompatible shapes.
ImportError – If pandas is not available.
Example
>>> import numpy as np
>>> exporter.to_folder(
... "data/synthetic",
... X, y,
... train_ratio=0.8,
... wavelengths=np.arange(1000, 2500, 2)
... )
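As a rough picture of the 'standard' layout, the sketch below splits samples by `train_ratio` and writes four files. It uses only the standard library; the exact file names (`Xcal.csv` and so on), the mapping of the training share to "cal" and the rest to "val", and the helper `export_standard` are assumptions, not the library's implementation.

```python
import csv
import random
import tempfile
from pathlib import Path


def export_standard(folder, X, y, train_ratio=0.8, random_state=None, sep=";"):
    """Write Xcal/Ycal/Xval/Yval CSV files (file names are an assumption)."""
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # Shuffle sample indices reproducibly, then cut at the train ratio.
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)
    n_train = int(round(len(X) * train_ratio))
    partitions = {"cal": indices[:n_train], "val": indices[n_train:]}

    for suffix, ids in partitions.items():
        with open(folder / f"X{suffix}.csv", "w", newline="") as fh:
            csv.writer(fh, delimiter=sep).writerows(X[i] for i in ids)
        with open(folder / f"Y{suffix}.csv", "w", newline="") as fh:
            csv.writer(fh, delimiter=sep).writerows([y[i]] for i in ids)
    return folder


X = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(10)]
y = [float(i) for i in range(10)]

with tempfile.TemporaryDirectory() as tmp:
    out = export_standard(tmp, X, y, train_ratio=0.8, random_state=0)
    created = sorted(p.name for p in out.iterdir())
    n_cal = len((out / "Xcal.csv").read_text().splitlines())
    n_val = len((out / "Xval.csv").read_text().splitlines())
```

Passing the same `random_state` twice yields the same split, which keeps exported test fixtures reproducible.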
- to_numpy(path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None, compressed: bool = False) -> Path
Export dataset to NumPy .npy or .npz format.
- Parameters:
path – Output file path (without extension).
X – Feature matrix (n_samples, n_features).
y – Target values.
wavelengths – Optional wavelength values.
compressed – Whether to use compressed format (.npz).
- Returns:
Path to created file.
Example
>>> exporter.to_numpy("data", X, y, compressed=True)
- class nirs4all.data.synthetic.exporter.ExportConfig(format: Literal['standard', 'single', 'fragmented'] = 'standard', separator: str = ';', float_precision: int = 6, include_headers: bool = True, include_index: bool = False, compression: Literal['gzip', 'zip'] | None = None, file_extension: str = '.csv')
Bases: object
Configuration for dataset export.
- format
Export format (‘standard’, ‘single’, ‘fragmented’).
- ‘standard’: Separate Xcal, Ycal, Xval, Yval files.
- ‘single’: All data in one file with partition column.
- ‘fragmented’: Multiple small files (for loader testing).
- Type:
Literal[‘standard’, ‘single’, ‘fragmented’]
- compression
Optional compression (‘gzip’, ‘zip’, None).
- Type:
Literal[‘gzip’, ‘zip’] | None
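The documented defaults can be mirrored in a stand-in dataclass to show how a per-export override relates to a shared configuration. `ExportConfigSketch` is a hypothetical stand-in that copies the field names and defaults from the signature above; the validation in `__post_init__` is an assumption and may not match the real class.

```python
from dataclasses import dataclass, replace
from typing import Literal, Optional


@dataclass
class ExportConfigSketch:
    """Stand-in mirroring the documented ExportConfig fields and defaults."""

    format: Literal["standard", "single", "fragmented"] = "standard"
    separator: str = ";"
    float_precision: int = 6
    include_headers: bool = True
    include_index: bool = False
    compression: Optional[Literal["gzip", "zip"]] = None
    file_extension: str = ".csv"

    def __post_init__(self):
        # Assumed validation: reject values outside the documented choices.
        if self.format not in ("standard", "single", "fragmented"):
            raise ValueError(f"Unknown format: {self.format!r}")
        if self.compression not in (None, "gzip", "zip"):
            raise ValueError(f"Unknown compression: {self.compression!r}")


default = ExportConfigSketch()
# Derive a one-off variant without mutating the shared default configuration.
single = replace(default, format="single", separator=",")
```

Deriving a copy rather than mutating the shared object is similar in spirit to the `format` argument of `to_folder`, which overrides the configured format for one export only.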
- nirs4all.data.synthetic.exporter.export_to_csv(path: str | Path, X: ndarray, y: ndarray, *, wavelengths: ndarray | None = None) -> Path
Quick function to export synthetic data to single CSV.
- Parameters:
path – Output file path.
X – Feature matrix.
y – Target values.
wavelengths – Optional wavelength values.
- Returns:
Path to created file.
Example
>>> path = export_to_csv("data.csv", X, y)
- nirs4all.data.synthetic.exporter.export_to_folder(path: str | Path, X: ndarray, y: ndarray, *, train_ratio: float = 0.8, wavelengths: ndarray | None = None, format: Literal['standard', 'single', 'fragmented'] = 'standard', random_state: int | None = None) -> Path
Quick function to export synthetic data to folder.
Convenience function for simple export use cases.
- Parameters:
path – Output folder path.
X – Feature matrix.
y – Target values.
train_ratio – Proportion of samples assigned to the training set; the remainder forms the test set.
wavelengths – Optional wavelength values.
format – Export format.
random_state – Random seed.
- Returns:
Path to created folder.
Example
>>> path = export_to_folder(
... "data/synthetic",
... X, y,
... train_ratio=0.8,
... wavelengths=wavelengths
... )