skpro
skpro is an open-source Python library for supervised probabilistic prediction, providing scikit-learn–like and scikit-base–compatible interfaces for uncertainty-aware modeling. It enables probabilistic regression, interval and quantile prediction, full distribution forecasting, and survival analysis using a unified and extensible API.
The library is developed by the sktime community and is designed to interoperate with scikit-learn, sktime, and other libraries in the Python machine learning ecosystem.
Purpose
Many machine learning workflows focus solely on point predictions, ignoring uncertainty. skpro addresses this limitation by making probabilistic prediction a first-class concept, allowing models to output distributions, intervals, and quantiles rather than single values.
Its goal is to improve decision-making, risk awareness, and model evaluation by providing robust tools for uncertainty quantification across tabular and time-to-event prediction tasks.
Core Capabilities
Probabilistic tabular regression
Interval, quantile, and full distribution prediction
Time-to-event and survival prediction
Probabilistic performance metrics
Reductions that convert classical regressors into probabilistic models
Pipeline construction and hyperparameter tuning using probabilistic metrics
Symbolic probability distributions with pandas-compatible interfaces
Probabilistic Prediction
Tabular Regression
skpro supports probabilistic regression for tabular data, enabling predictions in multiple modes:
Mean and variance
Prediction intervals
Quantiles
Full predictive distributions
This allows users to quantify uncertainty directly alongside predictions.
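The four modes are closely related: once a full predictive distribution is available, the mean, variance, intervals and quantiles all follow from it. The pure-Python sketch below shows this for a single row. It assumes, purely for illustration, that some model has predicted a Gaussian with mean 10.0 and variance 4.0, and it uses only the standard library's `statistics.NormalDist`, not skpro. In skpro, probabilistic regressors expose these modes through methods such as `predict_var`, `predict_interval`, `predict_quantiles` and `predict_proba`.

```python
from statistics import NormalDist

# Assumed output of some probabilistic regressor for one row:
# a Gaussian predictive distribution with mean 10.0 and variance 4.0.
mean, variance = 10.0, 4.0
dist = NormalDist(mu=mean, sigma=variance ** 0.5)

def quantiles(dist, alphas):
    """Return the alpha-quantiles of the predictive distribution."""
    return {a: dist.inv_cdf(a) for a in alphas}

def interval(dist, coverage):
    """Central prediction interval with the requested coverage."""
    lower_alpha = (1 - coverage) / 2
    return dist.inv_cdf(lower_alpha), dist.inv_cdf(1 - lower_alpha)

q = quantiles(dist, [0.1, 0.5, 0.9])
lo, hi = interval(dist, coverage=0.9)
print(f"mean={dist.mean}, var={dist.variance}")
print(f"quantiles={q}")
print(f"90% interval=({lo:.2f}, {hi:.2f})")  # roughly (6.71, 13.29)
```

The median equals the mean here only because the assumed distribution is symmetric; for skewed predictive distributions the modes carry genuinely different information.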
Survival and Time-to-Event Prediction
The library includes tools for probabilistic survival analysis, producing instance-level survival distributions rather than single risk scores or point estimates.
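The difference between a single risk score and an instance-level survival distribution can be illustrated with a toy model. The sketch assumes an exponential survival model whose hazard rate depends log-linearly on one feature, with hypothetical hand-picked coefficients `b0` and `b1`; it is not a skpro estimator. Each instance gets its own survival function S(t | x) = exp(-λ(x) t), from which quantities such as the median survival time can be read off.

```python
import math

# Hypothetical fitted coefficients of an exponential survival model:
# hazard rate lambda(x) = exp(b0 + b1 * x)
b0, b1 = -2.0, 0.5

def hazard_rate(x):
    """Constant hazard rate for an instance with feature value x."""
    return math.exp(b0 + b1 * x)

def survival(x, t):
    """P(T > t | x) under the exponential model."""
    return math.exp(-hazard_rate(x) * t)

def median_survival_time(x):
    """Time t at which S(t | x) = 0.5, i.e. ln(2) / lambda(x)."""
    return math.log(2) / hazard_rate(x)

for x in [0.0, 2.0]:
    curve = [round(survival(x, t), 3) for t in (1, 5, 10)]
    print(f"x={x}: S(1), S(5), S(10) = {curve}, median = {median_survival_time(x):.2f}")
```

A risk score would only rank the two instances; the survival curves additionally say how likely each instance is to survive past any given time.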
Model Reductions and Pipelines
Probabilistic Reductions
skpro provides reductions that wrap classical scikit-learn regressors and extend them with probabilistic outputs, including:
Bootstrap-based methods
Conformal prediction techniques
Residual-based probabilistic modeling
This enables uncertainty-aware modeling without abandoning familiar estimators.
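To make the conformal idea concrete, here is a minimal pure-Python sketch of split conformal prediction wrapped around a deterministic least-squares line. The synthetic data, the helper names `fit_line` and `conformal_interval`, and the 90% coverage level are assumptions for the example; this is neither skpro's implementation nor its API.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns a point predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def conformal_interval(predict, x_cal, y_cal, coverage):
    """Split conformal: turn held-out absolute residuals into a symmetric half-width."""
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * coverage); capped at n as a simplification.
    k = min(n, math.ceil((n + 1) * coverage))
    half_width = scores[k - 1]
    return lambda x: (predict(x) - half_width, predict(x) + half_width)

random.seed(0)
xs = [random.uniform(0, 10) for _ in range(400)]
ys = [2.0 + 0.5 * x + random.gauss(0, 1) for x in xs]

# Fit the deterministic model on one half, calibrate on the other half.
predict = fit_line(xs[:200], ys[:200])
interval = conformal_interval(predict, xs[200:], ys[200:], coverage=0.9)

# Check coverage on fresh data from the same process.
x_test = [random.uniform(0, 10) for _ in range(1000)]
y_test = [2.0 + 0.5 * x + random.gauss(0, 1) for x in x_test]
hits = 0
for x, y in zip(x_test, y_test):
    lo, hi = interval(x)
    hits += lo <= y <= hi
coverage_hat = hits / len(y_test)
print(f"empirical coverage on new data: {coverage_hat:.3f}")
```

The base regressor is untouched; only a held-out calibration set is needed to attach intervals to it, which is what makes this family of reductions attractive.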
Pipelines and Composite Models
Models can be combined into pipelines and composite estimators, with full support for tuning and evaluation using probabilistic performance metrics.
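Tuning with a probabilistic metric means choosing hyperparameters by the quality of the predicted distributions rather than of point predictions alone. The sketch below assumes a hypothetical k-nearest-neighbour regressor that predicts a Gaussian from the mean and standard deviation of its neighbours' targets, and selects `k` from a small grid by mean CRPS on a validation split. It illustrates the idea only and does not use skpro's pipeline or tuning classes.

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (
        z * (2 * STD_NORMAL.cdf(z) - 1) + 2 * STD_NORMAL.pdf(z) - 1 / math.sqrt(math.pi)
    )

def knn_gaussian(x_train, y_train, k):
    """Hypothetical probabilistic kNN: Gaussian with the neighbours' mean and std."""
    def predict(x):
        nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))[:k]
        targets = [y_train[i] for i in nearest]
        return mean(targets), max(stdev(targets), 1e-6)  # guard against zero spread
    return predict

random.seed(1)
xs = [random.uniform(0, 6) for _ in range(300)]
ys = [math.sin(x) + random.gauss(0, 0.2) for x in xs]
x_tr, y_tr, x_val, y_val = xs[:200], ys[:200], xs[200:], ys[200:]

scores = {}
for k in (2, 10, 50):  # candidate hyperparameter values
    model = knn_gaussian(x_tr, y_tr, k)
    scores[k] = mean(crps_gaussian(*model(x), y) for x, y in zip(x_val, y_val))

best_k = min(scores, key=scores.get)
print({k: round(s, 4) for k, s in scores.items()}, "best k:", best_k)
```

Because CRPS penalises both miscalibrated spread and biased location, it can prefer a different `k` than a point metric such as mean squared error would.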
Probability Distributions
Symbolic Distribution Objects
skpro includes symbolic probability distributions with:
Explicit mathematical representations
pandas DataFrame–based value domains
pandas-like interfaces for manipulation and inspection
These distributions can be used consistently across prediction, evaluation, and downstream analysis.
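The idea of a distribution object whose values are indexed by row labels, like a pandas DataFrame, can be sketched without pandas. The class below is a hypothetical toy stand-in, not skpro's distribution API: it stores one Gaussian per row label, evaluates methods analytically from the parameters rather than from samples, and returns results keyed by row label.

```python
from statistics import NormalDist

class ToyNormalTable:
    """Hypothetical stand-in for an indexed, vectorised Normal distribution.

    Each row label maps to its own N(mu, sigma^2); methods return
    dictionaries keyed by row label, mimicking a labelled column.
    """

    def __init__(self, mu, sigma, index):
        self.index = list(index)
        self._dists = {i: NormalDist(m, s) for i, m, s in zip(self.index, mu, sigma)}

    def mean(self):
        return {i: d.mean for i, d in self._dists.items()}

    def var(self):
        return {i: d.variance for i, d in self._dists.items()}

    def cdf(self, x):
        # x maps each row label to the value at which that row's CDF is evaluated
        return {i: d.cdf(x[i]) for i, d in self._dists.items()}

    def ppf(self, p):
        # the same probability level p is applied to every row
        return {i: d.inv_cdf(p) for i, d in self._dists.items()}

    def subset(self, labels):
        """Select rows by label, returning a new distribution object."""
        return ToyNormalTable(
            [self._dists[i].mean for i in labels],
            [self._dists[i].stdev for i in labels],
            labels,
        )

dist = ToyNormalTable(mu=[0.0, 5.0, 10.0], sigma=[1.0, 2.0, 0.5], index=["a", "b", "c"])
print(dist.mean())                     # {'a': 0.0, 'b': 5.0, 'c': 10.0}
print(dist.ppf(0.5))
print(dist.subset(["b", "c"]).var())   # {'b': 4.0, 'c': 0.25}
```

Keeping the row labels attached to every result is what lets such objects flow through prediction, evaluation and downstream analysis without losing track of which instance each distribution belongs to.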
Evaluation and Metrics
Probabilistic Performance Metrics
The library provides a comprehensive set of metrics for evaluating probabilistic predictions, including:
Pinball loss
Empirical coverage
Continuous Ranked Probability Score (CRPS)
Survival-specific loss functions
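Together, these metrics assess the quality of the predicted uncertainty, such as the calibration of intervals and the accuracy of quantiles and full distributions, rather than point accuracy alone.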
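The three generic metrics can be written down in a few lines. The sketch below is a pure-Python illustration of their standard definitions: pinball loss for one quantile level, empirical coverage of prediction intervals, and the closed-form CRPS of a Gaussian predictive distribution. The function names and the numbers are made up for the example; these are not skpro's metric classes.

```python
import math
from statistics import NormalDist

def pinball_loss(y_true, y_quantile, alpha):
    """Mean pinball (quantile) loss for quantile level alpha."""
    losses = [
        alpha * (y - q) if y >= q else (1 - alpha) * (q - y)
        for y, q in zip(y_true, y_quantile)
    ]
    return sum(losses) / len(losses)

def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    inside = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    return sum(inside) / len(inside)

def crps_normal(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (lower is better)."""
    z = (y - mu) / sigma
    std = NormalDist()
    return sigma * (z * (2 * std.cdf(z) - 1) + 2 * std.pdf(z) - 1 / math.sqrt(math.pi))

y_true = [1.0, 2.0, 3.0, 4.0]
print(pinball_loss(y_true, [1.5, 1.5, 2.5, 4.5], alpha=0.9))
print(empirical_coverage(y_true, [0.5, 2.5, 2.0, 3.0], [1.5, 3.0, 4.0, 5.0]))  # 0.75
print(crps_normal(0.0, mu=0.0, sigma=1.0))  # about 0.2337
```

Coverage alone rewards arbitrarily wide intervals, which is why it is usually reported alongside a proper scoring rule such as pinball loss or CRPS.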
Ecosystem Compatibility
Integration with scikit-learn and sktime
skpro is fully compatible with scikit-learn and sktime, enabling hybrid workflows such as:
Building probabilistic forecasters from deterministic regressors
Combining time series and tabular probabilistic models
Reusing existing estimators with added uncertainty modeling
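The first of these workflows, turning a deterministic regressor into a probabilistic forecaster, can be sketched as a reduction: lagged values become tabular features, a point regressor is fitted on them, and the spread of its residuals supplies a Gaussian predictive distribution. The synthetic AR(1) series, the lag-1 least-squares regressor and the Gaussian residual assumption are all choices made for illustration; the code uses neither sktime nor skpro.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(42)
# Synthetic AR(1) series: y[t] = 0.8 * y[t-1] + noise
series = [0.0]
for _ in range(300):
    series.append(0.8 * series[-1] + random.gauss(0, 1))

# Reduction: each (y[t-1], y[t]) pair becomes one row of a tabular regression problem.
X = series[:-1]
y = series[1:]

# Deterministic base regressor: least-squares line through the lag-1 pairs.
mx, my = mean(X), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(X, y)) / sum((a - mx) ** 2 for a in X)
intercept = my - slope * mx

# Residual-based uncertainty: assume Gaussian errors with the residual standard deviation.
residuals = [b - (intercept + slope * a) for a, b in zip(X, y)]
sigma = stdev(residuals)

# One-step-ahead probabilistic forecast from the last observed value.
forecast = NormalDist(intercept + slope * series[-1], sigma)
lo, hi = forecast.inv_cdf(0.05), forecast.inv_cdf(0.95)
print(f"slope={slope:.2f}, point forecast={forecast.mean:.2f}, 90% interval=({lo:.2f}, {hi:.2f})")
```

Swapping the least-squares line for any other point regressor leaves the reduction unchanged, which is the appeal of combining tabular and time series tooling.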
Interoperability with External Libraries
The project curates interfaces to third-party probabilistic libraries such as cyclic-boosting, MAPIE, and ngboost.
Use Cases
Uncertainty-aware regression modeling
Risk-sensitive decision support systems
Survival and time-to-event analysis
Probabilistic forecasting pipelines
Research in uncertainty quantification and model evaluation
Open Source
skpro is released under the BSD 3-Clause License and developed openly by the sktime community. It follows modern open-source practices, includes extensive documentation and tutorials, and welcomes contributions of all kinds.
GC.OS supports skpro as an open-source project that advances interoperable, reliable, and transparent probabilistic machine learning.