sklearn-pmml-model is an open-source Python library I built and maintain. It adds first-class import support for PMML to scikit-learn, the most widely used machine learning library in Python. With it, models trained in any framework that exports PMML (R, SAS, KNIME, xgboost, and many others) can be loaded into scikit-learn and used as if they were trained there.

Every team picks its own ML framework, every framework has its own model file format, and every post-hoc analysis tool (interpretability, feature importance, partial dependence) tends to support only the framework it was written against. The result is a fragmented ecosystem that slows down development, locks businesses into a single stack, and makes it hard for teams using different tools to collaborate.

The Predictive Model Markup Language (PMML) was designed to solve this. It is a mature XML-based standard for describing almost any kind of predictive model, supported broadly across open-source frameworks (caret, mlr, xgboost) and enterprise tools (SAS Enterprise Miner, KNIME, Cloudera, PEGA). The catch: almost every PMML-supporting framework can only export the format. Importing PMML into Python was conspicuously missing. That is the gap sklearn-pmml-model fills.

The library

sklearn-pmml-model adds import functionality for all major estimator classes in scikit-learn, with an API that mirrors scikit-learn itself. Each estimator is a subclass of its scikit-learn counterpart, so an imported PMML model is immediately compatible with anything that expects a scikit-learn estimator: pipelines, cross-validation, grid search, and third-party explainability libraries.

Figure: API design of sklearn-pmml-model. PMML estimators extend both a corresponding scikit-learn estimator and a base class.
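
The inheritance pattern in the figure can be sketched in plain Python. This is an illustrative stand-in, not the library's actual code: the class names `PMMLBaseSketch`, `LinearModelStandIn` and `PMMLLinearSketch`, as well as the tiny inline PMML document, are hypothetical, and the standard library's XML parser is assumed purely for the sake of a self-contained example.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written PMML regression model (hypothetical example data).
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
""".strip()

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


class LinearModelStandIn:
    """Stand-in for a scikit-learn linear estimator: predicts from coef_ and intercept_."""

    def predict(self, rows):
        return [
            self.intercept_ + sum(c * v for c, v in zip(self.coef_, row))
            for row in rows
        ]


class PMMLBaseSketch:
    """Hypothetical base class: owns the parsed PMML tree."""

    def __init__(self, pmml_text):
        self.root = ET.fromstring(pmml_text)


class PMMLLinearSketch(PMMLBaseSketch, LinearModelStandIn):
    """Extends both the PMML base class and the estimator stand-in."""

    def __init__(self, pmml_text):
        super().__init__(pmml_text)
        table = self.root.find(".//pmml:RegressionTable", NS)
        # Copy the PMML parameters into the attributes the estimator reads.
        self.intercept_ = float(table.get("intercept"))
        self.coef_ = [
            float(p.get("coefficient"))
            for p in table.findall("pmml:NumericPredictor", NS)
        ]


model = PMMLLinearSketch(PMML_DOC)
predictions = model.predict([[1.0, 2.0], [0.0, 0.0]])
```

Because the subclass only fills in fitted attributes and inherits `predict` unchanged, anything that works with the parent estimator also works with the imported model. That is the property the real library relies on when it subclasses scikit-learn estimators.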

In practice, loading a model is a one-liner:

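from sklearn_pmml_model.ensemble import PMMLForestClassifier

# Xte and yte are held-out test features and labels prepared beforehand.
clf = PMMLForestClassifier(pmml="models/randomForest.pmml")
clf.predict(Xte)     # predicted class labels
clf.score(Xte, yte)  # mean accuracy, as for any scikit-learn classifier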

From the user’s perspective, there is no PMML in sight after the constructor. Some PMML constructs do not map cleanly onto what scikit-learn supports out of the box (categorical features, multi-split decision trees), and the library transparently bridges those gaps under the hood.

Performance

Because sklearn-pmml-model parses PMML directly into native scikit-learn estimators rather than running it through a separate scoring engine, it benefits from scikit-learn’s heavily optimized prediction code. Across two UCI datasets (Wine and Breast Cancer), prediction ranges from roughly on par with the closest alternative, PyPMML (0.91× in the one case where it is slower), to about 8.5× faster:

Dataset        Linear  Naive Bayes  Decision Tree  Random Forest  Gradient Boosting
Wine           3.23×   1.40×        5.80×          1.09×          1.05×
Breast Cancer  0.91×   1.36×        8.47×          1.34×          2.33×
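
For readers who want to produce comparable ratios themselves, here is a minimal timing sketch. It assumes each figure in the table is the baseline's prediction time divided by this library's prediction time, which the text implies but does not state outright. The helpers `best_time` and `speedup` and the two stand-in predict functions are hypothetical, and this is not the benchmark procedure behind the numbers above.

```python
import timeit


def best_time(predict, rows, repeat=5, number=20):
    """Best-of-`repeat` wall time, in seconds, for `number` calls of predict(rows)."""
    return min(timeit.repeat(lambda: predict(rows), repeat=repeat, number=number))


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is; values below 1 mean it is slower."""
    return baseline_seconds / candidate_seconds


# Hypothetical stand-ins for two scoring paths over the same model.
def baseline_predict(rows):
    # Scores one row at a time, re-creating the weights on every row.
    return [sum(w * v for w, v in zip([0.5, 1.5, -2.0], row)) for row in rows]


WEIGHTS = (0.5, 1.5, -2.0)


def candidate_predict(rows):
    # Same arithmetic with the weights unpacked once up front.
    w0, w1, w2 = WEIGHTS
    return [w0 * a + w1 * b + w2 * c for a, b, c in rows]


rows = [(float(i), float(i + 1), float(i + 2)) for i in range(1000)]

baseline_seconds = best_time(baseline_predict, rows)
candidate_seconds = best_time(candidate_predict, rows)
ratio = speedup(baseline_seconds, candidate_seconds)
```

Taking the minimum over several repeats reduces noise from other processes. To compare real scoring paths, swap the stand-ins for the `predict` methods of the two models under test and feed both the same rows.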

This matters for any workflow that requires many predictions: partial dependence plots, feature contribution methods, sensitivity analysis, anything that perturbs and re-evaluates a model thousands of times. That is the use case I cared most about.
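
To make the cost concrete, here is a minimal pure-Python sketch of a one-dimensional partial dependence computation. `CountingModel` and `partial_dependence_1d` are hypothetical names; the model is a stand-in for any estimator that exposes `predict`, and it counts how many rows it is asked to score.

```python
class CountingModel:
    """Hypothetical stand-in for an estimator; counts the rows it scores."""

    def __init__(self):
        self.rows_scored = 0

    def predict(self, rows):
        self.rows_scored += len(rows)
        return [2.0 * row[0] + row[1] for row in rows]


def partial_dependence_1d(model, rows, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite one feature in every row, leaving the others untouched.
        perturbed = [row[:feature] + [value] + row[feature + 1:] for row in rows]
        preds = model.predict(perturbed)
        curve.append(sum(preds) / len(preds))
    return curve


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
grid = [0.0, 0.5, 1.0]

model = CountingModel()
curve = partial_dependence_1d(model, rows, feature=0, grid=grid)
# curve is [3.0, 4.0, 5.0]; model.rows_scored is 9 (3 rows x 3 grid points)
```

The number of scored rows is the dataset size times the grid size, per feature. With thousands of rows, dozens of grid points and many features, prediction speed quickly dominates the run time of the analysis.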

Open source

Source code lives on GitHub, pre-compiled wheels for Linux, macOS and Windows are on PyPI, and a continuous integration workflow runs the full integration test suite on every supported platform for every contribution.

Since release, the library has been downloaded over 880,000 times on PyPI. It makes switching ML environments far less costly, and lets any scikit-learn-based project support models trained in a wide range of other frameworks, languages and enterprise tools.