Quickstart¶
Regression¶
import numpy as np
from scikit_opls import OPLS
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 30))
y = X[:, 0] + 0.1 * rng.normal(size=80)
model = OPLS(n_components=1, n_orthogonal=2).fit(X, y)
model.predict(X) # predicted y
model.transform(X) # predictive scores
model.transform_orthogonal(X) # orthogonal scores
model.r2y_, model.rmse_ # training-fit summaries
Two-block O2PLS¶
import numpy as np
from scikit_opls import O2PLS
rng = np.random.default_rng(1)
Z = rng.normal(size=(80, 2))
X = Z @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(80, 20))
Y = Z @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(80, 10))
model = O2PLS(n_components=2, n_x_orthogonal=1, n_y_orthogonal=1).fit(X, Y)
model.predict(X) # predicted joint Y structure, raw Y units
model.predict_x(Y) # predicted joint X structure, raw X units
model.transform(X) # X-side joint scores
model.transform_y(Y) # Y-side joint scores
O2PLS.coef_filtered_ maps scaled, X-orthogonally-filtered X to scaled
predicted Y; it is not a raw-space sklearn coef_ alias. The joint loadings use
the O2PLS orthonormal convention, so joint loadings equal joint weights.
Requested orthogonal components may be truncated with a ConvergenceWarning when
the enlarged preliminary subspace leaves no numerically resolvable block-specific
residual variation, especially when the requested component total approaches a
block's rank or feature dimension.
Choosing n_orthogonal by cross-validation¶
Use scikit-learn's GridSearchCV directly — OPLS has no path structure, so a
dedicated …CV class buys nothing. scoring=None gives out-of-fold R2, which
equals Q2 for OPLS.
from sklearn.model_selection import GridSearchCV
from scikit_opls import OPLS
search = GridSearchCV(
OPLS(n_components=1), {"n_orthogonal": list(range(10))}, cv=7
).fit(X, y)
search.best_params_["n_orthogonal"] # selected count
search.cv_results_["mean_test_score"] # out-of-fold R2/Q2 path
search.best_estimator_.predict(X) # final model refit on all data
Parsimonious selection¶
To prefer the fewest orthogonal components whose score is within a tolerance of
the best, pass a refit callable:
import numpy as np
def parsimonious_refit(cv_results, tol=0.01):
scores = np.asarray(cv_results["mean_test_score"], dtype=float)
counts = np.asarray(cv_results["param_n_orthogonal"], dtype=int)
within = np.flatnonzero(scores >= np.nanmax(scores) - tol)
return int(within[np.argmin(counts[within])])
GridSearchCV(
OPLS(n_components=1), {"n_orthogonal": list(range(10))},
cv=7, refit=parsimonious_refit,
).fit(X, y)
Classification (OPLS-DA)¶
from sklearn.model_selection import GridSearchCV
from scikit_opls import OPLSDA
y_lab = np.where(X[:, 0] > 0, "case", "ctrl")
clf = OPLSDA(n_components=1, n_orthogonal=2).fit(X, y_lab)
clf.predict(X) # class labels
clf.decision_function(X) # raw signed OPLS regression output
# Probabilities via cross-fitted calibration when each class has enough samples
# for the chosen calibration CV split:
from sklearn.calibration import CalibratedClassifierCV
calibrated_clf = CalibratedClassifierCV(clf, cv=5).fit(X, y_lab)
calibrated_clf.predict_proba(X)
# Cross-validated OPLS-DA selection: an int cv is stratified automatically.
GridSearchCV(
OPLSDA(), {"n_orthogonal": list(range(10))}, cv=5, scoring="roc_auc"
).fit(X, y_lab)
Inspection, plotting and validation¶
from scikit_opls.plotting import OPLSScoresDisplay, SPlotDisplay
from scikit_opls.validation import permutation_test
model.vip_ # predictive VIP per feature (lazy)
# Draw score plot (t_pred vs t_ortho). Supports component selection for multi-component PLS
OPLSScoresDisplay.from_estimator(
model, X, y, predictive_component=0, orthogonal_component=0
)
# Draw S-plot (covariance vs correlation) for a specific predictive component
SPlotDisplay.from_estimator(model, X, component=0)
# Permutation significance testing
permutation_test(OPLS(n_orthogonal=2), X, y)
Pipeline support in plotting
Diagnostic plotting displays support OPLS, OPLSDA, pipelines ending in one,
and fitted search meta-estimators exposing best_estimator_ around either shape.
When passing a pipeline, pass raw X as expected by the pipeline. When passing the
final OPLS step directly, pass the already transformed matrix. For pipeline
S-plots, points are in the transformed feature space received by the final OPLS
step.