We had a model that performed well, but we couldn't explain why. Feature importance was inconsistent, and we suspected overfitting.
Ran SHAP across stratified validation slices. Suddenly the picture became clear: a handful of features were carrying the load, and the rest were noise.
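Roughly what that check looks like, as a minimal sketch: it assumes a scikit-learn gradient-boosted model and the `shap` package, and the data, column names, and slice definition are stand-ins, not the actual project setup.

```python
# Minimal sketch: mean |SHAP| per feature, computed per validation slice.
# Model, data, and slicing are illustrative stand-ins.
import numpy as np
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Stand-in data; in practice this is the project's feature frame.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=4, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
explainer = shap.TreeExplainer(model)

# "Stratified validation slices": here just the label strata for illustration.
for label in np.unique(y_val):
    slice_X = X_val[y_val == label]
    shap_values = explainer.shap_values(slice_X)
    # Mean |SHAP| per feature shows which features carry the load in this slice.
    importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
    print(f"slice label={label}")
    print(importance.sort_values(ascending=False).head(5), "\n")
```

If a handful of features dominate the ranking in every slice while the rest hover near zero, that is the pattern described above.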
Now SHAP runs are standard before anything gets versioned. If I can’t explain what’s driving the prediction, it doesn’t ship.
[[ML]] [[Serendipity]]