The KS Statistic — Detecting Distribution Shift in Trading Models

When a model stops behaving as expected, the KS statistic is often the first metric to say so.

In trading model evaluation, the KS statistic measures how far the distribution of returns in one period differs from the distribution in another. If a model behaved one way in its training data and a different way in recent markets, the KS statistic picks that up, which makes it one of the more sensitive tools for detecting when a model has started to drift from its validated behaviour.

What the Kolmogorov-Smirnov test actually measures

The Kolmogorov-Smirnov test compares two probability distributions by finding the maximum vertical distance between their cumulative distribution functions. That maximum gap is the KS statistic: it ranges from 0 (the two distributions coincide) to 1 (they do not overlap at all), and the larger it is, the more the two distributions diverge.
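As a minimal sketch of that definition, the two-sample KS statistic can be computed from the empirical cumulative distribution functions using only the Python standard library. The function name `ks_statistic` and the return figures are illustrative assumptions, not darwintIQ's implementation.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # Empirical CDFs are step functions that only change at observed values,
    # so it is enough to compare them at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample A that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample B that is <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical per-period returns, for illustration only.
baseline = [-0.02, -0.01, 0.00, 0.01, 0.02]
recent = [-0.05, -0.04, -0.03, 0.00, 0.01]

print(ks_statistic(baseline, recent))  # 0.6
```

Here three of the five recent returns sit below every baseline return, so the two cumulative curves are 0.6 apart at their widest point.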

In a model validation context, the two distributions being compared are typically a model's baseline return distribution (established during evaluation) and its more recent return distribution. If the model was well-constructed and market conditions remain broadly similar, these should look statistically alike. If the model was overfit to historical data, or if market conditions have shifted materially, the distributions will differ — and the KS statistic will reflect that.

A KS statistic close to zero indicates the distributions are similar: the model is producing results that look statistically consistent across periods. A high KS statistic is a warning: something has changed in the model's internal behaviour, in the market regime it is encountering, or in both. Neither a high nor a low reading should be interpreted in isolation. The KS statistic is most informative when compared over time and against other distribution metrics.

KS statistic vs other distribution similarity tests

darwintIQ tracks several distribution similarity metrics alongside the KS statistic, including Jensen-Shannon Divergence, Wasserstein Distance, PSI, and Mutual Information. Each approaches the same underlying question — has this model's behaviour changed? — from a different mathematical angle, with different sensitivity characteristics.

The KS statistic is non-parametric, meaning it makes no assumptions about the shape of the underlying distributions. This is an advantage when working with trading returns, which are rarely normally distributed and often exhibit skew and fat tails. However, the KS statistic is most sensitive to differences near the centre of a distribution, and it records only the size of the largest gap between the two cumulative distributions, not how far returns have moved. For detecting shifts in the extremes (tail behaviour, the worst losses), Wasserstein Distance tends to be more informative, because it accounts for how far the distribution has moved across its full range.
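The difference in sensitivity can be illustrated with a small sketch. The returns below are made up (in basis points), and `wasserstein_1d` uses the sorted-sample shortcut for the one-dimensional Wasserstein-1 distance, which assumes both samples have the same size. Neither function is taken from darwintIQ.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def wasserstein_1d(sample_a, sample_b):
    """1-D Wasserstein-1 distance; this shortcut assumes equal sample sizes."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch only handles equal-size samples")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(x - y) for x, y in pairs) / len(sample_a)


# Hypothetical returns in basis points: -50, -49, ..., 49.
baseline = list(range(-50, 50))
# Scenario 1: every return moves up by 5 bps.
centre_shift = [r + 5 for r in baseline]
# Scenario 2: only the five worst losses change, becoming ten times larger.
tail_shift = [r * 10 if r <= -46 else r for r in baseline]

for name, sample in [("centre shift", centre_shift), ("tail shift", tail_shift)]:
    ks = round(ks_statistic(baseline, sample), 4)
    wd = round(wasserstein_1d(baseline, sample), 4)
    print(name, "KS:", ks, "Wasserstein:", wd)
```

With these numbers both scenarios produce a KS statistic of 0.05, while the Wasserstein distance is 5 bps for the centre shift and 21.6 bps for the tail shift. The KS statistic sees that 5% of the probability mass has moved in each case, but not that the tail losses moved much further.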

Using multiple metrics in combination, as darwintIQ does, produces a more complete picture than any single measure alone. A model that triggers elevated readings across KS statistic, JSD, and PSI simultaneously is sending a much stronger signal than one that shows a single elevated metric. Convergent evidence from multiple tests is far harder to dismiss as statistical noise.
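One simple way to express the idea of convergent evidence is to count how many metrics exceed their alert levels at the same time. The sketch below is only an illustration: the metric names, the threshold values, and the `min_agreeing` parameter are all assumptions, and darwintIQ's actual combination logic is not described here.

```python
def breached_metrics(readings, thresholds):
    """Return the names of metrics whose reading exceeds its threshold."""
    return sorted(
        name
        for name, value in readings.items()
        if name in thresholds and value > thresholds[name]
    )


def drift_signal(readings, thresholds, min_agreeing=2):
    """Label a set of readings as 'convergent', 'isolated', or 'none'."""
    breached = breached_metrics(readings, thresholds)
    if len(breached) >= min_agreeing:
        return "convergent", breached
    if breached:
        return "isolated", breached
    return "none", breached


# Illustrative alert levels only; real thresholds depend on the data.
thresholds = {"ks": 0.20, "jsd": 0.10, "psi": 0.25}

model_a = {"ks": 0.31, "jsd": 0.14, "psi": 0.30}  # all three elevated
model_b = {"ks": 0.24, "jsd": 0.03, "psi": 0.05}  # only KS elevated

print(drift_signal(model_a, thresholds))  # ('convergent', ['jsd', 'ks', 'psi'])
print(drift_signal(model_b, thresholds))  # ('isolated', ['ks'])
```

Requiring agreement between metrics trades some sensitivity for fewer false alarms, which is the point the paragraph above makes about statistical noise.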

How to interpret the KS statistic in darwintIQ

In darwintIQ's evaluation framework, the KS statistic is calculated as part of the continuous model assessment that runs on a rolling 4-hour window. This means the metric reflects current market conditions rather than a fixed snapshot from months ago, giving traders a live view of whether a model's behaviour remains consistent.
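A rolling-window calculation of this kind might look roughly like the sketch below. It assumes the window means "returns observed in the last four hours", compared against a fixed baseline sample. The function `windowed_ks`, the data layout, and the timestamps are hypothetical, and darwintIQ's real pipeline may differ.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def windowed_ks(baseline, timed_returns, now, window=timedelta(hours=4)):
    """KS statistic between the baseline and returns inside (now - window, now]."""
    recent = [r for ts, r in timed_returns if now - window < ts <= now]
    if not recent:
        return None  # nothing observed in the window, so no reading
    return ks_statistic(baseline, recent)


# Hypothetical baseline returns and timestamped recent returns.
baseline = [-2, -1, 0, 1, 2]
timed_returns = [
    (datetime(2024, 1, 1, 7, 0), 5),    # older than 4 hours, ignored
    (datetime(2024, 1, 1, 9, 0), -1),
    (datetime(2024, 1, 1, 10, 0), 0),
    (datetime(2024, 1, 1, 11, 0), 1),
    (datetime(2024, 1, 1, 12, 0), 2),
]

now = datetime(2024, 1, 1, 12, 0)
print(round(windowed_ks(baseline, timed_returns, now), 4))  # 0.2
```

Recomputing the statistic as `now` advances drops old observations out of the window, so an outlier such as the 07:00 return stops influencing the reading once it is more than four hours old.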

When reviewing a model in the Trader Detail view, the distribution similarity metrics — including the KS statistic — help you assess whether the model is performing in line with its historical pattern. A model with a low KS statistic is demonstrating that its return distribution has remained stable over time. One with an elevated KS statistic may still be generating profits in recent periods, but that profitability sits on less certain ground, because the distribution of results has shifted.

This metric is particularly useful when read alongside the Robustness Score and Stability Score, which capture related but distinct dimensions of model reliability. A model that scores well on robustness and stability but shows an elevated KS statistic is worth investigating — the metrics are telling different parts of the same story.

Final thoughts

The KS statistic is a non-parametric measure of distributional similarity. Applied to trading model evaluation, it measures whether a model's return distribution has shifted between periods, which is often an early indicator of overfitting or regime sensitivity. darwintIQ includes the KS statistic as part of a suite of distribution-based evaluation metrics, so that model assessments capture structural changes in behaviour rather than relying solely on raw performance figures.