Backtesting VaR

Backtesting checks whether a VaR model is honest by comparing predicted VaR to realised P&L. A day on which the actual loss exceeds the VaR is an exception (or violation). For a correct 99% VaR, exceptions should occur about 1% of the time, so over 250 trading days you expect roughly 2.5. The hit ratio (exceptions divided by days) should match the model’s tail probability. Too many exceptions means the model understates risk; far too few means it is needlessly conservative. The Basel framework formalises this with a traffic-light test: a green zone of up to 4 exceptions in 250 days, an amber zone of 5 to 9 that raises the capital multiplier, and a red zone of 10 or more that can trigger model rejection.
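
The counting step can be sketched in a few lines. This is a minimal illustration, assuming daily P&L and VaR arrive as plain Python lists and that VaR is quoted as a positive number; the function names `count_exceptions` and `basel_zone` and the sample figures are hypothetical, not taken from any library.

```python
def count_exceptions(pnl, var):
    """Count days whose loss (-P&L) exceeds that day's VaR (a positive number)."""
    if len(pnl) != len(var):
        raise ValueError("pnl and var must have the same length")
    return sum(1 for p, v in zip(pnl, var) if -p > v)


def basel_zone(n_exceptions):
    """Traffic-light zone for a 99% VaR backtested over 250 days."""
    if n_exceptions <= 4:
        return "green"
    if n_exceptions <= 9:
        return "amber"
    return "red"


# Illustrative data (made up): 250 days, constant VaR of 100,
# and three days whose loss is larger than 100.
pnl = [10.0] * 250
for day, value in [(20, -150.0), (97, -120.0), (201, -101.0)]:
    pnl[day] = value
var = [100.0] * 250

n = count_exceptions(pnl, var)
hit_ratio = n / len(pnl)
expected = (1 - 0.99) * len(pnl)
print(n, f"{hit_ratio:.2%}", round(expected, 2), basel_zone(n))
```

The zone boundaries are hard-coded for the 250-day, 99% case described above; a different window or confidence level would need different cut-offs.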

Try it yourself

Backtesting VaR — counting exceptions

A correct VaR is breached on a known fraction of days, p = 1 − α. An exception is a day whose P&L falls below the −VaR line, that is, a day whose loss exceeds VaR. Count the exceptions, compare the count to the expected p·T, and the Kupiec test asks whether the gap is more than luck. Set the model's volatility below the true volatility and exceptions pile up.
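
The effect of understating volatility can be reproduced with a small simulation. This is a sketch under strong assumptions that are not part of the text above: daily P&L is drawn from a zero-mean normal distribution, and the model's VaR is the normal quantile times its own (possibly wrong) volatility. The function name and the `vol_ratio` parameter are illustrative.

```python
import random
from statistics import NormalDist


def simulate_exceptions(vol_ratio, alpha=0.99, days=250, true_vol=1.0, seed=42):
    """Count exceptions when the model's volatility is vol_ratio times the true one."""
    rng = random.Random(seed)                 # fixed seed: same P&L path on every call
    z = NormalDist().inv_cdf(alpha)           # normal quantile, about 2.33 at 99%
    var = z * vol_ratio * true_vol            # model VaR, quoted as a positive loss
    pnl = [rng.gauss(0.0, true_vol) for _ in range(days)]
    return sum(1 for x in pnl if -x > var)    # exception: loss larger than VaR


for ratio in (1.0, 0.8, 0.6):
    print(ratio, simulate_exceptions(ratio))
```

Because every call reuses the same seed, the P&L path is identical across ratios, so a lower ratio (a smaller VaR) can only produce the same number of exceptions or more.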

[Interactive chart: daily P&L (US$) by trading day (1 … 250), with the −VaR threshold drawn at −US$279k and exception days (loss > VaR) marked.]

Example readout with the model matching true volatility (1.00×, correctly specified) and sample size T = 250 days:

  • Exceptions N vs expected p·T: 3 vs 2.5
  • Hit ratio N/T: 1.20%; promised rate p: 1.00%
  • VaR threshold: US$279,156
  • Kupiec LRuc: 0.09 vs critical value 3.841 (PASS)
  • Basel zone: GREEN (3 in 250)
3 exceptions against 2.5 expected is within sampling noise, so the model passes the unconditional-coverage test. Kupiec checks only the count, not whether breaches cluster in time.
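
The Kupiec statistic for this case can be checked by hand. The sketch below uses the standard likelihood-ratio form of the unconditional-coverage test, comparing the binomial likelihood at the promised rate p with the likelihood at the observed rate N/T. The function name `kupiec_lr` is illustrative, and 3.841 is the chi-square critical value with one degree of freedom at 5% significance.

```python
import math


def kupiec_lr(n_exceptions, days, p):
    """Kupiec unconditional-coverage likelihood ratio (chi-square, 1 d.o.f.)."""
    def loglik(q):
        # Binomial log-likelihood; terms with a zero count are skipped
        # so that 0 * log(0) is treated as 0.
        out = 0.0
        if n_exceptions > 0:
            out += n_exceptions * math.log(q)
        if days - n_exceptions > 0:
            out += (days - n_exceptions) * math.log(1 - q)
        return out

    observed_rate = n_exceptions / days
    return -2.0 * (loglik(p) - loglik(observed_rate))


CRITICAL_95 = 3.841  # chi-square(1) critical value at 5% significance

lr = kupiec_lr(3, 250, 0.01)
print(round(lr, 2), "PASS" if lr < CRITICAL_95 else "FAIL")  # 0.09 PASS
```

The same function applied to 9 exceptions in 250 days gives a statistic well above 3.841, so that count would fail the test.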

Why it matters

A weather forecaster who says "10% chance of rain" should see rain on about 1 in 10 of those days over a long run. VaR is the same kind of promise about losses, and backtesting audits it after the fact. Count the days the loss broke through the VaR line: roughly the right number means the model is calibrated, too many means it is dangerously optimistic, almost none means it is wasting capital. Counting alone (unconditional coverage) is the first check; whether the exceptions cluster is the second, sharper one.

Formulas

Expected number of exceptions
$$E[N] = (1-\alpha)\,T$$
Over $T$ days a correct VaR at confidence $\alpha$ produces about $(1-\alpha)T$ exceptions. For 99% VaR over 250 days, $E[N] = 0.01 \times 250 = 2.5$.
Hit (indicator) sequence
$$I_t = \mathbf{1}\{\, L_t > \mathrm{VaR}_{\alpha,t} \,\}$$
$I_t = 1$ on an exception day and 0 otherwise. A good model has the right average number of ones (correct coverage) and ones that do not bunch together (independence).
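
The independence property can also be tested. One common approach, not named in the text above, is Christoffersen's first-order Markov test: compare the exception rate on days that follow an exception with the rate on days that follow a quiet day. The sketch below is a minimal version of that likelihood ratio; the helper names `xlogy`, `independence_lr` and `make_hits` are illustrative, and the two hit sequences are made up.

```python
import math


def xlogy(count, prob):
    """count * log(prob), with the convention that 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * math.log(prob)


def independence_lr(hits):
    """Likelihood ratio for first-order independence of a 0/1 hit sequence."""
    n = [[0, 0], [0, 0]]                       # n[i][j]: state i on one day, state j the next
    for prev, curr in zip(hits, hits[1:]):
        n[prev][curr] += 1
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    after_quiet, after_hit = n00 + n01, n10 + n11
    if after_quiet == 0 or after_hit == 0:
        return 0.0                             # only one starting state seen: nothing to compare
    pi01 = n01 / after_quiet                   # P(exception | no exception yesterday)
    pi11 = n11 / after_hit                     # P(exception | exception yesterday)
    pi = (n01 + n11) / (after_quiet + after_hit)
    ll_null = xlogy(n00 + n10, 1 - pi) + xlogy(n01 + n11, pi)
    ll_alt = (xlogy(n00, 1 - pi01) + xlogy(n01, pi01)
              + xlogy(n10, 1 - pi11) + xlogy(n11, pi11))
    return -2.0 * (ll_null - ll_alt)


def make_hits(days, exception_days):
    """Build I_t from a set of (hypothetical) exception day indices."""
    return [1 if d in exception_days else 0 for d in range(days)]


spread = make_hits(250, {50, 100, 150, 200})
clustered = make_hits(250, {100, 101, 102, 103})
print(independence_lr(spread) < 3.841, independence_lr(clustered) > 3.841)  # True True
```

Both sequences contain four exceptions, so a pure count treats them identically; only the clustered one exceeds the 3.841 chi-square critical value.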

Worked examples

Scenario

A bank’s 99% one-day VaR model produced 9 exceptions over the last 250 trading days. In which Basel traffic-light zone does this fall, and what follows?

Solution

Nine exceptions fall in the amber (yellow) zone (5 to 9), against 2.5 expected. The supervisor raises the capital multiplier (from 3 toward 4) and scrutinises the model. The result is statistically concerning: the model is producing 3.6 times the expected number of violations (9 / 2.5), a sign that it understates tail risk. Amber is still a warning rather than outright rejection, which is reserved for the red zone (10 or more).
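
To see how unusual nine exceptions would be for a correct model, the binomial tail probability can be computed directly. This sketch assumes exceptions are independent across days with probability 1% each; the function name `prob_at_least` is illustrative.

```python
from math import comb


def prob_at_least(k, days=250, p=0.01):
    """P(N >= k) when N ~ Binomial(days, p), i.e. when the model is correct."""
    below = sum(comb(days, i) * p**i * (1 - p) ** (days - i) for i in range(k))
    return 1.0 - below


observed = 9
expected = 0.01 * 250
print(observed / expected)         # ratio of observed to expected exceptions
print(prob_at_least(observed))     # chance of 9 or more under a correct model
```

Under these assumptions the tail probability comes out at roughly one in a thousand, which is why nine exceptions draw supervisory attention even though they fall short of the red zone.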

Scenario

In January 2012 JPMorgan’s Chief Investment Office breached its VaR limit on four consecutive days as the "London Whale" credit positions grew. On 30 January the CIO switched to a new VaR model that immediately cut reported VaR roughly in half. How should a backtester read this sequence?

Solution

Four straight breaches are a clear red flag. Strictly, these were breaches of the VaR limit (reported VaR above the approved ceiling) rather than backtesting exceptions (a loss larger than VaR), but the same discipline applies. Under a correct 99% VaR, consecutive exceptions are very unlikely, so a backtester would read a comparable run as evidence of both poor coverage and clustering. Rather than de-risk, the CIO changed the model so the number fell, which masks the exposure rather than removing it. The replacement model was later found to contain spreadsheet and implementation errors and was withdrawn, and the trade ultimately lost over US$6 billion. The lesson is that a VaR drop driven by a model change, not a position change, must be challenged, and that breaches are a signal to investigate the book, not to retune the model.

Common mistakes

  • A model with zero exceptions is the best model. Far too few exceptions means VaR is too conservative and ties up excess capital; the goal is calibration, not the absence of violations.
  • Backtesting needs only the count of exceptions. The number (unconditional coverage) is one test; whether exceptions cluster in time (independence) is a separate and important check.
  • One exception proves the model is broken. A single violation says little; you assess the model statistically over many days against the expected rate, not on any single day.
  • The Basel zones are arbitrary thresholds. They are calibrated to the binomial distribution of exceptions, balancing the chance of wrongly penalising a good model against missing a bad one.
  • A model change that lowers VaR has fixed the risk. Cutting reported VaR by switching models, as JPMorgan’s CIO did in 2012, lowers the number without reducing the position; backtesting exceptions should trigger scrutiny of the book, not a recalibration that hides it.
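
The trade-off behind the zone boundaries can be made concrete with binomial probabilities. The sketch below assumes independent exceptions and compares two cases: a correct 99% model that is nevertheless pushed out of the green zone, and a model whose true exception rate is 3% that still lands in green. The 3% figure is a hypothetical alternative chosen for illustration, and `binom_cdf` is an illustrative helper.

```python
from math import comb


def binom_cdf(k, days, p):
    """P(N <= k) for N ~ Binomial(days, p)."""
    return sum(comb(days, i) * p**i * (1 - p) ** (days - i) for i in range(k + 1))


DAYS, GREEN_MAX = 250, 4

# Type I error: a correct model (true exception rate 1%) leaves the green zone.
type_1 = 1.0 - binom_cdf(GREEN_MAX, DAYS, 0.01)

# Type II error: a model with a true exception rate of 3% (hypothetical) stays green.
type_2 = binom_cdf(GREEN_MAX, DAYS, 0.03)

print(f"correct model flagged: {type_1:.1%}")
print(f"3% model passes as green: {type_2:.1%}")
```

Moving the green boundary up would flag fewer correct models but let more bad ones through, and moving it down would do the reverse.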

Revision bullets

  • Exception (violation): a day whose loss exceeds the VaR
  • Correct 99% VaR yields about 1% exceptions (about 2.5 in 250 days)
  • Hit ratio = exceptions / days should match the tail probability
  • Too many exceptions = understated risk; too few = over-conservative
  • Basel traffic light: green 0-4, amber 5-9 (higher multiplier), red 10+ in 250 days

Sources

  1. Jorion (2007), Ch. 6
    Jorion, P. Value at Risk: The New Benchmark for Managing Financial Risk. 3rd ed. McGraw-Hill, 2007.
    Chapter 6 covers VaR backtesting, exceptions, and the Basel traffic-light approach.
  2. Basel Committee (1996)
    Basel Committee on Banking Supervision. Supervisory Framework for the Use of "Backtesting" in Conjunction with the Internal Models Approach to Market Risk Capital Requirements. Bank for International Settlements, 1996.
    Defines the green, amber, and red traffic-light zones for exception counts.
  3. US Senate PSI (2013), JPMorgan Whale Trades
    U.S. Senate Permanent Subcommittee on Investigations. JPMorgan Chase Whale Trades: A Case History of Derivatives Risks and Abuses. 2013.
    Documents the CIO breaching its VaR limit four days running in January 2012 and adopting a flawed new VaR model that roughly halved reported VaR.
How to cite this page
Dr. Phil's Quant Lab. (2026). Backtesting VaR. Derivatives Atlas. https://phucnguyenvan.com/concept/frm-backtesting