Correlation

/ˌkɒrəˈleɪʃən/ · corr · joint dependence

Graph and data analytics — correlation modeling separates sharp parlay bettors from retail — Image: Pixabay Content License

The independence assumption — and why it fails

Beginner parlay math goes like this: leg 1 wins 50% of the time, leg 2 wins 50% of the time, so the parlay wins 25% of the time. Multiply the probabilities, done. This works only when the two legs are independent — when knowing the outcome of leg 1 tells you nothing about leg 2.

Independence is rare in sports betting. Most natural parlay combinations involve game-script dependence: if the offense moves the ball, multiple offensive props correlate. If a team blows out the opponent, spread, total, and ML all correlate. If a pitcher gets shelled, his strikeout prop and the team's run total correlate. Books know this; they price for it. The customer often doesn't.

Pearson correlation — the basic measure

# Pearson correlation coefficient r
r = cov(X, Y) / (sd(X) × sd(Y))

# Range: -1 (perfectly negative) to +1 (perfectly positive)

# In sports betting context
r = +1.0: identical bets (e.g., Lakers ML and Lakers ML)
r = +0.7: strong positive (QB passing yards Over + WR1 receiving yards Over)
r = +0.4: moderate positive (Team total Over + game total Over)
r =  0.0: independent (Lakers ML + Cowboys ML, different sports)
r = -0.3: moderate negative (Lakers ML + Lakers spread NOT covered)
r = -1.0: opposing (Lakers win + Celtics win on same game)
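When each leg is treated as a win/lose indicator (1 if the leg cashes, 0 otherwise), the formula above reduces to a function of the marginal and joint win probabilities. The sketch below assumes that binary reading; the r values listed above may instead describe the underlying stats (yards, points), and the function names are illustrative rather than taken from any library.

```python
import math

def r_from_joint(p_a: float, p_b: float, p_joint: float) -> float:
    """Pearson r between two 0/1 win indicators (the phi coefficient)."""
    cov = p_joint - p_a * p_b                  # cov(X, Y) for 0/1 variables
    sd_a = math.sqrt(p_a * (1 - p_a))          # sd of a Bernoulli(p_a)
    sd_b = math.sqrt(p_b * (1 - p_b))          # sd of a Bernoulli(p_b)
    return cov / (sd_a * sd_b)

def joint_from_r(p_a: float, p_b: float, r: float) -> float:
    """Invert the formula: joint win probability implied by a given r."""
    sd_a = math.sqrt(p_a * (1 - p_a))
    sd_b = math.sqrt(p_b * (1 - p_b))
    return p_a * p_b + r * sd_a * sd_b

# Two 50/50 legs under different dependence assumptions
print(r_from_joint(0.5, 0.5, 0.25))   # independent legs
print(r_from_joint(0.5, 0.5, 0.50))   # identical bets
print(r_from_joint(0.5, 0.5, 0.00))   # opposing sides of the same game
print(joint_from_r(0.5, 0.5, 0.7))    # joint probability implied by r = 0.7
```

Under this binary reading, two coin-flip legs with r = 0.7 cash together about 42.5% of the time, compared with 25% if they were independent.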

How correlation breaks parlay math

Conditional probability formula: P(A and B) = P(A) × P(B|A)

If A and B are independent: P(B|A) = P(B), and joint = P(A) × P(B).

If A and B are positively correlated: P(B|A) > P(B), and joint > P(A) × P(B).

If A and B are negatively correlated: P(B|A) < P(B), and joint < P(A) × P(B).

# Example: QB Over 250 yards + Team Total Over 24.5
P(QB Over)             = 0.50
P(Team Over)           = 0.50
P(Team Over | QB Over) = 0.85   # conditional probability

# Joint probability (true)
joint = 0.50 × 0.85 = 0.425  # 42.5%

# Naive independent calculation
naive = 0.50 × 0.50 = 0.25   # 25%

# Difference = 17.5 percentage points
# Fair price at the true 42.5%: about +135 (decimal 2.35)
# Fair price under naive math: +300 (decimal 4.00)
# Book typically prices ~+150 to +200, well below the naive +300, because its price accounts for the correlation
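The odds figures in the example come from converting a probability to decimal odds (1 / p) and then to American odds. A minimal sketch of that conversion, with no vig applied; the helper names are hypothetical.

```python
def fair_decimal_odds(p: float) -> float:
    """Vig-free decimal odds for an outcome with win probability p."""
    return 1.0 / p

def decimal_to_american(decimal_odds: float) -> int:
    """Convert decimal odds to American odds, rounded to the nearest integer."""
    if decimal_odds >= 2.0:
        return round((decimal_odds - 1.0) * 100)   # underdog: +X
    return round(-100 / (decimal_odds - 1.0))      # favorite: -X

p_true = 0.50 * 0.85    # P(QB Over) * P(Team Over | QB Over)
p_naive = 0.50 * 0.50   # independence assumption

print(decimal_to_american(fair_decimal_odds(p_true)))    # 135
print(decimal_to_american(fair_decimal_odds(p_naive)))   # 300
```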

SGP pricing — books embed correlation

Scatter plot data — sportsbooks use empirical correlation matrices to price same-game parlays — Image: Pixabay Content License

Modern US sportsbooks (DraftKings, FanDuel, BetMGM, Caesars) all run Monte Carlo simulation pricers for same-game parlays. Process:

① Model the underlying game with statistical distributions (e.g., QB yards ~ N(265, 60), team total ~ N(24, 8)).
② Apply an empirical correlation matrix derived from historical games.
③ Simulate 10,000-100,000 games.
④ Count joint outcomes to estimate joint probability.
⑤ Convert to price, add vig (typically 4-8% per leg, accumulated).
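The simulation steps can be sketched for a two-leg parlay using only the standard library. This is an illustration under stated assumptions, not any book's actual pricer: the two stats are drawn from a bivariate normal with the distributions quoted above, the correlation of 0.65 is an assumed input, and the lines (250 yards, 24.5 points) are borrowed from the earlier example.

```python
import math
import random

def simulate_joint(n_sims: int = 50_000, rho: float = 0.65, seed: int = 0):
    """Estimate P(QB yards > 250), P(team total > 24.5), and P(both)."""
    rng = random.Random(seed)
    qb_hits = team_hits = both_hits = 0
    for _ in range(n_sims):
        # Two independent standard normals, mixed so the stats have correlation rho
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        qb_yards = 265 + 60 * z1
        team_total = 24 + 8 * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        qb_over = qb_yards > 250
        team_over = team_total > 24.5
        qb_hits += qb_over
        team_hits += team_over
        both_hits += qb_over and team_over
    return qb_hits / n_sims, team_hits / n_sims, both_hits / n_sims

p_qb, p_team, p_joint = simulate_joint()
print(f"P(QB Over)   = {p_qb:.3f}")
print(f"P(Team Over) = {p_team:.3f}")
print(f"naive joint  = {p_qb * p_team:.3f}")
print(f"simulated    = {p_joint:.3f}")
```

The simulated joint frequency comes out above the naive product whenever rho is positive. A real pricer would then convert that frequency to odds and apply its margin.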

The customer sees a final price. Behind the scenes, the price reflects the book's best estimate of correlation. If the book's correlation matrix is accurate, the customer gets a fair (vig-adjusted) price. If it's miscalibrated — the model says r = 0.5 but the true value is r = 0.7 — the customer overpays or underpays accordingly.

Where books miscalibrate correlation

Common book weaknesses:

Game-script-dependent positive correlation: when a team falls behind, it passes more, the QB's volume increases, and multiple offensive props move together. Books often under-price this in trailing-team scenarios.
Garbage-time corrections: late-game scoring patterns differ between blowouts and close games. Books model average distributions; actual conditional outcomes differ.
Weather correlation: high wind reduces passing yards, increases rushing, and lowers totals. Books adjust each prop individually for weather but often miss the cross-prop correlations.
Pace correlations: fast-paced basketball teams produce both more possessions and more scoring, and the two correlate positively. Books model pace but sometimes misjudge the correlation strength.
Star-player dependence: if Tatum scores 40, the Celtics likely win and the team total likely goes Over. Books often slightly under-price this star-driven correlation.

Negative correlation — the hidden tax

Negatively correlated parlays look great on paper but cash less often than expected, because the joint probability is lower than naive multiplication suggests.

# Lakers ML + Lakers spread doesn't cover
P(Lakers ML)                       = 0.60
P(Lakers spread covered)           = 0.50
P(spread covered | Lakers ML)      = 0.83  # positive if they win
P(spread not covered | Lakers ML)  = 0.17

# Trying to parlay Lakers ML + Lakers DOESN'T cover
P(joint) = P(LAL ML) × P(spread NOT covered | LAL ML)
         = 0.60 × 0.17 = 0.102  # 10.2%, fair odds ~+880

# Naive calculation
P(naive) = 0.60 × 0.50 = 0.30  # 30%, naive odds +233

# Books offering "+450 boost" on this look attractive
# But actual fair price is +880 — bettor is paying massive hidden vig

The recreational bettor sees "+450 boost" and feels they're getting value. The true fair price is +880. At a 10.2% true win probability, a +450 payout returns about 56 cents per dollar staked, so the book's effective hold is roughly 44%. Bettors miss this because they don't compute the conditional probability.
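The conditional probabilities in the example follow from the two marginals once you note that a favorite can only cover if it wins. A minimal sketch of that arithmetic, assuming the Lakers are the favorite, so that "covered" is a subset of "won"; the variable names are illustrative.

```python
p_win = 0.60      # P(Lakers ML)
p_cover = 0.50    # P(Lakers cover the spread); covering implies winning

# P(cover | win) = P(cover and win) / P(win) = P(cover) / P(win)
p_cover_given_win = p_cover / p_win

# P(win and NOT cover) = P(win) - P(cover)
p_win_no_cover = p_win - p_cover

# What naive multiplication of the marginals would give instead
naive = p_win * (1 - p_cover)

print(f"P(cover | win)       = {p_cover_given_win:.3f}")
print(f"P(win and not cover) = {p_win_no_cover:.3f}")
print(f"naive product        = {naive:.3f}")
```

The exact joint is 10%; the 10.2% in the worked example comes from rounding the conditional probability to 0.17 before multiplying.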

Correlation in advanced strategies

Correlated parlays as a sharp strategy

Some sharps build parlays of positively correlated legs when their model says the correlation is stronger than the book's. The pitch: book prices QB Over + Team Over at +180 (assuming r = 0.5). Sharp model says r = 0.7, true joint probability = 0.45, fair price = +122. Sharp takes the parlay at +180 with documented edge.

This works only when: ① sharp has correlation data the book doesn't; ② sharp has accurate model of underlying distributions; ③ book's correlation model is materially miscalibrated. Most pros find this strategy works on specific NFL same-game prop combinations and rarely on other sports.
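The edge in the pitch above can be expressed as expected profit per unit staked: win probability times decimal payout, minus the stake. A short sketch using the quoted numbers (true joint probability 0.45, offered price +180); the helper names are hypothetical, and the result is only as good as the 0.45 estimate.

```python
def american_to_decimal(american: int) -> float:
    """Convert American odds to decimal odds (total return per unit staked)."""
    if american > 0:
        return 1 + american / 100
    return 1 + 100 / abs(american)

def expected_value(p_win: float, american: int) -> float:
    """Expected profit per 1 unit staked at the given American odds."""
    return p_win * american_to_decimal(american) - 1

print(f"EV at +180: {expected_value(0.45, 180):+.3f}")   # 0.45 * 2.80 - 1
print(f"EV at +122: {expected_value(0.45, 122):+.3f}")   # near zero at the fair price
```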

Decorrelation as risk management

For straight bettors running multiple bets on a Sunday, betting decorrelated games (different teams, different game scripts) smooths variance. Industry advice: don't have all your bets on home favorites Week 1 because correlation amplifies a bad week. Pro shops explicitly diversify across game times and game-script types to keep portfolio correlation low.
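How much correlation amplifies a bad week can be sketched with the standard formula for the variance of a sum of equally correlated variables. The simplifying assumptions here are mine: every bet has the same stake and standard deviation sigma, and every pair of bets shares the same correlation rho.

```python
def portfolio_variance(n_bets: int, sigma: float, rho: float) -> float:
    """Variance of the summed P&L of n equal bets with pairwise correlation rho.

    Var(sum) = n * sigma^2 + n * (n - 1) * rho * sigma^2
    """
    return n_bets * sigma**2 * (1 + (n_bets - 1) * rho)

baseline = portfolio_variance(10, 1.0, 0.0)   # 10 uncorrelated bets
for rho in (0.0, 0.2, 1.0):
    ratio = portfolio_variance(10, 1.0, rho) / baseline
    print(f"rho = {rho:.1f}: variance is {ratio:.1f}x the uncorrelated slate")
```

With ten bets, a pairwise correlation of 0.2 already gives 2.8x the variance of an uncorrelated slate, and perfect correlation gives 10x, which is the N² case.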

Empirical correlation in major sports


Bet pair	Sport	Empirical r	Typical book pricing
QB passing yards Over + Team total Over	NFL	+0.65	Slight under-correlation
RB rushing yards Over + Team win	NFL	+0.40	Roughly accurate
WR1 yards Over + WR1 TD scored	NFL	+0.55	Under-correlated (sharp opportunity)
Player points Over + Team win	NBA	+0.25	Roughly accurate
Pitcher SO Over + Pitcher win	MLB	+0.35	Roughly accurate
Both teams to score + Over total	Soccer	+0.50	Slightly over-correlated
Team to lead at half + Team to win	NBA	+0.65	Accurate; books price tightly
First scorer + Team to win	NFL	+0.20	Slight under-correlation

The "sharp opportunity" cells are where sharp bettors have documented edge betting these correlated combinations as SGPs. They require active correlation modeling and access to historical data. Not for casual bettors.

Common correlation mistakes

Treating SGP legs as independent: multiplying the individual probabilities understates the parlay's true probability when the legs are positively correlated, and overstates it when they are negatively correlated.
Ignoring negative correlation entirely: bettors miss that some combinations are hidden lottery tickets.
Assuming book correlation is accurate: books are usually close but not perfect; small miscalibrations create opportunities.
Single-game bankroll concentration: multiple bets on the same game create high positive correlation in your portfolio.
Using Pearson r where Spearman or a copula is needed: Pearson r measures only linear association; sports outcomes often have nonlinear dependence (tail correlation in blowouts).
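The last point can be illustrated with toy data in which one variable rises monotonically but very nonlinearly with the other. The numbers are made up for illustration and do not come from real games; both functions are hand-rolled with the standard library, and the rank helper assumes no ties.

```python
import math

def pearson(xs, ys):
    """Pearson r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Ranks 1..n in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman(xs, ys):
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y grows exponentially with x, so the link is
# perfectly monotone but far from linear.
x = list(range(1, 11))
y = [math.exp(v) for v in x]

print(f"Pearson  r   = {pearson(x, y):.2f}")
print(f"Spearman rho = {spearman(x, y):.2f}")
```

Pearson reports a much weaker relationship than Spearman here even though y is fully determined by x. That gap is the kind of understatement that matters when dependence is concentrated in the tails.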

FAQ

What's an example of a positively correlated parlay?

QB passing yards Over + team total Over. If the QB throws for 280 yards, the offense likely had productive drives, scoring is up, team total goes Over. True probability of both ≈ 35-40% (not the 20-25% you'd get by multiplying independent probabilities). Other examples: ① WR receptions Over + WR receiving yards Over (positively correlated by definition); ② team to score first + team to win (positive correlation, especially in NFL where first-score teams win ~58%); ③ team to lead at halftime + team to win (~70% positive correlation in NBA); ④ player anytime TD + team total Over (positive correlation when offense scores). Books embed these correlations into SGP pricing — sometimes accurately, sometimes not.

How do books price correlation?

Two main approaches: ① Conditional probability models: the book's risk system computes the probability of leg B given that leg A wins, then multiplies it by P(A). For QB Over + Team Total Over: P(team total over | QB over) ≈ 0.85, vs. P(team total over) ≈ 0.50, and the conditional figure is the one used in pricing. ② Simulation-based pricing: Monte Carlo models simulate the full game, count joint outcomes, and use the empirical frequencies. Books increasingly use this approach for SGP pricing. Either way, the customer sees a price that already includes the correlation adjustment. The fair joint probability of a positively correlated parlay is higher than the simple product, so the fair payout is lower than naive multiplication suggests; the book's correlation-adjusted price reflects that and then adds vig.

What's a negatively correlated parlay?

Two bets where one winning reduces the other's probability. Example: Lakers ML + Celtics +7. If the Lakers win, Celtics +7 can still hit (Lakers win by 1-6), but if the Lakers win by more than 7, Celtics +7 loses. Negative correlation pushes the joint probability below the independent product. Books often price negatively correlated parlays at full vig on top of that already-lower joint probability, which makes the parlay sucker bait for retail bettors who don't realize they're paying for a structure that cashes even less often than naive math suggests. Sharps avoid these unless they have specific model insight into the dependence structure.

Can the bettor profit by finding mispriced correlation?

Yes, but it requires modeling the correlation structure better than the book. Concrete strategy: identify SGP combinations where book's correlation model under-prices the joint dependence. Example: a sharp model might say P(QB Over | RB Over) = 0.65 (positive correlation in pass-heavy offense), while book prices as if P = 0.55. The customer takes the SGP at the under-priced correlation. The math works only in specific market segments — NFL same-game parlay combinations involving game-script-dependent props (passing volume, target shares, rushing distribution) are most commonly mispriced. NBA SGPs and MLB SGPs are usually more accurately priced.

How does correlation affect arbitrage?

Pure arbitrage doesn't depend on uncorrelated outcomes. You bet both sides of a single market across different books, so the legs are perfectly negatively correlated (one wins, the other loses by definition). Correlation issues arise in 'middling' (betting opposite sides at two different spreads) and in cross-market hedging (betting Lakers ML at one book, Lakers spread at another, hoping the result lands favorably). Correlated hedges aren't true arbitrage; they're conviction-weighted positions with correlated payoff distributions. Sharp middlers track historical landing frequency around key numbers: they aren't computing a fair price, they're estimating the probability that the final margin lands in their middle.

What does correlation tell us about variance?

It tells you how the variances of individual bets combine. A portfolio of positively correlated bets has higher variance than the sum of the individual variances; a negatively correlated one has lower. For a sharp running 10 simultaneous NFL bets that are all positively correlated (e.g., 10 bets on home favorites in Week 1), portfolio variance grows faster than N and approaches N² as the pairwise correlation approaches 1. A Sunday with 8 of 10 going against him can produce a 20%+ bankroll swing. Sharp bankroll management therefore favors diversification: bet across sports, across game times, across bet types. The 'all underdogs in primetime' bettor running positive correlation across his slate is unknowingly accepting roughly 2-3x the variance of a diversified bettor with the same total stake, for the same expected return.

// published 2026-05-23 · updated 2026-05-23 · OddsCipher Desk