
Commit d587802

Series7
1 parent 1479eb5 commit d587802


1 file changed (+308, -0 lines changed)

ipynb/Series7.ipynb

Lines changed: 308 additions & 0 deletions
@@ -0,0 +1,308 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"<div style=\"text-align: right\"><i>Peter Norvig<br>May 2025</i></div>\n",
8+
"\n",
9+
"# Seven-Game Series?\n",
10+
"\n",
11+
"This time of year the basketball playoffs are in full swing. I have a pet peeve: analysts who say *\"These are two evenly matched teams. I expect the series will go seven games.\"* Is that really true? If each game is a 50/50 tossup, how often will this result in a seven-game series? How does the home-court advantage come into play? What if one team is slightly better? This notebook examines these questions. \n",
12+
"\n",
13+
"## Vocabulary of Concepts\n",
14+
"\n",
15+
"Here are the concepts I want to model, and the way I will represent them:\n",
16+
"\n",
17+
"- **Series**: Two teams play games until one wins four games. Games cannot end in a tie, so a series lasts anywhere from 4 to 7 games.\n",
18+
"- **Seeded team**: The team that is seeded higher (due to a better regular-season record) gets to host 4 of the potential 7 games (games 1, 2, 5, and 7).\n",
19+
"- **Series outcome**: A pair of integers, for example `(4, 3)` means that the seeded team has 4 wins and the other team has 3.\n",
20+
"- **Outcome distribution**: The class `Dist` will represent a probability distribution over possible series outcomes. \n",
21+
"- **Game win probability**: We assume there is a given fixed probability, *p*, that the seeded team will win a single game on a neutral court.\n",
22+
"- **Home-court advantage**: The home team gets an advantage of *h* in win probability. So the seeded team has probability *p* + *h* of winning each of its home games (1, 2, 5, and 7), and probability *p* - *h* of winning each of its away games (3, 4, and 6).\n",
23+
"\n",
24+
"\n",
25+
"Below is the class `Dist`. For example, `Dist({(4, 3): 0.5, (3, 4): 0.5})` represents a series that is certain to go seven games, with each team equally likely to win. *Note*: `Dist` inherits from `Counter`, so two dists can be added together, and I define a `__mul__` method so that a dist can be multiplied by a probability."
26+
]
27+
},
28+
{
29+
"cell_type": "code",
30+
"execution_count": 1,
31+
"metadata": {},
32+
"outputs": [],
33+
"source": [
34+
"from collections import Counter\n",
35+
"\n",
36+
"class Dist(Counter):\n",
37+
" \"\"\"A Probability Distribution of {outcome: probability} results.\"\"\"\n",
38+
" def __mul__(self, weight: float) -> 'Dist':\n",
39+
" \"\"\"You can multiply a Dist by a scalar weight.\"\"\"\n",
40+
" return Dist({outcome: p * weight for outcome, p in self.items()})"
41+
]
42+
},
43+
{
44+
"cell_type": "markdown",
45+
"metadata": {},
46+
"source": [
47+
"The function `series_results` returns a distribution of possible series outcomes. The parameters are:\n",
48+
"- *p*, the seeded team's single-game win probability,\n",
49+
"- *h*, the home-court advantage probability,\n",
50+
"- *home*, a 7-character string saying whether each game is at home (`'H'`) or away (`'A'`) for the seeded team (default `'HHAAHAH'`), and\n",
51+
"- *W* and *L*, the number of wins and losses for the seeded team so far in the series (default 0 of each)."
52+
]
53+
},
54+
{
55+
"cell_type": "code",
56+
"execution_count": 2,
57+
"metadata": {},
58+
"outputs": [],
59+
"source": [
60+
"def series_results(p=0.50, h=0.10, home='HHAAHAH', W=0, L=0) -> Dist:\n",
61+
" \"\"\"Return {(win, loss): probability, ...} for all possible outcomes of the series, given\n",
62+
" the single-game win probability for the seeded team, `p`, and the home-court advantage, `h`.\"\"\"\n",
63+
" def results(W: int, L: int) -> Dist:\n",
64+
" if W == 4 or L == 4:\n",
65+
" return Dist({(W, L): 1.0})\n",
66+
" else:\n",
67+
" p1 = p + (h if home[W + L] == 'H' else -h) # Probability of winning this one game\n",
68+
" return Dist(results(W + 1, L) * p1 +\n",
69+
" results(W, L + 1) * (1 - p1))\n",
70+
" return results(W, L)"
71+
]
72+
},
73+
{
74+
"cell_type": "markdown",
75+
"metadata": {},
76+
"source": [
77+
"Let's look at the results for a truly even series, where each game is 50/50, and there is no home-court advantage:"
78+
]
79+
},
80+
{
81+
"cell_type": "code",
82+
"execution_count": 3,
83+
"metadata": {},
84+
"outputs": [
85+
{
86+
"data": {
87+
"text/plain": [
88+
"Dist({(4, 2): 0.15625,\n",
89+
" (4, 3): 0.15625,\n",
90+
" (3, 4): 0.15625,\n",
91+
" (2, 4): 0.15625,\n",
92+
" (4, 1): 0.125,\n",
93+
" (1, 4): 0.125,\n",
94+
" (4, 0): 0.0625,\n",
95+
" (0, 4): 0.0625})"
96+
]
97+
},
98+
"execution_count": 3,
99+
"metadata": {},
100+
"output_type": "execute_result"
101+
}
102+
],
103+
"source": [
104+
"series_results(p=0.50, h=0)"
105+
]
106+
},
107+
{
108+
"cell_type": "markdown",
109+
"metadata": {},
110+
"source": [
111+
"The four most common outcomes are equally likely: the series goes 6 or 7 games, with either team winning. It is easy to see why 6 and 7 games are equally likely: after 5 games, if the series isn't over, it must be 3-2 or 2-3. Half the time the team with 3 wins takes game 6 (ending the series in 6 games), and half the time it loses (forcing a game 7).\n",
112+
"\n",
113+
"Now I would like to make a **table** of results for various single-game win probabilities *p*. For each *p* I want to see:\n",
114+
"- The probability for each possible series outcome (4-3, 4-2, etc.).\n",
115+
"- The probability for each possible series length (4, 5, 6, or 7 games).\n",
116+
"- The probability that the seeded team wins the series.\n",
117+
"\n",
118+
"*Note*: I will consider cases where the higher-seeded team has a game win probability less than 50% (as sometimes happens when a player is injured)."
119+
]
120+
},
121+
{
122+
"cell_type": "code",
123+
"execution_count": 10,
124+
"metadata": {},
125+
"outputs": [],
126+
"source": [
127+
"from numpy import arange\n",
128+
"from typing import Sequence\n",
129+
"\n",
130+
"def series_results_table(h=0.0, pcts=arange(0.48, 0.70, 0.02)):\n",
131+
" \"\"\"What happens in the series for various values of `p` and a given value of `h`?\"\"\"\n",
132+
" outcomes = [(4, 3), (4, 2), (4, 1), (4, 0), (0, 4), (1, 4), (2, 4), (3, 4)]\n",
133+
" bar = '----+' + '-' * 5 * len(pcts)\n",
134+
" print(f' | Game Win Percentage (± home-court advantage = {h:.0%})')\n",
135+
" print(row('', pcts))\n",
136+
" print(bar)\n",
137+
" for (W, L) in outcomes:\n",
138+
" results = [series_results(p, h)[W, L] for p in pcts]\n",
139+
" print(row(f'{W}-{L}', results))\n",
140+
" print(bar)\n",
141+
" for N in [7, 6, 5, 4]:\n",
142+
" results = [series_results(p, h)[4, N-4] + series_results(p, h)[N-4, 4] for p in pcts]\n",
143+
" print(row(N, results))\n",
144+
" print(bar)\n",
145+
" print(row('Win', [sum(series_results(p, h)[W, L] for W, L in outcomes if W == 4) for p in pcts]))\n",
146+
"\n",
147+
"def row(name, values: Sequence[float]) -> str:\n",
148+
" \"\"\"Create a string representing a row in the table: a label, then each value as a percentage.\"\"\"\n",
149+
" return f'{name:^3} | ' + ' '.join(f'{v*100:4.1f}' for v in values)"
150+
]
151+
},
152+
{
153+
"cell_type": "markdown",
154+
"metadata": {},
155+
"source": [
156+
"Here is the table when there is no home-court advantage:"
157+
]
158+
},
159+
{
160+
"cell_type": "code",
161+
"execution_count": 11,
162+
"metadata": {},
163+
"outputs": [
164+
{
165+
"name": "stdout",
166+
"output_type": "stream",
167+
"text": [
168+
" | Game Win Percentage (± home-court advantage = 0%)\n",
169+
" | 48.0 50.0 52.0 54.0 56.0 58.0 60.0 62.0 64.0 66.0 68.0\n",
170+
"----+-------------------------------------------------------\n",
171+
"4-3 | 14.9 15.6 16.2 16.6 16.8 16.8 16.6 16.2 15.7 14.9 14.0\n",
172+
"4-2 | 14.4 15.6 16.8 18.0 19.0 20.0 20.7 21.3 21.7 21.9 21.9\n",
173+
"4-1 | 11.0 12.5 14.0 15.6 17.3 19.0 20.7 22.5 24.2 25.8 27.4\n",
174+
"4-0 | 5.3 6.2 7.3 8.5 9.8 11.3 13.0 14.8 16.8 19.0 21.4\n",
175+
"0-4 | 7.3 6.2 5.3 4.5 3.7 3.1 2.6 2.1 1.7 1.3 1.0\n",
176+
"1-4 | 14.0 12.5 11.0 9.7 8.4 7.2 6.1 5.2 4.3 3.5 2.9\n",
177+
"2-4 | 16.8 15.6 14.4 13.1 11.8 10.5 9.2 8.0 6.9 5.8 4.8\n",
178+
"3-4 | 16.2 15.6 14.9 14.1 13.2 12.1 11.1 9.9 8.8 7.7 6.6\n",
179+
"----+-------------------------------------------------------\n",
180+
" 7 | 31.1 31.2 31.1 30.7 29.9 28.9 27.6 26.2 24.5 22.6 20.6\n",
181+
" 6 | 31.2 31.2 31.2 31.0 30.8 30.4 30.0 29.4 28.6 27.8 26.7\n",
182+
" 5 | 25.1 25.0 25.1 25.3 25.7 26.2 26.9 27.6 28.5 29.3 30.2\n",
183+
" 4 | 12.6 12.5 12.6 13.0 13.6 14.4 15.5 16.9 18.5 20.3 22.4\n",
184+
"----+-------------------------------------------------------\n",
185+
"Win | 45.6 50.0 54.4 58.7 62.9 67.1 71.0 74.8 78.3 81.6 84.7\n"
186+
]
187+
}
188+
],
189+
"source": [
190+
"series_results_table(h=0.0)"
191+
]
192+
},
193+
{
194+
"cell_type": "markdown",
195+
"metadata": {},
196+
"source": [
197+
"Note that a 6-game series is most likely, except when *p* is exactly 50% (in which case 6- and 7-game series are equally likely) or when *p* is 66% or more (in which case a 5-game series is most likely).\n",
198+
"\n",
199+
"In recent years the home-court advantage has been about 5%:"
200+
]
201+
},
202+
{
203+
"cell_type": "code",
204+
"execution_count": 12,
205+
"metadata": {},
206+
"outputs": [
207+
{
208+
"name": "stdout",
209+
"output_type": "stream",
210+
"text": [
211+
" | Game Win Percentage (± home-court advantage = 5%)\n",
212+
" | 48.0 50.0 52.0 54.0 56.0 58.0 60.0 62.0 64.0 66.0 68.0\n",
213+
"----+-------------------------------------------------------\n",
214+
"4-3 | 16.6 17.3 17.8 18.2 18.4 18.3 18.1 17.6 17.0 16.1 15.1\n",
215+
"4-2 | 13.2 14.4 15.5 16.6 17.6 18.4 19.1 19.6 20.0 20.1 20.0\n",
216+
"4-1 | 12.2 13.7 15.4 17.1 18.9 20.7 22.5 24.4 26.2 27.9 29.6\n",
217+
"4-0 | 5.2 6.1 7.2 8.4 9.7 11.1 12.8 14.6 16.6 18.8 21.2\n",
218+
"0-4 | 7.2 6.1 5.2 4.4 3.7 3.0 2.5 2.0 1.6 1.3 1.0\n",
219+
"1-4 | 12.7 11.2 9.9 8.6 7.4 6.3 5.3 4.5 3.7 3.0 2.4\n",
220+
"2-4 | 18.2 16.9 15.5 14.1 12.7 11.3 9.9 8.6 7.4 6.3 5.2\n",
221+
"3-4 | 14.7 14.1 13.5 12.6 11.7 10.8 9.7 8.7 7.6 6.6 5.6\n",
222+
"----+-------------------------------------------------------\n",
223+
" 7 | 31.3 31.4 31.3 30.8 30.1 29.1 27.8 26.3 24.6 22.7 20.7\n",
224+
" 6 | 31.5 31.3 31.1 30.7 30.3 29.7 29.1 28.3 27.4 26.4 25.3\n",
225+
" 5 | 24.9 25.0 25.3 25.7 26.3 27.0 27.9 28.8 29.8 30.9 31.9\n",
226+
" 4 | 12.4 12.3 12.4 12.7 13.3 14.2 15.3 16.6 18.2 20.0 22.1\n",
227+
"----+-------------------------------------------------------\n",
228+
"Win | 47.2 51.6 56.0 60.3 64.5 68.6 72.5 76.2 79.7 82.9 85.8\n"
229+
]
230+
}
231+
],
232+
"source": [
233+
"series_results_table(h=0.05)"
234+
]
235+
},
236+
{
237+
"cell_type": "markdown",
238+
"metadata": {},
239+
"source": [
240+
"Now, when the seeded team has a game win probability in the range 50% to 54%, a 7-game series becomes more likely than a 6-game one: game 6 is played on the other team's home court, so the other team is favored to win it. Overall, a home-court advantage of 5% gives the seeded team about a 1.6 percentage point better chance of winning the series (a bit less at the high end of the table).\n",
241+
"\n",
242+
"Some people think a home-court advantage of as much as 13% is reasonable:"
243+
]
244+
},
245+
{
246+
"cell_type": "code",
247+
"execution_count": 7,
248+
"metadata": {},
249+
"outputs": [
250+
{
251+
"name": "stdout",
252+
"output_type": "stream",
253+
"text": [
254+
" | Game Win Percentage (± home court advantage = 13%)\n",
255+
" | 48.0 50.0 52.0 54.0 56.0 58.0 60.0 62.0 64.0 66.0\n",
256+
"----+--------------------------------------------------\n",
257+
"4-3 | 19.8 20.5 21.1 21.4 21.5 21.4 21.0 20.3 19.4 18.3\n",
258+
"4-2 | 11.5 12.6 13.6 14.6 15.5 16.3 16.9 17.4 17.6 17.7\n",
259+
"4-1 | 13.9 15.7 17.6 19.5 21.6 23.6 25.7 27.8 29.9 31.9\n",
260+
"4-0 | 4.6 5.4 6.4 7.5 8.8 10.2 11.8 13.5 15.4 17.5\n",
261+
"0-4 | 6.4 5.4 4.6 3.8 3.1 2.5 2.0 1.6 1.3 1.0\n",
262+
"1-4 | 10.5 9.2 8.0 6.8 5.8 4.8 4.0 3.2 2.6 2.0\n",
263+
"2-4 | 20.7 19.1 17.4 15.7 14.1 12.4 10.9 9.4 8.0 6.7\n",
264+
"3-4 | 12.7 12.1 11.4 10.5 9.7 8.7 7.7 6.8 5.8 4.9\n",
265+
"----+--------------------------------------------------\n",
266+
" 7 | 32.5 32.6 32.5 32.0 31.2 30.1 28.7 27.1 25.2 23.2\n",
267+
" 6 | 32.1 31.6 31.0 30.3 29.6 28.7 27.8 26.7 25.6 24.4\n",
268+
" 5 | 24.4 24.9 25.5 26.3 27.3 28.5 29.7 31.1 32.5 33.9\n",
269+
" 4 | 11.0 10.9 11.0 11.3 11.9 12.8 13.8 15.1 16.7 18.5\n",
270+
"----+--------------------------------------------------\n",
271+
"Win | 49.7 54.2 58.7 63.1 67.4 71.5 75.4 79.0 82.4 85.5\n"
272+
]
273+
}
274+
],
275+
"source": [
276+
"series_results_table(h=0.13)"
277+
]
278+
},
279+
{
280+
"cell_type": "markdown",
281+
"metadata": {},
282+
"source": [
283+
"This means that a seeded team with *p* = 50% and a 13% home-court advantage wins the series about as often (54.2%) as a 52% team with no home-court advantage (54.4%)."
284+
]
285+
}
286+
],
287+
"metadata": {
288+
"kernelspec": {
289+
"display_name": "Python 3 (ipykernel)",
290+
"language": "python",
291+
"name": "python3"
292+
},
293+
"language_info": {
294+
"codemirror_mode": {
295+
"name": "ipython",
296+
"version": 3
297+
},
298+
"file_extension": ".py",
299+
"mimetype": "text/x-python",
300+
"name": "python",
301+
"nbconvert_exporter": "python",
302+
"pygments_lexer": "ipython3",
303+
"version": "3.13.1"
304+
}
305+
},
306+
"nbformat": 4,
307+
"nbformat_minor": 4
308+
}
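With no home-court advantage (*h* = 0), every game is an independent trial with the same *p*. In that case the recursion in `series_results` ought to agree with a closed form. The seeded team wins a series in exactly *n* games when it wins game *n* and exactly 3 of the first *n* - 1 games, which has probability C(*n* - 1, 3) *p*^4 (1 - *p*)^(*n* - 4). The sketch below is one way to cross-check the recursion under that assumption. It copies `Dist` and `series_results` from the notebook so it runs on its own. `closed_form` is a hypothetical helper name, not part of the notebook.

```python
from collections import Counter
from math import comb


class Dist(Counter):
    """A probability distribution of {outcome: probability} (copied from the notebook)."""
    def __mul__(self, weight: float) -> 'Dist':
        return Dist({outcome: p * weight for outcome, p in self.items()})


def series_results(p=0.50, h=0.10, home='HHAAHAH', W=0, L=0) -> Dist:
    """Recursive outcome distribution, as defined in the notebook."""
    def results(W: int, L: int) -> Dist:
        if W == 4 or L == 4:
            return Dist({(W, L): 1.0})
        p1 = p + (h if home[W + L] == 'H' else -h)  # seeded team's chance in this game
        return Dist(results(W + 1, L) * p1 + results(W, L + 1) * (1 - p1))
    return results(W, L)


def closed_form(p: float) -> dict:
    """Hypothetical helper: outcome probabilities when h = 0 (negative binomial counting)."""
    q = 1 - p
    dist = {}
    for losses in range(4):
        n = 4 + losses            # total games played in the series
        ways = comb(n - 1, 3)     # ways to place the winner's first 3 wins in games 1..n-1
        dist[4, losses] = ways * p**4 * q**losses  # seeded team wins game n
        dist[losses, 4] = ways * q**4 * p**losses  # other team wins game n
    return dist


# Compare both computations over the range of p used in the notebook's tables.
for p in [0.48, 0.50, 0.56, 0.60, 0.68]:
    recursive = series_results(p, h=0.0)
    exact = closed_form(p)
    assert all(abs(recursive[k] - exact[k]) < 1e-12 for k in exact)
print('recursion matches closed form')
```

For *p* = 0.5 the closed form gives 10/64 = 0.15625 for 4-2 and 20/128 = 0.15625 for 4-3. These match the 50/50 output shown earlier in the notebook.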

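With no home-court advantage, the table's observation that a 6-game series is never less likely than a 7-game one can also be checked by hand. Write *q* = 1 - *p*. A series that ends in *n* games has C(*n* - 1, 3) orderings of the winner's first three wins, so the two lengths have these probabilities:

```latex
\begin{aligned}
P(6) &= \binom{5}{3}\left(p^4 q^2 + q^4 p^2\right) = 10\,p^2 q^2 \left(p^2 + q^2\right) \\
P(7) &= \binom{6}{3}\left(p^4 q^3 + q^4 p^3\right) = 20\,p^3 q^3 (p + q) = 20\,p^3 q^3 \\
\frac{P(6)}{P(7)} &= \frac{p^2 + q^2}{2pq} = 1 + \frac{(p - q)^2}{2pq} \;\ge\; 1
\end{aligned}
```

Equality holds only when *p* = *q* = 1/2, which is why the 6- and 7-game rows tie (31.2 each) only in the 50% column of the no-home-court table. The argument assumes every game has the same *p*. With a home-court advantage, games 6 and 7 have different odds, and the 5% table shows 7 games overtaking 6.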

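The closing comparison sets a 50/50 team with a 13% home-court advantage against a 52% team with none. It can be made more precise by solving for the no-advantage *p* that yields the same series win probability. With *h* = 0 the series win probability increases with *p*, so plain bisection should find that value. This is a sketch under those assumptions. `series_win` and `equivalent_p` are hypothetical helper names. `Dist` and `series_results` are copied from the notebook.

```python
from collections import Counter


class Dist(Counter):
    """A probability distribution of {outcome: probability} (copied from the notebook)."""
    def __mul__(self, weight: float) -> 'Dist':
        return Dist({outcome: p * weight for outcome, p in self.items()})


def series_results(p=0.50, h=0.10, home='HHAAHAH', W=0, L=0) -> Dist:
    """Recursive outcome distribution, as defined in the notebook."""
    def results(W: int, L: int) -> Dist:
        if W == 4 or L == 4:
            return Dist({(W, L): 1.0})
        p1 = p + (h if home[W + L] == 'H' else -h)  # seeded team's chance in this game
        return Dist(results(W + 1, L) * p1 + results(W, L + 1) * (1 - p1))
    return results(W, L)


def series_win(p: float, h: float) -> float:
    """Hypothetical helper: probability that the seeded team wins the series."""
    return sum(prob for (W, L), prob in series_results(p, h).items() if W == 4)


def equivalent_p(h: float, p: float = 0.50, tol: float = 1e-9) -> float:
    """Hypothetical helper: find p0 with series_win(p0, 0) == series_win(p, h),
    by bisection on [0, 1] (valid because series_win(., 0) increases with p)."""
    target = series_win(p, h)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if series_win(mid, 0.0) < target:
            lo = mid   # mid is too weak; the answer lies above it
        else:
            hi = mid   # mid is strong enough; the answer lies at or below it
    return (lo + hi) / 2


for h in [0.05, 0.13]:
    print(f'p = 50% with h = {h:.0%} is like p = {equivalent_p(h):.1%} with h = 0')
```

For *h* = 0.13 the search should land just below 52%, which is consistent with the notebook's closing remark.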