| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "<div style=\"text-align: right\"><i>Peter Norvig<br>May 2025</i></div>\n", |
| 8 | + "\n", |
| 9 | + "# Seven-Game Series?\n", |
| 10 | + "\n", |
| 11 | + "This time of year the basketball playoffs are in full swing. I have a pet peeve: analysts who say *\"These are two evenly matched teams. I expect the series will go seven games.\"* Is that really true? If each game is a 50/50 tossup, how often will this result in a seven-game series? How does the home-court advantage come into play? What if one team is slightly better? This notebook examines these questions. \n", |
| 12 | + "\n", |
| 13 | + "## Vocabulary of Concepts\n", |
| 14 | + "\n", |
| 15 | + "Here are the concepts I want to model, and the way I will represent them:\n", |
| 16 | + "\n", |
| 17 | + "- **Series**: Two teams play games until one wins four games. A game cannot be a tie. So a series can take anywhere from 4 to 7 games.\n", |
| 18 | + "- **Seeded team**: The team that is seeded higher (due to regular season wins) gets to host 4 of the potential 7 games (games 1, 2, 5, and 7).\n", |
| 19 | + "- **Series outcome**: A pair of integers, for example `(4, 3)` means that the seeded team has 4 wins and the other team has 3.\n", |
| 20 | + "- **Outcome distribution**: The class `Dist` will represent a probability distribution over possible series outcomes. \n", |
| 21 | + "- **Game win probability**: We assume there is a given fixed probability, *p*, that the seeded team will win a single game on a neutral court.\n", |
| 22 | + "- **Home-court advantage**: However, there is a home-court advantage of probability *h*. So the seeded team has probability *p* + *h* of winning each of the games 1, 2, 5, and 7, and probability *p* - *h* of winning each of the other games.\n", |
| 23 | + "\n", |
| 24 | + "\n", |
| 25 | + "Below is the class `Dist`. For example, `Dist({(4, 3): 0.5, (3, 4): 0.5})` represents a series that goes seven games, and each team has an equal chance to win. *Note*: `Dist` inherits from `Counter`, so you can add two dists together, and I define a `__mul__` method so you can multiply a dist by a probability." |
| 26 | + ] |
| 27 | + }, |
| 28 | + { |
| 29 | + "cell_type": "code", |
| 30 | + "execution_count": 1, |
| 31 | + "metadata": {}, |
| 32 | + "outputs": [], |
| 33 | + "source": [ |
| 34 | + "from collections import Counter\n", |
| 35 | + "\n", |
| 36 | + "class Dist(Counter):\n", |
| 37 | + "    \"\"\"A Probability Distribution of {outcome: probability} results.\"\"\"\n", |
| 38 | + "    def __mul__(self, weight: float) -> 'Dist':\n", |
| 39 | + "        \"\"\"You can multiply a Dist by a scalar weight.\"\"\"\n", |
| 40 | + "        return Dist({outcome: p * weight for outcome, p in self.items()})" |
| 41 | + ] |
| 42 | + }, |
| 43 | + { |
| 44 | + "cell_type": "markdown", |
| 45 | + "metadata": {}, |
| 46 | + "source": [ |
| 47 | + "The function `series_results` returns a distribution of possible series outcomes. The parameters are:\n", |
| 48 | + "- *p*, the seeded team's single-game win probability,\n", |
| 49 | + "- *h*, the home-court advantage probability,\n", |
| 50 | + "- *home*, a 7-character string saying which games are home (`H`) or away (`A`) for the seeded team,\n", |
| 51 | + "- *W* and *L*, the number of wins and losses for the seeded team so far in the series (default 0 of each)." |
| 52 | + ] |
| 53 | + }, |
| 54 | + { |
| 55 | + "cell_type": "code", |
| 56 | + "execution_count": 2, |
| 57 | + "metadata": {}, |
| 58 | + "outputs": [], |
| 59 | + "source": [ |
| 60 | + "def series_results(p=0.50, h=0.10, home='HHAAHAH', W=0, L=0) -> Dist:\n", |
| 61 | + "    \"\"\"Return {(win, loss): probability, ...} for all possible outcomes of the series, given\n", |
| 62 | + "    the single-game win probability for the seeded team, `p`, and the home-court advantage, `h`.\"\"\"\n", |
| 63 | + "    def results(W: int, L: int) -> Dist:\n", |
| 64 | + "        if W == 4 or L == 4:\n", |
| 65 | + "            return Dist({(W, L): 1.0})\n", |
| 66 | + "        else:\n", |
| 67 | + "            p1 = p + (h if home[W + L] == 'H' else -h) # Probability of winning this one game\n", |
| 68 | + "            return Dist(results(W + 1, L) * p1 +\n", |
| 69 | + "                        results(W, L + 1) * (1 - p1))\n", |
| 70 | + "    return results(W, L)" |
| 71 | + ] |
| 72 | + }, |
| 73 | + { |
| 74 | + "cell_type": "markdown", |
| 75 | + "metadata": {}, |
| 76 | + "source": [ |
| 77 | + "Let's look at the results for a truly even series, where each game is 50/50, and there is no home-court advantage:" |
| 78 | + ] |
| 79 | + }, |
| 80 | + { |
| 81 | + "cell_type": "code", |
| 82 | + "execution_count": 3, |
| 83 | + "metadata": {}, |
| 84 | + "outputs": [ |
| 85 | + { |
| 86 | + "data": { |
| 87 | + "text/plain": [ |
| 88 | + "Dist({(4, 2): 0.15625,\n", |
| 89 | + "      (4, 3): 0.15625,\n", |
| 90 | + "      (3, 4): 0.15625,\n", |
| 91 | + "      (2, 4): 0.15625,\n", |
| 92 | + "      (4, 1): 0.125,\n", |
| 93 | + "      (1, 4): 0.125,\n", |
| 94 | + "      (4, 0): 0.0625,\n", |
| 95 | + "      (0, 4): 0.0625})" |
| 96 | + ] |
| 97 | + }, |
| 98 | + "execution_count": 3, |
| 99 | + "metadata": {}, |
| 100 | + "output_type": "execute_result" |
| 101 | + } |
| 102 | + ], |
| 103 | + "source": [ |
| 104 | + "series_results(p=0.50, h=0)" |
| 105 | + ] |
| 106 | + }, |
| 107 | + { |
| 108 | + "cell_type": "markdown", |
| 109 | + "metadata": {}, |
| 110 | + "source": [ |
| 111 | + "The four most common outcomes are equally likely: a series of either 6 or 7 games, with either team winning. It is easy to see why 6 and 7 games are equally likely: after 5 games, if the series isn't over, the score must be 3-2 or 2-3. Half the time the team with 3 wins will also win game 6 (resulting in a 6-game series) and half the time it will lose (resulting in a 7-game series).\n", |
| 112 | + "\n", |
| 113 | + "Now I would like to make a **table** of results, for various winning percentages *p*. For each *p* I want to see \n", |
| 114 | + "- The probability for each possible series outcome (4-3, 4-2, etc.).\n", |
| 115 | + "- The probability for each possible series length (4, 5, 6, or 7 games).\n", |
| 116 | + "- The probability that the seeded team wins the series.\n", |
| 117 | + "\n", |
| 118 | + "*Note*: I will consider cases where the higher-seeded team has a game win probability less than 50% (as sometimes happens when a player is injured)." |
| 119 | + ] |
| 120 | + }, |
| 121 | + { |
| 122 | + "cell_type": "code", |
| 123 | + "execution_count": 10, |
| 124 | + "metadata": {}, |
| 125 | + "outputs": [], |
| 126 | + "source": [ |
| 127 | + "from numpy import arange\n", |
| 128 | + "from typing import Sequence\n", |
| 129 | + "\n", |
| 130 | + "def series_results_table(h=0.0, pcts=arange(0.48, 0.70, 0.02)):\n", |
| 131 | + "    \"\"\"What happens in the series for various values of `p` and a given value of `h`?\"\"\"\n", |
| 132 | + "    outcomes = [(4, 3), (4, 2), (4, 1), (4, 0), (0, 4), (1, 4), (2, 4), (3, 4)]\n", |
| 133 | + "    bar = '----+' + '-' * 5 * len(pcts)\n", |
| 134 | + "    print(f'    | Game Win Percentage (± home-court advantage = {h:.0%})')\n", |
| 135 | + "    print(row('', pcts))\n", |
| 136 | + "    print(bar)\n", |
| 137 | + "    for (W, L) in outcomes:\n", |
| 138 | + "        results = [series_results(p, h)[W, L] for p in pcts]\n", |
| 139 | + "        print(row(f'{W}-{L}', results))\n", |
| 140 | + "    print(bar)\n", |
| 141 | + "    for N in [7, 6, 5, 4]:\n", |
| 142 | + "        results = [series_results(p, h)[4, N-4] + series_results(p, h)[N-4, 4] for p in pcts]\n", |
| 143 | + "        print(row(N, results))\n", |
| 144 | + "    print(bar)\n", |
| 145 | + "    print(row('Win', [sum(series_results(p, h)[W, L] for W, L in outcomes if W == 4) for p in pcts]))\n", |
| 146 | + "\n", |
| 147 | + "def row(name, pcts: Sequence[float]) -> str:\n", |
| 148 | + "    \"\"\"Create a string representing a row in the table.\"\"\"\n", |
| 149 | + "    return f'{name:^3} | ' + ' '.join(f'{p*100:4.1f}' for p in pcts)" |
| 150 | + ] |
| 151 | + }, |
| 152 | + { |
| 153 | + "cell_type": "markdown", |
| 154 | + "metadata": {}, |
| 155 | + "source": [ |
| 156 | + "Here is the table when there is no home-court advantage:" |
| 157 | + ] |
| 158 | + }, |
| 159 | + { |
| 160 | + "cell_type": "code", |
| 161 | + "execution_count": 11, |
| 162 | + "metadata": {}, |
| 163 | + "outputs": [ |
| 164 | + { |
| 165 | + "name": "stdout", |
| 166 | + "output_type": "stream", |
| 167 | + "text": [ |
| 168 | + "    | Game Win Percentage (± home-court advantage = 0%)\n", |
| 169 | + "    | 48.0 50.0 52.0 54.0 56.0 58.0 60.0 62.0 64.0 66.0 68.0\n", |
| 170 | + "----+-------------------------------------------------------\n", |
| 171 | + "4-3 | 14.9 15.6 16.2 16.6 16.8 16.8 16.6 16.2 15.7 14.9 14.0\n", |
| 172 | + "4-2 | 14.4 15.6 16.8 18.0 19.0 20.0 20.7 21.3 21.7 21.9 21.9\n", |
| 173 | + "4-1 | 11.0 12.5 14.0 15.6 17.3 19.0 20.7 22.5 24.2 25.8 27.4\n", |
| 174 | + "4-0 |  5.3  6.2  7.3  8.5  9.8 11.3 13.0 14.8 16.8 19.0 21.4\n", |
| 175 | + "0-4 |  7.3  6.2  5.3  4.5  3.7  3.1  2.6  2.1  1.7  1.3  1.0\n", |
| 176 | + "1-4 | 14.0 12.5 11.0  9.7  8.4  7.2  6.1  5.2  4.3  3.5  2.9\n", |
| 177 | + "2-4 | 16.8 15.6 14.4 13.1 11.8 10.5  9.2  8.0  6.9  5.8  4.8\n", |
| 178 | + "3-4 | 16.2 15.6 14.9 14.1 13.2 12.1 11.1  9.9  8.8  7.7  6.6\n", |
| 179 | + "----+-------------------------------------------------------\n", |
| 180 | + " 7  | 31.1 31.2 31.1 30.7 29.9 28.9 27.6 26.2 24.5 22.6 20.6\n", |
| 181 | + " 6  | 31.2 31.2 31.2 31.0 30.8 30.4 30.0 29.4 28.6 27.8 26.7\n", |
| 182 | + " 5  | 25.1 25.0 25.1 25.3 25.7 26.2 26.9 27.6 28.5 29.3 30.2\n", |
| 183 | + " 4  | 12.6 12.5 12.6 13.0 13.6 14.4 15.5 16.9 18.5 20.3 22.4\n", |
| 184 | + "----+-------------------------------------------------------\n", |
| 185 | + "Win | 45.6 50.0 54.4 58.7 62.9 67.1 71.0 74.8 78.3 81.6 84.7\n" |
| 186 | + ] |
| 187 | + } |
| 188 | + ], |
| 189 | + "source": [ |
| 190 | + "series_results_table(h=0.0)" |
| 191 | + ] |
| 192 | + }, |
| 193 | + { |
| 194 | + "cell_type": "markdown", |
| 195 | + "metadata": {}, |
| 196 | + "source": [ |
| 197 | + "Note that a 6-game series is most likely, except when *p* is exactly 50% (in which case 6- and 7-game series are equally likely), or when *p* is 66% or more (in which case a 5-game series is more likely).\n", |
| 198 | + "\n", |
| 199 | + "In recent years the home-court advantage has been about 5%:" |
| 200 | + ] |
| 201 | + }, |
| 202 | + { |
| 203 | + "cell_type": "code", |
| 204 | + "execution_count": 12, |
| 205 | + "metadata": {}, |
| 206 | + "outputs": [ |
| 207 | + { |
| 208 | + "name": "stdout", |
| 209 | + "output_type": "stream", |
| 210 | + "text": [ |
| 211 | + "    | Game Win Percentage (± home-court advantage = 5%)\n", |
| 212 | + "    | 48.0 50.0 52.0 54.0 56.0 58.0 60.0 62.0 64.0 66.0 68.0\n", |
| 213 | + "----+-------------------------------------------------------\n", |
| 214 | + "4-3 | 16.6 17.3 17.8 18.2 18.4 18.3 18.1 17.6 17.0 16.1 15.1\n", |
| 215 | + "4-2 | 13.2 14.4 15.5 16.6 17.6 18.4 19.1 19.6 20.0 20.1 20.0\n", |
| 216 | + "4-1 | 12.2 13.7 15.4 17.1 18.9 20.7 22.5 24.4 26.2 27.9 29.6\n", |
| 217 | + "4-0 |  5.2  6.1  7.2  8.4  9.7 11.1 12.8 14.6 16.6 18.8 21.2\n", |
| 218 | + "0-4 |  7.2  6.1  5.2  4.4  3.7  3.0  2.5  2.0  1.6  1.3  1.0\n", |
| 219 | + "1-4 | 12.7 11.2  9.9  8.6  7.4  6.3  5.3  4.5  3.7  3.0  2.4\n", |
| 220 | + "2-4 | 18.2 16.9 15.5 14.1 12.7 11.3  9.9  8.6  7.4  6.3  5.2\n", |
| 221 | + "3-4 | 14.7 14.1 13.5 12.6 11.7 10.8  9.7  8.7  7.6  6.6  5.6\n", |
| 222 | + "----+-------------------------------------------------------\n", |
| 223 | + " 7  | 31.3 31.4 31.3 30.8 30.1 29.1 27.8 26.3 24.6 22.7 20.7\n", |
| 224 | + " 6  | 31.5 31.3 31.1 30.7 30.3 29.7 29.1 28.3 27.4 26.4 25.3\n", |
| 225 | + " 5  | 24.9 25.0 25.3 25.7 26.3 27.0 27.9 28.8 29.8 30.9 31.9\n", |
| 226 | + " 4  | 12.4 12.3 12.4 12.7 13.3 14.2 15.3 16.6 18.2 20.0 22.1\n", |
| 227 | + "----+-------------------------------------------------------\n", |
| 228 | + "Win | 47.2 51.6 56.0 60.3 64.5 68.6 72.5 76.2 79.7 82.9 85.8\n" |
| 229 | + ] |
| 230 | + } |
| 231 | + ], |
| 232 | + "source": [ |
| 233 | + "series_results_table(h=0.05)" |
| 234 | + ] |
| 235 | + }, |
| 236 | + { |
| 237 | + "cell_type": "markdown", |
| 238 | + "metadata": {}, |
| 239 | + "source": [ |
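| 240 | + "Now, when the seeded team has a game win probability in the range 50% to 54%, a 7-game series becomes the most likely length: the seeded team plays game 6 on the road, so the other team is favored in that game. Overall, a home-court advantage of 5% raises the seeded team's chance of winning the series by about 1.6 percentage points in nearly even matchups (for example, from 50.0% to 51.6% at *p* = 50%).\n", |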
| 241 | + "\n", |
| 242 | + "Some people think a home-court advantage of as much as 13% is reasonable:" |
| 243 | + ] |
| 244 | + }, |
| 245 | + { |
| 246 | + "cell_type": "code", |
| 247 | + "execution_count": 7, |
| 248 | + "metadata": {}, |
| 249 | + "outputs": [ |
| 250 | + { |
| 251 | + "name": "stdout", |
| 252 | + "output_type": "stream", |
| 253 | + "text": [ |
| 254 | + "    | Game Win Percentage (± home court advantage = 13%)\n", |
| 255 | + "    | 48.0 50.0 52.0 54.0 56.0 58.0 60.0 62.0 64.0 66.0\n", |
| 256 | + "----+--------------------------------------------------\n", |
| 257 | + "4-3 | 19.8 20.5 21.1 21.4 21.5 21.4 21.0 20.3 19.4 18.3\n", |
| 258 | + "4-2 | 11.5 12.6 13.6 14.6 15.5 16.3 16.9 17.4 17.6 17.7\n", |
| 259 | + "4-1 | 13.9 15.7 17.6 19.5 21.6 23.6 25.7 27.8 29.9 31.9\n", |
| 260 | + "4-0 |  4.6  5.4  6.4  7.5  8.8 10.2 11.8 13.5 15.4 17.5\n", |
| 261 | + "0-4 |  6.4  5.4  4.6  3.8  3.1  2.5  2.0  1.6  1.3  1.0\n", |
| 262 | + "1-4 | 10.5  9.2  8.0  6.8  5.8  4.8  4.0  3.2  2.6  2.0\n", |
| 263 | + "2-4 | 20.7 19.1 17.4 15.7 14.1 12.4 10.9  9.4  8.0  6.7\n", |
| 264 | + "3-4 | 12.7 12.1 11.4 10.5  9.7  8.7  7.7  6.8  5.8  4.9\n", |
| 265 | + "----+--------------------------------------------------\n", |
| 266 | + " 7  | 32.5 32.6 32.5 32.0 31.2 30.1 28.7 27.1 25.2 23.2\n", |
| 267 | + " 6  | 32.1 31.6 31.0 30.3 29.6 28.7 27.8 26.7 25.6 24.4\n", |
| 268 | + " 5  | 24.4 24.9 25.5 26.3 27.3 28.5 29.7 31.1 32.5 33.9\n", |
| 269 | + " 4  | 11.0 10.9 11.0 11.3 11.9 12.8 13.8 15.1 16.7 18.5\n", |
| 270 | + "----+--------------------------------------------------\n", |
| 271 | + "Win | 49.7 54.2 58.7 63.1 67.4 71.5 75.4 79.0 82.4 85.5\n" |
| 272 | + ] |
| 273 | + } |
| 274 | + ], |
| 275 | + "source": [ |
| 276 | + "series_results_table(h=0.13)" |
| 277 | + ] |
| 278 | + }, |
| 279 | + { |
| 280 | + "cell_type": "markdown", |
| 281 | + "metadata": {}, |
| 282 | + "source": [ |
| 283 | + "This means that a 50/50 team with a 13% home-court advantage wins a series about as often (54.2%) as a 52% team with no home-court advantage (54.4%)." |
| 284 | + ] |
| 285 | + } |
| 286 | + ], |
| 287 | + "metadata": { |
| 288 | + "kernelspec": { |
| 289 | + "display_name": "Python 3 (ipykernel)", |
| 290 | + "language": "python", |
| 291 | + "name": "python3" |
| 292 | + }, |
| 293 | + "language_info": { |
| 294 | + "codemirror_mode": { |
| 295 | + "name": "ipython", |
| 296 | + "version": 3 |
| 297 | + }, |
| 298 | + "file_extension": ".py", |
| 299 | + "mimetype": "text/x-python", |
| 300 | + "name": "python", |
| 301 | + "nbconvert_exporter": "python", |
| 302 | + "pygments_lexer": "ipython3", |
| 303 | + "version": "3.13.1" |
| 304 | + } |
| 305 | + }, |
| 306 | + "nbformat": 4, |
| 307 | + "nbformat_minor": 4 |
| 308 | +} |
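
As a cross-check on the no-home-court table (`h = 0`), the series-length rows can also be computed in closed form. This is a minimal sketch and not part of the notebook: the function name `series_length_probs` and its `wins_needed` parameter are hypothetical, and it assumes the seeded team wins every game with the same probability `p`. A series ends in exactly *n* games when the eventual winner takes game *n* and exactly 3 of the first *n* − 1 games.

```python
from math import comb

def series_length_probs(p: float, wins_needed: int = 4) -> dict[int, float]:
    """Return {n: P(series lasts exactly n games)} when each game is won
    by the seeded team with the same probability p (no home-court advantage)."""
    q = 1 - p
    probs = {}
    for n in range(wins_needed, 2 * wins_needed):
        losses = n - wins_needed  # games won by the team that loses the series
        # The series winner takes game n and wins_needed - 1 of the first n - 1 games.
        ways = comb(n - 1, wins_needed - 1)
        # Add the case where the seeded team wins and the case where the other team wins.
        probs[n] = ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)
    return probs

print(series_length_probs(0.50))
# → {4: 0.125, 5: 0.25, 6: 0.3125, 7: 0.3125}
```

With `p = 0.50` this gives 12.5%, 25%, 31.25%, and 31.25% for 4, 5, 6, and 7 games, which agrees with the 50.0 column of the `h = 0` table. The formula does not cover the home-court case, where the per-game probability changes from game to game.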