Probability and Statistics 101 for Sports Simulation Games
This page is meant to introduce younger players to the mathematics of probabilities and statistics and how these concepts are applied to sports simulation games. Understanding these concepts will help you make better decisions and improve your strategies when playing your favorite sports simulation games.
Let's explore the mathematics of probabilities and statistics.
What are probabilities?
Probability is the chance of a particular outcome occurring for a random event. A probability is typically written as a decimal number between 0.0 and 1.0, but is often read as a percentage between 0% and 100%. A probability can never be less than 0.0 or greater than 1.0. In general, you can use the following formula to calculate the probability of an outcome occurring:

$$P(\text{outcome}) = \frac{\text{number of occurrences of the outcome}}{\text{total number of possible outcomes}}$$
Let's consider the random event of rolling a single six-sided die. Each number from 1 through 6 appears once on the die, so there are six possible outcomes. Using the above formula, we can calculate the probability of rolling any one number as 1 / 6 ≈ 0.167. The probability is the same for every number from 1 through 6 (e.g. the probability of rolling a 1 is 0.167, and the probability of rolling a 6 is also 0.167).
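The probability formula (occurrences of an outcome divided by the total number of possible outcomes) can be written as a tiny Python function. This is only a sketch: the function name `probability` is our own choice, and it uses Python's standard `fractions.Fraction` so the answer stays exact (1/6) instead of the rounded 0.167.

```python
from fractions import Fraction

def probability(occurrences: int, total_outcomes: int) -> Fraction:
    """Probability = occurrences of the outcome / total number of possible outcomes."""
    if total_outcomes <= 0:
        raise ValueError("total_outcomes must be positive")
    if not 0 <= occurrences <= total_outcomes:
        # A probability can never be below 0.0 or above 1.0.
        raise ValueError("occurrences must be between 0 and total_outcomes")
    return Fraction(occurrences, total_outcomes)

# A six-sided die: each number appears once out of six possible outcomes.
p_one = probability(1, 6)
print(p_one, round(float(p_one), 3))  # 1/6 0.167
```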
Let's consider some different outcomes:
- Rolling a number between 1 and 6
- Rolling a number less than 1 or greater than 6 (e.g. rolling a 0 or 9)
- Rolling a 3, 4, 5, or 6
- Rolling an even number
- Rolling a 1 or 2
Outcomes are considered to be:
- Certain: This outcome is guaranteed to always happen. When you roll a single die you will always roll a number between 1 and 6. The probability of this outcome is 1.0.
- Impossible: This outcome is guaranteed to never happen. When you roll a single die you will never roll a number that is not between 1 and 6, such as 9. The probability of this outcome is 0.0.
- Likely: This outcome has a high chance of happening. Typically this means a probability greater than 0.5. The probability of rolling a 3, 4, 5, or 6 is 0.667 because there are four numbers out of six that match this outcome.
- Even: This outcome has an equal chance of happening compared to its complement (the outcome that it does not happen). The probability of rolling an even number is 0.5 because there are three even numbers out of six. The complementary outcome is rolling an odd number, which also has a probability of 0.5 because there are three odd numbers out of six as well.
- Unlikely: This outcome has a low chance of happening. Typically this means a probability less than 0.5. The probability of rolling a 1 or 2 is 0.333 because there are two numbers out of six that match this outcome.
Note that rolling a 1 or 2 is the complement of rolling a 3, 4, 5, or 6; there are no other possible outcomes. Because these two outcomes are complementary, they have a couple of interesting properties:
Adding the probabilities of all distinct outcomes gives 1.0; here there are only two:
$$P(\text{1 or 2}) + P(\text{3, 4, 5, or 6}) = 0.333 + 0.667 = 1.0$$
Subtracting the probability of one outcome from 1.0 gives the probability of its complement:
$$P(\text{1 or 2}) = 1.0 - P(\text{3, 4, 5, or 6}) = 1.0 - 0.667 = 0.333$$
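The sketch below checks the five example outcomes against the categories in the list above and then confirms the two complement properties. The helper names `prob` and `describe` are our own, and the "likely" and "unlikely" cut-offs follow the list's rule of thumb (above or below 0.5).

```python
from fractions import Fraction

FACES = range(1, 7)  # a single six-sided die

def prob(outcome) -> Fraction:
    """Count the faces that match the outcome and divide by the number of faces."""
    matches = sum(1 for face in FACES if outcome(face))
    return Fraction(matches, len(FACES))

def describe(p: Fraction) -> str:
    """Label a probability using the categories from the list above."""
    if p == 1:
        return "certain"
    if p == 0:
        return "impossible"
    if p > Fraction(1, 2):
        return "likely"
    if p == Fraction(1, 2):
        return "even"
    return "unlikely"

outcomes = {
    "between 1 and 6": lambda f: 1 <= f <= 6,
    "less than 1 or greater than 6": lambda f: f < 1 or f > 6,
    "3, 4, 5, or 6": lambda f: f in (3, 4, 5, 6),
    "even number": lambda f: f % 2 == 0,
    "1 or 2": lambda f: f in (1, 2),
}

for name, outcome in outcomes.items():
    p = prob(outcome)
    print(f"{name}: {float(p):.3f} ({describe(p)})")

# Complementary outcomes add up to 1, so either one is 1 minus the other.
high = prob(outcomes["3, 4, 5, or 6"])
low = prob(outcomes["1 or 2"])
print("sum of complements:", high + low)
print("1 - P(3, 4, 5, or 6) =", 1 - high)
```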
Probabilities help us estimate what the outcome of a random event is likely to be. Probabilities do not guarantee a particular outcome, but they do tell us how likely each result is.
What are probability distributions?
A probability distribution describes the probabilities of all the possible outcomes of a random event, like rolling a die, and is often drawn as a chart so it is easy to visualize.
In the previous section we calculated the probability of rolling any number on a six-sided die to be 0.167. Drawn as a chart, the distribution is six bars of equal height (0.167), one for each number from 1 to 6.
This is a special distribution called a Uniform Distribution because the probability of every outcome is the same, or uniform. Any single die on which each outcome appears the same number of times produces a uniform distribution. For example, a 20-sided die also produces a uniform distribution, where each outcome has a probability of 0.05 (i.e. 1 / 20).
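A uniform distribution is easy to build in code: every face gets the same probability, 1 divided by the number of sides. This is a minimal sketch that assumes a fair die; the function name `uniform_die` is our own.

```python
from fractions import Fraction

def uniform_die(sides: int) -> dict[int, Fraction]:
    """Each face of a fair die appears once, so every outcome has probability 1/sides."""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}

d6 = uniform_die(6)
d20 = uniform_die(20)
print(float(d6[1]), float(d20[20]))
```

The same function covers the six-sided and 20-sided examples, and the probabilities of each die still add up to 1.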
Something interesting happens when you add a second die. Now the sum of the two dice can be any number from 2 to 12. However, these sums do not all occur the same number of times. We can see this in the following table:
Table 1: Sum of two six-sided dice (rows: first die, columns: second die)
|   | 1 | 2 | 3 | 4 | 5 | 6 |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Let's summarize this data in a form that is easier to understand. The total number of outcomes in Table 1 is 36. We can find the number of occurrences of an outcome by counting the number of times it appears in Table 1. Then we can calculate the probability of each outcome by using the formula above:
Table 2: Occurrences and probability of each sum of two six-sided dice
| Sum | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| Occurrences | 1 | 2 | 3 | 4 | 5 | 6 | 5 | 4 | 3 | 2 | 1 |
| Probability | 0.028 | 0.056 | 0.083 | 0.111 | 0.139 | 0.167 | 0.139 | 0.111 | 0.083 | 0.056 | 0.028 |
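Table 1 and Table 2 can both be generated by listing every pair of dice. This sketch (our own illustration, assuming two fair six-sided dice) builds the 36 sums, counts how often each appears, and divides by 36:

```python
from collections import Counter
from fractions import Fraction

FACES = range(1, 7)

# Table 1: every (first die, second die) pair and its sum -> 36 cells.
sums = [a + b for a in FACES for b in FACES]

# Table 2: count how often each sum appears, then divide by the 36 total outcomes.
occurrences = Counter(sums)
probabilities = {s: Fraction(n, len(sums)) for s, n in sorted(occurrences.items())}

for s, p in probabilities.items():
    print(f"{s:>2}: {occurrences[s]} occurrences, probability {float(p):.3f}")
```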
There are some interesting properties of probabilities to note from Table 2:
- Probabilities are additive; the probability of rolling a 2 or 3 is 1/36 + 2/36 = 3/36 ≈ 0.083 (adding the rounded values 0.028 + 0.056 gives 0.084)
- Probabilities can be decomposed; for example, the probability of rolling a 7 (6 occurrences) is the same as the probability of rolling a 3 (2 occurrences) or a 5 (4 occurrences)
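Working with the occurrence counts instead of the rounded decimals keeps these sums exact. In this sketch the helper `p` is our own; it adds the occurrences of the requested sums from Table 2 and divides by 36.

```python
from fractions import Fraction

# Occurrence counts from Table 2 (sum -> number of ways out of 36).
WAYS = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}

def p(*sums: int) -> Fraction:
    """Probability of rolling any of the given sums: add their occurrences, divide by 36."""
    return Fraction(sum(WAYS[s] for s in set(sums)), 36)

print(p(2, 3), round(float(p(2, 3)), 3))  # 1/12 0.083
# 6 ways to roll a 7 = 2 ways to roll a 3 + 4 ways to roll a 5
print(p(7) == p(3, 5))  # True
```

Using `set(sums)` means a sum listed twice is only counted once, since the same outcome cannot be added to itself.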
Using the data from Table 2, here is what the distribution looks like:
This is called a Discrete Triangular Distribution because the outcomes are discrete values and the shape of the distribution is a triangle.
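The triangle shape can also be summarized in one formula. This is our own compact restatement of Table 2, where $S$ is the sum of two fair six-sided dice:

$$P(S = s) = \frac{6 - |s - 7|}{36}, \qquad s = 2, 3, \ldots, 12$$

For example, $P(S = 10) = (6 - 3)/36 = 3/36 \approx 0.083$, matching Table 2. The probability is largest at $s = 7$ (the peak of the triangle) and drops by $1/36$ for each step away from 7.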
How can you apply probabilities to improve your decision making?
Let's examine how you can use these concepts to improve your decision making. Consider the cards of these three hockey players, where each column lists the result for every two-dice roll from 2 to 12:
Player 1
| OUTSIDE | INSIDE | REB/BKWY |
| 2. Goal 1-9 | 2. X-DRD | 2. X-DC |
| 3. X-DLW | 3. Goalie rating | 3. Goalie rating |
| 4. X-any D player | 4. Goal 1-9 | 4. Goalie rating |
| 5. X-DC | 5. X-DRW | 5. Goal 1-16 |
| 6. X-DRD | 6. X-DC | 6. X-DC |
| 7. Goalie rating | 7. Goalie rating | 7. Goalie rating |
| 8. X-Reb DEFLECT? | 8. X-Reb | 8. X-Reb |
| 9. X-DLD | 9. X-DLD | 9. Goalie rating |
| 10. Lose to DRW | 10. Lose to DLD | 10. Lose to DLW |
| 11. X-DRW | 11. Goalie rating | 11. Goalie rating |
| 12. X-DRW | 12. X-DLW | 12. X-DRD |
Player 2
| OUTSIDE | INSIDE | REB/BKWY |
| 2. Goal 1-9 | 2. Goal 1-14 | 2. Goal+ |
| 3. Goalie rating+ | 3. X-DLD | 3. Goal 1-14 |
| 4. X-DRD | 4. Goalie rating+ | 4. X-DRD |
| 5. Goalie rating | 5. Goalie rating | 5. Goalie rating |
| 6. X-DLW | 6. X-DRD | 6. X-any D player |
| 7. X-DC | 7. X-DLW | 7. Goalie rating |
| 8. X-Reb DEFLECT? | 8. X-Reb | 8. X-Reb |
| 9. X-DRD | 9. X-DC | 9. Goalie rating+ |
| 10. X-DRW | 10. Goalie rating | 10. Goalie rating |
| 11. X-DRW | 11. X-any D player | 11. X-DLD |
| 12. Goal+ 1-2 | 12. Goal+ 1-8 | 12. X-DC |
Player 3
| OUTSIDE | INSIDE | REB/BKWY |
| 2. Goal 1-8 | 2. Goal 1-10 | 2. Goal 1-17 |
| 3. Goalie rating | 3. X-DRW | 3. X-DLW |
| 4. X-DRW | 4. X-DRW | 4. X-DC |
| 5. Lose to DRW | 5. Lose to DRD | 5. Lose to DLW |
| 6. X-DLD | 6. X-DLD | 6. X-DLW |
| 7. X-DLW | 7. Goalie rating | 7. Goalie rating |
| 8. X-Reb DEFLECT? | 8. X-Reb | 8. X-Reb |
| 9. X-DC | 9. X-DC | 9. Goalie rating |
| 10. X-any D player | 10. X-DRW | 10. X-DRW |
| 11. X-DRD | 11. X-any D player | 11. X-any D player |
| 12. X-DLD | 12. X-DLD | 12. X-DLD |
Which player has the greatest chance of rolling a Goal or Goalie Rating on an OUTSIDE shot? Let's use Table 2 to calculate the probability for each player, counting only the plain Goal and Goalie rating results (the Goal+ and Goalie rating+ entries are not counted in these calculations). For Player 1 we need to roll a 2 or 7, which results in a probability of 0.028 + 0.167 = 0.195. For Player 2 we need to roll a 2 or 5, which results in a probability of 0.028 + 0.111 = 0.139. For Player 3 we need to roll a 2 or 3, which results in a probability of 0.028 + 0.056 = 0.084. Given the choice, you would prefer to have Player 1 take the OUTSIDE shot because his probability is 0.056 greater than Player 2's and 0.111 greater than Player 3's.
Should Player 1 take an OUTSIDE shot or try for an INSIDE shot? If Player 1 takes an OUTSIDE shot, the probability of rolling a Goal or Goalie Rating is 0.195 (see above). For an INSIDE shot we need to roll a 3, 4, 7, or 11, which results in a probability of 0.056 + 0.083 + 0.167 + 0.056 = 0.362. This is 0.167 greater than an OUTSIDE shot, making a good result almost twice as likely. Because Player 1 is good at penetrating for an INSIDE shot, he should probably try for one.
Should Player 2 take an OUTSIDE shot or try for an INSIDE shot? If Player 2 takes an OUTSIDE shot, the probability of rolling a Goal or Goalie Rating is 0.139 (see above). For an INSIDE shot we need to roll a 2, 5, or 10, which results in a probability of 0.028 + 0.111 + 0.083 = 0.222. This is 0.083 greater than an OUTSIDE shot, but because Player 2 is not good at penetrating for an INSIDE shot, he should probably not try for one.
Which player has the greatest chance of rolling a Goal or Goalie Rating on a REBOUND shot? Player 1 needs a 3, 4, 5, 7, 9, or 11, for a probability of 0.056 + 0.083 + 0.111 + 0.167 + 0.111 + 0.056 = 0.584. Player 2 needs a 3, 5, 7, or 10, for a probability of 0.056 + 0.111 + 0.167 + 0.083 = 0.417. Player 3 needs a 2, 7, or 9, for a probability of 0.028 + 0.167 + 0.111 = 0.306. So you would prefer to have Player 1 take the REBOUND shot.
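All nine of these comparisons follow the same recipe, so they can be computed together. In this sketch the roll lists in `GOOD_ROLLS` are transcribed by hand from the three cards (plain Goal and Goalie rating results only, leaving out the + results), and the names `GOOD_ROLLS` and `chance` are our own. Exact fractions such as 7/36 ≈ 0.194 can differ in the last digit from sums of the rounded Table 2 values (0.195).

```python
from fractions import Fraction

# Ways to roll each sum with two six-sided dice (Table 2).
WAYS = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}

# Rolls that give a plain "Goal" or "Goalie rating" result, read from the cards.
# Hand-transcribed; "Goal+" and "Goalie rating+" entries are left out.
GOOD_ROLLS = {
    "Player 1": {"OUTSIDE": [2, 7], "INSIDE": [3, 4, 7, 11], "REB/BKWY": [3, 4, 5, 7, 9, 11]},
    "Player 2": {"OUTSIDE": [2, 5], "INSIDE": [2, 5, 10], "REB/BKWY": [3, 5, 7, 10]},
    "Player 3": {"OUTSIDE": [2, 3], "INSIDE": [2, 7], "REB/BKWY": [2, 7, 9]},
}

def chance(rolls: list[int]) -> Fraction:
    """Exact probability of rolling any of the listed sums with two dice."""
    return Fraction(sum(WAYS[r] for r in rolls), 36)

for player, columns in GOOD_ROLLS.items():
    row = ", ".join(f"{shot} {float(chance(rolls)):.3f}" for shot, rolls in columns.items())
    print(f"{player}: {row}")

# Which player has the best OUTSIDE chance?
best = max(GOOD_ROLLS, key=lambda name: chance(GOOD_ROLLS[name]["OUTSIDE"]))
print("Best OUTSIDE shooter:", best)
```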
Understanding probabilities and applying these concepts can help you make better decisions in the game, which can lead to more scoring opportunities.
What is the Normal Distribution?
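A Normal Distribution is a very important distribution. It is used to model many random events that occur in nature. When plotted, the normal distribution is a smooth, symmetric, bell-shaped curve that is highest in the middle (at its average) and falls off evenly on both sides.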
Note that the normal distribution is continuous, meaning that it is defined for every value on the x-axis, not just for whole numbers; probabilities come from the area under the curve over a range of values. On the other hand, the triangular distribution is called discrete because probabilities are defined only for the whole numbers 2 through 12. Note also how the shape of the uniform distribution changes to become the discrete triangular distribution when a second die is added. What happens to the shape of the distribution if we add more dice? The shape starts to approach that of the normal distribution. With six dice, the possible sums range from 6 to 36, and the chart forms a rounded bell shape that peaks at 21.
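You can watch the shape change by building the distribution one die at a time, a technique known as a discrete convolution. This is a sketch assuming fair six-sided dice; `dice_sum_counts` and `print_histogram` are our own names. It prints a text bar chart for one, two, and six dice.

```python
from collections import Counter

def dice_sum_counts(num_dice: int, sides: int = 6) -> Counter:
    """Count the ways to reach each total when rolling `num_dice` fair dice.

    Starts from one way to have a total of 0 and adds one die at a time
    (a discrete convolution of the running counts with a single die).
    """
    counts = Counter({0: 1})
    for _ in range(num_dice):
        nxt = Counter()
        for total, ways in counts.items():
            for face in range(1, sides + 1):
                nxt[total + face] += ways
        counts = nxt
    return counts

def print_histogram(num_dice: int, width: int = 40) -> None:
    """Print each total, its probability, and a bar scaled to the most likely total."""
    counts = dice_sum_counts(num_dice)
    all_ways = sum(counts.values())  # equals 6 ** num_dice for six-sided dice
    top = max(counts.values())
    for total in sorted(counts):
        bar = "#" * round(width * counts[total] / top)
        print(f"{total:>3}  {counts[total] / all_ways:.3f}  {bar}")

for n in (1, 2, 6):
    print(f"--- {n} dice ---")
    print_histogram(n)
```

With one die every bar is the same length, with two dice the bars form a triangle, and with six dice they form a bell shape centered on 21.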
If the normal distribution can more accurately represent random events, then why don't we add more dice to make our sports simulation games more realistic? The advantage of using two dice instead of one becomes clear when you compare the uniform distribution to the discrete triangular distribution. With one die, there are only six outcomes and they are all equally likely, so it is hard to make some results more likely than others. With two dice, there are eleven possible outcomes with different likelihoods (e.g. rolling a 7 is three times as likely as rolling an 11). If we use six dice, it is true that the distribution more closely approximates the normal distribution, but the game now has to define outcomes for every sum from 6 to 36. The added realism from each extra die starts to shrink, while the game becomes harder for the creator to design and harder for the player to play. So there is a tradeoff to balance between the realism of the game and the ease and enjoyment of playing it.
What is data?
A data point is a single piece of information describing an event, and data is a collection of data points. Examples of events are:
- A pitcher throwing a strike
- A player sinking a 3-point shot
- A quarterback completing a pass
- A skater scoring a goal
An event may be described by multiple data points. For example, the event of a hockey player shooting on goal may be described by the following data points:
- The player shooting the puck
- The player defending the goal
- The type of shot (e.g. outside, inside, rebound, etc.)
- Whether the shot was a goal
- The player with the first assist
- The player with the second assist
- etc.
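In code, one shot event and its data points fit naturally into a small record type. This is a sketch using Python's standard `dataclasses` module; the class name `Shot` and its field names are our own, chosen to mirror the list above, and the assists are optional because not every shot is a goal.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Shot:
    """One shot-on-goal event, described by several data points (hypothetical field names)."""
    shooter: str
    goalie: str
    shot_type: str  # e.g. "OUTSIDE", "INSIDE", "REBOUND"
    goal: bool
    first_assist: Optional[str] = None   # only meaningful when goal is True
    second_assist: Optional[str] = None

shot = Shot(shooter="S1", goalie="G1", shot_type="REBOUND", goal=True)
print(shot)
```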
A single data point describes only one event in the sequence of events that make up a game, so on its own it says little about an athlete's performance. All of the data points from one game, taken together, give a picture of how well the athlete performed in that game. All of the data points from a season give a picture of how well the athlete performed across every game of that season. We get these pictures by calculating statistics on groups of similar data points.
What are statistics?
Statistics is the process of collecting, organizing, and analyzing data to find interesting patterns.
Let's consider the hockey player from the shooting example above. We can organize the data about each shot he took in one game in a list like this:
| Skater | Goalie | Type | Goal |
| S1 | G1 | OUTSIDE | NO |
| S1 | G1 | INSIDE | NO |
| S1 | G1 | OUTSIDE | NO |
| S1 | G1 | REBOUND | YES |
| S1 | G1 | INSIDE | NO |
We can collapse all five of these data points into statistics that describe the performance of the player in the game like this:
| Game | Skater | Shots | Goals | OUTSIDE Shots | INSIDE Shots | RBND Shots | Shot % |
| 1 | S1 | 5 | 1 | 2 | 2 | 1 | 20% |
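Collapsing the shot list into game statistics is a matter of counting. This sketch (our own illustration; the function name `game_stats` is hypothetical) stores the five shots from the list above as tuples and produces the same numbers as the summary table, with Shot % calculated as goals divided by shots.

```python
from collections import Counter

# The five shots by skater S1 from the list above: (skater, goalie, type, goal?)
shots = [
    ("S1", "G1", "OUTSIDE", False),
    ("S1", "G1", "INSIDE", False),
    ("S1", "G1", "OUTSIDE", False),
    ("S1", "G1", "REBOUND", True),
    ("S1", "G1", "INSIDE", False),
]

def game_stats(rows):
    """Collapse one skater's shot rows into per-game statistics."""
    by_type = Counter(shot_type for _, _, shot_type, _ in rows)
    total = len(rows)
    goals = sum(1 for _, _, _, goal in rows if goal)
    return {
        "Shots": total,
        "Goals": goals,
        "OUTSIDE Shots": by_type["OUTSIDE"],
        "INSIDE Shots": by_type["INSIDE"],
        "RBND Shots": by_type["REBOUND"],
        "Shot %": round(100 * goals / total) if total else 0,
    }

print(game_stats(shots))
```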
Suppose we do the same thing for other players and gather all of their statistics into a list like this:
| Game | Skater | Shots | Goals | OUTSIDE Shots | INSIDE Shots | RBND Shots | Shot % |
| 1 | S1 | 5 | 1 | 2 | 2 | 1 | 20% |
| 1 | S2 | 2 | 0 | 2 | 0 | 0 | 0% |
| 1 | S3 | 4 | 0 | 1 | 3 | 0 | 0% |
We can now compare the performance of each player by comparing their statistics. We can identify the following:
- S1 had the most shots (5) of the three players
- S1 was the only player to score a goal
- S1 scored one goal in five shots for a 20% shot percentage
- S1 shot from outside as often as from inside
- S2 had the fewest shots (2)
- S2 had two shots from outside
- S3 had one less shot (4) than S1 (5)
- S3 had three shots from inside
Based on the data for this one game, it seems that S1 and S3 tend to take more shots than S2. S1 shoots equally from outside and inside, while S2 tends to shoot from outside and S3 tends to shoot from inside.
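The outside versus inside comparison above can be written as a simple rule. This sketch uses the game 1 counts from the table; the function name `tendency` is our own, and with only one game of data it describes that game rather than a long-term habit.

```python
def tendency(outside: int, inside: int) -> str:
    """Compare a skater's outside and inside shot counts."""
    if outside > inside:
        return "shoots more from outside"
    if inside > outside:
        return "shoots more from inside"
    return "shoots equally from outside and inside"

# Game 1 shot counts per skater, from the table above: (OUTSIDE, INSIDE, RBND)
game1 = {"S1": (2, 2, 1), "S2": (2, 0, 0), "S3": (1, 3, 0)}

for skater, (outside, inside, rebound) in game1.items():
    print(f"{skater}: {outside + inside + rebound} shots, {tendency(outside, inside)}")
```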
But it doesn't stop there. We can treat the statistics from this game as a data point as well, and compare them against the statistics from other games over the course of a season. For example, player statistics for the first four games might look like this:
| Game | Skater | Shots | Goals | OUTSIDE Shots | INSIDE Shots | RBND Shots | Shot % |
| 1 | S1 | 5 | 1 | 2 | 2 | 1 | 20% |
| 1 | S2 | 2 | 0 | 2 | 0 | 0 | 0% |
| 1 | S3 | 4 | 0 | 1 | 3 | 0 | 0% |
| 2 | S1 | 5 | 1 | 1 | 3 | 1 | 20% |
| 2 | S2 | 1 | 0 | 1 | 0 | 0 | 0% |
| 2 | S3 | 5 | 0 | 1 | 3 | 1 | 0% |
| 3 | S1 | 4 | 0 | 2 | 2 | 0 | 0% |
| 3 | S2 | 2 | 0 | 1 | 0 | 1 | 0% |
| 3 | S3 | 4 | 0 | 0 | 3 | 1 | 0% |
| 4 | S1 | 6 | 1 | 2 | 3 | 1 | 17% |
| 4 | S3 | 4 | 1 | 0 | 3 | 1 | 25% |
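We might want to calculate different statistics for the season than we did for a single game. Games Played counts the games in which a skater appears (S2 has no row for game 4), Shot % is goals divided by shots, and Shots / Game is shots divided by games played. The season statistics might look like this: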
| Skater | Games Played | Shots | Goals | Shot % | Shots / Game |
| S1 | 4 | 20 | 3 | 15% | 5.0 |
| S2 | 3 | 5 | 0 | 0% | 1.7 |
| S3 | 4 | 17 | 1 | 6% | 4.3 |
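The season table can be produced from the per-game rows by totaling each skater's numbers and then dividing. This is a sketch; the name `season_stats` is our own. It prints Shots / Game with two decimals, while the table above rounds to one (for example, S3's 17 / 4 = 4.25 appears as 4.3).

```python
from collections import defaultdict

# Per-game rows from the season table above: (game, skater, shots, goals)
games = [
    (1, "S1", 5, 1), (1, "S2", 2, 0), (1, "S3", 4, 0),
    (2, "S1", 5, 1), (2, "S2", 1, 0), (2, "S3", 5, 0),
    (3, "S1", 4, 0), (3, "S2", 2, 0), (3, "S3", 4, 0),
    (4, "S1", 6, 1), (4, "S3", 4, 1),
]

def season_stats(rows):
    """Total each skater's games, shots, and goals, then compute the rates."""
    totals = defaultdict(lambda: {"Games Played": 0, "Shots": 0, "Goals": 0})
    for _, skater, shots, goals in rows:
        t = totals[skater]
        t["Games Played"] += 1  # one row per game the skater played
        t["Shots"] += shots
        t["Goals"] += goals
    for t in totals.values():
        t["Shot %"] = 100 * t["Goals"] / t["Shots"] if t["Shots"] else 0.0
        t["Shots / Game"] = t["Shots"] / t["Games Played"]
    return dict(totals)

for skater, t in sorted(season_stats(games).items()):
    print(f"{skater}: {t['Games Played']} GP, {t['Shots']} shots, {t['Goals']} goals, "
          f"{t['Shot %']:.0f}%, {t['Shots / Game']:.2f} shots/game")
```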

