My high school statistics teacher always said, if someone gives you an average, ask for the standard deviation.
And that’s good advice.
Imagine a scenario where the class average on a test is 50. That doesn’t tell you much about the difficulty of the test. Did everyone take it and fail, or did half of the students score 100 while the other half skipped class?
In both cases, the average is 50, but the spread of the data varies widely. This principle applies, albeit not to such an extreme, wherever an average is involved. Averages are usually involved wherever data is involved, and there’s data everywhere.
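To make the test example concrete, here is a minimal sketch using Python's statistics module. The class size of 10 and the variable names are assumptions made up for illustration; only the two scenarios come from the example above.

```python
from statistics import mean, pstdev

# Hypothetical class of 10 students, two ways of averaging 50.
everyone_scored_50 = [50] * 10
half_aced_half_absent = [100] * 5 + [0] * 5

for label, scores in [
    ("everyone scored 50", everyone_scored_50),
    ("half 100s, half 0s", half_aced_half_absent),
]:
    # Same mean in both cases, very different spread.
    print(label, "| mean:", mean(scores), "| std dev:", pstdev(scores))
```

Both lists have a mean of 50, but the first has a standard deviation of 0 and the second a standard deviation of 50.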
My favorite game of the past few years has arguably been DEAD BY DAYLIGHT (DBD), an asymmetric multiplayer survival horror game. It’s a 1v4 game where one player is the killer and the other four are survivors. The killer hunts each survivor, while the survivors try to avoid being killed and power up exit gates by working together to repair five generators (gens). The killer wants to prevent everyone from escaping while the survivors want to escape.
The game might seem simple enough, but it has so many perks and mechanics that interact with each other in so many different ways that it can take hundreds of hours to get a grasp on what’s going on.
But the complexities are built on simple foundations.
For one survivor, fully repairing a gen takes 90 seconds. In other words, they repair the gen at a rate of 1 charge per second (1c/s), and it takes 90 charges to be fully repaired, so each second of repair adds roughly 1.1% progress.
So far, this data is incredibly easy to work with since the values are constant. If we plot the time it takes to complete a gen, it would look like this:
For each set, the mean, the median, and the mode are all the same, and the standard deviation is zero.
However, in reality, the game is a little more complicated than that.
Every second a survivor works on a gen, there’s an 8% chance of receiving a skill check. A skill check is a quick time event whereby a red pointer rotates clockwise around a circle. The survivor must press the action button once the pointer is inside of the success zone. If the action button is pressed outside of the success zone, the gen’s current progress is reduced by 10%.
The success zone is broken into two sections, the good zone and the great zone. If the action button is pressed in the good zone, all it does is prevent regression. However, if the button is pressed in the great zone, the gen receives an additional 1% progress. These are known as good skill checks and great skill checks.
Now, if we wanted to find the average gen completion time assuming we always hit our great skill checks, it’s a bit more complicated, but still simple enough to get a rough estimate.
We just need to figure out how many skill checks we receive on average over the 90-second repair time, and then subtract that from the total time. Obviously, it won’t be exact, because each great skill check shortens the remaining repair time by about 1%, which leaves fewer seconds in which to receive another one. Similarly, it takes about 1.1 seconds for a skill check event to complete, meaning other skill checks can’t happen within this time frame. However, the answer should still be fairly close.
The total number of skill checks on average will be:
90 * 0.08 = 7.2
Each great skill check saves roughly one second, so the average completion time will be:
90 – 7.2 = 82.8
So, on average, one person working on a gen and hitting all their great skill checks will complete it in about 83 seconds.
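As a rough check on how much the "fewer seconds means fewer skill checks" caveat matters, here is a small sketch. It keeps the same simplifying assumption as the estimate above (one second saved per great skill check) and ignores the roughly 1.1-second skill check duration. The variable names are made up for illustration.

```python
BASE_TIME = 90.0        # seconds to repair a gen with no skill checks
CHECK_CHANCE = 0.08     # chance of a skill check each second
SAVED_PER_CHECK = 1.0   # assumption: each great skill check saves one second

# Naive estimate: count skill checks over the full 90 seconds.
naive = BASE_TIME - BASE_TIME * CHECK_CHANCE * SAVED_PER_CHECK

# Self-consistent estimate: skill checks only occur during the actual
# repair time T, so T = BASE_TIME - CHECK_CHANCE * SAVED_PER_CHECK * T.
refined = BASE_TIME / (1 + CHECK_CHANCE * SAVED_PER_CHECK)

print(round(naive, 2), round(refined, 2))
```

Under these assumptions the two estimates differ by about half a second, so the simple subtraction is a reasonable first approximation.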
Uh oh, I just gave you an average. So, what’s the standard deviation, you ask?
Fortunately, this is still easy enough to work out mathematically.
What we have here is a binomial distribution: 90 independent one-second trials, each with the same chance of producing a skill check. We’ve already used the formula for the mean of a binomial distribution, because that’s what we just calculated.
μ = np
where n is the number of seconds and p is the probability of getting a skill check.
μ = 90 * 0.08
μ = 7.2
The formula for standard deviation is the following:
σ = sqrt( np( 1 − p ) )
Plugging in the values we get:
σ = sqrt( 90 * 0.08 * ( 1 – 0.08 ) )
σ = sqrt( 7.2 * ( 0.92 ) )
σ = sqrt( 6.624 )
σ = 2.5737
Since the number of skill checks is equal to the amount of time saved, the mean is just a linear translation from 90 and the standard deviation remains the same.
μ = 82.80
σ = 2.5737
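The same arithmetic can be sketched in a few lines of Python. The names n, p, mu, sigma, and mean_time simply mirror the symbols above, and the one-second-per-skill-check simplification is carried over from the text.

```python
import math

n = 90     # one skill check roll per second of repair
p = 0.08   # chance of a skill check on each roll

mu = n * p                            # expected number of skill checks
sigma = math.sqrt(n * p * (1 - p))    # binomial standard deviation
mean_time = 90 - mu                   # shift from the 90-second baseline

print(round(mu, 4), round(sigma, 4), round(mean_time, 4))
```

Subtracting a random count from a constant shifts the mean but leaves the spread unchanged, which is why sigma carries over to the completion time as-is.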
However, once again, the game is a little more complicated than that.
There are hundreds of perks survivors can choose from that alter the behavior of in-game mechanics, several of which affect gen repair speeds. Most of them simply apply a buff to the progression rate, which makes working out the completion time trivial since you’re just shifting the baseline over.
One perk in particular is a bit more complex:
With each great skill check, Hyperfocus increases the chances of receiving a skill check, and it increases the bonus progression value.
Awesome, so how long should it take to complete a gen with Hyperfocus and what’s the standard deviation?
I’m not a mathematician. I don’t know how to solve this. I think we’ve hit the point of the problem becoming too mathematically complex for the average person to work out.
I’m not a mathematician, but I am a programmer, and one thing programming is great at is simulating statistical problems. If we can capture, or approximate, the exact logic the game uses, we can complete thousands of generators within seconds and then use that data to calculate the mean and the standard deviation.
First, we’ll create a function representing a single gen.
Since a gen needs 90 charges to be completed, we’ll set the max progress to 90. We’ll also need our base progression per second, what our current progress is, and what the current time is:
def gen():
    MAX_PROGRESS = 90
    BASE_PROGRESSION = 1
    current_progress = 0
    time = 0
    while current_progress < MAX_PROGRESS:
        current_progress += BASE_PROGRESSION
        time += 1
        print(time, current_progress)
This gives us the time and the current progress at each step:
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
...
To improve accuracy, let’s set the max progress to 9000, so that each pass through the loop represents a hundredth of a second, and divide the time by 100 at the end. Let’s also add in the base chance of getting a skill check, and then once every second (every 100 ticks) we can test to see if we get one or not. Finally, instead of printing the progress at every tick, let’s return the final time at the end:
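import random

def gen() -> float:
    MAX_PROGRESS = 9000    # 90 charges, in hundredths of a charge
    BASE_PROGRESSION = 1   # progress per tick (1 tick = 0.01 s)
    BASE_CHANCE = 8        # percent chance of a skill check each second
    current_progress = 0
    time = 0
    while current_progress < MAX_PROGRESS:
        current_progress += BASE_PROGRESSION
        time += 1
        # Only roll for a skill check once per full second.
        if time % 100 != 0:
            continue
        if random.random() < (BASE_CHANCE / 100):
            # Great skill check: bonus of 1% of the total progress.
            current_progress += (MAX_PROGRESS * 0.01)
    return time / 100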
Now that we have a generator function, we can call it thousands of times, store the completion times in a list, and then calculate the mean and the standard deviation:
import math
SIMS = 10000
times = [gen() for i in range(SIMS)]
mean = sum(times) / len(times)
squared_diffs = [(i - mean) ** 2 for i in times]
variance = sum(squared_diffs) / len(squared_diffs)
standard_deviation = math.sqrt(variance)
print(mean)
print(standard_deviation)
This gives us (drum roll)...
84.02971999999974
2.076434617704139
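Not exactly our 82.8 and 2.57, but we knew it wouldn’t be exact. For every great skill check, we’re reducing the amount of time available to get skill checks, meaning we’re gonna get fewer of them, and it’s gonna take us slightly longer. On top of that, the bonus in the code is 1% of the gen’s 90 charges, so each great skill check saves 0.9 seconds rather than a full second. Both effects push the mean up a little and pull the standard deviation down.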
But they’re both pretty close.
And actually, if this is the logic DBD uses, these values are more accurate.
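As a sanity check on the hand-rolled calculation, the standard library's statistics module provides the same quantities. This is a minimal sketch; the short list of times below is arbitrary illustrative data, not simulation output.

```python
import math
import statistics

# Arbitrary illustrative completion times in seconds (not simulation output).
times = [84.6, 82.8, 85.5, 83.7, 81.9, 84.6]

# The manual calculation, as in the code above.
mean = sum(times) / len(times)
variance = sum((t - mean) ** 2 for t in times) / len(times)
manual_sd = math.sqrt(variance)

# Standard-library equivalents.
print(mean, statistics.fmean(times))
print(manual_sd, statistics.pstdev(times))   # population: divides by N
print(statistics.stdev(times))               # sample: divides by N - 1
```

The manual code divides by N, so it matches pstdev rather than stdev. With 10,000 simulated gens the difference between the two is negligible.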
Here is a plot of the data:
If we want to model Hyperfocus, we need to add some stuff to our function.
Before the main loop, we’ll add the Hyperfocus progression and speed bonus values, as well as set our current Hyperfocus tokens to zero. Then, inside of the main loop, we need to update the chance of a skill check appearing based on how many tokens we have. Similarly, how much progress we gain needs to be tied to how many tokens we have.
Then, after we add the progression value to our current progress, we can add another Hyperfocus token as long as we have fewer than six. Finally, we’ll add some time during every skill check where we can’t get another one.
Every skill check, the player receives a sound notification 0.5 seconds before it starts. The success zone can then be anywhere between 4 o’clock and 11 o’clock. We’ll add these two times to every skill check to try and make it a little more accurate:
def gen() -> float:
    MAX_PROGRESS = 9000
    BASE_PROGRESSION = 1
    BASE_CHANCE = 8
    # Extra bonus progress (in percent) at 0 to 6 Hyperfocus tokens.
    HF_PROGRESSION = (0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8)
    # Seconds per full pointer rotation at 0 to 6 Hyperfocus tokens.
    HF_SPEED = (1.1, 1.056, 1.012, 0.968, 0.924, 0.88, 0.836)
    tokens = 0
    skillcheck_in_progress = 0   # ticks left before another skill check can occur
    current_progress = 0
    time = 0
    while current_progress < MAX_PROGRESS:
        current_progress += BASE_PROGRESSION
        time += 1
        if skillcheck_in_progress > 0:
            skillcheck_in_progress -= 1
        # Only roll for a skill check once per full second.
        if time % 100 != 0:
            continue
        if skillcheck_in_progress:
            continue
        chance = (BASE_CHANCE + (tokens * 4)) / 100
        if random.random() < chance:
            progress = (HF_PROGRESSION[tokens] + 1)
            # Ticks for the pointer to reach 4 o'clock and 11 o'clock.
            time_to_4 = int(HF_SPEED[tokens] / 12.0 * 4 * 100)
            time_to_11 = int(HF_SPEED[tokens] / 12.0 * 11 * 100)
            # 0.5 s warning sound plus the pointer's travel time.
            skillcheck_in_progress = 50 + random.randrange(time_to_4, time_to_11)
            current_progress += progress * MAX_PROGRESS * 0.01
            if tokens < 6:
                tokens += 1
    return time / 100
This gives us (drum roll)…
65.77565849999779
7.13327731806177
A significant improvement in the mean, but the standard deviation has more than tripled, so the results are far less predictable. Here is a plot of the data:
This is super interesting.
It looks roughly normally distributed, but for some reason certain completion times come up far more often than their neighbors. Maybe it’s because when you add a lot of progression in single blows, it doesn’t matter if the gen was at 98% or 99%, it’s going to pass the 100% mark regardless. But I don’t know.
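One way to poke at the clustering is to tally the exact completion times instead of binning them. The sketch below copies the Hyperfocus gen() from above unchanged. The run count of 500, the fixed seed, and the names counts and whole_seconds are choices made purely for illustration.

```python
import random
from collections import Counter

def gen() -> float:
    # Same Hyperfocus model as above, copied so this block runs on its own.
    MAX_PROGRESS = 9000
    BASE_PROGRESSION = 1
    BASE_CHANCE = 8
    HF_PROGRESSION = (0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8)
    HF_SPEED = (1.1, 1.056, 1.012, 0.968, 0.924, 0.88, 0.836)
    tokens = 0
    skillcheck_in_progress = 0
    current_progress = 0
    time = 0
    while current_progress < MAX_PROGRESS:
        current_progress += BASE_PROGRESSION
        time += 1
        if skillcheck_in_progress > 0:
            skillcheck_in_progress -= 1
        if time % 100 != 0:
            continue
        if skillcheck_in_progress:
            continue
        chance = (BASE_CHANCE + (tokens * 4)) / 100
        if random.random() < chance:
            progress = HF_PROGRESSION[tokens] + 1
            time_to_4 = int(HF_SPEED[tokens] / 12.0 * 4 * 100)
            time_to_11 = int(HF_SPEED[tokens] / 12.0 * 11 * 100)
            skillcheck_in_progress = 50 + random.randrange(time_to_4, time_to_11)
            current_progress += progress * MAX_PROGRESS * 0.01
            if tokens < 6:
                tokens += 1
    return time / 100

random.seed(1)  # fixed seed only so reruns produce the same tally
times = [gen() for _ in range(500)]

# Count how often each exact completion time occurs.
counts = Counter(times)
# Runs that ended on a whole second, i.e. on a tick where a skill check can fire.
whole_seconds = sum(1 for t in times if t == int(t))

print(len(counts), "distinct completion times in", len(times), "runs")
print(whole_seconds, "runs ended exactly on a whole second")
for t, c in counts.most_common(5):
    print(t, c)
```

If a handful of exact times dominate the tally, or many runs end on whole seconds, that would be consistent with the guess above about large single jumps in progress, though it wouldn't prove it.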
As a player of DBD, I’m glad to finally know how Hyperfocus affects generators.
Knowing the mean and the standard deviation as well as seeing the data plotted tells me that this perk should save about 18 seconds on average. It also tells me that, although saving 18 seconds is the most likely outcome, that won’t always happen. In the best case, it will save nearly 35 seconds, but in the worst case, it will do absolutely nothing.
So whether you're marking tests, repairing generators, or simply going about life, there will always be data. And where there is data, there is usually an average. But the average doesn’t tell the whole story. The next time someone gives you an average, do yourself a favor — ask for the standard deviation. It’s there that the most interesting, and often most useful, insights lie.