**DaveWentzel.com: All Things Data**

# Confidence Intervals

**Confidence Intervals (understanding the quality of LR data)**

A more *accurate* statement by your performance team might end up being: "Performance improvement with the F5 with a 95% confidence interval is 64%, +/- 62%."

**What is a confidence interval?**

For variables that are independent and identically normally distributed, we can construct a confidence interval from a t distribution. However, many of the things we measure in the IT world aren't normally distributed, so we instead construct the confidence interval from a normal distribution, relying on the central limit theorem. For our results to be valid in this case, we need a minimum of 30 samples. Generally, more samples is better: more samples let us construct smaller confidence intervals at a given level of confidence, or conversely, assign a higher level of confidence to a particular interval.

To construct a two-sided interval (which is what we would normally use), we calculate the following:

Lower Bound = sample mean – (Zvalue * SD) / sqrt(n)

Upper Bound = sample mean + (Zvalue * SD) / sqrt(n)

The ‘Zvalue’ is obtained by looking it up in a table (at the bottom). The ‘sample mean’ is the average value of our measurements. ‘SD’ is the sample standard deviation. ‘n’ is the number of samples.

Looking at the equations above, we can see that as the number of samples increases, the size of the interval shrinks. As the variability of the data increases (higher SD), the interval widens. As the confidence level increases, the Zvalue gets larger, causing the interval to widen. The calculations seem complex but can be performed quite easily using a calculator or Excel.
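As a sketch, the bounds above can be computed with Python's standard library. The sample data here is invented purely for illustration (and is shorter than the 30-sample minimum discussed above):

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval(samples, z_value):
    """Return (lower, upper) = sample mean -/+ (Zvalue * SD) / sqrt(n)."""
    m = mean(samples)
    sd = stdev(samples)  # sample standard deviation
    half_width = (z_value * sd) / sqrt(len(samples))
    return (m - half_width, m + half_width)

# Hypothetical response-time measurements in seconds.
samples = [6.1, 5.8, 7.2, 6.5, 5.9, 6.8, 7.0, 6.3, 6.6, 6.2]
lower, upper = confidence_interval(samples, 1.960)  # 95% Z value from the table below
print(f"95% CI: {lower:.2f} to {upper:.2f} seconds")
```

Note how the half-width divides by sqrt(n): quadrupling the sample count halves the interval, which is the shrinking effect described above.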

**How to Interpret Confidence Intervals**

Essentially, any sample average that we measure within a confidence interval is statistically the same as any other value that we might measure within that interval. For example, if our 95% confidence interval ran from 5.5 to 7.5 seconds, an average response time of 5.7 seconds would be indistinguishable from a response time of 7.3 seconds at the 95% level of confidence, since the entire difference could simply be due to normal random variation of our samples. We could reduce the size of the confidence interval until one of the averages lies outside of it, either by reducing the level of confidence or by increasing the number of samples taken. At that point, we could say that there is a statistically significant difference in the two values at that confidence level.

When we are using benchmarking to compare alternatives, there are three things that we want to keep in mind:

• First, if the confidence intervals for two alternatives overlap, then we cannot say that there is any statistical difference between the alternatives at that level of confidence.

• Second, we can only compare confidence intervals constructed at the same level of confidence.

• Third, when we make a comparison between two alternatives, we want to express it as a range of values, from most conservative to most optimistic. For example, if we had two confidence intervals: 5.5 seconds to 7.5 seconds, and 8.5 seconds to 12.5 seconds, we would express the amount of improvement of the first alternative vs. the second alternative as follows:

8.5 / 7.5 – 1 = 13.3%

12.5 / 5.5 – 1 = 127.3%

And we would say that there was between a 13.3% to 127.3% improvement at that level of confidence. If we want to be more precise in our range, then we need to shrink the size of the confidence intervals, either by reducing the level of confidence or by increasing the number of samples that we measure.
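The range-of-improvement arithmetic above can be sketched as a small helper, using the two intervals from the example (5.5–7.5 seconds and 8.5–12.5 seconds):

```python
def improvement_range(fast_interval, slow_interval):
    """Return (conservative, optimistic) improvement of the faster
    alternative over the slower one, as fractions."""
    fast_low, fast_high = fast_interval
    slow_low, slow_high = slow_interval
    conservative = slow_low / fast_high - 1  # compare the closest endpoints
    optimistic = slow_high / fast_low - 1    # compare the farthest endpoints
    return conservative, optimistic

cons, opt = improvement_range((5.5, 7.5), (8.5, 12.5))
print(f"Improvement: {cons:.1%} to {opt:.1%}")
```

Because the two intervals do not overlap, even the conservative end of the range is a genuine improvement at that confidence level; if they overlapped, the conservative figure would go negative and no difference could be claimed.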

**Dealing with Highly Variable Data**

There are some situations where the data we measure is highly variable, with persistent outliers that inflate the standard deviation of our sample set. For example, in an informal study of wireless network latency within a particular facility, a coefficient of variation of 2.0 was measured on one of the 100-sample ping tests; other tests ranged from a C.O.V. of 1.6 to 1.8 (Coefficient of Variation = Std. Deviation / Mean). This is extremely high. If we wanted to construct a 95% confidence interval that said the average latency was X +/- 10%, we would need over 1,600 samples. To say the average is X +/- 5%, we would need 4 times as many samples. Finally, when the range in C.O.V.s is this high, we need much larger sample sets to get consistent results.
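The sample-size figures above follow from solving Zvalue * SD / sqrt(n) <= r * mean for n, where r is the desired relative error and C.O.V. = SD / mean. A sketch (with Z = 1.96 the formula gives roughly 1,500; rounding Z up to 2.0 reproduces the 1,600 figure above):

```python
from math import ceil

def samples_needed(z_value, cov, relative_error):
    """Solve z * SD / sqrt(n) <= relative_error * mean for n,
    where cov = SD / mean."""
    return ceil((z_value * cov / relative_error) ** 2)

n_10pct = samples_needed(1.960, 2.0, 0.10)  # X +/- 10% at 95% confidence
n_5pct = samples_needed(1.960, 2.0, 0.05)   # X +/- 5%: halving the error
print(n_10pct, n_5pct)                      # quadruples the sample count
```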

**Z value Lookup Table for Constructing 2-Sided Confidence Intervals**

| Confidence Level (%) | Zvalue |
| --- | --- |
| 20 | 0.253 |
| 40 | 0.524 |
| 60 | 0.842 |
| 80 | 1.282 |
| 90 | 1.645 |
| 95 | 1.960 |
| 99 | 2.576 |
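These Z values are the inverse CDF of the standard normal distribution evaluated at (1 + confidence) / 2; Python's standard library can reproduce the table:

```python
from statistics import NormalDist

def z_value(confidence_pct):
    """Two-sided Z value for a confidence level given in percent."""
    p = confidence_pct / 100
    # Two-sided: split the leftover probability between both tails.
    return NormalDist().inv_cdf((1 + p) / 2)

for level in (80, 90, 95, 99):
    print(f"{level}%: {z_value(level):.3f}")
```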
