DaveWentzel.com            All Things Data

Confidence Intervals (understanding the quality of LR data)
Assume you are deciding whether to buy a very expensive BIG-IP (F5) network load balancer.  You need to know whether the benefit justifies the cost.  Your performance team runs LoadRunner (LR) tests with and without the device and tells you, "Performance improvement with the F5 is 64%."  Do you buy it? 
Better to first ask about the quality of the LR data.  In other words, I want to know the confidence interval.  A more accurate statement from your performance team might be, "Performance improvement with the F5, at a 95% confidence level, is 64%, +/- 62%." 
Now do you buy it? 
High variability in performance is normal, especially in web-based applications.  We need to make sure we report accurate numbers from LR tests. 
What is a confidence interval?
At election time we see confidence intervals everywhere.  Polls are generally conducted at a 95% confidence level with a margin of error of +/- 3%. 
A low confidence level may be acceptable where the cost of making a mistake is small and the precision required is minimal.  But for our F5 example, with significant costs, we need more precision.  We need to be more scientific and less qualitative. 
Confidence intervals are used to account for the variability in measurements. 
For example, we might obtain a confidence interval such that we can say with 95% certainty that the average value of a response time is between 5.5 seconds and 7.5 seconds. This is important, since each time we take a measurement, the results will be different, and we need to understand how the data varies and what the normal range of measurements is. 
In order to construct a confidence interval, we need at least a minimal amount of data.

For variables that are independent, identically distributed, and normally distributed, we can construct a confidence interval from a t distribution. However, many of the things we measure in the IT world aren't normally distributed. In that case we construct the confidence interval from a normal distribution, and for the results to be valid we need a minimum of 30 samples. Generally, more samples are better: more samples let us construct smaller confidence intervals at a given level of confidence, or conversely, assign a higher level of confidence to a particular interval.

To construct a two-sided interval (which is what we would normally use), we calculate the following:

Lower Bound = sample mean – (Zvalue * SD) / sqrt(n)

Upper Bound = sample mean + (Zvalue * SD) / sqrt(n)

The Z value comes from the lookup table at the bottom of this page. The sample mean is the average of our measurements, SD is the sample standard deviation, and n is the number of samples.

Looking at the equations above, we can see that as the number of samples increases, the size of the interval shrinks. As the variability of the data increases (higher SD), the interval widens. As the confidence level increases, the Z value gets larger, causing the interval to widen.  The calculations seem complex but are easy to perform with a calculator or Excel. 
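As a concrete sketch, the bounds above can also be computed with a few lines of Python's standard library (the response times below are made up for illustration; nothing here comes from a real LR run):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(samples, confidence=0.95):
    """Two-sided confidence interval for the mean, normal approximation.

    Valid for n >= 30 per the discussion above; the Z value is taken
    from the inverse normal CDF instead of a printed lookup table.
    """
    n = len(samples)
    m = mean(samples)
    sd = stdev(samples)                             # sample standard deviation
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # 1.960 for 95%
    half_width = z * sd / sqrt(n)
    return m - half_width, m + half_width

# Hypothetical response times (seconds) from an LR run -- 30 samples:
times = [5.9, 6.4, 7.1, 6.8, 5.5, 6.2, 7.4, 6.0, 6.6, 6.9,
         5.8, 6.3, 7.0, 6.1, 6.7, 5.7, 6.5, 7.2, 6.4, 6.0,
         5.6, 6.8, 7.3, 6.2, 6.6, 5.9, 6.1, 7.0, 6.5, 6.3]
lo, hi = confidence_interval(times)
print(f"95% CI for mean response time: {lo:.2f}s to {hi:.2f}s")
```

Note how lowering the confidence level shrinks the interval, exactly as the equations predict.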

How to Interpret Confidence Intervals

Essentially, any sample average that we measure within a confidence interval is statistically the same as any other value that we might measure within that interval. For example, in the example above, an average response time of 5.7 seconds would be indistinguishable from a response time of 7.3 seconds at the 95% level of confidence, since the entire difference could simply be due to normal random variation of our samples.  We could reduce the size of the confidence interval until one of the averages lies outside of it, either by reducing the level of confidence or by increasing the number of samples taken. At that point, we could say that there is a statistically significant difference in the two values at that confidence level.

When we are using benchmarking to compare alternatives, there are three things that we want to keep in mind:

• First, if the confidence intervals for two alternatives overlap, then we cannot say that there is any statistical difference between the alternatives at that level of confidence.

• Second, we can only compare confidence intervals constructed at the same level of confidence.

• Third, when we make a comparison between two alternatives, we want to express it as a range of values, from most conservative to most optimistic. For example, if we had two confidence intervals: 5.5 seconds to 7.5 seconds, and 8.5 seconds to 12.5 seconds, we would express the amount of improvement of the first alternative vs. the second alternative as follows:

8.5 / 7.5 – 1 = 13.3%

12.5 / 5.5 – 1 = 127.3%

And we would say that there was between a 13.3% to 127.3% improvement at that level of confidence. If we want to be more precise in our range, then we need to shrink the size of the confidence intervals, either by reducing the level of confidence or by increasing the number of samples that we measure.
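A quick sketch of that comparison in Python (the interval endpoints are the ones from the example above; the function name is my own, not from any library):

```python
def improvement_range(better_interval, worse_interval):
    """Conservative-to-optimistic improvement of one alternative over another.

    Both arguments are (lower, upper) confidence-interval bounds constructed
    at the same confidence level, with the 'better' (smaller) one first.
    """
    b_lo, b_hi = better_interval
    w_lo, w_hi = worse_interval
    if b_hi >= w_lo:
        raise ValueError("Intervals overlap: no statistical difference "
                         "at this confidence level.")
    conservative = w_lo / b_hi - 1   # closest endpoints
    optimistic = w_hi / b_lo - 1     # farthest endpoints
    return conservative, optimistic

lo_pct, hi_pct = improvement_range((5.5, 7.5), (8.5, 12.5))
print(f"Improvement: {lo_pct:.1%} to {hi_pct:.1%}")  # 13.3% to 127.3%
```

The overlap check encodes the first bullet above: if the intervals overlap, no improvement claim can be made at that confidence level.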

Dealing with Highly Variable Data

There are some situations where the data we measure is highly variable, with persistent outliers that inflate the standard deviation of our sample set. For example, in an informal study of wireless network latency within a particular facility, a coefficient of variation of 2.0 was measured on one of the 100-sample ping tests. Other tests ranged from a C.O.V. of 1.6 to 1.8 (Coefficient of Variation = Std. Deviation / Mean). This is extremely high. If we wanted to construct a 95% confidence interval that said the average latency was X +/- 10%, we would need over 1,500 samples; to say the average is X +/- 5%, we would need 4 times as many. And when the range in C.O.V.s is that wide, we need even larger sample sets to get consistent results.
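The required sample size follows from rearranging the interval formula: the half-width relative to the mean is Z × C.O.V. / sqrt(n), so n = (Z × C.O.V. / target)². A sketch in Python, using the C.O.V. of 2.0 from the ping study above:

```python
from math import ceil
from statistics import NormalDist

def samples_needed(cov, relative_precision, confidence=0.95):
    """Samples needed so the CI half-width is within +/- relative_precision
    of the mean, for data with the given coefficient of variation (SD/mean)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z * cov / relative_precision) ** 2)

print(samples_needed(2.0, 0.10))  # 1537 -- over 1,500 samples for X +/- 10%
print(samples_needed(2.0, 0.05))  # 6147 -- roughly 4x as many for X +/- 5%
```

Halving the target precision quadruples the sample count, because n grows with the square of 1/precision.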

Z value Lookup Table for Constructing 2-Sided Confidence Intervals

Confidence Level    Z value
95%                 1.960
99%                 2.576
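These Z values need not be memorized: for a two-sided interval at confidence level C, Z is the inverse normal CDF evaluated at (1 + C) / 2. A quick check in Python reproduces the table:

```python
from statistics import NormalDist

for confidence in (0.95, 0.99):
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    print(f"{confidence:.0%} -> Z = {z:.3f}")  # prints 1.960, then 2.576
```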

