Thursday, November 10, 2011

Stats for risk modeling: The Uniform Distribution

This blog entry originally appeared on the Society of Information Risk Analysts web page on November 3, 2011:

Next up in my series on distributions, I'd like to talk about the Uniform distribution.  Uniform is quickly becoming my favorite distribution because it is really easy to understand and it helps us avoid a common mistake in risk modeling: underestimating the likelihood of the extremes.  If you remember my previous post on the Normal distribution, you will recall that once you get three standard deviations away from the mean you are getting into frequencies that are pretty low.  In fact, in a perfect normal distribution it would take 10,000 loss events to find about 13 or 14 losses that were more than three standard deviations above the mean.
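If you want to check that tail arithmetic yourself, here is a minimal sketch using only Python's standard library.  The function name normal_upper_tail and the 10,000-event count are just illustrative choices, and it assumes a perfect normal distribution, which real loss data rarely is.

```python
import math

def normal_upper_tail(k: float) -> float:
    """Probability that a normal value lands more than k standard deviations above the mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))

events = 10_000                      # illustrative number of loss events
one_sided = normal_upper_tail(3.0)   # chance of exceeding mean + 3 SD

print(f"Expected losses above +3 SD in {events} events: {events * one_sided:.1f}")
print(f"Expected values beyond 3 SD on either side:   {events * 2 * one_sided:.1f}")
```

The one-sided count comes out to roughly 13.5 per 10,000, and about 27 if you count both tails.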

The Uniform distribution makes a great "safety" distribution.  Maybe you're pretty confident that some random variable can be represented by a normal distribution, and if that is the case then use it.  That's what it's there for.  But what if your random variable is bimodal, meaning that it has two peaks rather than a single most likely value?  What if you think that values at the far ends of the distribution occur more often than the Normal distribution allows?  Uniform distribution never lets me underestimate my tails (unless my random variable is a strange U-shaped phenomenon) and it never makes me ignore one mode in favor of another.

SOME BACKGROUND: Distributions can be discrete or continuous.  Discrete distributions have clear values with nothing in between, while continuous distributions can take any value within their range.  In the two pictures above, the distribution on top is continuous, and the bottom is discrete.  A fair six-sided die is a good example of a discrete uniform distribution.  It has 6 and only 6 possible outcomes.  If I used a continuous uniform distribution to represent that variable I might get values like 4.5.  One of the "rules" of modeling risk in Monte Carlo simulations is that each value in each iteration has to be possible.  No gibberish in the model.  So having a variable that tells you (with great frequency) that you rolled 3.291 is bad.  Built into Excel you have the RANDBETWEEN function, which will give you a discrete uniform distribution of random integers.  If you're using the FAIR Lite tool, you can get a nearly uniform distribution by setting the confidence of an estimate to very low and putting the most likely value right between the minimum and the maximum.  It's not exactly uniform, but probably close enough.
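To make the die example concrete, here is a small sketch in Python rather than Excel.  The seed and the number of draws are arbitrary choices of mine.  random.randint includes both endpoints, much like RANDBETWEEN, while random.uniform returns a continuous value.

```python
import random

rng = random.Random(42)  # fixed seed so reruns give the same draws

# Discrete uniform: only the six faces of the die can come up.
discrete_rolls = [rng.randint(1, 6) for _ in range(10)]

# Continuous uniform over the same range: values between the faces appear,
# which is exactly the "gibberish" a die model should not produce.
continuous_draws = [rng.uniform(1, 6) for _ in range(10)]

print("discrete:  ", discrete_rolls)
print("continuous:", [round(x, 3) for x in continuous_draws])
```

Every value in the first list is a roll that could really happen.  Almost none of the values in the second list are.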

WHEN TO USE IT: Use the uniform distribution when you have a good idea about the upper and lower bound of your variable, but you are uncertain about the shape.

WHAT MAKES IT COOL: It's easy to understand.  You can explain this distribution to even the most statistically challenged of executives.  And thanks to the Central Limit Theorem, when you add together several independent uniform variables the total starts to look like a normal distribution, which will satisfy the people that want nice graphs and tighter estimates of the most likely value.
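Here is a rough simulation of that Central Limit Theorem effect.  The choice of 12 uniform terms, the 0-to-1 range, and the 20,000 iterations are my own assumptions for illustration.  It also assumes the variables are independent, which may not hold for real risk factors.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for repeatable results

def sum_of_uniforms(n_terms: int, low: float = 0.0, high: float = 1.0) -> float:
    """Add up n_terms independent draws from a continuous uniform distribution."""
    return sum(rng.uniform(low, high) for _ in range(n_terms))

n_terms = 12  # illustrative: 12 uniform(0, 1) terms have total variance 12 * (1/12) = 1
samples = [sum_of_uniforms(n_terms) for _ in range(20_000)]

mean = statistics.fmean(samples)
sd = statistics.pstdev(samples)

# A normal distribution keeps roughly 68% of its values within one SD of the mean.
within_1sd = sum(abs(x - mean) <= sd for x in samples) / len(samples)

print(f"mean={mean:.3f}  sd={sd:.3f}  share within 1 SD={within_1sd:.3f}")
```

Each input is flat, but the totals pile up around the middle (near 6 here) and the share within one standard deviation lands close to the 68% you would expect from a bell curve.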

WHEN TO AVOID IT: You should absolutely avoid this distribution if you have any evidence that the variable you're representing is U-shaped; in other words, the values at the extremes are more likely than values in the middle.  Other than that, this is a great distribution if you want to be open to possibilities and are willing to admit that you don't know a whole lot.
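To see how much a uniform can miss in that case, here is a sketch that uses a Beta(0.5, 0.5) distribution as a stand-in for a U-shaped variable.  That stand-in is my assumption, not something taken from real loss data, and the 10% cutoff for "extreme" is also arbitrary.

```python
import random

rng = random.Random(11)  # fixed seed for repeatable results
n = 20_000

def share_in_extremes(values, cutoff=0.10):
    """Fraction of values in the lowest or highest `cutoff` slice of the 0-to-1 range."""
    return sum(v < cutoff or v > 1 - cutoff for v in values) / len(values)

# Hypothetical U-shaped variable: Beta(0.5, 0.5) piles up near 0 and 1.
u_shaped = [rng.betavariate(0.5, 0.5) for _ in range(n)]

# Uniform over the same 0-to-1 range.
flat = [rng.uniform(0.0, 1.0) for _ in range(n)]

print(f"U-shaped share in the outer 10% bands: {share_in_extremes(u_shaped):.3f}")
print(f"Uniform share in the outer 10% bands:  {share_in_extremes(flat):.3f}")
```

Under these assumptions the U-shaped variable puts about twice as much weight in the outer bands as the uniform does, so the uniform would underestimate the extremes.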

EXTRA CREDIT: U-shaped distributions are most often seen where there is cyclic data.  For example, if you were modeling the snowfall in a given month and your X axis starts with January and ends with December, then you would likely see more snowfall on the two ends and less in the middle.  Can you think of any variables in information security that might follow a U-shaped distribution?
