The Central Limit Theorem

The Central Limit Theorem is a pretty important concept in statistics. It states that, even if the original probability distribution isn’t normal, the distribution of the mean of a sample drawn from it approaches a normal distribution as the number of draws in each sample increases (provided the original distribution has a finite variance). What does this mean? Let’s try an experiment. You’re going to need Python (I’m using Python 3.7), NumPy for the math bits, and matplotlib and seaborn for the plotting (actually, you only need matplotlib, but I like how seaborn looks).

The Uniform Distribution

We will start with the continuous uniform distribution U(0,1) and just sample it a bunch of times using NumPy’s random.random() function.
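Here is a minimal sketch of this sampling step. It assumes NumPy’s np.random.random() and uses plain matplotlib for the plot. The function names sample_uniform, flatness_check, and plot_samples are hypothetical and were chosen for illustration. Note that np.random.random() draws from the half-open interval [0, 1).

```python
import numpy as np


def sample_uniform(num_samples=1_000_000, seed=None):
    """Draw num_samples values from the continuous uniform distribution on [0, 1)."""
    if seed is not None:
        np.random.seed(seed)  # optional: make the run reproducible
    return np.random.random(num_samples)


def flatness_check(samples, bins=50):
    """Return normalized histogram heights; for U(0, 1) every bar should be close to 1."""
    density, _edges = np.histogram(samples, bins=bins, range=(0.0, 1.0), density=True)
    return density


def plot_samples(samples, bins=50):
    """Plot the normalized histogram of the samples (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.hist(samples, bins=bins, range=(0.0, 1.0), density=True)
    plt.xlabel("x")
    plt.ylabel("density")
    plt.show()


if __name__ == "__main__":
    samples = sample_uniform(seed=0)
    print("sample mean:", samples.mean())  # close to 0.5
    print("bar heights:", flatness_check(samples).round(2))
```

Calling plot_samples(sample_uniform()) should show a flat histogram at a height of about 1. The flatness_check helper reports the same bar heights as numbers, so you can inspect them without a plotting library.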

This will produce the following plots:

Top. The continuous uniform distribution on (0,1).
Bottom. The result of sampling 1,000,000 times from this distribution.

This is pretty much as expected: you get a nice flat distribution. Okay, let’s put the central limit theorem to the test with four experiments. In each one, we draw N samples from the uniform distribution and calculate their mean. This mean is our random variable X. We repeat the draw 100,000 times for each N, using N = 2, 10, 25, and 50, to see how N affects the final distribution. By the way, you can think of the graph above as N = 1.

We can use the following code to carry out our experiment:
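Here is a minimal sketch of the experiment. It assumes NumPy for the sampling and matplotlib for a 2x2 grid of histograms. The variable num_events matches the name mentioned later in this post. The other names (sample_means, normal_pdf, plot_experiment) are hypothetical. The red curve is computed by hand from the normal density formula, so SciPy isn’t required.

```python
import numpy as np


def sample_means(num_events, num_trials=100_000):
    """Return num_trials values of X, each the mean of num_events draws from U(0, 1)."""
    draws = np.random.random((num_trials, num_events))  # one row per trial
    return draws.mean(axis=1)                            # average across each row


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def plot_experiment(all_means, bins=100):
    """Draw one histogram per N with a fitted normal curve on top (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, (n, means) in zip(axes.flat, all_means.items()):
        ax.hist(means, bins=bins, density=True, color="tab:blue")
        # Normal curve with the same mean and standard deviation as the sampled data
        mu, sigma = means.mean(), means.std()
        x = np.linspace(means.min(), means.max(), 200)
        ax.plot(x, normal_pdf(x, mu, sigma), color="red")
        ax.set_title(f"N = {n}")
    fig.tight_layout()
    plt.show()


if __name__ == "__main__":
    np.random.seed(0)
    all_means = {num_events: sample_means(num_events) for num_events in (2, 10, 25, 50)}
    for num_events, means in all_means.items():
        print(f"N = {num_events:2d}: mean = {means.mean():.4f}, std = {means.std():.4f}")
```

To draw the figure, call plot_experiment(all_means) after building all_means. Building the full (num_trials, num_events) array at once keeps the code vectorized. For N = 50 that array is about 40 MB of float64 values.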

The blue bars are the histogram of the random variable X, the mean of N = 2, 10, 25, and 50 samples drawn from a uniform distribution. The red lines show a normal distribution whose mean and standard deviation match those of the sampled data.

As you can see, the mean of N samples from a uniform distribution looks a lot like a normal distribution. Increasing N shrinks the standard deviation of the distribution (in proportion to 1/√N) but doesn’t change the mean, which is always 0.5. To try other values of N, change num_events in the code above.
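The shrinking spread can also be checked numerically. U(0,1) has variance 1/12, and averaging N independent draws divides the variance by N. The mean of N draws should therefore have standard deviation sqrt(1/12)/sqrt(N). Below is a small sketch that compares this prediction with simulated means. It assumes NumPy 1.17 or later for np.random.default_rng. The names predicted_std and empirical_std are hypothetical helpers.

```python
import numpy as np


def predicted_std(num_events):
    """Standard deviation of the mean of num_events independent U(0, 1) draws.

    U(0, 1) has variance 1/12, and averaging N independent draws divides the
    variance by N, so the standard deviation is sqrt(1/12) / sqrt(N).
    """
    return np.sqrt(1.0 / 12.0) / np.sqrt(num_events)


def empirical_std(num_events, num_trials=100_000, seed=0):
    """Measure the same quantity by simulating num_trials sample means."""
    rng = np.random.default_rng(seed)
    means = rng.random((num_trials, num_events)).mean(axis=1)
    return means.std()


if __name__ == "__main__":
    for n in (1, 2, 10, 25, 50):
        print(f"N = {n:2d}: predicted = {predicted_std(n):.4f}, measured = {empirical_std(n):.4f}")
```

With 100,000 trials, the measured and predicted values should agree to within about a percent for every N.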

To Be Continued

Okay, well, that was neat, but it was for a uniform distribution. What about some weirdly shaped distribution? Does the central limit theorem still hold? I’m going to cover that in a second post, because it will include some code for sampling a weird distribution, which is useful to know in its own right.
