Meccanismo Complesso

The Central Limit Theorem with Python

Statistics is a fundamental discipline for the analysis and interpretation of data. One of its most powerful conceptual tools is the Central Limit Theorem (CLT). This theorem is crucial to inferential statistics and provides the basis for many statistical analyses applied in a wide range of fields.

The Central Limit Theorem

The Central Limit Theorem is one of the fundamental principles of statistics and describes the behavior of the distribution of sample means. In essence, it states that, for independent random samples drawn from a population with finite variance, the distribution of the sample means approaches a normal (or Gaussian) distribution as the sample size increases, regardless of the shape of the population distribution. To understand the Central Limit Theorem, it is important to highlight some of its main foundations:
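
As a sketch of the formal statement (the classical Lindeberg-Lévy form), assume the observations $X_1, \dots, X_n$ are independent and identically distributed with mean $\mu$ and finite variance $\sigma^2$. These symbols are notation introduced here for illustration. Under those assumptions, the standardized sample mean converges in distribution to a standard normal:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma}
\;\xrightarrow{d}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
```

Informally, for large $n$ the sample mean is approximately distributed as $\mathcal{N}(\mu, \sigma^2 / n)$, so its spread shrinks like $\sigma / \sqrt{n}$.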

If you want to delve deeper into the topic and discover more about the world of Data Science with Python, I recommend you read my book:

Python Data Analytics 3rd Ed

Fabio Nelli

Practical Applications of the Central Limit Theorem

The Central Limit Theorem has profound practical implications. For example, it allows statisticians to make inferences about the source population even when the distribution of this population is unknown or complex. It also justifies the use of procedures based on the normal distribution, such as confidence intervals for a mean, when the population itself is non-normal, provided the sample is large enough.
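
To make this concrete, here is a minimal sketch that compares the normal approximation suggested by the theorem with a direct simulation. It assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1), a sample size of 50, and a threshold of 1.2. The population, the numbers, and the function names are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy import stats


def clt_tail_probability(mu, sigma, n, threshold):
    """Approximate P(sample mean > threshold) with the normal approximation."""
    standard_error = sigma / np.sqrt(n)
    # Survival function = 1 - CDF of the approximating normal distribution
    return stats.norm.sf(threshold, loc=mu, scale=standard_error)


def simulated_tail_probability(n, threshold, num_experiments=20_000, seed=0):
    """Estimate the same probability by simulating exponential samples."""
    rng = np.random.default_rng(seed)
    samples = rng.exponential(scale=1.0, size=(num_experiments, n))
    sample_means = samples.mean(axis=1)
    return np.mean(sample_means > threshold)


approx = clt_tail_probability(mu=1.0, sigma=1.0, n=50, threshold=1.2)
simulated = simulated_tail_probability(n=50, threshold=1.2)
print(f"CLT approximation: {approx:.3f}, simulation: {simulated:.3f}")
```

The two values should be close but not identical: with a skewed population and a moderate sample size, the normal approximation is only approximate, especially in the tails.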

In summary, the Central Limit Theorem constitutes a solid theoretical basis for the application of many statistical techniques in real situations, significantly contributing to making

Recommended Book:

If you are interested in this topic, I suggest reading this:

Practical Statistics for Data Scientists

Numerical Example in Python

The following example simulates the Central Limit Theorem by rolling a fair six-sided die. The goal is to show how the distribution of the sample means gets closer to a normal distribution as the sample size (the number of rolls per experiment) increases, regardless of the shape of the original distribution of the data.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

# Number of experiments to perform
num_experiments = 1000

# Sample sizes to consider (number of rolls per experiment)
num_samples = [10, 30, 50, 100]

# Creation of the figure
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 10))
fig.suptitle('Central Limit Theorem with Dice Rolling', y=1.02)

for i, n in enumerate(num_samples):
    # List to store the averages of the results of each experiment
    means_experiments = []

    # Simulation of experiments
    for _ in range(num_experiments):
        results = np.random.randint(1, 7, n)  # Throwing the dice
        mean_experiments = np.mean(results)  # Calculation of the mean
        means_experiments.append(mean_experiments)

    # Histogram plot of the averages of the results
    ax = axes[i // 2, i % 2]
    # stat='density' puts the histogram on the same scale as the normal PDF below
    sns.histplot(means_experiments, kde=True, stat='density', ax=ax, color='skyblue')

    # Add a line for the theoretical normal distribution
    mean_dado = 3.5  # Mean of a fair six-sided die
    std_dev_dado = (35 / 12) ** 0.5  # Standard deviation of a fair six-sided die (variance = 35/12)
    x = np.linspace(1, 6, 100)
    y = stats.norm.pdf(x, mean_dado, std_dev_dado / (n ** 0.5))
    ax.plot(x, y, 'k--', linewidth=2)

    # Labels and titles
    ax.set_title(f'Sample size: {n}')
    ax.set_xlabel('Mean of roll results')
    ax.set_ylabel('Density')

    # Calculate the mean and standard deviation of the sample means
    mean_experiments_mean = np.mean(means_experiments)
    mean_experiments_std = np.std(means_experiments)

    # Add text with mean and standard deviation into the graph
    ax.text(0.05, 0.9, f'Mean: {mean_experiments_mean:.2f}', transform=ax.transAxes, fontsize=10)
    ax.text(0.05, 0.8, f'Std. Dev.: {mean_experiments_std:.2f}', transform=ax.transAxes, fontsize=10)

# Adjust the layout and show graphs
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

Running the above code gives the following result:

Here is a step-by-step description of the example:

The final goal is to show visually how, as the sample size grows, the distribution of the sample means gets closer to a normal distribution, which is what the Central Limit Theorem predicts. Furthermore, the statistical metrics included in the graphs make it possible to observe numerically how the mean of the sample means stays close to the population mean (3.5 for a fair die), while their standard deviation shrinks in proportion to the inverse square root of the sample size.
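
The numerical side of this can be checked without any plotting. The sketch below computes the exact mean and standard deviation of a single roll from the six faces, then compares the standard deviation of the simulated sample means with the value die_std / sqrt(n) that the theorem leads us to expect. The helper name `sample_mean_stats`, the seed, and the number of experiments are assumptions made for this illustration.

```python
import numpy as np

# Exact moments of one roll of a fair six-sided die
faces = np.arange(1, 7)
die_mean = faces.mean()                               # 3.5
die_std = np.sqrt(((faces - die_mean) ** 2).mean())   # sqrt(35/12)


def sample_mean_stats(n, num_experiments=10_000, seed=0):
    """Return the mean and standard deviation of simulated sample means."""
    rng = np.random.default_rng(seed)
    # Each row is one experiment of n rolls; the upper bound 7 is exclusive
    rolls = rng.integers(1, 7, size=(num_experiments, n))
    means = rolls.mean(axis=1)
    return means.mean(), means.std()


for n in [10, 30, 50, 100]:
    observed_mean, observed_std = sample_mean_stats(n)
    expected_std = die_std / np.sqrt(n)
    print(f"n={n:3d}  mean={observed_mean:.3f}  "
          f"std={observed_std:.3f}  expected std={expected_std:.3f}")
```

For each sample size, the observed standard deviation should sit close to the expected one, and both should decrease as n grows.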

Statistical metrics

Statistical metrics are useful tools for understanding the distribution of sample means, identifying patterns, and interpreting the results of statistical analyses. The Central Limit Theorem is closely related to concepts such as the standard error, the confidence interval, and the margin of error. Let’s see how these concepts relate to the Central Limit Theorem:

In short, the Central Limit Theorem provides the theoretical context that justifies the use of these measures and concepts in practical situations. These tools are particularly useful when working with sample data and wanting to make inferences about the source population, exploiting the convergence properties of the distribution of sample means to the normal distribution.
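
As a minimal sketch of how the standard error, the margin of error, and the confidence interval fit together, the function below builds a normal-based interval for a mean. It assumes the sample is large enough for the normal approximation to be reasonable and uses the sample standard deviation in place of the unknown population value. The function name and the simulated die rolls are illustrative.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    sample_mean = sample.mean()
    # Standard error: estimated standard deviation of the sample mean
    standard_error = sample.std(ddof=1) / np.sqrt(n)
    # Two-sided critical value of the standard normal distribution
    z = stats.norm.ppf(0.5 + confidence / 2)
    margin_of_error = z * standard_error
    interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
    return sample_mean, standard_error, margin_of_error, interval


rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=200)  # 200 rolls of a fair die
mean, se, moe, (low, high) = mean_confidence_interval(rolls)
print(f"mean={mean:.3f}  SE={se:.3f}  margin={moe:.3f}  "
      f"95% CI=({low:.3f}, {high:.3f})")
```

For small samples, a Student's t critical value (for example `stats.t.ppf`) is the more common choice than the normal one used here.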

Below is a general list of these items with some additional categories:

Measures of Position:

Measures of Dispersion:

Measures of Shape:

Measures of Central Tendency:

Some additional concepts that may be relevant in specific contexts include:

These are just a few examples, and the wide range of statistical measures reflects the complexity of data analysis and distributions. The choice of tools often depends on the nature of the data and the objectives of the statistical analysis.
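
To illustrate how measures of central tendency, dispersion, and shape can describe the distribution of sample means, the sketch below computes a few of them for two sample sizes. It assumes a skewed exponential population with mean 1 so that the change in shape is easy to see. The function name, the population, and the parameters are choices made for this example.

```python
import numpy as np
from scipy import stats


def describe_sample_means(n, num_experiments=20_000, seed=0):
    """Summarize the distribution of sample means of size n."""
    rng = np.random.default_rng(seed)
    samples = rng.exponential(scale=1.0, size=(num_experiments, n))
    means = samples.mean(axis=1)
    return {
        "mean": means.mean(),                       # central tendency
        "median": np.median(means),                 # central tendency / position
        "std": means.std(),                         # dispersion
        "iqr": stats.iqr(means),                    # dispersion
        "skewness": stats.skew(means),              # shape (0 for a normal)
        "excess_kurtosis": stats.kurtosis(means),   # shape (0 for a normal)
    }


for n in [5, 100]:
    summary = describe_sample_means(n)
    formatted = ", ".join(f"{key}={value:.3f}" for key, value in summary.items())
    print(f"n={n}: {formatted}")
```

As the sample size grows, the skewness and excess kurtosis of the sample means should move toward 0, the values of a normal distribution.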

Conclusions

The Central Limit Theorem is a cornerstone of statistical theory. Because it establishes the approximate normality of sample means for sufficiently large samples, it makes it possible to apply numerous statistical methods in many real-world situations. Understanding this theorem is critical for anyone involved in analyzing data and drawing conclusions based on random sampling.
