The Probability Density Function (PDF) with Python


The Probability Density Function (PDF) is a mathematical function that describes the relative likelihood of a continuous random variable taking values near a given point. In other words, it represents the probability distribution of a continuous variable. The PDF is non-negative and the total area under its curve is 1, since that area represents the total probability. For example, the PDF of the normal distribution is the familiar bell curve.

The Probability Density Function

The Probability Density Function (PDF) is a function that describes the probability distribution of a continuous random variable. The PDF is often denoted as f(x), where x represents a value of the random variable.

PDF equation:
The specific form of the equation depends on the distribution of the random variable. For example, for the standard normal distribution, the PDF is:

 f(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}

Where:

  • \pi is the value of Pi (approximately 3.14159),
  • e is the base of the natural logarithm (approximately 2.71828),
  • \frac{1}{\sqrt{2\pi}} is a normalization constant.

This PDF describes the shape of the classic Gaussian bell curve. More generally, the PDF is a key tool for understanding and working with continuous random variables: it lets you make predictions, calculate probabilities, and analyze the shape of a probability distribution.
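As a quick check of the formula above, here is a minimal sketch that implements the standard normal PDF by hand (the helper name standard_normal_pdf is hypothetical, chosen for illustration), compares it with SciPy's norm.pdf, and numerically confirms with scipy.integrate.quad that the total area under the curve is 1.

```python
import math

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def standard_normal_pdf(x):
    """Hypothetical helper: standard normal PDF written directly from the formula."""
    x = np.asarray(x, dtype=float)
    return np.exp(-x**2 / 2) / math.sqrt(2 * math.pi)


x = np.linspace(-5, 5, 11)
manual = standard_normal_pdf(x)
reference = norm.pdf(x)  # SciPy's standard normal (loc=0, scale=1)

# The hand-written formula and SciPy should agree to floating-point precision
print(np.allclose(manual, reference))

# A valid PDF is non-negative and the total area under it is 1
area, _ = quad(standard_normal_pdf, -np.inf, np.inf)
print(manual.min() >= 0, round(area, 6))
```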

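The way the PDF is derived depends on the specific distribution of the random variable. In some cases, it can be obtained from the Cumulative Distribution Function (CDF): for a continuous random variable whose CDF F(x) is differentiable, the PDF is its derivative, f(x) = F'(x).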
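To illustrate the relationship between the PDF and the CDF, the sketch below assumes the standard normal distribution and approximates the derivative of norm.cdf with a central finite difference. The step size h is an arbitrary illustrative choice; the result should match norm.pdf closely.

```python
import numpy as np
from scipy.stats import norm

# For a continuous random variable with a differentiable CDF F(x),
# the PDF is its derivative: f(x) = dF/dx.
x = np.linspace(-3, 3, 13)
h = 1e-5  # illustrative step size for the central finite difference

# Central difference approximation of dF/dx using SciPy's normal CDF
numeric_pdf = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)

# Compare the numerical derivative with the exact PDF
max_error = np.max(np.abs(numeric_pdf - norm.pdf(x)))
print(f"max |F'(x) - f(x)| = {max_error:.2e}")
```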

Uses of the PDF

  • Probability Calculation: The area under the PDF curve in an interval corresponds to the probability that the random variable falls in that interval. This is achieved by integrating the PDF over that range.
  • Calculating Statistics: The PDF allows you to calculate statistics such as the mean and variance by integrating suitable expressions weighted by the PDF; for example, the mean is \int x f(x)\,dx.
  • Predictions and Analysis: The shape of the PDF provides information about the distribution of the random variable. For example, in the normal distribution, a bell-shaped PDF indicates that central values are more likely than extreme values.
  • Comparison of Distributions: PDFs can be used to compare different distributions and to see how the probability density is spread across different regions of the random variable's values.
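The first two uses above can be sketched numerically. This example assumes a normal distribution with mean 1 and standard deviation 2 (values chosen only for illustration) and uses scipy.integrate.quad to integrate the PDF over an interval and then to compute the mean and variance. The CDF difference and the frozen distribution's mean() and var() serve as cross-checks.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative distribution: normal with mean 1 and standard deviation 2
dist = norm(loc=1, scale=2)

# Probability: P(a <= X <= b) is the area under the PDF between a and b
a, b = 0.0, 2.0
prob_integral, _ = quad(dist.pdf, a, b)
prob_cdf = dist.cdf(b) - dist.cdf(a)  # the same probability via the CDF

# Statistics: mean = integral of x f(x) dx, variance = integral of (x - mean)^2 f(x) dx
mean, _ = quad(lambda x: x * dist.pdf(x), -np.inf, np.inf)
variance, _ = quad(lambda x: (x - mean) ** 2 * dist.pdf(x), -np.inf, np.inf)

print(f"P({a} <= X <= {b}): integral = {prob_integral:.4f}, CDF = {prob_cdf:.4f}")
print(f"mean = {mean:.4f} (exact {dist.mean()}), variance = {variance:.4f} (exact {dist.var()})")
```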

In essence, the PDF is a fundamental tool for understanding and working with continuous random variables: it gives a detailed description of the probability distribution and enables a wide range of statistical analyses.


Recommended Book:

If you are interested in this topic, I suggest reading this book:

Practical Statistics for Data Scientists

Some examples in Python

Here are some code examples involving the Probability Density Function (PDF) of common distributions, written in Python with NumPy, SciPy, and Matplotlib:

Normal Distribution

The normal distribution is one of the most widely used probability distributions. The following code computes its PDF and displays it in a graph.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

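mu, sigma = 0, 1  # Mean (SciPy's loc) and standard deviation (SciPy's scale)
x = np.linspace(-5, 5, 1000)  # 1000 evenly spaced points on [-5, 5]
pdf = norm.pdf(x, loc=mu, scale=sigma)  # density value at each point

plt.plot(x, pdf, label='Normal Distribution')
plt.title('PDF of the Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Probability Density')  # PDF values are densities, not probabilities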
plt.legend()
plt.show()

Running the code produces the following graph, which shows the PDF of the normal distribution.

PDF of the normal distribution
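In the code above, mu and sigma are passed to norm.pdf as SciPy's loc and scale arguments. The following sketch, using illustrative non-standard values, checks that this matches the general normal PDF with mean \mu and standard deviation \sigma, f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} (a standard formula not written out earlier in this article), and the equivalent trick of standardizing x before evaluating the standard normal PDF.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.5, 0.8  # illustrative, non-standard parameters
x = np.linspace(-2, 5, 50)

# General normal PDF written out explicitly
explicit = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# SciPy: loc is the mean and scale is the standard deviation
scipy_pdf = norm.pdf(x, loc=mu, scale=sigma)

# Equivalent: standardize x, evaluate the standard normal PDF, divide by sigma
standardized = norm.pdf((x - mu) / sigma) / sigma

print(np.allclose(explicit, scipy_pdf), np.allclose(standardized, scipy_pdf))
```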

Uniform Distribution

Another well-known and widely used distribution is the uniform distribution. The PDF of a random variable X uniformly distributed between a and b is defined as follows:

 f(x; a, b) = \frac{1}{b - a}, \quad a \leq x \leq b

where a and b are the parameters that define the support range of the uniform distribution, and x is a value within this range. The PDF is zero outside this range.

This equation represents a constant probability density between a and b: any two subintervals of [a, b] with the same length have the same probability of containing a sampled value.
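Because SciPy describes the uniform distribution with loc=a and scale=b-a rather than with a and b directly, it can help to check the formula against uniform.pdf. This sketch assumes the interval [2, 5] purely for illustration; uniform_pdf is a hypothetical helper implementing the piecewise definition. It also shows that two subintervals of equal length receive the same probability.

```python
import numpy as np
from scipy.stats import uniform

a, b = 2.0, 5.0  # illustrative interval endpoints


def uniform_pdf(x, a, b):
    """Hypothetical helper: 1/(b - a) inside [a, b], 0 outside."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)


# 50 points that extend beyond [a, b] without landing exactly on a or b
x = np.linspace(a - 1, b + 1, 50)
# SciPy parameterizes the uniform distribution as loc=a, scale=b - a
print(np.allclose(uniform_pdf(x, a, b), uniform.pdf(x, loc=a, scale=b - a)))

# Equal-length subintervals inside [a, b] carry equal probability
dist = uniform(loc=a, scale=b - a)
p1 = dist.cdf(3.0) - dist.cdf(2.5)
p2 = dist.cdf(4.5) - dist.cdf(4.0)
print(p1, p2)  # both approximately 0.1667, i.e. 0.5 / 3
```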

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform

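a, b = 0, 1  # Endpoints of the interval [a, b]
x = np.linspace(a - 0.1, b + 0.1, 1000)  # extend slightly beyond [a, b] to show zero density outside
pdf = uniform.pdf(x, loc=a, scale=b - a)  # SciPy uses loc=a and scale=b-a

plt.plot(x, pdf, label='Uniform Distribution')
plt.title('Uniform Distribution PDF')
plt.xlabel('Value')
plt.ylabel('Probability Density')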
plt.legend()
plt.show()

Running the code produces the following graph.

PDF of the uniform distribution

Exponential Distribution

Another widely used distribution is the exponential distribution. The PDF for a random variable X with an exponential distribution is given by:

 f(x; \lambda) = \lambda e^{-\lambda x}

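Where x \geq 0 and \lambda > 0 is the rate parameter. SciPy's expon is parameterized by scale = 1/\lambda instead, which is why the code passes scale=1/lambda_param.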

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import expon

# Rate parameter (lambda) of the exponential distribution
lambda_param = 0.5

x = np.linspace(0, 5, 1000)  # the exponential PDF is defined for x >= 0
pdf = expon.pdf(x, scale=1/lambda_param)  # SciPy uses scale = 1/lambda

plt.plot(x, pdf, label='Exponential')
plt.title('Probability Density Function (PDF) of the Exponential Distribution')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.legend()
plt.show()

Running the code produces the following plot of the PDF.

PDF of the exponential distribution
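Since expon.pdf takes a scale rather than the rate \lambda, a short sketch can confirm that scale=1/lambda_param reproduces \lambda e^{-\lambda x}. It reuses lambda_param = 0.5 from the example above and also computes the mean, which for an exponential distribution equals 1/\lambda.

```python
import numpy as np
from scipy.stats import expon

lambda_param = 0.5  # rate parameter, as in the example above
x = np.linspace(0, 5, 50)

# Exponential PDF written directly from the formula (valid for x >= 0)
explicit = lambda_param * np.exp(-lambda_param * x)

# SciPy's expon uses scale = 1 / lambda rather than the rate itself
scipy_pdf = expon.pdf(x, scale=1 / lambda_param)

# The mean of an exponential distribution is 1 / lambda
mean = expon.mean(scale=1 / lambda_param)

print(np.allclose(explicit, scipy_pdf), mean)
```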

Gamma Distribution

Another common distribution is the gamma distribution. The PDF for a random variable X with a gamma distribution is given by:

 f(x; \alpha, \beta) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}

Where x \geq 0, \alpha > 0 is the shape parameter, \beta > 0 is the rate parameter (its reciprocal 1/\beta is the scale), and \Gamma(\alpha) is the gamma function.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma

# Parameters of the gamma distribution
alpha = 2  # shape
beta = 1   # rate (SciPy's scale is 1/beta)

x = np.linspace(0, 10, 1000)
pdf = gamma.pdf(x, alpha, scale=1/beta)  # shape alpha, scale 1/beta

plt.plot(x, pdf, label='Gamma')
plt.title('Probability Density Function (PDF) of the Gamma Distribution')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.legend()
plt.show()

PDF of the gamma distribution
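With beta = 1 the rate and the scale coincide, so the plot above cannot reveal a mix-up between them. The following sketch assumes beta = 1.5 (an illustrative value), evaluates the formula using scipy.special.gamma for \Gamma(\alpha), and compares it with gamma.pdf(x, alpha, scale=1/beta). It also checks that the mean equals \alpha/\beta.

```python
import numpy as np
from scipy.special import gamma as gamma_fn  # the gamma function, Gamma(alpha)
from scipy.stats import gamma

alpha, beta = 2.0, 1.5  # shape and rate (illustrative values)
x = np.linspace(0.01, 10, 50)

# Gamma PDF in shape-rate form: beta^alpha * x^(alpha - 1) * e^(-beta x) / Gamma(alpha)
explicit = beta**alpha * x ** (alpha - 1) * np.exp(-beta * x) / gamma_fn(alpha)

# SciPy: the shape argument is alpha and scale = 1 / beta
scipy_pdf = gamma.pdf(x, alpha, scale=1 / beta)

# The mean of the gamma distribution is alpha / beta
mean = gamma.mean(alpha, scale=1 / beta)

print(np.allclose(explicit, scipy_pdf), mean, alpha / beta)
```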

If you want to delve deeper into the topic and discover more about the world of Data Science with Python, I recommend you read my book:

Python Data Analytics 3rd Ed

Fabio Nelli

Log-Normal Distribution

To conclude, let's also look at the log-normal distribution. The PDF for a random variable X with a log-normal distribution is given by:

 f(x; \mu, \sigma) = \frac{1}{x\sigma \sqrt{2\pi}} e^{-\frac{(\ln(x) - \mu)^2}{2\sigma^2}}

Where x > 0, \mu is the mean of \ln(X), and \sigma > 0 is the standard deviation of \ln(X).

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm

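# Parameters of the log-normal distribution (mean and std. dev. of ln(X))
mu = 0
sigma = 0.1

x = np.linspace(0, 3, 1000)  # SciPy returns a density of 0 at x = 0
pdf = lognorm.pdf(x, sigma, scale=np.exp(mu))  # SciPy: shape s=sigma, scale=exp(mu)

plt.plot(x, pdf, label='Log-Normal')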
plt.title('Probability Density Function (PDF) of the Log-Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.legend()
plt.show()

PDF of the Log-normal distribution
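SciPy's lognorm uses a shape parameter s = \sigma and scale = e^{\mu}, which is not obvious from the formula. This sketch assumes \mu = 0.5 and \sigma = 0.4 (illustrative values) and checks the formula against lognorm.pdf, and also against the change-of-variables relation f_X(x) = f_Y(\ln x)/x, where Y = \ln X is normal with mean \mu and standard deviation \sigma.

```python
import numpy as np
from scipy.stats import lognorm, norm

mu, sigma = 0.5, 0.4  # parameters of ln(X) (illustrative values)
x = np.linspace(0.1, 5, 50)  # strictly positive values

# Log-normal PDF written directly from the formula
explicit = np.exp(-((np.log(x) - mu) ** 2) / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))

# SciPy: shape s = sigma, scale = exp(mu)
scipy_pdf = lognorm.pdf(x, sigma, scale=np.exp(mu))

# Change of variables: if ln(X) ~ Normal(mu, sigma), then f_X(x) = f_Y(ln x) / x
via_normal = norm.pdf(np.log(x), loc=mu, scale=sigma) / x

print(np.allclose(explicit, scipy_pdf), np.allclose(via_normal, scipy_pdf))
```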

These are just a few examples of continuous distributions that can be explored using the SciPy library in Python. Each distribution has its own specific parameters that can be adapted according to the needs of the analysis.
