Calculating centrality measures with Python: Mean, Median and Mode

webmaster

6 months ago

Centrality measures, such as mean, median, and mode, are fundamental in descriptive statistics. The mean represents the average value of a set of data, the median indicates the central value when the data is sorted, while the mode identifies the most frequent value. Each provides a unique view of the distribution of the data, useful for understanding their central tendency. These measures help synthesize complex information, facilitating comparisons across sets of data and supporting informed decisions in diverse contexts, from financial analysis to business performance monitoring.

Centrality Measures

Centrality measures are like “benchmarks” that give us an idea of the average or typical value of a data set. Imagine you have a set of numbers representing the heights of a group of people. The mean would give you the average height of the group, the median would tell you which height lies in the middle of the distribution (i.e., half of the people are taller and half are shorter than this value), while the mode would tell you which height is the most common.

These measures are like “magnifying glasses” that allow us to focus attention on different aspects of the data. For example, if we have a wide range of values and want to know which value occurs most often, we would look at the mode. If we want to understand what the overall average value is, considering all the values, we would look at the mean. If we are interested in the central element that divides the group into two equal parts, we would look at the median.

These measures are extremely useful because they allow us to summarize a complex set of data into a single value that represents their “central position”, giving us a starting point to better understand the distribution of the data and make informed decisions based on it.

Mean, Median and Mode

Centrality measures are tools used in descriptive statistics to represent the “center” of a data set, so as to get an idea of their typical position or average value. The main measures of centrality are the mean, the median and the mode.

Mean:
The mean of a data set is calculated as the sum of all values divided by the total number of values. Mathematically, if we have a data set X with n elements, the mean is given by:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where $x_i$ represents the ith element of the data set.
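
The sum-and-divide definition can be written directly in plain Python. This is only a minimal sketch: the function name `arithmetic_mean` and the sample heights are illustrative assumptions, not part of the original examples.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical sample: heights in centimetres
heights = [160, 165, 170, 175, 180]
print("Mean:", arithmetic_mean(heights))  # Mean: 170.0
```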

Median:
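The median is the central value of an ordered data set. If the data set has an odd number of elements, the median is simply the central value. If it has an even number of elements, the median is the average of the two central values. Mathematically, if (X) is the ordered data set with n elements, the median (M) is given by:

$$M = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \dfrac{x_{n/2} + x_{n/2+1}}{2} & \text{if } n \text{ is even} \end{cases}$$

where $x_i$ represents the ith element of the ordered data set.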
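
The odd/even rule for the median can be sketched in a few lines of plain Python. The function name `median_of` is a hypothetical helper chosen for illustration. Note that Python lists are indexed from 0, so the indices differ by one from the mathematical notation.

```python
def median_of(values):
    """Return the middle value of the sorted data, or the mean of the two middle values."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of elements: a single central value exists
        return ordered[mid]
    # Even number of elements: average the two central values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([7, 1, 5]))     # odd count: 5
print(median_of([7, 1, 5, 3]))  # even count: (3 + 5) / 2 = 4.0
```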

Mode:
The mode is the value that occurs most frequently in a data set. There can be more than one mode if multiple values occur with the same maximum frequency; if every value occurs only once, all values are tied and the mode says little about the data.
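
Because a data set can have several modes, it can help to collect all of them rather than a single value. The sketch below counts occurrences with `collections.Counter`; the function name `all_modes` is a hypothetical helper. The standard library also provides `statistics.multimode` (Python 3.8 and later) for the same purpose.

```python
from collections import Counter


def all_modes(values):
    """Return every value that occurs with the maximum frequency."""
    if not values:
        raise ValueError("mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    # Counter keeps first-seen order, so modes come out in that order
    return [value for value, count in counts.items() if count == highest]


print(all_modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
print(all_modes([5, 5, 1]))           # [5]
```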

These are the main centrality measures used in descriptive statistics.

Calculate the mean, median and mode with Python

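You can calculate the mean, median, and mode of a data set using several Python libraries, such as numpy (together with scipy) and the built-in statistics module. Here are some examples of how to do this:

Using the numpy library (with scipy.stats for the mode, since numpy has no mode function):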

import numpy as np

# Sample data
data = [2, 4, 6, 8, 10]

# Calculating the mean
mean = np.mean(data)
print("Mean:", mean)

# Calculating the median
median = np.median(data)
print("Median:", median)

# Calculating the mode (numpy has no mode function, so scipy.stats is used)
from scipy.stats import mode
mode_result = mode(data)
print("Mode:", mode_result.mode)

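Running this code produces the following output. Every value in the sample occurs exactly once, so scipy.stats.mode returns the smallest of the tied values, 2: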

Mean: 6.0
Median: 6.0
Mode: 2

Using the statistics library:

import statistics

# Sample data
data = [2, 4, 6, 8, 10]

# Calculating the mean
mean = statistics.mean(data)
print("Mean:", mean)

# Calculating the median
median = statistics.median(data)
print("Median:", median)

# Calculating the mode
mode_value = statistics.mode(data)
print("Mode:", mode_value)

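Running this code produces the following output. Because all values are tied, statistics.mode returns the first one encountered (this is the behavior in Python 3.8 and later):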

Mean: 6
Median: 6
Mode: 2

These examples show you how to calculate the mean, median, and mode of a data set using Python. The statistics module is part of the standard library, so it needs no installation; make sure to install numpy and scipy if you don’t already have them in your Python environment.

Other centrality measures

In addition to the mean, median, and mode, there are other measures of centrality, such as the geometric mean and the harmonic mean, that are less common but equally useful in specific contexts. Here are some of them:

Geometric Mean: The geometric mean is the nth root of the product of all values, that is, the product raised to a power equal to the inverse of the total number of values. The formula for the geometric mean of a data set X with n positive elements is:

$$G = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

The geometric mean is useful when working with data that grows or falls exponentially, such as growth rates. For example, imagine you have data representing the annual growth rate of a population. By calculating the geometric mean of these growth rates, you will obtain a value that represents the average growth rate over the period considered, taking exponential growth into account.
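
To make the growth-rate case concrete, here is a small sketch using `statistics.geometric_mean` from the standard library (Python 3.8 and later). The three yearly growth rates are made-up illustrative values. Rates are first converted to growth factors (1 + rate), since the geometric mean requires positive inputs.

```python
import math
import statistics

# Hypothetical yearly growth rates: +10%, +20%, -5%
growth_rates = [0.10, 0.20, -0.05]

# Convert each rate to a multiplicative growth factor
growth_factors = [1 + rate for rate in growth_rates]

average_factor = statistics.geometric_mean(growth_factors)
average_rate = average_factor - 1
print(f"Average yearly growth rate: {average_rate:.2%}")

# Applying the average factor every year reproduces the total compounded growth
total_growth = math.prod(growth_factors)
print(math.isclose(average_factor ** len(growth_factors), total_growth))
```

The arithmetic mean of the same rates would overstate the growth, because it ignores compounding.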

Harmonic Mean: The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the values. The formula for the harmonic mean of a data set X with n positive elements is:

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

The harmonic mean is useful when you want to average rates or ratios, and it gives smaller values a greater impact than the arithmetic mean does. For example, imagine you cover the same distance several times at different speeds. The harmonic mean of these speeds gives the average speed over the whole trip, because the slower legs take more time and therefore weigh more heavily on the result.
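
A classic use of the harmonic mean is the average speed over legs of equal distance. The sketch below compares it with a direct distance-over-time calculation, using `statistics.harmonic_mean` from the standard library. The distance and speeds are hypothetical values chosen for illustration.

```python
import statistics

# Hypothetical trip: two legs of equal length driven at different speeds
leg_distance_km = 120
speeds_kmh = [60, 40]

# Direct computation: total distance divided by total time (2 h + 3 h)
total_time_h = sum(leg_distance_km / speed for speed in speeds_kmh)
true_average = leg_distance_km * len(speeds_kmh) / total_time_h

harmonic = statistics.harmonic_mean(speeds_kmh)
arithmetic = statistics.mean(speeds_kmh)

print(f"True average speed: {true_average:.1f} km/h")  # 48.0 km/h
print(f"Harmonic mean:      {harmonic:.1f} km/h")      # 48.0 km/h
print(f"Arithmetic mean:    {arithmetic:.1f} km/h")    # 50.0 km/h
```

The arithmetic mean overestimates the average speed because more time is spent on the slower leg.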

Trimmed Mean: The trimmed mean excludes a number of the highest and lowest extreme values before averaging the remaining values.

The trimmed mean is useful when you want to mitigate the effect of outliers or extreme values on your statistical analysis. For example, if you are analyzing house prices in a neighborhood and a few houses sell for very high or very low prices that skew the average, you might calculate a trimmed mean, excluding the highest and lowest prices, to obtain a more representative estimate of the typical house price.
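
The house-price scenario can be sketched as follows. The function `trimmed_mean` and its `proportion_to_cut` parameter are hypothetical names for this illustration, and the prices are invented. The same fraction of values is removed from each end of the sorted data; `scipy.stats.trim_mean` offers comparable functionality if SciPy is available.

```python
def trimmed_mean(values, proportion_to_cut=0.1):
    """Average the data after dropping the given fraction from each end."""
    if not values:
        raise ValueError("trimmed mean of an empty data set is undefined")
    if not 0 <= proportion_to_cut < 0.5:
        raise ValueError("proportion_to_cut must be in the range [0, 0.5)")
    ordered = sorted(values)
    cut = int(len(ordered) * proportion_to_cut)  # values removed per end
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


# Hypothetical house prices in thousands, with one extreme value
prices = [150, 160, 165, 170, 175, 180, 185, 190, 200, 950]

print("Plain mean:  ", sum(prices) / len(prices))  # 252.5
print("Trimmed mean:", trimmed_mean(prices, 0.1))  # 178.125
```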

Weighted Average: The weighted average takes into account the relative weight of each value in the data set. It is calculated by multiplying each value by its relative weight (usually a coefficient) and then dividing the sum of the products by the sum of the weights.

The weighted average is useful when you want to give greater weight to certain values than others. For example, if you are computing a student's overall grade and want the final exam to count more than the quizzes, you can assign the exam a larger weight, so that it contributes more to the overall average.
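
The multiply, sum, and divide steps described above can be written out in plain Python. This is a minimal sketch: the function name `weighted_mean`, the scores, and the weights are all illustrative assumptions.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical grades: two quizzes and a final exam that counts double
scores = [70, 80, 90]
weights = [1, 1, 2]

print(weighted_mean(scores, weights))  # (70 + 80 + 180) / 4 = 82.5
```

With equal weights the result reduces to the ordinary mean.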

Overall, these particular averages are useful tools for gaining a deeper understanding of data distribution and central tendencies, allowing you to make informed decisions based on the specific characteristics of the analyzed data. Each of these averages has a specific application and can be used to obtain more detailed or correct information in certain analytical contexts.

Calculate these other averages with Python

To calculate these additional centrality measures with Python, we use the NumPy library to simplify the calculations. Make sure you have NumPy installed before running these examples.

import numpy as np

# Sample data
data = [1, 2, 3, 4, 5]

# Geometric mean
geometric_mean = np.prod(data) ** (1 / len(data))
print("Geometric mean:", geometric_mean)

# Harmonic mean
harmonic_mean = len(data) / np.sum(1 / np.array(data))
print("Harmonic mean:", harmonic_mean)

# Trimmed mean (excluding lowest and highest values)
sorted_values = sorted(data)
trimmed_values = sorted_values[1:-1]  # Exclude lowest and highest value
trimmed_mean = np.mean(trimmed_values)
print("Trimmed mean:", trimmed_mean)

# Weighted mean (with arbitrary weights)
weighted_data = np.array(data)
weights = np.array([0.1, 0.2, 0.3, 0.2, 0.2])  # Arbitrary weights
weighted_mean = np.average(weighted_data, weights=weights)
print("Weighted mean:", weighted_mean)

Running this code produces the following output:

Geometric mean: 2.605171084697352
Harmonic mean: 2.18978102189781
Trimmed mean: 3.0
Weighted mean: 3.2

This code calculates the geometric mean, harmonic mean, trimmed mean (excluding the lowest and highest values), and weighted mean of a sample data set using the NumPy library. You can replace data with your own data set and change the weights if you want to calculate a weighted average with different weights.
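
One caveat when adapting the geometric mean calculation to larger data sets: multiplying many values together can overflow, since NumPy works with fixed-width integers for integer input. A common workaround is to average the logarithms and exponentiate the result. The sketch below uses only the standard library; the function name `geometric_mean_via_logs` is a hypothetical helper, and it assumes strictly positive values.

```python
import math


def geometric_mean_via_logs(values):
    """Geometric mean computed as exp(mean(log(x))); requires positive values."""
    if not values:
        raise ValueError("geometric mean of an empty data set is undefined")
    if any(value <= 0 for value in values):
        raise ValueError("this sketch only handles strictly positive values")
    log_sum = math.fsum(math.log(value) for value in values)
    return math.exp(log_sum / len(values))


# Same sample data as above: agrees with the product-based formula
print(geometric_mean_via_logs([1, 2, 3, 4, 5]))

# The product of 1..100 is far too large for a 64-bit integer,
# but the log-based version handles it without trouble
print(geometric_mean_via_logs(list(range(1, 101))))
```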