In machine learning, entropy and information gain are fundamental concepts used in decision trees and supervised learning to decide how to split the data during the training of a model. These concepts are most often associated with the Iterative Dichotomiser 3 (ID3) algorithm and related algorithms such as C4.5 and CART.
Entropy and Information Gain
Entropy: Entropy is a measure of the disorder or uncertainty in a data set. In a machine learning context, entropy is used to evaluate the "purity" of a data set, i.e. how homogeneous or heterogeneous it is. The entropy of a data set is calculated as:

E(S) = -\sum_{i} p_i \log_2(p_i)

Where E(S) is the entropy of the dataset, and p_i represents the probability of belonging to a specific class i in the dataset. When entropy is high, the dataset is very heterogeneous; when it is low, the set is more homogeneous.
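To make this concrete (a quick illustration, not taken from the article's dataset), consider the two extreme cases for a set with two classes:

E = -(1 \cdot \log_2 1) = 0 (all examples in one class: perfectly homogeneous)
E = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 (a 50/50 split: maximum uncertainty for two classes)

In general, entropy ranges from 0 for a pure set up to \log_2 k for k equally likely classes.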
Information Gain: Information gain is a concept used to measure how much a given feature contributes to reducing the entropy, or uncertainty, in the data. In other words, information gain tells us how informative a feature is in dividing the data into more homogeneous classes. The information gain for a feature A with respect to a data set S is calculated as:

IG(S, A) = E(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} E(S_v)

Where IG(S, A) is the information gain, E(S) is the entropy of data set S, values(A) are the distinct values of feature A, |S_v| is the size of the subset of data where feature A has the value v, |S| is the total size of S, and E(S_v) is the entropy of the subset S_v.
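For intuition, here is a small worked example with invented numbers (purely hypothetical, not the dataset used later): suppose S contains 12 examples (8 "Yes", 4 "No") and a binary feature A splits it into S_a with 6 "Yes" and 0 "No", and S_b with 2 "Yes" and 4 "No":

E(S) = -(8/12) \log_2(8/12) - (4/12) \log_2(4/12) \approx 0.918
E(S_a) = 0, \qquad E(S_b) = -(2/6) \log_2(2/6) - (4/6) \log_2(4/6) \approx 0.918
IG(S, A) = 0.918 - \left( \frac{6}{12} \cdot 0 + \frac{6}{12} \cdot 0.918 \right) \approx 0.459

Half of the parent set's uncertainty is removed, because one of the two subsets becomes perfectly pure.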
The goal in building a decision tree is to select the feature that maximizes the information gain, as this feature contributes more to dividing the data into more homogeneous classes. This process is performed recursively to create a decision tree that can be used to make classification or regression decisions on test data.
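In practice, libraries implement this selection-by-information-gain loop for you. As a minimal sketch (assuming scikit-learn is installed; the tiny feature encoding and labels below are invented purely for illustration), a decision tree that splits by entropy/information gain can be trained like this:

from sklearn.tree import DecisionTreeClassifier

# One categorical feature with three possible values, encoded as 0, 1 and 2
X = [[0], [0], [1], [1], [2], [2], [0], [1]]
y = ["No", "Yes", "Yes", "Yes", "Yes", "No", "No", "Yes"]

# criterion="entropy" tells the tree to choose its splits by information gain
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

print(clf.predict([[1]]))  # predicted class for a new example with feature value 1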
In summary, entropy measures the uncertainty in the data, while information gain measures how informative a feature is in the division of data. These concepts are important to supervised learning and are often used to build effective decision trees.
An example in Python
Here are practical examples in Python for calculating entropy and information gain using an example dataset. We will be using the numpy Python library for math calculations.
Suppose we have a dataset with two classes, “Yes” and “No”, and a feature called “Outlook” with three possible values: “Sunny”, “Cloudy” and “Rainy”. We will calculate the entropy and information gain with respect to this feature.
import numpy as np
# Function to calculate entropy
def entropy(probabilities):
    # Note: assumes all probabilities are strictly greater than zero
    return -np.sum(probabilities * np.log2(probabilities))

# We calculate the initial entropy of the dataset
total_samples = 14  # Total number of instances in the dataset
yes_samples = 9     # Number of instances with class "Yes"
no_samples = 5      # Number of instances with class "No"
p_yes = yes_samples / total_samples  # Probability of the class "Yes"
p_no = no_samples / total_samples    # Probability of the class "No"
initial_entropy = entropy([p_yes, p_no])
print("Initial Entropy:", initial_entropy)

# Now suppose we split the data based on the "Outlook" feature
# and calculate the information gain.

# Data for the "Sunny" split
samples_sunny = 5
p_sunny_yes = 3 / samples_sunny
p_sunny_no = 2 / samples_sunny
entropy_sunny = entropy([p_sunny_yes, p_sunny_no])

# Data for the "Cloudy" split
samples_overcast = 5
p_overcast_yes = 4 / samples_overcast
p_overcast_no = 1 / samples_overcast
entropy_overcast = entropy([p_overcast_yes, p_overcast_no])

# Data for the "Rainy" split
samples_rainy = 4
p_rainy_yes = 2 / samples_rainy
p_rainy_no = 2 / samples_rainy
entropy_rainy = entropy([p_rainy_yes, p_rainy_no])

# We calculate the information gain
weighted_entropy = (samples_sunny / total_samples) * entropy_sunny + \
                   (samples_overcast / total_samples) * entropy_overcast + \
                   (samples_rainy / total_samples) * entropy_rainy
print("Weighted Entropy:", weighted_entropy)

information_gain = initial_entropy - weighted_entropy
print("Information Gain:", information_gain)
In this example, we calculated the initial entropy of the data set and the information gain for the "Outlook" feature by splitting the data on its values ("Sunny", "Cloudy" and "Rainy").
Running the above code, you get the following result:
Initial Entropy: 0.9402859586706311
Weighted Entropy: 0.8903138176221539
Information Gain: 0.04997214104847725
The information gain tells us how informative the “Outlook” feature is in dividing the data into more homogeneous classes. The greater the information gain, the greater the importance of this feature in building a decision tree.
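As a closing sketch (this helper is not part of the example above, just a hypothetical generalisation of it), the same calculation can be wrapped into a reusable function that works directly on class counts; with the counts used above it reproduces the information gain of roughly 0.0500 for the "Outlook" feature.

import numpy as np

def entropy_from_counts(counts):
    # Entropy (in bits) of a node described by its per-class counts
    counts = np.asarray(counts, dtype=float)
    probabilities = counts / counts.sum()
    probabilities = probabilities[probabilities > 0]  # skip empty classes to avoid log2(0)
    return -np.sum(probabilities * np.log2(probabilities))

def information_gain_from_counts(parent_counts, children_counts):
    # Information gain of a split described by the per-child class counts
    total = float(np.sum(parent_counts))
    weighted = sum(np.sum(child) / total * entropy_from_counts(child)
                   for child in children_counts)
    return entropy_from_counts(parent_counts) - weighted

parent = [9, 5]                      # [Yes, No] in the whole dataset
children = [[3, 2], [4, 1], [2, 2]]  # Sunny, Cloudy, Rainy
print(information_gain_from_counts(parent, children))  # prints roughly 0.04997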