
Linear Regression with Elastic Net in Machine Learning with scikit-learn

Elastic Net is a linear regression technique that adds a regularization term by combining both the L1 penalty (as in Lasso regression) and the L2 penalty (as in ridge regression). So, it is based on the linear regression model, but with the addition of these penalties to improve the performance of the model, especially when there are multicollinearities between the variables or you want to make a selection of the variables.

Elastic Net

The Elastic Net was introduced in 2005 by two researchers, Hui Zou and Trevor Hastie, in their article entitled “Regularization and variable selection via the elastic net”, published in the Journal of the Royal Statistical Society: Series B (Statistical Methodology).

Zou and Hastie developed Elastic Net as a solution to address the limitations of Lasso regression and ridge regression, two widely used regression techniques. Both of these techniques had their advantages, but also significant drawbacks: Lasso regression tended to select a small subset of predictor variables, while ridge regression retained all variables but performed no true variable selection.

The Elastic Net combined the features of both methods, introducing a mixed regularization that includes both the L1 penalty and the L2 penalty. This made it possible to obtain the variable selection benefits of Lasso regression together with the stability of ridge regression.

Zou and Hastie’s paper sparked great interest in the statistics and machine learning community, leading to the widespread adoption of Elastic Net in various application areas. Since then, Elastic Net has become a very popular tool for regression and data analysis, used to address a wide range of problems, including those with high-dimensional data and multicollinearity.

Elastic Net is a regression model that extends linear regression by combining aspects of Lasso (Least Absolute Shrinkage and Selection Operator) regression and ridge regression, in order to handle multicollinearity and variable selection problems.

Traditional linear regression can suffer from multicollinearity problems when the independent variables are highly correlated with each other. Lasso regression addresses this problem by imposing a penalty on the sum of the absolute values of the coefficients during the training process, which tends to reduce some coefficients to zero, thus performing a kind of variable selection.

However, Lasso regression can be too strict in its variable selection, eliminating too many coefficients and potentially ignoring useful variables.

Elastic Net aims to overcome these limitations by combining the Lasso regression penalty with an additional penalty, similar to the L2 norm of ridge regression. This allows Elastic Net to maintain some of the advantages of Lasso regression in variable selection, while at the same time alleviating the excessive tendency to select variables when there are strong correlations between predictors.

The general form of the objective function for the Elastic Net is:

\[
\min_{\beta}\ \left\{ \| y - X\beta \|_2^2 \;+\; \lambda_1 \|\beta\|_1 \;+\; \lambda_2 \|\beta\|_2^2 \right\}
\]

Where:

- \(y\) is the vector of observed values of the dependent variable;
- \(X\) is the matrix of predictor variables;
- \(\beta\) is the vector of regression coefficients;
- \(\|\beta\|_1 = \sum_j |\beta_j|\) is the L1 penalty (as in Lasso regression);
- \(\|\beta\|_2^2 = \sum_j \beta_j^2\) is the L2 penalty (as in ridge regression);
- \(\lambda_1\) and \(\lambda_2\) are non-negative parameters that control the strength of the two penalties.

In summary, Elastic Net offers greater flexibility than Lasso regression and ridge regression, allowing you to effectively handle multicollinearity and variable selection problems.
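As we will see in the next section, scikit-learn uses an equivalent but slightly different parameterization: a single overall strength parameter alpha and a mixing parameter l1_ratio take the place of the two lambdas. Here is a minimal sketch of how the two parameters map onto the penalties above:

from sklearn.linear_model import ElasticNet

# scikit-learn's ElasticNet minimizes:
#   1/(2*n_samples) * ||y - Xw||^2
#       + alpha * l1_ratio * ||w||_1
#       + 0.5 * alpha * (1 - l1_ratio) * ||w||^2
# so alpha controls the overall regularization strength and l1_ratio the L1/L2 mix:
#   l1_ratio = 1 -> pure Lasso penalty, l1_ratio = 0 -> pure ridge penalty
model = ElasticNet(alpha=0.1, l1_ratio=0.5)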

Elastic Net with scikit-learn for Linear Regression

Elastic Net is integrated into the Python scikit-learn library, which is one of the most used libraries for machine learning. In scikit-learn, you can use the ElasticNet class to train Elastic Net regression models.

In this example, we will use an Elastic Net regression model trained on synthetic data randomly generated with make_regression from scikit-learn. Once the dataset has been generated, it will be divided into two portions: a training set and a testing set. We will use the fit function to train the model on the training set. Once trained, we will make predictions on the testing set using predict. To evaluate the performance of the model, the mean squared error is used as a metric. This value is easily obtained from the mean_squared_error function provided by scikit-learn. Here is the code that performs all these tasks:

from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate synthetic data for the example
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the Elastic Net model
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5, random_state=42)

# Train the model on the training data
elastic_net.fit(X_train, y_train)

# Make predictions on the test set
predictions = elastic_net.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)

Running the code, we get the MSE value:

Mean Squared Error: 176.0283275056508

Since the MSE must be as small as possible, reading it as an absolute value alone does not tell us much. We can use a very useful graphical representation that lets us see how the predicted values differ from the actual ones over the entire range of the dataset. Here is the code to generate the graph:

import matplotlib.pyplot as plt

# Plot of predictions versus actual values
plt.scatter(y_test, predictions)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], 'r--')
plt.xlabel("Actual Values")
plt.ylabel("Predictions")
plt.title("Scatter plot: Actual Values vs Predictions")
plt.show()

Running this code produces the scatter plot of actual values against predictions.

As we can see, the points do not deviate much from the red diagonal (predicted value = actual value). So in this case our model proved to be a good prediction model.

A metric that gives us similar information is R^2 (or coefficient of determination). This is a common metric used to evaluate how well regression models predict. R^2 measures the proportion of variance in the dependent variable that is explained by the model. A value closer to 1 indicates a better model, while a value closer to 0 indicates a worse model.

from sklearn.metrics import r2_score

r2 = r2_score(y_test, predictions)
print("Coefficient R^2:", r2)

Running the code, we get the value of the metric:

Coefficient R^2: 0.9970546671780763

As we can see, it is a value very close to 1. This confirms what we observed in the previous graph.

Example with the diabetes dataset for a Linear Regression with Elastic Net

So far the model has performed very well in predicting artificially generated values. But how will it behave with a dataset of real data, such as the “diabetes” dataset provided by the scikit-learn library?

The “diabetes” dataset included in the scikit-learn library is an example dataset that contains measurements derived from diabetes patients. It is commonly used for machine learning and data analysis purposes.

Here is a quick description of the characteristics of the “diabetes” dataset:

- Number of samples: 442 patients
- Number of attributes/predictors: 10 numeric features, already mean-centered and scaled
- Target: a quantitative measure of disease progression one year after baseline

The 10 attributes/predictors represent:

- age: age of the patient
- sex: sex of the patient
- bmi: body mass index
- bp: average blood pressure
- s1–s6: six blood serum measurements (total cholesterol, LDL and HDL cholesterol, total cholesterol/HDL ratio, a log-transformed triglycerides level, and blood sugar)

The target represents the progression of the disease over one year, expressed as a quantitative measure.
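To take a quick look at these characteristics directly in Python, we can load the dataset and inspect its shape and feature names (a minimal sketch):

from sklearn.datasets import load_diabetes

# Load the dataset and inspect its basic characteristics
diabetes = load_diabetes()
print(diabetes.data.shape)      # (442, 10): 442 patients, 10 predictors
print(diabetes.feature_names)   # ['age', 'sex', 'bmi', 'bp', 's1', ..., 's6']
print(diabetes.target[:5])      # disease progression one year after baseline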

This dataset is often used for educational examples and to evaluate the performance of regression algorithms in machine learning. Let’s see how our Elastic Net model behaves in this regard.

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score

# Load the "diabetes" dataset
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the Elastic Net model
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5, random_state=42)

# Train the model on the training data
elastic_net.fit(X_train, y_train)

# Make predictions on the test set
predictions = elastic_net.predict(X_test)

# Calculate the Mean Squared Error (MSE) and the coefficient R^2
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("Mean Squared Error (MSE):", mse)
print("Coefficient R^2:", r2)

Running you get the values:

Mean Squared Error (MSE): 4775.466767154695
Coefficient R^2: 0.09865421116113748

These are certainly not excellent results…

Improving the model: the search for optimal parameters

You might consider adopting an optimal parameter search strategy using random search to explore different combinations of hyperparameters for your Elastic Net model. Here is an example of how we could do this using scikit-learn’s RandomizedSearchCV class:

from sklearn.model_selection import RandomizedSearchCV
import numpy as np

# Define the parameter grid to explore
param_grid = {
    'alpha': np.linspace(0.1, 1.0, 10),  # alpha values from 0.1 to 1.0
    'l1_ratio': np.linspace(0.1, 0.9, 9)  # l1_ratio values from 0.1 to 0.9
}

# Initialize the Elastic Net model
elastic_net = ElasticNet(random_state=42)

# Search for optimal parameters using random search
random_search = RandomizedSearchCV(estimator=elastic_net, param_distributions=param_grid, n_iter=100, cv=5, scoring='neg_mean_squared_error', random_state=42)
random_search.fit(X_train, y_train)

# Get the best model
best_elastic_net = random_search.best_estimator_

# Make predictions on the test set
predictions = best_elastic_net.predict(X_test)

# Calculate Mean Squared Error (MSE) and Coefficient R^2
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("Mean Squared Error (MSE):", mse)
print("Coefficient R^2:", r2)
print("Best parameters:", random_search.best_params_)

In this code, we define a grid of parameters to explore for alpha and l1_ratio, and use RandomizedSearchCV to perform a random search over this grid with 5-fold cross-validation. Once the search is complete, the best estimator is refitted on the whole training set; we use it to make predictions on the test set and evaluate its performance. Finally, we print the MSE, the R^2 and the best parameters found.

Mean Squared Error (MSE): 3792.129166396345
Coefficient R^2: 0.2842543312471031
Best parameters: {'l1_ratio': 0.9, 'alpha': 0.1}

We are still far away…
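Before moving on, it is worth noting that scikit-learn also offers ElasticNetCV, an estimator with built-in cross-validation that searches an automatic path of alpha values for each candidate l1_ratio. The snippet below is only a sketch of how it could be applied to the same training data; whether it actually beats the random search would have to be verified:

from sklearn.linear_model import ElasticNetCV

# Cross-validated selection of alpha (automatic path) and l1_ratio (given candidates)
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 1.0],
                       n_alphas=100, cv=5, random_state=42)
enet_cv.fit(X_train, y_train)

print("Best alpha:", enet_cv.alpha_)
print("Best l1_ratio:", enet_cv.l1_ratio_)
print("Test R^2:", enet_cv.score(X_test, y_test))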

Improving the model: data normalization

The results obtained, although improved compared to the previous iteration, are still not satisfactory. However, we can continue to explore additional strategies to try to improve the model’s performance.

Data normalization is a common practice in machine learning that can help improve model performance, especially when the features have different scales. We can use standardization or min-max normalization to normalize the data.

Here is an example of how we can apply data standardization using scikit-learn’s StandardScaler and then retrain the Elastic Net model:

from sklearn.preprocessing import StandardScaler

# Data standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training the Elastic Net model on standardized data
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.9, random_state=42)
elastic_net.fit(X_train_scaled, y_train)

# Predictions on the test set
predictions = elastic_net.predict(X_test_scaled)

# Calculating MSE and R^2
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("Mean Squared Error (MSE) with normalization:", mse)
print("Coefficient R^2 with normalization:", r2)

In this code, we apply data standardization using StandardScaler on training and test data. Next, we train the Elastic Net model on the standardized data and calculate the MSE and R^2 coefficient using the model’s predictions.

Mean Squared Error (MSE) with normalization: 2878.8291644017645
Coefficient R^2 with normalization: 0.45663520015111103

We have roughly doubled the predictive capacity of the model, but an R^2 of 0.45 is still too low. Normalizing the data can help the model converge more quickly and can improve overall performance. However, it is important to test the effect of data normalization on model performance and verify whether it actually brings an improvement.
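Min-max normalization, mentioned above as an alternative to standardization, can be applied in exactly the same way. Here is a minimal sketch using scikit-learn's MinMaxScaler; whether it helps on this particular dataset would have to be checked:

from sklearn.preprocessing import MinMaxScaler

# Rescale each feature to the [0, 1] range using the training data statistics only
scaler = MinMaxScaler()
X_train_minmax = scaler.fit_transform(X_train)
X_test_minmax = scaler.transform(X_test)

elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.9, random_state=42)
elastic_net.fit(X_train_minmax, y_train)
print("R^2 with min-max normalization:", elastic_net.score(X_test_minmax, y_test))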

Improving the Model: Feature Engineering

Feature engineering is an important step in the process of developing a machine learning model. We can explore different transformations of existing features or create new features based on existing ones to try to better capture the relationships between the independent and dependent variables.

Here are some examples of possible feature engineering techniques we might consider for the “diabetes” dataset:

- creating polynomial or interaction features from the existing predictors;
- reducing the dimensionality, for example with PCA, to remove redundant directions;
- transforming or combining existing variables (ratios, logarithms, binning).

Among these options we try to reduce the dimensionality with PCA and then to add polynomial features. To do this, we define a pipeline that combines data normalization with PCA, apply it to the training and test data, and then add the degree 2 polynomial features.

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Definition of the pipeline with normalization and PCA
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=0.95))  # Preserve 95% of variance
])

# Application of the pipeline to training and test data
X_train_pca = pipeline.fit_transform(X_train)
X_test_pca = pipeline.transform(X_test)

# Adding polynomial features of degree 2
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_pca)
X_test_poly = poly.transform(X_test_pca)

# Training the Elastic Net model on polynomial features
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.9, random_state=42)
elastic_net.fit(X_train_poly, y_train)

# Predictions on the test set
predictions = elastic_net.predict(X_test_poly)

# Calculating MSE and R^2
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("Mean Squared Error (MSE) with PCA and polynomial features:", mse)
print("Coefficient R^2 with PCA and polynomial features:", r2)
Mean Squared Error (MSE) with PCA and polynomial features: 2605.224776619983
Coefficient R^2 with PCA and polynomial features: 0.5082766783059008

It’s a further improvement in the model’s performance! Combining data normalization with PCA and polynomial features led to a further reduction of the mean squared error (MSE) and an increase in the R^2 coefficient, indicating that the model is providing better predictions.

And so on. However, in these cases, before proceeding with increasingly complex optimization steps (we are still only around R^2 = 0.5), you should evaluate the possibility of using different methods and check whether they give better results.
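As a quick check of this advice, we can compare the tuned Elastic Net with a couple of other regressors on the same train/test split. The snippet below is only a sketch (the model choices and parameters are illustrative), and the resulting scores have to be computed on your own data:

from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor

# Fit alternative models on the same training data and compare test R^2
models = {
    "Ridge": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test R^2:", model.score(X_test, y_test))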

When to use Elastic Net in Linear Regressions

The previous example shows that choosing the right model can make a real difference in forecasting capability, and that the best choice depends on the characteristics of each dataset. Let’s look at some guidelines that can be useful in this regard.

The choice between Elastic Net and other linear regression methods depends on the specific characteristics of your problem and data. Here are some considerations for when it might be appropriate to choose Elastic Net over other linear regression methods:

Elastic Net is a regularization method that combines both L1 (lasso) regularization and L2 (ridge) regularization. It is particularly useful when there are many predictor variables in the dataset or when these variables are highly correlated with each other (multicollinearity).
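To see the multicollinearity point concretely, here is a small illustrative experiment: generate data whose predictors are highly correlated (for example with a low effective_rank in make_regression) and count how many coefficients Lasso and Elastic Net set to zero. The exact counts depend on the data and the chosen parameters; this is only a sketch:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, ElasticNet

# Synthetic data with strongly correlated predictors (low effective rank)
X_corr, y_corr = make_regression(n_samples=200, n_features=30,
                                 effective_rank=5, noise=1.0, random_state=42)

lasso = Lasso(alpha=0.5).fit(X_corr, y_corr)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X_corr, y_corr)

# Lasso tends to keep only one variable per correlated group,
# while Elastic Net tends to keep (and shrink) the whole group
print("Non-zero coefficients (Lasso):      ", np.sum(lasso.coef_ != 0))
print("Non-zero coefficients (Elastic Net):", np.sum(enet.coef_ != 0))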

However, there is no regression method that is universally optimal for all types of datasets. There are several reasons why Elastic Net may not work well with a specific dataset like “diabetes”:

- the relationship between the predictors and the target may be markedly non-linear, which no linear model can capture well;
- with only a few, weakly redundant features, the variable selection provided by the L1 penalty brings little benefit;
- poorly chosen hyperparameters (alpha, l1_ratio) can over- or under-regularize the model.

In summary, if Elastic Net does not work well with a certain dataset such as “diabetes”, you may need to explore other options, such as hyperparameter tuning, feature engineering, or more complex models, to achieve better performance.
