PDF, the Probability Density Function

The Probability Density Function (PDF) with Python

The Probability Density Function (PDF) is a mathematical function that describes the relative likelihood of a continuous random variable taking on a given value. In other words, it provides a representation of the probability distribution of a continuous variable. The PDF is non-negative and the area under its curve is 1, since it represents the total probability. For example, the PDF of the normal distribution is the familiar bell curve.
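As a minimal sketch, the PDF of the normal distribution can be computed directly from its formula, with no external libraries (in practice one would use scipy.stats.norm.pdf):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of the normal distribution:
    f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# the peak of the standard normal bell curve, at x = 0
print(normal_pdf(0.0))  # 0.3989422804014327

# the area under the curve is (numerically) 1, as a Riemann sum shows
area = sum(normal_pdf(-5 + i * 0.01) * 0.01 for i in range(1001))
print(area)
```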

Student's t distribution

Student’s t Distribution with Python

The Student’s t-distribution is a probability distribution that derives from the concept of the t-statistic. It is often used in statistical inference when the sample on which an analysis is based is relatively small and the population standard deviation is unknown. The shape of the t-distribution is similar to that of the normal distribution, but it has heavier tails, making it more suitable for small sample sizes.
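A sketch of the t-distribution's density, written out from its formula with the standard library (scipy.stats.t.pdf is the usual tool), makes the heavier tails visible:

```python
import math

def t_pdf(x, nu):
    """PDF of Student's t-distribution with nu degrees of freedom."""
    num = math.gamma((nu + 1) / 2)
    den = math.sqrt(nu * math.pi) * math.gamma(nu / 2)
    return num / den * (1 + x * x / nu) ** (-(nu + 1) / 2)

# heavier tails than the standard normal: more density far from the center
std_normal_at_3 = math.exp(-3 ** 2 / 2) / math.sqrt(2 * math.pi)
print(t_pdf(3.0, 5), std_normal_at_3)  # the first value is larger
```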

Kurtosis

Kurtosis with Python

Kurtosis is a statistical measure that describes the shape of the distribution of a data set. Essentially, it indicates how much the tails of a distribution differ from those of a normal distribution. It is usually reported as excess kurtosis, i.e. kurtosis minus 3 (the value for a normal distribution): positive excess kurtosis (leptokurtic) indicates heavier tails and a more “pointed” distribution, negative excess kurtosis (platykurtic) indicates lighter tails and a “flatter” distribution, and a value of zero corresponds to the normal distribution.
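A small sketch of the calculation from the raw moments (scipy.stats.kurtosis computes the same quantity with its default Fisher definition):

```python
def excess_kurtosis(data):
    """Excess kurtosis: fourth standardized moment minus 3 (0 for a normal)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance (biased)
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# a flat, uniform-like sample has negative excess kurtosis (about -1.2)
print(excess_kurtosis([float(i) for i in range(1000)]))
# a sample with rare extreme values has positive excess kurtosis
print(excess_kurtosis([0.0] * 98 + [10.0, -10.0]))
```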

Statistics - Skewness

Skewness calculation with Python

Skewness is a statistical measure that describes the asymmetry of the distribution of a data set. It indicates whether the tail of the distribution extends to the left or to the right of its central part. Positive skewness indicates a longer tail on the right, while negative skewness indicates a longer tail on the left.
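The calculation can be sketched directly from the central moments (scipy.stats.skew gives the same result on population moments):

```python
def skewness(data):
    """Fisher-Pearson coefficient of skewness (third standardized moment)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance (biased)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))   # symmetric data -> 0.0
print(skewness([1, 1, 1, 2, 10]))  # long right tail -> positive value
```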

ANOVA, the Analysis of Variance technique

ANOVA, the analysis of variance technique with Python

ANOVA, an acronym for “Analysis of Variance”, is a statistical technique used to evaluate whether there are significant differences between the means of three or more independent groups. In other words, ANOVA compares the means of different groups to determine whether at least one of them is significantly different from the others.
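In practice one would call scipy.stats.f_oneway, which also returns a p-value; as a self-contained sketch, the one-way F statistic compares between-group and within-group variability:

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for two or more independent groups."""
    all_data = [x for g in groups for x in g]
    grand_mean = sum(all_data) / len(all_data)
    k, n = len(groups), len(all_data)
    means = [sum(g) / len(g) for g in groups]
    # variability of group means around the grand mean (between groups)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # variability of observations around their own group mean (within groups)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# three small groups with shifted means
print(one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # 3.0
```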

Large Language Models

Large Language Models (LLM), what they are and how they work

Large Language Models (LLM) are artificial intelligence models that have demonstrated remarkable capabilities in the field of natural language. They mainly rely on complex architectures that allow them to capture linguistic relationships in texts effectively. These models are known for their enormous size (hence the term “Large”), with millions or billions of parameters, which allows them to store vast linguistic knowledge and adapt to a variety of tasks.

Machine Learning with Python - ID3 algorithm

The ID3 algorithm in Machine Learning with Python

The ID3 (Iterative Dichotomiser 3) algorithm is a predecessor of the C4.5 algorithm and represents one of the first algorithms for building decision trees. Even though C4.5 and its successors have become more popular, ID3 is still interesting because it helped lay the foundation for decision trees and machine learning. Below, I will explain how ID3 works and how to use it in Python.
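As a toy sketch (not a library implementation), ID3 on categorical attributes recursively picks the attribute with the highest information gain; the dict-based tree format and the example data below are my own conventions:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on `attr`."""
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:   # pure node: return the class
        return labels[0]
    if not attrs:               # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        tree[best][v] = id3([rows[i] for i in idx],
                            [labels[i] for i in idx],
                            [a for a in attrs if a != best])
    return tree

rows = [{'outlook': 'sunny'}, {'outlook': 'sunny'}, {'outlook': 'rain'}]
print(id3(rows, ['no', 'no', 'yes'], ['outlook']))
```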

Beyond the classification and regression problems

Machine Learning: beyond classification and regression problems

Machine Learning is a very broad field, and there are many other types of problems and techniques beyond classification and regression. Problems such as clustering, dimensionality reduction, reinforcement learning, text generation, and many others are equally important and present unique challenges. Many advanced courses and more specialized resources also cover these less mainstream topics. So, if you have an interest in specific types of Machine Learning problems, you can find specialized resources to meet your needs.

Machine Learning with Python - Entropy and Information Gain

Entropy and information gain in Machine Learning

In machine learning, entropy and information gain are fundamental concepts used in decision trees and supervised learning to decide how to split the data during the training of a model. These concepts are often associated with the Iterative Dichotomiser 3 (ID3) algorithm and its variants, such as C4.5.
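The two quantities can be sketched in a few lines of plain Python; the label lists below are made-up toy data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy of the parent minus the weighted entropy of its splits."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

print(entropy(['y', 'y', 'n', 'n']))  # 1.0 bit: maximum uncertainty
print(information_gain(['y', 'y', 'n', 'n'],
                       [['y', 'y'], ['n', 'n']]))  # 1.0: a perfect split
```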