probabilities Archives

August 26, 2018 · October 23, 2019

Conceptual and mathematical summary for machine learning

Machine learning relies on many mathematical formulas and relations to carry out the tasks it handles. The following “cheat sheets” by Afshine and Shervine Amidi present the concepts of supervised and unsupervised learning, deep learning, and machine learning tips and tricks, together with reminders on probabilities, statistics, algebra and calculus, all in detail with the underlying math.

Gradient descent diagram

Based on the Stanford course on Machine Learning (CS 229), the cheat sheets summarize the important concepts of each branch with simple explanations and diagrams, such as the following table covering underfitting and overfitting.

	Underfitting	Just right	Overfitting
Symptoms	• High training error • Training error close to test error • High bias	• Training error slightly lower than test error	• Very low training error • Training error much lower than test error • High variance
Regression illustration
Classification illustration
Deep learning illustration
Possible remedies	• Complexify model • Add more features • Train longer		• Perform regularization • Get more data

The main machine learning cheat sheets can be found here:

Supervised Learning
Results about linear models, generative learning, support vector machines and kernel methods
Unsupervised Learning
Formulas about clustering methods and dimensionality reduction
Deep Learning
Main concepts around neural networks, backpropagation and reinforcement learning
Machine Learning Tips and Tricks
Good habits and sanity checks to make sure that your model is trained the right way

Other mathematics and coding cheat sheets can be found here:

Probabilities and Statistics
Formulas about combinatorics, random variables, main probability distributions, and parameter estimation
Linear Algebra and Calculus
Matrix-vector notations as well as algebra and calculus properties
Getting started with Matlab
Main features and good practices to adopt

The complete cheat sheets can also be found on GitHub.

January 3, 2018 · October 23, 2019

22. Probabilistic Inference II

Belief nets

Continued from previous class

Event diagrams must always be arranged so that they contain no loops and therefore end in final nodes. Each event has its own probability table, which is filled in by repeating the experiment and tallying occurrences to estimate the probabilities.

Bayesian inference

Several models can be drawn for a given set of events. To find out which one is right, the formulas of Bayesian probability can be used to check whether events are independent, to make the probabilities easier to compute, and to choose the more appropriate model.

P(a/b) = P(a,b) / P(b)
P(a/b) P(b) = P(a,b) = P(b/a) P(a)
P(a/b) = P(b/a) P(a) / P(b)

Defining a as a class and b as the evidence, the probability of the class given the evidence can be obtained from the probability of the evidence given the class through these formulas.

P(class/evidence) = P(evidence/class) P(class) / P(evidence)

Using the evidence from experience, classes can be inferred by analyzing the results and the corresponding probabilities.
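As a minimal sketch of the class/evidence formula above, the snippet below computes P(class/evidence) for a made-up diagnosis case. The function name `posterior` and all the numbers (a 1% prior, a 90% detection rate, a 5% false-alarm rate) are illustrative assumptions, not values from the class.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(class/evidence) using Bayes' rule.

    prior                -- P(class)
    likelihood           -- P(evidence/class)
    likelihood_given_not -- P(evidence/not class)
    """
    # P(evidence), summed over the two cases "class" and "not class"
    p_evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / p_evidence


# Hypothetical numbers: rare class, fairly reliable evidence.
p = posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(p, 4))
```

With these assumed numbers the posterior stays around 15%, because the class is rare to begin with: the prior P(class) weighs as much in the result as the quality of the evidence.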

Structure discovery

Given data from experience or simulation, the right model can be identified because it corresponds better to the observed probabilities. This makes it possible to choose between two existing models.

However, if many models can be created, the volume of data makes it impossible to compare them all. The solution is to keep two models and compare them repeatedly: at each trial, the losing model is modified in an attempt to improve it, until a model meets certain criteria for success.

A trick is to use the sum of the logarithms of the probabilities rather than their product, as a large number of trials makes the product too small to compute properly.
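The logarithm trick can be sketched as follows. The data set and the two candidate models (each one simply a value of P(True)) are hypothetical and only serve to show why the direct product is unusable while the summed logarithms still rank the models.

```python
import math


def log_likelihood(data, p_true):
    """Sum of log-probabilities of independent True/False observations
    under a model that says P(True) = p_true."""
    return sum(math.log(p_true if x else 1.0 - p_true) for x in data)


# Hypothetical data: 10,000 trials, 70% of them True.
data = [True] * 7000 + [False] * 3000

# Two hypothetical candidate models, scored by summed logarithms.
score_a = log_likelihood(data, p_true=0.7)
score_b = log_likelihood(data, p_true=0.5)

# The direct product of the same probabilities underflows to 0.0
# in double precision, so it cannot be used to compare models.
product_a = 1.0
for x in data:
    product_a *= 0.7 if x else 0.3

print(score_a > score_b)  # the higher log score marks the better fit
print(product_a)
```

Because the logarithm is increasing, the model with the larger sum of logarithms is also the one with the larger product, so the comparison is unchanged.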

To avoid local maxima, a radical rearrangement of structure is launched after a certain number of trials.

Applications

This Bayesian structure discovery works quite well in situations where a diagnosis is needed: medical diagnosis, lie detection, finding out why an aircraft or a program is not working…

January 2, 2018 · October 23, 2019

21. Probabilistic Inference I

Probabilities in Artificial Intelligence

With a joint probability table, recording the tally of each combination of events lets us measure the probability of each event, conditional and unconditional probabilities, the independence of events, etc.

The problem with such a table is that as the number of variables increases, the number of rows grows exponentially.
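A small sketch of such a table for two True/False events a and b is given below. The tallies are invented for illustration, and the helper names `prob` and `table_rows` are assumptions of this example.

```python
from collections import Counter

# Hypothetical tallies of (a, b) outcomes from repeated observation.
tally = Counter({
    (True, True): 30,
    (True, False): 10,
    (False, True): 20,
    (False, False): 40,
})
total = sum(tally.values())


def prob(predicate):
    """Probability of the outcomes (a, b) for which predicate(a, b) holds."""
    return sum(n for (a, b), n in tally.items() if predicate(a, b)) / total


p_a = prob(lambda a, b: a)                   # unconditional P(a)
p_b = prob(lambda a, b: b)                   # unconditional P(b)
p_ab = prob(lambda a, b: a and b)            # joint P(a,b)
p_a_given_b = p_ab / p_b                     # conditional P(a/b)
independent = abs(p_ab - p_a * p_b) < 1e-9   # is P(a,b) = P(a) P(b)?


def table_rows(n_variables):
    """Rows in a full joint table over n True/False variables."""
    return 2 ** n_variables


print(p_a, p_a_given_b, independent)
print([table_rows(n) for n in (2, 10, 20)])
```

Every quantity comes from the same four tallies, but the row count doubles with each added variable: 20 True/False variables already need more than a million rows.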

Reminders of probabilities formulas

Basic axioms of probability

0 ≤ P(a) ≤ 1
P(True) = 1 ; P(False) = 0
P(a+b) = P(a) + P(b) – P(a,b)

Basic definitions of probability

P(a/b) = P(a,b) / P(b)
P(a,b) = P(a/b) P(b)
P(a,b,c) = P(a/b,c) P(b,c) = P(a/b,c)P(b/c)P(c)

Chain rule of probability

By generalizing the previous formula, we obtain the following chain rule:

$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i / x_{i-1}, \ldots, x_1)$
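The chain rule can be checked numerically on three variables. The joint distribution below is arbitrary (weights 1 to 8, normalized), and the helper `marginal` is an assumption of this sketch.

```python
from itertools import product

# Hypothetical joint distribution over three True/False variables (a, b, c).
names = ("a", "b", "c")
weights = [1, 2, 3, 4, 5, 6, 7, 8]
outcomes = list(product([True, False], repeat=3))
joint = {o: w / sum(weights) for o, w in zip(outcomes, weights)}


def marginal(**fixed):
    """Probability that the named variables take the given values."""
    return sum(
        p for o, p in joint.items()
        if all(o[names.index(k)] == v for k, v in fixed.items())
    )


# Chain rule for three variables: P(a,b,c) = P(a/b,c) P(b/c) P(c)
p_c = marginal(c=True)
p_b_given_c = marginal(b=True, c=True) / p_c
p_a_given_bc = marginal(a=True, b=True, c=True) / marginal(b=True, c=True)

chain = p_a_given_bc * p_b_given_c * p_c
direct = joint[(True, True, True)]
print(abs(chain - direct) < 1e-12)
```

The product of conditionals matches the table entry for any joint distribution, since each denominator cancels the next numerator.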

Independence

Independent events

P(a/b) = P(a) if a and b are independent

Conditional independence

If a and b are conditionally independent given z

P(a/b,z) = P(a/z)
P(a,b/z) = P(a/z)P(b/z)
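To illustrate the difference between plain and conditional independence, the sketch below builds a hypothetical model in which z is a common cause of a and b. All the numbers are assumptions chosen for the example.

```python
# Hypothetical model: z is a common cause of a and b.
p_z = 0.5
p_a_given = {True: 0.9, False: 0.1}   # P(a/z) and P(a/not z)
p_b_given = {True: 0.9, False: 0.1}   # P(b/z) and P(b/not z)


def joint(a, b, z):
    """P(a,b,z), built so that a and b are independent once z is known."""
    pz = p_z if z else 1.0 - p_z
    pa = p_a_given[z] if a else 1.0 - p_a_given[z]
    pb = p_b_given[z] if b else 1.0 - p_b_given[z]
    return pa * pb * pz


bools = (True, False)

# Knowing z: the joint of a and b factorizes into P(a/z) P(b/z).
p_z_true = sum(joint(a, b, True) for a in bools for b in bools)
p_ab_given_z = joint(True, True, True) / p_z_true

# Not knowing z: a and b are no longer independent.
p_a = sum(joint(True, b, z) for b in bools for z in bools)
p_b = sum(joint(a, True, z) for a in bools for z in bools)
p_ab = sum(joint(True, True, z) for z in bools)

print(p_ab_given_z, p_a_given[True] * p_b_given[True])
print(p_ab, p_a * p_b)
```

In this assumed model the first pair of printed values agree, while the second pair do not: a and b look correlated until their common cause z is taken into account.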

Belief nets

Causal relations between events can be represented in nets. These models highlight that, given its parents, any event is independent of all events other than its descendants. When the probabilities are recorded at each node, the total number of table rows is significantly smaller than in a general table tallying all events.
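The saving in table size can be counted directly. The net below is a hypothetical example with five True/False events. Its structure and event names are assumptions made for this sketch.

```python
# Hypothetical net: each True/False event is mapped to the list of its parents.
parents = {
    "burglar": [],
    "raccoon": [],
    "dog_barks": ["burglar", "raccoon"],
    "police_called": ["dog_barks"],
    "trash_can_tipped": ["raccoon"],
}

# One table per node, with one row per combination of its parents' values.
net_rows = sum(2 ** len(p) for p in parents.values())

# A single joint table needs one row per combination of all events.
joint_rows = 2 ** len(parents)

print(net_rows, joint_rows)
```

For this assumed structure the net needs 10 rows where the full joint table needs 32, and the gap widens quickly as events are added, provided each event keeps only a few parents.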