Conceptual and mathematical summary for machine learning

Machine learning relies on many mathematical formulas and relations to implement the different tasks it can handle. The following “cheat sheets” by Afshine and Shervine Amidi gather the concepts for supervised and unsupervised learning and deep learning, together with machine learning tips and tricks and reminders on probabilities, statistics, algebra and calculus, all presented in detail with the underlying math.

Gradient descent diagram

Based on the Stanford course on Machine Learning (CS 229), the cheat sheets summarize the important concepts of each branch with simple explanations and diagrams, such as the following table covering underfitting and overfitting.

| | Underfitting | Just right | Overfitting |
|---|---|---|---|
| Symptoms | High training error; training error close to test error; high bias | Training error slightly lower than test error | Very low training error; training error much lower than test error; high variance |
| Regression illustration | Illustration | Illustration | Illustration |
| Classification illustration | Illustration | Illustration | Illustration |
| Deep learning illustration | Illustration | Illustration | Illustration |
| Possible remedies | Complexify model; add more features; train longer | | Perform regularization; get more data |
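As a rough illustration, the symptoms and remedies in the table above can be encoded in a small helper. This is only a sketch: the function name `diagnose` and the thresholds `high_error` and `large_gap` are hypothetical values chosen for the example, not figures taken from the cheat sheets.

```python
def diagnose(train_error, test_error, high_error=0.2, large_gap=0.1):
    """Map a pair of error rates to a row of the table (hypothetical thresholds)."""
    gap = test_error - train_error
    # High training error that stays close to the test error: high bias.
    if train_error >= high_error and gap < large_gap:
        return "underfitting", ["complexify model", "add more features", "train longer"]
    # Training error much lower than test error: high variance.
    if gap >= large_gap:
        return "overfitting", ["perform regularization", "get more data"]
    # Training error only slightly lower than test error.
    return "just right", []


for errors in [(0.30, 0.32), (0.01, 0.25), (0.05, 0.07)]:
    label, remedies = diagnose(*errors)
    print(errors, label, remedies)
```

In practice the thresholds depend on the task and on the error metric, so they would need to be tuned rather than reused as written.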

The main machine learning cheat sheets can be found here:

Other mathematics and coding cheat sheets can be found here:

The complete cheat sheets can also be found on GitHub.

Guide to real Machine Learning applications

This series of articles dives deeper into real applications of Machine Learning that are already in use in many technological processes and devices.

Amazon Alexa

Through these posts entitled “Machine Learning is Fun!”, Adam Geitgey guides us step by step through the concepts, data, algorithms, code, results and pitfalls of machine learning applications from image, face and speech recognition to language translation and more. It also gathers several different sources for more details on each application and its development.

Image encoding

This series is dense with detailed code, but it is also explained very clearly, step by step, with detailed illustrations. It notably covers Convolutional Neural Networks, Generative Adversarial Networks and Recurrent Neural Networks, together with some of their most prominent applications in daily life. It is a real course not to be missed by any ML developer!

Here is the list of posts with direct links:

Catalogue of neural networks architectures

Neural networks come in a wide range of shapes and functions, with diverse architectures and parameters for input, hidden and output nodes as well as convolutional or recurrent nodes.

Overview of the most popular neural networks

Gathered in a convenient summary by Fjodor Van Veen, the most popular architectures for neural networks have been cataloged with detailed descriptions for each type of neural network. The complete post, with explanations on the use and goals of each network, can be found on the Asimov Institute website under “The Neural Network Zoo”.

Programming a simple classifier with TensorFlow

TensorFlow is an open-source machine learning framework developed by Google. It relies upon tensors (multi-dimensional arrays), which underpin a wide range of APIs for developing machine learning applications, primarily deep neural networks. TensorFlow is commonly used in machine learning practice, so it is worth starting to use it early.

Thankfully, the TensorFlow website provides a guide for programmers as well as detailed tutorials. Here is the basic tutorial to get started with TensorFlow. To accompany programmers, Google Cloud has also created a series of videos on machine learning and TensorFlow.

The next video goes over the basic tutorial on classifying iris flowers. Yufeng Guo walks us through the initial tutorial to develop a linear model that classifies the flowers, corresponding to the explanations and code available on the page “Getting Started with TensorFlow: Premade Estimators”, which is aimed at readers who have some experience in machine learning.

Note: to get this tutorial running well, you will need a Python IDE (such as PyCharm, or a Jupyter notebook) with a virtual environment loaded with the TensorFlow, Pandas and NumPy libraries. You will also need a Git client (Git for Windows if you’re using Windows) to download the data from GitHub. You may use Anaconda to properly load the libraries in your Python IDE.

Programming a simple neural network

Though neural networks were considered to be of little use for a long time, the recent growth in computing power and dataset sizes has proven otherwise. Since the revolution in machine learning over the last few years has been primarily driven by them, let’s dive right into the actual coding of neural nets.

Before coding, it can be useful to review the principles of neural nets to make sure we understand what we will be doing here. Thankfully, the work of coding has already been done by Milo Spencer-Harper, based upon the earlier work of Andrew Trask. He guides us step by step through building a single-layer neural net and a multiple-layer neural net, with crystal-clear code and without using any machine learning library.

Although a single-layer neural net is of little use and the problem it solves can be handled better through other methods, it is included here as a didactic step to better understand and learn to code multiple-layer neural nets.

In these two examples, the complexity of the problem, and therefore the number of layers needed to solve it, depends on the number of columns of data to be taken into account. The single-layer neural network solves the problem when one column of data is critical, the multiple-layer neural net (2 layers) when two columns are critical.

Single-layer neural net

In the first post, the building of a simple neural network is detailed through the following key steps, summarized here. The data set is a 3-column matrix where only one column affects the results. The single-layer neural net is used to understand the direct influence this single column of data has on the result.

Training data: Only the first column of input impacts the output

Since the code was written for an earlier version of Python, fully functional versions updated for Python 3.6 are also included here. Here is the light 9-line initial code for the single-layer neural network.

Training Process

  1. Take inputs, adjust with weights (positive or negative numbers), normalize them through a sigmoid function
  2. Calculate the error between the neuron’s output and the actual training data set
  3. Adjust weights according to the error
  4. Repeat 10,000 times
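The four steps above can be sketched in plain Python as follows. This is a minimal sketch and not the code from the original post: the function names, the random seed and the training rows are assumptions made for illustration, with the rows chosen so that the output always equals the first input column.

```python
import math
import random

# Illustrative training set (assumed values): the output equals the first column.
TRAINING_INPUTS = [[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
TRAINING_OUTPUTS = [0, 1, 1, 0]


def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(output):
    """Gradient of the sigmoid curve, expressed from the neuron's output."""
    return output * (1.0 - output)


def predict(weights, inputs):
    """Step 1: weight the inputs, sum them, pass the sum through the sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))


def train(training_inputs, training_outputs, iterations=10000, seed=1):
    rng = random.Random(seed)
    # One weight per input column, drawn at random between -1 and 1.
    weights = [rng.uniform(-1.0, 1.0) for _ in training_inputs[0]]
    for _ in range(iterations):  # Step 4: repeat
        adjustments = [0.0] * len(weights)
        for inputs, target in zip(training_inputs, training_outputs):
            output = predict(weights, inputs)
            error = target - output  # Step 2: error against the training data
            for i, x in enumerate(inputs):
                # Step 3: adjustment = error * input * sigmoid gradient
                adjustments[i] += error * x * sigmoid_derivative(output)
        weights = [w + a for w, a in zip(weights, adjustments)]
    return weights


weights = train(TRAINING_INPUTS, TRAINING_OUTPUTS)
print("weights:", weights)
print("prediction for [1, 0, 0]:", predict(weights, [1, 0, 0]))
```

The adjustments are summed over all training rows before the weights are updated, which is one possible design choice; updating after each row would also be a valid form of gradient descent.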

Calculate the neuron’s output

Pass the sum of the weighted inputs through a sigmoid function to obtain normalized results in the open interval (0, 1).

logistic curve graph

The final formula for the output of the neuron is:
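Written out with assumed symbols ($x_i$ for the inputs and $w_i$ for the weights, a notation chosen here for illustration), the weighted sum passed through the sigmoid gives:

```latex
\[
  \text{output} = \frac{1}{1 + e^{-\sum_i w_i x_i}}
\]
```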

Adjusting weights

Use the “error weighted derivative” rule, which is a form of gradient descent:

  1. The neuron’s output comes from the sigmoid function: an output close to 0 or 1 means the neuron is confident in its result
  2. Close to 0 or 1, the sigmoid curve is almost flat, so its derivative is close to zero
  3. The adjustment to the weights is proportional to this derivative and to the error, so it is very small when the neuron is confident or its output is close to the expected result, and larger when results differ from expectations

The sigmoid derivative equation is:
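For the standard sigmoid $\sigma(z) = 1 / (1 + e^{-z})$, with $z$ an assumed symbol for the weighted sum of the inputs, the derivative can be expressed directly from the neuron's output:

```latex
\[
  \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
  \quad\Longrightarrow\quad
  \text{gradient} = \text{output} \times (1 - \text{output})
\]
```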

So the final equation to adjust weights is:
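Assuming the error is defined as the expected value minus the neuron's output, and writing $x_i$ for the input attached to weight $w_i$ (symbols chosen here for illustration), the adjustment for one training example would read:

```latex
\[
  \Delta w_i = \text{error} \times x_i \times \text{output} \times (1 - \text{output}),
  \qquad \text{error} = \text{expected} - \text{output}
\]
```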

Here is a fully functional version of the final code for the single-layer neural network with all details and comments, updated for Python 3.6.

Multiple-layer neural net

In the second post, the building of a multiple-layer neural network is detailed through the following key steps, reproduced here. Bear in mind that such a neural network may be too complicated for simple problems; it is best suited to nonlinear patterns, where the second layer of neurons can take combinations of data inputs into account.

Here this two-layer neural net is used to understand how two columns in the data influence the results.

Training data: The first two columns of input impact the output (XOR gate)

As Milo Spencer-Harper reminds us in his article, multiple layers are the source of the revolution in machine learning and artificial intelligence:

The process of adding more layers to a neural network, so it can think about combinations, is called deep learning.

The main difference in the code from the single-layer neural net is that the two layers influence the calculations for the error, and therefore the adjustment of weights. The errors from the second layer of neurons need to be propagated backwards to the first layer; this is called backpropagation.
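Backpropagation through two layers can be sketched in plain Python as below. This is a minimal sketch rather than the code from the original post: the function names, the hidden layer size of 4, the seed and the training rows are assumptions made for illustration, with the rows chosen so that the output is the XOR of the first two columns.

```python
import math
import random

# Illustrative samples (assumed values): output = XOR of the first two columns.
SAMPLES = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([0, 1, 0], 1),
           ([1, 0, 0], 1), ([1, 1, 1], 0), ([0, 0, 0], 0)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs, n_hidden, seed=1):
    """Random weights in [-1, 1]: one list per hidden neuron, one output neuron."""
    rng = random.Random(seed)
    layer1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    layer2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return layer1, layer2


def forward(layer1, layer2, inputs):
    """Return the hidden layer outputs and the final output."""
    hidden = [sigmoid(sum(w * x for w, x in zip(neuron, inputs))) for neuron in layer1]
    output = sigmoid(sum(w * h for w, h in zip(layer2, hidden)))
    return hidden, output


def mean_squared_error(layer1, layer2, samples):
    return sum((t - forward(layer1, layer2, x)[1]) ** 2 for x, t in samples) / len(samples)


def train(layer1, layer2, samples, iterations=10000):
    for _ in range(iterations):
        grad1 = [[0.0] * len(neuron) for neuron in layer1]
        grad2 = [0.0] * len(layer2)
        for inputs, target in samples:
            hidden, output = forward(layer1, layer2, inputs)
            # Error of the second layer, weighted by its sigmoid gradient.
            delta2 = (target - output) * output * (1 - output)
            for j, h in enumerate(hidden):
                # Propagate the second layer's error back to hidden neuron j.
                delta1 = delta2 * layer2[j] * h * (1 - h)
                grad2[j] += delta2 * h
                for i, x in enumerate(inputs):
                    grad1[j][i] += delta1 * x
        layer1 = [[w + g for w, g in zip(n, gs)] for n, gs in zip(layer1, grad1)]
        layer2 = [w + g for w, g in zip(layer2, grad2)]
    return layer1, layer2


layer1, layer2 = init_network(n_inputs=3, n_hidden=4)
print("error before training:", mean_squared_error(layer1, layer2, SAMPLES))
layer1, layer2 = train(layer1, layer2, SAMPLES)
print("error after training:", mean_squared_error(layer1, layer2, SAMPLES))
print("prediction for [1, 1, 0]:", forward(layer1, layer2, [1, 1, 0])[1])
```

The row `[1, 1, 0]` is deliberately left out of the samples so that the last line queries an input the network has not seen during training.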

Here is a fully functional version of the code for a two-layer neural network with all details and comments, updated for Python 3.6.

To sum up, in the following video, Siraj Raval goes over the detailed programming of a similar neural net (the original from Andrew Trask) in 4 minutes.

Note: he uses an older version of Python. He also builds a 3-layer neural net, but the first layer is simply the input data, with no computation.