Working with images can be a very time-consuming task, especially if you have many images to work on. Machine learning can thus be a great time-saver for various image analysis and editing tasks, such as finding the dominant colors of an image thanks to the K-means clustering algorithm.
This series of articles dives deeper into the practical applications of Machine Learning that are now used in many technological processes and devices.
Through these posts entitled “Machine Learning is Fun!”, Adam Geitgey guides us step by step through the concepts, data, algorithms, code, results and pitfalls of machine learning applications from image, face and speech recognition to language translation and more. It also gathers several different sources for more details on each application and its development.
This series is dense with detailed code, but everything is explained very clearly, step by step, with detailed illustrations. It notably covers Convolutional Neural Networks, Generative Adversarial Networks and Recurrent Neural Networks, together with some of their most prominent applications in daily life. It is a real course not to be missed for any ML developer!
Here is the list of posts with direct links:
- Part 1: The world’s easiest introduction to Machine Learning
- Part 2: Using Machine Learning to generate Super Mario Maker levels
- Part 3: Deep Learning and Convolutional Neural Networks
- Part 4: Modern Face Recognition with Deep Learning
- Part 5: Language Translation with Deep Learning and the Magic of Sequences
- Part 6: How to do Speech Recognition with Deep Learning
- Part 7: Abusing Generative Adversarial Networks to Make 8-bit Pixel Art
- Part 8: How to Intentionally Trick Neural Networks
Image recognition by a deep neural net
Convolution: a neuron looks for patterns in a small portion (10×10 px) of an image (256×256 px). The process is repeated by moving this small area across the image little by little.
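The sliding-window idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function name `convolve2d`, the `stride` parameter and the tiny 4×4 image with a 2×2 pattern are made up for readability (the text above uses a 10×10 window on a 256×256 image), and the "neuron" is reduced to a weighted sum.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and return one response per position."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    image_h, image_w = len(image), len(image[0])
    responses = []
    for top in range(0, image_h - kernel_h + 1, stride):
        row = []
        for left in range(0, image_w - kernel_w + 1, stride):
            # Weighted sum of the pixels currently under the window.
            total = 0.0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        responses.append(row)
    return responses


# Hypothetical toy data: a 4x4 "image" and a 2x2 diagonal pattern.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
diagonal = [
    [1, 0],
    [0, 1],
]

feature_map = convolve2d(image, diagonal)            # 3x3 grid of responses
coarse_map = convolve2d(image, diagonal, stride=2)   # 2x2 grid, bigger steps
```

A larger `stride` moves the window in bigger steps and therefore produces a smaller grid of responses.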
Pooling: the convolution yields one point for each portion analyzed. By a similar step-by-step process, each small set of neighboring points is reduced to a single value by keeping the maximum (“max pooling”).
This pooling process is repeated many times (100×) and the results are fed to a neural net, which computes how likely it is that the initial image belongs to a known category.
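Max pooling can be sketched as follows. The function name `max_pool`, the 2×2 window and the choice of non-overlapping windows are assumptions made for this example; the sample values are invented.

```python
def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping `size` x `size` block."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - size + 1, size):
        row = []
        for left in range(0, width - size + 1, size):
            block = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(block))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 grid of convolution responses.
responses = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [7, 8, 0, 1],
    [6, 5, 3, 2],
]

pooled = max_pool(responses)
print(pooled)  # [[4, 5], [8, 3]]
```

Each 2×2 block collapses to its strongest response, so the grid shrinks while the most salient detections are kept.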
A small number of neurons (~2), the “hidden layer”, forms a bottleneck between two columns of more numerous neurons (~10 each). The net is trained so that the output values z[n] are the same as the input values x[n].
Such a result implies that the hidden layer achieves a form of generalization, or rather a form of encoded generalization, since the actual parameters of the bottleneck neurons are not obvious to interpret.
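The bottleneck setup described above (commonly called an autoencoder) can be sketched with the standard library alone. Everything specific here is an assumption chosen for illustration: the one-hot training samples, sigmoid activations, squared-error loss, learning rate, number of passes, and all function names. The sketch only shows that training pulls z[n] closer to x[n]; it does not claim perfect reconstruction.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def make_layer(n_in, n_out, rng):
    # One row of weights per neuron, plus one bias per neuron.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, inputs):
    weights, biases = layer
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def reconstruction_error(encoder, decoder, samples):
    total = 0.0
    for x in samples:
        z = forward(decoder, forward(encoder, x))
        total += sum((zi - xi) ** 2 for zi, xi in zip(z, x)) / len(x)
    return total / len(samples)


def train_step(encoder, decoder, x, lr):
    h = forward(encoder, x)   # bottleneck values
    z = forward(decoder, h)   # reconstruction of x
    # Backpropagate the squared error through both sigmoid layers.
    delta_z = [(zi - xi) * zi * (1 - zi) for zi, xi in zip(z, x)]
    dec_w, dec_b = decoder
    delta_h = [
        sum(delta_z[k] * dec_w[k][j] for k in range(len(z))) * h[j] * (1 - h[j])
        for j in range(len(h))
    ]
    for k in range(len(z)):
        for j in range(len(h)):
            dec_w[k][j] -= lr * delta_z[k] * h[j]
        dec_b[k] -= lr * delta_z[k]
    enc_w, enc_b = encoder
    for j in range(len(h)):
        for i in range(len(x)):
            enc_w[j][i] -= lr * delta_h[j] * x[i]
        enc_b[j] -= lr * delta_h[j]


rng = random.Random(0)
n_inputs, n_hidden = 10, 2  # ~10 outer neurons, ~2 in the bottleneck
encoder = make_layer(n_inputs, n_hidden, rng)
decoder = make_layer(n_hidden, n_inputs, rng)
# Hypothetical training data: each sample activates exactly one input.
samples = [[1.0 if i == k else 0.0 for i in range(n_inputs)] for k in range(n_inputs)]

error_before = reconstruction_error(encoder, decoder, samples)
for _ in range(500):
    for x in samples:
        train_step(encoder, decoder, x, lr=0.5)
error_after = reconstruction_error(encoder, decoder, samples)

code = forward(encoder, samples[0])  # the 2-value encoding of the first sample
```

Printing `code` for each sample shows the kind of encoding mentioned above: two numbers per input, with no obvious human-readable meaning.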
Final layer of neurons
As the neural net is trained, its parameters and thresholds are adjusted. This adapts the shape and corresponding equation of the sigmoid function so that it properly sorts positive and negative results, by maximizing the probability of sorting the examples correctly.
Instead of returning only the category with the maximum value, the final output is an array of the most probable categories (~5 categories).
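To make this concrete, here is a small sketch of a sigmoid whose steepness and position depend on two parameters, together with the probability of sorting a set of labeled examples correctly. The parameter names `weight` and `threshold`, the function `likelihood` and the four examples are assumptions for illustration only.

```python
import math


def sigmoid(x, weight=1.0, threshold=0.0):
    """`weight` sets the steepness, `threshold` shifts the curve sideways."""
    return 1.0 / (1.0 + math.exp(-weight * (x - threshold)))


def likelihood(examples, weight, threshold):
    """Probability of sorting every (value, label) example correctly."""
    probability = 1.0
    for x, label in examples:
        y = sigmoid(x, weight, threshold)
        probability *= y if label == 1 else 1.0 - y
    return probability


# Hypothetical examples: negative values are labeled 0, positive ones 1.
examples = [(-2.0, 0), (-1.0, 0), (1.5, 1), (3.0, 1)]

poor_fit = likelihood(examples, weight=1.0, threshold=2.0)
better_fit = likelihood(examples, weight=3.0, threshold=0.0)
```

Training amounts to searching for the `weight` and `threshold` that make this probability as large as possible; here the second setting separates the two groups much better than the first.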
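Returning the few most probable categories rather than a single winner can be sketched like this. The function name `top_categories` and the scores are invented for the example.

```python
def top_categories(scores, k=5):
    """Return the `k` (category, score) pairs with the highest scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical output scores of the final layer.
scores = {
    "cat": 0.41,
    "dog": 0.27,
    "fox": 0.12,
    "wolf": 0.08,
    "lynx": 0.06,
    "car": 0.04,
    "boat": 0.02,
}

best_five = top_categories(scores)
best_names = [name for name, _ in best_five]
```

With this output an answer can count as correct when the right category appears anywhere in the list, not only in first place.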
One problem with neural nets is that they can get stuck in local maxima. To prevent this, at each computation one neuron is deactivated to check whether its behavior is skewing the neural net. At each new computation another neuron is shut down, or “dropped out”, so that all neurons are eventually checked.
Wider neural networks are also less likely to get jammed in a local maximum, since their additional parameters give them more ways to move past it.
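The scheme as described here, with one neuron silenced per computation and a different one each time, can be sketched as below. The function name `drop_one` and the activation values are hypothetical. Note that commonly used dropout implementations silence a random subset of neurons on each training step rather than cycling through them one at a time.

```python
def drop_one(activations, step):
    """Silence exactly one neuron, chosen by cycling through the layer."""
    dropped = step % len(activations)
    return [0.0 if i == dropped else a for i, a in enumerate(activations)]


# Hypothetical activations of a four-neuron layer.
layer = [0.9, 0.1, 0.5, 0.7]

for step in range(len(layer)):
    print(step, drop_one(layer, step))
```

After as many steps as there are neurons, every neuron has been switched off once, so the net cannot rely on any single one of them.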