Learning in a human-like way, in one shot: learning something definite from each example.
The evolving model
By comparing an initial model example, a seed, with a near miss or another positive example, the evolving model learns one important characteristic from each comparison.
The evolving model develops a set of heuristics to describe the seed, specializing its characteristics with near misses (reducing the potential matches) or generalizing them with examples (broadening the potential matches), as sketched in code after the list of heuristics below.
Require link heuristic: specialization
Forbid link heuristic: specialization
Extend set heuristic: generalization
Drop link heuristic: generalization
Climb tree heuristic: generalization
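A minimal Python sketch of how one comparison picks a heuristic (a toy representation with hypothetical names, not the original program):

```python
# Toy sketch: the model maps each feature to a constraint, and each
# comparison with a new observation teaches exactly one thing.

def compare_and_learn(model, observation, is_near_miss):
    """One-shot update: find one differing feature, apply one heuristic."""
    for feature in sorted(set(model) | set(observation)):
        have, seen = model.get(feature), observation.get(feature)
        if have == seen:
            continue
        if is_near_miss and seen is None:
            model[feature] = ("must-be", have)        # require-link
        elif is_near_miss:
            model[feature] = ("must-not-be", seen)    # forbid-link
        elif have is None:
            continue   # the model says nothing about this feature yet
        else:
            # A positive example disagrees with the model: relax it.
            model[feature] = ("any-of", {have, seen})  # extend-set
            # (drop-link or climb-tree would be alternative relaxations)
        return model   # exactly one lesson per comparison
    return model

# Usage: seed an arch model, then refine it with a near miss.
arch = {"top": "brick", "supports": 2, "supports_touch": False}
near_miss = {"top": "brick", "supports": 2, "supports_touch": True}
compare_and_learn(arch, near_miss, is_near_miss=True)
print(arch["supports_touch"])   # ('must-not-be', True): a forbid-link
```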
Felicity conditions
The teacher and the learner must know about each other to achieve the best learning. Learners must talk to themselves to understand what they are doing.
How to package ideas better
To communicate ideas to others and achieve better results, the following five characteristics make communication more effective.
Words can be classified in tables representing the 14 features of each sound, notably whether they are “voiced” or not. With such a matrix of sounds, a “speaking machine” can be built to produce words and sentences.
Example: a machine pronouncing plurals
The machine moves from recognition, through different characteristics of words (verbs, nouns, plurals…) and constraints (pronunciation of the plural), to producing sounds. Its input comes from visual recognition and its output is sound mimicking a voice, producing the right plural ending: “z”, “s” or “e-z”.
The vision apparatus recognizes objects.
The information flows to the meaning register, labeling the objects, as well as through the characteristics register to determine whether they are plural or not.
The information from the meaning register also flows through the characteristics register to determine whether they are nouns or verbs, and to a buffer which stores the beginning letters of the word while waiting to know whether a “z”, “s” or “e-z” plural sound needs to be added.
The buffered letters move on to be processed by the pronunciation apparatus, waiting for the plural constraint.
The word is sent to the plural constraint processor.
If the word is plural and its last sound is voiced, the constraint adds a “z” sound; if not voiced, an “s” sound is added. If the word is not plural, nothing is added, ending the word's pronunciation.
Such a machine is called a propagator, as its connections are reversible: by hearing a plural word, the machine can figure out that there are many of the objects to be seen by the visual apparatus.
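A sketch of the plural constraint alone (hypothetical phoneme coding; the voiced and sibilant sets below are standard English phonetics, abbreviated):

```python
# Sketch of the plural-constraint step only (hypothetical coding:
# each word is a list of phonemes, and the last one decides the ending).

SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}   # these take the "e-z" sound
VOICED = {"b", "d", "g", "v", "m", "n", "l", "r", "w",
          "a", "e", "i", "o", "u"}              # voiced sounds take "z"

def plural_sound(phonemes, is_plural):
    """Return the sound to append: 'z', 's', 'e-z', or nothing."""
    if not is_plural:
        return ""            # not plural: the word's pronunciation ends here
    last = phonemes[-1]
    if last in SIBILANTS:
        return "e-z"         # e.g. "horses"
    if last in VOICED:
        return "z"           # voiced final sound, e.g. "dogs"
    return "s"               # unvoiced final sound, e.g. "cats"

print(plural_sound(["d", "o", "g"], is_plural=True))   # -> z
print(plural_sound(["k", "a", "t"], is_plural=True))   # -> s
```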
Building a phonological algorithm
Collect + and – examples
Pick a positive example as the seed
Generalize by discarding useless information: turn features into “don't cares” where the + and – examples are not differentiated
Quit when generalizing further would cover a negative example; otherwise go back to step 3 and generalize more (see the sketch below)
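A minimal sketch of steps 2 to 4 (hypothetical encoding: examples are tuples of binary features, and None in the pattern marks a “don't care”):

```python
# Sketch of the generalization loop over binary feature vectors.

def matches(pattern, example):
    return all(p is None or p == e for p, e in zip(pattern, example))

def generalize(positives, negatives):
    seed = positives[0]                      # step 2: pick a positive seed
    pattern = list(seed)
    for i in range(len(pattern)):            # step 3: try each feature
        saved, pattern[i] = pattern[i], None # discard it as a don't-care
        if any(matches(pattern, n) for n in negatives):
            pattern[i] = saved               # step 4: would cover a negative,
                                             # so back off and keep going
    return tuple(pattern)

positives = [(1, 0, 1), (1, 1, 1)]
negatives = [(0, 0, 0)]
print(generalize(positives, negatives))      # -> (None, None, 1)
```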
Sparse space
The plural machine works because, in a high-dimensional sparse space, a hyperplane can easily separate two sets of examples; phonemes constitute a 14-dimensional space.
Generalizing to problem solving with AI
Define the problem
Devise a representation suiting the problem
Determine an approach / method
Pick a mechanism or devise an algorithm
Experiment
Trouble ensues when people focus on a specific mechanism rather than the problem at hand.
The right representation does the following:
it makes the problem explicit
it exposes constraints
it generates localness to solve the problem: the answer can be pinpointed rather than gathered from various sources, areas, places, etc.
Strings of 0s and 1s simulate chromosomes in binary systems. An initial population of chromosomes is allowed to mutate (modification of certain bits in the strings), to cross over with other strings during replication, or to remain the same.
Mutation: enables a step-by-step modification of chromosomes, allowing hill climbing to improve fitness little by little
Cross over: enables the aggregation of the best characteristics from different chromosomes into a new one
A chromosome's “genotype” encodes the phenotype of the corresponding individual (animal, person, program…), which is expressed through its fitness to the environment. The fitness provides a probability of survival to the next generation through a selection process.
Through the alteration of chromosomes (mutation, crossover, remaining the same) and selection by fitness before reproduction, chromosomes can evolve from one generation to the next.
This process of selection and reproduction is bound to get stuck in local maxima, but it leaves a lot of room for intervention by the algorithm designer to circumvent such problems.
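A minimal sketch of one generation (hypothetical fitness function counting 1-bits; selection uses the fitness-proportional rule defined just below):

```python
import random

# Hypothetical fitness: count the 1-bits (plus 1 so no weight is zero).
def fitness(chromosome):
    return sum(chromosome) + 1

def mutate(chromosome, rate=0.01):
    return [bit ^ (random.random() < rate) for bit in chromosome]

def crossover(a, b):
    cut = random.randrange(1, len(a))     # single crossover point
    return a[:cut] + b[cut:]

def next_generation(population):
    weights = [fitness(c) for c in population]
    pick = lambda: random.choices(population, weights=weights)[0]
    new = []
    for _ in population:
        # Cross over sometimes; otherwise let a survivor pass through.
        child = crossover(pick(), pick()) if random.random() < 0.7 else pick()
        new.append(mutate(child))         # every child may also mutate
    return new

population = [[random.randint(0, 1) for _ in range(16)] for _ in range(20)]
for _ in range(50):
    population = next_generation(population)
print(max(sum(c) for c in population))    # best 1-bit count, likely near 16
```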
Defining the probability of survival
By fitness
Probability of survival = Fitness / Sum of fitness of all chromosomes
Rank Space
The probability of selecting the highest-ranking individual is a constant P[c]. If this highest-ranking individual is not selected, the second highest-ranking individual is considered, with probability (1-P[c]) P[c], and so on:
P[1] = P[c]
P[2] = (1-P[c]) P[c]
P[3] = (1-P[c])^2 P[c]
P[n-1] = (1-P[c])^(n-2) P[c]
P[n] = (1-P[c])^(n-1)
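A quick sketch computing these rank-space probabilities (they sum to 1 by construction, since the lowest rank takes whatever probability is left):

```python
# Rank-space selection probabilities for n ranked individuals.
def rank_space_probabilities(n, pc=0.6):
    probs = [(1 - pc) ** i * pc for i in range(n - 1)]
    probs.append((1 - pc) ** (n - 1))   # the last one takes what is left
    return probs

print(rank_space_probabilities(4))      # [0.6, 0.24, 0.096, 0.064]
assert abs(sum(rank_space_probabilities(4)) - 1.0) < 1e-12
```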
By fitness and diversity
Both the most fit and the most diverse individuals are retained for reproduction in the next generation. Diversity makes genetic algorithms much more effective and keeps them from getting stuck in local maxima.
Issues with genetic algorithms
The transition from genotype to phenotype is still not well understood.
The effectiveness of a genetic algorithm may lie in its creator's ability to program it with the right parameters rather than in the algorithm itself. It may also come from the richness of the environment, as other types of algorithms may do as well or better.
Applications
The combination of good parts from different plans through crossing over (the beginning of one, the end of another) can be useful in planning a series of steps: combining different plans and different series of steps can result in a better plan.
Combining different rules from a rule-based expert system, such as one predicting horse-racing results.
Genetic algorithms may be used to encode “creatures”: their size, number of parts, articulations or methods of operation…
Convolution: a neuron looks for patterns in a small portion (10×10 px) of an image (256×256 px); the process is repeated by moving this small area little by little.
Pooling: the result of the convolution is computed as a point for each portion analyzed. By a similar step-by-step process, each small set of points is reduced to a single value by keeping the maximum (“max pooling”).
By reproducing the convolution and pooling process many times (100×) and feeding the results to a neural net, the net computes how likely the initial image is to be recognized as a known category (see the sketch below).
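A bare-bones NumPy sketch of the two operations (a single random kernel and toy strides; real systems learn many kernels):

```python
import numpy as np

# Toy convolution + max pooling sketch (one kernel, stride 1 then 2).

def convolve(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):                  # slide the small window little by
        for j in range(w):              # little over the whole image
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.empty((h, w))
    for i in range(h):                  # keep only the maximum value of
        for j in range(w):              # each small patch ("max pooling")
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.random.rand(256, 256)
kernel = np.random.rand(10, 10)          # a neuron looking at a 10x10 patch
features = max_pool(convolve(image, kernel))
print(features.shape)                    # (123, 123), fed onward to the net
```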
Autocoding
A small number of neurons (~2), the “hidden layer”, forms a bottleneck between two columns of multiple neurons (~10); the net is trained to produce output values z[n] that are the same as the input values x[n].
Such results imply that a form of generalization is accomplished by the hidden layer, or rather a form of encoded generalization, as the actual parameters of the bottleneck neurons are not so obvious to understand.
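A minimal trainable sketch of such a bottleneck (NumPy, plain gradient descent, no biases; three hidden units rather than two, for reliable convergence in a short demo):

```python
import numpy as np

# 10 -> 3 -> 10 autoencoder sketch: train the net so output z reproduces x.
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, (3, 10))      # into the hidden-layer bottleneck
W2 = rng.normal(0, 0.5, (10, 3))      # back out to the full width
sigmoid = lambda a: 1 / (1 + np.exp(-a))

X = np.eye(10)                        # ten one-hot inputs to compress
for _ in range(10000):
    for x in X:
        h = sigmoid(W1 @ x)           # the bottleneck's compact code
        z = sigmoid(W2 @ h)           # reconstruction of the input
        dz = (z - x) * z * (1 - z)    # squared-error gradient through sigmoid
        dh = (W2.T @ dz) * h * (1 - h)
        W2 -= 1.0 * np.outer(dz, h)
        W1 -= 1.0 * np.outer(dh, x)

code = sigmoid(W1 @ X[3])
print(np.round(code, 2))              # 3 values standing in for input 3
print(np.argmax(sigmoid(W2 @ code)))  # ideally 3, if the bottleneck encodes well
```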
Final layer of neurons
As the neural net is trained, its parameters and thresholds adapt the shape of the sigmoid function and its corresponding equation to properly sort positive and negative results, maximizing the probability of sorting the examples correctly.
Softmax
Instead of outputting only the category with the maximum value, the final output is an array of the most probable categories (~5 categories).
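A quick sketch of the softmax conversion itself (the shift by the maximum is a standard trick for numerical stability):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))  # shift for numerical stability
    return e / e.sum()                   # probabilities summing to 1

scores = np.array([2.0, 1.0, 0.1, -1.0, 3.5])
probs = softmax(scores)
top = np.argsort(probs)[::-1][:5]        # the ~5 most probable categories
print(list(zip(top, np.round(probs[top], 3))))
```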
Dropout
One problem with neural nets is that they can get stuck in local maxima. To prevent this, at each computation one neuron is deactivated to check whether its behavior is skewing the net; at each new computation another neuron is shut down, or dropped out, so that all neurons are checked.
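In common practice, dropout zeroes a random fraction of the neurons at each training pass rather than exactly one at a time; a sketch of that variant:

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    """Randomly silence a fraction of neurons during a training pass."""
    if not training:
        return activations                   # at test time, use every neuron
    mask = np.random.rand(*activations.shape) >= rate
    return activations * mask / (1 - rate)   # rescale so totals stay comparable

h = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout(h))    # some neurons are shut down ("dropped out") this pass
```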
Wider neural networks can also avoid being jammed into local maxima, as the additional parameters give them more directions through which to escape.
Neural nets are modeled on real biological neurons, which have the following characteristics (sketched in code after this list):
All or none: the input from each entry in the neural net is either 0 or 1, and the output is also 0 or 1
Cumulative influence: the influence of various input neurons accumulates to produce the final output result, if a certain threshold is reached
Synaptic weight: each input neuron is given a weight depending on its importance
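These three characteristics fit in a few lines (a toy model, not a biological simulation):

```python
# Toy all-or-none neuron: 0/1 inputs and output, weights, one threshold.

def neuron(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))  # cumulative influence
    return 1 if total >= threshold else 0                # all or none

# Two active inputs out of three are enough to make this neuron fire.
print(neuron([1, 1, 0], weights=[0.6, 0.6, 0.6], threshold=1.0))  # -> 1
print(neuron([1, 0, 0], weights=[0.6, 0.6, 0.6], threshold=1.0))  # -> 0
```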
Characteristics that are not modeled
Several characteristics of biological neurons are not modeled in neural nets, as their purpose in brain information processing is not clearly understood.
Real neurons also have a refractory period when they do not respond to stimuli after firing.
In a real neuron, the resulting output can travel down any one of a few axonal bifurcations.
How the timing of the various inputs in the dendritic tree shapes the resulting pulse in the axon is not well understood.
Neural net principles
In a neural net, the resulting vector z is a function of the different inputs x[n], the weights w[n] and the thresholds t[n]. A neural net is therefore a function approximator.
By comparing a vector of known desired results (such as the content of a picture) with the actual output vector, a performance function can be determined that measures how well the neural net is performing.
To simplify the performance function, thresholds are eliminated by adding an extra weight w[0] that absorbs the threshold, and the step function producing {0, 1} values is smoothed into a sigmoid function producing values in the [0, 1] interval.
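A sketch of both simplifications: the threshold becomes an ordinary weight w[0] on a constant input of -1, and the step is replaced by a sigmoid:

```python
import math

# Step neuron with an explicit threshold t ...
def step_neuron(xs, ws, t):
    return 1 if sum(x * w for x, w in zip(xs, ws)) >= t else 0

# ... becomes a smooth neuron with no threshold: w[0] acts on a constant
# input of -1, so setting w[0] = t reproduces the old behavior, and the
# step is smoothed into a sigmoid with outputs in (0, 1).
def smooth_neuron(xs, ws):
    xs = [-1.0] + list(xs)                  # constant input absorbs threshold
    a = sum(x * w for x, w in zip(xs, ws))
    return 1 / (1 + math.exp(-a))           # sigmoid instead of step

print(step_neuron([1, 0], ws=[2.0, 1.0], t=1.5))            # -> 1
print(round(smooth_neuron([1, 0], ws=[1.5, 2.0, 1.0]), 2))  # ~0.62, same trend
```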
Backpropagation
Backpropagation is the name of the algorithm generally used to train a neural net.
Varying the weights little by little, with a certain randomization, lets the performance function measure whether progress is being made, and the weights are adjusted accordingly to move toward optimal performance.
The amount of computation for the performance function grows linearly with the depth of the net and quadratically with its width.
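A back-of-the-envelope check of that scaling, counting the weights a fully connected net of a given depth and width must process:

```python
# Weight count of a fully connected net: roughly, each layer connects
# width neurons to width neurons, so cost is linear in depth and
# quadratic (squared) in width.
def weight_count(depth, width):
    return depth * width * width

print(weight_count(4, 10), weight_count(8, 10))   # 400 -> 800: linear in depth
print(weight_count(4, 10), weight_count(4, 20))   # 400 -> 1600: squared in width
```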