16. Learning: Support Vector Machines

Decision boundaries

The goal is to separate positive and negative examples with a straight line that is as far as possible from both: a median that maximizes the space between the positive and the negative examples.

Constraints are applied to find a vector w, perpendicular to the median, and a constant b that together sort positive examples from negative ones: an unknown vector u is classified as positive when w · u + b ≥ 0. The width of the “street” between the positive and negative examples is then maximized.

Going through the algebra, the resulting equation shows that the optimization depends only on the dot products of pairs of samples.
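As a reminder of where those dot products come from, the usual Lagrangian form of the problem can be sketched as follows. The notation is an assumption made here rather than something the notes spell out: x_i are the samples, y_i = ±1 their labels, α_i the Lagrange multipliers, and w the vector perpendicular to the street.

```latex
\begin{aligned}
&\text{maximize the width } \frac{2}{\lVert w \rVert}
  \quad\text{subject to}\quad y_i\,(w \cdot x_i + b) - 1 \ge 0 \\
&L = \tfrac{1}{2}\lVert w \rVert^2
     - \sum_i \alpha_i \bigl[\, y_i\,(w \cdot x_i + b) - 1 \,\bigr] \\
&\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0 \\
&L = \sum_i \alpha_i
     - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
\end{aligned}
```

In the last line the samples appear only inside the dot products x_i · x_j.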

The decision rule that determines whether an unknown is positive or negative likewise depends only on the dot products of the sample vectors with the unknown vector.
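A minimal sketch of such a decision rule, assuming the multipliers have already been found by the optimization. The function names, the `alphas` values and the two samples are hypothetical, hand-picked for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def classify(unknown, samples, labels, alphas, b):
    """Return +1 or -1 using only dot products with the training samples.

    The score is sum_i alpha_i * y_i * (x_i . unknown) + b.
    """
    score = sum(a * y * dot(x, unknown)
                for a, y, x in zip(alphas, labels, samples)) + b
    return 1 if score >= 0 else -1


# Hypothetical, hand-picked values: one positive and one negative sample.
samples = [(1.0, 0.0), (-1.0, 0.0)]
labels = [1, -1]
alphas = [0.5, 0.5]   # assumed multipliers; they give w = (1, 0)
b = 0.0

print(classify((2.0, 3.0), samples, labels, alphas, b))
print(classify((-0.5, 1.0), samples, labels, alphas, b))
```

The unknown vector never needs anything from the training samples beyond its dot product with each of them.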

No local maximum

The support vector optimization can be proven to take place in a convex space, meaning that it never gets stuck at a local maximum.

Non linearity

The algorithm cannot find a median between data that are not linearly separable. A transformation can, however, be applied to the space to reorganize the samples so that they become linearly separable. Some transformations overfit: the resulting model is useless because it only sorts the example data correctly.
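As an illustration of such a transformation, here is a small sketch on one-dimensional data. The feature map (x, x²), the helper names and the sample values are all assumptions chosen for the example.

```python
def transform(x):
    """Hypothetical feature map: lift a 1-D point to 2-D as (x, x^2)."""
    return (x, x * x)


def separable_by_threshold(pos, neg):
    """True if some threshold on a single number splits pos from neg."""
    return max(pos) < min(neg) or max(neg) < min(pos)


positives = [-3.0, 3.0]       # made-up samples on the outside
negatives = [-1.0, 0.0, 1.0]  # made-up samples in the middle

# In the original 1-D space no single threshold works.
print(separable_by_threshold(positives, negatives))

# After the transform, the second coordinate alone separates the classes,
# so a horizontal line in the new space (e.g. x^2 = 5) is a valid boundary.
pos_sq = [transform(x)[1] for x in positives]
neg_sq = [transform(x)[1] for x in negatives]
print(separable_by_threshold(pos_sq, neg_sq))
```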

11. Learning: Identification Trees, Disorder

Identification Trees


  • Non-numeric data
  • Not all characteristics matter
  • Some do matter but not all of the time
  • Cost: certain tests may be more expensive than others

Occam’s razor

The objective is to build the smallest tree possible, both to reduce costs and computation and because the simplest explanation is usually the best.

Testing data

Small data sets

The different tests on the data can be ranked by the homogeneous groups they produce and by the total number of items in those homogeneous groups.

Using the most efficient tests first, the remaining ambiguous data is run through the other tests, and so on until all the data is sorted.
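The ranking step can be sketched as below. The data set, the feature names and the `homogeneous_count` helper are made up for illustration; the score is simply the number of samples that end up in a group whose labels all agree.

```python
from collections import defaultdict


def homogeneous_count(samples, feature):
    """Number of samples that land in a group whose labels all agree."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[feature]].append(sample["label"])
    return sum(len(labels) for labels in groups.values()
               if len(set(labels)) == 1)


# Made-up data: each row is one sample, "label" is the class to predict.
samples = [
    {"shadow": "yes", "garlic": "no",  "label": "-"},
    {"shadow": "yes", "garlic": "yes", "label": "-"},
    {"shadow": "no",  "garlic": "no",  "label": "+"},
    {"shadow": "no",  "garlic": "yes", "label": "-"},
    {"shadow": "?",   "garlic": "no",  "label": "+"},
    {"shadow": "?",   "garlic": "yes", "label": "-"},
]

# Rank the tests: the more samples in homogeneous groups, the better.
ranking = sorted(["shadow", "garlic"],
                 key=lambda f: homogeneous_count(samples, f),
                 reverse=True)
print(ranking)
```

The samples left in mixed groups would then be handed to the next test in the ranking.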

Large data sets

For large data sets, there may be no test that divides the data into homogeneous groups. The results of the tests must therefore be ranked according to their level of disorder.

For a set with P = number of positive results, N = number of negative results and T = P + N = total:

Disorder(Set) = -(P/T) log₂(P/T) - (N/T) log₂(N/T)

Plotted against x = P/T, this equation gives an arch-shaped curve (similar to a parabola, though not exactly one) with a maximum of y = 1 at x = 1/2 and a minimum of y = 0 at x = 0 and x = 1.

So the quality of each test can be defined as follows:

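Quality(Test) = Sum[for each set produced] ( Disorder(Set) × Number of samples in set / Number of samples handled by test )

This is the weighted average disorder of the sets a test produces, so the lower the value, the better the test.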
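The two formulas can be checked numerically with a short sketch. The function names and the example counts are assumptions, and a term with a zero count is taken to contribute zero disorder.

```python
from math import log2


def disorder(p, n):
    """Disorder of a set with p positive and n negative samples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # a zero count contributes nothing (0 * log2(0) taken as 0)
            frac = count / total
            result -= frac * log2(frac)
    return result


def quality(sets):
    """Weighted average disorder of the sets produced by one test.

    `sets` is a list of (positives, negatives) pairs; lower is better.
    """
    handled = sum(p + n for p, n in sets)
    return sum(disorder(p, n) * (p + n) / handled for p, n in sets)


print(disorder(4, 4))   # evenly mixed set: maximum disorder
print(disorder(8, 0))   # homogeneous set: no disorder
print(quality([(3, 0), (1, 1), (0, 3)]))  # only the middle set is mixed
```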

Decision boundaries

Unlike nearest neighbors, each identification-tree test splits the data space in two along a boundary parallel to one of the axes of the space.

10. Introduction to Learning, Nearest Neighbors


Regularity: “Bulldozer computing”

  • Nearest neighbors: pattern recognition
  • Neural nets: mimic biology
  • Boosting: theory

Constraints: human-like learning

  • One-shot
  • Explanation-based learning

Nearest neighbor

A detection mechanism generates a vector of features. These features are converted into a vector of values that is compared with a library of possibilities to find the closest match, in order to recognize patterns, objects, etc.

Method: Standard objects (e.g. electrical covers) are positioned in a space according to recognizable characteristics (e.g. size, hole size). Decision boundaries between the standards are then established in the space, so that each area is attributed to its nearest neighbor. Objects are then sorted according to the area they belong to.
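A minimal sketch of this method, assuming two measured features per object and plain Euclidean distance. The library entries, their names and their values are hypothetical.

```python
from math import dist

# Hypothetical library of standard objects: (total area, hole area).
library = {
    "cover A": (10.0, 1.0),
    "cover B": (10.0, 4.0),
    "cover C": (20.0, 2.0),
}


def nearest_neighbor(features, library):
    """Return the name of the standard object closest to `features`."""
    return min(library, key=lambda name: dist(features, library[name]))


print(nearest_neighbor((11.0, 3.5), library))
print(nearest_neighbor((18.0, 1.0), library))
```

The decision boundaries are never computed explicitly here: picking the closest standard is equivalent to asking which area the object falls in.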

Another way of recognizing objects (e.g. newspaper articles) is to compare the angle between their vectors in the space and the vectors of the standard objects.
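The angle comparison can be sketched with the cosine of the angle between two vectors. The word counts and publication names below are invented for illustration; they are chosen so that the angle and the straight-line distance disagree.

```python
from math import dist, sqrt


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical word counts for the words ("hack", "computer").
standards = {
    "tech magazine": (8.0, 10.0),
    "town paper": (9.0, 1.0),
}
article = (2.0, 2.5)  # short article: same proportions as the tech magazine

# Smallest angle means largest cosine.
by_angle = max(standards, key=lambda name: cosine(article, standards[name]))
# Straight-line distance is misled by the article's small raw counts.
by_distance = min(standards, key=lambda name: dist(article, standards[name]))
print(by_angle, "|", by_distance)
```

Comparing angles ignores the length of the vectors, which is useful when articles of very different lengths should still match by topic.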

In the case of a robotic arm, solving equations for the angles at the joints does not work in real life because of friction and wear. Instead, a table of values is recorded at each position during a learning phase. During the working phase, the closest set of values in the table is used to complete the task.
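The table lookup can be sketched as a nearest-neighbor search over recorded entries. What exactly is stored in each entry is an assumption here: the sketch pairs a hypothetical arm state with the hypothetical torques applied in that state.

```python
from math import dist

# Hypothetical table filled in during the learning phase:
# each entry pairs an observed arm state with the torques that were applied.
table = []


def record(state, torques):
    """Learning phase: remember what was applied in this state."""
    table.append((state, torques))


def recall(state):
    """Working phase: reuse the torques of the closest recorded state."""
    _, torques = min(table, key=lambda entry: dist(entry[0], state))
    return torques


record((0.0, 0.0), (1.0, 0.5))
record((0.5, 0.2), (1.4, 0.7))
record((1.0, 0.4), (2.0, 1.1))

print(recall((0.6, 0.25)))
```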


  • Spread: values can be concentrated in one part of the space, making it difficult to tell objects apart. Solution: normalize the data using statistical analysis.
  • Subject matter: make sure to measure values that actually distinguish the objects, not values that generate confusing results.
  • Relevancy: use data that is relevant to the matter at hand, not data that is independent of the target results.
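For the spread problem, one common way to normalize is to rescale each feature to zero mean and unit standard deviation. This is only one possible choice, and the measurements below are made up.

```python
from statistics import mean, pstdev


def normalize(column):
    """Rescale one feature to zero mean and unit standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation; must be non-zero
    return [(value - mu) / sigma for value in column]


# Made-up measurements on very different scales.
areas = [100.0, 200.0, 300.0]
hole_sizes = [0.1, 0.2, 0.3]

print(normalize(areas))
print(normalize(hole_sizes))
```

After rescaling, both features cover a comparable range, so neither dominates the distance computation.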