Computer vision and convolutional neural networks

Computer vision is a key area of artificial intelligence, critical to many applications: from robot movement to self-driving cars, and from medical imaging to product recognition in manufacturing plants. This MIT course presents the problems of computer vision and how Convolutional Neural Networks handle them, together with current research areas and state-of-the-art architectures.

9. Constraints: Visual Object Recognition

Borders and face orientation

An initial theory of computer vision, which used vectors to discern the borders of objects and the orientation of their faces, seemed plausible but proved too difficult to implement.

Orthographic projection

In orthographic projection, matching corresponding points across three known objects and one unknown object yields a system of equations with a unique solution for its parameters. If that solution also holds for all the other points of the unknown object, the object is recognized.

This works well with manufactured objects, but less well with natural objects.
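The recognition test above can be sketched in a few lines. This is a minimal illustration, not the course's implementation. It assumes that the "three known objects" are three reference views a, b and c whose points are already matched with the unknown view u, and that the system of equations has the linear form x_u = alpha * x_a + beta * x_b + gamma * x_c + tau for each point. The function names and the sample coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correspondence points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_parameters(xa, xb, xc, xu):
    """Fit alpha, beta, gamma, tau from the first four matched points."""
    rows = [[xa[i], xb[i], xc[i], 1.0] for i in range(4)]
    return solve_linear(rows, xu[:4])


def is_recognized(xa, xb, xc, xu, tol=1e-6):
    """True if the fitted parameters also explain every other point."""
    alpha, beta, gamma, tau = fit_parameters(xa, xb, xc, xu)
    return all(
        abs(alpha * a + beta * b + gamma * c + tau - u) <= tol
        for a, b, c, u in zip(xa, xb, xc, xu)
    )


# Hypothetical x coordinates of six matched points in the three known views.
xa = [0.0, 1.0, 2.0, 0.5, 3.0, 1.5]
xb = [1.0, 0.0, 1.5, 2.0, 0.5, 2.5]
xc = [2.0, 1.0, 0.0, 1.0, 2.5, 0.5]

# An unknown view built from the same object, and one with a point moved.
xu = [0.5 * a + 0.3 * b - 0.2 * c + 1.0 for a, b, c in zip(xa, xb, xc)]
xu_other = xu[:5] + [xu[5] + 0.5]

print(is_recognized(xa, xb, xc, xu))        # same object
print(is_recognized(xa, xb, xc, xu_other))  # different object
```

Four matched points are enough to fix the four parameters; the remaining points act as the check. With natural objects, the tolerance `tol` would have to be much looser, which is one way to read the limitation noted above.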

“Goldilocks principle”

Don’t search for features that are too big or complex, nor for features that are too small or simple. Not too big, not too small.

Face recognition

Use a correlation, computed as an integral over the image, of different recognizable points on a face: search for combinations such as 2 eyes + 1 nose, 1 nose + 1 mouth, etc.
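The correlation search can be sketched as follows. This is a minimal illustration under stated assumptions: the image and the feature patch are small grids of -1 / +1 values, the integral is replaced by a discrete sum of products, and the function name `correlate` and the sample data are hypothetical. A real system would use an intermediate-sized patch, such as two eyes and a nose.

```python
def correlate(image, patch):
    """Return (score, row, col) for the best match of patch inside image."""
    ph, pw = len(patch), len(patch[0])
    best = None
    for r in range(len(image) - ph + 1):
        for c in range(len(image[0]) - pw + 1):
            # Discrete stand-in for the correlation integral at this offset.
            score = sum(
                image[r + i][c + j] * patch[i][j]
                for i in range(ph)
                for j in range(pw)
            )
            if best is None or score > best[0]:
                best = (score, r, c)
    return best


# Hypothetical 4 x 5 image with the feature located at row 1, column 2.
image = [
    [-1, -1, -1, -1, -1],
    [-1, -1,  1, -1, -1],
    [-1, -1, -1,  1, -1],
    [-1, -1, -1, -1, -1],
]
patch = [
    [ 1, -1],
    [-1,  1],
]

print(correlate(image, patch))  # (4, 1, 2)
```

With -1 / +1 values, the score equals the patch area only where every pixel agrees, so the maximum marks the feature's position.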

7. Constraints: Interpreting Line Drawings

Computer vision

Empirical approach

Using the lines in pictures of real-world objects, the edges between shapes can serve to identify how many objects a picture contains.

The possible intersections generally form two types of trihedral vertexes that help identify shapes:

  • arrow vertexes
  • fork vertexes

Theoretical approach

A second, more theoretical approach labels lines as convex, concave, or boundaries between objects, and restricts itself to trihedral vertexes, where exactly 3 faces meet. Objects are always assumed to be seen from a general position, so unusual alignments of perspective are not considered.

These constraints allow only 18 different junctions. With this catalog, a drawing can be checked to determine whether the object it shows can exist in the domain defined above… but not in the real world.

Towards robot vision

Adding cracks, shadows, non-trihedral vertexes and light to the theoretical approach increased the complexity of the domain to more than 1,000 possible junctions.

However, just as in the theoretical approach, an algorithm that identifies junctions one by one against this catalog can handle a much wider array of objects, determining their actual shape or whether some ambiguity is left.
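Identifying junctions one by one against a catalog can be sketched as constraint propagation: a line shared by two junctions must carry the same label at both ends, so any junction labeling that a neighbour cannot match is discarded. The sketch below is a minimal illustration, not the algorithm as taught. The function name `filter_junctions`, the three-junction drawing and its tiny candidate sets are hypothetical and do not come from the real catalog. Labels are simplified to "+" (convex), "-" (concave) and "b" (boundary, with its direction ignored).

```python
def filter_junctions(lines_at, candidates):
    """Prune junction labelings until neighbouring junctions agree.

    lines_at:   junction -> tuple of the line names meeting there
    candidates: junction -> set of label tuples, one label per line
    """
    candidates = {j: set(c) for j, c in candidates.items()}
    changed = True
    while changed:
        changed = False
        for j, lines in lines_at.items():
            for k, other_lines in lines_at.items():
                if j == k:
                    continue
                for line in set(lines) & set(other_lines):
                    # Labels that junction k still allows for the shared line.
                    allowed = {
                        lab[other_lines.index(line)] for lab in candidates[k]
                    }
                    keep = {
                        lab for lab in candidates[j]
                        if lab[lines.index(line)] in allowed
                    }
                    if keep != candidates[j]:
                        candidates[j] = keep
                        changed = True
    return candidates


# Hypothetical drawing: three junctions joined pairwise by three lines.
lines_at = {
    "A": ("ab", "ca"),
    "B": ("ab", "bc"),
    "C": ("bc", "ca"),
}
# Hypothetical catalog entries that each junction could match.
candidates = {
    "A": {("+", "+"), ("-", "b")},
    "B": {("+", "-"), ("+", "+")},
    "C": {("-", "+"), ("b", "b")},
}

result = filter_junctions(lines_at, candidates)
print(result)
```

In this toy drawing, each junction ends up with a single labeling, so the shape is unambiguous. A junction left with several labelings signals remaining ambiguity, and a junction left with none signals a drawing that cannot exist in the domain.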