Humans use their eyes and their brains to see and visually sense the world around them. Computer vision is the science that aims to give machines a similar, if not better, ability to see. In a fraction of a second, a human can judge the motion of an incoming cricket ball and catch it. You may think this is a trivial task, but believe me, it is one of the most complicated things we have attempted to model. For a machine to see, we need a way to represent the object: a way to know that it is a cricket ball coming at you so fast.
Before I get into the not-so-cool stuff, the picture below shows how machines can be used for quality control in manufacturing: picking out unwanted objects such as stones and removing bad apples before the produce is automatically packaged for export.
Now, the object we want a computer to see needs to be represented in a way that makes sense to a computer, in terms of both shape and appearance. External factors such as the application domain, purpose, and goal determine how the object should be represented. Once the object has been represented, choosing the algorithm to use becomes much more straightforward.
1) Shape representation
A buffet of methods exists, and the art of choosing one depends on the type of object you are trying to represent: certain methods suit certain types of objects.
Points
This representation is most suitable for small objects that can be represented by a single point, such as the centroid, or by a set of points. With points, however, it can be difficult to keep track of which point belongs to which object, and this can easily cause detections to be missed.
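To make the point representation concrete, here is a minimal Python sketch (not taken from any particular library) that reduces an object's binary mask to its centroid and then greedily pairs points across two frames by nearest distance. The names `centroid` and `match_nearest` are hypothetical helpers. The greedy matcher also hints at why correspondence is fragile: when two objects come close, nearest-distance pairing can swap their identities.

```python
from typing import Dict, List, Sequence, Tuple

Mask = List[List[int]]       # 1 = object pixel, 0 = background
Point = Tuple[float, float]  # (row, col)


def centroid(mask: Mask) -> Point:
    """Represent an object by a single point: the mean (row, col) of its pixels."""
    row_sum = col_sum = count = 0
    for r, line in enumerate(mask):
        for c, value in enumerate(line):
            if value:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        raise ValueError("mask contains no object pixels")
    return row_sum / count, col_sum / count


def match_nearest(previous: Sequence[Point], current: Sequence[Point]) -> Dict[int, int]:
    """Greedily pair each previous point with the closest still-unused current point."""
    unused = set(range(len(current)))
    pairs: Dict[int, int] = {}
    for i, (pr, pc) in enumerate(previous):
        if not unused:
            break  # more tracked objects than detections: some tracks get no match
        j = min(unused, key=lambda k: (current[k][0] - pr) ** 2 + (current[k][1] - pc) ** 2)
        pairs[i] = j
        unused.remove(j)
    return pairs


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(centroid(mask))                                       # (1.5, 1.5)
print(match_nearest([(0, 0), (10, 10)], [(9, 9), (1, 1)]))  # {0: 1, 1: 0}
```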
Primitive geometric shapes
Representing an object with a primitive geometric shape such as a rectangle or an ellipse is a common approach for both rigid and non-rigid objects, although it is best suited to simple rigid objects.
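A rough sketch of the two primitive shapes mentioned above, computed from a list of object pixel coordinates: an axis-aligned bounding rectangle, and the ellipse that has the same second-order moments as the pixels. Moment matching is one common way to fit an ellipse and is assumed here for illustration; `bounding_box` and `equivalent_ellipse` are hypothetical names.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, col)


def bounding_box(pixels: List[Pixel]) -> Tuple[int, int, int, int]:
    """Smallest axis-aligned rectangle containing the pixels: (top, left, bottom, right)."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def equivalent_ellipse(pixels: List[Pixel]) -> Tuple[Tuple[float, float], float, float, float]:
    """Ellipse with the same centroid and second-order central moments as the pixels.

    Returns (centre, semi_major, semi_minor, angle). The angle is in radians,
    measured from the column axis in image coordinates (rows grow downward).
    """
    n = len(pixels)
    mr = sum(r for r, _ in pixels) / n
    mc = sum(c for _, c in pixels) / n
    # Second-order central moments = covariance of the pixel coordinates.
    mu_rr = sum((r - mr) ** 2 for r, _ in pixels) / n
    mu_cc = sum((c - mc) ** 2 for _, c in pixels) / n
    mu_rc = sum((r - mr) * (c - mc) for r, c in pixels) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    half_trace = (mu_rr + mu_cc) / 2
    spread = math.sqrt(((mu_cc - mu_rr) / 2) ** 2 + mu_rc ** 2)
    lam_major, lam_minor = half_trace + spread, max(half_trace - spread, 0.0)
    # For a filled ellipse, the variance along an axis is (semi-axis)^2 / 4.
    semi_major, semi_minor = 2 * math.sqrt(lam_major), 2 * math.sqrt(lam_minor)
    angle = 0.5 * math.atan2(2 * mu_rc, mu_cc - mu_rr)
    return (mr, mc), semi_major, semi_minor, angle


# A horizontal bar, 3 rows by 9 columns.
bar = [(r, c) for r in range(3) for c in range(9)]
print(bounding_box(bar))  # (0, 0, 2, 8)
print(equivalent_ellipse(bar))
```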
Silhouette and contour
A contour is the boundary of an object, and the silhouette is the region inside the contour. This is a flexible model that can represent many different object shapes, including complex non-rigid ones.
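A minimal sketch of how the two relate, assuming the object is given as a binary mask (a list of lists, 1 for object): the silhouette is every object pixel, and the contour keeps only the object pixels that touch the background. This boundary-pixel test is just one simple way to get a contour; real systems usually trace an ordered boundary instead. The function names are illustrative.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = silhouette (object) pixel, 0 = background


def silhouette(mask: Mask) -> List[Tuple[int, int]]:
    """All pixels inside the object region."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def contour(mask: Mask) -> List[Tuple[int, int]]:
    """Silhouette pixels with at least one 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    boundary = []
    for r, c in silhouette(mask):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < h and 0 <= nc < w) or not mask[nr][nc]:
                boundary.append((r, c))
                break
    return boundary


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(len(silhouette(mask)), len(contour(mask)))  # 9 8
```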
Articulated shape models
An articulated model is made up of parts held together by joints. Each part can be represented using a simple geometric shape such as an ellipse; for example, a human body can be modeled as a set of ellipses for the torso, head, and limbs, connected at the joints.
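A toy illustration, assuming a 2-D chain of parts where each part is reduced to a line segment (for instance the major axis of its ellipse) and each joint stores an angle relative to the parent part. Changing one joint angle moves every part further down the chain, which is what lets an articulated model describe an arm or a leg in different poses with only a few parameters. `Part` and `joint_positions` are hypothetical names; a full body model would be a tree of such chains rather than a single chain.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Part:
    name: str
    length: float       # length of the part, e.g. the major axis of its ellipse
    joint_angle: float  # rotation (radians) at the joint, relative to the parent part


def joint_positions(chain: List[Part], base: Tuple[float, float] = (0.0, 0.0)) -> List[Tuple[float, float]]:
    """Walk a chain of parts from the base, accumulating joint rotations.

    Returns the base position followed by the far end of each part.
    """
    x, y = base
    heading = 0.0
    points = [(x, y)]
    for part in chain:
        heading += part.joint_angle  # each part inherits its parent's orientation
        x += part.length * math.cos(heading)
        y += part.length * math.sin(heading)
        points.append((x, y))
    return points


# A two-part "arm": the elbow bends the forearm by 90 degrees.
arm = [Part("upper_arm", 3.0, 0.0), Part("forearm", 2.0, math.pi / 2)]
print(joint_positions(arm))
```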
Skeletal models
An object skeleton can be extracted by applying the medial axis transform to the object's silhouette. Skeletal models can be used to represent articulated objects such as cats and humans.
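The medial axis is the set of centres of maximal discs that fit inside the silhouette. A common discrete approximation, assumed in the sketch below, computes each object pixel's distance to the nearest background pixel and keeps the pixels whose distance is a local maximum (a ridge of the distance map). This brute-force version uses the chessboard distance, treats everything outside the image as background, and is meant for tiny masks only; the function names are illustrative. Library routines such as scikit-image's `medial_axis` and `skeletonize` are far more efficient and produce thinner, connected skeletons.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = silhouette pixel, 0 = background


def distance_to_background(mask: Mask) -> List[List[int]]:
    """Chessboard distance from each object pixel to the nearest background pixel.

    Pixels outside the image are treated as background. Brute force: O(pixels^2).
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    dist = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                d = min(r + 1, c + 1, h - r, w - c)  # distance to just outside the image
                for br, bc in background:
                    d = min(d, max(abs(r - br), abs(c - bc)))
                dist[r][c] = d
    return dist


def medial_axis_points(mask: Mask) -> List[Tuple[int, int]]:
    """Object pixels whose distance value is >= that of all 8 neighbours."""
    dist = distance_to_background(mask)
    h, w = len(mask), len(mask[0])
    ridge = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            neighbours = [
                dist[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < h and 0 <= c + dc < w
            ]
            if all(dist[r][c] >= n for n in neighbours):
                ridge.append((r, c))
    return ridge


# A 3-pixel-thick horizontal bar surrounded by background.
bar = [[0] * 11] + [[0] + [1] * 9 + [0] for _ in range(3)] + [[0] * 11]
print(medial_axis_points(bar))  # the middle row, minus its two end pixels
```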
2) Appearance representation
Probability densities of object appearance
The probability densities of object appearance features, such as color, can be estimated from the image region defined by the shape model, for example the interior of an ellipse or a contour. These densities can be either parametric, such as a Gaussian (normal) distribution, or nonparametric, such as a histogram.
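A small sketch of both options for a grey-level image, assuming the shape model has already been turned into a binary mask of the object region: the parametric model is a single Gaussian fitted by its mean and variance, and the nonparametric model is a normalised histogram. Bin count, value range, and function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[int]]  # grey levels 0..255
Mask = List[List[int]]   # 1 = inside the shape model (e.g. an ellipse or contour)


def region_values(image: Image, mask: Mask) -> List[int]:
    """Collect the pixel values that fall inside the shape model."""
    return [image[r][c] for r in range(len(mask)) for c in range(len(mask[r])) if mask[r][c]]


def fit_gaussian(values: Sequence[float]) -> Tuple[float, float]:
    """Parametric model: maximum-likelihood mean and variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of the fitted Gaussian at x (requires var > 0)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def normalised_histogram(values: Sequence[int], bins: int = 8, max_value: int = 256) -> List[float]:
    """Nonparametric model: fraction of values falling in each equal-width bin."""
    counts = [0] * bins
    width = max_value / bins
    for v in values:
        counts[min(int(v // width), bins - 1)] += 1
    return [c / len(values) for c in counts]


image = [[10, 20, 200], [30, 20, 210]]
mask = [[1, 1, 0], [1, 1, 0]]
values = region_values(image, mask)  # [10, 20, 30, 20]
mean, var = fit_gaussian(values)      # (20.0, 50.0)
print(normalised_histogram(values))   # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```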
Templates
Templates are built from simple geometric shapes or silhouettes. They are best suited to objects whose pose does not vary much, and they are unreliable when appearance features change noticeably.
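Templates are typically matched by sliding them over the image and scoring each position. The sketch below assumes grey-level images stored as lists of lists and uses the sum of squared differences (lower is better) as the score; normalised cross-correlation is a common alternative. Because the template is fixed, a change in pose or lighting means the best position no longer scores close to zero, which is the weakness described above. The names are illustrative.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]


def ssd(image: Image, template: Image, top: int, left: int) -> int:
    """Sum of squared differences between the template and one image window."""
    return sum(
        (image[top + r][left + c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_template(image: Image, template: Image) -> Optional[Tuple[int, int, int]]:
    """Exhaustively slide the template; return (best_score, top, left), or None if it does not fit."""
    th, tw = len(template), len(template[0])
    best = None
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = ssd(image, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best


template = [[9, 8], [7, 6]]
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
print(match_template(image, template))  # (0, 1, 2)
```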
Active appearance models
Active appearance models simultaneously model an object's shape and appearance. The shape is defined by a set of landmarks, which can reside either on the object boundary or inside the object region, and an appearance vector is stored for each landmark, for example in the form of color, texture, or gradient magnitude. These models do, however, require a training phase in which the shape and its associated appearance are learned from a set of samples.
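The sketch below is only a simplified picture of the data an active appearance model works with, not an AAM implementation: each training sample is a list of landmarks, each landmark carries an appearance vector, and the "training" step merely averages positions and appearance vectors over samples that are assumed to be already aligned. Real AAMs additionally align the shapes (for example with Procrustes analysis) and learn modes of variation with PCA. `Landmark`, `train_mean_model`, and `appearance_error` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Landmark:
    x: float
    y: float
    appearance: List[float]  # e.g. colour, texture or gradient magnitude sampled here


def train_mean_model(samples: Sequence[Sequence[Landmark]]) -> List[Landmark]:
    """Average landmark positions and appearance vectors over aligned training samples."""
    n = len(samples)
    k = len(samples[0])
    if any(len(s) != k for s in samples):
        raise ValueError("every sample needs the same landmarks in the same order")
    model = []
    for i in range(k):
        dims = len(samples[0][i].appearance)
        model.append(
            Landmark(
                x=sum(s[i].x for s in samples) / n,
                y=sum(s[i].y for s in samples) / n,
                appearance=[sum(s[i].appearance[d] for s in samples) / n for d in range(dims)],
            )
        )
    return model


def appearance_error(model: Sequence[Landmark], candidate: Sequence[Landmark]) -> float:
    """Sum of squared differences between model and candidate appearance vectors."""
    return sum(
        (a - b) ** 2
        for m, c in zip(model, candidate)
        for a, b in zip(m.appearance, c.appearance)
    )


samples = [
    [Landmark(0, 0, [10]), Landmark(2, 0, [20])],
    [Landmark(2, 2, [30]), Landmark(4, 2, [40])],
]
model = train_mean_model(samples)
print(model)
```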
Multiview appearance models
Unlike templates, these models encode different views of the object. One approach is to generate a subspace from the given views; subspace methods that have been used for this purpose include Principal Component Analysis (PCA) and Independent Component Analysis (ICA).
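To make the subspace idea concrete, here is a small PCA sketch using only the Python standard library. Each view of the object is assumed to be flattened into a vector; the leading principal components are found by power iteration, and a new image is judged by how well the subspace reconstructs it (a small error suggests it resembles the learned views). The function names and toy data are hypothetical, ICA is not shown, and in practice you would use an SVD from a linear-algebra library instead of this loop.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def mean_vector(views: Sequence[Vector]) -> Vector:
    n = len(views)
    return [sum(v[i] for v in views) / n for i in range(len(views[0]))]


def covariance(views: Sequence[Vector], mean: Vector) -> List[Vector]:
    d, n = len(mean), len(views)
    centred = [[v[i] - mean[i] for i in range(d)] for v in views]
    return [[sum(x[i] * x[j] for x in centred) / n for j in range(d)] for i in range(d)]


def top_components(cov: List[Vector], k: int, iterations: int = 500, seed: int = 0) -> List[Vector]:
    """Leading eigenvectors of a covariance matrix via power iteration.

    Each new vector is kept orthogonal to those already found; stops early
    if no variance is left in the remaining directions.
    """
    d = len(cov)
    rng = random.Random(seed)
    components: List[Vector] = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            for comp in components:  # Gram-Schmidt: drop directions already found
                dot = sum(a * b for a, b in zip(w, comp))
                w = [a - dot * b for a, b in zip(w, comp)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                return components  # remaining variance is (numerically) zero
            v = [a / norm for a in w]
        components.append(v)
    return components


def reconstruction_error(view: Vector, mean: Vector, components: List[Vector]) -> float:
    """Distance between a view and its projection onto the learned subspace."""
    centred = [x - m for x, m in zip(view, mean)]
    recon = [0.0] * len(view)
    for comp in components:
        coeff = sum(c * x for c, x in zip(comp, centred))
        recon = [r + coeff * c for r, c in zip(recon, comp)]
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(centred, recon)))


# Toy "views": each flattened image varies along a single direction.
views = [[t, 2 * t, 0.0, 0.0] for t in (1.0, 2.0, 3.0, 4.0)]
mean = mean_vector(views)
basis = top_components(covariance(views, mean), k=1)
print(reconstruction_error([5.0, 10.0, 0.0, 0.0], mean, basis))  # close to 0: lies in the subspace
print(reconstruction_error([2.5, 5.0, 3.0, 0.0], mean, basis))   # 3.0: off-subspace component
```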
Feature Image: https://goberoi.com/comparing-the-top-five-computer-vision-apis-98e3e3d7c647