Different Types of Computer Vision Models

The following are some of the most commonly used types of computer vision models when working with AI and computer vision:

A. The Single Shot Multibox Detector (SSD) model:

It combines object localization and classification in a single network.
It performs classification on feature maps from different convolutional layers, using default bounding boxes.
The model detects objects in images using a single deep neural network.
The model discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location.
At prediction time, the network generates scores for the presence of each object category in each default box.
It also produces adjustments to each box so that it better matches the object shape.
The model combines predictions from multiple feature maps with different resolutions so that it naturally handles objects of various sizes.
For details, see the research paper “SSD: Single Shot MultiBox Detector.”
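
To make the idea of default boxes more concrete, here is a minimal sketch in plain Python. It assumes linearly spaced scales between a minimum and a maximum, one scale per feature map, and boxes of width scale * sqrt(ratio) and height scale / sqrt(ratio) centred on each feature map cell. The function names, the 4 x 4 feature map, and the chosen aspect ratios are illustrative assumptions, not taken from any particular SSD implementation.

```python
import math

def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced scales, one per feature map (assumes num_maps >= 2)."""
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (cx, cy, w, h) boxes in normalized [0, 1] image coordinates."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Centre of the current feature map cell
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes

# Hypothetical setup: 3 feature maps, boxes for a 4 x 4 map at the smallest scale
scales = default_box_scales(num_maps=3)
boxes = default_boxes(fmap_size=4, scale=scales[0])
print(len(boxes))  # 4 * 4 cells * 3 aspect ratios = 48
```

In a real detector, the network would predict class scores and four offsets for each of these boxes. The sketch only shows how the boxes themselves are laid out.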

B. The ResNet Model:

ResNet is built on the idea of “residual learning.” The ResNet architecture works as follows:

It uses “skip” connections that pass information forward past a couple of layers.
Deeper networks are more difficult to train; residual learning eases the training of networks that are substantially deeper than earlier ones.
Deep networks are useful in image classification because they integrate low-, mid-, and high-level features and classifiers in an end-to-end multi-layer fashion.
The levels of these features can be enriched by the number of stacked layers (the “depth”).

Image elaborating on residual learning concept

For more on residual learning, this research paper explains it in detail.
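
As a rough illustration of a skip connection, the sketch below implements a tiny residual block in plain Python: the block computes a residual F(x) and adds the unchanged input x back before the final activation. The two-layer form of F, the helper names, and the 3 x 3 weight matrices are assumptions chosen for brevity. Real ResNet blocks use convolutions and batch normalization.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def residual_block(x, W1, W2):
    """y = relu(F(x) + x), where F(x) = W2 @ relu(W1 @ x)."""
    fx = matvec(W2, relu(matvec(W1, x)))
    # The skip connection: add the input back onto the residual
    return relu([f + xi for f, xi in zip(fx, x)])

x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]
```

With all weights at zero the block simply returns its (non-negative) input, which hints at why stacking such blocks is easier to optimize: each block only has to learn a correction on top of the identity.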

C. The YOLO (You Only Look Once) Model:

This is a unified, real-time object detection model.
It frames object detection as a regression problem to spatially separated bounding boxes and associated class probabilities.
A single neural network predicts the bounding boxes and class probabilities directly from full images in one evaluation.
It is fast because of its unified architecture.
Because the whole detection pipeline is a single neural network, it can be optimized end-to-end directly on detection performance.
You can read more in the research paper “You Only Look Once: Unified, Real-Time Object Detection.”
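
The sketch below shows, under stated assumptions, what "one evaluation" produces in the original YOLO formulation: an S x S grid in which each cell predicts B boxes (five numbers each) plus C class probabilities. The defaults S=7, B=2, C=20 correspond to the PASCAL VOC setup described in the paper. The function names and the sample prediction values are hypothetical.

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: S x S cells, each with B boxes and C classes."""
    return (S, S, B * 5 + C)

def decode_box(row, col, pred, S=7):
    """Convert one cell-relative prediction into image-relative coordinates.

    pred = (x, y, w, h, confidence): x and y are offsets inside the grid cell,
    while w and h are already fractions of the whole image.
    """
    x, y, w, h, conf = pred
    cx = (col + x) / S
    cy = (row + y) / S
    return (cx, cy, w, h, conf)

def class_scores(conf, class_probs):
    """Class-specific confidence: box confidence times conditional class probability."""
    return [conf * p for p in class_probs]

print(yolo_output_shape())  # (7, 7, 30)
# Hypothetical prediction from the centre cell of the grid
print(decode_box(3, 3, (0.5, 0.5, 0.2, 0.4, 0.9)))
```

A full detector would follow this with thresholding and non-maximum suppression, which are left out here.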

D. Faster R-CNN

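It is an object detection model that moves toward real-time speed by using a Region Proposal Network (RPN).
Detection networks rely on region proposals to hypothesize object locations; here the RPN generates them instead of a separate algorithm.
Earlier models had reduced the running time of the detection network itself, which exposed region proposal computation as the bottleneck.
The RPN shares convolutional features with the detection network and simultaneously predicts object bounds and objectness scores at every position.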
You can read more in the research paper “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.”
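
To illustrate what predicting bounds at every position can look like, here is a minimal sketch of anchor boxes and box decoding in plain Python. It assumes 3 scales and 3 aspect ratios (9 anchors per position) and the usual parameterization in which the network outputs offsets (tx, ty, tw, th) relative to an anchor. The function names and the sample position are hypothetical, and the objectness scores are omitted.

```python
import math

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchors centred at (cx, cy) as (cx, cy, w, h); ratio is height / width."""
    out = []
    for s in scales:
        for r in ratios:
            # Keep the area at s * s while changing the aspect ratio
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            out.append((cx, cy, w, h))
    return out

def decode(anchor, deltas):
    """Apply predicted offsets (tx, ty, tw, th) to an anchor box."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))

anchors = anchors_at(100.0, 100.0)
print(len(anchors))  # 9 anchors at this position
# Zero offsets leave the anchor unchanged
print(decode(anchors[1], (0.0, 0.0, 0.0, 0.0)))
```

Decoding relative to anchors means the network only has to learn small corrections to boxes that already have plausible sizes and shapes.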

E. The MobileNets Model

This is a class of efficient CNNs for mobile and embedded vision applications.
The model is based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks.
More can be read in the MobileNets research paper, which is available on arXiv.
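
A quick way to see why depthwise separable convolutions are lightweight is to count multiply-add operations. The sketch below compares a standard convolution with its depthwise separable counterpart, assuming a dk x dk kernel, m input channels, n output channels, and a df x df output feature map. The function names and the example layer sizes are illustrative assumptions.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds for a standard dk x dk convolution."""
    return dk * dk * m * n * df * df

def separable_conv_cost(dk, m, n, df):
    """Multiply-adds for a depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = dk * dk * m * df * df  # one dk x dk filter per input channel
    pointwise = m * n * df * df        # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 512 -> 512 channels, 14 x 14 feature map
dk, m, n, df = 3, 512, 512, 14
ratio = separable_conv_cost(dk, m, n, df) / standard_conv_cost(dk, m, n, df)
print(round(ratio, 4))  # equals 1/n + 1/dk**2, roughly a 9x reduction here
```

The ratio simplifies to 1/n + 1/dk², so with 3 x 3 kernels the separable version needs roughly eight to nine times fewer operations than the standard one.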

F. The Inception Model

Inception started as a case study for assessing the hypothetical output of a sophisticated network topology construction algorithm that tries to approximate a sparse structure for vision networks while covering the hypothesized outcome with dense, readily available components.
The architecture was evaluated in the ILSVRC 2014 classification and detection challenges, where it significantly outperformed the state of the art at the time.
You can read more about the Inception Model in the paper “Going Deeper with Convolutions.”