The following are the most commonly used types of computer vision model networks when working with AI and computer vision:

A. The Single Shot MultiBox Detector (SSD) Model:

  • It is an integrated model that combines object localization and classification in a single network.
  • It performs classification on feature maps from different convolutional layers using default bounding boxes.
  • The model detects objects in images using a single deep neural network.
  • The model discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location.
  • At prediction time, the network generates scores for the presence of each object category in each default box.
  • It also produces adjustments to the box to better match the object shape.
  • The model combines predictions from multiple feature maps with different resolutions so that it can naturally handle objects of various sizes.
  • Here is the link to a research paper.
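
To make the default-box idea above more concrete, here is a minimal sketch in plain Python. The feature map sizes, scales, and aspect ratios are made-up values for illustration (not the settings of any trained SSD model), and `default_boxes` is a hypothetical helper name.

```python
import math

def default_boxes(fmap_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) default boxes, normalized to [0, 1], for a square feature map."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Center of this feature-map cell in normalized image coordinates.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Every ratio keeps the same area (scale ** 2); only the shape changes.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes

# Assumed (feature map size, box scale) pairs: coarser maps get larger boxes.
layers = [(8, 0.2), (4, 0.5), (2, 0.8)]
ratios = [1.0, 2.0, 0.5]
all_boxes = [b for size, scale in layers for b in default_boxes(size, scale, ratios)]
print(len(all_boxes))  # (64 + 16 + 4) * 3 = 252
```

In a real detector, the network would output one set of class scores and four box offsets for each of these default boxes.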

B. The ResNet Model:

This architecture is built on what is referred to as “residual learning.” In the ResNet model, it works as follows:

  • It uses “skip” connections that pass information forward by a couple of layers, so each block only has to learn a residual on top of its input.
  • Deeper networks are more difficult to train; residual learning eases the training of such networks.
  • Deep networks are useful in image classification because they integrate low-, mid-, and high-level features and classifiers in an end-to-end, multi-layer fashion.
  • The levels of these features can be further enriched by the number of stacked layers (the “depth”).
[Image: the residual learning concept]
  • For more on residual learning, this research paper explains it in detail.
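
The skip connection can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: it uses small fully connected layers instead of convolutions, and `residual_block`, `linear`, and `relu` are hypothetical helper names.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, v):
    """Multiply a square weight matrix (a list of rows) by the vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    fx = linear(w2, relu(linear(w1, x)))           # the residual function F(x)
    return relu([f + xi for f, xi in zip(fx, x)])  # the skip connection adds x back

x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]
```

With all weights at zero, F(x) is zero and the block simply passes its input through. This is the intuition behind why stacking more residual blocks should not make a network harder to fit than a shallower one.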

C. The YOLO (You Only Look Once) Model

  • This is a unified, real-time object detection model.
  • It frames object detection as a regression problem to spatially separated bounding boxes and their associated class probabilities.
  • A single neural network predicts the bounding boxes and the class probabilities directly from full images in just one evaluation.
  • It is fast because of its unified architecture.
  • The detection pipeline is a single neural network, so it can be optimized end-to-end directly on detection performance.
  • You can read more in this research paper here.
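
As a rough sketch of what "one evaluation" produces, the snippet below follows the grid layout of the original YOLO paper: an S x S grid where each cell predicts B boxes of 5 values each plus C class probabilities. The function names and the sample numbers are hypothetical and only meant for illustration.

```python
def output_size(S, B, C):
    """Number of values predicted for an S x S grid, B boxes per cell, and C classes."""
    return S * S * (B * 5 + C)

def decode_cell(row, col, S, box, class_probs):
    """Turn one cell's raw prediction into an image-relative box and its best class.

    box = (x, y, w, h, confidence): x and y are offsets inside the cell,
    while w and h are relative to the whole image.
    """
    x, y, w, h, conf = box
    cx = (col + x) / S
    cy = (row + y) / S
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    score = conf * class_probs[best]  # class-specific confidence
    return (cx, cy, w, h), best, score

print(output_size(7, 2, 20))  # 1470
box, cls, score = decode_cell(3, 3, 7, (0.5, 0.5, 0.2, 0.4, 0.9), [0.1, 0.8, 0.1])
print(box, cls, round(score, 2))
```

A full pipeline would run this decoding for every cell and box, then filter overlapping detections; that part is left out here.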

D. Faster R-CNN

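  • It is an object detection model that works toward real-time detection by using a Region Proposal Network (RPN).
  • Earlier detection networks relied on separate region proposal algorithms to hypothesize the object locations.
  • Those earlier networks had reduced their running time, which exposed the region proposal computation as the bottleneck; the RPN addresses this by sharing convolutional features with the detection network.
  • The RPN simultaneously predicts object bounds and objectness scores at every position.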
  • You can read more here.
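
To illustrate what is predicted "at every position," here is a small sketch of the reference boxes (called anchors in the Faster R-CNN paper) that the proposal network scores at each location. The 3 scales and 3 aspect ratios follow the paper's defaults; `anchors_at` is a hypothetical helper name.

```python
import math

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy), one per scale/ratio pair."""
    boxes = []
    for s in scales:
        for r in ratios:
            # Keep the area at s * s while changing the height/width ratio r.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

k = len(anchors_at(0, 0))
# Per position: 2 objectness scores and 4 box coordinates for each of the k anchors.
print(k, 2 * k, 4 * k)  # 9 18 36
```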

E. The MobileNets Model

  • This is a class of efficient CNNs for mobile and embedded vision applications.
  • The model is based on a streamlined architecture that uses depth-wise separable convolutions to build lightweight deep neural networks.
  • More can be read in this research paper on arXiv (hosted by Cornell University).
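
To see why depth-wise separable convolutions make the network lighter, the sketch below compares multiply-add counts using the cost formulas from the MobileNets paper. The layer dimensions are arbitrary example values, and the function names are hypothetical.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds for a dk x dk convolution with m input channels,
    n output channels, and a df x df output feature map."""
    return dk * dk * m * n * df * df

def separable_conv_cost(dk, m, n, df):
    """Multiply-adds for a depthwise (dk x dk per channel) plus a pointwise (1 x 1) convolution."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise

# Assumed example layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
dk, m, n, df = 3, 64, 128, 56
std = standard_conv_cost(dk, m, n, df)
sep = separable_conv_cost(dk, m, n, df)
print(std, sep, round(sep / std, 4))
```

The ratio works out to 1/n + 1/dk², so with 3x3 kernels the separable version needs roughly 8 to 9 times fewer multiply-adds.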

F. The Inception Model

  • It started as a case study for assessing the hypothetical output of a sophisticated network topology construction algorithm that tries to approximate a sparse structure for vision networks while covering the hypothesized outcome with dense, readily available components.
  • The architecture was evaluated in the ILSVRC 2014 classification and detection challenges, where it significantly outperformed the state of the art at the time.
  • You can read more about the Inception Model in “Going Deeper with Convolutions” here.

That’s it for today 🙂

If you have any questions or comments, do not hesitate to ask us.

Quote: The moon looks upon many night flowers; the night flowers see but one moon. – Jean Ingelow