
The following are some of the most commonly used computer vision model architectures for AI and computer vision work:
A. The Single Shot Multibox Detector (SSD) model:
- It combines object localization and classification in a single network, rather than relying on a separate proposal-generation stage.
- It makes predictions on feature maps from several different convolutional layers, using a set of default bounding boxes.
- The model detects objects in images using a single deep neural network.
- The model discretizes the output space of bounding boxes into a set of default boxes with different aspect ratios and scales at each feature map location.
- At prediction time, the network generates a score for the presence of each object category in each default box.
- It also produces adjustments to each default box so that the box better matches the object's shape.
- The model combines predictions from multiple feature maps with different resolutions, so it naturally handles objects of various sizes.
- Here is the link to a research paper.
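To make the idea of default boxes more concrete, here is a minimal pure-Python sketch of how they could be laid out over one square feature map. It assumes the linear scale rule described in the SSD paper (s_min = 0.2, s_max = 0.9) and, for brevity, uses only the aspect ratios 1, 2 and 1/2 (the paper also uses 3 and 1/3). The names `ssd_scales` and `default_boxes` are hypothetical helpers, not part of any library.

```python
import math


def ssd_scales(m, s_min=0.2, s_max=0.9):
    """Scale s_k for each of m feature maps, spaced linearly from s_min to s_max."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]


def default_boxes(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h), normalized to [0, 1], for one
    fmap_size x fmap_size feature map. Aspect ratios are an assumed subset."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Each box is centered on its feature-map cell.
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Same area (scale ** 2) for every ratio; w / h == ar.
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
            # One extra square box at the intermediate scale sqrt(s_k * s_{k+1}).
            extra = math.sqrt(scale * next_scale)
            boxes.append((cx, cy, extra, extra))
    return boxes


scales = ssd_scales(6)
boxes = default_boxes(fmap_size=4, scale=scales[0], next_scale=scales[1])
print(len(boxes))  # 64: 4 * 4 cells, 4 boxes per cell
```

Coarser feature maps get larger scales, which is how a single network can cover both small and large objects.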
B. The ResNet Model:
ResNet is built on “residual learning,” which the architecture achieves as follows:
- It uses “skip” (shortcut) connections that pass information forward past a couple of layers.
- Very deep networks are difficult to train; learning residual functions through these shortcuts makes much deeper networks easier to optimize.
- Deep networks are useful for image classification because they integrate low-, mid-, and high-level features and classifiers in an end-to-end, multi-layer fashion.
- These levels of features can be enriched by increasing the number of stacked layers (the “depth”).
- For more on residual learning, this research paper explains it in detail.
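The shortcut idea fits in a few lines. This pure-Python sketch follows the residual form y = F(x) + x, assuming a two-layer residual branch F(x) = W2 · ReLU(W1 · x) with a ReLU applied after the addition. It uses plain lists instead of a deep learning framework, and the helper names are hypothetical. With all-zero weights, F(x) is zero and the block passes a nonnegative input straight through, which is one way to see why adding residual layers need not hurt a deep network.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    # Residual branch: F(x) = W2 * relu(W1 * x)
    f = matvec(w2, relu(matvec(w1, x)))
    # Shortcut connection: add the unchanged input element-wise, then ReLU.
    return relu([fi + xi for fi, xi in zip(f, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]: identity mapping
```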
C. The YOLO (You Only Look Once) Model
- This is a unified, real-time object detection model.
- It frames object detection as a regression problem to spatially separated bounding boxes and their associated class probabilities.
- A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.
- It is fast because of its unified architecture.
- Because the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.
- You can read more in the research paper here.
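The sketch below assumes the settings of the original YOLO paper's PASCAL VOC model: an S = 7 grid, B = 2 boxes per cell and C = 20 classes. It illustrates two ideas: the whole image maps to one fixed-size output tensor, and the grid cell that contains an object's center is the one responsible for detecting it. `yolo_output_shape` and `responsible_cell` are hypothetical helper names.

```python
def yolo_output_shape(S=7, B=2, C=20):
    # Each of the S x S cells predicts B boxes (x, y, w, h, confidence)
    # plus C class probabilities shared by the cell.
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, img_w, img_h, S=7):
    """(row, col) of the grid cell containing the object's center (cx, cy) in pixels."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centers on the right edge
    row = min(int(cy / img_h * S), S - 1)  # clamp centers on the bottom edge
    return row, col


print(yolo_output_shape())                    # (7, 7, 30)
print(responsible_cell(224, 100, 448, 448))   # (1, 3)
```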
D. Faster R-CNN
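- It works toward real-time object detection using a Region Proposal Network (RPN).
- Earlier detectors relied on separate region proposal algorithms to hypothesize object locations; Faster R-CNN generates these proposals with the RPN instead.
- Faster detection networks had already reduced running time, which exposed region proposal computation as the bottleneck; because the RPN shares full-image convolutional features with the detection network, its proposals are nearly cost-free.
- The RPN simultaneously predicts object bounds and objectness scores at each position.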
- You can read more here.
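To illustrate what the RPN scores “at each position,” here is a minimal sketch of the anchor boxes at a single sliding-window position, plus IoU-based labeling of those anchors for training. It assumes 3 scales and 3 aspect ratios (9 anchors per position) and the 0.7 / 0.3 IoU thresholds described in the Faster R-CNN paper, and it simplifies labeling by omitting the rule that also marks the highest-IoU anchor for each ground-truth box as positive. Coordinates are in pixels, and all function names are hypothetical.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """len(scales) * len(ratios) anchors (x1, y1, x2, y2) centered at one position."""
    anchors = []
    for s in scales:
        for r in ratios:
            # Keep the area at s * s while setting height / width to r.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """1 = object, 0 = background, -1 = ignored during RPN training (simplified)."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best > pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1


anchors = make_anchors(300, 300)
print(len(anchors))  # 9
print(label_anchor(anchors[4], [(172, 172, 428, 428)]))  # 1
```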
E. The MobileNets Model
- This is an efficient CNN for mobile and embedded vision applications.
- The model is based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks.
- More can be read in this research paper from Cornell University.
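The savings from depthwise separable convolutions can be checked with simple arithmetic. This sketch counts multiply-adds for a standard convolution versus a depthwise convolution followed by a 1×1 pointwise convolution, using the cost expressions from the MobileNets paper (D_K kernel size, M input channels, N output channels, D_F feature map size). The layer sizes in the example are illustrative assumptions, and the function names are hypothetical.

```python
def standard_conv_cost(dk, m, n, df):
    # D_K * D_K * M * N * D_F * D_F multiply-adds
    return dk * dk * m * n * df * df


def depthwise_separable_cost(dk, m, n, df):
    depthwise = dk * dk * m * df * df  # one D_K x D_K filter per input channel
    pointwise = m * n * df * df        # 1x1 conv mixes M channels into N outputs
    return depthwise + pointwise


# Illustrative layer: 3x3 kernels, 512 -> 512 channels, 14x14 feature map.
std = standard_conv_cost(3, 512, 512, 14)
sep = depthwise_separable_cost(3, 512, 512, 14)
print(f"{std / sep:.1f}x fewer multiply-adds")  # 8.8x fewer multiply-adds
```

The ratio works out to 1 / (1/N + 1/D_K²), so with 3×3 kernels the saving approaches 9× as the number of output channels grows.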
F. The Inception Model
- It started as a case study assessing the hypothetical output of a sophisticated network topology construction algorithm that tries to approximate a sparse structure for vision networks, while covering the hypothesized outcome with dense, readily available components.
- The architecture was evaluated in the ILSVRC 2014 classification and detection challenges, where it significantly outperformed the state of the art at the time.
- You can read more about the Inception Model in “Going Deeper with Convolutions” here.
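An Inception module is built from those dense components: 1×1, 3×3 and 5×5 convolutions and a pooling branch run in parallel and their outputs are concatenated, with 1×1 convolutions used to cut channel counts before the more expensive convolutions. The sketch below does the bookkeeping for that design. The channel counts (192 input channels, a 16-channel 1×1 reduction, branch widths of 64, 128, 32 and 32) are illustrative assumptions, and the helper names are hypothetical.

```python
def inception_output_channels(n1x1, n3x3, n5x5, pool_proj):
    # All branches keep the spatial size ("same" padding),
    # so their outputs concatenate along the channel axis.
    return n1x1 + n3x3 + n5x5 + pool_proj


def conv_weights(k, c_in, c_out):
    """Weights in a k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


print(inception_output_channels(64, 128, 32, 32))  # 256

# 5x5 branch from 192 input channels to 32 outputs:
direct = conv_weights(5, 192, 32)                              # no reduction
reduced = conv_weights(1, 192, 16) + conv_weights(5, 16, 32)   # 1x1 reduce to 16 first
print(direct, reduced)  # 153600 15872
```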
That’s it for today 🙂
If you have any questions or comments, do not hesitate to ask us.
Quote: The moon looks upon many night flowers; the night flowers see but one moon. – Jean Ingelow