Current computer vision frameworks show remarkable performance on challenging benchmarks such as image recognition, object detection, and semantic image segmentation. Success on these benchmarks depends on large amounts of labeled data, which is time-consuming and expensive to collect.

Many real-world computer vision applications involve visual categories that are absent from standard datasets, or are dynamic in nature, with categories or their appearance changing over time.

Despite promising results, purely self-supervised methods still learn visual representations that fall short of those produced by fully supervised methods. Their practical relevance on their own therefore remains limited.

This post explains why semi-supervised learning is needed to apply computer vision at massive scale and walks through the key ideas behind several semi-supervised algorithms, with diagrams. It then explores the advantages and drawbacks of the approaches proposed for semi-supervised learning.

Need for Semi-supervised Learning

Supervised learning has been at the forefront of computer vision and deep learning research over the past decade. In supervised learning, people have to label the dataset manually. Models then use these labels to learn the complex hidden relationships between input and label, and so become able to predict the label for new data. Deep learning models are generally data-hungry and need vast amounts of data to perform well.

One significant drawback of supervised deep learning is that it depends on vast amounts of labeled data. Massive labeled datasets are not available in every domain, because expert labeling can be logistically hard and costly. While labeled data is expensive and difficult to obtain, we typically have plenty of unlabeled data, particularly images and text. That’s where semi-supervised learning comes into play!

Transfer learning from pre-trained models typically works well for many computer vision tasks, because large image datasets cover a good part of the possible image space and the weights learned from them generally adapt to custom image classification tasks. Pre-trained models are also readily available, which makes the whole process straightforward.

However, this approach won’t work well if the distribution of images in your dataset differs from that of the images the base model was trained on. For instance, if you are dealing with grayscale images produced by a medical imaging device, reusing the base model’s weights will be less effective, and you will need more than several thousand labeled images to train a model that performs well.

At the same time, you may have a large amount of unlabeled images for the classification task. That is why the ability to learn from unlabeled data matters. Furthermore, unlabeled data is usually far richer in variety and volume than even the biggest labeled datasets. Semi-supervised learning methods have been reported to outperform purely supervised training when large unlabeled image collections are available.

Major Semi-supervised Algorithms

Semi-supervised learning algorithms for computer vision have progressed rapidly in the past few years. Current work either improves on earlier methods in terms of architecture and loss function, or introduces hybrid algorithms that combine different formulations. In this section, we focus only on semi-supervised algorithms that can be used for computer vision tasks.

Pseudo Label

Dong-Hyun Lee proposed a simple and effective algorithm called “Pseudo-Label” in 2013. The idea is to train a model on a dataset of both labeled and unlabeled images. The model is trained on the labeled images in the usual supervised way with a cross-entropy loss. The same model is used to get predictions for a batch of unlabeled images, and the most confident class is taken as the pseudo-label. The cross-entropy loss is then computed between the model predictions and the pseudo-labels for the unlabeled images. The total loss is a weighted sum of the labeled and unlabeled loss terms, as shown below:

L = L_labeled + α_t · L_unlabeled

To make sure the model has first learned enough from the labeled data, the weight α_t is set to 0 during the first 100 training steps. It is then gradually increased until step 600 and kept constant for the rest of training.
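The weight schedule and the combined loss can be sketched in a few lines of plain Python. This is a minimal illustration, not Lee's implementation: the function names, the linear ramp between steps 100 and 600, and the final weight `alpha_final=3.0` are assumptions made for the example, and predictions are passed in as ready-made probability lists instead of coming from a real network.

```python
import math


def alpha_schedule(step, t1=100, t2=600, alpha_final=3.0):
    """Weight of the unlabeled loss at a training step.

    Assumed shape: 0 before t1, linear ramp until t2, constant afterwards.
    alpha_final is an illustrative value, not taken from this post.
    """
    if step < t1:
        return 0.0
    if step < t2:
        return alpha_final * (step - t1) / (t2 - t1)
    return alpha_final


def cross_entropy(probs, target):
    """Cross-entropy of one predicted distribution against a class index."""
    return -math.log(max(probs[target], 1e-12))


def pseudo_label_loss(labeled_probs, labels, unlabeled_probs, step):
    """Total loss: L_labeled + alpha_t * L_unlabeled."""
    labeled = sum(
        cross_entropy(p, y) for p, y in zip(labeled_probs, labels)
    ) / len(labels)
    # The pseudo-label is the most confident class of each unlabeled prediction.
    pseudo = [max(range(len(p)), key=p.__getitem__) for p in unlabeled_probs]
    unlabeled = sum(
        cross_entropy(p, y) for p, y in zip(unlabeled_probs, pseudo)
    ) / len(pseudo)
    return labeled + alpha_schedule(step) * unlabeled
```

Early in training the unlabeled term is switched off entirely, so noisy pseudo-labels from an untrained model cannot dominate the gradient. Once the weight has ramped up, confident predictions on unlabeled images contribute to the loss like ordinary labels.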

Figure 1: Increasing training steps at each batch

Additional Material: Good Read with Python Code for Pseudo Label

Generative Algorithms

Generative algorithms focus on accurately reconstructing images after passing them through a bottleneck. Autoencoders are one example. They reduce the input to a low-dimensional representation using an encoder network and then reconstruct the image using a decoder network.

In this method, the input itself serves as the supervision signal for training the model, so no manual labels are needed. The encoder can then be extracted and used as a starting point for building a classifier, for example by fine-tuning it on the labeled images.
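The encoder, bottleneck, and decoder idea can be illustrated with a tiny linear autoencoder. The sketch below is only a toy under stated assumptions: it uses NumPy, treats each image as a flat feature vector, has no nonlinearities, and the function name `train_autoencoder` and its hyperparameters are made up for the example. Real image autoencoders use convolutional networks and a deep learning framework.

```python
import numpy as np


def train_autoencoder(x, code_dim=2, steps=500, lr=0.2, seed=0):
    """Train a linear autoencoder with plain gradient descent.

    x: array of shape (n_samples, n_features), used as both input and target.
    Returns the encoder weights, decoder weights, and the loss history.
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(scale=0.1, size=(n_features, code_dim))
    w_dec = rng.normal(scale=0.1, size=(code_dim, n_features))
    losses = []
    for _ in range(steps):
        code = x @ w_enc            # encoder: project to the bottleneck
        recon = code @ w_dec        # decoder: reconstruct the input
        err = recon - x
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared reconstruction error.
        grad_recon = 2.0 * err / err.size
        grad_dec = code.T @ grad_recon
        grad_enc = x.T @ (grad_recon @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses


# Unlabeled toy data: 64 samples with 8 features each.
data = np.random.default_rng(1).normal(size=(64, 8))
encoder, decoder, history = train_autoencoder(data)
features = data @ encoder  # low-dimensional codes to feed a classifier
```

No labels appear anywhere in the training loop: the reconstruction target is the input itself. After training, only `encoder` is kept, and `features` would be the input to a classifier trained on the small labeled set.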

Similarly, another type of generative algorithm, the Generative Adversarial Network (GAN), can be used for pre-training on unlabeled image datasets. The discriminator can then be taken and fine-tuned for image classification.

Figure 2: Input & Reconstruction using Generative Algorithms

Additional Material: GAN

Self-Training method

Classifying remote sensing images is a difficult task because few labeled samples are available for training. Self-training was introduced to address this shortage of labeled samples.

Self-training is a popular semi-supervised algorithm, widely used to train supervised classifiers with a limited number of labeled samples and a large pool of unlabeled samples. Standard self-training chooses samples based only on the maximum classification probability, which may improve classifier accuracy. The effectiveness of a classifier trained this way depends on selecting correct, diverse, and informative samples for the labeled training set.

The proposed approach first groups the unlabeled samples into a small number of clusters. A supervised classifier is then trained with very few labeled samples, and the trained classifier is used to choose the most confident set. This most confident set helps add informative samples to the labeled set, so that the classifiers can be trained successfully. The approach has improved image classification accuracy.
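The core self-training loop (train on the labeled set, pick the most confident predictions, move them into the labeled set, repeat) can be sketched as follows. This is a simplified illustration, not the proposed approach itself: it skips the initial clustering step, uses a hypothetical nearest-centroid classifier on 2D points instead of an image classifier, and the `threshold` and `max_rounds` parameters are assumptions.

```python
import math


def fit_centroids(points, labels):
    """Train a nearest-centroid classifier: one mean point per class."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sx / counts[c], sy / counts[c]) for c, (sx, sy) in sums.items()}


def predict_proba(centroids, point):
    """Class probabilities from a softmax over negative centroid distances."""
    scores = {c: math.exp(-math.dist(point, m)) for c, m in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def self_train(labeled, labels, unlabeled, threshold=0.8, max_rounds=10):
    """Repeatedly add confidently predicted samples to the labeled set."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled, labels)
        confident, remaining = [], []
        for point in pool:
            probs = predict_proba(centroids, point)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                confident.append((point, best))
            else:
                remaining.append(point)
        if not confident:
            break  # nothing left that the classifier is sure about
        for point, label in confident:
            labeled.append(point)
            labels.append(label)
        pool = remaining
    return fit_centroids(labeled, labels), labeled, labels


model, points, point_labels = self_train(
    labeled=[(0.0, 0.0), (10.0, 10.0)],
    labels=[0, 1],
    unlabeled=[(1.0, 1.0), (9.0, 9.0), (0.0, 2.0), (10.0, 8.0)],
)
```

Samples that never reach the confidence threshold stay unlabeled, which limits how many wrong pseudo-labels enter the training set. The threshold trades off how quickly the labeled set grows against how noisy it becomes.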

Figure 3: Self Training Formulation

Additional Material: Self-training for decision tree classifiers

Temporal Ensembling

Ensemble predictions can be leveraged for semi-supervised learning where only a small part of the training data is labeled. Compared with the output of the network currently being trained, the temporal ensembling prediction is likely to be closer to the correct, unknown labels of the unlabeled inputs. The labels inferred this way can therefore be used as training targets for the unlabeled inputs.

This method relies heavily on dropout regularization and versatile data augmentation. Without either, there would be much less reason to trust whatever labels are inferred for the unlabeled training data. Below we look at one notable way to implement this idea: the π-model.

The key idea behind the π-model is to create two random augmentations of each image, both labeled and unlabeled. A model with dropout is then used to predict the label of both augmented images. The squared difference between these two predictions is used as a consistency loss. For labeled images, we additionally calculate the cross-entropy loss. The total loss is a weighted sum of these two loss terms, where a weight w(t) determines how much the consistency loss contributes to the total loss.
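The π-model loss can be written out directly from this description. The following is a minimal sketch in plain Python: the function name `pi_model_loss` is hypothetical, the two prediction lists stand in for the outputs of a dropout network on two augmented views, and taking the cross-entropy from the first view is an assumption made for the example.

```python
import math


def pi_model_loss(probs_a, probs_b, labels, weight):
    """Supervised cross-entropy plus weighted consistency loss.

    probs_a, probs_b: class probabilities for two augmented views of each image.
    labels: class index for labeled images, None for unlabeled ones.
    weight: the current value of w(t).
    """
    # Consistency: mean squared difference between the two predictions,
    # computed for every image whether or not it has a label.
    per_image = [
        sum((a - b) ** 2 for a, b in zip(pa, pb)) / len(pa)
        for pa, pb in zip(probs_a, probs_b)
    ]
    consistency = sum(per_image) / len(per_image)

    # Cross-entropy only for the labeled images (first view, by assumption).
    terms = [
        -math.log(max(pa[y], 1e-12))
        for pa, y in zip(probs_a, labels)
        if y is not None
    ]
    supervised = sum(terms) / len(terms) if terms else 0.0

    return supervised + weight * consistency


loss = pi_model_loss(
    probs_a=[[0.9, 0.1], [0.6, 0.4]],
    probs_b=[[0.8, 0.2], [0.6, 0.4]],
    labels=[0, None],  # the second image is unlabeled
    weight=2.0,
)
```

The unlabeled image contributes only through the consistency term, which is zero here because both views received the same prediction. In practice w(t) usually starts small and grows during training, so the consistency term matters only once the predictions are meaningful.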

Figure 4: π-model Architecture

Additional Material: Temporal Ensembling for Semi-Supervised Learning

Benefits of Semi-supervised Learning

When we build models with supervised learning, we work with a labeled dataset. Unsupervised learning, by contrast, works with unlabeled datasets and has exploratory objectives such as clustering and compression.

The fundamental benefit of semi-supervised learning over the other two types is that it can improve the generalization and performance of a model when labeled data isn’t readily available, which is the case in many situations. Semi-supervised learning can come close to fully supervised results on standard tasks with only a small fraction of the labeled data.

Since labeled datasets are expensive and hard to obtain, huge datasets (particularly for image classification) may have only a small share of labels. For example, consider assigning each image to its appropriate category. Out of a hundred thousand images, the category is known for 10,000, while the other 90,000 could belong to any of the categories.

Semi-supervised learning lets us work with these kinds of datasets without having to compromise by choosing either supervised or unsupervised learning. Semi-supervised learning is applied everywhere from crawling engines and image classification frameworks to image recognition. However, there is no way to verify that the labels the algorithm produces are 100% accurate, which can make the results less reliable than those of traditional supervised algorithms.

In semi-supervised learning, we tackle a supervised learning problem using labeled data augmented by unlabeled data. The number of unlabeled or partially labeled samples is often larger than the number of labeled samples, since the former are cheaper and easier to obtain. The goal is to overcome one of the main problems of supervised learning: not having enough labeled data. By adding cheap and abundant unlabeled data, we hope to build a better model than with supervised learning alone. Although semi-supervised learning sounds like a powerful approach, we must be careful. It is not always the solution we are looking for: in some cases it works very well, in others it doesn’t.

Drawbacks of Semi-supervised Learning

Although complete knowledge of the unlabeled distribution might seem to greatly reduce the amount of labeled data needed compared to supervised learning, semi-supervised learning has some serious limitations and drawbacks in computer vision practice:

  1. Semi-supervised algorithms for threshold-based classification can gain only about a factor of two over the currently known upper bounds on the sample complexity of supervised learning.
  2. Making assumptions about the unlabeled data carries a risk of damaging the learning process. The risk arises when these assumptions don’t hold, even marginally.
  3. The current state of the art does not provide tight upper and lower bounds on the sample complexity that match, within a constant factor independent of the hypothesis class, the upper bound of supervised learning for any unlabeled distribution.


We believe that semi-supervised learning is a powerful approach that can boost performance on low-resource datasets. It is still in its early stages, but it will gradually expand its reach in computer vision by enabling learning from cheap, easily accessible, and openly available unlabeled data.

In this post, we got an overview of how semi-supervised algorithms for computer vision have advanced over the years. This is an important line of research that can directly affect how businesses, organizations, and individuals use semi-supervised learning for better image classification and object recognition.
