Mastering Neural Networks for Face Mask Detection: 3 Proven Approaches You Can Use Today
The COVID-19 pandemic has caused a major shift in our daily lives, with face masks becoming a necessity for people around the world. To help enforce mask-wearing in public places, automated systems using machine learning models have been developed to detect whether individuals are wearing masks.
In this blog post, we will explore three approaches to building a face mask detector with neural networks, a type of machine learning model that learns to recognize patterns in data. The approaches are:
1. Feed Forward neural networks
2. Convolutional neural networks
3. Fine-tuning pre-trained CNN models
Let’s first talk about Data.
One of the key elements of building a successful face mask detection system is having a good dataset of images to train the neural network model. The dataset should include a diverse range of images of people wearing and not wearing masks in different lighting conditions, angles, and settings.
There are three ways to collect the data:
- You can use publicly available datasets such as the “COVID-19 Face Mask Dataset” or the “Medical Mask Dataset”.
- You can collect and label the images yourself, which can be done using a smartphone or a camera. It’s important to label the images accurately to ensure that the neural network can learn to distinguish between masked and unmasked faces with high accuracy.
- You can use Selenium to scrape the required images from websites. Note, however, that web scraping can be unethical or illegal, especially if the images are protected by copyright or other intellectual property rights, so use scraping tools responsibly and in compliance with the relevant laws and regulations.
Next, import packages, load images, and preprocess them.
First, import all the necessary packages. These differ depending on the approach used.
Then load and preprocess the images:
- Resize each image to a small size, such as 32 x 32 pixels. A higher resolution means more input nodes, which makes the model heavier to train.
- Flatten each image into a one-dimensional vector. With the 3 RGB color channels, this gives 32 x 32 x 3 = 3072 input nodes. (This step applies to the feed forward network; a CNN takes the 32 x 32 x 3 array directly.)
- Normalize the pixel intensities to lie between 0 and 1.
- One-hot encode the nominal labels.
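The preprocessing steps above can be sketched with NumPy alone. This is a minimal illustration, not the post's original code: the helper names `resize_nearest` and `preprocess`, the class names, and the random stand-in images are all assumptions, and the nearest-neighbour resize is a simple substitute for what an image library such as OpenCV or Pillow would normally do.

```python
import numpy as np


def resize_nearest(image, size=32):
    """Downsample an H x W x 3 array to size x size x 3 by nearest-neighbour sampling."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    return image[rows[:, None], cols]


def preprocess(images, labels, class_names=("with_mask", "without_mask")):
    """Resize, flatten, and normalize the images; one-hot encode the labels."""
    # Each 32 x 32 x 3 image becomes a vector of 3072 values.
    features = np.stack([resize_nearest(img).reshape(-1) for img in images])
    # Scale pixel intensities from [0, 255] to [0, 1].
    features = features.astype("float32") / 255.0
    # Map each label name to its index, then pick the matching identity row.
    index = np.array([class_names.index(label) for label in labels])
    one_hot = np.eye(len(class_names), dtype="float32")[index]
    return features, one_hot


# Random pixels standing in for loaded photos (hypothetical data).
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8) for _ in range(4)]
labels = ["with_mask", "without_mask", "with_mask", "without_mask"]

X, y = preprocess(images, labels)
print(X.shape, y.shape)  # (4, 3072) (4, 2)
```

For the CNN approaches, the `reshape(-1)` step would be skipped so that each image keeps its 32 x 32 x 3 shape.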
Split the data into training and test sets.
Using scikit-learn, split the dataset into training and test sets in an 80:20 ratio.
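A minimal sketch of the 80:20 split using scikit-learn's `train_test_split`. The random arrays stand in for the preprocessed images, and the `stratify` and `random_state` arguments are additions assumed here to keep the class balance and make the split reproducible; the post does not say whether they were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 flattened images, 50 per class.
rng = np.random.default_rng(42)
X = rng.random((100, 3072), dtype=np.float32)
labels = np.array([0, 1] * 50)
y = np.eye(2, dtype="float32")[labels]  # one-hot labels

# test_size=0.20 gives the 80:20 ratio; stratify keeps both classes
# equally represented in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=labels, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 3072) (20, 3072)
```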
Approach 1: Feed Forward Neural Network Model
A feed forward neural network is a type of artificial neural network in which:
- Information flows in one direction, from input to output.
- Node connections do not form a cycle.
- Data enters the network at the input layer and passes through every layer before reaching the output.
The custom feed forward neural network model can be designed with one or more hidden layers, each with a specified number of neurons. With a single hidden layer, each hidden neuron is connected to every neuron in the input layer, and each output neuron is connected to every neuron in the hidden layer. With more hidden layers, each layer is fully connected to the one before it.
You can explore different combinations of activation functions and hyperparameters. The two configurations compared here are referred to as Model 1 and Model 2.
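One way to make such experiments easy is to build the network from a small function whose arguments are the things you want to vary. The sketch below uses Keras; the function name `build_feed_forward`, the layer sizes, the SGD optimizer, and the learning rate are illustrative assumptions, not the settings behind the results reported in this post.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_feed_forward(hidden_units=(256, 128), activation="relu", learning_rate=0.01):
    """Build a fully connected classifier for flattened 32 x 32 x 3 images."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(3072,)))
    # One Dense (fully connected) layer per entry in hidden_units.
    for units in hidden_units:
        model.add(layers.Dense(units, activation=activation))
    # Two output neurons: mask / no mask, as class probabilities.
    model.add(layers.Dense(2, activation="softmax"))
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_feed_forward()
model.summary()
```

Calling `build_feed_forward(hidden_units=(512,), activation="sigmoid")`, for example, gives a second configuration to compare against the first.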
Let's train and test the models and look at their results.
Model 1 has a test accuracy of around 87%, and its loss and accuracy curves over the epochs show no sign of overfitting or underfitting.
Model 2, however, has a test accuracy of around 94%, but its loss and accuracy curves show severe overfitting.
A large gap between validation loss and training loss, or between training accuracy and validation accuracy, is the telltale sign of overfitting.
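The same check can be done numerically rather than by eye. The sketch below assumes a dictionary shaped like the `history.history` attribute returned by Keras `model.fit` when accuracy is tracked and validation data is supplied; the function names, the thresholds, and the sample numbers are made up for illustration and are not results from this post.

```python
def generalization_gap(history):
    """Return the final-epoch gaps (val_loss - loss, accuracy - val_accuracy)."""
    loss_gap = history["val_loss"][-1] - history["loss"][-1]
    acc_gap = history["accuracy"][-1] - history["val_accuracy"][-1]
    return loss_gap, acc_gap


def looks_overfit(history, loss_tol=0.1, acc_tol=0.05):
    """Flag a run whose validation metrics trail the training metrics by too much."""
    loss_gap, acc_gap = generalization_gap(history)
    return loss_gap > loss_tol or acc_gap > acc_tol


# Illustrative numbers only: training and validation curves stay close together.
healthy = {
    "loss": [0.60, 0.40, 0.30],
    "val_loss": [0.62, 0.43, 0.33],
    "accuracy": [0.70, 0.82, 0.87],
    "val_accuracy": [0.69, 0.81, 0.86],
}
# Illustrative numbers only: training keeps improving while validation stalls.
overfit = {
    "loss": [0.60, 0.20, 0.05],
    "val_loss": [0.62, 0.45, 0.60],
    "accuracy": [0.70, 0.93, 0.99],
    "val_accuracy": [0.69, 0.85, 0.86],
}

print(looks_overfit(healthy), looks_overfit(overfit))  # False True
```

The thresholds are arbitrary; what counts as a "huge" gap depends on the dataset and on how noisy the validation curve is.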
Approach 2: Convolutional Neural Network Model
A CNN is a special kind of feed forward neural network that makes the explicit assumption that its inputs are images.
- The layers of a CNN are arranged as a 3D volume with width, height, and depth (for an input image, depth is the number of color channels, such as the 3 channels of RGB).
- Commonly used layers are convolutional (CONV), activation (ACT or RELU), pooling (POOL), and fully connected (FC) layers.
In our case, the model is defined in a separate class and built up layer by layer: a first, second, and third layer, followed by the fourth and fifth layers.
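As a rough sketch of what a five-layer CNN of this kind could look like in Keras, assume three CONV => RELU => POOL blocks followed by two FC layers. The class name `MaskCNN`, the filter counts, the kernel sizes, and the optimizer are assumptions made for illustration; they are not necessarily the configuration behind the accuracy reported in this post.

```python
from tensorflow import keras
from tensorflow.keras import layers


class MaskCNN:
    """Hypothetical container class that builds a small CNN for 32 x 32 RGB images."""

    @staticmethod
    def build(width=32, height=32, depth=3, classes=2):
        model = keras.Sequential([
            keras.Input(shape=(height, width, depth)),
            # Layer 1: CONV => RELU => POOL (32 x 32 -> 16 x 16)
            layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            # Layer 2: CONV => RELU => POOL (16 x 16 -> 8 x 8)
            layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            # Layer 3: CONV => RELU => POOL (8 x 8 -> 4 x 4)
            layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            # Layers 4 and 5: fully connected, ending in class probabilities
            layers.Flatten(),
            layers.Dense(128, activation="relu"),
            layers.Dense(classes, activation="softmax"),
        ])
        model.compile(
            optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model


model = MaskCNN.build()
model.summary()
```

Unlike the feed forward model, this network takes the unflattened 32 x 32 x 3 image, so the spatial layout of the pixels is preserved through the convolutional layers.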
Let us train and test this model.
This CNN model has a validation accuracy of 98% and shows no sign of overfitting.
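Some advantages of using a CNN are:
- It performs better: about 98% validation accuracy here, compared with test accuracies of about 87% and 94% for the two feed forward models.
- It predicts results with higher confidence values.
- It is better at learning fine details in the images.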
Approach 3: Fine-Tuning a Pre-Trained CNN Model
Fine-tuning is a technique used in machine learning that involves taking a pre-trained model and continuing the training process on a new dataset that is related to the original dataset on which the model was trained. The objective of fine-tuning is to improve the performance of the pre-trained model on the new dataset and adapt it to the specific task at hand.
An excellent tutorial is available here -
The code defines a convolutional neural network (CNN) for image classification based on the MobileNetV2 architecture. The model is constructed in two parts: the base model, which is the pre-trained MobileNetV2 network, and the head model, which is a set of fully connected layers trained for the specific classification task.
The base model is loaded using the MobileNetV2 function from the Keras library, and is configured to exclude the head FC layers, so that they can be replaced with a new set of layers for the specific classification task at hand.
The head model is constructed by taking the output of the base model and passing it through a set of layers, including an average pooling layer, a flatten layer, and two dense layers. The final layer uses the softmax activation function to output the class probabilities for the image.
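The base-plus-head construction described above can be sketched in Keras roughly as follows. The function name `build_mask_classifier`, the 224 x 224 input size, the 128-unit dense layer, the Adam learning rate, and the choice to freeze the base model are assumptions made for this sketch; the description above only fixes the overall structure (MobileNetV2 without its top layers, then average pooling, flatten, and two dense layers ending in softmax).

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNetV2


def build_mask_classifier(weights="imagenet", input_shape=(224, 224, 3), classes=2):
    """Attach a new classification head to a MobileNetV2 base."""
    # include_top=False drops MobileNetV2's own fully connected head.
    base_model = MobileNetV2(weights=weights, include_top=False, input_shape=input_shape)
    # Assumption: freeze the base so only the new head is updated at first.
    base_model.trainable = False

    # For a 224 x 224 input the base outputs a 7 x 7 x 1280 feature map.
    head = base_model.output
    head = layers.AveragePooling2D(pool_size=(7, 7))(head)
    head = layers.Flatten()(head)
    head = layers.Dense(128, activation="relu")(head)
    head = layers.Dense(classes, activation="softmax")(head)

    model = keras.Model(inputs=base_model.input, outputs=head)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Usage (downloads the ImageNet weights the first time it runs):
# model = build_mask_classifier()
```

Passing `weights=None` builds the same architecture with random weights, which is handy for checking shapes offline but defeats the purpose of fine-tuning.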
This approach has an accuracy of 99% and shows no sign of overfitting, which makes it the best-performing option of the three.
The same concepts carry over to other tasks. For example, machine learning models can be trained to classify whether motorcyclists are wearing helmets in real-time traffic camera footage. This technology can help reduce road accidents and promote safe riding habits by automatically detecting non-compliant riders so that appropriate action can be taken.
In conclusion, machine learning models can be powerful tools for building automated systems that detect in real time whether individuals are wearing masks or helmets. By leveraging neural networks, we can build effective systems that promote safety, help prevent the spread of COVID-19, and reduce road accidents.
I hope this article helped you understand the different ways you can develop a face mask detector using neural networks. Thanks for reading!