Human pose estimation is a crucial task in computer vision, involving the prediction of the spatial arrangement of a person's body from images or videos. The accurate estimation of human poses has numerous applications, including activity recognition, human-computer interaction, and augmented reality. However, pose estimation is challenging due to variations in human body shapes, poses, and environmental conditions.
In this project, we develop a human pose estimation model using deep learning techniques. The goal is to accurately predict keypoint coordinates corresponding to anatomical landmarks on the human body, such as the major joints. By addressing this problem, we aim to contribute to advances in computer vision and enable applications that require a precise understanding of human movements and interactions.
Data Preprocessing:
- Image Resizing and Normalization: All images in the MPII Human Pose dataset are resized to 220x220 pixels to ensure consistent input dimensions for the model, and pixel values are normalized to the range [0, 1] to aid convergence during training.
- Data Augmentation: Techniques such as random rotations, translations, flips, and brightness or contrast adjustments are applied to increase the variability of the training data, improving model generalization and robustness (a sketch of both steps follows this list).
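A minimal sketch of these two steps, assuming an OpenCV + TensorFlow/Keras pipeline (consistent with the .keras model used later); the augmentation parameters are illustrative placeholders, not the project's exact settings:

```python
import cv2
import numpy as np
import tensorflow as tf

IMG_SIZE = 220  # input resolution used for the CNN

def preprocess(image_path):
    """Read an image, resize it to 220x220, and scale pixels to [0, 1]."""
    img = cv2.imread(image_path)                # BGR, uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
    return img.astype(np.float32) / 255.0       # normalize to [0, 1]

# Illustrative augmentation pipeline (these layers are active only during training).
# NOTE: geometric augmentations must be applied to the keypoint labels as well,
# and a horizontal flip additionally requires swapping left/right joint labels.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),         # small random rotations
    tf.keras.layers.RandomTranslation(0.1, 0.1),  # random shifts
    tf.keras.layers.RandomBrightness(0.2),        # brightness jitter
    tf.keras.layers.RandomContrast(0.2),          # contrast jitter
])
```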
Model Architecture Design:
- We design a deep convolutional neural network (CNN) architecture tailored for human pose estimation.
- The model architecture consists of multiple convolutional layers followed by max-pooling layers to learn hierarchical features from input images.
- Skip (residual) connections are incorporated to facilitate information flow across layers and to capture both local and global spatial relationships in the input images (sketched below).
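The following Keras sketch illustrates this design; the exact depth, filter counts, and regression head are assumptions for illustration rather than the project's precise architecture (MPII annotates 16 body joints, hence 32 outputs):

```python
from tensorflow.keras import layers, Model

NUM_KEYPOINTS = 16  # MPII annotates 16 body joints

def conv_block(x, filters):
    """Two 3x3 convolutions with a residual (skip) connection."""
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # match channel count
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])  # skip connection
    return layers.Activation("relu")(y)

inputs = layers.Input(shape=(220, 220, 3))
x = inputs
for filters in (32, 64, 128, 256):  # hierarchical feature extraction
    x = conv_block(x, filters)
    x = layers.MaxPooling2D()(x)    # downsample after each block
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(512, activation="relu")(x)
outputs = layers.Dense(NUM_KEYPOINTS * 2)(x)  # (x, y) per keypoint
model = Model(inputs, outputs)
```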
Training Procedure:
- Dataset Splitting: The preprocessed dataset is split into training and validation sets to monitor model performance and prevent overfitting.
- Optimizer: The Adam optimizer is selected for its adaptive learning rate capabilities, which help in faster and more stable convergence.
- Loss Function: Mean Squared Error (MSE) is used as the loss function to minimize the difference between predicted and ground truth keypoints.
- Learning Rate Scheduling: The learning rate is adjusted during training. For example, it can be reduced after a certain number of epochs or based on validation performance to prevent overshooting the optimal solution.
- Training Iterations: Batches of training data are iteratively fed to the model. The model's weights are updated based on gradients computed from backpropagation.
- Regularization Techniques: Early stopping is used alongside the learning rate schedule to prevent overfitting and ensure stable convergence during training (the sketch below ties these pieces together).
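Putting the pieces together, a hedged sketch of the training setup; X and y stand for the preprocessed images and flattened keypoint targets, and all hyperparameters shown are illustrative:

```python
from sklearn.model_selection import train_test_split
from tensorflow.keras import callbacks, optimizers

# X: preprocessed images, y: flattened keypoint coordinates (illustrative names)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model.compile(
    optimizer=optimizers.Adam(learning_rate=1e-3),  # adaptive learning rates
    loss="mse",                                     # mean squared error on keypoints
    metrics=["mae"],
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    batch_size=32,
    callbacks=[
        # Reduce the learning rate when validation loss plateaus
        callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
        # Stop early (and keep the best weights) if validation loss stalls
        callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                restore_best_weights=True),
    ],
)
```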
Performance Metrics:
- Learning Curves: Plots of training and validation loss over epochs show the model's learning progress and help in diagnosing potential overfitting or underfitting.
- Accuracy Improvements: Regularization and learning rate adjustments yielded steady improvements in validation accuracy over the course of training (see the plotting sketch below).
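The learning curves can be plotted directly from the history object returned by model.fit() in the training sketch above:

```python
import matplotlib.pyplot as plt

plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.legend()
plt.show()
# A widening gap between the two curves suggests overfitting;
# two high, flat curves suggest underfitting.
```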
Training and Validation Performance:
- Training Progress: The learning rate adjustments and regularization techniques applied during training helped in achieving stable convergence and avoiding overfitting.
- Validation Results: The model demonstrated high accuracy in predicting keypoint coordinates on the validation set, as indicated by a low mean absolute error (MAE) and qualitatively plausible visualizations of predicted poses.
Data Preprocessing (Keypoint R-CNN pipeline):
- Image Reading: Input images are read using OpenCV.
- Transformation: Images are converted to tensors using the torchvision.transforms module, which scales pixel values to [0, 1] and prepares the images for the model (see the sketch below).
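A minimal sketch of these two steps; the file name is a placeholder, and ToTensor suffices here because torchvision's detection models apply their own normalization internally:

```python
import cv2
from torchvision import transforms

transform = transforms.Compose([transforms.ToTensor()])  # HWC uint8 -> CHW float in [0, 1]

img = cv2.imread("input.jpg")                   # BGR, uint8
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # the model expects RGB
tensor = transform(img_rgb)                     # shape (3, H, W)
```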
Model:
- The model used is the keypointrcnn_resnet50_fpn, a pre-trained model designed for object detection and keypoint detection. This model leverages a ResNet-50 backbone with a Feature Pyramid Network (FPN) for robust feature extraction at multiple scales.
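Loading the model from torchvision looks as follows; the weights argument shown is for torchvision 0.13 and later, while older versions use pretrained=True:

```python
import torchvision

# Keypoint R-CNN with a ResNet-50 + FPN backbone, pre-trained on COCO
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()  # inference mode: returns detections instead of losses
```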
Image Processing:
- Images are read and transformed into a format suitable for the model.
- The model processes the image, outputting the detected keypoints and their corresponding scores.
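Inference then looks roughly like this, reusing the tensor from the preprocessing sketch; the model returns one dictionary per input image:

```python
import torch

with torch.no_grad():
    outputs = model([tensor])  # the model takes a list of CHW tensors

out = outputs[0]
keypoints = out["keypoints"]         # (num_people, 17, 3): x, y, visibility
kp_scores = out["keypoints_scores"]  # per-keypoint confidence
box_scores = out["scores"]           # per-person detection confidence
```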
Keypoint and Skeleton Drawing:
- Keypoints are drawn on the images using OpenCV, with confidence scores dictating their visibility.
- Skeletons are drawn by connecting relevant keypoints based on predefined limb connections (see the sketch below).
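A sketch of the drawing step; the limb connections below are one common choice over the 17 COCO keypoints predicted by Keypoint R-CNN, and the score threshold is illustrative:

```python
import cv2

# Assumed limb connections over the 17 COCO keypoints
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10),         # arms
         (11, 13), (13, 15), (12, 14), (14, 16),  # legs
         (5, 6), (11, 12), (5, 11), (6, 12)]      # torso

SCORE_THRESHOLD = 2.0  # illustrative cut-off on keypoint scores

def draw_pose(image, person_keypoints, person_kp_scores):
    """Draw one person's keypoints and skeleton on a BGR image in place."""
    pts = person_keypoints[:, :2].int().tolist()
    for i, (x, y) in enumerate(pts):
        if person_kp_scores[i] > SCORE_THRESHOLD:  # only confident keypoints
            cv2.circle(image, (x, y), 4, (0, 0, 255), -1)
    for a, b in LIMBS:
        if person_kp_scores[a] > SCORE_THRESHOLD and person_kp_scores[b] > SCORE_THRESHOLD:
            cv2.line(image, tuple(pts[a]), tuple(pts[b]), (0, 255, 0), 2)
    return image
```

In practice this is applied once per detected person, typically only for detections whose box score exceeds a threshold such as 0.9.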
A web application built with Flask has also been implemented around the Keypoint R-CNN model, allowing users to upload an image or run real-time pose estimation; a minimal sketch of the upload route follows.
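In this sketch, the form field, template names, and the run_pose_estimation helper (a wrapper around the inference and drawing steps above) are all hypothetical:

```python
import os
from flask import Flask, request, render_template

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def upload():
    if request.method == "POST":
        f = request.files["image"]                  # hypothetical form field name
        path = os.path.join("uploads", f.filename)  # the uploads folder created in setup
        f.save(path)
        # run_pose_estimation: hypothetical helper running Keypoint R-CNN
        # inference and skeleton drawing, returning the result image path
        result_path = run_pose_estimation(path)
        return render_template("result.html", image=result_path)
    return render_template("index.html")

if __name__ == "__main__":
    app.run(debug=True)
```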
Future Work:
- Fine-tuning for Specific Applications: Fine-tuning the pre-trained Keypoint R-CNN model on specific datasets or for specific applications (e.g., sports, medical analysis) could enhance its performance in those areas.
- Semi-supervised Learning: Utilizing semi-supervised or unsupervised learning techniques could help in leveraging large amounts of unlabeled data to further improve the model's performance.
- Enhanced Data Augmentation: Implementing more sophisticated data augmentation techniques could help improve the model's robustness to variations in lighting, occlusions, and different poses.
Setup:
Download saved-model_MPIIy1.keras from the link below:
https://drive.google.com/file/d/1t9OG8_kjbfmAIF06BTk28A4ygDafKFHP/view?usp=drive_link
Create a folder "uploads" inside client_server to store the images for the flask application
Also:
Create a folder "testresults" inside CNN folder to store the result images for the CNN method