English | 简体中文

PP-HumanSeg

Content

  • 1 Introduction
  • 2 News
  • 3 PP-HumanSeg Models
  • 4 Quick Start
  • 5 Training and Finetuning
  • 6 Deployment

1 Introduction

Human segmentation is a high-frequency application in the field of image segmentation. Generally, human segmentation can be classified into portrait segmentation and general human segmentation.

For portrait segmentation and general human segmentation, PaddleSeg releases the PP-HumanSeg models, which perform well in accuracy, inference speed and robustness. PP-HumanSeg models can be deployed to products at zero cost, without additional training, and they also support fine-tuning to achieve better performance.

The following are demonstration videos (the videos are large, so loading may be slightly slow). We provide full-process application guides from training to deployment, as well as video streaming segmentation and background replacement tutorials. Based on Paddle.js, you can experience the effects of Portrait Snapshot, Video Background Replacement and Barrage Penetration.

2 News

  • [2022-7] Released the PP-HumanSeg V2 models. The inference speed of the portrait segmentation model is increased by 45.5%, mIoU is increased by 0.63%, and the visualization result is better. The general human segmentation models also improve in accuracy and inference speed.
  • [2022-1] The human segmentation paper PP-HumanSeg was published at a WACV 2022 Workshop, and we open-sourced the Semantic Connectivity-aware Learning (SCL) method and a large-scale video conferencing dataset.
  • [2021-7] Baidu Video Conference supports one-second joining on the web side. Its virtual background function adopts our portrait segmentation model to provide real-time background replacement and background blur, which protects user privacy and makes meetings more fun.
  • [2021-7] Released the PP-HumanSeg V1 models, which include one portrait segmentation model and three general human segmentation models.

3 PP-HumanSeg Models

3.1 Portrait Segmentation Models

We release self-developed portrait segmentation models for real-time applications such as mobile video and web conferences. These models can be directly integrated into products at zero cost.

  • PP-HumanSegV1-Lite portrait segmentation model: It performs well in accuracy and model size. The model architecture is described in url.
  • PP-HumanSegV2-Lite portrait segmentation model: Compared to the V1 model, the inference speed is increased by 45.5%, mIoU is increased by 0.63%, and the visualization result is better. These improvements rely on the following innovations.
    • Higher segmentation accuracy: We use the super lightweight models (url) released in PaddleSeg recently. We choose MobileNetV3 as backbone and design the multi-scale feature aggregation model.
    • Faster inference speed: We reduce the input resolution, which reduces the inference time and increases the receptive field.
    • Better robustness: Based on the idea of transfer learning, we first pretrain the model on a large general human segmentation dataset, and then finetune it on a small portrait segmentation dataset.
| Model Name | Best Input Shape | mIoU (%) | Inference Time on ARM CPU (ms) | Model Size (MB) | Config File | Links |
| --- | --- | --- | --- | --- | --- | --- |
| PP-HumanSegV1-Lite | 398x224 | 96.00 | 29.68 | 2.2 | cfg | Checkpoint / Inference Model (Argmax) / Inference Model (Softmax) |
| PP-HumanSegV2-Lite | 256x144 | 96.63 | 15.86 | 13.5 | cfg | Checkpoint / Inference Model (Argmax) / Inference Model (Softmax) |
Note:
  • Segmentation accuracy (mIoU): We test the above models on the PP-HumanSeg-14K dataset with the best input shape.
  • Inference time: Measured with PaddleLite on a Xiaomi 9 (Snapdragon 855 CPU), single thread, with the best input shape.
  • For the best input shape, the width-to-height ratio is 16:9, which is the same as the cameras of mobile phones and laptops.
  • The checkpoint is the pretrained weight, which is used for finetuning.
  • The inference model is used for deployment.
  • Inference Model (Argmax): The last operation of the inference model is argmax, so the output has a single channel.
  • Inference Model (Softmax): The last operation of the inference model is softmax, so the output has two channels.
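The difference between the two inference model variants can be illustrated with a small NumPy sketch. It does not load the released models; the logits are made-up values, and the channel order (0 = background, 1 = human) is an assumption made for illustration.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along the channel axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical raw scores for a 2x3 image, laid out as (channel, height, width).
# Assumption: channel 0 is background and channel 1 is human.
logits = np.array([
    [[2.0, 0.4, -1.0], [1.5, 0.0, -2.0]],   # background scores
    [[-1.0, 0.5, 3.0], [0.5, 2.0, 1.0]],    # human scores
])

# Softmax-style output: two channels of per-pixel probabilities.
prob = softmax(logits, axis=0)      # shape (2, H, W)

# Argmax-style output: a single-channel map of class indices.
label = logits.argmax(axis=0)       # shape (H, W)

# The human-probability channel can serve as a soft alpha matte,
# and thresholding it at 0.5 recovers the argmax label map.
alpha = prob[1]
hard_mask = (alpha > 0.5).astype(np.int64)
```

A soft matte such as `alpha` is useful for blending at object boundaries, while the single-channel label map is smaller and sufficient when only a hard mask is needed.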
Usage:
  • The portrait segmentation models can be directly integrated into products at zero cost.
  • Mobile phones can be held in horizontal or vertical orientation. We need to rotate the image so that the person in it is always upright.
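A minimal sketch of the rotation step, assuming frames are NumPy arrays of shape (height, width, channels) and that the number of quarter turns `k` is known from the device orientation. The helper names are hypothetical and are not part of PaddleSeg.

```python
import numpy as np

def rotate_for_model(frame, k):
    """Rotate the frame by k quarter turns so the person appears upright."""
    return np.rot90(frame, k=k)

def restore_mask(mask, k):
    """Undo the rotation so the mask lines up with the original frame."""
    return np.rot90(mask, k=-k)

# Hypothetical 4x6 RGB frame in which the person appears sideways.
frame = np.arange(4 * 6 * 3).reshape(4, 6, 3)

upright = rotate_for_model(frame, k=1)        # shape becomes (6, 4, 3)
fake_mask = upright[..., 0]                   # stand-in for a model output
mask_in_frame_coords = restore_mask(fake_mask, k=1)
```

The mask produced on the rotated frame is rotated back by the same amount, so it can be applied to the frame in its original orientation.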

3.2 General Human Segmentation Models

For the general human segmentation task, we first build a large human segmentation dataset, then train SOTA models from PaddleSeg on it, and finally release several general human segmentation models.

  • PP-HumanSegV2-Lite general human segmentation model: It uses the super lightweight models (url) released in PaddleSeg recently. Compared to the V1 model, the mIoU is improved by 6.5%.
  • PP-HumanSegV2-Mobile general human segmentation model: It uses the self-developed PP-LiteSeg model. Compared to the V1 model, the mIoU is improved by 1.49% and the inference time is reduced by 5.7%.
| Model Name | Best Input Shape | mIoU (%) | Inference Time on ARM CPU (ms) | Inference Time on Nvidia GPU (ms) | Config File | Links |
| --- | --- | --- | --- | --- | --- | --- |
| PP-HumanSegV1-Lite | 192x192 | 86.02 | 12.3 | - | cfg | Checkpoint / Inference Model (Argmax) / Inference Model (Softmax) |
| PP-HumanSegV2-Lite | 192x192 | 92.52 | 15.3 | - | cfg | Checkpoint / Inference Model (Argmax) / Inference Model (Softmax) |
| PP-HumanSegV1-Mobile | 192x192 | 91.64 | - | 2.83 | cfg | Checkpoint / Inference Model (Argmax) / Inference Model (Softmax) |
| PP-HumanSegV2-Mobile | 192x192 | 93.13 | - | 2.67 | cfg | Checkpoint / Inference Model (Argmax) / Inference Model (Softmax) |
| PP-HumanSegV1-Server | 512x512 | 96.47 | - | 24.9 | cfg | Checkpoint / Inference Model (Argmax) / Inference Model (Softmax) |
Note:
  • Segmentation accuracy (mIoU): After training the models on the large human segmentation dataset, we test them on the small Supervisely Person dataset (url).
  • Inference time: Measured with PaddleLite on a Xiaomi 9 (Snapdragon 855 CPU), single thread, with the best input shape.
  • The checkpoint is the pretrained weight, which is used for finetuning.
  • The inference model is used for deployment.
  • Inference Model (Argmax): The last operation of the inference model is argmax, so the output has a single channel.
  • Inference Model (Softmax): The last operation of the inference model is softmax, so the output has two channels.
Usage:
  • Since images for general human segmentation vary widely, you should evaluate the released models on your actual scene.
  • If the segmentation accuracy is not satisfactory, you should annotate images and finetune the model from the pretrained weights.
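To judge whether a released model is accurate enough on your own images, you can compare predicted masks with a few hand-annotated ones. The sketch below computes a per-image mean IoU with NumPy. It is a simplified illustration and may differ from the mIoU reported by PaddleSeg's evaluation script; the function name and sample arrays are made up.

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    """Average the intersection-over-union of each class present in the image."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class appears in neither mask, skip it
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Hypothetical 2x4 label maps: 0 = background, 1 = human.
target = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 1, 1, 1]])

score = mean_iou(pred, target)
# Background IoU is 3/4 and human IoU is 4/5, so the mean is 0.775.
```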

4 Quick Start

4.1 Prepare Environment

Install PaddlePaddle:

  • PaddlePaddle >= 2.2.0
  • Python >= 3.7

Due to the high computational cost of image segmentation models, it is recommended to use PaddleSeg with the GPU version of PaddlePaddle. Please refer to the PaddlePaddle official website for the installation tutorial.

Run the following commands to download PaddleSeg and install the required libraries.

git clone https://github.com/PaddlePaddle/PaddleSeg
cd PaddleSeg
pip install -r requirements.txt
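Before going further, it can help to confirm that the environment meets the minimum versions listed above. The following sketch compares version strings using only the standard library; the helper names are hypothetical, and it deliberately does not import PaddlePaddle so that it runs anywhere.

```python
import sys

def parse_version(text):
    """Turn '2.2.0' into (2, 2, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad or trim to three components so '2.2' compares equal to '2.2.0'.
    return tuple((parts + [0, 0, 0])[:3])

def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)

python_ok = sys.version_info >= (3, 7)
print("Python >= 3.7:", python_ok)
print("2.3.1 >= 2.2.0:", meets_minimum("2.3.1", "2.2.0"))
```

With PaddlePaddle installed, the same check could be applied to `paddle.__version__`, for example `meets_minimum(paddle.__version__, "2.2.0")`.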

4.2 Prepare Models and Data

Run the following commands under PaddleSeg/contrib/PP-HumanSeg.

cd PaddleSeg/contrib/PP-HumanSeg

Download the inference models and save them in inference_models.

python src/download_inference_models.py

Download and save test data in data.

python src/download_data.py

4.3 Portrait Segmentation

We use src/seg_demo.py to show the portrait segmentation and background replacement.

The input of src/seg_demo.py can be an image, a video or a camera stream. The input parameters are as follows.

| Params | Detail | Type | Required | Default Value |
| --- | --- | --- | --- | --- |
| config | The path of deploy.yaml in the inference model | str | True | - |
| img_path | The path of the input image | str | False | - |
| video_path | The path of the input video | str | False | - |
| bg_img_path | The path of the background image | str | False | - |
| bg_video_path | The path of the background video | str | False | - |
| save_dir | The directory for saving the output image and video | str | False | ./output |
| vertical_screen | Indicates that the input image or video is vertical screen | store_true | False | False |
| use_optic_flow | Enable the optical flow function | store_true | False | False |
Note:
  • If img_path is set, the script reads the image to predict. If video_path is set, it loads the video to predict.
  • If neither img_path nor video_path is set, it uses the camera to capture video for prediction.
  • It assumes the input image and video are horizontal screen, i.e. the width is greater than the height. If the image or video is vertical screen, please add --vertical_screen.
  • An optical flow algorithm can be used to mitigate video jitter (requires opencv-python > 4.0).
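The parameter table and notes above can be summarized as an argparse sketch. This is an illustration reconstructed from the table, not the actual source of src/seg_demo.py. The `input_source` helper is hypothetical, and giving img_path priority when both paths are set is an assumption.

```python
import argparse

def build_parser():
    """Build a parser that mirrors the documented seg_demo.py parameters."""
    parser = argparse.ArgumentParser(description="Sketch of the seg_demo.py options")
    parser.add_argument("--config", type=str, required=True,
                        help="Path of deploy.yaml in the inference model")
    parser.add_argument("--img_path", type=str, help="Path of the input image")
    parser.add_argument("--video_path", type=str, help="Path of the input video")
    parser.add_argument("--bg_img_path", type=str, help="Path of the background image")
    parser.add_argument("--bg_video_path", type=str, help="Path of the background video")
    parser.add_argument("--save_dir", type=str, default="./output",
                        help="Directory for saving the output image and video")
    parser.add_argument("--vertical_screen", action="store_true",
                        help="The input image or video is vertical screen")
    parser.add_argument("--use_optic_flow", action="store_true",
                        help="Enable the optical flow function")
    return parser

def input_source(args):
    """Pick the input: image if given, else video if given, else camera."""
    if args.img_path:
        return "image"
    if args.video_path:
        return "video"
    return "camera"

# Example: only --config is given, so the camera is used.
args = build_parser().parse_args(["--config", "deploy.yaml"])
print(input_source(args), args.save_dir)
```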

1) Use Image to Test

Read horizontal screen image data/images/portrait_heng.jpg and use PP-HumanSeg to predict. The results are saved in data/images_result/.

# Use PP-HumanSegV2-Lite
python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \
  --img_path data/images/portrait_heng.jpg \
  --save_dir data/images_result/portrait_heng_v2.jpg

# Use PP-HumanSegV1-Lite
python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv1_lite_398x224_inference_model_with_softmax/deploy.yaml \
  --img_path data/images/portrait_heng.jpg \
  --save_dir data/images_result/portrait_heng_v1.jpg

Read vertical screen image data/images/portrait_shu.jpg and use PP-HumanSeg to predict.

python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \
  --img_path data/images/portrait_shu.jpg \
  --save_dir data/images_result/portrait_shu_v2.jpg \
  --vertical_screen

Use background image to replace the background of input image.

python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \
  --img_path data/images/portrait_heng.jpg \
  --bg_img_path data/images/bg_2.jpg \
  --save_dir data/images_result/portrait_heng_v2_withbg.jpg

python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \
  --img_path data/images/portrait_shu.jpg \
  --bg_img_path data/images/bg_1.jpg \
  --save_dir data/images_result/portrait_shu_v2_withbg.jpg \
  --vertical_screen
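Conceptually, background replacement blends the input image with the background using the predicted human probability as a per-pixel weight. The sketch below shows that blend with NumPy under the assumption that the background has already been resized to the input size. The function name is hypothetical, and the real demo may apply extra post-processing.

```python
import numpy as np

def replace_background(frame, background, alpha):
    """Blend frame over background, weighted by the human probability alpha.

    frame, background: uint8 arrays of shape (H, W, 3) with the same size.
    alpha: float array of shape (H, W) with values in [0, 1].
    """
    weight = alpha[..., np.newaxis].astype(np.float32)
    blended = (weight * frame.astype(np.float32)
               + (1.0 - weight) * background.astype(np.float32))
    return np.clip(blended.round(), 0, 255).astype(np.uint8)

# Hypothetical 2x2 inputs: a bright frame and a darker background.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.full((2, 2, 3), 100, dtype=np.uint8)
alpha = np.array([[1.0, 0.0],
                  [0.5, 0.25]], dtype=np.float32)

result = replace_background(frame, background, alpha)
```

Pixels with alpha near 1 keep the person, pixels with alpha near 0 show the new background, and intermediate values give soft edges.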

2) Use Video to Test

Load horizontal screen video data/videos/video_heng.mp4 and use PP-HumanSeg to predict. The results are saved in data/videos_result/.

# Use PP-HumanSegV2-Lite
python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \
  --video_path data/videos/video_heng.mp4 \
  --save_dir data/videos_result/video_heng_v2.avi

# Use PP-HumanSegV1-Lite
python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv1_lite_398x224_inference_model_with_softmax/deploy.yaml \
  --video_path data/videos/video_heng.mp4 \
  --save_dir data/videos_result/video_heng_v1.avi

Load vertical screen video data/videos/video_shu.mp4 and use PP-HumanSeg to predict.

python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \
  --video_path data/videos/video_shu.mp4 \
  --save_dir data/videos_result/video_shu_v2.avi \
  --vertical_screen

Use background image to replace the background of input video.

python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \
  --video_path data/videos/video_heng.mp4 \
  --bg_img_path data/images/bg_2.jpg \
  --save_dir data/videos_result/video_heng_v2_withbg.avi

In addition, the DIS (Dense Inverse Search-based method) optical flow algorithm can be used to mitigate video flicker.

python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \
  --video_path data/videos/video_shu.mp4 \
  --save_dir data/videos_result/video_shu_v2_use_optic_flow.avi \
  --vertical_screen \
  --use_optic_flow

3) Use Camera to Test

Open camera to capture video (horizontal screen) and use PP-HumanSeg to predict.

python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml

Open camera to capture video (horizontal screen) and use PP-HumanSeg to predict, replacing the background with a background image.

python src/seg_demo.py \
  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \
  --bg_img_path data/images/bg_2.jpg

The result of video portrait segmentation is as follows.

The result of background replacement is as follows.

4.4 Online Tutorial

PP-HumanSeg V1 provides an online tutorial (url) in AI Studio.

5 Training and Finetuning

Since images for segmentation vary widely, you should evaluate the released models on your actual scene. If the segmentation accuracy is not satisfactory, you should annotate images and finetune the model from the pretrained weights.

We use the general human segmentation models of PP-HumanSeg to show training, evaluation and exporting.

5.1 Prepare

Refer to "Quick Start - Prepare Environment" to install PaddlePaddle and PaddleSeg.

Run the following command to download the mini_supervisely dataset. Refer to "Quick Start - Prepare Models and Data" for detailed information.

python src/download_data.py

Run the following command to download pretrained models.

python src/download_pretrained_models.py

5.2 Training

The config files are saved in ./configs as follows. The path of the pretrained weights is already set in all config files.

configs
├── human_pp_humansegv1_lite.yml
├── human_pp_humansegv2_lite.yml
├── human_pp_humansegv1_mobile.yml
├── human_pp_humansegv2_mobile.yml
└── human_pp_humansegv1_server.yml

Run the following command to start finetuning. See url for the full usage of model training.

export CUDA_VISIBLE_DEVICES=0 # Set GPU on Linux
# set CUDA_VISIBLE_DEVICES=0  # Set GPU on Windows
python ../../train.py \
  --config configs/human_pp_humansegv2_lite.yml \
  --save_dir output/human_pp_humansegv2_lite \
  --save_interval 100 --do_eval --use_vdl

5.3 Evaluation

Load the model and trained weights and start model evaluation. See url for the full usage of model evaluation.

python ../../val.py \
  --config configs/human_pp_humansegv2_lite.yml \
  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams

5.4 Prediction

Load the model and trained weights and start model prediction. The results are saved in ./data/images_result/added_prediction and ./data/images_result/pseudo_color_prediction.

python ../../predict.py \
  --config configs/human_pp_humansegv2_lite.yml \
  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \
  --image_path data/images/human.jpg \
  --save_dir ./data/images_result

5.5 Exporting

Load the model and trained weights and export the inference model. See url for the full usage of model exporting.

python ../../export.py \
  --config configs/human_pp_humansegv2_lite.yml \
  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \
  --save_dir output/human_pp_humansegv2_lite \
  --without_argmax \
  --with_softmax

When --without_argmax and --with_softmax are set, the last operation of the inference model is softmax.

6 Deployment

6.1 Deployment on Mobile Devices

Refer to deployment on edge devices.

6.2 Deployment on Web

Refer to deployment on web.