[DBNET]-Bad performance on long text detection #376

Open
fatfishZhao opened this issue Jul 19, 2021 · 13 comments

@fatfishZhao
Contributor

Hi, thanks for your great work.
I'm using DBNet with an R50-DCN backbone for Chinese text detection. I fine-tuned the pretrained model on about 10k images.
At test time, long text is not detected; some examples are shown below.
Can you explain this behavior? How can I fix it?
[Images: four examples of long text that was not detected]

@0xCreo

0xCreo commented Oct 9, 2021

Same issue. What's wrong with the original config?

@fatfishZhao
Contributor Author

fatfishZhao commented Nov 1, 2021

Solved. There were two problems.

The first is the original shrink and unclip method from the paper, which is not suitable for long text (the unclipped box is thinner than the ground truth), so I changed these methods based on my own understanding.
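To make the "thinner than ground truth" effect concrete, here is a rough sketch. It assumes an axis-aligned rectangle, the paper's shrink offset D = A(1 - r^2)/L with r = 0.4, and the unclip offset D' = A' * ratio / L' with ratio = 1.5. The function name is made up for illustration; real polygons are offset with a clipping library, so treat this as arithmetic only, not as MMOCR's code.

```python
def paper_round_trip(w, h, shrink_ratio=0.4, unclip_ratio=1.5):
    """Shrink a w x h box with the paper's offset, then unclip the result."""
    area, perimeter = w * h, 2 * (w + h)
    d_shrink = area * (1 - shrink_ratio ** 2) / perimeter
    # Offsetting a rectangle inward by d moves every side by d.
    sw, sh = w - 2 * d_shrink, h - 2 * d_shrink
    # The unclip offset is computed from the shrunk box, as at inference time.
    d_unclip = (sw * sh) * unclip_ratio / (2 * (sw + sh))
    return sw + 2 * d_unclip, sh + 2 * d_unclip


for w, h in [(100, 100), (1000, 20)]:
    rw, rh = paper_round_trip(w, h)
    print(f"{w}x{h}: recovered height {rh:.1f}, ground truth {h}")
```

Under these assumptions a square box comes back at close to its original size, while the 1000x20 box recovers less than half of its height.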

The second is a bug in this line:

epsilon = 0.01 * cv2.arcLength(poly, True)

0.01 is too large for long text. With this setting the program gets two points rather than four.
Setting this value to 0.002 works much better.
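A simplified way to see why the ratio matters: in a Douglas-Peucker-style simplification, a point is dropped when it lies within epsilon of the chord joining its neighbours. For an ideal w x h rectangle, the chord is the diagonal and the points at risk are the two off-diagonal corners. The sketch below models only that test; it is not OpenCV's actual implementation, and `corners_survive` is a hypothetical helper.

```python
import math


def corners_survive(w, h, epsilon_ratio):
    """True if a w x h rectangle keeps all four corners under the simplified test."""
    # Same form as: epsilon = ratio * cv2.arcLength(poly, True)
    epsilon = epsilon_ratio * 2 * (w + h)
    # Distance from an off-diagonal corner to the diagonal.
    corner_dist = w * h / math.hypot(w, h)
    return corner_dist > epsilon


for ratio in (0.01, 0.002):
    print(ratio, corners_survive(1000, 20, ratio), corners_survive(100, 20, ratio))
```

In this model, 0.01 starts collapsing boxes at an aspect ratio of roughly 49:1, and the contour being simplified is the shrunk text kernel, which is far thinner than the text line itself.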

@gaotongxiao
Collaborator

Thanks for sharing your solution. Does the final performance look all good?

@fatfishZhao
Contributor Author

Yes, much better.

@fatfishZhao
Contributor Author

Actually, I'm confused by PPOCR's results. They also use DBNet, and the demo images in their repo look pretty good on long text. The only difference I found between the MMOCR and PPOCR code is that PPOCR uses a bigger unclip ratio.

@gaotongxiao
Collaborator

It's probably because PPOCR uses much more private training data...

@Sanster

Sanster commented Nov 3, 2021

I wrote a blog post about this issue. If anyone is interested, check it out: link

@fatfishZhao
Contributor Author

Here is my replacement for the shrink and unclip methods.
I use A and L to calculate the font size, and set the shrink/unclip distance to a fixed ratio of the font size.
[Image: formula]
Note: r is the same when shrinking and unclipping.
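The formula image is not preserved in this text, so the following is not the author's formula. It only sketches one way a font size (text height) could be estimated from A and L, assuming the polygon is close to a rectangle: with w + h = L/2 and w * h = A, the short side is h = (L - sqrt(L^2 - 16A)) / 4. The `offset_distance` rule and its coefficient `k` are hypothetical.

```python
import math


def short_side(area, perimeter):
    """Short side of the rectangle that has the given area and perimeter."""
    # w + h = L / 2 and w * h = A  =>  h = (L - sqrt(L^2 - 16 * A)) / 4
    disc = max(perimeter ** 2 - 16 * area, 0.0)  # clamp for non-rectangular shapes
    return (perimeter - math.sqrt(disc)) / 4


def offset_distance(area, perimeter, k):
    """Hypothetical rule: offset equals k times the estimated text height."""
    return k * short_side(area, perimeter)


print(short_side(1000 * 20, 2 * (1000 + 20)))
print(offset_distance(1000 * 20, 2 * (1000 + 20), k=0.2))
```

Unlike A/L, this estimate does not change with the length of the line, which is the property a fixed ratio of the font size depends on.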

@0xCreo

0xCreo commented Nov 5, 2021

> Here is my replacement for shrink and unclip method. Using A and L to calculate the font size, and set the shrink/unclip distance to be a fixed ratio of the font size. image Note: r is same when shrinking and unclipping.

I wrote a simple test for your formula, but it doesn't work well across different h/w ratios. Did I make a mistake here?
[Image]

@fatfishZhao
Contributor Author

I think so. It works well in my project.

@viviayi

viviayi commented Apr 11, 2022

> I think so. It works well in my project.

Hello, thank you for sharing your method. I tried it in my project. It works well for long text, but I found that it unclips too much on short text. Do you have the same problem? How did you fix it?

@fatfishZhao
Contributor Author

> > I think so. It works well in my project.
>
> Hello, Thank you for sharing your method. I tried in my project, it works well for long text, but I found it unclip too much on short text. Do you have same problem? How you fixed it?

Hi, what is your r setting? You can try an r smaller than the paper's 0.4, such as 0.2.

@qiuzhixin9527

qiuzhixin9527 commented Apr 20, 2023

@Sanster @fatfishZhao @viviayi @gaotongxiao
I'm also getting bad performance on long text detection.
[Images: three examples]

Here is my config:
model = dict(
    type='DBNet',
    backbone=dict(
        type='mmdet.ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        style='pytorch',
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
        stage_with_dcn=(False, True, True, True)),
    neck=dict(
        type='FPNC',
        in_channels=[256, 512, 1024, 2048],
        lateral_channels=256,
        asf_cfg=dict(attention_type='ScaleChannelSpatial')),
    det_head=dict(
        type='DBHead',
        in_channels=256,
        module_loss=dict(type='DBModuleLoss'),
        postprocessor=dict(
            type='DBPostprocessor', text_repr_type='quad',
            epsilon_ratio=0.002)),
    data_preprocessor=dict(
        type='TextDetDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
        pad_size_divisor=32))
train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_polygon=True,
        with_label=True),
    dict(
        type='TorchVisionWrapper',
        op='ColorJitter',
        brightness=0.12549019607843137,
        saturation=0.5),
    dict(
        type='ImgAugWrapper',
        args=[['Fliplr', 0.5], {
            'cls': 'Affine',
            'rotate': [-10, 10]
        }, ['Resize', [0.5, 3.0]]]),
    dict(type='RandomCrop', min_side_ratio=0.1),
    dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
    dict(type='Pad', size=(1024, 1024)),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape'))
]
test_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
    dict(
        type='LoadOCRAnnotations',
        with_polygon=True,
        with_bbox=True,
        with_label=True),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor',
                   'instances'))
]
default_scope = 'mmocr'
env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
randomness = dict(seed=None)
default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=5),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=20),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    sync_buffer=dict(type='SyncBuffersHook'),
    visualization=dict(
        type='VisualizationHook',
        interval=1,
        enable=False,
        show=False,
        draw_gt=False,
        draw_pred=False))
log_level = 'INFO'
log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
load_from = 'https://download.openmmlab.com/mmocr/textdet/dbnetpp/tmp_1.0_pretrain/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-352fec8a.pth'
resume = False
val_evaluator = dict(type='HmeanIOUMetric')
test_evaluator = dict(type='HmeanIOUMetric')
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='TextDetLocalVisualizer',
    name='visualizer',
    vis_backends=[dict(type='LocalVisBackend')])
icdar2015_textdet_data_root = '/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015'
icdar2015_textdet_train = dict(
    type='OCRDataset',
    data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
    ann_file='textdet_train.json',
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    pipeline=None)
icdar2015_textdet_test = dict(
    type='OCRDataset',
    data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
    ann_file='textdet_test.json',
    test_mode=True,
    pipeline=None)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.0035, momentum=0.9, weight_decay=0.0001))
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=1200, val_interval=20)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
param_scheduler = [dict(type='PolyLR', power=0.9, eta_min=1e-07, end=200)]
train_list = [
    dict(
        type='OCRDataset',
        data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
        ann_file='textdet_train.json',
        filter_cfg=dict(filter_empty_gt=True, min_size=32),
        pipeline=None)
]
test_list = [
    dict(
        type='OCRDataset',
        data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
        ann_file='textdet_test.json',
        test_mode=True,
        pipeline=None)
]
train_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='ConcatDataset',
        datasets=[
            dict(
                type='OCRDataset',
                data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
                ann_file='textdet_train.json',
                filter_cfg=dict(filter_empty_gt=True, min_size=32),
                pipeline=None)
        ],
        pipeline=[
            dict(
                type='LoadImageFromFile',
                color_type='color_ignore_orientation'),
            dict(
                type='LoadOCRAnnotations',
                with_bbox=True,
                with_polygon=True,
                with_label=True),
            dict(
                type='TorchVisionWrapper',
                op='ColorJitter',
                brightness=0.12549019607843137,
                saturation=0.5),
            dict(
                type='ImgAugWrapper',
                args=[['Fliplr', 0.5], {
                    'cls': 'Affine',
                    'rotate': [-10, 10]
                }, ['Resize', [0.5, 3.0]]]),
            dict(type='RandomCrop', min_side_ratio=0.1),
            dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
            dict(type='Pad', size=(1024, 1024)),
            dict(
                type='PackTextDetInputs',
                meta_keys=('img_path', 'ori_shape', 'img_shape'))
        ]))
val_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='ConcatDataset',
        datasets=[
            dict(
                type='OCRDataset',
                data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
                ann_file='textdet_test.json',
                test_mode=True,
                pipeline=None)
        ],
        pipeline=[
            dict(
                type='LoadImageFromFile',
                color_type='color_ignore_orientation'),
            dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
            dict(
                type='LoadOCRAnnotations',
                with_polygon=True,
                with_bbox=True,
                with_label=True),
            dict(
                type='PackTextDetInputs',
                meta_keys=('img_path', 'ori_shape', 'img_shape',
                           'scale_factor', 'instances'))
        ]))
test_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='ConcatDataset',
        datasets=[
            dict(
                type='OCRDataset',
                data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
                ann_file='textdet_test.json',
                test_mode=True,
                pipeline=None)
        ],
        pipeline=[
            dict(
                type='LoadImageFromFile',
                color_type='color_ignore_orientation'),
            dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
            dict(
                type='LoadOCRAnnotations',
                with_polygon=True,
                with_bbox=True,
                with_label=True),
            dict(
                type='PackTextDetInputs',
                meta_keys=('img_path', 'ori_shape', 'img_shape',
                           'scale_factor', 'instances'))
        ]))
launcher = 'none'
work_dir = 'output/dbpp0417_dcnv2'
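
For reference, the two post-processing settings discussed earlier in this thread both sit under det_head.postprocessor in a config like this one. The fragment below is only a sketch of where they go: the epsilon_ratio value is the one suggested earlier in the thread, and the unclip_ratio value is an untested assumption chosen for illustration, not a recommended setting.

```python
# Illustrative override for det_head.postprocessor; values are not tuned.
postprocessor = dict(
    type='DBPostprocessor',
    text_repr_type='quad',
    epsilon_ratio=0.002,  # value suggested earlier in this thread
    unclip_ratio=2.0)     # assumed value; larger ratios expand predicted boxes more
```

Whether a larger unclip ratio helps here would need to be checked on the validation images, since it also enlarges boxes around short text.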
