[DBNET]-Bad performance on long text detection #376

fatfishZhao · 2021-07-19T11:59:20Z

Hi, thanks for your great work.
I'm using R50DCN DBNet for Chinese text detection. I trained on about 10k pictures, starting from the pretrained model.
When testing, long text cannot be detected; some examples are at the bottom.
Can you explain this behavior? How can I fix this problem?

0xCreo · 2021-10-09T01:50:04Z

Same issue. What's the problem with the original config?

fatfishZhao · 2021-11-01T09:40:54Z

Solved. There are 2 problems.

The first is the original shrink and unclip methods from the paper, which are not suitable for long text (the unclipped box is thinner than the ground truth), so I changed these methods based on my own understanding.
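The "thinner than ground truth" effect can be checked with a little arithmetic. The sketch below uses the offset formulas from the DB paper (shrink distance D = A(1 - r^2)/L with r = 0.4, unclip distance D' = A' * r'/L' with r' = 1.5) and assumes an axis-aligned rectangular box whose offset polygon is again a rectangle. The function names are made up for illustration and are not part of mmocr.

```python
def shrink_offset(area, perimeter, r=0.4):
    """Label-generation shrink distance from the DB paper: D = A * (1 - r^2) / L."""
    return area * (1 - r ** 2) / perimeter


def unclip_offset(area, perimeter, unclip_ratio=1.5):
    """Post-processing expansion distance from the DB paper: D' = A' * r' / L'."""
    return area * unclip_ratio / perimeter


def recovered_height(w, h, r=0.4, unclip_ratio=1.5):
    """Height of a w x h box after an ideal shrink followed by an unclip.

    Simplification (an assumption of this sketch): offsetting an axis-aligned
    rectangle moves every side inward or outward by the offset distance.
    """
    d = shrink_offset(w * h, 2 * (w + h), r)
    sw, sh = w - 2 * d, h - 2 * d  # shrunk region the network is trained to predict
    d2 = unclip_offset(sw * sh, 2 * (sw + sh), unclip_ratio)
    return sh + 2 * d2


if __name__ == "__main__":
    for w, h in [(32, 32), (128, 32), (1024, 32)]:
        print(f"{w}x{h}: recovered height {recovered_height(w, h):.1f}, ground truth {h}")
```

Under these assumptions a square box comes back at roughly its original height, while a 1024x32 line comes back at less than half of it, which is consistent with the thin boxes described above.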

The second is a bug in mmocr/mmocr/models/textdet/postprocess/wrapper.py (line 215 in 80741e1):

epsilon = 0.01 * cv2.arcLength(poly, True)

0.01 is too big for long texts. With this setting the program gets two points rather than four.
Setting this value to 0.002 works much better.
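To illustrate why the ratio matters, here is a minimal pure-Python sketch of Douglas-Peucker simplification, the algorithm that cv2.approxPolyDP is documented to use. It is a simplified stand-in, not OpenCV's implementation; the helper names and the 1000x16 test contour are assumptions chosen for illustration.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length


def simplify_chain(points, epsilon):
    """Douglas-Peucker on an open chain; both end points are always kept."""
    if len(points) < 3:
        return list(points)
    dists = [point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] <= epsilon:
        return [points[0], points[-1]]
    left = simplify_chain(points[:idx + 1], epsilon)
    right = simplify_chain(points[idx:], epsilon)
    return left[:-1] + right


def simplify_closed(poly, epsilon_ratio):
    """Simplify a closed polygon with epsilon = epsilon_ratio * perimeter."""
    n = len(poly)
    perimeter = sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))
    epsilon = epsilon_ratio * perimeter
    # Anchor the two chains on the pair of vertices that are farthest apart.
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda ab: math.dist(poly[ab[0]], poly[ab[1]]))
    first = simplify_chain(poly[i:j + 1], epsilon)
    second = simplify_chain(poly[j:] + poly[:i + 1], epsilon)
    return first[:-1] + second[:-1]


if __name__ == "__main__":
    long_box = [(0, 0), (1000, 0), (1000, 16), (0, 16)]  # thin, long contour
    for ratio in (0.01, 0.002):
        print(ratio, len(simplify_closed(long_box, ratio)), "points")
```

In this sketch the tolerance grows with the perimeter, so for a long, thin contour it can exceed the contour's height; the corners then fall within tolerance of the diagonal and the quadrilateral degenerates to a two-point segment.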

gaotongxiao · 2021-11-01T09:49:59Z

Thanks for sharing your solution. Does the final performance look all good?

fatfishZhao · 2021-11-01T10:05:43Z

Yes, much better.

fatfishZhao · 2021-11-01T10:10:58Z

Actually, I'm confused about PPOCR's results. They also use DBNet, and the demo pictures in their repo look pretty good on long texts. The only difference I found between the mmocr and ppocr code is that they use a bigger unclip ratio.
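For anyone who wants to try a larger unclip ratio in mmocr, a hedged config fragment follows. It assumes the MMOCR 1.x DBPostprocessor accepts unclip_ratio next to the epsilon_ratio argument used in the config posted later in this thread; the value 2.0 is only an illustrative choice, not a setting taken from PPOCR.

```python
# Sketch of a postprocessor override for an MMOCR 1.x DBNet config.
# Assumption: DBPostprocessor exposes `unclip_ratio`; check your installed version.
postprocessor = dict(
    type='DBPostprocessor',
    text_repr_type='quad',
    unclip_ratio=2.0,      # illustrative value, larger than the paper's 1.5
    epsilon_ratio=0.002)   # value suggested earlier in this thread
```

This dict would replace the postprocessor entry under det_head in the model config.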

gaotongxiao · 2021-11-01T11:21:39Z

It's probably because PPOCR uses much more private training data...

Sanster · 2021-11-03T13:44:34Z

I wrote a blog post about this issue. If anyone is interested, check it out (link).

fatfishZhao · 2021-11-05T04:17:42Z

Here is my replacement for the shrink and unclip methods.
Use A and L to calculate the font size, then set the shrink/unclip distance to a fixed ratio of the font size.

Note: r is the same when shrinking and unclipping.
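The formula itself was posted as an image and is not reproduced here. As a rough illustration of the idea only, the sketch below assumes a rectangular text box and recovers its short side (used as the "font size") from the area A and perimeter L, then takes a fixed fraction r of it as the offset. Both function names and the value of r are hypothetical and may differ from the author's actual method.

```python
import math


def estimate_text_height(area, perimeter):
    """Short side of the rectangle that has the given area and perimeter.

    For a w x h rectangle, A = w * h and L = 2 * (w + h), so h is the smaller
    root of x^2 - (L / 2) * x + A = 0.
    """
    half = perimeter / 2.0
    # Clamp at zero: rounding, or shapes rounder than any rectangle, can make
    # the discriminant slightly negative.
    disc = max(half * half - 4.0 * area, 0.0)
    return (half - math.sqrt(disc)) / 2.0


def fixed_ratio_offset(area, perimeter, r):
    """Hypothetical offset: a fixed fraction r of the estimated text height."""
    return r * estimate_text_height(area, perimeter)


if __name__ == "__main__":
    for w, h in [(32, 32), (128, 32), (1024, 32)]:
        d = fixed_ratio_offset(w * h, 2 * (w + h), r=0.2)
        print(f"{w}x{h}: offset {d:.2f}")
```

With this kind of rule the offset depends only on the text height, not on how long the line is, which is the property the paper's A/L-based distance lacks for long text.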

0xCreo · 2021-11-05T07:15:13Z

> Here is my replacement for the shrink and unclip methods. Use A and L to calculate the font size, then set the shrink/unclip distance to a fixed ratio of the font size. Note: r is the same when shrinking and unclipping.

I wrote a simple test for your formula, but it doesn't work well across different h/w ratios. Did I make a mistake here?

fatfishZhao · 2021-11-08T04:03:04Z

I think so. It works well in my project.

viviayi · 2022-04-11T07:26:46Z

> I think so. It works well in my project.

Hello, thank you for sharing your method. I tried it in my project and it works well for long text, but I found that it unclips too much on short text. Do you have the same problem? How did you fix it?

fatfishZhao · 2022-05-07T03:50:24Z

> > I think so. It works well in my project.
>
> Hello, thank you for sharing your method. I tried it in my project and it works well for long text, but I found that it unclips too much on short text. Do you have the same problem? How did you fix it?

Hi, what is your r setting? You can try a smaller r than the 0.4 used in the paper, such as 0.2.

qiuzhixin9527 · 2023-04-20T08:00:44Z

@Sanster @fatfishZhao @viviayi @gaotongxiao
I also get bad performance on long text detection.

Here is my config:
model = dict(
    type='DBNet',
    backbone=dict(
        type='mmdet.ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        style='pytorch',
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
        stage_with_dcn=(False, True, True, True)),
    neck=dict(
        type='FPNC',
        in_channels=[256, 512, 1024, 2048],
        lateral_channels=256,
        asf_cfg=dict(attention_type='ScaleChannelSpatial')),
    det_head=dict(
        type='DBHead',
        in_channels=256,
        module_loss=dict(type='DBModuleLoss'),
        postprocessor=dict(
            type='DBPostprocessor', text_repr_type='quad',
            epsilon_ratio=0.002)),
    data_preprocessor=dict(
        type='TextDetDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
        pad_size_divisor=32))
train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_polygon=True,
        with_label=True),
    dict(
        type='TorchVisionWrapper',
        op='ColorJitter',
        brightness=0.12549019607843137,
        saturation=0.5),
    dict(
        type='ImgAugWrapper',
        args=[['Fliplr', 0.5], {
            'cls': 'Affine',
            'rotate': [-10, 10]
        }, ['Resize', [0.5, 3.0]]]),
    dict(type='RandomCrop', min_side_ratio=0.1),
    dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
    dict(type='Pad', size=(1024, 1024)),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape'))
]
test_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
    dict(
        type='LoadOCRAnnotations',
        with_polygon=True,
        with_bbox=True,
        with_label=True),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape', 'scale_factor',
                   'instances'))
]
default_scope = 'mmocr'
env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
randomness = dict(seed=None)
default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=5),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=20),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    sync_buffer=dict(type='SyncBuffersHook'),
    visualization=dict(
        type='VisualizationHook',
        interval=1,
        enable=False,
        show=False,
        draw_gt=False,
        draw_pred=False))
log_level = 'INFO'
log_processor = dict(type='LogProcessor', window_size=10, by_epoch=True)
load_from = 'https://download.openmmlab.com/mmocr/textdet/dbnetpp/tmp_1.0_pretrain/dbnetpp_r50dcnv2_fpnc_100k_iter_synthtext-20220502-352fec8a.pth'
resume = False
val_evaluator = dict(type='HmeanIOUMetric')
test_evaluator = dict(type='HmeanIOUMetric')
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='TextDetLocalVisualizer',
    name='visualizer',
    vis_backends=[dict(type='LocalVisBackend')])
icdar2015_textdet_data_root = '/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015'
icdar2015_textdet_train = dict(
    type='OCRDataset',
    data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
    ann_file='textdet_train.json',
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    pipeline=None)
icdar2015_textdet_test = dict(
    type='OCRDataset',
    data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
    ann_file='textdet_test.json',
    test_mode=True,
    pipeline=None)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.0035, momentum=0.9, weight_decay=0.0001))
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=1200, val_interval=20)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
param_scheduler = [dict(type='PolyLR', power=0.9, eta_min=1e-07, end=200)]
train_list = [
    dict(
        type='OCRDataset',
        data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
        ann_file='textdet_train.json',
        filter_cfg=dict(filter_empty_gt=True, min_size=32),
        pipeline=None)
]
test_list = [
    dict(
        type='OCRDataset',
        data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
        ann_file='textdet_test.json',
        test_mode=True,
        pipeline=None)
]
train_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='ConcatDataset',
        datasets=[
            dict(
                type='OCRDataset',
                data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
                ann_file='textdet_train.json',
                filter_cfg=dict(filter_empty_gt=True, min_size=32),
                pipeline=None)
        ],
        pipeline=[
            dict(
                type='LoadImageFromFile',
                color_type='color_ignore_orientation'),
            dict(
                type='LoadOCRAnnotations',
                with_bbox=True,
                with_polygon=True,
                with_label=True),
            dict(
                type='TorchVisionWrapper',
                op='ColorJitter',
                brightness=0.12549019607843137,
                saturation=0.5),
            dict(
                type='ImgAugWrapper',
                args=[['Fliplr', 0.5], {
                    'cls': 'Affine',
                    'rotate': [-10, 10]
                }, ['Resize', [0.5, 3.0]]]),
            dict(type='RandomCrop', min_side_ratio=0.1),
            dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
            dict(type='Pad', size=(1024, 1024)),
            dict(
                type='PackTextDetInputs',
                meta_keys=('img_path', 'ori_shape', 'img_shape'))
        ]))
val_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='ConcatDataset',
        datasets=[
            dict(
                type='OCRDataset',
                data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
                ann_file='textdet_test.json',
                test_mode=True,
                pipeline=None)
        ],
        pipeline=[
            dict(
                type='LoadImageFromFile',
                color_type='color_ignore_orientation'),
            dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
            dict(
                type='LoadOCRAnnotations',
                with_polygon=True,
                with_bbox=True,
                with_label=True),
            dict(
                type='PackTextDetInputs',
                meta_keys=('img_path', 'ori_shape', 'img_shape',
                           'scale_factor', 'instances'))
        ]))
test_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type='ConcatDataset',
        datasets=[
            dict(
                type='OCRDataset',
                data_root='/home/aipf/work/团队共享目录/zhixin/datasets/icdar2015',
                ann_file='textdet_test.json',
                test_mode=True,
                pipeline=None)
        ],
        pipeline=[
            dict(
                type='LoadImageFromFile',
                color_type='color_ignore_orientation'),
            dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
            dict(
                type='LoadOCRAnnotations',
                with_polygon=True,
                with_bbox=True,
                with_label=True),
            dict(
                type='PackTextDetInputs',
                meta_keys=('img_path', 'ori_shape', 'img_shape',
                           'scale_factor', 'instances'))
        ]))
launcher = 'none'
work_dir = 'output/dbpp0417_dcnv2'

gaotongxiao added the community discussion label Nov 12, 2021

gaotongxiao mentioned this issue Jun 27, 2022: Why hmean is alway 0.000 when train toy dataset by textsnake or dbnet #1109 (Closed)

balandongiv mentioned this issue Jul 14, 2022: Best practice for training text detection model #1154 (Open)

qiuzhixin9527 mentioned this issue Apr 20, 2023: dbnet++ Poor detection of long text #1869 (Open)
