Error while training. #8867

yonisoft · 2023-11-11T22:41:01Z

I'm trying to train YOLOv4-tiny on an RTX 4090 on Windows with CUDA 12.1. I installed darknet with vcpkg.
Training on Colab worked, but on my PC I get the error below.
This is the command:
darknet detector train data/obj.data cfg/yolov4-tiny-custom.cfg yolov4-tiny.conv.29 -dont_show -map

Yolov4 tiny config:
`[net]
#filters=(classes+5)x3

# Testing
#batch=1
#subdivisions=1

# Training
batch=64
subdivisions=16
width=640
height=640
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.00261
burn_in=1000
max_batches = 6000
policy=steps
steps=4800,5400
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

##################################

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=3
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 23

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=3
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6
`
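The `#filters=(classes+5)x3` comment at the top of the config is the rule for the `[convolutional]` layer directly before each `[yolo]` layer. As a quick sanity check, here is a minimal Python sketch (the function name is made up for illustration) showing that `classes=3` with three masks per `[yolo]` layer gives the `filters=24` used in the config:

```python
def yolo_head_filters(classes: int, masks_per_layer: int = 3) -> int:
    """filters = (classes + 5) * number of masks in the [yolo] layer.

    The 5 covers the 4 box coordinates plus 1 objectness score.
    """
    return (classes + 5) * masks_per_layer


# Values taken from the config: classes=3, mask = 3,4,5 (three entries)
print(yolo_head_filters(3))
```

If the number of classes changes, both `filters=` lines before the two `[yolo]` layers need to be updated to match.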

The error is:
`
(next mAP calculation at 1000 iterations)
1000: 0.092072, 0.090814 avg loss, 0.002610 rate, 0.218000 seconds, 64000 images, 0.307626 hours left

calculation mAP (mean average precision)...
Detection layer: 30 - type = 28
Detection layer: 37 - type = 28
4
cuDNN status Error in: file: C:\Users\yoni1\Desktop\vcpkg\buildtrees\darknet\src\e778426c57-96aa9384e0.clean\src\convolutional_kernels.cu : forward_convolutional_layer_gpu() : line: 555 : build time: Nov 7 2023 - 01:45:54

cuDNN Error: CUDNN_STATUS_BAD_PARAM`
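For anyone reading the progress line in the log, here is a small Python sketch that pulls the numbers out of it. The regex and field names are illustrative assumptions, not something darknet provides. The `64000 images` figure is consistent with 1000 iterations at `batch=64`.

```python
import re

# Pattern for a darknet training progress line like the one in the log.
# The group names are illustrative labels, not darknet terminology.
LINE_RE = re.compile(
    r"^\s*(?P<iteration>\d+):\s*(?P<loss>[\d.]+),\s*(?P<avg_loss>[\d.]+) avg loss,\s*"
    r"(?P<rate>[\d.]+) rate,\s*(?P<seconds>[\d.]+) seconds,\s*"
    r"(?P<images>\d+) images"
)


def parse_progress(line: str) -> dict:
    """Return the numeric fields of one progress line, or raise ValueError."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a progress line: {line!r}")
    fields = m.groupdict()
    return {
        "iteration": int(fields["iteration"]),
        "loss": float(fields["loss"]),
        "avg_loss": float(fields["avg_loss"]),
        "rate": float(fields["rate"]),
        "seconds": float(fields["seconds"]),
        "images": int(fields["images"]),
    }


line = ("1000: 0.092072, 0.090814 avg loss, 0.002610 rate, "
        "0.218000 seconds, 64000 images, 0.307626 hours left")
info = parse_progress(line)
# Images seen so far should equal iteration * batch (batch=64 in the config).
print(info["images"] == info["iteration"] * 64)
```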

yonisoft · 2023-11-12T17:20:20Z

Solved by changing subdivisions from 16 to 64,
or by just taking off -map.
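For context on what the subdivisions change does: darknet divides `batch` by `subdivisions` to get the number of images sent through the GPU per forward pass. Below is a minimal Python sketch of that arithmetic (the helper name is hypothetical). Whether the mini-batch size is the actual cause of the cuDNN error is not confirmed in this thread; the sketch only shows how the setting changes it.

```python
def mini_batch(batch: int, subdivisions: int) -> int:
    """Images processed per forward pass: batch / subdivisions.

    This sketch assumes batch divides evenly by subdivisions.
    """
    if batch % subdivisions != 0:
        raise ValueError("this sketch assumes batch is divisible by subdivisions")
    return batch // subdivisions


print(mini_batch(64, 16))  # original config: batch=64, subdivisions=16
print(mini_batch(64, 64))  # after the change: batch=64, subdivisions=64
```

With the original settings the mini-batch is 4 images; after the change it is 1.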

ramanrewati · 2024-06-06T17:40:51Z

@yonisoft can you please share the Colab notebook with me?

stephanecharette · 2024-06-06T17:45:38Z

@yonisoft and @ramanrewati:

This error is fixed in the new Darknet/YOLO repo: https://github.com/hank-ai/darknet#table-of-contents

ramanrewati · 2024-06-06T17:48:41Z

Thanks, I'm trying this one rn, hope I don't run into errors again 🙂

ramanrewati · 2024-06-06T17:55:42Z

@stephanecharette can I get the link to the Colab notebook? I don't know how the new repo will work.

stephanecharette · 2024-06-06T17:57:19Z

What colab notebook?

stephanecharette · 2024-06-06T17:58:02Z

https://discord.com/channels/741676058666860635/1184564987511963829

ramanrewati · 2024-06-06T18:00:37Z

> What colab notebook?

The one I can use to train YOLOv4. Checking Discord rn.
