Extreme Small Deep Field-Weighted Factorization Machine - xsDeepFwFM
Acceleration and Compression of DeepFwFM.
DeepFwFM:
@inproceedings{deeplight,
title={DeepLight: Deep Lightweight Feature Interactions for Accelerating CTR Predictions in Ad Serving},
author={Wei Deng and Junwei Pan and Tian Zhou and Deguang Kong and Aaron Flores and Guang Lin},
booktitle={International Conference on Web Search and Data Mining (WSDM'21)},
year={2021}
}
This repository applies additional model compression and acceleration techniques to DeepFwFM.
Techniques used:
QR Embeddings
Quantization
Knowledge Distillation
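
As a rough illustration of the first technique, the quotient-remainder (QR) trick replaces one large embedding table with two small ones and combines their rows. The sketch below is pure Python and is not the code used in this repository. The class name `QREmbedding`, the meaning given to `num_collisions` (how many category ids share one quotient row), and the element-wise product used to combine the two rows are all assumptions made for illustration.

```python
import math
import random


class QREmbedding:
    """Quotient-remainder embedding for one categorical feature (illustrative only)."""

    def __init__(self, num_categories, dim, num_collisions, seed=0):
        rng = random.Random(seed)
        self.num_collisions = num_collisions
        # Quotient table: one row per group of `num_collisions` consecutive ids.
        num_quotients = math.ceil(num_categories / num_collisions)
        self.q_table = [[rng.uniform(-1, 1) for _ in range(dim)]
                        for _ in range(num_quotients)]
        # Remainder table: one row per position inside a group.
        self.r_table = [[rng.uniform(-1, 1) for _ in range(dim)]
                        for _ in range(num_collisions)]

    def num_rows(self):
        """Total number of stored embedding rows across both tables."""
        return len(self.q_table) + len(self.r_table)

    def lookup(self, index):
        """Combine the quotient row and the remainder row for one category id."""
        q, r = divmod(index, self.num_collisions)
        # Assumption: element-wise product as the combining operation.
        return [a * b for a, b in zip(self.q_table[q], self.r_table[r])]


emb = QREmbedding(num_categories=1000, dim=4, num_collisions=4)
print(emb.num_rows())   # 254 table rows instead of 1000
print(emb.lookup(10))
```

Each id maps to a unique (quotient, remainder) pair, so ids still receive distinct vectors while the number of stored rows shrinks roughly by the collision factor.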
Evaluation uses the Criteo dataset and the Twitter dataset provided by the RecSys 2020 Challenge.
Split: first 6 days for training; the last day is split 50% validation and 50% test
Epochs: 50 with early stopping
Batch Size (Training): 2048
Hardware: Intel Xeon E3-1231 v3 (CPU) & NVIDIA GTX 970 (GPU)

| Model | LogLoss | AUC | # Parameters |
|---|---|---|---|
| LR | 0.4614 | 0.7899 | 1,086,811 |
| FM | 0.4555 | 0.7971 | 11,954,911 |
| DeepFM | 0.4475 | 0.8056 | 12,434,912 |
| xDeepFM | 0.4454 | 0.8078 | 15,508,958 |

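
| Model | 1 (CPU) | 64 | 128 | 256 | 512 | 512 (GPU) | 1024 | 2048 | 4096 | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.046 | 0.050 | 0.078 | 0.116 | 0.178 | 0.285 | 0.320 | 0.353 | 0.419 | |
| FM | 0.398 | 0.640 | 0.770 | 1.138 | 2.018 | 2.130 | 2.319 | 2.498 | 2.604 | |
| DeepFM | 1.062 | 20.000 | 38.186 | 72.644 | 142.776 | 4.215 | 3.806 | 4.646 | 5.914 | |
| xDeepFM | 14.964 | 1267.092 | 2531.474 | 5011.780 | 10080.782 | 20.278 | 29.985 | 60.795 | 119.035 | |
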
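
The README does not state how the per-batch timings were collected. A minimal sketch of one common approach is shown here: warm up, then take the median wall-clock time of repeated forward passes at each batch size. `measure_latency_ms` and `toy_predict` are hypothetical names, the warm-up and repeat counts are arbitrary, and the 39-value rows are placeholder inputs rather than real data.

```python
import statistics
import time


def measure_latency_ms(predict, batch, warmup=5, repeats=20):
    """Return the median wall-clock time of predict(batch) in milliseconds."""
    for _ in range(warmup):
        predict(batch)  # warm-up runs are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def toy_predict(batch):
    # Stand-in for a model forward pass; a real model would be called here.
    return [sum(row) for row in batch]


for batch_size in (1, 64, 128, 256, 512):
    batch = [[0.5] * 39 for _ in range(batch_size)]
    latency = measure_latency_ms(toy_predict, batch)
    print(f"batch size {batch_size}: {latency:.4f} ms")
```

For GPU measurements the device would also need to be synchronized before reading the clock, because kernels are launched asynchronously.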

| Embedding | Quantization | # Deep Nodes | LogLoss | AUC | PRAUC | RCE | # Parameters | Size (MB) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| - | - | (0,0,0) | 0.4473 | 0.8058 | 0.6106 | 21.71 | 11,956,823 | 47.84 | |
| - | - | (400,400,400) | 0.4446 | 0.8086 | 0.6159 | 22.18 | 12,436,824 | 49.780 | |
| QR | - | (400,400,400) | 0.4452 | 0.8080 | 0.6150 | 22.08 | 7,008,374 | 28.073 | 2 collisions |
| QR | - | (400,400,400) | 0.4460 | 0.8069 | 0.6139 | 21.93 | 4,294,354 | 17.217 | 4 collisions |
| QR | - | (400,400,400) | 0.4496 | 0.8031 | 0.6076 | 21.31 | 1,771,504 | 7.125 | 60 collisions |
| - | Dynamic | (400,400,400) | 0.4446 | 0.8086 | 0.6159 | 22.17 | 12,436,824 | 48.35 | |
| - | Static | (400,400,400) | 0.4448 | 0.8085 | 0.6157 | 22.16 | 12,436,824 | 24.46 | |
| - | QAT | (400,400,400) | 0.4459 | 0.8073 | 0.6135 | 21.94 | 12,436,824 | 24.46 | |
| - | - | (200,200,200) | 0.4446 | 0.8086 | 0.6160 | 22.18 | 11,028,101 | 44.138 | KD, a=0.1, t=3 |
| - | - | (200,200,200) | 0.4449 | 0.8083 | 0.6154 | 22.13 | 11,028,101 | 44.138 | no KD |
| - | - | (100,100,100) | 0.4450 | 0.8082 | 0.6152 | 22.11 | 10,928,101 | 43.736 | KD, a=0.1, t=3 |
| - | - | (100,100,100) | 0.4453 | 0.8078 | 0.6146 | 22.06 | 10,928,101 | 43.736 | no KD TODO |

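
To relate the "# Parameters" and "Size (MB)" columns, the short calculation below derives the storage cost per parameter implied by a few of the rows above. It assumes 1 MB means 10^6 bytes, which the README does not state. `bytes_per_parameter` is a hypothetical helper name.

```python
def bytes_per_parameter(size_mb, num_parameters):
    """Implied storage cost per parameter, taking 1 MB as 10**6 bytes (assumption)."""
    return size_mb * 1_000_000 / num_parameters


# (size in MB, parameter count) copied from the results listed above.
rows = {
    "no compression, (400,400,400)": (49.780, 12_436_824),
    "QR, 2 collisions": (28.073, 7_008_374),
    "static quantization": (24.46, 12_436_824),
}

for name, (size_mb, params) in rows.items():
    print(f"{name}: {bytes_per_parameter(size_mb, params):.2f} bytes/parameter")
```

Under that assumption the unquantized rows come out at about 4 bytes per parameter, which is consistent with 32-bit floats, while the statically quantized model comes out at about 2 bytes per parameter on average.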
Embeddings - Latency (ms)

| Embedding | # Deep Nodes | 1 (CPU) | 64 | 128 | 256 | 512 | 512 (GPU) | 1024 | 2048 | 4096 | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Embedding | (0,0,0) | 1.681 | 5.788 | 9.834 | 18.228 | 36.250 | 5.103 | 6.657 | 9.867 | 15.849 | |
| Embedding | (400,400,400) | 3.652 | 72.418 | 143.632 | 279.482 | 549.400 | 7.029 | 7.806 | 11.757 | 18.015 | |
| QR EmbeddingBag | (400,400,400) | 2.482 | 76.736 | 151.032 | 301.912 | 609.880 | 6.772 | 9.772 | 14.440 | 24.746 | 2 collisions TODO measure again |
| QR EmbeddingBag | (400,400,400) | 4.142 | 67.244 | 129.276 | 253.910 | 504.246 | 9.995 | 10.356 | 14.040 | 20.046 | 4 collisions |
| QR EmbeddingBag | (400,400,400) | 4.284 | 39.628 | 75.712 | 147.116 | 288.920 | 9.275 | 10.506 | 14.431 | 21.580 | 60 collisions |

Quantization - Latency (ms)

| Quantization | 1 (CPU) | 64 | 128 | 256 | 512 |
|---|---|---|---|---|---|
| None | 3.652 | 72.418 | 143.632 | 279.482 | 549.400 |
| Dynamic | 2.876 | 9.220 | 14.284 | 25.728 | 49.082 |
| Static | 5.062 | 10.139 | 16.034 | 27.146 | 49.352 |
| QAT | 5.457 | 11.141 | 16.404 | 26.892 | 46.676 |

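
Dynamic, static, and quantization-aware training (QAT) differ in when the quantization parameters are chosen, but all three store values as low-precision integers together with a scale and a zero point. The sketch below shows only that basic int8 affine mapping in pure Python. It does not reproduce any of the three modes and is not the code used in this repository. The function names and the per-tensor min/max calibration are assumptions.

```python
def quantize_int8(values):
    """Affine quantization of floats to int8 with one scale/zero-point per tensor."""
    # The representable range must contain 0.0 so that zero maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0          # fall back to 1.0 for an all-zero tensor
    zero_point = round(-128 - lo / scale)     # integer that represents the value 0.0
    quantized = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
print(codes)
print([round(w, 4) for w in restored])
```

Each restored weight differs from the original by at most about one quantization step (`scale`), while storage drops from 4 bytes to 1 byte per value.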
Knowledge Distillation - Latency (ms)

| Embedding | # Deep Nodes | 1 (CPU) | 64 | 128 | 256 | 512 | 512 (GPU) | 1024 | 2048 | 4096 |
|---|---|---|---|---|---|---|---|---|---|---|
| EmbeddingBag | (400,400,400) | 3.652 | 72.418 | 143.632 | 279.482 | 549.400 | 7.029 | 7.806 | 11.757 | 18.015 |
| EmbeddingBag | (200,200,200) | 0.618 | 10.626 | 18.752 | 37.428 | 73.060 | 2.489 | 2.803 | 3.175 | 3.886 |
| EmbeddingBag | (100,100,100) | 0.536 | 2.812 | 4.818 | 8.454 | 16.022 | 2.520 | 2.817 | 3.094 | 3.325 |

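
The smaller networks above are students trained with knowledge distillation (the "KD, a=0.1, t=3" notes in the results). A minimal sketch of a distillation loss for a binary click label follows. It assumes that `a` weights the soft (teacher) term, that `t` is the softmax temperature applied to both logits, and that the soft term is rescaled by `t**2`. The README does not confirm any of these conventions, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(p, target, eps=1e-12):
    """Binary cross entropy between a predicted probability and a (soft) target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(student_logit, teacher_logit, label, alpha=0.1, temperature=3.0):
    """Weighted sum of the hard-label loss and the temperature-softened teacher loss."""
    hard = bce(sigmoid(student_logit), label)
    soft = bce(sigmoid(student_logit / temperature),
               sigmoid(teacher_logit / temperature))
    # Assumption: alpha weights the soft term; t**2 keeps gradient scales comparable.
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard


print(distillation_loss(student_logit=0.5, teacher_logit=2.0, label=1.0))
```

With `alpha=0.0` the expression reduces to the ordinary log loss on the label, which corresponds to the "no KD" rows.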

| Embedding | Quantization | # Deep Nodes | LogLoss | AUC | PRAUC | RCE | # Parameters | Size (MB) | Notes |
|---|---|---|---|---|---|---|---|---|---|
| QR Embedding | None | (200,200,200) | 0.4449 | 0.8083 | 0.6155 | 22.13 | 5,599,651 | 22.431 | KD + 2 coll |
| QR Embedding | None | (200,200,200) | 0.4455 | 0.8076 | 0.6143 | 22.02 | 2,885,631 | 11.575 | KD + 4 coll |
| QR Embedding | None | (100,100,100) | 0.4459 | 0.8072 | 0.6138 | 21.96 | 2,785,631 | 11.172 | KD + 4 coll |
| Embedding | Static | (200,200,200) | 0.4448 | 0.8082 | 0.6159 | 22.15 | 11,028,101 | 19.781 | KD |
| Embedding | Static | (100,100,100) | 0.4452 | 0.8078 | 0.6151 | 22.07 | 10,928,101 | 19.671 | KD |
| QR Embedding | Static | (200,200,200) | 0.4451 | 0.8080 | 0.6153 | 22.09 | 5,599,651 | 21.947 | KD + 2 coll |


| Embedding | Quantization | # Deep Nodes | 1 (CPU) | 64 | 128 | 256 | 512 | Notes |
|---|---|---|---|---|---|---|---|---|
| QR Embedding | None | (200,200,200) | 1.486 | 10.458 | 19.296 | 37.062 | 72.264 | KD + 4 coll |
| QR Embedding | None | (100,100,100) | 1.260 | 3.058 | 4.778 | 8.518 | 16.136 | KD + 4 coll |
| Embedding | Static | (200,200,200) | 6.132 | 6.826 | 6.772 | 7.134 | 9.414 | KD |
| Embedding | Static | (100,100,100) | 5.432 | 5.726 | 6.120 | 6.290 | 7.240 | KD |
| QR Embedding | Static | (200,200,200) | 3.898 | 4.414 | 4.888 | 5.544 | 7.062 | KD + 2 coll |

Twitter
Threshold: 15
Epochs: 50 with early stopping
Dropout: 0.2

| Model | LogLoss | AUC | PRAUC | RCE | # Parameters | Size (MB) | Notes |
|---|---|---|---|---|---|---|---|
| FwFM | 0.3248 | 0.9368 | 0.9049 | 52.19 | 131,425,764 | 525.721 | Like |
| FwFM | 0.2202 | 0.8754 | 0.5223 | 31.35 | 131,425,764 | 525.721 | Retweet |
| FwFM | 0.0988 | 0.8463 | 0.1388 | 16.90 | 131,425,764 | 525.721 | Reply |
| FwFM | 0.0360 | 0.8231 | 0.0456 | 11.43 | 131,425,764 | 525.721 | Retweet with comment |
| DeepFwFM | 0.3147 | 0.9385 | 0.9077 | 53.67 | 131,937,765 | 527.786 | Like |
| DeepFwFM | 0.2210 | 0.8731 | 0.5121 | 31.08 | 131,937,765 | 527.786 | Retweet |
| DeepFwFM | 0.0965 | 0.8466 | 0.1372 | 18.82 | 131,937,765 | 527.786 | Reply |
| DeepFwFM | 0.0355 | 0.8183 | 0.0416 | 12.56 | 131,937,765 | 527.786 | Retweet with comment |
| xsDeepFwFM | 0.3226 | 0.9372 | 0.9046 | 52.51 | 59,917,741 | 239.705 | Like |
| xsDeepFwFM | 0.2200 | 0.8739 | 0.5155 | 31.42 | 59,917,741 | 239.705 | Retweet |
| xsDeepFwFM | 0.0961 | 0.8479 | 0.1424 | 19.11 | 59,917,741 | 239.705 | Reply |
| xsDeepFwFM | 0.0358 | 0.8124 | 0.0388 | 11.92 | 59,917,741 | 239.705 | Retweet with comment |

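
The RCE column is assumed here to be the relative cross entropy used as a metric in the RecSys 2020 Challenge: the percentage improvement in log loss over a baseline that always predicts the observed positive rate. A minimal pure-Python sketch under that assumption is shown below. The function names and the toy labels and predictions are illustrative only.

```python
import math


def log_loss(labels, preds, eps=1e-15):
    """Mean binary cross entropy; predictions are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def relative_cross_entropy(labels, preds):
    """Percentage improvement in log loss over always predicting the positive rate."""
    base_rate = sum(labels) / len(labels)
    baseline = log_loss(labels, [base_rate] * len(labels))
    return (1.0 - log_loss(labels, preds) / baseline) * 100.0


labels = [1, 0, 0, 1]
print(relative_cross_entropy(labels, [0.9, 0.1, 0.2, 0.8]))  # positive: better than baseline
print(relative_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5]))  # equals the baseline
```

A model that only predicts the base rate scores 0, so higher values indicate a larger gain over that naive predictor, and negative values indicate a model that is worse than it.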

| Model | 1 (CPU) | 64 | 128 | 256 | 512 |
|---|---|---|---|---|---|
| FwFM | 0.780 | 19.816 | 39.906 | 80.346 | 160.170 |
| DeepFwFM | 2.024 | 74.256 | 147.038 | 290.154 | 578.644 |
| xsDeepFwFM | 5.544 | 6.160 | 6.764 | 7.648 | 10.464 |