This deep learning tutorial is a curated summary intended to help newcomers who want to join a deep learning group.

  • More state-of-the-art papers and methods will be added over time.

Book List

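Chinese Books

  • Machine Learning in Action (《机器学习实战》), Peter Harrington

  • Machine Learning (《机器学习》), Zhou Zhihua (周志华)

  • Statistical Learning Methods (《统计学习方法》), Li Hang (李航)

  • Neural Networks and Deep Learning (《神经网络与深度学习》), Qiu Xipeng (邱锡鹏). link

  • Deep Learning (《深度学习》), Ian Goodfellow, Yoshua Bengio, et al. link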

English Books

Course List

Paper List


Computer Vision

Image Revolution

[0] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. (LeNet-5) ⭐⭐⭐⭐⭐

[1] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. (AlexNet, Deep Learning Breakthrough) ⭐⭐⭐⭐⭐

[2] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).(VGGNet,Neural Networks become very deep!) ⭐⭐⭐

[3] Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.(GoogLeNet) ⭐⭐⭐

[4] He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).(ResNet,Very very deep networks, CVPR best paper) ⭐⭐⭐⭐⭐

Object Detection

[0] Evan Shelhamer, Jonathan Long, Trevor Darrell: Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2017). FCN

[1] Ross B. Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014. RCNN

[2] Ross B. Girshick: Fast R-CNN. ICCV 2015. Fast RCNN

[3] Shaoqing Ren, Kaiming He, Ross B. Girshick, Jian Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015. Faster RCNN

[4] Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross B. Girshick: Mask R-CNN. ICCV (2017). Mask RCNN

[5] Object Detection Summary

Semantic Segmentation

[0] J. Long, E. Shelhamer, and T. Darrell. "Fully convolutional networks for semantic segmentation." In CVPR, 2015. ⭐⭐⭐⭐⭐

[1] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. "Semantic image segmentation with deep convolutional nets and fully connected crfs." In ICLR, 2015. ⭐⭐⭐⭐⭐

[2] Pinheiro, P.O., Collobert, R., Dollar, P. "Learning to segment object candidates." In: NIPS. 2015.

[3] Dai, J., He, K., Sun, J. "Instance-aware semantic segmentation via multi-task network cascades." in CVPR. 2016

[4] Dai, J., He, K., Sun, J. "Instance-sensitive Fully Convolutional Networks." arXiv preprint arXiv:1603.08678(2016).

[5] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille:Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. CoRR abs/1412.7062 (2014). deeplab1, deeplab1_ppt

[6] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille:DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. CoRR abs/1606.00915 (2016). deeplab2

[7] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4): 834-848 (2018). deeplab3


Super Resolution

[0] Dong, Chao, et al. "Image super-resolution using deep convolutional networks." IEEE transactions on pattern analysis and machine intelligence 38.2 (2016): 295-307. SRCNN

[1] Kim, Jiwon, Jung Kwon Lee, and Kyoung Mu Lee. "Deeply-recursive convolutional network for image super-resolution." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. DRCN

[2] Shi, Wenzhe, et al. "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ESPCN

[3] Caballero, Jose, et al. "Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation." arXiv preprint arXiv:1611.05250 (2016).VESPCN

[4] Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." arXiv preprint arXiv:1609.04802 (2016).SRGAN

[5] Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen:Progressive Growing of GANs for Improved Quality, Stability, and Variation. CoRR abs/1710.10196 (2017).PGGAN

[6] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro:High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. CoRR abs/1711.11585 (2017).Pix2PixHD

[7] Haris, M., Shakhnarovich, G., Ukita, N. "Deep Back-Projection Networks For Super-Resolution." arXiv preprint arXiv:1803.02735 (2018). DBPN, supplementary material

Deep Learning in SLAM

Depth and Pose

[0] Keisuke Tateno, Federico Tombari, Iro Laina, Nassir Navab: CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction. CVPR (2017) ⭐⭐⭐⭐⭐

[1] Vikram Mohanty, Shubh Agrawal, Shaswat Datta, Arna Ghosh, Vishnu Dutt Sharma, Debashish Chakravarty:DeepVO: A Deep Learning approach for Monocular Visual Odometry. CoRR abs/1611.06069 (2016)

[2] Sen Wang, Ronald Clark, Hongkai Wen, Niki Trigoni:DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks. ICRA 2017: 2043-2050

[3] Benjamin Ummenhofer, Huizhong Zhou, Jonas Uhrig, Nikolaus Mayer, Eddy Ilg, Alexey Dosovitskiy, Thomas Brox:DeMoN: Depth and Motion Network for Learning Monocular Stereo. CoRR abs/1612.02401 (2016)

[4] Florian Walch, Caner Hazirbas, Laura Leal-Taixé, Torsten Sattler, Sebastian Hilsenbeck, Daniel Cremers:Image-based Localization with Spatial LSTMs. CoRR abs/1611.07890 (2016)

[5] Alex Kendall, Roberto Cipolla:Geometric loss functions for camera pose regression with deep learning. CoRR abs/1704.00390 (2017)

[6] Kishore Reddy Konda, Roland Memisevic:Learning Visual Odometry with a Convolutional Network. VISAPP (1) 2015: 486-490

[7] Yevhen Kuznietsov, Jörg Stückler, Bastian Leibe: Semi-Supervised Deep Learning for Monocular Depth Map Prediction. CoRR abs/1702.02706 (2017) ⭐⭐⭐⭐⭐

[8] Ruihao Li, Sen Wang, Zhiqiang Long, Dongbing Gu:UnDeepVO: Monocular Visual Odometry through Unsupervised Deep Learning. CoRR abs/1709.06841 (2017)

[9] Kishore Reddy Konda, Roland Memisevic:Unsupervised learning of depth and motion. CoRR abs/1312.3429 (2013)

[10] Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe: Unsupervised Learning of Depth and Ego-Motion from Video. CoRR abs/1704.07813 (2017) ⭐⭐⭐⭐⭐

[11] Clément Godard, Oisin Mac Aodha, Gabriel J. Brostow: Unsupervised Monocular Depth Estimation with Left-Right Consistency. CoRR abs/1609.03677 (2016) ⭐⭐⭐⭐⭐

Optical Flow

[12] Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data

[13] Anurag Ranjan, Michael J. Black:Optical Flow Estimation using a Spatial Pyramid Network. CoRR abs/1611.00850 (2016)

[14] Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox: FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. CoRR abs/1612.01925 (2016) ⭐⭐⭐⭐⭐

Future of SLAM

[0] The Future of Real-Time SLAM and Deep Learning vs SLAM.SLAM

Other state-of-the-art Paper

[0] Dan C. Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, Jürgen Schmidhuber:High-Performance Neural Networks for Visual Object Classification. CoRR abs/1102.0183 (2011)

[1] T Miyato, S Maeda, M Koyama, K Nakae, S Ishii:Distributional Smoothing With Virtual Adversarial Training. CS(2015)

[2] Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton: Dynamic Routing Between Capsules. NIPS (2017) ⭐⭐⭐⭐⭐

[3] Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen:Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR(2018)

[4] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Shin Ishii:Virtual Adversarial Training: a Regularization Method for Supervised and Semi-supervised Learning. CoRR abs/1704.03976 (2017)

Report of Computer Vision

[0] A Year in Computer Vision. cv

Natural Language Processing

Speech Recognition

Reinforcement Learning

Transfer Learning


[0] A Concise Handbook of Transfer Learning (迁移学习简明手册). link


Unsupervised Model

[0] Le, Quoc V. "Building high-level features using large scale unsupervised learning." 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, 2013.(Milestone, Andrew Ng, Google Brain Project, Cat)

[1] Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).(VAE) ⭐⭐⭐⭐⭐

[2] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.(GAN,super cool idea) ⭐⭐⭐⭐⭐

[3] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).(DCGAN) ⭐⭐⭐⭐⭐

[4] Gregor, Karol, et al. "DRAW: A recurrent neural network for image generation." arXiv preprint arXiv:1502.04623 (2015). (VAE with attention, outstanding work) ⭐⭐⭐⭐⭐

[5] Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. "Pixel recurrent neural networks." arXiv preprint arXiv:1601.06759 (2016). (PixelRNN)

[6] Oord, Aaron van den, et al. "Conditional image generation with PixelCNN decoders." arXiv preprint arXiv:1606.05328 (2016).

[7] Aäron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Koray Kavukcuoglu, Oriol Vinyals, Alex Graves: Conditional Image Generation with PixelCNN Decoders. NIPS 2016: 4790-4798.pixelCNN

[8] Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma: PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications. CoRR abs/1701.05517 (2017).PixelCNN++


RNN(Recurrent Neural Networks)

[0] Graves, Alex. "Generating sequences with recurrent neural networks." arXiv preprint arXiv:1308.0850 (2013). (LSTM, very nice generation results, shows the power of RNNs)

[1] Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).(First Seq-to-Seq Paper)

[2] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.(Outstanding Work) ⭐⭐⭐⭐⭐

[3] Bahdanau, Dzmitry, KyungHyun Cho, and Yoshua Bengio. "Neural Machine Translation by Jointly Learning to Align and Translate." arXiv preprint arXiv:1409.0473 (2014).

[4] Vinyals, Oriol, and Quoc Le. "A neural conversational model." arXiv preprint arXiv:1506.05869 (2015).(Seq-to-Seq on Chatbot)

[5] Understanding LSTM Networks ⭐⭐⭐⭐⭐

CNN(Convolutional Neural Networks)

[0] Dilated Convolutional Kernel - Fisher Yu, Vladlen Koltun:Multi-Scale Context Aggregation by Dilated Convolutions. ICLR(2016)

[1] Deformable Convolutional Kernel - Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei:Deformable Convolutional Networks. CoRR abs/1703.06211 (2017)

[2] Convolution Operations. link

[3] Convolution Analyzer. link

[4] What Do We Understand About Convolutional Networks? link

Lightweight Convolutional Neural Networks

[0] Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, Kurt Keutzer:SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. CoRR abs/1602.07360 (2016). SqueezeNet

[1] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam:MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861 (2017). MobileNets

[2] Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. CoRR abs/1801.04381 (2018). MobileNets_V2

[3] François Chollet:Xception: Deep Learning with Depthwise Separable Convolutions. CVPR 2017: 1800-1807. Xception

[4] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun:ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. CoRR abs/1707.01083 (2017). ShuffleNet

[5] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le: Learning Transferable Architectures for Scalable Image Recognition. CoRR abs/1707.07012 (2017). NasNet

[6] Robert J. Wang, Xiang Li, Shuang Ao, Charles X. Ling:Pelee: A Real-Time Object Detection System on Mobile Devices. CoRR abs/1804.06882 (2018). PeleeNet

Model Constraints

[0] Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012). (Dropout)

[1] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.

[2] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).(An outstanding Work in 2015)

[3] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016). (Update of Batch Normalization)

[4] Courbariaux, Matthieu, et al. "Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1." (New Model, Fast)

[5] Jaderberg, Max, et al. "Decoupled neural interfaces using synthetic gradients." arXiv preprint arXiv:1608.05343 (2016). (Innovation of Training Method,Amazing Work) ⭐⭐⭐⭐⭐

[6] Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. "Net2net: Accelerating learning via knowledge transfer." arXiv preprint arXiv:1511.05641 (2015). (Modify previously trained network to reduce training epochs)

[7] Wei, Tao, et al. "Network Morphism." arXiv preprint arXiv:1603.01670 (2016). (Modify previously trained network to reduce training epochs)

[8] Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." CoRR abs/1510.00149 (2015). (ICLR best paper, new direction to make NNs run fast, DeePhi Tech Startup) ⭐⭐⭐⭐⭐

[9] Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size." arXiv preprint arXiv:1602.07360 (2016). (Also a new direction to optimize NNs, DeePhi Tech Startup)


Optimization Methods

[0] Sebastian Ruder: An overview of gradient descent optimization algorithms. CoRR abs/1609.04747 (2016) ⭐⭐⭐⭐⭐

[1] Back Propagation Algorithm

[2] Andrychowicz, Marcin, et al. "Learning to learn by gradient descent by gradient descent." arXiv preprint arXiv:1606.04474 (2016).(Neural Optimizer,Amazing Work)

Optimization Functions

  • Momentum
  • Nesterov accelerated gradient
  • Adagrad
  • Adadelta
  • RMSprop
  • Adam
  • AdaMax
  • Nadam

⭐⭐⭐⭐⭐ Adam is usually a good default choice.
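
To make two of the optimizers listed above concrete, here is a minimal sketch of the classical momentum and Adam update rules on a single scalar parameter. The toy objective f(x) = (x - 3)^2, the function names, and the hyperparameter values are illustrative assumptions, not part of any particular framework; real libraries apply the same rules element-wise to tensors.

```python
import math


def sgd_momentum(grad_fn, x0, lr=0.1, beta=0.9, steps=300):
    """Gradient descent with classical momentum on a scalar parameter."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        # Velocity is an exponentially decaying sum of past gradients.
        velocity = beta * velocity + g
        x -= lr * velocity
    return x


def adam(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam: momentum on the gradient plus a per-parameter step-size scaling."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction for zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def grad(x):
    """Gradient of the hypothetical toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


if __name__ == "__main__":
    print("momentum:", sgd_momentum(grad, 0.0))
    print("adam:    ", adam(grad, 0.0))
```

Both runs should approach the minimizer x = 3. The practical appeal of Adam is that the division by the square root of the second-moment estimate makes the step size far less sensitive to the raw gradient scale.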

Types of Activation

  • sigmoid
  • hard sigmoid
  • tanh
  • relu
  • leaky relu
  • elu
  • selu
  • prelu
  • maxout
  • swish
  • softplus
  • softshrink
  • softsign
  • tanhshrink
  • softmin
  • softmax
  • logsoftmax
  • softmax2d
  • etc.

relu, leaky relu, tanh, and sigmoid are strongly recommended.
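
As a quick reference for several of the activations listed above, the following sketch writes them out as plain scalar Python functions. The function names and default parameter values (for example the 0.01 negative slope for leaky ReLU) are common conventions assumed here for illustration; frameworks provide vectorized equivalents.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent, a zero-centered squashing function."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: passes positives, zeroes out negatives."""
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    """Like ReLU, but keeps a small slope for negative inputs."""
    return x if x > 0 else negative_slope * x


def elu(x, alpha=1.0):
    """Exponential linear unit: smooth saturation toward -alpha."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softplus(x):
    """Smooth approximation of ReLU, log(1 + exp(x)), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(xs):
    """Turns a list of scores into probabilities that sum to 1."""
    shift = max(xs)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    for f in (sigmoid, tanh, relu, leaky_relu, elu, softplus):
        print(f.__name__, [round(f(v), 4) for v in (-2.0, 0.0, 2.0)])
    print("softmax", softmax([1.0, 2.0, 3.0]))
```

Unlike the others, softmax acts on a whole vector rather than a single value, which is why it is typically used on the output layer for classification.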

Journals and Periodicals

Machine Learning and Theories

  • NIPS
  • ICML
  • ICLR

 Computer Vision

  • CVPR
  • ICCV
  • ECCV

Natural Language Processing

  • ACL

Artificial Intelligence

  • AAAI

Public Accounts

  • 机器之心 (Synced)
  • 新智元 (AI Era)

Deep Learning Framework(open source framework)

New Architecture

  • Convolutional Neural Networks

  • Recurrent Neural Networks

  • Generative Adversarial Networks

  • Capsules(Dynamic Routing Between Capsules--by Hinton)



    official code

  • DenseNet: Densely Connected Convolutional Networks. DenseNet

  • DiracNets: Training Very Deep Neural Networks Without Skip-Connections. DiracNet

  • Non-local Neural Networks. Non-Local Nets

  • Convolutional Neural Networks with Alternately Updated Clique. CliqueNet

Other Sources

Generative Adversarial Networks (GAN):

  • GAN Paper

  • GAN Tricks

  • GAN Tutorial, CVPR 2018

  • From GAN to WGAN

  • GAN Codes




  • GAN Performance Report

  • GAN video

  • 10 papers on GANs (strongly recommended)

    • Progressive Growing of GANs for Improved Quality, Stability, and Variation
    • Spectral Normalization for Generative Adversarial Networks
    • cGANs with Projection Discriminator
    • High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
    • Are GANs Created Equal? A Large-Scale Study
    • Improved Training of Wasserstein GANs
    • StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
    • Privacy-preserving generative deep neural networks support clinical data sharing
    • Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks
    • Gradient descent GAN optimization is locally stable
  • Something interesting about GAN

    (1) CycleGAN

    (2) Progressive Growing of GANs

Deep Architecture Genealogy

Python Resources

Computer Vision


Geometry and SLAM

Object Detection

Face Datasets

Dehazing Datasets

Deraining Datasets

