Awesome Referring Expression Comprehension

Inspired by awesome-grounding and this survey.

A curated list of research papers on Referring Expression Comprehension (REC). Links to code and project websites are included where available.

Paper List

Survey

Referring Expression Comprehension: A Survey of Methods and Datasets. Yanyuan Qiao, Chaorui Deng, and Qi Wu. arXiv, 2020. [Paper]

Dataset

[RefCOCOg] Generation and Comprehension of Unambiguous Object Descriptions. Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, and Kevin Murphy. CVPR, 2016. [Paper] [Code]
[RefCOCO, RefCOCO+] Modeling context in referring expressions. Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, and Tamara L. Berg. ECCV, 2016. [Paper] [Code]
[CLEVR-Ref+] CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions. Runtao Liu, Chenxi Liu, Yutong Bai, and Alan Yuille. CVPR, 2019. [Paper] [Code] [Website]
[Cops-Ref] Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension. Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K. Wong, and Qi Wu. CVPR, 2020. [Paper] ~~[Code]~~
[Ref-Reasoning] Graph-Structured Referring Expression Reasoning in The Wild. Sibei Yang, Guanbin Li, and Yizhou Yu. CVPR, 2020. [Paper] [Code] [Website]

arXiv

(TransVG) TransVG: End-to-End Visual Grounding with Transformers. Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, and Houqiang Li. arXiv, 2021. [Paper]
(ECIFA) Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge. Peng Wang, Dongyang Liu, Hui Li, and Qi Wu. arXiv, 2020. [Paper]
(JVGN) Joint Visual Grounding with Language Scene Graphs. Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Meng Wang, and Qianru Sun. arXiv, 2019. [Paper] (I am an author of the paper)
A Real-time Global Inference Network for One-stage Referring Expression Comprehension. Yiyi Zhou et al. arXiv, 2019. [Paper] [Code]
(SSG) Real-Time Referring Expression Comprehension by Single-Stage Grounding Network. Xinpeng Chen, Lin Ma, Jingyuan Chen, Zequn Jie, Wei Liu, and Jiebo Luo. arXiv, 2018. [Paper]

2020

Improving One-stage Visual Grounding by Recursive Sub-query Construction. Zhengyuan Yang, Tianlang Chen, Liwei Wang, and Jiebo Luo. ECCV, 2020. [Paper] [Code]
(LSCM) Linguistic Structure Guided Context Modeling for Referring Image Segmentation. Tianrui Hui et al. ECCV, 2020. [Paper]
(BiLingUNet) BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions. Ozan Arkan Can, İlker Kesen, and Deniz Yuret. ECCV, 2020. [Paper]
(SGMN) Graph-Structured Referring Expression Reasoning in The Wild. Sibei Yang, Guanbin Li, and Yizhou Yu. CVPR, 2020. [Paper] [Code] [Website]
(MCN) Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation. Gen Luo et al. CVPR, 2020. [Paper] [Code]
(RCCF) A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension. Yue Liao et al. CVPR, 2020. [Paper]
(LCMCG) Learning Cross-modal Context Graph for Visual Grounding. Yongfei Liu, Bo Wan, Xiaodan Zhu, and Xuming He. AAAI, 2020. [Code]

2019

(NMTree) Learning to Assemble Neural Module Tree Networks for Visual Grounding. Daqing Liu, Hanwang Zhang, Feng Wu, and Zheng-Jun Zha. ICCV, 2019. [Paper] [Code] (I am an author of the paper)
(RvG-Tree) Learning to Compose and Reason with Language Tree Structures for Visual Grounding. Richang Hong, Daqing Liu, Xiaoyu Mo, Xiangnan He, and Hanwang Zhang. TPAMI, 2019. [Paper] (I am an author of the paper)
(FAOA) A Fast and Accurate One-Stage Approach to Visual Grounding. Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, and Jiebo Luo. ICCV, 2019. [Paper] [Code]
(DGA) Dynamic Graph Attention for Referring Expression Comprehension. Sibei Yang, Guanbin Li, and Yizhou Yu. ICCV, 2019. [Paper] [Code]
(LCGN) Language-Conditioned Graph Networks for Relational Reasoning. Ronghang Hu, Anna Rohrbach, Trevor Darrell, and Kate Saenko. ICCV, 2019. [Paper] [Code]
See-Through-Text Grouping for Referring Image Segmentation. Ding-Jie Chen, Songhao Jia, Yi-Chen Lo, Hwann-Tzong Chen, and Tyng-Luh Liu. ICCV, 2019. [Paper]
(CMRIN) Cross-Modal Relationship Inference for Grounding Referring Expressions. Sibei Yang, Guanbin Li, and Yizhou Yu. CVPR, 2019. [Paper]
(CM-Att-Erase) Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing. Xihui Liu, Zihao Wang, Jing Shao, Xiaogang Wang, and Hongsheng Li. CVPR, 2019. [Paper]
(CMSA) Cross-Modal Self-Attention Network for Referring Image Segmentation. Linwei Ye, Mrigank Rochan, Zhi Liu, and Yang Wang. CVPR, 2019. [Paper] [Code]

2018

(Multi-hop FiLM) Visual Reasoning with Multi-hop Feature Modulation. Florian Strub, Mathieu Seurin, Ethan Perez, and Harm De Vries. ECCV, 2018. [Paper]
(DDPN) Rethinking diversified and discriminative proposal generation for visual grounding. Zhou Yu, Jun Yu, Chenchao Xiang, Zhou Zhao, Qi Tian, and Dacheng Tao. IJCAI, 2018. [Paper] [Code]
(MAttNet) MAttNet: Modular Attention Network for Referring Expression Comprehension. Licheng Yu *et al.* CVPR, 2018. [Paper] [Code] [Website]
(AccumAttn) Visual Grounding via Accumulated Attention. Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, and Mingkui Tan. CVPR, 2018. [Paper]
(ParalAttn) Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries. Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, and Anton van den Hengel. CVPR, 2018. [Paper] [Code]
(LGRAN) Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks. Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, and Anton van den Hengel. CVPR, 2018. [Paper]
(VariContext) Grounding Referring Expressions in Images by Variational Context. Hanwang Zhang, Yulei Niu, and Shih-Fu Chang. CVPR, 2018. [Paper] [Code]
(GroundNet) Using Syntax to Ground Referring Expressions in Natural Images. Volkan Cirik, Taylor Berg-Kirkpatrick, and Louis-Philippe Morency. AAAI, 2018. [Paper] [Code]

2017

Recurrent Multimodal Interaction for Referring Image Segmentation. Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, and Alan Yuille. ICCV, 2017. [Paper] [Code]
(Attribute) Referring Expression Generation and Comprehension via Attributes. Jingyu Liu, Liang Wang, and Ming-Hsuan Yang. ICCV, 2017. [Paper]
(CMN) Modeling relationships in referential expressions with compositional modular networks. Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, and Kate Saenko. CVPR, 2017. [Paper] [Code]
(Spe+Lis+RI) A Joint Speaker-Listener-Reinforcer Model for Referring Expressions. Licheng Yu, Hao Tan, Mohit Bansal, and Tamara L. Berg. CVPR, 2017. [Paper] [Code] [Website]
Comprehension-guided referring expressions. Ruotian Luo and Gregory Shakhnarovich. CVPR, 2017. [Paper] [Code]

2016

(MCB) Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. EMNLP, 2016. [Paper] [Code]
(NegBag) Modeling context between objects for referring expression understanding. Varun K. Nagaraja, Vlad I. Morariu, and Larry S. Davis. ECCV, 2016. [Paper] [Code]
(VisDif) Modeling context in referring expressions. Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, and Tamara L. Berg. ECCV, 2016. [Paper] [Code]
(SCRC) Natural Language Object Retrieval. Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, and Trevor Darrell. CVPR, 2016. [Paper] [Code] [Website]
(MMI) Generation and Comprehension of Unambiguous Object Descriptions. Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, and Kevin Murphy. CVPR, 2016. [Paper] [Code]

Contributing

Feel free to contact me by email (liudq@mail.ustc.edu.cn), open an issue, or submit a pull request.

To add a new paper via pull request:

Fork the repo and edit README.md.

Put the new paper at the correct chronological position, using the following format:

- **Paper Title**. *Author(s)*. Conference, Year. [[Paper]](link) [[Code]](link) [[Website]](link)

Send a pull request. Ideally, I will review the request within a week.
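
If you are adding several papers, a small script can help keep entries consistent with the format above. The following is only a sketch: `format_entry` is a hypothetical helper that is not part of this repo, and the title, author, and URL in the usage line are placeholders.

```python
def format_entry(title, authors, venue, year, paper=None, code=None, website=None):
    """Build one list entry in the format shown above.

    Hypothetical helper for contributors; not part of this repo.
    `authors` is a single pre-joined string, e.g. "A. Author and B. Author".
    Links that are not provided are left out of the entry.
    """
    links = [("Paper", paper), ("Code", code), ("Website", website)]
    link_text = " ".join(f"[[{label}]]({url})" for label, url in links if url)
    entry = f"- **{title}**. *{authors}*. {venue}, {year}."
    return f"{entry} {link_text}" if link_text else entry


# Placeholder values for illustration only.
print(format_entry("Paper Title", "A. Author and B. Author", "CVPR", 2020,
                   paper="https://example.com/paper"))
```

Paste the printed line into README.md at the correct chronological position.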

Acknowledgement

This repo is maintained by Daqing LIU.

Other Awesome Vision-Language lists: Awesome Vision-Language Navigation, Awesome-Video-Captioning.
