SERUM @ WACV 2023

SEmantic Data Engineering for Robustness Under Multimodal Settings


Website for SERUM Tutorial at WACV 2023, January 7, 2PM to 5PM

Hosted by Tejas Gokhale and Yezhou Yang (Arizona State University)

Agenda

In the past decade, we have witnessed a paradigm shift in computer vision: the connection between vision and language (V+L) is now an integral part of AI. V+L comprises human-interactive tasks such as visual question answering, image captioning, visual dialog, visual entailment and grounding, V+L navigation, and text-to-image generation. This field has already had an impact on other research communities, such as NLP, robotics, and graphics, and has direct industrial implications for software, arts, media, and journalism. As V+L models become widely adopted, new types of challenges and failure modes are emerging that have not been studied by previous work on robustness. Multi-modal tasks involving both vision and language inputs open up intriguing domain discrepancies that can affect model performance at test time.

In this tutorial, we will show how semantic data transformation, i.e., data transformation guided by knowledge of the logical and semantic features of natural language, can:

  • help improve the robustness of V+L models,
  • enable weakly supervised learning in cases with limited or no human-annotated datasets,
  • enhance the quality of outputs in generative settings such as captioning, and
  • guide multi-modal knowledge retrieval for knowledge-based visual question answering.

Tentative Schedule

| Time (UTC-10) | Topic | Presenter |
|---|---|---|
| 14:00-14:15 | Welcome and Introduction | Yezhou Yang (Associate Professor, ASU) |
| 14:15-15:15 | Plenary Talk: Towards Building Multimodal Foundation Models | Zhe Gan (Staff Research Scientist, Apple) |
| 15:15-16:00 | Robust Semantic Vision with Knowledge-Guided Data Transformation | Tejas Gokhale (Ph.D. Candidate, ASU) |
| 16:00-16:20 | Enhancing Video Captioning with Commonsense Descriptions | Yezhou Yang (Associate Professor, ASU) |
| 16:20-16:45 | Visual-Retriever-Reader for Knowledge-based Question Answering | Man Luo (Ph.D. Candidate, ASU) |
| 16:45-17:00 | Concluding Remarks | Tejas Gokhale (Ph.D. Candidate, ASU) |

This website will be updated closer to the event date.

We acknowledge support from NSF Robust Intelligence grant #2132724.
