Introduction

What is Multimodal?

Sensory Modalities

1-what-is-multiModal

Multimodal Behaviors and Signals

1-02

What is a Modality?

Modality: the way in which something is expressed or perceived.

What is Multimodal?

A dictionary definition: with multiple modalities.

A research-oriented definition: Multimodal is the science of heterogeneous and interconnected data.

Heterogeneous Modalities

Information present in different modalities will often show diverse qualities, structures and representations.

1-03

Dimensions of Heterogeneity – Examples:

  • Structure: static, temporal, spatial, hierarchical, invariances
  • Representation space: discrete, continuous, interpretable
  • Information: entropy, density, information overlap, range
  • Granularity: sampling rate, resolution, precision
  • Noise: uncertainty, signal-to-noise ratio, missing data
  • Relevance: task relevance, context dependence

Interconnected Modalities

  • Connections
  • Cross-modal interactions

Dimensions of Cross-modal Interactions:

  • Additive, multiplicative, non-additive
  • Bimodal, trimodal, high-modal
  • Equivalence, correspondence, dependency
  • Dominance, entailment, divergence
  • Modulation, attention, transfer
  • Causality, influences, directionality
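The difference between additive and multiplicative interactions can be illustrated with a minimal sketch. The function names, weights, and scalar "features" below are hypothetical and chosen only for illustration; real models operate on learned, high-dimensional representations.

```python
def additive_response(x_a, x_b, w_a=1.0, w_b=1.0):
    """Additive interaction: each modality contributes independently."""
    return w_a * x_a + w_b * x_b


def multiplicative_response(x_a, x_b, w_ab=1.0):
    """Multiplicative interaction: one modality scales the effect of the other."""
    return w_ab * x_a * x_b


# Hypothetical scalar features for two modalities (assumed values).
audio, visual = 0.5, 2.0

print(additive_response(audio, visual))        # 2.5
print(multiplicative_response(audio, visual))  # 1.0

# Silencing one modality removes only its own share of the additive
# response, but removes the multiplicative response entirely.
print(additive_response(0.0, visual))          # 2.0
print(multiplicative_response(0.0, visual))    # 0.0
```

In the additive case the two modalities can be analyzed separately; in the multiplicative case the response only exists when both modalities are present, which is one reason such interactions are harder to model.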

Prior Research in “Multimodal”

Four eras of multimodal research:

  • The “behavioral” era (1970s until late 1980s)
    • Language and gestures
  • The “computational” era (late 1980s until 2000)
    • Audio-Visual Speech Recognition (AVSR)
    • Multimodal/multisensory interfaces
    • Multimedia Computing
  • The “interaction” era (2000 - 2010)
    • Modeling Human Multimodal Interaction: AMI Project, CHIL Project, CALO Project (Siri), SSP Project
  • The “deep learning” era (2010s until ...)
    • Main focus of this tutorial: last 5 years

Multimodal Machine Learning

What is Multimodal Machine Learning?

Multimodal Machine Learning (ML) is the study of computer algorithms that learn and improve through the use and experience of data from multiple modalities.

Multimodal Artificial Intelligence (AI) studies computer agents able to demonstrate intelligence capabilities, such as understanding, reasoning and planning, through multimodal experiences and data.

Multimodal AI is a superset of Multimodal ML.

Multimodal Machine Learning

1-04

Multimodal Technical Challenges – Surveys, Tutorials and Courses

1-05

Challenge 1: Representation

Definition: Learning representations that reflect cross-modal interactions between individual elements, across different modalities

This is a core building block for most multimodal modeling problems!
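As a rough illustration of what a joint representation can look like, the sketch below concatenates feature vectors from two modalities and projects them into a shared space. This is only one simple fusion strategy, not the method of any particular model; the vectors and the weight matrix are hand-picked, hypothetical values, whereas in practice the weights would be learned.

```python
def concat_fusion(text_vec, image_vec):
    """Join two modality-specific feature vectors into one input vector."""
    return list(text_vec) + list(image_vec)


def project(vec, weights):
    """Linear projection: each row of `weights` yields one output dimension."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


# Hypothetical unimodal features (assumed values, different dimensionality).
text_vec = [1.0, 0.0]
image_vec = [0.5, 0.5, 2.0]

joint_input = concat_fusion(text_vec, image_vec)  # length 2 + 3 = 5

# Hand-picked weights for illustration; a real model would learn these.
weights = [
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 1.0],
]

joint = project(joint_input, weights)
print(joint)  # [1.5, 2.5]
```

Each output dimension mixes elements from both modalities, which is the minimal sense in which the result is a multimodal representation rather than two unimodal ones placed side by side.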

1-06

Challenge 2: Alignment

Definition: Identifying and modeling cross-modal connections between all elements of multiple modalities, building from the data structure

Most modalities have internal structure with multiple elements
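One simple way to picture alignment is to match each element of one modality to its most similar element in another. The sketch below does this with cosine similarity, assuming the elements (here, hypothetical word and image-region vectors) already live in a shared embedding space and are non-zero; it is an illustration of the idea, not a definitive alignment method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align(elements_a, elements_b):
    """For each element of A, return the index of its most similar element in B."""
    return [
        max(range(len(elements_b)), key=lambda j: cosine(a, elements_b[j]))
        for a in elements_a
    ]


# Hypothetical embeddings (assumed to share one 2-D space).
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
region_vecs = [[0.0, 2.0], [3.0, 0.1]]

print(align(word_vecs, region_vecs))  # [1, 0, 1]
```

This hard, one-directional matching ignores the internal structure of each modality (for example, word order or spatial layout), which is what more elaborate alignment models aim to exploit.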

1-07

Challenge 3: Reasoning

Definition: Combining knowledge, usually through multiple inferential steps, exploiting multimodal alignment and problem structure

1-08

1-09

Challenge 4: Generation

Definition: Learning a generative process to produce raw modalities that reflects cross-modal interactions, structure and coherence

1-10

Challenge 5: Transference

Definition: Transferring knowledge between modalities, usually to help a target modality that may be noisy or have limited resources

1-11

1-12

Challenge 6: Quantification

Definition: Empirical and theoretical study to better understand heterogeneity, cross-modal interactions and the multimodal learning process

1-13

Core Multimodal Challenges

1-14