# Further Topics {#c03-00-further}
*Authors: Marco Moldovan, Rickmer Schulte, Philipp Koch*
*Supervisor: Rasmus Hvingelby*
So far we have learned about multimodal models for text and 2D images. Text and images can be seen as mere snapshots of the sensory stimuli that we humans perceive constantly. If we view the research field of multimodal deep learning as a means to approach human-level capabilities of perceiving and processing real-world signals, then we have to consider many other modalities in a trainable model beyond textual representations of language or static images. Besides introducing further modalities that are frequently encountered in multimodal deep learning, the following chapter will also aim to bridge the gap between the two fundamental sources of data, namely structured and unstructured data. Investigating modeling approaches from both classical statistics and more recent deep learning, we will examine their respective strengths and weaknesses and discover that a combination of both may be a promising path for future research. Going from multiple modalities to multiple tasks, the last section will then broaden our view of multimodal deep learning by examining multi-purpose models. Discussing cutting-edge research such as the newly proposed Pathways architecture, we will review the current achievements and limitations of this new line of modeling, which might lead the way towards the ultimate goal of AGI in multimodal deep learning.