Multimodal AI

Models that see, hear, and read at once.

Multimodal AI describes models that take in and reason across more than one kind of data (text, images, audio, and video) in a single system. Aligning those modalities lets the model answer questions about an image, transcribe and act on speech, or ground a response in a chart.

7 episodes

Guests on this topic

Paige Bailey, Vikram Chatterji, Atindriyo Sanyal, Rodrigo Coutinho, Logan Kilpatrick, Chip Huyen, Vivienne Zhang, Sara Hooker, Craig Wiley, Brian Raymond, Bob van Luijt, João Moura