AI Disruption

AI Disruption

Share this post

AI Disruption
AI Disruption
Multimodal: Integrating Large Language Models with Dall-E/Stable Diffusion API(Development of Large Model Applications 17)

Multimodal: Integrating Large Language Models with Dall-E/Stable Diffusion API(Development of Large Model Applications 17)

Master multimodal development with practical lessons. Learn how to integrate large language models with Dall-E and Stable Diffusion for rich interactive experiences.

Meng Li's avatar
Meng Li
Jul 23, 2024
∙ Paid
1

Share this post

AI Disruption
AI Disruption
Multimodal: Integrating Large Language Models with Dall-E/Stable Diffusion API(Development of Large Model Applications 17)
1
Share

Hello everyone, welcome to the "Development of Large Model Applications" column.

Table of Contents

Table of Contents

Meng Li
·
June 7, 2024
Read full story

Starting today, we begin practical lessons on multimodal development.

In May 2024, OpenAI released GPT-4o.

GPT-4o and Multimodal

OpenAI announced that GPT-4o ("o" stands for "omni") is a step towards more natural human-computer interaction. It accepts any combination of text, audio, images, and video as input and generates any combination of text, audio, and images as output.

It can respond to audio input in as little as 232 milliseconds, averaging 320 milliseconds, similar to human conversational response times.

It performs as well as GPT-4 Turbo in English and code text and shows significant improvements in non-English text. It's also faster.

This post is for paid subscribers

Already a paid subscriber? Sign in
© 2025 Meng Li
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share