
Beyond LLMs: Where the next trillion-dollar AI opportunities will be built
(This article was generated with AI and is based on an AI-generated transcription of a real talk given on stage. While we strive for accuracy, we encourage readers to verify important information.)
TechCrunch Senior Reporter Rebecca Bellan discussed AI’s future with Amit Jain, Co-Founder & CEO of Luma AI. Mr. Jain stated that while Large Language Models (LLMs) are valuable, their understanding is limited to text, lacking real-world comprehension. He noted that LLMs, trained on finite text data, cannot perform physical tasks like driving a swimming robot. The next major opportunity lies in multimodal AI, combining text, audio, video, and images.
Unlike the scarcity of text data, there’s an immense amount of video, audio, and images that demonstrate physics and real-world behaviors. Integrating these into a single model, rather than merely adding visual interpretation to LLMs, will create significantly more powerful AI systems capable of operating in the physical world. Mr. Jain clarified that true “world models” require deep understanding of physics and language logic, not just interactivity.
He emphasized the “bitter lesson” of AI, Rich Sutton’s observation that only general methods leveraging vast compute and data succeed, which makes specialized 3D-data approaches impractical given how scarce that data is. Luma AI is focusing on “intelligent, agentic world models.” Agents are AI systems that autonomously complete end-to-end tasks, moving beyond single-output models to execute complex projects such as creating a 30-second advertisement from a high-level prompt.
Human interaction remains crucial, allowing users to guide and refine agent tasks. Mr. Jain explained that agent capabilities will continuously expand, tackling more complex tasks with higher completion rates, gradually progressing towards Artificial General Intelligence (AGI). This iterative improvement involves scaling models, enhancing data, and teaching them more sophisticated behaviors.
Regarding job displacement in film, Mr. Jain dismissed it as a “hoax,” arguing the real issue is a shortage of creatives skilled in AI tools. Luma AI trains studio and agency staff, identifying a “people problem” as AI lacks inherent creative judgment. He attributed potential job losses to “poor leadership” failing to adapt to industry shifts, citing the film industry’s struggle with streaming and diverse content demands.
Mr. Jain contended that the world needs substantially more content, not less, as evidenced by the growth of platforms like Netflix and TikTok. Luma AI’s strategic plan involves three clear steps: first, solving “generation” to reproduce the world in pixels and language; second, solving “understanding” for interpretation and long-range reasoning; and finally, solving “operation,” which directly leads to general-purpose robotics.
The combination of generation and understanding will create a “robot brain” capable of reasoning and simulating scenarios. For instance, a robot tasked with retrieving a jacket from under a dog would generate and evaluate thousands of potential actions, like distracting the dog with a toy, to achieve its goal safely. This intelligent world model approach is essential for developing truly autonomous robotics.
