
AI compute at the breaking point
Unlocking Real-Time AI: d-Matrix Pioneers Low-Latency Inference at Web Summit Lisbon 2025
(This article was generated with AI and is based on an AI-generated transcription of a real talk given on stage. While we strive for accuracy, we encourage readers to verify important information.)
Sid Sheth, Founder & CEO of d-Matrix, addressed Web Summit Lisbon 2025, detailing a critical shift in AI computing from training to inference. He noted that while the last decade centered on GPU-intensive model training, the industry, propelled by the transformer architecture in 2017 and the breakout adoption of ChatGPT in 2022, now prioritizes deploying these models in real-world applications.
The advent of reasoning models in 2024, which let AI “think” before responding, and the “DeepSeek moment” in early 2025, when DeepSeek released powerful open-source reasoning models, rapidly accelerated application development. By late 2025 this had created critical demand for low-latency inference, as users of voice, video, and agentic applications required real-time interactivity rather than slower, offline experiences.
Mr. Sheth explained that inference is not one workload but many, each optimized for different metrics such as latency, cost, and energy efficiency. He highlighted that GPUs, built on a roughly 30-year-old architecture, struggle with low-latency inference because of the “memory wall”: constant, energy-intensive data movement between separate compute blocks and high-bandwidth memory (HBM). This shuttling becomes a severe bottleneck as AI models demand access to ever more data.
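To make the memory wall concrete, consider a back-of-the-envelope model of autoregressive decoding (the numbers below are illustrative assumptions, not figures from the talk): each generated token requires streaming the model’s weights from memory, so memory bandwidth alone sets a floor on per-token latency, no matter how much compute sits beside it.

```python
# Illustrative sketch of the "memory wall" in autoregressive
# inference. All numbers are assumptions for illustration,
# not figures from the talk or any vendor datasheet.

def decode_latency_floor_ms(params_billion: float,
                            bytes_per_param: float,
                            mem_bandwidth_tb_s: float) -> float:
    """Lower bound on per-token decode latency when every weight
    must be streamed from off-chip memory once per generated token."""
    bytes_moved = params_billion * 1e9 * bytes_per_param
    seconds = bytes_moved / (mem_bandwidth_tb_s * 1e12)
    return seconds * 1e3

# A 70B-parameter model in 8-bit weights behind a ~3 TB/s HBM stack:
# the memory system alone caps a single user near ~43 tokens/s.
print(decode_latency_floor_ms(70, 1.0, 3.0))  # ~23.3 ms per token
```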
To overcome the memory wall, d-Matrix developed an in-memory computing substrate that fuses compute and memory, so data resides where computation occurs. The design drastically reduces data movement, cutting both latency and energy consumption and making d-Matrix’s technology well suited to highly interactive AI applications.
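The energy argument works the same way. A rough comparison, using widely cited order-of-magnitude circuit figures (after Horowitz, ISSCC 2014) rather than any d-Matrix data, suggests why keeping operands in on-chip SRAM next to the compute pays off:

```python
# Rough per-operation energy figures for a ~45 nm process, after
# Horowitz (ISSCC 2014). Exact values vary by process node and
# design; these are assumptions for illustration, not d-Matrix data.

ENERGY_PJ = {
    "8-bit multiply-add":        0.2,    # the useful work
    "on-chip SRAM read (32b)":   5.0,    # data kept next to compute
    "off-chip DRAM read (32b)":  640.0,  # data behind the memory wall
}

work = ENERGY_PJ["8-bit multiply-add"]
for source, pj in ENERGY_PJ.items():
    print(f"{source:26s} {pj:7.1f} pJ  ({pj / work:6.0f}x the MAC)")

# Fetching an operand from off-chip DRAM can cost on the order of
# 1000x the arithmetic it feeds, which is why fusing memory and
# compute attacks latency and energy at the same time.
```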
d-Matrix’s first product, Corsair, is built on this substrate, packing eight chiplets across two chip packages and nearly 2 gigabytes of on-chip SRAM, substantially more memory capacity than competing accelerators. The JetStream card then connects multiple Corsair compute cards into a scalable network for serving large models, and future products such as Raptor will further increase capacity and bandwidth through 3D memory stacking.
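As a hedged sizing sketch, one can estimate how many Corsair-class cards would be needed to hold a model’s weights entirely in SRAM, taking the roughly 2 GB per card at face value; the quantization level and the SRAM-only serving assumption below are illustrative, not published d-Matrix configurations.

```python
import math

# Hypothetical sizing sketch: cards needed to hold a model's weights
# entirely in ~2 GB of per-card SRAM. Quantization level and the
# SRAM-only assumption are illustrative, not d-Matrix specifications.

def cards_needed(params_billion: float, bits_per_param: int,
                 sram_gb_per_card: float = 2.0) -> int:
    weight_gb = params_billion * bits_per_param / 8
    return math.ceil(weight_gb / sram_gb_per_card)

# A 70B-parameter model quantized to 4 bits is ~35 GB of weights,
# so on the order of 18 such cards could serve it from SRAM alone
# (activations and KV cache would still need additional memory).
print(cards_needed(70, 4))  # -> 18
```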
d-Matrix collaborates with partners such as Supermicro for servers, Arista for switches, and Broadcom for switching silicon to build complete, high-performance integrated systems. The approach delivers low-latency inference at scale, letting many users access an application simultaneously without performance degradation, and addresses data centers’ total cost of ownership (TCO) on both the capital and operating sides. Mr. Sheth projected that the low-latency inference market will reach $100 billion within five years.
In closing, Mr. Sheth thanked the Qatar Investment Authority for its participation in d-Matrix’s recent $275 million funding round, which valued the company at $2 billion. He emphasized Qatar’s integral role in d-Matrix’s journey and reiterated the company’s commitment to helping Qatar bring bleeding-edge AI solutions to the global stage.
