
Exla

An SDK to run transformer models anywhere

Exla aggressively quantizes AI models to minimize memory usage and maximize inference speed. Whether you're deploying LLMs, VLMs, VLAs, or custom models, Exla reduces memory footprint by up to 80% and accelerates inference by 3–20x, all with just a few lines of code.

Schedule a call: https://cal.com/exla-ai/schedule
Founded: 2025
Status: Active
Group Partner: Brad Flora
Active Founders

Pranav Nair, Co-Founder

CTO at Exla. Previously an OS kernel engineer at Apple, where he led kernel power management, ensuring over a billion Apple devices can sleep and hibernate. B.S. in Computer Science from Purdue.

Viraat Das, Founder

CEO at Exla. Previously a machine learning engineer at Amazon.
Company Launches
Exla – Run datacenter models on edge devices

Hey YC! We’re Viraat and Pranav – co-founders of Exla.

TL;DR

The Exla SDK optimizes models for edge devices (e.g. NVIDIA Jetsons), cutting memory usage by up to 80% and speeding up inference 3–20x. We’re focused on optimizing and deploying LLMs, VLMs, VLAs, and CV models on the edge.
[Video: Viraat showcasing our SDK]

‼️ The Problem

Frontier models are unlocking new applications on constrained edge devices – Vision-Language Models in manufacturing defect detection, Vision-Language-Action Models to control robots via natural language, and LLMs to power in-car assistants are a few examples.

But these models are now just shy of a trillion parameters, and with the emergence of inference-time scaling, they are more computationally demanding than ever. This limits their adoption to edge devices with beefy GPUs and sufficiently large VRAM. Even then, a Jetson Orin Nano Super is completely saturated attempting to run a 13B model, leaving little room for other tasks.
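
To make that concrete, here's the back-of-the-envelope arithmetic: a sketch that counts weights only (no activations or KV cache) and assumes the Orin Nano Super's 8 GB of unified memory:

```python
# Weights-only memory footprint of a 13B-parameter model at different
# precisions, vs. the ~8 GB of unified memory on a Jetson Orin Nano Super.
# Activations and KV cache would add to these numbers.

PARAMS = 13e9
DEVICE_MEM_GB = 8.0

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = PARAMS * bits / 8 / 1e9  # bits -> bytes -> GB
    verdict = "fits" if gb < DEVICE_MEM_GB else "does not fit"
    print(f"{name}: {gb:.1f} GB of weights -> {verdict}")

# FP16: 26.0 GB of weights -> does not fit
# INT8: 13.0 GB of weights -> does not fit
# INT4: 6.5 GB of weights -> fits
```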

🏎️ The Solution: The Exla SDK

We’re building mixed-precision, low-bit quantization software to dramatically cut the compute footprint of these models, delivering up to 80% less memory usage, 3–20x faster inference, and reduced energy consumption.
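
For intuition on where those savings come from, here is a minimal sketch of symmetric low-bit weight quantization in NumPy. This is illustrative only, not Exla's actual algorithm; a mixed-precision scheme would keep sensitive layers at higher bit-widths while dropping the rest to 4 bits or fewer.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 4):
    """Map FP32 weights to signed `bits`-bit integers with one scale.

    Storage drops from 32 bits to `bits` per weight (8x for 4-bit);
    against FP16 weights, 4-bit storage is a 75% reduction.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit signed
    scale = np.abs(w).max() / qmax        # largest weight maps to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096).astype(np.float32)
q, scale = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # at most scale / 2
```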

Today we’re starting with the Exla SDK, which applies our optimizations to a catalog of transformer-based and CV models, with growing support for custom models. We’re primarily targeting deployment on NVIDIA Jetsons, followed by CPU-based platforms like the Raspberry Pi and other embedded devices.
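
To give a feel for the intended developer experience, here's a hypothetical usage sketch. The package name `exla` and every function and argument below are illustrative guesses, not the SDK's actual API (which is in private beta):

```python
# Hypothetical sketch only: the package name `exla` and every function
# and argument below are illustrative, not the SDK's actual API.
import exla

# Pull a pre-optimized model from the catalog and run it on-device.
model = exla.load("llama-13b", device="jetson-orin-nano", precision="int4")
print(model.generate("Describe the defect in this weld image."))
```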

Our roadmap includes building custom silicon that takes advantage of the quirks of low-bit compute, which we expect to bring another order of magnitude of compute savings. We’re bringing frontier models everywhere.

🥷 The Team

We met each other on the first day of college under questionable circumstances and have built several projects together since. Exla is the latest in that series!

Viraat graduated in 2.5 years and joined Amazon as a machine learning engineer, where he worked on personalized search and model-optimization infrastructure. In a previous life, he used to marathon around the world. Now he marathons at home, coding.

Pranav previously worked at Apple as an OS engineer, hacking the iOS/macOS kernel to improve the sleep/wake experience for over a billion devices. In his non-existent free time, he tends to his 5-year-old baby – an operating system he’s built from scratch.

🙏 Our Ask!

Please reach out to founders@exla.ai if you’re facing issues optimizing your models on Jetsons, other edge devices, or on-prem deployments! We’re happy to onboard you to our private beta.

We’re particularly looking to solve model optimization at companies working on robotics, manufacturing & industrial automation, and camera-based systems – your help would make a world of difference <3