
VMware for GPUs

Outerport lets companies use their GPUs more efficiently by making it easy to swap them from task to task. Just like VMware made it easy to put multiple users on a single server machine, we make it easy to put multiple AI models on a single GPU. Rather than maintaining a separate set of GPUs for each task, you can buy fewer GPUs and make better use of them. Hot swap foundation model weights instantly, minimize cold starts, scale horizontally, maintain version control, secure your models on a central registry, perform A/B tests, and save 40% on GPU costs.

Outerport
Founded: 2024
Team Size: 2
Location: San Francisco
Group Partner: Brad Flora

Active Founders

Towaki Takikawa, Co-Founder / CEO

Ex-Research Scientist at NVIDIA Research, worked on high-performance systems for machine learning and graphics.


Allen Wang, Co-Founder / CTO

I previously worked at Tome (LLM for sales), Embark (self-driving), Vector Institute (ML research), Facebook (ads), and LinkedIn (infra). CS and math alum from UWaterloo.

Company Launches

TL;DR: Instant model hot swaps, fast cold starts, automatic model updates, predictive LLM scaling, secure access control, all on your infrastructure or private cloud.

Get started now at https://outerport.com!

āŒ The Problem

Horizontal scaling of LLM inference is difficult. Preparing a server for LLM inference roughly involves the following steps:

  1. Downloading a model (usually big, like 10s of GBs!)
  2. Moving the model from disk to RAM
  3. Moving the model from RAM to GPU memory

When implemented naively, just these 3 steps can take around 4 minutes for a small 7B-parameter LLM. To optimize this, you need to implement model chunking, parallel downloads, and network streaming into memory, and use local SSDs. Even after doing all of this, model loading can take upwards of ~30 seconds, a long time to keep impatient customers waiting.
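To make the cost concrete, here is a minimal sketch of the naive path in PyTorch, assuming a single-file checkpoint at a hypothetical path (the download step is omitted); actual timings depend on disk and PCIe bandwidth.

```python
import time
import torch

# Hypothetical checkpoint path; a 7B-parameter fp16 model is ~14 GB on disk.
CKPT = "/models/llm-7b.pt"

t0 = time.perf_counter()
state_dict = torch.load(CKPT, map_location="cpu")  # step 2: disk -> host RAM
t1 = time.perf_counter()

# Step 3: host RAM -> GPU memory, tensor by tensor (pageable copies).
state_dict = {name: t.to("cuda") for name, t in state_dict.items()}
torch.cuda.synchronize()
t2 = time.perf_counter()

print(f"disk->RAM: {t1 - t0:.1f}s  RAM->GPU: {t2 - t1:.1f}s")
```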

šŸ—ļø Outerport

Outerport achieves a ~2 second model load time by keeping models warm in a pinned-memory cache daemon, with predictive orchestration to figure out where and when to keep models warm. We provide what many serverless providers have already figured out for container images, but specialized for model weights, which bring a new set of challenges.
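As a rough illustration of the pinned-memory idea (our own sketch in PyTorch, not Outerport's actual implementation), page-locked host buffers let weights be copied to the GPU asynchronously, so swapping a warm model in reduces to a host-to-device copy:

```python
import torch

def pin_state_dict(state_dict):
    """Fill the warm cache once: copy each tensor into page-locked host memory."""
    return {name: t.pin_memory() for name, t in state_dict.items()}

def swap_in(pinned, device="cuda"):
    """Hot swap: async copies from pinned RAM straight into GPU memory."""
    return {name: t.to(device, non_blocking=True) for name, t in pinned.items()}

# Example (path hypothetical): warm the cache once, then each swap is just a copy.
# pinned = pin_state_dict(torch.load("/models/llm-7b.pt", map_location="cpu"))
# gpu_weights = swap_in(pinned)
# torch.cuda.synchronize()  # wait for the async copies to finish
```

Pinning avoids the extra staging copy through pageable memory that a plain `.to("cuda")` incurs, which is why the warm path is dominated by raw PCIe/NVLink bandwidth rather than disk reads.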

Here's a live demo of model hot swapping:

https://www.youtube.com/watch?v=YoA2elVvo_o

With Outerport, you can also get:

  • Annoying model storage details taken care of, like chunking, streaming, compression, and encryption.
  • Ease of use: just push to our model registry and pull to get your models back fast.
  • A modern web dashboard to monitor and audit your registry and deployments.
  • Self-hosted options for additional security and flexibility.
  • 24/7 customer support from us. 😄

Overall system architecture:

About Us

We (Towaki and Allen) bring experience in ML infrastructure and systems from NVIDIA, Tome, LinkedIn, and Meta. Allen shipped fine-tuned LLM inference features to tens of millions of customers at his previous startup, and Towaki wrote GPU code and optimized 3D foundation model training at NVIDIA.

Now we want to unlock this capability for everyone else. Ping us at founders@outerport.com or book a demo at https://outerport.com.

Our ask: If you are or know someone who fits any of the descriptions below, we'd love to talk! Please reach out to founders@outerport.com or book a demo at https://outerport.com.

  • Anyone doing fine-tuned LLM or diffusion model inference on-prem or on rented machines.
  • Anyone operating a GPU datacenter or an LLM inference service.
  • Anyone concerned about security & compliance for model weights.
  • Anyone working on LLM inference for regulated industries (finance, banking, pharmaceuticals, etc.).