
Outerport

Chain-of-memory reasoning for document understanding

Outerport helps AI read documents the way humans do: building up understanding section by section and checking facts along the way, instead of just matching similar phrases. For example, when a team needs to confirm a specific requirement in a 200-page compliance policy document, Outerport pinpoints the right section, provides the reasoning, and presents a fact-based summary. Outerport is built for performance: run thousands of queries against your critical documents, with reasoning behind every retrieval. No more faulty vector RAG.
Founded: 2024
Team Size: 2
Location: San Francisco
Group Partner: Brad Flora

Active Founders

Towaki Takikawa, Co-Founder / CEO

Ex-Research Scientist at NVIDIA Research, worked on generative AI, compression, retrieval, and high-performance systems for neural graphics.

Allen Wang, Co-Founder / CTO

I previously worked at Tome (LLM for sales), Embark (self-driving), Vector Institute (ML research), Facebook (ads), and LinkedIn (infra). CS and math alum from UWaterloo.

Company Launches

TL;DR: Instant model hot swaps, fast cold starts, automatic model updates, predictive LLM scaling, secure access control, all on your infrastructure or private cloud.

Get started now at https://outerport.com!

❌ The Problem

Horizontal scaling of LLM inference is difficult. Preparing a server for LLM inference roughly involves the following steps:

  1. Downloading a model (usually big, like tens of GBs!)
  2. Moving the model from disk to RAM
  3. Moving the model from RAM to GPU memory
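
As a rough back-of-envelope model of these three steps, each stage can be treated as moving the full set of weights over a link with some sustained throughput. The sketch below does that arithmetic in Python. The bandwidth figures and the names `Bandwidths`, `model_size_gb`, and `load_time_seconds` are illustrative assumptions, not measurements or anything from Outerport.

```python
from dataclasses import dataclass


@dataclass
class Bandwidths:
    """Assumed sustained throughput per hop, in GB/s (illustrative values only)."""
    network: float = 0.1   # assumed effective download speed
    disk: float = 0.5      # assumed disk read speed
    pcie: float = 10.0     # assumed host-to-GPU copy speed


def model_size_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Weight size in GB: parameter count times bytes per parameter (2 for fp16)."""
    return params_billion * bytes_per_param


def load_time_seconds(size_gb: float, bw: Bandwidths) -> dict:
    """Time for each stage when the stages run one after another, plus the total."""
    stages = {
        "download": size_gb / bw.network,
        "disk_to_ram": size_gb / bw.disk,
        "ram_to_gpu": size_gb / bw.pcie,
    }
    stages["total"] = sum(stages.values())
    return stages


size = model_size_gb(7)  # a 7B-parameter model stored in fp16
times = load_time_seconds(size, Bandwidths())
for stage, seconds in times.items():
    print(f"{stage:12s} {seconds:7.1f} s")
```

Under these assumed numbers the download stage dominates, which is why the optimizations that matter most target the network and disk hops rather than the final copy to the GPU.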

When implemented naively, these three steps alone can take about 4 minutes for a small 7B-parameter LLM. To optimize this, you need to implement things like model chunking, parallel downloads, and network streaming into memory, and use local SSDs. Even after doing all of this, model loading can still take 30 seconds or more, a long time to keep impatient customers waiting.
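
To make the chunking and parallel download idea concrete, here is a minimal sketch that splits a blob into byte ranges and fetches them concurrently straight into one preallocated buffer. It is a generic illustration, not Outerport's implementation: `fake_fetch_range` and `REMOTE_BLOB` are hypothetical stand-ins for a ranged read against real remote storage, so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into half-open (start, end) byte ranges."""
    return [
        (start, min(start + chunk_size, total_size))
        for start in range(0, total_size, chunk_size)
    ]


def parallel_download(fetch_range, total_size: int, chunk_size: int,
                      workers: int = 8) -> bytes:
    """Fetch every chunk concurrently and write it into its slot in one buffer."""
    buffer = bytearray(total_size)

    def fetch_into(byte_range: tuple[int, int]) -> None:
        start, end = byte_range
        # Each worker writes a disjoint slice, so no chunk overwrites another.
        buffer[start:end] = fetch_range(start, end)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any worker exception is raised here.
        list(pool.map(fetch_into, chunk_ranges(total_size, chunk_size)))
    return bytes(buffer)


# Hypothetical stand-in for remote storage that supports ranged reads.
REMOTE_BLOB = bytes(range(256)) * 1000


def fake_fetch_range(start: int, end: int) -> bytes:
    return REMOTE_BLOB[start:end]


downloaded = parallel_download(fake_fetch_range, len(REMOTE_BLOB), chunk_size=4096)
print(downloaded == REMOTE_BLOB)
```

Writing each chunk directly into its final offset avoids a separate reassembly pass once all chunks have arrived.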

🏗️ Outerport

Outerport achieves a ~2 second model load time by keeping models warm in a pinned memory cache daemon, with predictive orchestration to decide where and when to keep models warm. We provide what many serverless providers have figured out for container images, but specialized for model weights, which bring a new set of challenges.
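
The general idea of keeping models warm in memory can be sketched with a small cache that holds weights in host RAM up to a byte budget. This is a toy under stated assumptions, not Outerport's daemon: it uses plain least-recently-used eviction in place of predictive orchestration, ordinary Python `bytes` in place of pinned memory, and a hypothetical `fake_loader` in place of a real disk or network load.

```python
from collections import OrderedDict


class WarmModelCache:
    """Keeps recently used model weights resident in host memory (toy LRU policy)."""

    def __init__(self, capacity_bytes: int, load_from_disk):
        self.capacity_bytes = capacity_bytes
        self.load_from_disk = load_from_disk  # hypothetical slow path: name -> bytes
        self._models = OrderedDict()          # name -> weights, oldest first
        self.used_bytes = 0
        self.hits = 0
        self.misses = 0

    def get(self, name: str) -> bytes:
        if name in self._models:
            self._models.move_to_end(name)    # mark as most recently used
            self.hits += 1
            return self._models[name]
        self.misses += 1
        weights = self.load_from_disk(name)   # cold start: pay the full load cost
        if len(weights) > self.capacity_bytes:
            return weights                    # too large to keep warm; serve uncached
        while self.used_bytes + len(weights) > self.capacity_bytes:
            _, evicted = self._models.popitem(last=False)  # drop least recently used
            self.used_bytes -= len(evicted)
        self._models[name] = weights
        self.used_bytes += len(weights)
        return weights

    def resident(self) -> list:
        """Names of models currently warm, least recently used first."""
        return list(self._models)


def fake_loader(name: str) -> bytes:
    return bytes(100)  # every hypothetical model is 100 bytes


cache = WarmModelCache(capacity_bytes=250, load_from_disk=fake_loader)
for model_name in ["a", "b", "a", "c", "b"]:
    cache.get(model_name)
print(cache.hits, cache.misses, cache.resident())
```

A predictive policy would replace the eviction loop with a decision based on expected demand. The hit and miss counters are the quantity such a policy would try to improve.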

Here's a live demo of the model hotswapping:

https://www.youtube.com/watch?v=YoA2elVvo_o

With Outerport, you can also get:

  • Annoying model storage details taken care of, like chunking, streaming, compression, and encryption.
  • Ease of use: just push to our model registry and pull to get them fast.
  • A modern web dashboard to monitor and audit your registry and deployments.
  • Self-hosted options for additional security and flexibility.
  • 24/7 customer support from us. 😄
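
To illustrate what storage details such as chunking, compression, and streaming involve behind a push and pull workflow, here is a toy in-memory registry. `ToyRegistry` and its `push` and `pull` methods are hypothetical names invented for this sketch and do not describe Outerport's actual API. Encryption is left out, and the content-addressed layout is an assumption chosen to keep the example short.

```python
import hashlib
import zlib


class ToyRegistry:
    """In-memory stand-in for a model registry (hypothetical, for illustration)."""

    def __init__(self, chunk_size: int = 1 << 20):
        self.chunk_size = chunk_size
        self._chunks = {}     # sha256 hex digest -> compressed chunk
        self._manifests = {}  # model name -> ordered list of chunk digests

    def push(self, name: str, weights: bytes) -> int:
        """Chunk, hash, and compress the weights; return how many chunks were new."""
        digests, new_chunks = [], 0
        for start in range(0, len(weights), self.chunk_size):
            chunk = weights[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self._chunks:   # chunks already stored are skipped
                self._chunks[digest] = zlib.compress(chunk)
                new_chunks += 1
            digests.append(digest)
        self._manifests[name] = digests
        return new_chunks

    def pull(self, name: str):
        """Yield decompressed chunks in order so a caller can stream them."""
        for digest in self._manifests[name]:
            chunk = zlib.decompress(self._chunks[digest])
            if hashlib.sha256(chunk).hexdigest() != digest:
                raise ValueError(f"chunk {digest} failed its integrity check")
            yield chunk


registry = ToyRegistry(chunk_size=1024)
base = b"".join(bytes([i]) * 1024 for i in range(4))  # four distinct chunks
finetuned = base[:3072] + bytes([9]) * 1024           # only the last chunk differs

new_for_base = registry.push("base", base)
new_for_finetuned = registry.push("finetuned", finetuned)
restored = b"".join(registry.pull("finetuned"))
print(new_for_base, new_for_finetuned, restored == finetuned)
```

Because chunks are addressed by their hash, pushing a model that shares most of its bytes with an earlier one stores only the chunks that changed.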

[Figure: overall system architecture]

About Us

We (Towaki and Allen) bring experience in ML infrastructure and systems from NVIDIA, Tome, LinkedIn, and Meta. Allen shipped fine-tuned LLM inference features to tens of millions of customers at his previous startup, and Towaki wrote GPU code and optimized 3D foundation model training at NVIDIA.

Now we want to bring this capability to everyone else: ping us at founders@outerport.com or book a demo at https://outerport.com.

Our ask: If you are, or know, someone who fits any of the descriptions below, we'd love to talk! Please reach out to founders@outerport.com or book a demo at https://outerport.com.

  • Anyone doing fine-tuned LLM or diffusion model inference on-prem or on rented machines.
  • Anyone operating a GPU datacenter or an LLM inference service.
  • Anyone concerned about security & compliance for model weights.
  • Anyone working on LLM inference for regulated industries (finance, banks, pharmaceuticals, etc.).