
nCompass Technologies

Deploy hardware-accelerated AI models with only one line of code

nCompass is a platform for accelerating and hosting open-source and custom AI models. We provide low-latency AI deployment without rate-limiting you, all with just one line of code.
Founded: 2023
Team Size: 2
Group Partner: Dalton Caldwell

Active Founders

Aditya Rajagopal, Founder

I am a recent PhD graduate from Imperial College London with experience in machine learning algorithms, compilers, and hardware architectures. I've worked on compiler teams at Qualcomm and Huawei and served as a reviewer for ICML. My co-founder and I are building nCompass, a platform for accelerating and hosting both open-source and custom large AI models. Our focus is on providing rate-unlimited, low-latency large AI inference with only one line of code.

Diederik Vink, Founder

I'm a recent Imperial College London PhD graduate, where I specialized in reconfigurable hardware architectures for accelerated machine learning and reduced-precision training algorithms. I have worked as an AI feasibility consultant, prototyping and evaluating AI spin-outs. We are building nCompass, a platform for accelerating and hosting both open-source and custom large AI models. Our focus is on providing rate-unlimited, low-latency large AI inference with only one line of code.

Company Launches

TL;DR:

We’ve built an AI model inference system that can serve requests at scale like no other, and now we’re releasing it to the public as a rate-limit-free API. We serve any open-source LLM and can also deploy optimized versions of your custom fine-tuned LLM with cost-effective autoscaling. Sign up here, create an API key, get $100 of credit on us, and run as many requests as you like!
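Once you have an API key, integration is typically a single HTTP call. The sketch below is hypothetical: the endpoint URL, model name, and payload shape are assumptions modeled on the common OpenAI-style chat-completions convention, not confirmed nCompass API details — take the real values from the console and docs.

```python
import json

# Illustrative values only -- the real endpoint and model names should be
# taken from the nCompass console/docs.
API_URL = "https://api.ncompass.tech/v1/chat/completions"  # assumption

def build_request(model: str, prompt: str, api_key: str) -> tuple[dict, dict]:
    """Build (headers, JSON body) for a single streamed chat request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # stream tokens to benefit from low time-to-first-token
    }
    return headers, body

if __name__ == "__main__":
    headers, body = build_request(
        "meta-llama/Llama-3.1-8B-Instruct",  # any open-source model
        "Summarize this ticket in one sentence.",
        "YOUR_API_KEY",
    )
    # To send: e.g. requests.post(API_URL, headers=headers, json=body, stream=True)
    print(json.dumps(body, indent=2))
```

With no rate limits, the same request can be fired as often as your workload needs; only the API key and model name change between calls.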

The Problem

Deploying AI models in production requires expensive infrastructure. Serving more than ~10 req/s with open-source inference engines like vLLM on a single GPU results in terrible quality of service: time-to-first-token skyrockets past 10 s, and end-to-end latency degrades even more.

The common solution: horizontally scale up GPUs.

The problem: GPUs are expensive and hard to find.

Why should you care?

  1. API users: these high infrastructure costs are the reason you suffer rate limits with existing API providers.
  2. On-prem deployments: your infrastructure costs might be the reason a PoC never moves to production.

Our Solution

We’ve built an AI inference serving system that sustains hundreds of requests per second while maintaining a time-to-first-token under 1 s, on ~30% fewer GPUs than NVIDIA’s NIM containers and up to 2x fewer GPUs than vLLM.

This enables us to provide a rate-limit-free API while maintaining a high quality of service. Alternatively, we can provide this as a cost-effective on-prem deployment solution, ensuring your infrastructure costs don’t blow up with the number of requests served. We support any open-source model and can also host your custom fine-tuned model as an autoscaling API.
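Throughput and time-to-first-token claims like these can be checked on any deployment with a small load-test harness. A minimal sketch follows; the streaming client is deliberately pluggable, since the exact API shape is an assumption — in a real test, `stream_fn` would wrap an HTTP streaming call to the endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_ttft(stream_fn):
    """Time one streamed request.

    stream_fn is any zero-argument callable yielding response tokens;
    here it stands in for an HTTP streaming client (an assumption about
    the API, which this sketch keeps pluggable).
    Returns (time_to_first_token, end_to_end_latency) in seconds.
    """
    start = time.perf_counter()
    ttft = None
    for _ in stream_fn():
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
    return ttft, time.perf_counter() - start

def load_test(stream_fn, concurrency=100):
    """Fire `concurrency` streamed requests at once; report worst-case TTFT."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: measure_ttft(stream_fn), range(concurrency)))
    return max(ttft for ttft, _ in results)
```

Pointing `stream_fn` at a live endpoint and raising `concurrency` shows whether worst-case time-to-first-token stays under 1 s as load grows.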

Tutorials

Shout out

Building such a scalable, highly available system required a top-quality hardware provider, and we want to shout out Ori Global Cloud, a key partner in this journey, whose serverless Kubernetes platform powers our AI inference at scale. Ori Serverless Kubernetes is an infrastructure service that combines powerful scalability, simple management, and affordability to help AI-focused startups realize their wildest AI ambitions. Reach out to Ori for exclusive GPU cloud deals!

Asks

Our pricing is transparent and can be found here: https://console.ncompass.tech/public-pricing

Other Company Launches

nCompass Technologies: Realtime audio denoising

nCompass's newest model is a real-time audio denoiser that can remove background voices from audio streams.

nCompass Technologies - Low-latency deployment of AI models made easy

nCompass is an API that requires only one line of code to integrate low-latency versions of open-source or custom models into your AI pipeline.