nCompass is an API that lets you integrate low-latency versions of open-source and custom models into your AI pipeline with a single line of code.
tl;dr: If OpenAI’s unpredictable response times and rate limits are hurting your tool’s user experience, nCompass lets you tap into the world of open-source AI models while ensuring the served models meet your target budget and performance requirements.
—
Hey all, we are Diederik and Aditya, the co-founders of nCompass, a platform for simplified hosting and acceleration of open-source and custom LLMs.
LLM-based products that use closed-source model providers like OpenAI suffer from slow response times and rate limits.
Open-source models are a great alternative, but hosting a model yourself adds engineering and maintenance work that distracts you from your core business.
nCompass provides an API that lets you integrate accelerated versions of any open-source or custom model of your choice into your AI pipeline. We support OpenAI-style chat templates, work with all web frameworks, and use a time-based pricing model that keeps compute costs predictable.
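Concretely, "OpenAI-style chat templates" means a request carries a list of role-tagged messages, so prompts written for an OpenAI-compatible client keep working unchanged. A minimal sketch of that shape (the prompt content here is purely illustrative):

```python
# OpenAI-style chat history: a list of {"role": ..., "content": ...} dicts.
# The prompts below are illustrative only.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this support ticket for me."},
]

# Because the format is shared, the same list can be sent to any
# OpenAI-compatible endpoint regardless of which backend serves it.
roles = [m["role"] for m in messages]
```

Any backend that accepts this message format can be swapped in without rewriting prompt-construction code.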
We serve models to users with a simple 3-step process:

1. Choose any open-source or custom model and tell us your target budget and performance requirements.
2. We set up a deployment that meets those requirements and provide you with a single API key.
3. Use that key to integrate the model into your pipeline with a single line of code.
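To make the integration step concrete, here is a hedged sketch of what calling such a deployment could look like. The endpoint URL, model name, and key value below are placeholders I've invented for illustration, not nCompass's actual API; the point is that an OpenAI-style chat request only needs a base URL and a bearer key:

```python
import json
import urllib.request

NCOMPASS_API_KEY = "your-api-key"          # provided after deployment setup
BASE_URL = "https://api.example.com/v1"    # placeholder, not the real endpoint

def build_chat_request(model: str, user_prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for a served model."""
    payload = {
        "model": model,  # e.g. any Hugging Face model your deployment serves
        "messages": [{"role": "user", "content": user_prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {NCOMPASS_API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("mistralai/Mistral-7B-Instruct-v0.2", "Hello!")
# urllib.request.urlopen(req)  # would send the request to a live deployment
```

Since the request shape matches OpenAI's, existing client code typically only needs its base URL and key swapped.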
We support any model currently hosted on Hugging Face.
https://www.youtube.com/watch?v=sdHVji8QGOg
Also, check out our GitHub repository for code examples.
From meeting in undergrad (9 years ago) through our PhDs at Imperial College London, we’ve worked on every project together. Our PhDs focused on hardware acceleration of large-scale machine learning models, covering every level of the stack from algorithms and compilers down to digital hardware design.
Our emails are aditya.rajagopal@ncompass.tech and diederik.vink@ncompass.tech