
Felafax

Building AI Infra for non-NVIDIA GPUs

Felafax is building AI infra for non-NVIDIA GPUs. Drawing on our ML experience from Google and Meta, we built a new AI stack that is 2x more cost-efficient and performant, without needing NVIDIA's CUDA.
Founded: 2024
Team Size: 2
Location: San Francisco
Group Partner: David Lieb

Active Founders

Nithin Sonti, Founder

Building AI infra for non-NVIDIA GPUs and democratizing large-scale AI training! Previously an ML engineer at Google/YouTube and NVIDIA.

Nikhil Sonti, Founder

Building AI infra for non-NVIDIA GPUs. Previously spent over six years at Meta working on ML inference infra for serving ranking models for Newsfeed, Reels, and Watch. Before that, spent three years at Microsoft.

Company Launches

TL;DR: We are building an open-source AI platform for non-NVIDIA GPUs. Today, we are launching one of those pieces: a seamless UI to spin up a TPU cluster of any size, along with an out-of-the-box notebook to fine-tune LLaMa 3.1 models. Try us at felafax.ai or check out our GitHub!

👋 Introduction

Hi everyone, we're Nikhil and Nithin, twin brothers behind Felafax AI. Before this, we spent half a decade at Google and Meta building AI infrastructure. Drawing on our experience, we are creating an ML stack from the ground up. Our goal is to deliver high performance and provide an easy workflow for training models on non-NVIDIA hardware like TPU, AWS Trainium, AMD GPU, and Intel GPU.

🧨 The Problem

  • The ML ecosystem for non-NVIDIA GPUs is underdeveloped, even though alternative chipsets like Google TPUs offer a much better price-to-performance ratio: TPUs are 30% cheaper to use.
  • The cloud layer for spinning up AI workloads is painful. Training requires installing the right low-level dependencies (the infamous CUDA errors), attaching persistent storage, waiting 20 minutes for the machine to boot up… the list goes on.
  • Models are getting bigger (like Llama 405B) and no longer fit on a single GPU, requiring complex multi-GPU orchestration (see the sketch below).
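
To make that orchestration burden concrete, here is a minimal JAX sketch of sharding a single large weight matrix across accelerator cores. It is a generic illustration with toy sizes, not our actual implementation:

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange every attached accelerator core (e.g. 8 TPU cores) in a 1-D mesh.
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Split a large weight matrix row-wise across the mesh so no single core
# holds the full tensor. Sizes here are toy values.
weights = jax.device_put(
    jnp.ones((8192, 8192)),
    NamedSharding(mesh, P("model", None)),
)

@jax.jit
def forward(w, x):
    # XLA inserts the cross-core communication this matmul needs.
    return x @ w

out = forward(weights, jnp.ones((16, 8192)))
print(out.shape)  # (16, 8192)
```

Multiply this by every weight, optimizer state, and activation in a 405B-parameter model, and the bookkeeping adds up quickly.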

🥳 The Solution

Today, we're launching a cloud layer to make it easy to spin up AI training clusters of any size, from 8 TPU cores to 2048 cores. We provide:

  • Effortless Setup: Out-of-the-box templates for PyTorch XLA and JAX to get you up and running quickly.
  • LLaMa Fine-tuning, Simplified: Dive straight into fine-tuning LLaMa 3.1 models (8B, 70B, and 405B) with pre-built notebooks. We've handled the tricky multi-TPU orchestration for you.

In the coming weeks, we will also launch our open-source AI platform built on top of JAX and OpenXLA (an alternative to NVIDIA's CUDA stack). It will support AI training across a variety of non-NVIDIA hardware (Google TPU, AWS Trainium, AMD and Intel GPUs) and offer the same performance as NVIDIA at 30% lower cost. Follow us on Twitter, LinkedIn, and GitHub for updates!
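
For a rough sense of why that portability is possible: JAX programs target XLA rather than CUDA, so the same training step compiles to whichever backend is attached. Below is a minimal, generic sketch (toy sizes and learning rate, not our platform code):

```python
import jax
import jax.numpy as jnp

# One JAX program, any XLA backend: TPU, GPU, or CPU. No CUDA code needed.
print(jax.devices())  # e.g. [TpuDevice(id=0), ...] on a TPU VM

def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

# jit and grad are hardware-agnostic; XLA handles the compilation target.
grad_fn = jax.jit(jax.grad(loss))

w = jnp.zeros((128, 1))
x, y = jnp.ones((32, 128)), jnp.ones((32, 1))
w = w - 0.01 * grad_fn(w, x, y)  # one SGD step
```

The hardware-specific work lives in the XLA backend, not in the model code.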

🙏 How You Can Help

  1. Try our seamless cloud layer for spinning up VMs for AI training – you get $200 in credits to get started: app.felafax.ai
  2. Try fine-tuning LLaMa 3.1 models for your use case.
  3. If you are an ML startup or an enterprise that would like a seamless platform for your in-house ML training, reach out to us (calendar).