
Reasoning Fine-Tuning

TrainLoop makes it effortless for developers to supercharge LLM performance through reinforcement learning.
TrainLoop
Founded:2025
Team Size:2
Status:
Active
Location:San Francisco
Group Partner:Michael Seibel
Active Founders

Jackson Stokes, Founder

Cofounder at TrainLoop; I optimize language models. Previously, I worked on kernel-level optimizations for Gemini at Google, and framework-level optimizations on video models at Google Research.

Mason Pierce, Cofounder / CTO

My background is in applying AI models to business use cases, and I've always found fine-tuning to be far harder than it should be. My mission is to unlock fine-tuning for all builders :)
Company Launches
TrainLoop: Unlock Next-Level Reasoning through Fine-Tuning

Unreliable RAG or code generation? We can help.

Reasoning models have been all the rage lately because they beat generic benchmarks. The problem is that your business isn't a generic benchmark - it's a set of specific vertical tasks like codegen, compliance, legal, or healthcare. Massive companies like Google and OpenAI have internal tools to train their models, but those aren't available to the people who need them: the developers deploying these models into production.

We’ve personally been on both sides: Jackson optimized the Gemini models at Google, and Mason hit the limits of off-the-shelf models while leading engineering at Second (YC W23).

So we created TrainLoop, packaging the same RL techniques big AI labs use into an accessible platform. Our process is three simple steps:

  1. Data Curation: Our lightweight SDK (just three lines of code) gathers training signals from actual usage.
  2. Training: We build a reward model that teaches your LLM what output you prefer.
  3. Inference: Deploy automatically and call your model via standard APIs.
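To make the first step concrete, here is a minimal sketch of what "gathering training signals from actual usage" can look like in practice: wrap each model call, then record the prompt, response, and a user feedback score as a JSONL training record. All names here (`SignalLogger`, `log`) are illustrative assumptions, not TrainLoop's actual SDK API.

```python
import io
import json

# Hypothetical sketch of usage-signal capture for RL fine-tuning.
# Each record pairs a prompt/response with a reward (e.g. a thumbs-up
# mapped to 1.0), which a reward model can later be trained on.
class SignalLogger:
    def __init__(self, sink):
        self.sink = sink  # any writable file-like object

    def log(self, prompt: str, response: str, reward: float) -> None:
        record = {"prompt": prompt, "response": response, "reward": reward}
        self.sink.write(json.dumps(record) + "\n")

# Roughly the "three lines" of integration: create a logger, call your
# model as usual, then log the interaction with its feedback signal.
buf = io.StringIO()
logger = SignalLogger(buf)
logger.log("Summarize this contract.", "The contract grants a license...", 1.0)

record = json.loads(buf.getvalue())
print(record["reward"])
```

In a real deployment the sink would be a file or an upload endpoint rather than an in-memory buffer, and the reward could come from explicit user ratings or implicit signals like retries and edits.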

https://youtu.be/XhbxHOzsxRE

Ready to Level Up Your Model?

It’s time to move past “prompt-hell” and unreliable outputs. Join our alpha to make your language model an expert in your business and unlock production-ready performance.