Replicate

Run machine learning models in the cloud

Engineering Manager - Models

$230K - $300K
Location: Remote - United States / San Francisco, CA, US / Remote (US; CA)
Job Type: Full-time
Experience: 11+ years

About the role

Replicate makes it easy for software engineers to run and customize machine learning models in the cloud. With a library of thousands of open-source models, you can get started with one line of code—or fine-tune and deploy your own models when you need something custom. We handle the infrastructure, so you can focus on building. Our team comes from places like Docker, GitHub, and NVIDIA, and we’re obsessed with making AI as intuitive as deploying a web app. We build in public, ship fast, and care about getting the details right.

The models team at Replicate keeps our public model library stocked with all the latest generative AI models. We make sure the most popular models are fast, reliable, and easy to use. We also add features to models — things people ask for and things they didn’t know they needed.

We’re looking for an engineering manager to help guide this team of 6–8 engineers working where open-source AI meets high-performance computing. You’ll grow and support the team, shape technical strategy, and stay hands-on with the work. The team focuses on three things:

  1. Turn research into APIs. We make it easy and fast to package models with cog and run them on Replicate.

  2. Make models faster. CUDA, quantization, parallelism — we use whatever works to make models faster and cheaper to run.

  3. Build new model features. This is the creative part. This could mean making video models trainable, adding capabilities like inpainting, outpainting, or ControlNet-style conditioning to the latest model drops, or inventing novel ways to use models that capture attention and unlock new value.

We’re deeply committed to open source. We don’t just build for Replicate — we share what we build with the community. That might mean contributing upstream, open sourcing internal tools, or writing about what we’ve learned.

About you

You’re excited about models, model performance, and AI infrastructure. You’ve led engineering teams, but you still like writing code. You’re comfortable guiding a group of strong engineers, setting technical direction, and solving hard problems alongside them. You care about open source and like collaborating with the AI community.

What you’ll be doing

  • Leading and growing a team focused on deploying, packaging, optimizing, and improving generative models.

  • Building tools and workflows to help model creators ship their work on Replicate.

  • Pushing the limits of model performance through quantization, algorithmic improvements, custom CUDA kernels, and other optimizations.

  • Experimenting with creative ideas to make models more useful and powerful.

  • Helping set the direction for short-term projects and long-term bets.

  • Encouraging open-source contributions and contributing yourself.

You should apply if...

  • You’re obsessed with optimizations, performance, and measurements for language or media machine-learning models.

  • You’ve worked with open-source model ecosystems and want to make them better.

  • You’ve led teams before but still enjoy doing technical work.

  • You’re familiar with model inference performance and how to optimize it.

  • You want to make generative AI tools more accessible to developers and creators.

  • You’re active in the generative AI or open-source infrastructure community.

This is a chance to work on some of the most interesting problems in AI infrastructure while contributing to and collaborating with the open-source communities that make it all possible.

This role can be remote anywhere in the US (or other countries that align with US time zones) or in-person. If you're local to the Bay Area, we would like you to work out of our San Francisco office at least 3 days a week.

About Replicate

What we're doing

Machine learning can now do some extraordinary things: it can understand the world, drive cars, write code, make art.

But it is still extremely hard to use. Research is typically published as a PDF, with scraps of code on GitHub and weights on Google Drive (if you’re lucky!). It is nearly impossible to take that work and apply it to a real-world problem unless you are an expert.

We’re making machine learning accessible to everyone. People creating machine learning models should be able to share them in a way that other people can use, and people who want to use machine learning should be able to do it without getting a PhD.

With great power also comes great responsibility. We believe that with better tools and safeguards, we will make this powerful technology safer and easier to understand.

How we work

We're a bunch of hackers, engineers, researchers, and artists.

We obsess over the details of API design and the right words for things. We're defining how AI works, so we'd better get it right.

We make fast and reliable infrastructure. That's what a good infrastructure product is. We're not afraid to build things from scratch to make our infrastructure the fastest.

We use AI for work. We use AI for play. We find unexplored parts of the map and create new techniques ourselves. We open-source it all.

We build in public, for the community. We want AI to work like open-source software so everyone benefits from it.

We're led by engineers. We all write code. (Or, we get ChatGPT to help.) There aren’t any meetings about meetings.

We've worked at places like Docker, Dropbox, GitHub, Heroku, NVIDIA, Scale AI, and Spotify. We've created technologies like Docker Compose and OpenAPI.

We're here to build a big company. We're ambitious and hard-working. We're not here to just build nice things.

Replicate
Founded: 2019
Batch: W20
Team Size: 27
Status: Active
Location: San Francisco
Founders
Ben Firshman, Founder
Andreas Jansson, Founder