
sync.

AI lipsync tool for video content creators

at sync. we're making video as fluid and editable as a word document.

how much time would you save if you could *record every video in a single take?* no more re-recording yourself because you didn't like what you said, or how you said it. just shoot once, revise yourself to do exactly what you want, and post. that's all.

this is the future of video: *AI modified >> AI generated*

we're playing at the edge of science + fiction. our team is young, hungry, uniquely experienced, and advised by some of the greatest research minds + startup operators in the world. we're driven to solve impossible problems, impossibly fast. our founders are the original team behind the open-sourced wav2lip — the most prolific lip-sync model to date, with over 10k GitHub stars.

our playbook: [1] train state-of-the-art generative models, [2] productize + host them for scale, [3] grow virally through content, [4] upsell into enterprise.

Jobs at sync.

San Francisco | $165K - $250K | 0.10% - 1.30% equity | 6+ years
Bengaluru | ₹1M - ₹10M INR | 0.10% - 0.50% equity | 3+ years
San Francisco | $130K - $200K | 0.30% - 1.00% equity | 6+ years
Bengaluru | ₹1M - ₹10M INR | 0.15% - 0.75% equity | 3+ years
San Francisco | $165K - $240K | 0.20% - 1.20% equity | 6+ years
sync.
Founded: 2023
Team Size: 13
Location: San Francisco
Group Partner: Nicolas Dessaigne

Active Founders

Prady Modukuru, Founder

ceo & cofounder at sync. labs | product engineer obsessed w/ networks of people + products. before startups, I helped incubate, launch, and scale AI-powered cybersecurity products at Microsoft impacting over 500M consumers + $1T worth of publicly traded companies – and became the youngest product leader in my org.

Prajwal K R, Founder

Co-founder & Chief Scientist at sync. labs. Ph.D. from University of Oxford with Prof. Andrew Zisserman. Authored multiple breakthrough research papers (incl. Wav2Lip) on understanding and generating humans in video.

Rudrabha Mukhopadhyay, Founder

I am the co-founder and CTO of Sync Labs. At Sync, we're building audio-visual models to understand, modify, and synthesize humans in video. I am one of the primary authors of Wav2Lip, one of the most prolific lip-syncing models in the world, published in 2020. I did my PhD at IIIT Hyderabad on audio-visual deep learning and have been involved in several important projects in the community.

Pavan Reddy, Founder

Driving sales/operations, finance, and the strategic roadmap at sync. 2x VC-backed entrepreneur. IIT Madras alumnus. Worked with IIIT Hyderabad on productizing research to fit the market. Key strength: connecting dots and identifying patterns across different fields to unlock value.

Company Launches

TL;DR: we’ve built a state-of-the-art lip-sync model – and we’re building towards real-time face-to-face conversations w/ AI indistinguishable from humans 🦾

try our playground here: https://app.synclabs.so/playground

how does it work?

theoretically, our models can support any language — they learn phoneme / viseme mappings (the most basic unit / “token” of how the sounds we make map to the mouth shapes that produce them). it's simple, but a start towards learning a foundational understanding of humans from video.
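to make that concrete, here's a toy sketch (the viseme labels and grouping below are hypothetical, not our actual mapping or code) of how a phoneme sequence collapses into the much smaller set of mouth shapes a lip-sync model has to render:

```python
# illustrative only: a hand-written phoneme -> viseme grouping (hypothetical labels).
# visemes are the visually distinct mouth shapes that many different phonemes share.
PHONEME_TO_VISEME = {
    "p": "BMP", "b": "BMP", "m": "BMP",   # lips pressed together
    "f": "FV",  "v": "FV",                # lower lip against upper teeth
    "aa": "AA", "ae": "AA",               # open-jaw vowels
    "ow": "O",  "uw": "O",                # rounded-lip vowels
}

def visemes_for(phonemes: list[str]) -> list[str]:
    """Map a phoneme sequence to the viseme sequence a face would show."""
    return [PHONEME_TO_VISEME.get(p, "NEUTRAL") for p in phonemes]

print(visemes_for(["hh", "ax", "l", "ow"]))  # ['NEUTRAL', 'NEUTRAL', 'NEUTRAL', 'O']
```

in practice the model learns this mapping (plus its timing and co-articulation) directly from video, rather than from a hand-written table like this.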

why is this useful?

[1] we can dissolve language as a barrier

check out how we used it to dub the entire 2-hour Tucker Carlson interview with Putin speaking fluent English.

imagine millions gaining access to knowledge, entertainment, and connection — regardless of their native tongue.

realtime at the edge takes us further — live multilingual broadcasts + video calls, even walking around Tokyo w/ a Vision Pro 2 speaking English while everyone around you speaks Japanese.

[2] we can move the human-computer interface beyond text-based-chat

keyboards / mice are lossy + low-bandwidth. human communication is rich and goes beyond just the words we say. what if we could compute w/ a face-to-face interaction?

maybe embedding context around expressions + body language in inputs / outputs would help us interact w/ computers in a more human way. this thread of research is exciting.

[3] and more

powerful models small enough to run at the edge could unlock a lot:

e.g.

- extreme compression for face-to-face video streaming
- enhanced, spatial-aware transcription w/ lip-reading
- detecting deepfakes in the wild
- on-device real-time video translation
- etc.

who are we?

Prady Modukuru [CEO] | Led product for a research team at Microsoft that made Defender a $350M+ product, taking MSR research into production and moving it from the bottom of the market to #1 in industry evals.

Rudrabha Mukhopadhyay [CTO] | PhD CVIT @ IIIT-Hyderabad, co-authored wav2lip / 20+ major publications + 1200+ citations in the last 5 years.

Prajwal K R [CRO] | PhD, VGG @ University of Oxford, w/ Andrew Zisserman, prev. Research Scientist @ Meta, authored multiple breakthrough research papers (incl. Wav2Lip) on understanding and generating humans in video

Pavan Reddy [COO/CFO] | 2x venture-backed founder/operator, built the first smart air purifier in India, prev. monetizing sota research @ IIIT-Hyderabad, engineering @ IIT Madras

how did we meet?

Prajwal + Rudrabha worked together at IIIT-Hyderabad — and became famous by shipping the world's first model that could sync the lips in a video to any audio in the wild, in any language, no training required.

they formed a company w/ Pavan and then worked w/ the university to monetize state-of-the-art research coming out of the labs and bring it to market.

Prady met everyone online — first by hacking together a viral app around their open source models, then collaborating on product + research for fun, then cofounding sync. + going mega-viral.

Since then we've hacked irl across 4 different countries, across both US coasts, and moved into a hacker house in SF together.

what’s our ask?

try out our playground and API and let us know how we can make it easier to understand and simpler to use 😄

play around here: https://app.synclabs.so/playground


Hear from the founders

How did your company get started? (i.e., How did the founders meet? How did you come up with the idea? How did you decide to be a founder?)

**how did we meet?**

Prajwal + Rudrabha worked together at IIIT-Hyderabad — and became famous by shipping the world's first model that could sync the lips in a video to any audio in the wild, in any language, no training required.

They formed a company w/ Pavan and then worked w/ the university to monetize state-of-the-art research coming out of the labs and bring it to market.

Prady met everyone online — first by hacking together a viral app around their open source models, then collaborating on product + research for fun, then cofounding sync. + going mega-viral.

Since then we've hacked irl across 4 different countries, across both US coasts, and moved into a hacker house in SF together.

Selected answers from sync.'s original YC application for the W24 Batch

Describe what your company does in 50 characters or less.

lipsync video to audio in any language in one-shot

How long have each of you been working on this? How much of that has been full-time? Please explain.

Prady + Pavan have been full-time on sync since June 2023

Rudrabha has been contributing greatly while finishing his PhD + joined full-time starting October 2023

Prajwal is finishing up his PhD and is joining full-time once he completes in May 2024 – his supervisor is Professor Andrew Zisserman (190k+ citations, a foremost expert in the field we are playing in). His proximity helps us stay sota + learn from the bleeding edge.

What is your company going to make? Please describe your product and what it does or will do.

we're building generative models to modify / synthesize humans in video + hosting production APIs to let anyone plug them into their own apps / platforms / services.

today we're focused on visual dubbing – we built + launched an updated lip-synchronizing model that lets anyone lip-sync a video to any audio, in any language, in near real-time for HD videos.

as part of the AI translation stack we're used as a post-processing step to sync the lips in a video to the new dubbed audio track – this lets everyone around the world experience content as if it were made for them in their native language (no more bad / misaligned dubs).
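as a rough sketch of what that post-processing step could look like from a developer's side (the endpoint path, field names, and polling flow below are hypothetical placeholders, not our documented API; check the playground / docs for the real contract):

```python
# hypothetical sketch: call a hosted lip-sync API as the last step of a dubbing pipeline.
# the host, paths, and field names below are placeholders, not the real sync. API.
import time
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.example-lipsync.dev"  # placeholder host

def lipsync(video_url: str, dubbed_audio_url: str) -> str:
    """Submit a video + new audio track, then poll until the re-synced video is ready."""
    headers = {"x-api-key": API_KEY}
    job = requests.post(
        f"{BASE_URL}/lipsync",
        json={"videoUrl": video_url, "audioUrl": dubbed_audio_url},
        headers=headers,
        timeout=30,
    ).json()

    while True:
        status = requests.get(
            f"{BASE_URL}/lipsync/{job['id']}", headers=headers, timeout=30
        ).json()
        if status["status"] == "COMPLETED":
            return status["outputUrl"]  # URL of the lip-synced video
        if status["status"] == "FAILED":
            raise RuntimeError(status.get("error", "lipsync job failed"))
        time.sleep(5)  # generation is async, so poll until the job finishes

# usage: run after your translation / TTS step has produced the dubbed audio track
# synced = lipsync("https://cdn.example.com/talk.mp4", "https://cdn.example.com/talk_es.wav")
```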

in the future we plan to build + host a suite of production-ready models to modify + generate a full human body digitally in video (ex. facial expressions, head + hand + eye movements, etc.) that can be used for anything from seamless localization of content (cross-language) to generative videos.

YC W24 Application Video

YC W24 Demo Day Video