AI lipsync tool for video content creators
sync. is a team of artists, engineers, and scientists building foundation models to edit and modify people in video. Founded by the creators of Wav2lip and backed by legendary investors, including YC, Google, and visionaries Nat Friedman and Daniel Gross, we've raised $6 million in our seed round to evolve how we create and consume media.
Within months of launch, our flagship lipsync API scaled to millions in revenue and now powers video translation, dubbing, and dialogue replacement workflows for thousands of editors, developers, and businesses around the world.
That's only the beginning: we're building a creative suite to give anyone Photoshop-like control over humans in video, with zero-shot understanding and fine-grained editing of expressions, gestures, movement, identity, and more.
Everyone has a story to tell, but not everyone's a storyteller – yet. We're looking for talented and driven individuals from all backgrounds to build inspired tools that amplify human creativity.
About the role
We're looking for an exceptional Research Scientist to help us build generative video models that understand and modify humans in video. You'll work directly with the creators of Wav2lip, pushing the boundaries of what's possible in fine-grained video control and video understanding.
What you'll work on
Design and scale cutting-edge generative models that can seamlessly control and edit different attributes of humans in video
Pioneer new architectures for zero-shot video representation learning
Explore novel problems and capabilities to push the limits of "what is possible"
Tackle core challenges in precise video editing that large generative models struggle with
Build primitives that capture and express human idiosyncrasies
What you'll need
3+ years of research experience in generative AI, computer vision, or deep learning
Deep expertise in training generative models in PyTorch
Curiosity and the drive to understand the unknown
Proven record of implementing and adapting cutting-edge papers
Proven ability to take research from concept to production
Track record of breakthrough technical innovation
Preferred qualifications
Expertise in recent SOTA generative architectures (e.g., diffusion models)
History working with face/human editing in video
Notable publications or open-source contributions
Expertise in model optimization and scalability
Our goal is to keep the team lean, hungry, and shipping fast.
These are the qualities we embody and look for:
[1] Raw intelligence: we tackle complex problems and push the boundaries of what's possible.
[2] Boundless curiosity: we're always learning, exploring new technologies, and questioning assumptions.
[3] Exceptional resolve: we persevere through challenges and never lose sight of our goals.
[4] High agency: we take ownership of our work and drive initiatives forward autonomously.
[5] Outlier hustle: we work smart and hard, going above and beyond to achieve extraordinary results.
[6] Obsessively data-driven: we base our decisions on solid data and measurable outcomes.
[7] Radical candor: we communicate openly and honestly, providing direct feedback to help each other grow.
at sync. we're making video as fluid and editable as a word document.
how much time would you save if you could record every video in a single take?
no more re-recording yourself because you didn't like what you said, or how you said it.
just shoot once, revise yourself to do exactly what you want, and post. that's all.
this is the future of video: AI modified >> AI generated
we're playing at the edge of science + fiction.
our team is young, hungry, uniquely experienced, and advised by some of the greatest research minds + startup operators in the world. we're driven to solve impossible problems, impossibly fast.
our founders are the original team behind the open-source wav2lip — the most prolific lip-sync model to date, with over 9k GitHub stars.
we're at the stage in computer vision today that NLP was at two years ago: a bunch of disparate, specialized models (e.g. sentiment classification, translation, summarization) that a single generalized model, the LLM, then displaced.
we're taking the same approach: curating high-quality datasets and training a series of specialized models to accomplish specific tasks, while building up towards a more generalized approach — one model to rule them all.
post-batch, our growth is e^x — we need help asap to scale up our infra, training, and product velocity.
we look for the following: [1] raw intelligence [2] boundless curiosity [3] exceptional resolve [4] high agency [5] outlier hustle