We were both engineers at Tesla when, one day, we had a fun idea to make a YouTube video of Cybertrucks in Palo Alto. The idea was simple: we'd record cars going by, and for each Cybertruck we counted, we'd buy a share of Tesla stock. It had all the makings of a viral video, but after recording, we came back with hours of raw footage of cars driving by. It quickly dawned on us that we'd have to scrub through all of that footage just to count the Cybertrucks.

As we began editing, we experienced firsthand the frustration of trying to accomplish simple tasks in video editors like DaVinci Resolve and Adobe Premiere Pro. Features are hidden behind menus, buttons, and icons, and we often found ourselves Googling or asking ChatGPT how to do certain edits. We thought that now, with multimodal AI, we could accelerate much of this process. Better yet, if the AI is built into your video editor, it can automatically apply edits based on what it sees and hears in your video. The idea quickly snowballed, and we began our side quest to build the Cursor for Video Editing.
We spent the first month of YC building our entire video editor and making our multimodal chat copilot powerful and stateful. But after talking to users, we quickly realized that a chat interface has real limitations for video, primarily because of chat's inherently sequential, prompt-response UX. We spent the second month of YC going back to first principles to create a new, agentic paradigm for video editing. The result: a node-based canvas that lets you create and run your own multimodal video editing agents. The idea is that the canvas runs your video editing on autopilot and gets you 80-90% of the way there. Then you can go into the editor and chat with your video for the final touches and polish.