About the Role
We are looking for an experienced engineer to lead our large-scale data processing efforts. In this role, you will design, build, and maintain robust distributed systems that process terabytes of image and video data used to train state-of-the-art generative models.
Key Responsibilities
- Design, implement, and optimize complex data processing pipelines that ingest and transform large media datasets.
- Manage containerized applications on Kubernetes; deploy and scale distributed systems built on Ray to schedule tasks and orchestrate compute workloads.
- Implement and deploy state-of-the-art ML models for data cleaning, processing, and preparation.
- Ensure data quality, diversity, and proper annotation (including captioning) so datasets are ready for training.
- Work closely within the model development loop, updating data as the training trajectory requires.
Ideal Experience
- Deep understanding of Python and a variety of file systems for data-intensive manipulation and analysis.
- Demonstrable experience deploying, managing, and scaling containerized applications on Kubernetes clusters.
- Hands-on experience with distributed computing engines such as Ray, including task scheduling, fault tolerance, and resource management.
- Experience with image and video processing libraries (e.g., OpenCV, FFmpeg).
- Experience working with large image/video datasets, including efficient data handling, transformation, and feature extraction.
- Familiarity with data annotation and captioning processes for ML training datasets.