HUD is building evals for the drop-in worker
Sam Altman and Dario Amodei have both said that we’ll have virtual coworkers by the end of the year.
The problem is that current agents aren’t reliable enough to perform these tasks at a human level. We address this by making it easy to build evals and run them at scale for top agent developers, including a foundation lab (under NDA), Browser Use, and Silverstream.
Our platform makes it easy for domain experts to create evals in their area of expertise. We're working with experts across occupations, from PMs to accountants to financial analysts, to build evals that labs can then use to evaluate, and eventually train, computer agents using RL.