Guide Labs: Interpretable foundation models

Foundation models that can explain their reasoning and are easy to align

Julius Adebayo

a year ago

https://www.guidelabs.ai/

At Guide Labs, we build interpretable foundation models that can reliably explain their reasoning and are easy to align.

The Problem: foundation models are black boxes and difficult to align

Current transformer-based large language models (LLMs) and diffusion generative models are largely inscrutable and do not provide reliable explanations for their outputs. In domains such as medicine, lending, and drug discovery, an answer alone is not enough; domain experts also want to know why the model arrived at its output.

Current foundation models don’t explain their outputs: Would you trust a black-box model to propose medications for your illness or decide whether you should get a job interview?
You can't debug a system you don't understand: When you call a model API and the response is incorrect, what do you do? Change the prompt? What part of your prompt should you change? Switch to a new model API?
Difficult to reliably align or control model outputs: Even when you've identified the cause of the problem, how do you control the model so that it no longer makes the mistake you identified?

Our Solution

We’ve developed interpretable foundation models that can explain their reasoning and are easy to align.

These models:

provide human-understandable explanations;
indicate which parts of the prompt are important; and
specify which tokens led to the model's output.

Using all these explanations, we can:

identify the part of the prompt that causes the model to err;
isolate the samples that cause those errors; and
use explanations to control and align the model to fix its errors.
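
To make the idea of "identifying the part of the prompt that matters" concrete, here is a minimal sketch of leave-one-out (occlusion) attribution, a generic post-hoc technique. It is not Guide Labs' method, which this post does not describe. The names `leave_one_out_importance` and `toy_score` are hypothetical, and `toy_score` is a stand-in keyword counter, not a real model.

```python
from typing import Callable, List, Tuple


def leave_one_out_importance(
    tokens: List[str],
    score: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Return (token, importance) pairs, where importance is the drop in
    score when that single token is removed from the prompt."""
    baseline = score(tokens)
    importances = []
    for i, token in enumerate(tokens):
        # Remove exactly one token and re-score the shortened prompt.
        ablated = tokens[:i] + tokens[i + 1:]
        importances.append((token, baseline - score(ablated)))
    return importances


def toy_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a model: counts tokens in a tiny keyword set."""
    keywords = {"fever", "cough"}
    return float(sum(1 for t in tokens if t.lower() in keywords))


prompt = "patient reports fever and cough".split()
importances = leave_one_out_importance(prompt, toy_score)

# Rank tokens from most to least important for the toy score.
ranked = sorted(importances, key=lambda pair: pair[1], reverse=True)
for token, importance in ranked:
    print(f"{token}: {importance}")
```

With a real model, `score` would be something like the probability assigned to a given output. Post-hoc scores of this kind only approximate what a model relies on, which is part of the motivation for models that produce explanations directly.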

About Us

We are interpretability researchers and engineers who have been responsible for major advances in the field. We have set out to rethink how machine learning models, AI agents, and systems more broadly are developed.

Our Ask

Please reach out to us at info@guidelabs.ai.