Langfuse is the open-source LLM engineering platform designed to help teams build production-grade LLM applications faster. We started building Langfuse in San Francisco during Y Combinator's W23 batch - just as GPT-4 was first released.
Today, Langfuse is used by tens of thousands of developers and is one of the most popular LLMOps platforms globally.
Building production-grade LLM applications is challenging because of the probabilistic nature of LLMs and the multiple layers of scaffolding required to get complex workflows into production.
Increasingly complex abstractions - chains, agents with tools, advanced prompts - make applications hard to debug: understanding how an application executes and identifying the root cause of a problem can be arduous. Monitoring cost and latency is just as important, since LLM inference is expensive and can be slow to respond to users, so model usage and costs need to be tracked across applications.
Assessing the quality of LLM outputs also poses challenges. Outputs may be inaccurate, unhelpful, poorly formatted, or hallucinated, which complicates ensuring reliability and accuracy. Quickly identifying and debugging issues in complex LLM applications is essential but often difficult. Furthermore, building high-quality datasets for fine-tuning and testing requires capturing the full context of LLM executions.
Watch the walkthrough video: https://www.youtube.com/watch?v=2E8iTvGo9Hs
Langfuse addresses these challenges by providing an open-source platform to debug and improve LLM applications.
Langfuse captures the full context of your application, tracing the complete execution flow—including API calls, retrieved context, prompts, parallelism, and more. By enabling hierarchical representations through nested traces, Langfuse helps you understand complex logic built around LLM calls. Langfuse also offers full multi-modal support, including audio, images, and attachments.
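For illustration, here is a minimal sketch of how nesting functions decorated with @observe() produces a hierarchical trace (the retrieval step and function names are hypothetical, not part of the Langfuse API):

from langfuse.decorators import observe

@observe()
def retrieve_context(question: str) -> str:
    # hypothetical retrieval step; appears as a nested observation in the trace
    return "Langfuse traces capture the full execution flow."

@observe()
def rag_pipeline(question: str) -> str:
    context = retrieve_context(question)  # child span nested under the parent trace
    # an LLM call made here (e.g. via langfuse.openai) would be logged as another
    # child observation together with its prompt, model parameters, latency, and cost
    return f"Answer based on: {context}"

rag_pipeline("What does a Langfuse trace contain?")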
Langfuse measures cost and latency, breaking down metrics by user, session, feature, model, and prompt version for detailed analysis. To assess output quality, Langfuse facilitates the collection of user feedback, runs automated LLM-as-a-judge evaluations, and supports manual data labeling within the platform. It also offers prompt management, letting you version and deploy prompts, run prompt experiments on new ideas, and systematically track which versions perform best.
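As a hedged sketch of what this looks like with the Python SDK (the prompt name, prompt variable, and trace ID below are placeholders), you can fetch a managed prompt and record a user-feedback score like this:

from langfuse import Langfuse

langfuse = Langfuse()  # reads the LANGFUSE_* environment variables set in the quickstart below

# fetch the currently deployed version of a prompt managed in Langfuse
prompt = langfuse.get_prompt("qa-answer")  # "qa-answer" is a placeholder prompt name
compiled_prompt = prompt.compile(question="What is Langfuse?")

# attach a user-feedback score to an existing trace
langfuse.score(
    trace_id="trace-id-from-your-application",  # placeholder trace ID
    name="user-feedback",
    value=1,
    comment="Helpful answer",
)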
For testing and experimentation, Langfuse supports versioning your application and running tests against curated datasets of expected inputs and outputs. This provides quantitative insight into the impact of changes, helping you understand and improve your LLM applications more effectively.
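A rough sketch of this workflow with the Python SDK, assuming a hypothetical dataset name, item, and run name:

from langfuse import Langfuse

langfuse = Langfuse()

# create a dataset and add a test item with an expected output
langfuse.create_dataset(name="qa-eval")
langfuse.create_dataset_item(
    dataset_name="qa-eval",
    input={"question": "What is Langfuse?"},
    expected_output="An open-source LLM engineering platform.",
)

# run the application against each item and link the resulting traces to a named run
dataset = langfuse.get_dataset("qa-eval")
for item in dataset.items:
    with item.observe(run_name="prompt-v2"):
        main()  # your instrumented entrypoint, e.g. from the quickstart below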
Below is a brief example highlighting how you can integrate with Langfuse. You can also try out Langfuse through our interactive live demo or the walkthrough video above.
(Not using OpenAI? Langfuse can be used with any model or framework through our Python decorator and JS/TS SDKs. Langfuse also natively integrates with popular frameworks such as LangChain, LlamaIndex, LiteLLM, and more.)
The @observe() decorator makes it easy to trace any Python LLM application. In this quickstart, we use the Langfuse OpenAI integration to automatically capture model parameters, token usage, and cost.
%pip install langfuse openai
import os

# Get keys for your project from the project settings page https://cloud.langfuse.com
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # 🇪🇺 EU region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com"  # 🇺🇸 US region
from langfuse.openai import openai  # OpenAI integration
from langfuse.decorators import observe

@observe()  # Langfuse decorator
def story():
    return openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a great storyteller."},
            {"role": "user", "content": "Once upon a time in a galaxy far, far away..."},
        ],
    ).choices[0].message.content

@observe()
def main():
    return story()

main()
Log into the Langfuse UI to view the created trace. You can now take it further by managing your prompts through Langfuse or by starting to test or evaluate your LLM executions (more below).
See this example trace in the Langfuse UI: https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/fac231bc-90ee-490a-aa32-78c4269474e3
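For example, as an illustrative sketch (the score name, value, and comment are made up), you could attach a user-feedback or evaluation score to the trace from within the instrumented code:

from langfuse.decorators import langfuse_context, observe

@observe()
def story_with_feedback():
    output = story()  # the traced function from the quickstart above
    # record an evaluation or user-feedback score on the trace of this invocation
    langfuse_context.score_current_trace(
        name="user-feedback",  # illustrative score name
        value=1,
        comment="Good story",
    )
    return output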
After you are set up in Langfuse, you can now: