Langfuse is the open-source LLM engineering platform designed to help teams build production-grade LLM applications faster. We started building Langfuse in San Francisco during Y Combinator's W23 batch - just as GPT-4 was first released.
Today, Langfuse is used by tens of thousands of developers and is one of the most popular LLMOps platforms globally.
Building production-grade LLM applications is challenging because of the probabilistic nature of LLMs and the multiple layers of scaffolding required to get complex workflows into production.
Increasingly complex abstractions - chains, agents with tools, advanced prompts - make applications hard to debug: understanding how an application executes and identifying the root cause of a problem can be arduous. Monitoring cost and latency is just as important, since LLM inference is expensive and can be slow to respond to users, so model usage and costs need to be tracked across applications.
Assessing the quality of LLM outputs also poses challenges. Outputs may be inaccurate, unhelpful, poorly formatted, or hallucinated, which complicates ensuring reliability and accuracy. Quickly identifying and debugging issues in complex LLM applications is essential but often difficult. Furthermore, building high-quality datasets for fine-tuning and testing requires capturing the full context of LLM executions.
Watch the walkthrough video: https://www.youtube.com/watch?v=2E8iTvGo9Hs
Langfuse addresses these challenges by providing an open-source platform to debug and improve LLM applications.
Langfuse captures the full context of your application, tracing the complete execution flow—including API calls, retrieved context, prompts, parallelism, and more. By enabling hierarchical representations through nested traces, Langfuse helps you understand complex logic built around LLM calls. Langfuse also offers full multi-modal support, including audio, images, and attachments.
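For illustration, here is a minimal sketch of how nesting functions decorated with @observe() produces a hierarchical trace (the retrieval step and function names are hypothetical, not part of the Langfuse API):

from langfuse.decorators import observe

@observe()
def retrieve_context(question: str) -> str:
    # hypothetical retrieval step; appears as a nested observation in the trace
    return "Langfuse traces capture the full execution flow."

@observe()
def rag_pipeline(question: str) -> str:
    context = retrieve_context(question)  # child span nested under the parent trace
    # an LLM call made here (e.g. via langfuse.openai) would be logged as another
    # child observation together with its prompt, model parameters, latency, and cost
    return f"Answer based on: {context}"

rag_pipeline("What does a Langfuse trace contain?")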
Langfuse measures cost and latency, breaking down metrics by user, session, feature, model, and prompt version for detailed analysis. To assess output quality, Langfuse facilitates the collection of user feedback, runs automated LLM-as-a-judge evaluations, and supports manual data labeling within the platform. It also offers prompt management, letting you version and deploy prompts, run prompt experiments on new ideas, and systematically track which versions perform best.
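As a hedged sketch of what this looks like with the Python SDK (the prompt name, prompt variable, and trace ID below are placeholders), you can fetch a managed prompt and record a user-feedback score like this:

from langfuse import Langfuse

langfuse = Langfuse()  # reads the LANGFUSE_* environment variables set in the quickstart below

# fetch the currently deployed version of a prompt managed in Langfuse
prompt = langfuse.get_prompt("qa-answer")  # "qa-answer" is a placeholder prompt name
compiled_prompt = prompt.compile(question="What is Langfuse?")

# attach a user-feedback score to an existing trace
langfuse.score(
    trace_id="trace-id-from-your-application",  # placeholder trace ID
    name="user-feedback",
    value=1,
    comment="Helpful answer",
)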
For testing and experimentation, Langfuse supports versioning your application and running tests against curated datasets of expected inputs and outputs. This provides quantitative insight into the impact of changes, helping you understand and improve your LLM applications more effectively.
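A rough sketch of this workflow with the Python SDK, assuming a hypothetical dataset name, item, and run name:

from langfuse import Langfuse

langfuse = Langfuse()

# create a dataset and add a test item with an expected output
langfuse.create_dataset(name="qa-eval")
langfuse.create_dataset_item(
    dataset_name="qa-eval",
    input={"question": "What is Langfuse?"},
    expected_output="An open-source LLM engineering platform.",
)

# run the application against each item and link the resulting traces to a named run
dataset = langfuse.get_dataset("qa-eval")
for item in dataset.items:
    with item.observe(run_name="prompt-v2"):
        main()  # your instrumented entrypoint, e.g. from the quickstart below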
Below is a brief example highlighting how you can integrate with Langfuse. You can also try out Langfuse through our interactive live demo or the walkthrough video above.
(Not using OpenAI? Langfuse can be used with any model or framework through our Python decorator and JS/TS SDKs. Langfuse also natively integrates with popular frameworks such as LangChain, LlamaIndex, LiteLLM, and more.)
The @observe() decorator makes it easy to trace any Python LLM application. In this quickstart, we use the Langfuse OpenAI integration to automatically capture model parameters, token usage, and cost.
%pip install langfuse openai
import os

# Get keys for your project from the project settings page https://cloud.langfuse.com
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"  # 🇪🇺 EU region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com"  # 🇺🇸 US region
from langfuse.openai import openai  # OpenAI integration
from langfuse.decorators import observe

@observe()  # Langfuse decorator
def story():
    return openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a great storyteller."},
            {"role": "user", "content": "Once upon a time in a galaxy far, far away..."},
        ],
    ).choices[0].message.content

@observe()
def main():
    return story()

main()
Log into the Langfuse UI to view the created trace. You can now take it further by managing your prompts through Langfuse or by starting to test or evaluate your LLM executions (more below).
See this example trace in the Langfuse UI: https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/fac231bc-90ee-490a-aa32-78c4269474e3
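For example, as an illustrative sketch (the score name, value, and comment are made up), you could attach a user-feedback or evaluation score to the trace from within the instrumented code:

from langfuse.decorators import langfuse_context, observe

@observe()
def story_with_feedback():
    output = story()  # the traced function from the quickstart above
    # record an evaluation or user-feedback score on the trace of this invocation
    langfuse_context.score_current_trace(
        name="user-feedback",  # illustrative score name
        value=1,
        comment="Good story",
    )
    return output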
After you are set up in Langfuse, you can now: