back

Time: 5 minute read

Created: August 12, 2024

Author: Lina Lam

4 Essential Helicone Features to Optimize Your AI App's Performance

Helicone empowers AI engineers and LLM developers to optimize their applications' performance. This guide provides step-by-step instructions for integrating and making the most of Helicone’s features — available on all Helicone plans.

This blog post is for you if:

  • You're building or maintaining an AI application
  • You need to improve response times, reduce costs, or enhance reliability
  • You want data-driven insights to guide your optimization efforts
  • You're looking for practical, implementable solutions

4 essential features in Helicone to optimize your AI app

We will focus on the 4 essential Helicone features: custom properties, sessions, prompts, and caching, how each feature works, why it matters, and how to implement it in your development workflow. If you're ready to follow the practical steps, read on.


Getting started: integrating with 1-line of code

Whether you're prototyping or maintaining a production app, Helicone's one-line integration lets you focus on building, not configuring.

Integrating with any provider

Integrating Helicone with your AI app requires changing just a single line of code. The platform is compatible with various AI models and APIs - you can find the complete list of integrations in our docs.

By simply updating the base URL, you can easily switch between different models like GPT-4 and LLaMA. Here's an example:

# Before
baseURL = "https://api.openai.com/v1"

# Change the URL to use Helicone
baseURL = "https://oai.helicone.ai/v1"

1. Custom Properties to Segment Requests

Custom Properties helps you tailor the LLM analytics to your needs. Custom Properties lets you segment requests, allowing you to make more data-driven improvements and targeted optimizations.

In a nutshell,Custom Properties can be implemented using headers, and supports any type of custom metadata.

How it works

  1. Add custom headers to your requests using this format: Helicone-Property-[Name]: [value] where Nameis the name of your custom property. For example:

    headers = {
        "Helicone-Property-Session": "121",
        "Helicone-Property-App": "mobile",
        "Helicone-Property-Conversation": "support_issue_2"
    }
    

    More info about Custom Properties in the docs.

  2. Now you can segment incoming requests on the Dashboard, Requests, or Properties page. For example:

  • Filter for specific prompt chains on Requests page to analyze costs and latency. Filter by custom properties on Helicone's Request page
  • Analyze "unit economics" (e.g., average cost per conversation) on Properties page. Analyze unit economics using custom properties in Helicone
  • Filter for all requests that meet a criteria on Dashboard page. Segment requests and metrics by custom properties on Helicone's dashboard

Applications

  • Segment your requests by app versions to monitor and compare performance.
  • Segment your requests by free/paid users to better understand their behaviours and patterns.
  • Segment your requests by feature to understand usage pattern and optimize resource allocation.
  • For detailed instructions, check out the docs.

2. Sessions to Debug Complex Workflows

Helicone's Sessions feature allows developers to group and visualize multi-step LLM interactions, providing invaluable insights into complex AI workflows. This feature is available on all plans and is currently in beta.

In a nutshell, you can group related requests for a more holistic analysis. You can also track request flows across multiple traces.

How it works

  1. Add the following three headers to your requests.

    headers = {
        "Helicone-Session-Id": session_uuid,  # The session id you want to track
        "Helicone-Session-Path": "/abstract",  # The path of the session
        "Helicone-Session-Name": "Course Plan"  # The name of the session
    }
    
  2. Use the Helicone dashboard to visualize and analyze your sessions. For example:

  • Use case 1: Reconstruct conversation flows or multi-stage tasks in the Chat view Reconstruct conversation flows or multi-stage in chat view
  • Use case 2: Analyze performance across the entire interaction sequence in the Tree view Analyze performance across entire interaction sequences in tree view
  • Use case 3: Identify bottlenecks in your AI workflows in the Span view Identify bottlenecks in your AI workflows in span view
  • Use case 4: Gain deeper insights into user behavior with conversation context.

Applications

Imagine creating an AI app that creates a course outline.

const session = randomUUID();

openai.chat.completions.create(
  {
    messages: [
      {
        role: "user",
        content: "Generate an abstract for a course on space.",
      },
    ],
    model: "gpt-4",
  },
  {
    headers: {
      "Helicone-Session-Id": session,
      "Helicone-Session-Path": "/abstract",
      "Helicone-Session-Name": "Course Plan",
    },
  }
);

This setup allows you to track the entire course creation process, from abstract to detailed lessons, as a single session.

For developers working on applications with complex, multi-step AI interactions, Sessions provides a powerful tool for understanding and optimizing your AI workflows.

  • For detailed instructions, check out the docs.

3. Version, Experiment and Optimize Prompts

Helicone's Prompt Management feature offers developers a powerful tool to version, track, and optimize their AI prompts without disrupting their existing workflow. This feature is available on all plans and is currently in beta.

In a nutshell, Helicone will automatically version your prompt whenever it's modified in the codebase. You can run experiments using real-time data, and test prompts toprevent prompt regressions.

How it works

  1. Set up Helicone in proxy mode. Use one of the methods in the Starter Guide.

  2. Use the hpf (Helicone Prompt Format) function to identify input variables

    import { hpf } from "@helicone/prompts";
    
    ...
    
    content: hpf`Write a story about ${{ character }}`,
    
  3. Assign a unique ID to your prompt using a header. Here's the doc on Prompt Management & Experiments. For example:

    headers: {
      "Helicone-Prompt-Id": "prompt_story",
    },
    

Example implementation

import { hpf } from "@helicone/prompts";

const chatCompletion = await openai.chat.completions.create(
  {
    messages: [
      {
        role: "user",
        content: hpf`Write a story about ${{ character }}`,
      },
    ],
    model: "gpt-3.5-turbo",
  },
  {
    headers: {
      "Helicone-Prompt-Id": "prompt_story",
    },
  }
);

Benefits

  • Track prompt iterations over time
  • Maintain datasets of inputs and outputs for each prompt version
  • Easily run A/B tests on different prompt versions
  • Identify and rollback problematic changes quickly

Applications

Imagine you are developing a chatbot and want to improve its responses. With prompt management in Helicone, you can:

  1. Version different phrasings of your prompts
  2. Test these versions against historical data
  3. Analyze performance metrics for each version
  4. Deploy the best-performing prompt to production

The Prompt & Experiment feature is not only for developers to iterate and experiment with prompts, but also enables non-technical team members to participate in prompt design without touching the codebase.


4. Caching to Boost Performance and Cut Costs

Helicone's LLM Caching feature offers developers a powerful way to reduce latency and save costs on LLM calls. By caching responses on the edge, this feature can significantly improve your AI app's performance.

In a nutshell, Helicone's LLM Caching feature is available on all plans (up to 20 caches per bucket for non-enterprise plans), utilizes Cloudflare Workers for low-latency storage, and is configurable cache duration and bucket sizes.

How it works

  1. Enable caching with a simple header:

    headers = {
        "Helicone-Cache-Enabled": "true"
    }
    
  2. Customize caching behavior. For detailed description on how to configure the headers, visit the doc.

    headers = {
        "Helicone-Cache-Enabled": "true",
        "Cache-Control": "max-age=3600",  # 1 hour cache
        "Helicone-Cache-Bucket-Max-Size": "3",
        "Helicone-Cache-Seed": "user-123"
    }
    
  3. (optional) Extract cache status from response headers:

    cache_hit = response.headers.get('Helicone-Cache')
    cache_bucket_idx = response.headers.get('Helicone-Cache-Bucket-Idx')
    

Benefits

  • Faster response times for common queries
  • Reduced load on backend resources
  • Lower costs by minimizing redundant LLM calls
  • Insights into frequently accessed data

Applications

Imagine you're building a customer support chatbot. With LLM Caching:

  1. Common questions get instant responses from cache
  2. You save on API costs for repetitive queries
  3. Your app maintains consistent responses for similar inputs
  4. You can analyze cache hits to identify popular topics

Example implementation

client = OpenAI(
    api_key="<OPENAI_API_KEY>",
    base_url="https://oai.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer <API_KEY>",
        "Helicone-Cache-Enabled": "true",
        "Cache-Control": "max-age=2592000",
        "Helicone-Cache-Bucket-Max-Size": "3",
    }
)

chat_completion_raw = client.chat.completions.with_raw_response.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello world!"}]
)

cache_hit = chat_completion_raw.http_response.headers.get('Helicone-Cache')
print(f"Cache status: {cache_hit}")  # Will print "HIT" or "MISS"

LLM Caching provides a faster response time and valuable insights into usage patterns, allowing developers to further refine their AI systems.

  • For detailed instructions, check out the docs.

Conclusion

If you are an building with AI, LLM observability tools can help you:

  • ✅ Improve your AI model output
  • ✅ Reduce latency and costs
  • ✅ Gain deeper insights into your user interactions
  • ✅ Debug AI development workflows

Helicone aims to provide all the essential tools to help you make the right improvements and deliver better AI experiences. Interested in checking out other features? Here's a list of headers to get you started. Happy optimizing!

You might be interested in:


Questions or feedback?

Are the information out of date? Please raise an issue or contact us, we'd love to hear from you!