LangSmith Walkthrough

LangChain makes it easy to prototype LLM applications and Agents. However, delivering LLM applications to production can be deceptively difficult. You will have to iterate on your prompts, chains, and other components to build a high-quality product.

LangSmith makes it easy to debug, test, and continuously improve your LLM applications.

When might this come in handy? You may find it useful when you want to:

Quickly debug a new chain, agent, or set of tools
Create and manage datasets for fine-tuning, few-shot prompting, and evaluation
Run regression tests on your application to confidently develop
Capture production analytics for product insights and continuous improvements

Prerequisites

Create a LangSmith account and create an API key (see bottom left corner). Familiarize yourself with the platform by looking through the docs

Note LangSmith is in closed beta; we're in the process of rolling it out to more users. However, you can fill out the form on the website for expedited access.

Now, let's get started!

Log runs to LangSmith

First, configure your environment variables to tell LangChain to log traces. This is done by setting the LANGCHAIN_TRACING_V2 environment variable to true. You can tell LangChain which project to log to by setting the LANGCHAIN_PROJECT environment variable (if this isn't set, runs will be logged to the default project). This will automatically create the project for you if it doesn't exist. You must also set the LANGCHAIN_ENDPOINT and LANGCHAIN_API_KEY environment variables.

For more information on other ways to set up tracing, please reference the LangSmith documentation.

However, in this example, we will use environment variables.

npm install @langchain/openai @langchain/community langsmith uuid

import { v4 as uuidv4 } from "uuid";
const uniqueId = uuidv4().slice(0, 8);
process.env.LANGCHAIN_TRACING_V2 = "true";
process.env.LANGCHAIN_PROJECT = `JS Tracing Walkthrough - ${uniqueId}`;
process.env.LANGCHAIN_ENDPOINT = "https://api.smith.langchain.com";
process.env.LANGCHAIN_API_KEY = "<YOUR-API-KEY>"; // Replace with your API key

// For the chain in this tutorial
process.env.OPENAI_API_KEY = "<YOUR-OPENAI-API-KEY>";
// You can make an API key here: https://app.tavily.com/sign-in
process.env.TAVILY_API_KEY = "<YOUR-TAVILY-API-KEY>";

Create the langsmith client to interact with the API

import { Client } from "langsmith";

const client = new Client();

Create a LangChain component and log runs to the platform. In this example, we will create an OpenAI function calling agent with access to a general search tool (Tavily). The agent's prompt can be viewed in the Hub here.

import { AgentExecutor, createOpenAIFunctionsAgent } from "langchain/agents";
import { pull } from "langchain/hub";
import { TavilySearchResults } from "@langchain/community/tools/tavily_search";
import { ChatOpenAI } from "@langchain/openai";
import type { ChatPromptTemplate } from "@langchain/core/prompts";

const tools = [new TavilySearchResults()];

// Get the prompt to use - you can modify this!
// If you want to see the prompt in full, you can at:
// https://smith.langchain.com/hub/hwchase17/openai-functions-agent
const prompt = await pull<ChatPromptTemplate>(
  "hwchase17/openai-functions-agent"
);

const llm = new ChatOpenAI({
  model: "gpt-3.5-turbo-1106",
  temperature: 0,
});

const agent = await createOpenAIFunctionsAgent({
  llm,
  tools,
  prompt,
});

const agentExecutor = new AgentExecutor({
  agent,
  tools,
});

You can run the executor on multiple inputs concurrently, reducing latency. The runs are logged to LangSmith in the background.

// Same input structure that our declared `agentExecutor` takes
const inputs = [
  { input: "What is LangChain?" },
  { input: "What's LangSmith?" },
  { input: "When was Llama-v2 released?" },
  { input: "What is the langsmith cookbook?" },
  { input: "When did langchain first announce the hub?" },
];

const results = await agentExecutor.batch(inputs);
console.log(results.slice(0, 2));

[
  {
    input: 'What is LangChain?',
    output: 'LangChain is a framework that allows developers to build applications with Language Model Systems (LLMs) through composability. It provides a set of tools and modules for building context-aware language model systems, including features for retrieval augmented generation, analyzing structured data, chatbots, and more. LangChain also offers the LangChain Expression Language (LCEL) to create custom chains and adapt language models to specific business contexts. It was launched as an open-source project in October 2022 and has gained popularity for its capabilities in the field of generative AI and language model integration. You can find more information about LangChain on their [official website](https://www.langchain.com/).'
  },
  {
    input: "What's LangSmith?",
    output: 'LangSmith is a unified platform designed to help developers with debugging, testing, evaluating, and monitoring chains and intelligent agents built on any LLM (Language Model) framework. It provides full visibility into model inputs and outputs, facilitates dataset creation from existing logs, and seamlessly integrates logging/debugging workflows with testing/evaluation workflows. LangSmith aims to bridge the gap between prototype and production, offering a single, fully-integrated hub for developers to work from. It also assists in tracing and evaluating complex agent prompt chains, reducing the time required for debugging and refinement. LangSmith is part of the LangChain ecosystem, which is an open-source framework for building with LLMs.'
  }
]

After setting up your environment, your agent traces should appear in the Projects section on the LangSmith app. Congratulations!

If the agent is not effectively using the tools, evaluate it to establish a baseline.

Evaluate the Chain

LangSmith allows you to test and evaluate your LLM applications. Follow these steps to benchmark your agent:

1. Create a LangSmith dataset

Use the LangSmith client to create a dataset with input questions and corresponding labels.

For more information on datasets, including how to create them from CSVs or other files or how to create them in the platform, please refer to the LangSmith documentation.

// Same structure as our `agentExecutor` output
const referenceOutputs = [
  {
    output:
      "LangChain is an open-source framework for building applications using large language models. It is also the name of the company building LangSmith.",
  },
  {
    output:
      "LangSmith is a unified platform for debugging, testing, and monitoring language model applications and agents powered by LangChain",
  },
  { output: "July 18, 2023" },
  {
    output:
      "The langsmith cookbook is a github repository containing detailed examples of how to use LangSmith to debug, evaluate, and monitor large language model-powered applications.",
  },
  { output: "September 5, 2023" },
];

const datasetName = `lcjs-qa-${uniqueId}`;
const dataset = await client.createDataset(datasetName);

await Promise.all(
  inputs.map(async (input, i) => {
    await client.createExample(input, referenceOutputs[i], {
      datasetId: dataset.id,
    });
  })
);

2. Configure evaluation

Manually comparing the results of chains in the UI is effective, but it can be time consuming. It can be helpful to use automated metrics and AI-assisted feedback to evaluate your component's performance.

Below, we will create a custom run evaluator that performs a simple check on the run output to see whether the LLM is sure about what it has generated. You can perform much more sophisticated custom checks as well, including calling out to external APIs.

import type { RunEvalType, DynamicRunEvaluatorParams } from "langchain/smith";

// An illustrative custom evaluator example
const notUnsure = async ({
  run,
  example,
  input,
  prediction,
  reference,
}: DynamicRunEvaluatorParams) => {
  if (typeof prediction?.output !== "string") {
    throw new Error(
      "Invalid prediction format for this evaluator. Please check your chain's outputs and try again."
    );
  }
  return {
    key: "not_unsure",
    score: !prediction.output.includes("not sure"),
  };
};

const evaluators: RunEvalType[] = [
  // LangChain's built-in evaluators
  LabeledCriteria("correctness"),

  // (Optional) Format the raw input and output from the chain and example correctly
  Criteria("conciseness", {
    formatEvaluatorInputs: (run) => ({
      input: run.rawInput.question,
      prediction: run.rawPrediction.output,
      reference: run.rawReferenceOutput.answer,
    }),
  }),

  // Custom evaluators can be user-defined RunEvaluator's
  // or a compatible function
  notUnsure,
];

For prebuilt LangChain evaluators, passing formatEvaluatorInputs function will format the raw input and output from the chain and example correctly. These will most often be strings.

This is not required for custom evaluators, which can perform their own parsing of run inputs and outputs.

3. Run the Benchmark

Use the runOnDataset function to evaluate your model. This will:

Fetch example rows from the specified dataset.
Run your chain, agent (or any custom function) on each example.
Apply evaluators to the resulting run traces and corresponding reference examples to generate automated feedback.

The results will be visible in the LangSmith app.

import { runOnDataset } from "langchain/smith";

await runOnDataset(agentExecutor, datasetName, {
  evaluators,

  // (Optional) Provide a name of the evaluator run to be
  // displayed in the LangSmith UI
  projectName: "Name of the evaluation run",
});

Predicting: ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 100.00% | 5/5
Completed
Running Evaluators: ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 100.00% | 5/5

Review the test results

You can review the test results tracing UI below by clicking the URL in the output above or navigating to the "Testing & Datasets" page in LangSmith "lcjs-qa-{uniqueId}" dataset.

This will show the new runs and the feedback logged from the selected evaluators. You can also explore a summary of the results in tabular format below.

Conclusion

Congratulations! You have successfully traced and evaluated a chain using LangSmith!

This was a quick guide to get started, but there are many more ways to use LangSmith to speed up your developer flow and produce better results.

For more information on how you can get the most out of LangSmith, check out LangSmith documentation, and please reach out with questions, feature requests, or feedback at support@langchain.dev.

LangSmith Walkthrough

Prerequisites​

Log runs to LangSmith​

Evaluate the Chain​

1. Create a LangSmith dataset​

2. Configure evaluation​

3. Run the Benchmark​

Review the test results​

Conclusion​

Help us out by providing feedback on this documentation page: