Custom MRKL agent
This notebook goes through how to create your own custom Modular Reasoning, Knowledge and Language (MRKL, pronounced โmiracleโ) agent using LCEL.
A MRKL agent consists of three parts:
- Tools: The tools the agent has available to use.
Runnable
: TheRunnable
that produces the text that is parsed in a certain way to determine which action to take.- The agent class itself: this parses the output of the LLMChain to determine which action to take.
In this notebook we walk through how to create a custom MRKL agent by creating a custom Runnable
.
Custom Runnable
โ
The first way to create a custom agent is to use a custom Runnable
.
Most of the work in creating the custom Runnable
comes down to the input and outputs. Because we're using a custom Runnable
we are not provided with any pre-built input/output parsers.
Instead, we must create our own to format the inputs and outputs a the way we define.
Additionally, we need an agent_scratchpad
input variable to put notes on previous actions and observations.
This should almost always be the final part of the prompt. This is a very important step, because without the agent_scratchpad
the agent will have no context on the previous actions it has taken.
To ensure the prompt we create contains the appropriate instructions and input variables, we'll create a helper function which takes in a list of input variables, and returns the final formatted prompt. We will also do something similar with the output parser, ensuring our input prompts and outputs are always formatted the same way.
The first step is to import all the necessary modules.
- npm
- Yarn
- pnpm
npm install @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
import { AgentExecutor } from "langchain/agents";
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "langchain/prompts";
import {
AgentAction,
AgentFinish,
AgentStep,
BaseMessage,
InputValues,
SystemMessage,
} from "langchain/schema";
import { RunnableSequence } from "@langchain/core/runnables";
import { SerpAPI } from "langchain/tools";
import { Calculator } from "langchain/tools/calculator";
Next, we'll instantiate our chat model and tools.
/**
* Instantiate the chat model and bind the stop token
* @important The stop token must be set, if not the LLM will happily continue generating text forever.
*/
const model = new ChatOpenAI({ temperature: 0 }).bind({
stop: ["\nObservation"],
});
/** Define the tools */
const tools = [
new SerpAPI(process.env.SERPAPI_API_KEY, {
location: "Austin,Texas,United States",
hl: "en",
gl: "us",
}),
new Calculator(),
];
After this, we can define our prompts using the PromptTemplate
class. This will come in handy later when formatting our prompts as it provides a helper .format()
method we'll use.
const PREFIX = new PromptTemplate({
template: `Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:
{tools}`,
inputVariables: ["tools"],
});
/**
* @important This prompt is used as an example for the LLM on how it should
* respond to the user.
*/
const TOOL_INSTRUCTIONS = new PromptTemplate({
template: `Use the following format in your response:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question`,
inputVariables: ["tool_names"],
});
const SUFFIX = new PromptTemplate({
template: `Begin! Remember to speak as a pirate when giving your final answer. Use lots of "Args
Question: {input}
{agent_scratchpad}`,
inputVariables: ["input", "agent_scratchpad"],
});
Once we've defined our prompts we can create our custom input prompt formatter.
This function will take in an arbatary list of inputs and return a formatted prompt.
Inside we're first checking input
and agent_scratchpad
exist on the input.
Then, we're formatting the agent_scratchpad
as a string in a way the LLM can easily interpret.
Finally, we call the .format()
methods on all our prompts, passing in the input variables. With these, we can then return the final prompt inside a SystemMessage
.
async function formatPrompt(inputValues: InputValues) {
/** Verify input and agent_scratchpad exist in the input object. */
if (!("input" in inputValues) || !("agent_scratchpad" in inputValues)) {
throw new Error(
`Missing input or agent_scratchpad in input object: ${JSON.stringify(
inputValues
)}`
);
}
const input = inputValues.input as string;
/** agent_scratchpad will be undefined on the first iteration. */
const agentScratchpad = (inputValues.agent_scratchpad ?? []) as AgentStep[];
/** Convert the list of AgentStep's into a more agent friendly string format. */
const formattedScratchpad = agentScratchpad.reduce(
(thoughts, { action, observation }) =>
thoughts +
[action.log, `\nObservation: ${observation}`, "Thought:"].join("\n"),
""
);
/** Format our prompts by passing in the input variables */
const formattedPrefix = await PREFIX.format({
tools: tools
.map((tool) => `Name: ${tool.name}\nDescription: ${tool.description}`)
.join("\n"),
});
const formattedToolInstructions = await TOOL_INSTRUCTIONS.format({
tool_names: tools.map((tool) => tool.name).join(", "),
});
const formattedSuffix = await SUFFIX.format({
input,
agent_scratchpad: formattedScratchpad,
});
/** Join all the prompts together, and return as an instance of `SystemMessage` */
const formatted = [
formattedPrefix,
formattedToolInstructions,
formattedSuffix,
].join("\n");
return [new SystemMessage(formatted)];
}
If we're curious as to what the final prompt looks like, we can console log it before returning. It should look like this:
console.log(new SystemMessage(formatted));
Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:
Name: search
Description: a search engine. useful for when you need to answer questions about current events. input should be a search query.
Name: calculator
Description: Useful for getting the result of a math expression. The input to this tool should be a valid mathematical expression that could be executed by a simple calculator.
Use the following format in your response:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [search, calculator]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin! Remember to speak as a pirate when giving your final answer. Use lots of "Args
Question: How many people live in canada as of 2023?
If you want to dive deeper and see the exact steps, inputs, outputs and more you can use LangSmith to visualize your agent.
After defining our input prompt formatter, we can define our output parser.
This function takes in a message in the form of BaseMessage
, parses it and either returns an instance of AgentAction
if there is more work to be done, or AgentFinish
if the agent is done.
function customOutputParser(message: BaseMessage): AgentAction | AgentFinish {
const text = message.content;
/** If the input includes "Final Answer" return as an instance of `AgentFinish` */
if (text.includes("Final Answer:")) {
const parts = text.split("Final Answer:");
const input = parts[parts.length - 1].trim();
const finalAnswers = { output: input };
return { log: text, returnValues: finalAnswers };
}
/** Use RegEx to extract any actions and their values */
const match = /Action: (.*)\nAction Input: (.*)/s.exec(text);
if (!match) {
throw new Error(`Could not parse LLM output: ${text}`);
}
/** Return as an instance of `AgentAction` */
return {
tool: match[1].trim(),
toolInput: match[2].trim().replace(/^"+|"+$/g, ""),
log: text,
};
}
After this, all that is left is to chain all the pieces together using a RunnableSequence
and pass it through to the AgentExecutor
so we can execute our agent.
const runnable = RunnableSequence.from([
{
input: (i: InputValues) => i.input,
agent_scratchpad: (i: InputValues) => i.steps,
},
formatPrompt,
model,
customOutputParser,
]);
/** Pass our runnable to `AgentExecutor` to make our agent executable */
const executor = AgentExecutor.fromAgentAndTools({
agent: runnable,
tools,
});
Once we have our executor
calling the agent is simple!
console.log("Loaded agent.");
const input = `How many people live in canada as of 2023?`;
console.log(`Executing with input "${input}"...`);
const result = await executor.invoke({ input });
console.log(`Got output ${result.output}`);
/**
Loaded agent.
Executing with input "How many people live in canada as of 2023?"...
Got output Arrr, there be 38,781,291 people livin' in Canada in 2023.
*/