Skip to main content

Mapping values to database

In this guide we’ll go over strategies to improve graph database query generation by mapping values from user inputs to database. When using the built-in graph chains, the LLM is aware of the graph schema, but has no information about the values of properties stored in the database. Therefore, we can introduce a new step in graph database QA system to accurately map values.

Setup

Install dependencies

yarn add langchain @langchain/community @langchain/openai neo4j-driver zod

Set environment variables

We’ll use OpenAI in this example:

OPENAI_API_KEY=your-api-key

# Optional, use LangSmith for best-in-class observability
LANGSMITH_API_KEY=your-api-key
LANGCHAIN_TRACING_V2=true

Next, we need to define Neo4j credentials. Follow these installation steps to set up a Neo4j database.

NEO4J_URI="bolt://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="password"

The below example will create a connection with a Neo4j database and will populate it with example data about movies and their actors.

import "neo4j-driver";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";

const url = Deno.env.get("NEO4J_URI");
const username = Deno.env.get("NEO4J_USER");
const password = Deno.env.get("NEO4J_PASSWORD");
const graph = await Neo4jGraph.initialize({ url, username, password });

// Import movie information
const moviesQuery = `LOAD CSV WITH HEADERS FROM
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
m.title = row.title,
m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') |
MERGE (p:Person {name:trim(director)})
MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') |
MERGE (p:Person {name:trim(actor)})
MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') |
MERGE (g:Genre {name:trim(genre)})
MERGE (m)-[:IN_GENRE]->(g))`;

await graph.query(moviesQuery);
Schema refreshed successfully.
[]

Detecting entities in the user input

We have to extract the types of entities/values we want to map to a graph database. In this example, we are dealing with a movie graph, so we can map movies and people to the database.

import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

const llm = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });

const entities = z
.object({
names: z
.array(z.string())
.describe("All the person or movies appearing in the text"),
})
.describe("Identifying information about entities.");

const prompt = ChatPromptTemplate.fromMessages([
["system", "You are extracting person and movies from the text."],
[
"human",
"Use the given format to extract information from the following\ninput: {question}",
],
]);

const entityChain = prompt.pipe(llm.withStructuredOutput(entities));

We can test the entity extraction chain.

const entities = await entityChain.invoke({
question: "Who played in Casino movie?",
});
entities;
{ names: [ "Casino" ] }

We will utilize a simple CONTAINS clause to match entities to database. In practice, you might want to use a fuzzy search or a fulltext index to allow for minor misspellings.

const matchQuery = `
MATCH (p:Person|Movie)
WHERE p.name CONTAINS $value OR p.title CONTAINS $value
RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type
LIMIT 1`;

const matchToDatabase = async (values) => {
let result = "";
for (const entity of values.names) {
const response = await graph.query(matchQuery, {
value: entity,
});
if (response.length > 0) {
result += `${entity} maps to ${response[0]["result"]} ${response[0]["type"]} in database\n`;
}
}
return result;
};

await matchToDatabase(entities);
"Casino maps to Casino Movie in database\n"

Custom Cypher generating chain

We need to define a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement. We will be using the LangChain expression language to accomplish that.

import { StringOutputParser } from "@langchain/core/output_parsers";
import {
RunnablePassthrough,
RunnableSequence,
} from "@langchain/core/runnables";

// Generate Cypher statement based on natural language input
const cypherTemplate = `Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:
{schema}
Entities in the question map to the following database values:
{entities_list}
Question: {question}
Cypher query:`;

const cypherPrompt = ChatPromptTemplate.fromMessages([
[
"system",
"Given an input question, convert it to a Cypher query. No pre-amble.",
],
["human", cypherTemplate],
]);

const llmWithStop = llm.bind({ stop: ["\nCypherResult:"] });

const cypherResponse = RunnableSequence.from([
RunnablePassthrough.assign({ names: entityChain }),
RunnablePassthrough.assign({
entities_list: async (x) => matchToDatabase(x.names),
schema: async (_) => graph.getSchema(),
}),
cypherPrompt,
llmWithStop,
new StringOutputParser(),
]);
const cypher = await cypherResponse.invoke({
question: "Who played in Casino movie?",
});
cypher;
'MATCH (:Movie {title: "Casino"})<-[:ACTED_IN]-(actor)\nRETURN actor.name'

Help us out by providing feedback on this documentation page: