How to create a time-weighted retriever
This guide assumes familiarity with retrievers and vector stores.
This guide covers the TimeWeightedVectorStoreRetriever, which uses a combination of semantic similarity and a time decay.
The algorithm used to score documents is:
semantic_similarity + (1.0 - decay_rate) ^ hours_passed
Notably, hours_passed refers to the hours passed since the object in the retriever was last accessed, not since it was created. This means that frequently accessed objects remain "fresh."
let score = (1.0 - this.decayRate) ** hoursPassed + vectorRelevance;
this.decayRate is a configurable decimal number between 0 and 1. A lower number means that documents will be "remembered" for longer, while a higher number strongly weights more recently accessed documents. Note that setting a decay rate of exactly 0 or 1 makes hoursPassed irrelevant and makes this retriever equivalent to a standard vector lookup.
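To build intuition for how the decay rate affects scoring, here is a small standalone TypeScript sketch of the formula above, using made-up relevance and hour values (it does not call the retriever at all):

// Standalone illustration of the scoring formula above (not part of the retriever API).
// The relevance value and hour counts below are made-up numbers for demonstration.
const combinedScore = (
  vectorRelevance: number,
  decayRate: number,
  hoursPassed: number
) => (1.0 - decayRate) ** hoursPassed + vectorRelevance;

const relevance = 0.5;

// Low decay rate: the recency term fades slowly, so older documents stay competitive.
console.log(combinedScore(relevance, 0.01, 1)); // ~1.49
console.log(combinedScore(relevance, 0.01, 24)); // ~1.29

// High decay rate: the recency term collapses quickly, strongly favouring
// documents accessed within the last hour or two.
console.log(combinedScore(relevance, 0.99, 1)); // 0.51
console.log(combinedScore(relevance, 0.99, 24)); // ~0.5

// With a decay rate of exactly 0 the recency term is always 1; with exactly 1
// it is 0 for any hoursPassed > 0, so only vector relevance matters.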
It is important to note that, due to required metadata, all documents must be added to the backing vector store using the addDocuments method on the retriever, not on the vector store itself.
- npm: npm install @langchain/openai
- Yarn: yarn add @langchain/openai
- pnpm: pnpm add @langchain/openai
import { TimeWeightedVectorStoreRetriever } from "langchain/retrievers/time_weighted";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
const vectorStore = new MemoryVectorStore(new OpenAIEmbeddings());
const retriever = new TimeWeightedVectorStoreRetriever({
vectorStore,
memoryStream: [],
searchKwargs: 2,
});
const documents = [
"My name is John.",
"My name is Bob.",
"My favourite food is pizza.",
"My favourite food is pasta.",
"My favourite food is sushi.",
].map((pageContent) => ({ pageContent, metadata: {} }));
// All documents must be added using this method on the retriever (not the vector store!)
// so that the correct access history metadata is populated
await retriever.addDocuments(documents);
const results1 = await retriever.invoke("What is my favourite food?");
console.log(results1);
/*
[
Document { pageContent: 'My favourite food is pasta.', metadata: {} }
]
*/
const results2 = await retriever.invoke("What is my favourite food?");
console.log(results2);
/*
[
Document { pageContent: 'My favourite food is pasta.', metadata: {} }
]
*/
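If the defaults don't suit your use case, you can tune the retriever at construction time. The sketch below assumes that decayRate and k are accepted as constructor fields alongside the options shown above; check the API reference for the exact options available in your installed version.

// Hypothetical tuning example: `decayRate` and `k` are assumed constructor
// fields here; consult the TimeWeightedVectorStoreRetriever API reference
// to confirm the options for your version.
const tunedVectorStore = new MemoryVectorStore(new OpenAIEmbeddings());

const tunedRetriever = new TimeWeightedVectorStoreRetriever({
  vectorStore: tunedVectorStore,
  memoryStream: [],
  searchKwargs: 2,
  decayRate: 0.01, // low decay rate: documents are "remembered" for longer
  k: 1, // return a single document
});

// Remember: documents must still be added through the retriever, not the vector store.
await tunedRetriever.addDocuments(documents);

console.log(await tunedRetriever.invoke("What is my favourite food?"));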
API Reference:
- TimeWeightedVectorStoreRetriever from langchain/retrievers/time_weighted
- MemoryVectorStore from langchain/vectorstores/memory
- OpenAIEmbeddings from @langchain/openai
Next steps
You've now learned how to use time as a factor when performing retrieval.
Next, check out the broader tutorial on RAG, or the guide on creating your own custom retriever over any data source.