File Loaders

Compatibility

Only available on Node.js.

These loaders are used to load files given a filesystem path or a Blob object.

📄️ Folders with multiple files

This example goes over how to load data from folders with multiple files. The second argument is a map of file extensions to loader factories. Each file will be passed to the matching loader, and the resulting documents will be concatenated together.
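The dispatch described above can be sketched without any library code. This is a minimal illustration, not the library's implementation: `FolderDoc`, `LoaderFactory`, `extensionOf`, and `loadAll` are hypothetical names, files are held in an in-memory map instead of being read from disk, and skipping files with no registered extension is an assumption.

```typescript
// Hypothetical document shape, for illustration only.
interface FolderDoc {
  pageContent: string;
  metadata: { source: string };
}

// A loader factory receives a file path and returns an object that can load documents.
type LoaderFactory = (path: string) => { load(): FolderDoc[] };

// Return the lowercased extension (including the dot), or "" when there is none.
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot <= 0 ? "" : name.slice(dot).toLowerCase();
}

// Pass each file to the loader registered for its extension and concatenate the results.
function loadAll(paths: string[], loaders: Record<string, LoaderFactory>): FolderDoc[] {
  const result: FolderDoc[] = [];
  for (const path of paths) {
    const factory = loaders[extensionOf(path)];
    if (factory === undefined) continue; // assumption: unmatched files are skipped
    result.push(...factory(path).load());
  }
  return result;
}

// In-memory stand-in for a folder on disk.
const folderFiles: Record<string, string> = {
  "data/a.txt": "hello",
  "data/b.csv": "x,y\n1,2",
  "data/c.bin": "??",
};

// One document per file.
const wholeFileLoader: LoaderFactory = (path) => ({
  load: () => [{ pageContent: folderFiles[path] ?? "", metadata: { source: path } }],
});

// One document per line.
const perLineLoader: LoaderFactory = (path) => ({
  load: () =>
    (folderFiles[path] ?? "")
      .split("\n")
      .map((line) => ({ pageContent: line, metadata: { source: path } })),
});

const folderDocs = loadAll(Object.keys(folderFiles), {
  ".txt": wholeFileLoader,
  ".csv": perLineLoader,
});
```

Here `folderDocs` holds one document from `a.txt` followed by two from `b.csv`; `c.bin` has no matching loader and contributes nothing.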

📄️ ChatGPT files

This example goes over how to load conversations.json from your ChatGPT data export folder. To receive your data export by email, go to: ChatGPT -> (Profile) -> Settings -> Export data -> Confirm export -> Check email.

📄️ CSV files

This example goes over how to load data from CSV files. The second argument is the name of the column to extract from the CSV file. One document is created for each row. When column is not specified, each row is converted into key/value pairs, with each pair written to a new line in the document's pageContent. When column is specified, the value of that column is used as the document's pageContent.
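The two modes can be illustrated on rows that have already been parsed. This is a sketch under stated assumptions, not the loader's code: `CsvDoc` and `rowsToDocs` are hypothetical names, the `key: value` separator and the `line` metadata field are assumptions, and CSV parsing itself is left out.

```typescript
// A parsed CSV row: column name -> cell value.
type CsvRow = Record<string, string>;

// Hypothetical document shape, for illustration only.
interface CsvDoc {
  pageContent: string;
  metadata: { line: number };
}

// Create one document per row.
// Without `column`, every key/value pair goes on its own line.
// With `column`, only that column's value becomes the page content.
function rowsToDocs(rows: CsvRow[], column?: string): CsvDoc[] {
  return rows.map((row, index) => {
    let pageContent: string;
    if (column === undefined) {
      pageContent = Object.keys(row)
        .map((key) => `${key}: ${row[key]}`) // assumption: "key: value" format
        .join("\n");
    } else {
      const value = row[column];
      if (value === undefined) {
        throw new Error(`Column "${column}" not found in row ${index + 1}`);
      }
      pageContent = value;
    }
    return { pageContent, metadata: { line: index + 1 } };
  });
}

const csvRows: CsvRow[] = [
  { id: "1", text: "hello" },
  { id: "2", text: "world" },
];

const allColumns = rowsToDocs(csvRows); // each row rendered as key/value lines
const textOnly = rowsToDocs(csvRows, "text"); // only the "text" column
```

Both calls return two documents, one per row; only the content of each document differs.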

📄️ Docx files

This example goes over how to load data from docx files.

📄️ EPUB files

This example goes over how to load data from EPUB files. By default, one document will be created for each chapter in the EPUB file. You can change this behavior by setting the splitChapters option to false.

📄️ JSON files

The JSON loader uses JSON pointers to target the keys you want to extract from your JSON files.
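For readers unfamiliar with JSON Pointer syntax (RFC 6901), the following sketch shows how a pointer such as `/messages/1/text` is evaluated against a parsed value. `resolvePointer` is a hypothetical helper written for this illustration; it shows the pointer syntax only and does not claim to reproduce how the loader interprets pointers.

```typescript
// Resolve an RFC 6901 JSON Pointer against a parsed JSON value.
// Returns undefined when the path does not exist.
function resolvePointer(doc: unknown, pointer: string): unknown {
  if (pointer === "") return doc; // the empty pointer refers to the whole document
  if (pointer.charAt(0) !== "/") {
    throw new Error("A JSON Pointer must be empty or start with '/'");
  }
  let current: unknown = doc;
  for (const rawToken of pointer.slice(1).split("/")) {
    // Unescape in the order required by the RFC: "~1" -> "/", then "~0" -> "~".
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");
    if (Array.isArray(current)) {
      // Array tokens must be non-negative integers without leading zeros.
      current = /^(0|[1-9]\d*)$/.test(token) ? current[Number(token)] : undefined;
    } else if (typeof current === "object" && current !== null) {
      current = Object.prototype.hasOwnProperty.call(current, token)
        ? (current as Record<string, unknown>)[token]
        : undefined;
    } else {
      return undefined; // cannot descend into a primitive
    }
  }
  return current;
}

const sample = {
  messages: [{ text: "hi" }, { text: "bye" }],
  "a/b": 1,
};

const secondText = resolvePointer(sample, "/messages/1/text"); // "bye"
const escapedKey = resolvePointer(sample, "/a~1b"); // 1
const missing = resolvePointer(sample, "/missing/x"); // undefined
```

The `~1` escape is needed because `/` separates tokens, so a key that contains a slash cannot be written literally.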

📄️ JSONLines files

This example goes over how to load data from JSONLines or JSONL files. The second argument is a JSONPointer to the property to extract from each JSON object in the file. One document will be created for each JSON object in the file.
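The per-line extraction can be sketched as follows. This is an illustration under stated assumptions rather than the loader's implementation: `LineDoc`, `getByPointer`, and `jsonLinesToDocs` are hypothetical names, the pointer lookup is deliberately simplified, and skipping blank lines and non-string values is an assumption.

```typescript
// Hypothetical document shape, for illustration only.
interface LineDoc {
  pageContent: string;
  metadata: { line: number };
}

// Simplified JSON Pointer lookup: walks object keys and array indexes.
function getByPointer(value: unknown, pointer: string): unknown {
  const tokens = pointer === "" ? [] : pointer.slice(1).split("/");
  let current: unknown = value;
  for (const raw of tokens) {
    if (typeof current !== "object" || current === null) return undefined;
    const key = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Parse each non-empty line as JSON and create one document from the targeted property.
function jsonLinesToDocs(text: string, pointer: string): LineDoc[] {
  const result: LineDoc[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // assumption: blank lines are ignored
    const value = getByPointer(JSON.parse(line), pointer);
    if (typeof value === "string") {
      // assumption: objects without a string at the pointer are skipped
      result.push({ pageContent: value, metadata: { line: index + 1 } });
    }
  });
  return result;
}

const jsonl =
  '{"html": "<p>one</p>"}\n' +
  '{"html": "<p>two</p>"}\n' +
  '{"other": 1}\n';

const lineDocs = jsonLinesToDocs(jsonl, "/html");
```

Here `lineDocs` contains two documents, one for each object that has an `html` string.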

📄️ Notion markdown export

This example goes over how to load data from your Notion pages exported from the Notion dashboard.

📄️ OpenAI Whisper Audio

Only available on Node.js.

📄️ PDF files

This example goes over how to load data from PDF files. By default, one document will be created for each page in the PDF file. You can change this behavior by setting the splitPages option to false.

📄️ PPTX files

This example goes over how to load data from PPTX files. By default, a single document will be created that contains all pages in the PPTX file.

📄️ Subtitles

This example goes over how to load data from subtitle files. One document will be created for each subtitle file.

📄️ Text files

This example goes over how to load data from text files.

📄️ Unstructured

This example covers how to use Unstructured to load files of many types. Unstructured currently supports loading text files, PowerPoint presentations, HTML, PDFs, images, and more.