Overview

The DOCX loader extracts content from Microsoft Word documents (.docx). It supports extraction of text, tables, headers, and footers, and can format tables as plain text, Markdown, or HTML.

Loader Class: DOCXLoader
Config Class: DOCXLoaderConfig

Install

Install the DOCX loader optional dependency group:
uv pip install "upsonic[docx-loader]"

Examples

from upsonic import Agent, Task, KnowledgeBase
from upsonic.loaders.docx import DOCXLoader
from upsonic.loaders.config import DOCXLoaderConfig
from upsonic.embeddings import OpenAIEmbedding, OpenAIEmbeddingConfig
from upsonic.text_splitter.recursive import RecursiveChunker, RecursiveChunkingConfig
from upsonic.vectordb import ChromaProvider, ChromaConfig, ConnectionConfig, Mode

# Configure loader
loader_config = DOCXLoaderConfig(
    include_tables=True,
    include_headers=True,
    table_format="markdown"
)
loader = DOCXLoader(loader_config)

# Setup KnowledgeBase
embedding = OpenAIEmbedding(OpenAIEmbeddingConfig())
chunker = RecursiveChunker(RecursiveChunkingConfig())
vectordb = ChromaProvider(ChromaConfig(
    collection_name="docx_docs",
    vector_size=1536,
    connection=ConnectionConfig(mode=Mode.IN_MEMORY)
))

kb = KnowledgeBase(
    sources=["document.docx"],
    embedding_provider=embedding,
    vectordb=vectordb,
    loaders=[loader],
    splitters=[chunker]
)

# Query with Agent
agent = Agent("anthropic/claude-sonnet-4-5")
task = Task("Extract key points from the document", context=[kb])
result = agent.do(task)
print(result)

Parameters

| Parameter | Type | Description | Default | Source |
| --- | --- | --- | --- | --- |
| encoding | str \| None | File encoding (auto-detected if None) | None | Base |
| error_handling | "ignore" \| "warn" \| "raise" | How to handle loading errors | "warn" | Base |
| include_metadata | bool | Whether to include file metadata | True | Base |
| custom_metadata | dict | Additional metadata to include | | Base |
| max_file_size | int \| None | Maximum file size in bytes | None | Base |
| skip_empty_content | bool | Skip documents with empty content | True | Base |
| include_tables | bool | Include table content | True | Specific |
| include_headers | bool | Include header content | True | Specific |
| include_footers | bool | Include footer content | True | Specific |
| table_format | "text" \| "markdown" \| "html" | How to format tables | "text" | Specific |
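
To make the three table_format values more concrete, the sketch below renders one small table as plain text, Markdown, and HTML. It is only an illustration of the general idea: render_table is a hypothetical helper written for this page, not part of Upsonic, and the exact strings DOCXLoader emits for each format may differ (for example in separators or whitespace).

```python
from html import escape


def render_table(rows, table_format="text"):
    """Render rows (first row is the header) in one of three styles.

    Hypothetical helper for illustration only; not the Upsonic implementation.
    """
    if table_format == "text":
        # Plain text: tab-separated cells, one row per line.
        return "\n".join("\t".join(row) for row in rows)

    header, *body = rows

    if table_format == "markdown":
        # Markdown pipe table: header row, separator row, then body rows.
        lines = [
            "| " + " | ".join(header) + " |",
            "| " + " | ".join("---" for _ in header) + " |",
        ]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return "\n".join(lines)

    if table_format == "html":
        # HTML table: <th> cells for the header, <td> cells for the body.
        head = "<tr>" + "".join(f"<th>{escape(c)}</th>" for c in header) + "</tr>"
        rest = "".join(
            "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>"
            for row in body
        )
        return f"<table>{head}{rest}</table>"

    raise ValueError(f"unknown table_format: {table_format!r}")


rows = [["Name", "Qty"], ["Apples", "3"]]
for fmt in ("text", "markdown", "html"):
    print(f"--- {fmt} ---")
    print(render_table(rows, fmt))
```

As a rough rule of thumb, Markdown or HTML output keeps row and column boundaries explicit in the chunked text, while plain text is the most compact; which one retrieves best depends on the documents and the model being used.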