Overview
The Markdown splitter parses Markdown syntax to identify structural boundaries such as headers, code blocks, tables, and lists. It segments content into semantic blocks and preserves the document hierarchy by tracking headers.
Splitter Class: MarkdownChunker
Config Class: MarkdownChunkingConfig
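Here is a minimal, standard-library sketch of the segmentation idea, showing how header tracking keeps each block tied to its place in the hierarchy. It is not the MarkdownChunker implementation. `Block`, `segment_markdown`, and `SAMPLE` are hypothetical names. Lists are simply left inside text blocks, and edge cases such as setext headers or indented code are ignored. Block kinds reuse element names from the Parameters table (`h1`, `code_block`, `table`, `horizontal_rule`) only for readability.

```python
import re
from dataclasses import dataclass, field

# Simplified patterns; a real parser (e.g. CommonMark) handles many more cases.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
HR_RE = re.compile(r"^ {0,3}([-*_])( *\1){2,} *$")


@dataclass
class Block:
    kind: str  # "h1".."h6", "code_block", "table", "horizontal_rule", or "text"
    text: str
    headers: list[str] = field(default_factory=list)  # enclosing headers, outermost first


def segment_markdown(md: str) -> list[Block]:
    blocks: list[Block] = []
    path: list[tuple[int, str]] = []  # stack of (level, title) for open headers
    buf: list[str] = []
    kind = "text"
    fence = None  # opening fence marker while inside a fenced code block

    def titles() -> list[str]:
        return [title for _, title in path]

    def flush() -> None:
        nonlocal buf, kind
        text = "\n".join(buf).strip("\n")
        if text.strip():
            blocks.append(Block(kind, text, titles()))
        buf, kind = [], "text"

    for line in md.splitlines():
        stripped = line.strip()
        if fence is not None:  # inside a code block: keep lines verbatim
            buf.append(line)
            if stripped.startswith(fence):
                fence = None
                flush()
            continue
        if stripped.startswith(("```", "~~~")):  # opening fence
            flush()
            fence, kind = stripped[:3], "code_block"
            buf.append(line)
            continue
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            while path and path[-1][0] >= level:  # close same-or-deeper headers
                path.pop()
            path.append((level, match.group(2)))
            blocks.append(Block(f"h{level}", line, titles()))
            continue
        if HR_RE.match(line):
            flush()
            blocks.append(Block("horizontal_rule", line, titles()))
            continue
        is_table_row = stripped.startswith("|")
        if is_table_row != (kind == "table"):  # entering or leaving a table
            flush()
            kind = "table" if is_table_row else "text"
        buf.append(line)
    flush()
    return blocks


SAMPLE = """# Guide
Intro text.

## Install
Run the installer.

~~~bash
pip install x
~~~

| a | b |
|---|---|
| 1 | 2 |

---
### Notes
Done.
"""

if __name__ == "__main__":
    for block in segment_markdown(SAMPLE):
        print(block.kind, block.headers)
```

Each heading closes any open headers at its own level or deeper, so the final `Done.` block in `SAMPLE` carries `["Guide", "Install", "Notes"]` as its hierarchy. The fenced block and the table each come out as a single block.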
Dependencies
No additional dependencies are required; the splitter uses only the Python standard library.
Parameters
| Parameter | Type | Description | Default | Source |
|---|---|---|---|---|
| chunk_size | int | Target size of each chunk | 1024 | Base |
| chunk_overlap | int | Overlapping units between chunks | 200 | Base |
| min_chunk_size | int \| None | Minimum size for a chunk | None | Base |
| length_function | Callable[[str], int] | Function to measure text length | len | Base |
| strip_whitespace | bool | Strip leading/trailing whitespace | False | Base |
| split_on_elements | list[str] | Elements that signify boundaries | ["h1", "h2", "h3", "code_block", "table", "horizontal_rule"] | Specific |
| preserve_whole_elements | list[str] | Indivisible element types | ["code_block", "table"] | Specific |
| strip_elements | bool | Strip Markdown syntax characters | True | Specific |
| preserve_original_content | bool | Preserve original Markdown content | False | Specific |
| text_chunker_to_use | BaseChunker | Chunker for oversized blocks | RecursiveChunker | Specific |
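The table implies an interplay between chunk_size, chunk_overlap, preserve_whole_elements, and text_chunker_to_use. The sketch below shows one plausible packing strategy, assuming blocks are merged greedily up to chunk_size. Element types listed in preserve_whole_elements are emitted whole even when oversized. Any other oversized block goes to a fallback splitter. `SketchConfig`, `fallback_split`, and `pack_blocks` are hypothetical names. The fixed-window fallback is a deliberately naive substitute for RecursiveChunker, and sizes are counted in characters.

```python
from dataclasses import dataclass, field


@dataclass
class SketchConfig:
    # Hypothetical stand-in reusing three parameter names from the table above;
    # it is NOT Upsonic's MarkdownChunkingConfig.
    chunk_size: int = 1024
    chunk_overlap: int = 200
    preserve_whole_elements: list[str] = field(
        default_factory=lambda: ["code_block", "table"]
    )


def fallback_split(text: str, size: int, overlap: int) -> list[str]:
    """Naive fixed-window character splitter standing in for text_chunker_to_use."""
    step = size - overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Consecutive windows share `overlap` characters; the last window reaches the end.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def pack_blocks(blocks: list[tuple[str, str]], cfg: SketchConfig) -> list[str]:
    """Greedily merge (kind, text) blocks into chunks of at most cfg.chunk_size characters."""
    chunks: list[str] = []
    current = ""
    for kind, text in blocks:
        if len(text) > cfg.chunk_size:  # oversized block
            if current:
                chunks.append(current)
                current = ""
            if kind in cfg.preserve_whole_elements:
                chunks.append(text)  # indivisible: emitted whole even though oversized
            else:
                chunks.extend(fallback_split(text, cfg.chunk_size, cfg.chunk_overlap))
            continue
        candidate = f"{current}\n\n{text}" if current else text
        if len(candidate) <= cfg.chunk_size:
            current = candidate
        else:  # block does not fit: close the current chunk and start a new one
            chunks.append(current)
            current = text
    if current:
        chunks.append(current)
    return chunks
```

With `chunk_size=20`, a 30-character `code_block` is kept as one chunk, while a 30-character `text` block is cut into two windows that share `chunk_overlap` characters. The sketch does not model min_chunk_size, length_function, strip_elements, or preserve_original_content.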

