Overview
The semantic splitter identifies chunk boundaries by embedding sentences and locating points of high cosine distance between adjacent sentences, which indicate topic changes. It uses statistical methods to determine the breakpoint threshold, and it requires an embedding provider.
Splitter Class: SemanticChunker
Config Class: SemanticChunkingConfig
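To make the breakpoint idea concrete, here is a minimal, library-independent sketch of the technique described above. It is not the SemanticChunker implementation: the helper names (`split_sentences`, `toy_embed`, `semantic_chunks`) are hypothetical, the "embedding" is a simple bag-of-words count standing in for a real embedding provider, and the percentile rule is one plausible reading of a percentile-based threshold.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive regex sentence splitter (an assumption, not the library's default)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_embed(sentence: str) -> Counter:
    """Stand-in embedding: bag-of-words counts. A real provider returns dense vectors."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; returns 1.0 when either vector is empty."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - (dot / norm if norm else 0.0)


def percentile(values: list[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def semantic_chunks(text: str, threshold_pct: float = 95.0) -> list[str]:
    """Start a new chunk wherever the adjacent-sentence distance exceeds the percentile threshold."""
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return sentences
    vectors = [toy_embed(s) for s in sentences]
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, threshold_pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # large jump in meaning: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


text = (
    "Cats purr softly. Cats purr loudly. Cats sleep often. "
    "Stocks fell sharply today."
)
for chunk in semantic_chunks(text):
    print(chunk)
```

In this toy input, the only distance above the 95th percentile is the jump from the cat sentences to the stock sentence, so the text splits into two chunks at that point. Lowering `threshold_pct` would produce more, smaller chunks.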
Dependencies
Examples
Parameters
| Parameter | Type | Description | Default | Source |
|---|---|---|---|---|
| chunk_size | int | Target size of each chunk | 1024 | Base |
| chunk_overlap | int | Overlapping units between chunks | 200 | Base |
| min_chunk_size | int \| None | Minimum size for a chunk | None | Base |
| length_function | Callable[[str], int] | Function to measure text length | len | Base |
| strip_whitespace | bool | Strip leading/trailing whitespace | False | Base |
| embedding_provider | EmbeddingProvider | Required embedding provider instance | Required | Specific |
| breakpoint_threshold_type | BreakpointThresholdType | Statistical method for breakpoints | PERCENTILE | Specific |
| breakpoint_threshold_amount | float | Numeric value for threshold type | 95.0 | Specific |
| sentence_splitter | Callable[[str], list[str]] | Function to split text into sentences | Default regex | Specific |
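
The two callable parameters in the table, `length_function` and `sentence_splitter`, accept any function with the listed signature. The sketch below shows what such functions might look like. The names `word_count` and `split_on_terminators` are hypothetical, and how they are passed to SemanticChunkingConfig (presumably as keyword arguments named after the table rows) is an assumption rather than confirmed API usage.

```python
import re


def word_count(text: str) -> int:
    """Candidate length_function: measure size in whitespace-separated words instead of characters."""
    return len(text.split())


def split_on_terminators(text: str) -> list[str]:
    """Candidate sentence_splitter: break after ., !, or ? followed by whitespace, and on newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


# Assumption: these would be supplied through the config's keyword arguments,
# e.g. length_function=word_count, sentence_splitter=split_on_terminators.
print(word_count("Semantic chunking splits text"))
print(split_on_terminators("Hi there. How are you?\nFine!"))
```

With a word-based `length_function`, values such as `chunk_size` and `chunk_overlap` would be interpreted in words rather than characters, so their defaults may need adjusting.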

