recursiveSplitText
Syntax
recursiveSplitText(text, [maxLength=300], [chunkOverlap=20], [separators], [keepSeparator=true])
Arguments
text A LITERAL scalar representing the input text to be split into chunks.
maxLength A positive integer indicating the maximum length of each chunk. Defaults to 300.
chunkOverlap A non-negative integer not exceeding maxLength, indicating the maximum length of the overlap (repeated text) between adjacent chunks. Defaults to 20.
separators A STRING vector representing a list of user-defined separators. Defaults to ["\n\n", "\n", " ", ""]. Regular expressions are not supported.
keepSeparator A Boolean indicating whether to keep the separators:
- true (default): Keep each separator, attaching it to the beginning of the text that follows the split point.
- false: Discard the separators.
Details
Recursively splits the text into chunks based on the specified separators.
Return value: A STRING vector.
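To make the idea of recursive, separator-based splitting concrete, here is a minimal Python sketch. It is not DolphinDB's implementation: the function name recursive_split and its behavior are assumptions made for illustration. It assumes separators are tried in the order given, it does not merge small pieces back together, and it ignores chunkOverlap entirely.

```python
from typing import List


def recursive_split(text: str, max_length: int, separators: List[str],
                    keep_separator: bool = True) -> List[str]:
    """Hypothetical sketch of recursive splitting (no merging, no overlap)."""
    # A piece that already fits is returned unchanged.
    if len(text) <= max_length:
        return [text] if text else []

    # No usable separator left (or the empty-string separator): cut by length.
    if not separators or separators[0] == "":
        return [text[i:i + max_length] for i in range(0, len(text), max_length)]

    sep, remaining = separators[0], separators[1:]
    parts = text.split(sep)

    if keep_separator:
        # Assumption: each separator is attached to the start of the part after it.
        pieces = [parts[0]] + [sep + part for part in parts[1:]]
    else:
        pieces = parts

    chunks: List[str] = []
    for piece in pieces:
        if not piece:
            continue
        if len(piece) <= max_length:
            chunks.append(piece)
        else:
            # Still too long: retry with the next separator in the list.
            chunks.extend(recursive_split(piece, max_length, remaining, keep_separator))
    return chunks


if __name__ == "__main__":
    print(recursive_split("aaa.bbb,ccc.dd", 5, [".", ","], keep_separator=True))
    print(recursive_split("aaa.bbb,ccc.dd", 5, [".", ","], keep_separator=False))
```

With keep_separator=True the pieces concatenate back to the original text, which is one reason a splitter might keep separators by default. The actual chunks returned by recursiveSplitText may differ from this sketch, for example because of chunkOverlap.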
Examples
text = "This is the first sentence. This is the second sentence containing a comma. Next is the third which is longer than the previous two and needs to be further split. The last sentence is the closing sentence."
separators = [".",","]
chunks = recursiveSplitText(text, maxLength=15, chunkOverlap=5, separators=separators, keepSeparator=true)