recursiveSplitText

Syntax

recursiveSplitText(text, [maxLength=300], [chunkOverlap=20], [separators], [keepSeparator=true])

Arguments

text A LITERAL scalar representing the input text to be split into chunks.

maxLength A positive integer indicating the maximum length of each chunk. Defaults to 300.

chunkOverlap A non-negative integer not exceeding maxLength, indicating the maximum length of the overlap (repeated text) between adjacent chunks. Defaults to 20.

separators A STRING vector representing a list of user-defined separators. Defaults to ["\n\n", "\n", " ", ""]. Regular expressions are not supported.

keepSeparator A Boolean indicating whether to keep the separators:

  • true (default): Keep each separator, placing it at the beginning of the text that follows it.
  • false: Discard the separators.

Details

Recursively splits the text into chunks of at most maxLength, based on the specified separators.

Return value: A STRING vector.

Examples

text = "This is the first sentence. This is the second sentence containing a comma. Next is the third which is longer than the previous two and needs to be further split. The last sentence is the closing sentence."
separators = [".",","]

chunks = recursiveSplitText(text, maxLength=15, chunkOverlap=5, separators=separators, keepSeparator=true)
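A minimal follow-up sketch of how the result might be inspected, and how the two keepSeparator settings can be compared. The variable names (sample, seps, withSep, withoutSep) are illustrative only and do not come from the reference; size and strlen are assumed to be the standard built-in functions. No output is shown because the exact chunk boundaries depend on the function's splitting rules.

```
// Illustrative input; any LITERAL scalar works here.
sample = "Alpha beta gamma, delta epsilon. Zeta eta theta, iota kappa lambda. Mu nu xi."
seps = [".", ","]

// Split with the separators kept (the default behavior).
withSep = recursiveSplitText(sample, maxLength=20, chunkOverlap=5, separators=seps, keepSeparator=true)

// Split the same input with the separators discarded.
withoutSep = recursiveSplitText(sample, maxLength=20, chunkOverlap=5, separators=seps, keepSeparator=false)

// Inspect the results: the chunks, how many there are, and the length of each,
// which can be compared against maxLength=20.
print(withSep)
print(size(withSep))
print(strlen(withSep))

print(withoutSep)
print(strlen(withoutSep))
```

Comparing the two printed vectors side by side shows where the separators end up when keepSeparator=true and confirms they are absent when keepSeparator=false.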