# tokenize {#tokenize}

## Syntax {#syntax}

`tokenize(text, parser, [full=false], [lowercase=true], [stem=false])`

## Arguments {#arguments}

**text** is a STRING scalar specifying the text to be tokenized.

**parser** is a STRING scalar specifying the tokenizer. It has no default value and must be specified explicitly. Options include:

-   none: the text is not tokenized.
-   english: the text is split on whitespace and punctuation.

**lowercase** is a Boolean value specifying whether to convert the tokens to lowercase. The original data is not modified. It takes effect only when *parser* is "english". The default value is true, which suits case-insensitive scenarios.
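To make the descriptions of *parser*="english" and *lowercase* more concrete, the following is a rough Python model of splitting on whitespace and punctuation with optional lowercasing. It is not DolphinDB code and not DolphinDB's implementation. `tokenize_english_sketch` is a hypothetical name. Treating every non-word character (Python's `\w` class) as a separator is an assumption about what "spaces and punctuation" covers. The sketch does not model stemming. It also does not model the omission of common words such as "was" and "of" that is visible in the DolphinDB output in the example below.

```python
import re


def tokenize_english_sketch(text: str, lowercase: bool = True) -> list[str]:
    """Hypothetical model of parser='english' (illustration only).

    Assumption: any character outside Python's \\w class (letters, digits,
    underscore) is a separator. No stemming or word filtering is performed.
    """
    # Runs of word characters become tokens; spaces, commas, periods, etc.
    # act as separators and are discarded.
    tokens = re.findall(r"\w+", text)
    # Lowercasing produces new strings; the input text is left unchanged.
    return [t.lower() for t in tokens] if lowercase else tokens


text = "The sun was shining brightly, enjoying the warmth."
print(tokenize_english_sketch(text))
# ['the', 'sun', 'was', 'shining', 'brightly', 'enjoying', 'the', 'warmth']
print(tokenize_english_sketch(text, lowercase=False))
# ['The', 'sun', 'was', 'shining', 'brightly', 'enjoying', 'the', 'warmth']
```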

**stem** specifies whether to reduce English words to their stems (for example, "walked" becomes "walk"). It takes effect only when *parser* is "english" and *lowercase* is true. The default value is false, in which case words are not stemmed and searches match exact word forms.

**Note**: The *full* parameter does not apply to English text.

## Details {#details}

Tokenizes *text* using the specified *parser* and options.

**Return value**: A STRING vector containing the tokenization result.

## Examples {#examples}

``` {#codeblock_mgc_ylv_fdc}
text1 = "The sun was shining brightly as I walked down the street, enjoying the warmth of the summer day."
tokenize(text=text1, parser='english', lowercase=false, stem=true)
// output:["The","sun","shine","bright","I","walk","down","street","enjoy","warmth","summer","day"]
```

