7.3.66. table_tokenize¶
7.3.66.1. Summary¶
table_tokenize command tokenizes text with the tokenizer, the normalizer and the token filters of the specified table.
7.3.66.2. Syntax¶
This command takes many parameters.
table and string are required parameters. Others are
optional:
table_tokenize table
string
[flags=NONE]
[mode=GET]
[index_column=null]
7.3.66.3. Usage¶
Here is a simple example.
Execution example:
plugin_register token_filters/stop_word
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create Terms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto --token_filters TokenFilterStopWord
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [[0, 1337566253.89858, 0.000355720520019531], [{"value":"hello","position":0},{"value":"good","position":2},{"value":"-","position":3},{"value":"bye","position":4}]]
Terms table has the TokenBigram tokenizer, the NormalizerAuto normalizer and
the TokenFilterStopWord token filter. The command returns the tokens that are
generated by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer.
They are normalized by the NormalizerAuto normalizer.
The and token is removed by the TokenFilterStopWord token filter, because the
"and" record in Terms has is_stop_word set to true and the mode is GET.
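For comparison, tokenizing the same text without the lexicon table keeps the and token. A minimal sketch using the tokenize command with the same tokenizer and normalizer but no token filter (output omitted; this invocation is illustrative, not part of the recipe above):
tokenize TokenBigram "Hello and Good-bye" NormalizerAuto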
7.3.66.4. Parameters¶
This section describes all parameters. Parameters are categorized.
7.3.66.4.1. Required parameters¶
There are required parameters, table and string.
7.3.66.4.1.1. table¶
Specifies the lexicon table. table_tokenize command uses the
tokenizer, the normalizer and the token filters that are set to the
lexicon table.
7.3.66.4.1.2. string¶
Specifies any string which you want to tokenize.
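For example, a string that contains spaces needs to be quoted. A minimal sketch, reusing the Terms table from the Usage section (output omitted):
table_tokenize Terms 'Hello and Good-bye'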
7.3.66.4.2. Optional parameters¶
There are optional parameters.
7.3.66.4.2.1. flags¶
Specifies tokenization customize options. You can specify
multiple options separated by "|".
The default value is NONE.
See flags option in tokenize about details.
7.3.66.4.2.2. mode¶
Specifies a tokenize mode. With GET, tokens are not added to the lexicon;
with ADD, unknown tokens are registered in the lexicon while tokenizing.
The default value is GET.
See mode option in tokenize about details.
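As a sketch of how these optional parameters might be passed, assuming ENABLE_TOKENIZED_DELIMITER as the flag (one of the flags accepted by tokenize) and the Terms table from the Usage section (outputs omitted):
table_tokenize Terms "Hello and Good-bye" --flags ENABLE_TOKENIZED_DELIMITER
table_tokenize Terms "Hello and Good-bye" --mode ADD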
7.3.66.4.2.3. index_column¶
Specifies an index column.
If it is specified, the return value includes the estimated_size of each
token in that index.
The estimated_size is useful for checking the estimated frequency of tokens.
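A sketch of using index_column, assuming a hypothetical Memos table whose content column is indexed via Terms (the table and column names here are illustrative, output omitted):
table_create Memos TABLE_NO_KEY
column_create Memos content COLUMN_SCALAR ShortText
column_create Terms memos_content COLUMN_INDEX|WITH_POSITION Memos content
table_tokenize Terms "Good-bye" --index_column memos_content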
7.3.66.5. Return value¶
table_tokenize command returns the tokens generated from string.
See Return value in tokenize about details.
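For reference, each returned token is an object with a value and a position, as in {"value": "hello", "position": 0}; this shape is a sketch based on the tokenize command's return value, and estimated_size appears in each token only when index_column is specified.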