token Package
chasen Module
Tokenizer Module for Japanese Segmenter Chasen
moses Module
Tokenizer Module for Moses built-in tokenizer
corpustools.token.moses.tokenize(infile, outfile, lang, tools, step)
Call the Moses built-in tokenizer on a corpus.
The Moses built-in tokenizer supports European languages.
Parameters:
- infile – input filename.
- outfile – output filename.
- lang – language of corpus.
- tools – external tools configuration.
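This page does not show how the function drives Moses. As a minimal sketch, assuming it shells out to the `tokenizer.perl` script shipped in the Moses `scripts/tokenizer/` directory, the call could look roughly like the following. The helper names and the `moses_scripts_dir` argument are hypothetical and are not part of `corpustools`; they stand in for whatever the `tools` configuration provides.

```python
import subprocess
from pathlib import Path


def build_moses_tokenizer_command(moses_scripts_dir, lang):
    """Return the argv list for Moses' tokenizer.perl.

    Hypothetical helper: moses_scripts_dir is assumed to be the Moses
    "scripts" directory; lang is a language code such as "en" or "de".
    """
    script = Path(moses_scripts_dir) / "tokenizer" / "tokenizer.perl"
    # "-l" selects the language-specific tokenization rules.
    return ["perl", str(script), "-l", lang]


def tokenize_with_moses(infile, outfile, lang, moses_scripts_dir):
    """Tokenize infile into outfile by piping it through tokenizer.perl."""
    command = build_moses_tokenizer_command(moses_scripts_dir, lang)
    # tokenizer.perl reads the corpus on stdin and writes tokens to stdout.
    with open(infile, "rb") as src, open(outfile, "wb") as dst:
        subprocess.run(command, stdin=src, stdout=dst, check=True)
```

Keeping the command construction separate from the subprocess call makes it possible to inspect the command without having Perl or Moses installed.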
stanford_segmenter Module
Tokenizer Module for Stanford Segmenter