Tokenization

In natural language processing, tokenization is the operation of splitting a string of text into words and/or sentences. The resulting list of words or sentences can then be analyzed further, notably to measure word frequency, which can be a first step toward understanding what a text is about.
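As a minimal sketch of this idea, here is how sentence and word tokenization followed by a word-frequency count might look with NLTK in Python. The sample text and the frequency step are illustrative assumptions, not part of the glossary definition; the `punkt` model download is a one-time NLTK requirement.

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from collections import Counter

# One-time download of the Punkt sentence-tokenizer models
# (recent NLTK releases may ask for "punkt_tab" instead).
nltk.download("punkt", quiet=True)

# Hypothetical sample text for illustration.
text = "Tokenization splits text into units. Words are one such unit."

sentences = sent_tokenize(text)  # list of sentence strings
words = word_tokenize(text)      # list of word and punctuation tokens

# Word frequency: a first step toward seeing what a text is about.
# Punctuation tokens are filtered out and words lowercased before counting.
freq = Counter(w.lower() for w in words if w.isalpha())
print(freq.most_common(3))
```

Running this prints the most frequent words in the sample, e.g. `[('tokenization', 1), ('splits', 1), ('text', 1)]`; on a longer text the counts become a rough signal of its subject matter.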

Check out the basics of NLP with NLTK to learn how to program tokenization in Python.

More on tokenization on Wikipedia.

