NLTK

NLTK, which stands for Natural Language Toolkit, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) in the Python programming language.

Developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania, NLTK notably makes it easy to perform the following operations:

  • Lexical analysis: word and text tokenizers
  • N-grams and collocations
  • Part-of-speech tagger
  • Tree model and text chunker for capturing phrases
  • Named-entity recognition
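Several of the operations listed above can be tried in a few lines. This is a minimal sketch, assuming NLTK is installed (for example with pip install nltk), and it deliberately sticks to calls that need no extra data: wordpunct_tokenize for tokenization, nltk.util.ngrams for n-grams, BigramCollocationFinder for collocations, and RegexpParser for chunking. NLTK's own part-of-speech tagger (nltk.pos_tag) and named-entity chunker (nltk.ne_chunk) require pretrained models fetched with nltk.download(), so the tags below are written by hand. The sample sentences and the chunk grammar are illustrative choices, not anything prescribed by NLTK.

```python
# A minimal sketch, assuming NLTK is installed (pip install nltk).
# None of these calls needs an nltk.download() data package.
from nltk import RegexpParser
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from nltk.tokenize import wordpunct_tokenize
from nltk.util import ngrams

text = "New York is big. I love New York, and New York loves me."

# Lexical analysis: regex-based tokenizer that splits words from punctuation.
tokens = wordpunct_tokenize(text)

# n-grams: every pair of adjacent tokens (bigrams).
bigrams = list(ngrams(tokens, 2))

# Collocations: score lowercase word bigrams by pointwise mutual information,
# after discarding pairs that occur fewer than two times.
words = [t.lower() for t in tokens if t.isalpha()]
finder = BigramCollocationFinder.from_words(words)
finder.apply_freq_filter(2)
measures = BigramAssocMeasures()
top_collocations = finder.nbest(measures.pmi, 3)

# Chunking: group hand-tagged tokens into noun phrases (NP) with a
# regular-expression grammar; the result is an nltk.Tree rooted at "S".
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
          ("at", "IN"), ("the", "DT"), ("cat", "NN")]
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = chunker.parse(tagged)
noun_phrases = [" ".join(word for word, _tag in subtree.leaves())
                for subtree in tree.subtrees(lambda t: t.label() == "NP")]

print(tokens[:5])        # ['New', 'York', 'is', 'big', '.']
print(bigrams[:2])       # [('New', 'York'), ('York', 'is')]
print(top_collocations)  # [('new', 'york')]
print(noun_phrases)      # ['the little dog', 'the cat']
```

PMI is only one of the association measures available on BigramAssocMeasures (likelihood_ratio and chi_sq are others); because PMI tends to favor rare pairs, a frequency filter such as apply_freq_filter is commonly applied first.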

NLTK also provides interfaces to more than 50 corpora and lexical resources, such as WordNet, along with text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
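NLTK's bundled corpora, WordNet included, are fetched with nltk.download() and then read through corpus reader classes. The sketch below, assuming NLTK is installed, skips the download so it runs offline: it points PlaintextCorpusReader at two small files written to a temporary directory (the file names and contents are hypothetical) to show the fileids() and words() methods that many of NLTK's text corpora share.

```python
# A minimal sketch, assuming NLTK is installed. The two sample files are
# hypothetical stand-ins for a downloaded corpus.
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

with tempfile.TemporaryDirectory() as root:
    # Write hypothetical sample documents into the temporary directory.
    samples = {"a.txt": "Hello, world!", "b.txt": "NLTK reads plain text files."}
    for name, body in samples.items():
        with open(os.path.join(root, name), "w", encoding="utf-8") as f:
            f.write(body)

    # The fileids argument may be a regular expression over file names.
    reader = PlaintextCorpusReader(root, r".*\.txt")
    fileids = reader.fileids()
    # Corpus views read lazily, so materialize them before the directory
    # is deleted at the end of the with-block.
    words_a = list(reader.words("a.txt"))
    all_words = list(reader.words())

print(fileids)  # ['a.txt', 'b.txt']
print(words_a)  # ['Hello', ',', 'world', '!']
```

Calling words() without a file id concatenates the tokens of every matched file, which mirrors how a bundled corpus such as nltk.corpus.gutenberg is queried once its data has been downloaded.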

For more information on NLTK, see the NLTK website and the NLTK Wikipedia page. A complete book on NLTK, Natural Language Processing with Python, is also freely available online, updated for NLTK 3 and Python 3, with the following contents:

  Preface
  1. Language Processing and Python
  2. Accessing Text Corpora and Lexical Resources
  3. Processing Raw Text
  4. Writing Structured Programs
  5. Categorizing and Tagging Words (minor fixes still required)
  6. Learning to Classify Text
  7. Extracting Information from Text
  8. Analyzing Sentence Structure
  9. Building Feature Based Grammars
  10. Analyzing the Meaning of Sentences (minor fixes still required)
  11. Managing Linguistic Data (minor fixes still required)
  Afterword: Facing the Language Challenge
  Bibliography
  Term Index
