In natural language processing, an N-gram is a contiguous sequence of n items (phonemes, syllables, letters, words, or base pairs) from a given sample of text or speech.
Example trigrams from the Google n-gram corpus:
- ceramics collectables collectibles
- ceramics collectables fine
- ceramics collected by
- serve as the incoming
- serve as the incubator
- serve as the independent
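Extracting n-grams like those above from raw text is a simple sliding-window operation. The following is a minimal sketch (the sample sentence and the function name `ngrams` are illustrative, not part of the Google corpus tooling):

```python
def ngrams(tokens, n):
    # Slide a window of length n over the token list and
    # collect each contiguous n-item sequence as a tuple.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "serve as the incoming mail server".split()
trigrams = ngrams(tokens, 3)
# First trigram: ('serve', 'as', 'the')
```

The same function yields bigrams with `n=2` or character n-grams if given a list of letters instead of words.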
An N-gram model is a type of probabilistic language model for predicting the next item in a sequence, such as the next word in a string of text. Such models analyze sequences of words to compute the frequency of word collocations and thereby predict the most likely next word in a given context.
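The frequency-based prediction described above can be sketched as a simple bigram model: count how often each word follows each context word, then predict the most frequent follower. This is an illustrative toy (the tiny corpus and the names `train_bigram_model` and `predict_next` are made up for the example; a real model would also apply smoothing for unseen pairs):

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    # Count how often each word follows each preceding word.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(model, word):
    # Return the most frequent follower of `word`, or None if unseen.
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None

tokens = "the cat sat on the mat the cat ran".split()
model = train_bigram_model(tokens)
predict_next(model, "the")  # → 'cat' ("cat" follows "the" twice, "mat" once)
```

Normalizing each `Counter` by its total turns these raw counts into the conditional probabilities P(next | previous) that the model assigns.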
More on N-grams can be found on Wikipedia.