In natural language processing, an N-gram is a contiguous sequence of n items (phonemes, syllables, letters, words, or base pairs) from a given sample of text or speech.
Examples from the Google n-gram corpus:
3-grams:
- ceramics collectables collectibles
- ceramics collectables fine
- ceramics collected by
4-grams:
- serve as the incoming
- serve as the incubator
- serve as the independent
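To make the definition and the examples above concrete, here is a minimal sketch in plain Python that extracts the contiguous n-grams from a list of tokens; the helper name and sample sentence are illustrative assumptions, not part of the Google corpus tooling.

```python
# Minimal sketch: extract contiguous n-grams from a token sequence.
def ngrams(tokens, n):
    """Return the list of contiguous n-grams (as tuples) from tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "serve as the incoming".split()
print(ngrams(tokens, 3))
# [('serve', 'as', 'the'), ('as', 'the', 'incoming')]
```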
An N-gram model is a type of probabilistic language model for predicting the next item in a sequence, such as the next word in a string of text. N-gram models can be used to analyze sequences of words, computing the frequency of word collocations and predicting the most likely next word in a given context.
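As a sketch of how such a model works, the following Python snippet counts bigram collocations in a toy corpus (an illustrative assumption, not the Google data) and predicts the most frequent follower of a given word.

```python
from collections import Counter, defaultdict

# Toy corpus for illustration only.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each word, how often each following word occurs (bigram counts).
followers = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    followers[w1][w2] += 1

def predict_next(word):
    """Return the most frequent next word and its relative frequency."""
    counts = followers[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

print(predict_next("the"))  # ('cat', 0.5)
```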
See the basics of NLP with NLTK to program and analyze N-grams in Python.
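For example, a short sketch with NLTK (assuming the package is installed) might extract and count trigrams like this; the sample text is an illustrative assumption.

```python
# Extract and count trigrams with NLTK.
from nltk import ngrams
from nltk.probability import FreqDist

text = "ceramics collectables collectibles ceramics collectables fine".split()
trigrams = list(ngrams(text, 3))
fdist = FreqDist(trigrams)

print(trigrams[0])           # ('ceramics', 'collectables', 'collectibles')
print(fdist.most_common(1))  # most frequent trigram with its count
```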
More on N-grams on Wikipedia.