Analyze collocations

This analysis job finds statistically significant pairs of words that appear immediately next to each other.

To start this job, select the questions: “What pairs of words often appear directly together? What technical terms or phrases appear in the literature?”

In natural language processing, a collocation is a statistically significant association between a pair of words that appear directly next to one another. For example, while English speakers use the phrases “strong tea” and “powerful computers,” it would not be idiomatic English to use “powerful tea” or “strong computers.”
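To make "directly next to one another" concrete, the minimal Python sketch below counts adjacent word pairs (bigrams) in a text. The lowercase-and-split tokenization is an assumption for illustration and is not necessarily how this job tokenizes documents. Raw counts like these are only the starting point; a significance test is still needed to decide which pairs are collocations.

```python
from collections import Counter


def adjacent_pairs(text: str) -> Counter:
    """Count each pair of words that appear immediately next to each other."""
    # Simplistic tokenization (assumption): lowercase and split on whitespace.
    tokens = text.lower().split()
    # zip(tokens, tokens[1:]) yields (w1, w2) for every adjacent position.
    return Counter(zip(tokens, tokens[1:]))


counts = adjacent_pairs("strong tea and strong tea with powerful computers")
print(counts[("strong", "tea")])           # 2
print(counts[("powerful", "computers")])   # 1
```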

(If you would like to determine statistically significant associations between words that are farther apart than immediate neighbors, check out the cooccurrence analysis.)

You can specify how many of the most significant collocations to keep, and these are offered to you for download. This job can answer questions such as:

What concepts are often invoked together in a body of literature? (Input: a domain of interest; select one of the first three analysis methods, then search the results for concepts of interest.)

What technical terms or phrases are often used in a discipline? (Input: a domain of interest; select the parts-of-speech analysis method.)
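The parts-of-speech method is described here only by the question it answers. One common approach to finding technical terms, shown below as an assumption rather than as this job's actual implementation, keeps only adjacent pairs whose part-of-speech tags match patterns such as adjective+noun or noun+noun. The tag set, the `TERM_PATTERNS` name, and the hand-tagged sentence are all hypothetical; in practice the tags would come from a part-of-speech tagger.

```python
from collections import Counter

# Hypothetical tag patterns often used for term extraction: adjective+noun, noun+noun.
TERM_PATTERNS = {("ADJ", "NOUN"), ("NOUN", "NOUN")}


def candidate_terms(tagged: list[tuple[str, str]]) -> Counter:
    """Count adjacent word pairs whose part-of-speech tags match TERM_PATTERNS.

    `tagged` is a list of (word, tag) tuples, e.g. the output of a POS tagger.
    """
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in TERM_PATTERNS:
            counts[(w1.lower(), w2.lower())] += 1
    return counts


# Hand-tagged example sentence (tags supplied manually for illustration).
sentence = [("The", "DET"), ("neural", "ADJ"), ("network", "NOUN"),
            ("uses", "VERB"), ("gradient", "NOUN"), ("descent", "NOUN")]
terms = candidate_terms(sentence)
print(terms)
```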

You can choose among several statistical tests for computing the significance of each collocation pair.
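The specific tests offered by this job are not named here. As a sketch of what such a test computes, the Python below scores every adjacent pair with two measures commonly used for collocations: pointwise mutual information (PMI) and the t-score approximation described by Manning and Schütze. Choosing these two measures is an assumption for illustration; each compares how often a pair actually occurs with how often it would occur if the two words were independent.

```python
import math
from collections import Counter


def score_pairs(tokens: list[str]) -> dict[tuple[str, str], dict[str, float]]:
    """Score each adjacent pair with PMI and t-score (sketch; not this job's exact formulas)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        c1, c2 = unigrams[w1], unigrams[w2]
        # Pair count expected if w1 and w2 occurred independently of each other.
        expected = c1 * c2 / n
        scores[(w1, w2)] = {
            # log2 of observed vs. expected pair count, i.e. P(w1 w2) / (P(w1) P(w2)).
            "pmi": math.log2(c12 / expected),
            # t-score approximation: (observed - expected) / sqrt(observed).
            "t": (c12 - expected) / math.sqrt(c12),
        }
    return scores


s = score_pairs(["a", "b", "a", "b", "c", "d"])
print(s[("a", "b")])
print(s[("c", "d")])
```

In this toy input, PMI ranks the rare pair ("c", "d") above the more frequent ("a", "b"), while the t-score ranks them the other way. PMI is known to favor rare pairs and the t-score to favor frequent ones, which is one reason a tool may offer a choice of test.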

You can return either a fixed number of the most significant collocations or, at no additional computation cost, every collocation regardless of its significance. Finally, you can filter the list by a particular word, returning only pairs in which one of the two words is the word you provide.
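The three output options can be sketched as one function over a dictionary of already-computed pair scores. The function name `select_collocations` and its parameters `num_pairs` and `word` are hypothetical, and the scores in the example are made-up values for illustration. In this sketch, returning every pair needs no extra scoring work because all pairs have been scored before ranking; only the final truncation differs.

```python
from typing import Optional


def select_collocations(
    scores: dict[tuple[str, str], float],
    num_pairs: Optional[int] = None,
    word: Optional[str] = None,
) -> list[tuple[tuple[str, str], float]]:
    """Return (pair, score) items sorted by descending score.

    num_pairs: keep only the top N pairs; None returns every pair.
    word: if given, keep only pairs in which one of the two words equals `word`.
    """
    items = scores.items()
    if word is not None:
        items = [(pair, score) for pair, score in items if word in pair]
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    # All pairs are already scored, so returning all of them costs no extra scoring work.
    return ranked if num_pairs is None else ranked[:num_pairs]


# Made-up scores for illustration only.
scores = {("strong", "tea"): 5.2, ("powerful", "computers"): 4.8, ("the", "tea"): 0.3}
top = select_collocations(scores, num_pairs=2)
tea = select_collocations(scores, word="tea")
print(top)
print(tea)
```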