Lincoln-Petersen method

The Lincoln index is a statistical measure used in several fields to estimate the number of cases that have not yet been observed, based on two independent sets of observed cases. Described by Frederick Charles Lincoln in 1930, it is also sometimes known as the Lincoln-Petersen method, after C.G. Johannes Petersen, who was the first to use the related mark and recapture method.

Consider two observers who separately count the different species of plants or animals in a given area. If they each come back having found 100 species but only 5 particular species are found by both observers, then each observer clearly missed at least 95 species (that is, the 95 that only the other observer found). Thus, we know that each observer misses a large share of the species present. On the other hand, if 99 of the 100 species each observer found had been found by both, it is fair to expect that they have found a far higher percentage of the total species that are there to find.

The same reasoning applies to mark and recapture. If some animals in a given area are caught and marked, and later a second round of captures is done, the number of marked animals found in the second round can be used to generate an estimate of the total population.

Another example arises in computational linguistics for estimating the total vocabulary of a language. Given two independent samples, the overlap between their vocabularies enables a useful estimate of how many more vocabulary items exist but did not happen to show up in either sample. A similar example involves estimating the number of typographical errors remaining in a text, from two proofreaders' counts.

The Lincoln index formalizes this reasoning. If $E_{1}$ and $E_{2}$ are the numbers of species (or words, or other phenomena) observed by two independent methods, and $S$ is the number of observations in common, then the Lincoln index is simply

$L={E_{1}E_{2} \over S}$
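
As a rough illustration, the formula can be applied to the two-observer scenarios described earlier. The sketch below is not taken from any particular library: the function names `lincoln_index` and `estimated_unseen` are hypothetical, and the second helper assumes that $L$ is read as an estimate of the total number of cases, so that the unobserved count is $L$ minus the number of distinct cases actually seen ($E_{1}+E_{2}-S$).

```python
def lincoln_index(e1: int, e2: int, s: int) -> float:
    """Return L = e1 * e2 / s for two independent counts with overlap s."""
    if s <= 0:
        # With no overlap the ratio is undefined (division by zero).
        raise ValueError("overlap s must be positive")
    if s > min(e1, e2):
        raise ValueError("overlap s cannot exceed either count")
    return e1 * e2 / s


def estimated_unseen(e1: int, e2: int, s: int) -> float:
    """Assumed reading: L estimates the total, so unseen = L - distinct seen."""
    distinct_seen = e1 + e2 - s
    return lincoln_index(e1, e2, s) - distinct_seen


# Two observers find 100 species each, with 5 species in common.
low_overlap = lincoln_index(100, 100, 5)

# Two observers find 100 species each, with 99 species in common.
high_overlap = lincoln_index(100, 100, 99)

print(low_overlap, estimated_unseen(100, 100, 5))
print(round(high_overlap, 2), round(estimated_unseen(100, 100, 99), 2))
```

With a small overlap the index is far larger than either count, while with a near-complete overlap it is only slightly above the number of species already seen, which matches the informal argument given for the two observers.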

...
Wikipedia