Imprint:
Cambridge, Mass.: National Bureau of Economic Research, 2019
Published in: NBER working paper series; no. w26227
Extent:
1 online resource; illustrations (black and white)
Language:
English
DOI:
10.3386/w26227
Reproduction note:
Hardcopy version available to institutional subscribers
Footnote:
System requirements: Adobe [Acrobat] Reader required for PDF files
Mode of access: World Wide Web
Description:
A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their lives or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these "true" links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for supervised machine learning methods. We describe our procedure and illustrate the potential of the approach by linking individuals across the 100% samples of the US decennial censuses of 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods.