  • Media type: E-Book
  • Title: Combining Family History and Machine Learning to Link Historical Records
  • Contributor: Price, Joseph [Author]; Buckles, Kasey [Other]; Van Leeuwen, Jacob [Other]; Riley, Isaac [Other]
  • Corporation: National Bureau of Economic Research
  • Imprint: Cambridge, Mass.: National Bureau of Economic Research, 2019
  • Published in: NBER working paper series ; no. w26227
  • Extent: 1 online resource; illustrations (black and white)
  • Language: English
  • DOI: 10.3386/w26227
  • Reproduction note: Hardcopy version available to institutional subscribers
  • Footnote: System requirements: Adobe [Acrobat] Reader required for PDF files
    Mode of access: World Wide Web
  • Description: A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their lives or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these "true" links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for use in supervised machine learning methods. We describe our procedure and illustrate the potential of our approach by linking individuals across the 100% samples of the US decennial censuses from 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods.
  • Access State: Open Access