• Media type: E-Book
  • Title: Combining Family History and Machine Learning to Link Historical Records
  • Contributor: Price, Joseph [Author]; Buckles, Kasey [Other]; Van Leeuwen, Jacob [Other]; Riley, Isaac [Other]
  • Imprint: [S.l.]: SSRN, [2019]
  • Published in: NBER Working Paper ; No. w26227
  • Extent: 1 online resource (34 p.)
  • Language: English
  • Footnote: According to SSRN, the original version of the document was created in September 2019
  • Description: A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we develop a new approach that relies on millions of record links created by individual contributors to a large, public, wiki-style family tree. First, we use these “true” links to inform the decisions one needs to make when using traditional linking methods. Second, we use the links to construct a training data set for use in supervised machine learning methods. We describe the procedure we use and illustrate the potential of our approach by linking individuals across the 100% samples of the US decennial censuses from 1900, 1910, and 1920. We obtain an overall match rate of about 70 percent, with a false positive rate of about 12 percent. This combination of high match rate and accuracy represents a point beyond the current frontier for record linking methods. Institutional subscribers to the NBER working paper series and residents of developing countries may download this paper without additional charge at "http://www.nber.org/papers/w26227"
  • Access State: Open Access