• Media type: E-Article
  • Title: Juneau : data lake management for Jupyter
  • Contributor: Zhang, Yi; Ives, Zachary G.
  • Published: Association for Computing Machinery (ACM), 2019
  • Published in: Proceedings of the VLDB Endowment, 12 (2019) 12, pages 1902-1905
  • Language: English
  • DOI: 10.14778/3352063.3352095
  • ISSN: 2150-8097
  • Description: In collaborative settings such as multi-investigator laboratories, data scientists need improved tools to manage not their data records but rather their data sets and data products, to facilitate both provenance tracking and data (and code) reuse within their data lakes and file systems. We demonstrate the Juneau System, which extends computational notebook software (Jupyter Notebook) as an instrumentation and data management point for overseeing and facilitating improved dataset usage, through capabilities for indexing, searching, and recommending "complementary" data sources, previously extracted machine learning features, and additional training data. This demonstration focuses on how we help the user find related datasets via search.