• Media type: E-article
  • Title: Gradient-Based Optimization of Hyperparameters
  • Contributors: Bengio, Yoshua
  • Published: MIT Press - Journals, 2000
  • Published in: Neural Computation
  • Language: English
  • DOI: 10.1162/089976600300015187
  • ISSN: 0899-7667; 1530-888X
  • Keywords: Cognitive Neuroscience; Arts and Humanities (miscellaneous)
  • Description: Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error with a model selection criterion. In this article, we present a methodology for optimizing several hyperparameters, based on computing the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters is computed efficiently by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient involving second derivatives of the training criterion.
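
  As a rough sketch of the implicit-function-theorem argument alluded to in the abstract (the symbols C for the model selection criterion, L for the training criterion, theta for the parameters, and lambda for the hyperparameters are assumptions for illustration, not notation taken from the article), the hyperparameter gradient follows from differentiating the stationarity condition of the inner minimization:

    % inner problem and its stationarity condition
    \theta^*(\lambda) = \arg\min_{\theta} L(\theta, \lambda),
    \qquad
    \left.\frac{\partial L}{\partial \theta}\right|_{\theta^*(\lambda)} = 0

    % implicit differentiation of the stationarity condition in \lambda
    \frac{\partial^2 L}{\partial \theta^2}\,\frac{d\theta^*}{d\lambda}
    + \frac{\partial^2 L}{\partial \theta\,\partial \lambda} = 0
    \;\Longrightarrow\;
    \frac{d\theta^*}{d\lambda}
    = -\left(\frac{\partial^2 L}{\partial \theta^2}\right)^{-1}
      \frac{\partial^2 L}{\partial \theta\,\partial \lambda}

    % chain rule for the model selection criterion
    \frac{d\,C(\theta^*(\lambda))}{d\lambda}
    = \frac{\partial C}{\partial \theta}\,\frac{d\theta^*}{d\lambda}
    = -\frac{\partial C}{\partial \theta}
      \left(\frac{\partial^2 L}{\partial \theta^2}\right)^{-1}
      \frac{\partial^2 L}{\partial \theta\,\partial \lambda}

  This makes explicit why the formula involves second derivatives of the training criterion, as stated in the abstract.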