• Media type: E-Article
  • Title: Microgenetic algorithms and artificial neural networks to assess minimum data requirements for prediction of pesticide concentrations in shallow groundwater on a regional scale
  • Contributor: Sahoo, Goloka Behari; Ray, Chittaranjan
  • Imprint: American Geophysical Union (AGU), 2008
  • Published in: Water Resources Research
  • Language: English
  • DOI: 10.1029/2007wr005875
  • ISSN: 1944-7973; 0043-1397
  • Keywords: Water Science and Technology
  • Description: Artificial neural networks (ANNs) have been extensively used for forecasting problems involving water quantity and quality. In most cases, the geometry and model parameters of the ANN are set by trial and error to improve network generalization, and the available data are divided arbitrarily into training, testing, and validation subsets. It has been shown that arbitrary selection of samples for the training subset commonly results in the inclusion of samples from densely clustered regions and the omission of samples from sparsely represented regions. This paper presents a systematic approach using the self‐organizing map (SOM) clustering technique that identifies which samples, and how many, should be included in each of the three subsets required by the ANN for optimum predictive performance. In addition, this paper presents microgenetic algorithms (μGA) that optimize the ANN's geometry and model parameters in terms of the correlation coefficient (R). In the sensitivity analysis, the optimum R value is found to be least sensitive to the μGA model parameters, while the ANN's predictive performance is significantly affected by (1) poor selection of its geometry and model parameters and (2) arbitrary selection of samples for the three subsets of data used. It is demonstrated that the μGA‐ANN model using the SOM technique for data division outperforms the μGA‐ANN model using arbitrary data division. For the training subset, the model using the SOM technique identifies samples that are representative of the region, requiring only 20% of the total samples, whereas the arbitrary sample selection method requires 50–90%. Because resampling on a regional scale is expensive and time-consuming, substantial cost and time could be saved if resampling were done only on the 20% of drinking water wells that are representative.
  • Access State: Open Access
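
The data division idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cluster labels are already available (for example from a SOM, which is not implemented here), and the function name `split_by_cluster`, the 40% testing fraction, and the random seed are hypothetical choices made for illustration. Only the 20% training share comes from the description.

```python
import random
from collections import defaultdict


def split_by_cluster(labels, train_frac=0.2, test_frac=0.4, seed=0):
    """Divide sample indices into training, testing, and validation subsets.

    labels     -- one cluster id per sample (assumed to come from a SOM or
                  any other clustering step; not computed here)
    train_frac -- share of each cluster sent to training (20% per the abstract)
    test_frac  -- share of each cluster sent to testing (assumed value)
    The remainder of each cluster goes to validation.
    """
    rng = random.Random(seed)

    # Group sample indices by their cluster id.
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)

    train, test, valid = [], [], []
    for lab in sorted(by_cluster):
        members = by_cluster[lab]
        rng.shuffle(members)
        n = len(members)
        # Every cluster contributes at least one training sample, so
        # sparsely represented regions are not omitted.
        n_train = max(1, round(train_frac * n))
        n_test = min(n - n_train, round(test_frac * n))
        train.extend(members[:n_train])
        test.extend(members[n_train:n_train + n_test])
        valid.extend(members[n_train + n_test:])
    return train, test, valid


# Hypothetical example: 100 samples in one dense, one medium, one sparse cluster.
labels = [0] * 50 + [1] * 45 + [2] * 5
train, test, valid = split_by_cluster(labels)
print(len(train), len(test), len(valid))
```

Drawing a fixed share from every cluster, rather than from the pooled data, is what keeps the small cluster represented in the training subset; an arbitrary split of the same 100 samples could leave it out entirely.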