Description:
Various aspects of the wavelet approach to nonparametric regression are considered, with the overall aim of extending the scope of wavelet techniques to irregularly spaced data, to regularly spaced data sets of arbitrary size, to heteroscedastic and correlated data, and to data some of which may be downweighted or omitted as outliers. At the core of the methodology discussed is the following problem: if a sequence has a given covariance structure, what is the variance and covariance structure of its discrete wavelet transform? For sequences whose length is a power of 2, an algorithm for finding all the variances and within-level covariances in the wavelet table is developed and investigated in detail. In particular, it is shown that if the original sequence has a band-limited covariance matrix, then the time required by the algorithm is linear in the length of the sequence. Up to now, most statistical work on wavelet methods has presumed that the number of observations is a power of 2 and that the independent variable takes values on a regular grid. The variance-calculation algorithm allows data on any set of independent variable values to be treated, by first interpolating to a fine regular grid of suitable length, and then constructing a wavelet expansion of the gridded data. The gridded data will, in general, have a band-limited covariance matrix, and the algorithm therefore allows the elements of the wavelet transform to be thresholded individually using thresholds proportional to their standard deviation. Various thresholding methods are discussed and investigated. Exact risk formulae for the mean square error of the methodology for a given design are derived and used to avoid, as far as possible, the need for simulation in assessing performance. Both for regular and irregular data, good performance is obtained by noise-proportional thresholding, with thresholds somewhat smaller than the classical universal threshold.
The general approach allows outliers in the data to be removed or downweighted, and aspects of such robust techniques are developed and demonstrated in an example. Another natural application is to data that are themselves correlated, where the covariance of the wavelet coefficients is not due to an initial grid transform but is an intrinsic feature of the data. The use of the method in these circumstances is demonstrated by an application to data synthesized in the study of ion channel gating. The basic approach of the paper has many other potential applications, and some of these are discussed briefly.
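The core problem stated above, finding the variances of the wavelet coefficients from the covariance of the original sequence, can be illustrated with a minimal sketch. It assumes the Haar wavelet and a dense covariance matrix stored as nested lists, so it does not reproduce the linear-time algorithm for band-limited matrices; the function names and the `multiplier` parameter are hypothetical and chosen only for illustration.

```python
import math


def haar_step_cov(cov):
    """Apply one Haar step to a covariance matrix of even size 2m.

    Returns (smooth_cov, detail_cov), the m-by-m covariance matrices of the
    smooth coefficients (x[2k] + x[2k+1]) / sqrt(2) and of the detail
    coefficients (x[2k] - x[2k+1]) / sqrt(2).
    """
    m = len(cov) // 2
    smooth = [[0.0] * m for _ in range(m)]
    detail = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            a = cov[2 * j][2 * k]
            b = cov[2 * j][2 * k + 1]
            c = cov[2 * j + 1][2 * k]
            d = cov[2 * j + 1][2 * k + 1]
            smooth[j][k] = (a + b + c + d) / 2.0
            detail[j][k] = (a - b - c + d) / 2.0
    return smooth, detail


def haar_detail_variances(cov):
    """Variances of all Haar detail coefficients, finest level first.

    Assumes len(cov) is a power of 2. This dense version costs O(n^2);
    it is an illustration, not the band-limited linear-time algorithm.
    """
    levels = []
    current = cov
    while len(current) > 1:
        current, detail = haar_step_cov(current)
        levels.append([detail[k][k] for k in range(len(detail))])
    return levels


def sd_proportional_thresholds(variances, multiplier):
    """Per-coefficient thresholds equal to multiplier * standard deviation.

    `multiplier` is an illustrative tuning constant, not a value from the paper.
    """
    return [[multiplier * math.sqrt(v) for v in level] for level in variances]


def soft_threshold(value, threshold):
    """Shrink a coefficient towards zero by `threshold` (soft thresholding)."""
    if abs(value) <= threshold:
        return 0.0
    return math.copysign(abs(value) - threshold, value)


if __name__ == "__main__":
    # Band-limited (tridiagonal) covariance for a sequence of length 4.
    cov = [
        [1.0, 0.5, 0.0, 0.0],
        [0.5, 1.0, 0.5, 0.0],
        [0.0, 0.5, 1.0, 0.5],
        [0.0, 0.0, 0.5, 1.0],
    ]
    variances = haar_detail_variances(cov)
    print(variances)
    print(sd_proportional_thresholds(variances, multiplier=2.0))
```

Because the variances differ between coefficients when the input is correlated, each coefficient receives its own threshold, which is the sense in which the thresholding is noise-proportional. Soft thresholding is used here only as one common choice of thresholding rule.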