Nonparametric Regression With the Scale Depending on Auxiliary Covariates and Missing Data
Abstract
Nonparametric curve estimation is a powerful statistical methodology that allows estimation of curves with no assumption about their shape. It provides useful insight into the nature of data and may guide further inference for specific parametric models. The statistical problem considered is nonparametric heteroscedastic regression with auxiliary covariates and missing data. In this regression a univariate component is of primary interest, while the scale function is allowed to depend on both the predictor and the auxiliary covariates. The missing mechanism is missing at random (MAR), and two settings are considered: missing responses and missing predictors. The MAR assumption means that the probability of missing may depend on observed variables but not on missing ones. The developed asymptotic theory shows how heteroscedasticity and the MAR mechanism affect the constant of the minimax convergence under the mean integrated squared error criterion. Further, it is shown that a procedure ignoring the scale function is not efficient and does not attain the sharp constant in the minimax lower bound. Models with missing responses and with missing predictors are considered separately because their theory and methodology are different. For the case of missing responses, a sharp minimax, data-driven procedure is developed that is based on estimation of the unknown nuisance scale function. The estimator adapts to the MAR response mechanism and to the unknown smoothness of the underlying regression function. Further, efficiency is preserved for a more general additive model with auxiliary covariates. A model with MAR predictors is dramatically more involved, and here classic regression estimators are no longer even consistent. For this model a novel data-driven estimator is suggested that takes the scale function into account. This estimator is adaptive and matches the performance of an oracle that knows all underlying nuisance functions.
The asymptotic theory is extended to the case of a general additive model as well. The theory and methodology are tested using Monte Carlo simulation studies and real examples. The results favor the proposed methodology and support practical feasibility of the proposed methods for heteroscedastic regressions with missing data.
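The contrast between the two missing-data settings can be illustrated with a small simulation. The sketch below is not the estimator developed in this work. It assumes a model of the form Y = m(X) + sigma(X, Z) * eps with a hypothetical regression function `m`, a hypothetical scale function `sigma`, and hypothetical availability probabilities, all chosen only for illustration. It applies a naive complete-case binned average and reports the largest absolute bin-wise mean of Y - m(X) among the retained cases. When responses are missing with a probability that depends on the predictor, that quantity stays near zero. When predictors are missing with a probability that depends on the response, it does not.

```python
import math
import random


def m(x):
    # Hypothetical regression function (an assumption for illustration).
    return 1.0 + math.sin(2.0 * math.pi * x)


def sigma(x, z):
    # Hypothetical scale function of the predictor x and auxiliary covariate z.
    return 0.5 + x + z


def simulate(n, seed=0):
    # Draw (X, Z, Y) with Y = m(X) + sigma(X, Z) * eps, eps standard normal.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, z = rng.random(), rng.random()
        y = m(x) + sigma(x, z) * rng.gauss(0.0, 1.0)
        rows.append((x, z, y))
    return rows


def complete_case_bias(rows, keep_prob, bins=10, seed=1):
    # Keep each case with probability keep_prob(x, y), then return the
    # largest absolute bin-wise mean of Y - m(X) over the retained cases.
    rng = random.Random(seed)
    sums = [0.0] * bins
    counts = [0] * bins
    for x, z, y in rows:
        if rng.random() < keep_prob(x, y):
            b = min(int(x * bins), bins - 1)
            sums[b] += y - m(x)
            counts[b] += 1
    return max(abs(s / c) for s, c in zip(sums, counts) if c > 0)


rows = simulate(40000)

# MAR responses: Y is missing with a probability that depends on the
# always-observed predictor X.
bias_missing_response = complete_case_bias(rows, lambda x, y: 0.3 + 0.6 * x)

# MAR predictors: X is missing with a probability that depends on the
# always-observed response Y.
bias_missing_predictor = complete_case_bias(
    rows, lambda x, y: 1.0 / (1.0 + math.exp(-y))
)

print("missing responses, complete-case bias:", round(bias_missing_response, 3))
print("missing predictors, complete-case bias:", round(bias_missing_predictor, 3))
```

In this toy setup the first value reflects only sampling noise, while the second reflects a systematic shift caused by selecting cases according to the response, which is why a complete-case approach cannot be expected to work for missing predictors.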