Nonparametric Regression for Responses Missing Not at Random
For nonparametric regression with responses missing at random (MAR), a complete-case approach attains the optimal rate and constant for the convergence of the mean integrated squared error (MISE). The situation changes dramatically when the responses are missing not at random (MNAR), that is, when the probability that a response is observed (known as the availability likelihood) depends on the value of the response itself. In that case the regression function cannot be estimated consistently from the observed (complete-case) data alone, because a complete-case approach estimates the regression function with respect to a biased conditional density. The only way to unlock the information contained in an MNAR sample is to estimate the availability likelihood with the help of an extra sample. If such sampling is possible, how large must the extra sample be for the proposed estimator to match the performance of an oracle (or guru) estimator that knows all the underlying samples? In other words, what is the cost of the extra sample? This dissertation addresses that question and shows that, with an extra sample whose size is proportional to that of the original sample, the proposed estimator is minimax.
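A small simulation can make the complete-case bias concrete. By Bayes' rule, the density of an observed response is f(y | x, A = 1) = w(y) f(y | x) / ∫ w(u) f(u | x) du, where A = 1 marks an observed response and w(y) = P(A = 1 | Y = y) is the availability likelihood. When w is increasing, observed responses are skewed upward. The sketch below is a hypothetical illustration and not the estimator proposed in the dissertation: the availability function `availability`, the regression function `true_m`, the sample size, the Gaussian kernel and the bandwidth are all assumptions. It also treats w as known, whereas the dissertation estimates it from an extra sample. It compares a complete-case Nadaraya-Watson estimate with an inverse-probability-weighted one that reweights each observed pair by 1 / w(Y).

```python
import numpy as np


def availability(y):
    # Hypothetical availability likelihood w(y) = P(A = 1 | Y = y):
    # larger responses are more likely to be observed, so data are MNAR.
    return 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * y)))


def true_m(t):
    # Hypothetical regression function m(x) = E[Y | X = x].
    return np.sin(2.0 * np.pi * t)


def nadaraya_watson(x0, x, y, weights, h):
    # Weighted Nadaraya-Watson estimate at points x0 with a Gaussian kernel.
    k = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    kw = k * weights[None, :]
    return (kw @ y) / kw.sum(axis=1)


rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(0.0, 1.0, n)
y = true_m(x) + rng.normal(0.0, 0.5, n)
observed = rng.uniform(size=n) < availability(y)  # A = 1 means Y is seen

grid = np.linspace(0.1, 0.9, 9)
h = 0.04
xo, yo = x[observed], y[observed]

# Complete-case: observed pairs only, ignoring why responses are missing.
cc = nadaraya_watson(grid, xo, yo, np.ones(yo.size), h)
# Reweighted: each observed pair gets weight 1 / w(Y); this requires w,
# which here is assumed known instead of estimated from an extra sample.
ipw = nadaraya_watson(grid, xo, yo, 1.0 / availability(yo), h)

cc_err = np.abs(cc - true_m(grid)).max()
ipw_err = np.abs(ipw - true_m(grid)).max()
print("max |complete-case - m|:", cc_err)
print("max |reweighted - m|:", ipw_err)
```

The reweighting works because E[A Y / w(Y) | X] = E[Y | X] and E[A / w(Y) | X] = 1, so the weighted ratio targets m(x) rather than the mean under the biased conditional density. The complete-case error, in contrast, stays large near the troughs of m, where low responses are rarely observed, and it does not shrink as n grows.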