Bandwidth Selection in Nonparametric Regression with Large Sample Size

by Daniel Barreiro-Ures, Ricardo Cao, Mario Francisco-Fernández

Published in Proceedings (MDPI) by MDPI AG.

2018, Issue 18, p. 1166

Abstract

In nonparametric regression estimation, the behaviour of kernel methods such as the Nadaraya-Watson or local linear estimators is heavily influenced by the bandwidth parameter, which determines the trade-off between bias and variance. Selecting an optimal bandwidth, in the sense of minimizing some risk function (MSE, MISE, etc.), is therefore a crucial issue. However, estimating an optimal bandwidth from the whole sample can be very expensive in computing time in the context of Big Data, owing to the computational complexity of some of the most widely used bandwidth selection algorithms (leave-one-out cross-validation, for example, has O(n^2) complexity). To overcome this problem, we propose two methods that estimate the optimal bandwidth for several subsamples of the large dataset and then extrapolate the result to the original sample size using the asymptotic expression of the MISE bandwidth. Preliminary simulation studies show that the proposed methods drastically reduce computing time while only slightly decreasing statistical precision.
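The subsample-and-extrapolate idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce either of their two methods. It assumes a Gaussian kernel, the Nadaraya-Watson estimator, a grid search over candidate bandwidths, and the standard n^(-1/5) rate of the asymptotically optimal bandwidth for second-order kernels. All function names and parameters (`loo_cv_score`, `cv_bandwidth`, `extrapolate_bandwidth`, `subsample_bandwidth`, `r`, `n_sub`, `grid`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def loo_cv_score(x, y, h):
    """Leave-one-out CV error of a Nadaraya-Watson fit (Gaussian kernel).

    Builds the full pairwise weight matrix, so cost is O(m^2) for m points.
    """
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    np.fill_diagonal(w, 0.0)  # each point is excluded from its own prediction
    denom = np.maximum(w.sum(axis=1), 1e-300)  # guard against division by zero
    pred = (w @ y) / denom
    return np.mean((y - pred) ** 2)


def cv_bandwidth(x, y, grid):
    """Return the grid value that minimizes the leave-one-out CV error."""
    scores = [loo_cv_score(x, y, h) for h in grid]
    return grid[int(np.argmin(scores))]


def extrapolate_bandwidth(h_r, r, n):
    """Rescale a bandwidth chosen at sample size r to sample size n.

    Assumes h_opt(m) = C * m**(-1/5), so h_n = h_r * (r / n)**(1/5).
    """
    return h_r * (r / n) ** 0.2


def subsample_bandwidth(x, y, r, n_sub, grid, rng):
    """Average the CV bandwidths of n_sub subsamples of size r, then extrapolate."""
    n = len(x)
    hs = []
    for _ in range(n_sub):
        idx = rng.choice(n, size=r, replace=False)  # subsample without replacement
        hs.append(cv_bandwidth(x[idx], y[idx], grid))
    return extrapolate_bandwidth(np.mean(hs), r, n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)  # synthetic data
    grid = np.linspace(0.01, 0.5, 30)
    print(subsample_bandwidth(x, y, r=200, n_sub=5, grid=grid, rng=rng))
```

Under these assumptions, each cross-validation run costs O(r^2) rather than O(n^2), so the total work is roughly n_sub * r^2 * len(grid) kernel evaluations, which is where the savings for large n would come from.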
Type: article-journal
Stage: published
Date: 2018-09-17
Language: en
Proceedings Metadata
Open Access Publication
In DOAJ
In ISSN ROAD
In Keepers Registry
ISSN-L:  2504-3900