Large Sample Theory of Empirical Distributions in Biased Sampling Models

Richard D. Gill, Yehuda Vardi, Jon A. Wellner
1988 Annals of Statistics  
Vardi (1985) introduced an $s$-sample model for biased sampling, gave conditions that guarantee the existence and uniqueness of the nonparametric maximum likelihood estimator (NPMLE) $\mathbb{G}_n$ of the common underlying distribution $G$, and discussed numerical methods for calculating the estimator. Here we examine the large sample behaviour of the NPMLE $\mathbb{G}_n$, including results on uniform consistency of $\mathbb{G}_n$, convergence of $\sqrt{n}(\mathbb{G}_n - G)$ to a Gaussian process, and asymptotic efficiency of $\mathbb{G}_n$ as an estimator of $G$. The proofs are based upon recent results for empirical processes indexed by sets and functions, properties of irreducible M-matrices, and the homotopy invariance theorem. A final section discusses examples and applications to stratified sampling, 'choice-based' sampling in econometrics, and 'case-control' studies in biostatistics.

If $\operatorname{support}(G) \subset \mathcal{X}^+$, then $G(\mathcal{X}^+) = 1$, so that $G^+ = G$; but in general $G(\mathcal{X}^+) < 1$ and $G^+ \neq G$. Also note that the distribution of the data depends on $G$ through $G^+$ only, and hence the most we can hope to estimate is $G^+$ and $W^+$. Therefore, to ease the notational burden, throughout the remainder of this section and Sections 2-4 we drop the plus sign and write $G$ for $G^+$ and $W$ for $W^+$, even when they are not, in fact, equal. The distinction will be made clearly, and the $+$ sign introduced as needed, in the
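Vardi's (1985) numerical methods are not reproduced here. As a hedged illustration only, the sketch below computes an NPMLE of $G$ by a standard fixed-point (self-consistency) iteration, assuming the weight functions $w_i$ are known and positive at every observed point. The function name `biased_sampling_npmle` and its arguments are hypothetical. The estimator puts mass $p_j$ on each distinct observed value $t_j$ and solves $p_j \propto c_j / \sum_i n_i w_i(t_j)/W_i(p)$, where $c_j$ is the multiplicity of $t_j$ and $W_i(p) = \sum_j w_i(t_j) p_j$; this is the stationarity condition of the pooled biased-sampling log-likelihood. Convergence of the plain iteration is assumed here, not proved.

```python
import numpy as np


def biased_sampling_npmle(x, sizes, weights, tol=1e-10, max_iter=10_000):
    """Hypothetical sketch: NPMLE of G in an s-sample biased sampling model.

    x       : pooled observations from all s samples (length N)
    sizes   : sample sizes n_1, ..., n_s (they must sum to N)
    weights : list of s known weight functions w_i (assumed > 0 at the data)
    Returns the distinct observed values t_j and their estimated masses p_j.
    """
    pts, counts = np.unique(np.asarray(x, dtype=float), return_counts=True)
    n = np.asarray(sizes, dtype=float)
    # wmat[i, j] = w_i(t_j); only the observed support points enter the fit.
    wmat = np.array([[w(t) for t in pts] for w in weights], dtype=float)
    p = np.full(len(pts), 1.0 / len(pts))  # start from uniform masses
    for _ in range(max_iter):
        norms = wmat @ p  # W_i(p) = sum_j w_i(t_j) p_j for each sample i
        denom = (n[:, None] * wmat / norms[:, None]).sum(axis=0)
        p_new = counts / denom
        p_new /= p_new.sum()  # the fixed-point equation is scale-invariant
        if np.max(np.abs(p_new - p)) < tol:
            return pts, p_new
        p = p_new
    return pts, p
```

For example, `biased_sampling_npmle(x, [2, 3], [lambda t: 1.0, lambda t: t])` pools an unbiased sample of size 2 with a length-biased sample of size 3. With a single sample and $w \equiv 1$ the iteration returns the ordinary empirical distribution, and with $w(x) = x$ it returns masses proportional to $c_j / t_j$, which makes both cases useful sanity checks.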
doi:10.1214/aos/1176350948