Odds ratios from logistic, geometric, Poisson, and negative binomial regression models
BMC Medical Research Methodology
Background: The odds ratio (OR) is an important metric for comparing two or more groups in many biomedical applications, used when the data measure the presence or absence of an event or represent the frequency of its occurrence. In the latter case, researchers often dichotomize the count data into binary form and apply the well-known logistic regression technique to estimate the OR. In dichotomizing the data, however, information about the underlying counts is lost, which can reduce the
precision of inferences on the OR. Methods: We propose analyzing the count data directly using regression models with the log odds link function. With this approach, the parameter estimates in the model have exactly the same interpretation as in a logistic regression of the dichotomized data, yielding comparable estimates of the OR. We prove analytically, using the Fisher information matrix, that our approach produces more precise estimates of the OR than logistic regression of the dichotomized data. We also demonstrate the gains in precision using simulation studies and real-world datasets. We focus on three related distributions for count data: geometric, Poisson, and negative binomial. Results: In simulation studies, confidence intervals for the OR were 56-65% as wide (geometric model), 75-79% as wide (Poisson model), and 61-69% as wide (negative binomial model) as the corresponding intervals from a logistic regression of the dichotomized data. When we analyzed existing datasets using our approach, we found that confidence intervals for the OR could be up to 64% shorter (36% as wide) than those obtained when the data were dichotomized and analyzed using logistic regression. Conclusions: More precise estimates of the OR can be obtained directly from the count data by using the log odds link function. This analytic approach is easy to implement in software packages that are capable of fitting generalized linear models or of maximizing user-defined likelihood functions.
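
Illustration: The idea can be sketched for the simplest case, a Poisson model with a single binary covariate. This is a minimal sketch under stated assumptions, not the implementation used in the paper. It assumes the event is a nonzero count, so that p = P(Y > 0) = 1 - exp(-mu) for a Poisson mean mu, which gives odds exp(mu) - 1 and log odds log(exp(mu) - 1). With one binary covariate, the maximum likelihood estimate of each group mean is the sample mean, so no iterative fitting is needed, and the standard errors follow from the delta method. The counts and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical counts for two groups (illustrative only, not from the paper).
control = [0, 0, 1, 0, 2, 0, 1, 0, 0, 3, 0, 1]
treated = [1, 0, 2, 3, 0, 1, 4, 2, 0, 1, 2, 1]


def logistic_log_or(g0, g1):
    """Log OR and its standard error after dichotomizing (event = count > 0).

    With one binary covariate, logistic regression reproduces the 2x2-table
    estimate, so the closed form is used here.
    """
    a = sum(y > 0 for y in g1)  # events in group 1
    b = len(g1) - a             # non-events in group 1
    c = sum(y > 0 for y in g0)  # events in group 0
    d = len(g0) - c             # non-events in group 0
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def poisson_log_or(g0, g1):
    """Log OR and its standard error from the counts, assuming Poisson data.

    Uses the log odds of a nonzero count, log(exp(mu) - 1), evaluated at the
    sample mean, with a delta-method variance based on Var(mean) = mu / n.
    """
    def log_odds_and_variance(g):
        n = len(g)
        mu = sum(g) / n                        # Poisson MLE of the mean
        theta = math.log(math.expm1(mu))       # log odds of Y > 0
        deriv = math.exp(mu) / math.expm1(mu)  # d(theta) / d(mu)
        return theta, deriv ** 2 * mu / n

    t0, v0 = log_odds_and_variance(g0)
    t1, v1 = log_odds_and_variance(g1)
    return t1 - t0, math.sqrt(v0 + v1)


lo_logit, se_logit = logistic_log_or(control, treated)
lo_pois, se_pois = poisson_log_or(control, treated)

print("dichotomized: OR = %.2f, SE(log OR) = %.3f" % (math.exp(lo_logit), se_logit))
print("count-based:  OR = %.2f, SE(log OR) = %.3f" % (math.exp(lo_pois), se_pois))
print("relative CI width on the log scale: %.2f" % (se_pois / se_logit))
```

The two estimates target the same quantity, the odds of a nonzero count in one group relative to the other, but the count-based standard error is valid only if the Poisson assumption (variance equal to the mean) holds. For overdispersed counts, a geometric or negative binomial model would be the more appropriate choice, and models with additional covariates would require a generalized linear model fit or direct likelihood maximization rather than this closed form.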