Reporting dichotomous data using Logistic Regression in Medical Research: The scenario in developing countries
Nepal Journal of Epidemiology
Editorial

Odds ratios and relative risks are widely applied in public health and medical research. Clinicians often comment that they are most interested in identifying the risk factors for the diseases they treat in their own country. Many medical research problems call for the analysis and prediction of a dichotomous outcome: whether smokers will develop lung cancer, or whether patients with hyperuricemia are at risk of cardiovascular disease. Traditionally, these
research questions were addressed by either ordinary least squares (OLS) regression or linear discriminant function analysis. Both techniques were subsequently found to be less than ideal for handling dichotomous outcomes because of their strict statistical assumptions, i.e. linearity, normality, and continuity for OLS regression, and multivariate normality with equal variances and covariances for discriminant analysis. Logistic regression was proposed as an alternative in the late 1960s and early 1970s [1], and it became routinely available in statistical packages in the early 1980s. Since that time, the use of logistic regression has increased in all science disciplines. The wide availability of statistical software and of trained statisticians has further increased its use.

In developing and underdeveloped countries, however, its use remains low. There are several reasons for this, one being limited knowledge of what to expect in an article that uses logistic regression, how to understand the methodology, and how to report the results. These gaps limit the applicability of the research data. Some very good clinical studies from developing countries may not use logistic regression and therefore may not explore their data in line with good research methodology. This probably lowers the quality of reporting, and the resulting article might not be accepted by widely read indexed journals.

In binary logistic regression, the dependent variable is dichotomous. The logistic regression equation involves two kinds of variables: the dependent variable (disease), denoted Y, and the independent variable (risk factor), denoted X. A dichotomous variable is a special case of a categorical variable with only two possible outcomes. Examples of dichotomous variables in medicine include the following. In cohort studies and clinical trials, Y = cure / no cure, X = therapy and other patient variables.
In case-control studies, Y = case / control (cancer / non-cancer), X = risk factors [age, sex, smoking, occupation]. In cohort studies, Y = MI / no MI, X = risk factors [age, sex, family history, etc.].

When we are looking for a dependence structure between a dependent variable and a set of one or more explanatory variables, we can use logistic regression. Multiple linear regression may be used when the dependent variable is continuous (interval scale), such as height, weight, creatinine, uric acid, or lipid profile levels. However, socio-demographic and economic variables are very often categorical rather than interval scale, and in many cases research focuses on models where the dependent variable is categorical. For example, the dependent variable might be diseased or not diseased, or clotting time ≤6 minutes coded as 0 and clotting time >6 minutes coded as 1 (as we saw in Exercise 1), and we could be interested in how this variable is related to gender, country, blood group, etc. In this case we could not carry out a multiple linear regression, as many of the assumptions of that technique would not be met. Instead we would carry out a logistic regression.
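The coding step and the model described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular statistical package: the data are hypothetical and invented purely for the example, the helper names `dichotomize` and `fit_logistic` are our own, and the model is restricted to a single binary predictor (gender) fitted by Newton-Raphson using only the standard library.

```python
import math

def dichotomize(clotting_minutes, cutoff=6.0):
    """Code clotting time <= cutoff as 0 and > cutoff as 1."""
    return [1 if t > cutoff else 0 for t in clotting_minutes]

def fit_logistic(x, y, iterations=25):
    """Fit logit P(Y=1) = b0 + b1*x by Newton-Raphson (one predictor)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector (g) and information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system h * delta = g explicitly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# HYPOTHETICAL data, invented for illustration only (not from any study).
# gender: 1 = male, 0 = female; clotting time in minutes.
gender = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
clotting_minutes = [7.5, 8.0, 6.5, 5.0, 7.0, 4.5,
                    5.5, 4.0, 6.0, 7.2, 5.8, 3.9]

y = dichotomize(clotting_minutes)      # dependent variable Y (0/1)
b0, b1 = fit_logistic(gender, y)       # X = gender
odds_ratio = math.exp(b1)              # exp(slope) is the odds ratio

print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
print(f"odds ratio for gender = {odds_ratio:.2f}")
```

With these invented numbers, 4 of 6 males and 1 of 6 females exceed the 6-minute cutoff, so the cross-tabulated odds ratio is (4/2)/(1/5) = 10, and exp(b1) from the fit should agree with it. In practice one would normally rely on an established statistical package, which also reports standard errors and confidence intervals; the hand-written fit here is only meant to show how the 0/1 coding feeds into the model.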