Assessing methods--descriptive statistics and graphs
S M Gore
1981
BMJ (Clinical Research Edition)
In his 1981 presidential address to the Royal Statistical Society, Professor D R Cox said: "The setting out of conclusions in a way that is vivid, simple, accurate, and integrated with subject-matter considerations is a very important part of statistical analysis." This series is about how to set out conclusions in medical papers, and it answers questions about when particular statistical methods are appropriate and about the hidden snags in using them. Types of problems that ought to be referred to a statistician are also discussed. This article reviews descriptive statistics (mean, median, and mode; variance and interquartile range), explaining how a little detective work using these summary measures reveals a great deal about the distribution of observations even if a fully informative table or graph is not presented. It is usually helpful to use graphs to present data, and some reminders about how to do this are given; in particular, scattergrams are recommended.

(1) From the table, find the most likely (modal) survival time and estimate median survival for the 347 patients with breast cancer who were referred to the department of radiotherapy, Edinburgh, in 1956.
- The most likely survival time is less than one year.
- 50% of patients survived for at least four years.
- The difference (mean - median) is a crude measure of skewness.
- Measures of dispersion (variance, interquartile range) are needed to qualify central measures such as mean and median.

COMMENT
Authors are advised to summarise important aspects of their data (location, dispersion, skewness) by reporting descriptive statistics such as mean, median, mode, variance, range, and percentiles, but too often the summary is presented at the expense of informative tables or graphs. The reader is left to infer from the summary the shape of the underlying distribution. Some guidelines are given for doing this.

Measures of location (or centre) are mean, median, and mode. Mean and median coincide for symmetric distributions. The sum of the observations divided by the number of observations estimates the true mean, and the value above which 50% of the observations lie estimates the median of the distribution. When the estimated mean and median are dissimilar and the sample size (which should always be reported) is moderate, the reader can deduce that the underlying distribution is asymmetric or skewed. From the table, notice that 174 (50%) of the 347 patients survived for at least four years after diagnosis of breast cancer.
Mean survival exceeded seven years. The difference (mean- ...
Table: Survival of 347 patients with breast cancer
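The detective work described in this article can be tried on a small data set. The Python sketch below computes the measures of location and dispersion named in the text and uses the difference (mean - median) as a crude indicator of skewness. The survival times are invented for illustration and are not the Edinburgh data; the function name `summarise` is likewise hypothetical, and the quartiles use the default method of the standard library, which is only one of several conventions.

```python
import statistics


def summarise(values):
    """Return descriptive statistics for a list of numeric observations."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default "exclusive" method
    mean = statistics.mean(values)
    median = statistics.median(values)
    return {
        "n": len(values),                           # sample size should always be reported
        "mean": mean,
        "median": median,
        "mode": statistics.mode(values),            # most frequent value
        "variance": statistics.variance(values),    # sample variance (divisor n - 1)
        "iqr": q3 - q1,                             # interquartile range
        "mean_minus_median": mean - median,         # crude measure of skewness
    }


# Hypothetical survival times in whole years (assumed data, NOT from the article's table).
survival_years = [0, 0, 0, 0, 1, 1, 2, 3, 4, 5, 7, 9, 12, 16, 30]

summary = summarise(survival_years)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this invented sample the mode is below the median and the median is below the mean, so the difference (mean - median) is positive. That is the pattern expected of a distribution with a long right tail, and it matches the ordering the article reports for the breast cancer survival times.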
doi:10.1136/bmj.283.6289.486
fatcat:exnwsyq5kvbt5knckhordysypq