The Epidemic Behavior of the Hepatitis C Virus

O. G. Pybus
2001 Science  
Hepatitis C virus (HCV) is a leading worldwide cause of liver disease. Here, we use a new model of HCV spread to investigate the epidemic behavior of the virus and to estimate its basic reproductive number from gene sequence data. We find significant differences in epidemic behavior among HCV subtypes and suggest that these differences are largely the result of subtype-specific transmission patterns. Our model builds a bridge between the disciplines of population genetics and mathematical epidemiology by using pathogen gene sequences to infer the population dynamic history of an infectious disease.
An estimated 170 million people worldwide are at risk of liver cirrhosis and liver cancer due to chronic infection with HCV (1). The virus is responsible for 10,000 deaths per year in the United States, and this rate is expected to increase substantially in the next two decades (2). HCV is a rapidly evolving single-stranded positive-sense RNA virus that exhibits enormous genetic diversity. It is classified into six types (labeled 1 through 6) and numerous subtypes (labeled 1a, 1b, etc.), which differ in diversity, geographical distribution, and transmission route (3). Subtypes appear to differ in treatment response, although their role in variation of disease progression is unclear (2, 4). Any successful HCV vaccination or control strategy, therefore, requires an understanding of the nature and variability of epidemic behavior among subtypes.
HCV was first isolated in 1989, and knowledge of its long-term epidemiology before that date is limited. Highly divergent strains have been found in restricted geographic areas such as West Africa and Southeast Asia, suggesting a long period of infection in these regions. In contrast, several globally prevalent subtypes are much less divergent, indicating a recent worldwide spread of these strains (5-7).
We investigate HCV epidemiology using coalescent theory, a population genetic model that describes how the demographic history of a population determines the ancestral relationships of individuals sampled from it (8, 9). Phylogenies reconstructed from contemporary HCV gene sequences contain information about past population dynamics and can, therefore, be used to infer viral epidemic behavior (10). We also demonstrate one way in which the fundamental epidemiological quantity $R_0$ (the basic reproductive number of a pathogen) can be estimated from gene sequences.
$R_0$ represents the average number of secondary infections generated by one primary case in a susceptible population and can be used to estimate the level of immunization or behavioral change required to control an epidemic (11).
The framework of coalescent theory allows us to estimate N(t), a continuous function that represents the effective number of infections at time t. Time t is zero at the present and increases into the past; hence, N(0) is the effective number of infections at the present. N(t) can be considered as the inbreeding effective population size of the viral epidemic (12). Previous viral coalescent studies have used simple models for N(t), specifically, constant population size and exponential growth (13, 14). A more appropriate approach, which we use here, is to develop a basic epidemiological model, from which a suitable form for N(t) is obtained. Because there is little protection against HCV reinfection (15) and vertical transmission is rare, its epidemic spread can be represented by
\[ \frac{dy}{dt} = By(1 - y) - \frac{y}{D} \qquad (1) \]
where y is the proportion of the at-risk population that is infected and D is the average duration of infectiousness. B is a combination of parameters relating the force of infection (the per capita rate of acquisition of infection) to the prevalence of infection. In this model, $R_0 = BD$ and equilibrium prevalence is $1 - 1/R_0$. A time-reversed version of Eq. 1 was solved for y and then transformed into effective population size using the relation $N(t) = N(0)\,[y(t)/y(0)]$. The resulting demographic model is
\[ N(t) = N(0)\,\frac{1 + c}{1 + c\,e^{rt}} \qquad (2) \]
where $r = B - 1/D$ is the growth rate achieved in a wholly susceptible population, c is a logistic shape parameter that depends on the constant of integration k. Note that B, D, and k cannot be separated. Given a molecular phylogeny reconstructed from contemporary viral gene sequences (16), it is possible to estimate N(0), r, and c within a maximum likelihood (ML) framework (17).
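The epidemic model and the demographic function described above can be illustrated numerically. The following Python code is a minimal sketch, not the authors' implementation. It assumes the epidemic model takes the standard form dy/dt = By(1 - y) - y/D, which is consistent with the stated relations R0 = BD and equilibrium prevalence 1 - 1/R0, and that the demographic model is the logistic function N(t) = N(0)(1 + c)/(1 + c e^{rt}) with t measured backwards from the present. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def prevalence_rate(y, B, D):
    """Assumed model form: dy/dt = B*y*(1 - y) - y/D."""
    return B * y * (1.0 - y) - y / D


def basic_reproductive_number(B, D):
    """R0 = B*D, as stated in the text."""
    return B * D


def equilibrium_prevalence(B, D):
    """Equilibrium prevalence 1 - 1/R0 (meaningful when R0 > 1)."""
    return 1.0 - 1.0 / basic_reproductive_number(B, D)


def simulate_prevalence(y0, B, D, t_end, dt=0.01):
    """Integrate the assumed model forward in time with simple Euler steps."""
    y = y0
    for _ in range(int(round(t_end / dt))):
        y += dt * prevalence_rate(y, B, D)
    return y


def logistic_effective_size(t, n0, r, c):
    """Assumed demographic model: N(t) = N(0)*(1 + c)/(1 + c*exp(r*t)).

    t is zero at the present and increases into the past, so N(t)
    shrinks towards zero as t grows when r > 0.
    """
    return n0 * (1.0 + c) / (1.0 + c * math.exp(r * t))


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    B, D = 0.5, 10.0
    print("R0:", basic_reproductive_number(B, D))
    print("equilibrium prevalence:", equilibrium_prevalence(B, D))
    print("simulated prevalence:", simulate_prevalence(0.001, B, D, t_end=200.0))
    for t in (0.0, 5.0, 10.0, 20.0):
        print("N at t =", t, ":", logistic_effective_size(t, 1000.0, B - 1.0 / D, 0.05))
```

Under these assumptions the simulated prevalence settles at the equilibrium value implied by R0, and the logistic function returns N(0) at t = 0 and decays into the past at a rate governed by r. In this form only N(0), r, and c enter the demographic function, which matches the text's remark that B, D, and k cannot be estimated separately.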
Because reconstructed phylogenies represent time in units of nucleotide substitutions per site, some parameters are estimated as functions of the substitution rate (18). These parameters can be transformed back into their natural units using the substitution rate of the viral gene concerned. We estimated HCV substitution rates by reanalyzing gene sequences sampled in 1995 from individuals who were infected by a single batch of antibody to rhesus D 17 years earlier (19-21). The above methods were used to investi-
doi:10.1126/science.1058321 pmid:11423661 fatcat:x3weric2hzfttpwq3dw4wuizb4