Population Synthesis Handling Three Geographical Resolutions
ISPRS International Journal of Geo-Information
In this paper, we develop a synthetic population as the first step in implementing an integrated land use/transport model. The model is agent-based, where every household, person, dwelling, and job is treated as an individual object. Therefore, detailed socioeconomic and demographic attributes are required to support the model. The Iterative Proportional Updating (IPU) procedure is selected for the optimization phase. The original IPU algorithm has been improved to handle three geographical
resolutions simultaneously with very little computational time. For the allocation phase, we use Monte Carlo sampling. We applied our approach to the greater Munich metropolitan area. Based on the data available in the control totals and microdata, we selected 47 attributes at the municipality level, 13 attributes at the county level, and 14 additional attributes at the borough level for the city of Munich. Attributes are aggregated at the household, dwelling, and person levels. The algorithm synthesizes 4.5 million persons in 2.1 million households in less than 1.5 h. We also present guidance on how to handle multiple geographical resolutions and how to balance the number and order of attributes to avoid overfitting.

... computer science or RAS method in input-output analysis. The main disadvantage of the procedure is that it can handle only one level of aggregation (person or household) and one geographical resolution (municipality or county) at a time. Some authors enhanced the procedure by substituting the n-dimensional array with sparse lists to accommodate a large number of control attributes without exponentially increasing computational requirements; using two-step IPF to accommodate person-level and household-level attributes in sequence; incorporating more heterogeneity into the initial seed [17, 19]; and combining IPF with spatial microsimulation or reweighting IPF results using Iterative Proportional Updating (IPU). Recent work has evolved IPF into IPU, which iteratively calculates a weight for each microdata record. IPU is capable of closely matching household-level, dwelling-level, and person-level control totals at the same time, and it can accommodate control attributes defined at the municipality and county levels simultaneously. As with IPF, IPU belongs to the static family of models.
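To make the reweighting idea concrete, the following is a minimal sketch of a basic IPU-style loop for a single geographical unit. It is not the multiresolution extension developed in this paper. The function names (`ipu_weights`, `relative_error`), the four microdata records, and the control totals are hypothetical and chosen only for illustration. Each record is a household; the first two columns indicate its household type, and the last two count its members in two person groups, so that household-level and person-level controls are adjusted in the same pass.

```python
def relative_error(weights, incidence, targets):
    """Mean absolute relative deviation between weighted totals and control totals."""
    deviations = []
    for j, target in enumerate(targets):
        total = sum(w * row[j] for w, row in zip(weights, incidence))
        deviations.append(abs(total - target) / target)
    return sum(deviations) / len(deviations)


def ipu_weights(incidence, targets, max_iter=500, tol=1e-6):
    """Adjust one weight per microdata record, one control attribute at a time.

    incidence[i][j] is the contribution of record i to control j
    (0/1 for household attributes, a person count for person attributes).
    """
    weights = [1.0] * len(incidence)
    for _ in range(max_iter):
        for j, target in enumerate(targets):
            total = sum(w * row[j] for w, row in zip(weights, incidence))
            if total > 0:
                factor = target / total
                # Only records that contribute to control j are rescaled.
                weights = [
                    w * factor if row[j] > 0 else w
                    for w, row in zip(weights, incidence)
                ]
        if relative_error(weights, incidence, targets) < tol:
            break
    return weights


# Hypothetical microdata: columns are
# [household type A, household type B, persons in group 1, persons in group 2]
incidence = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 2],
    [0, 1, 1, 1],
]
targets = [30, 20, 45, 35]  # hypothetical control totals

weights = ipu_weights(incidence, targets)
print("weights:", [round(w, 2) for w in weights])
print("mean relative error:", relative_error(weights, incidence, targets))
```

Because each adjustment rescales only the records that contribute to the control attribute being fitted, fitting one control can disturb controls fitted earlier, which is why the loop is repeated until the mean relative error stops improving or falls below the tolerance.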
Other procedures that can handle person-level and household-level attributes are entropy maximization [16, 22, 23], hierarchical IPF [24, 25], combinatorial optimization [26, 27], Markov Chain Monte Carlo [27, 28], Hidden Markov Models, or multinomial regression models [30, 31]. Most of these procedures are compared to IPF and usually match observed distributions in multiple dimensions better, although the convergence time can be very high. In terms of control attributes, all studies for transportation engineering include at least household size, age, and gender, as summarized in Table A2. Employment status has been included at the household level [4, 5, 11, 21, 27] or the person level [8, 13, 26, ...]. While household income is available for most of the studies in the United States and Canada [4, 7, ..., 26], it is commonly not included in studies from European countries and Australia [8, 13, 23, 27, 32]. Other variables are the number of cars, number of children, type of dwelling, or ethnicity. Dwelling attributes are less common, with only a few studies including dwelling tenure [4, 9, 34] or dwelling type [4, 11, 32].

The aim of this work is to synthesize the population of the greater Munich metropolitan area. This paper does not enter the methodological debate by comparing the performance of alternative procedures; rather, it reviews the available procedures and selects one suited to the needs of the case study. The available data are limited in three respects, which prompted us to develop a new multiresolution solution. Firstly, person and household attributes are aggregated at the municipality level, but most dwelling attributes are aggregated at the county level. Secondly, the German administrative division classifies the city of Munich as a single municipality-county of 1.3 million inhabitants in 0.7 million households. A higher resolution is required to synthesize demographic and dwelling differences across boroughs.
Thirdly, the data do not cover all attribute dimensions of households, individuals, and dwellings that the model requires. Specifically, data on individual income, car availability, land price, and number of bedrooms are missing. The first and second constraints call for a single optimization procedure that controls attributes at the household, dwelling, and person levels simultaneously and handles different geographical resolutions in a reasonable amount of time. The third constraint is not fundamental; it merely leaves a few uncontrolled attributes that are copied directly from the microdata.
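The way uncontrolled attributes can be carried over from the microdata may be illustrated with a short sketch of a weighted Monte Carlo draw. It assumes that each microdata record already has a weight from the optimization phase and that synthetic households are drawn with replacement in proportion to those weights. The function `draw_households`, the record fields (`record`, `size`, `cars`), and the weights are hypothetical and are not taken from the case study data.

```python
import random


def draw_households(microdata, weights, n_households, seed=42):
    """Draw microdata records with replacement, proportionally to their weights.

    Each synthetic household is a copy of the drawn record, so attributes
    that were never used as controls (here: "cars") are carried along unchanged.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    drawn = rng.choices(microdata, weights=weights, k=n_households)
    return [dict(record, household_id=i) for i, record in enumerate(drawn)]


# Hypothetical microdata: "size" stands for a controlled attribute,
# "cars" for an uncontrolled one.
microdata = [
    {"record": "a", "size": 1, "cars": 0},
    {"record": "b", "size": 2, "cars": 1},
    {"record": "c", "size": 4, "cars": 2},
]
weights = [12.5, 0.0, 7.5]  # hypothetical weights from the optimization phase

synthetic = draw_households(microdata, weights, n_households=20)
for household in synthetic[:3]:
    print(household)
```

In this sketch, a record with zero weight is never drawn, and the distribution of an uncontrolled attribute in the synthetic population simply follows from which records are selected rather than from any control total.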