A Balls-and-Bins Model of Trade
Roc Armenter, Miklós Koren
The American Economic Review
A number of stylized facts have been documented about the extensive margin of trade: which firms export, and how many products they send to how many destinations. We argue that the sparse nature of trade data is crucial to understanding these stylized facts. Trade data are collected through customs forms, one for each export shipment, specifying the country of destination and the product code. Typically the number of observations (that is, total shipments) is low relative to the number of possible classifications (e.g., countries and product codes). Given the sparse data, we note that some of the reported facts would be expected to arise even if export shipments were randomly allocated across classifications. These facts are thus not informative of the underlying economic decisions. We propose a statistical model to account for the sparsity of trade data. We formalize the assignment of shipments to categories as balls falling into bins. The balls-and-bins model quantitatively reproduces the prevalence of zero product-level trade flows across export destinations. The model also accounts for firm-level facts: as in the data, most firms export a single product to a single country, but these firms represent a tiny fraction of total exports. In contrast, the balls-and-bins model cannot reproduce the small fraction of exporters among U.S. firms, and overpredicts their size premium relative to non-exporters. We argue that the balls-and-bins model is a useful statistical tool to discern the interesting facts in disaggregated trade data from patterns arising mechanically through chance.
* For useful comments we thank Arnaud Costinot,
... have become available and have had an enormous impact on the field. This has spurred a fast-growing body of research that documents the extensive margin in trade: which firms export, and how many products they send to how many destinations. This, in turn, has led to new theories in international trade. A number of stylized facts have been documented about the extensive margin of trade: (1) most product-level trade flows across countries are zero; (2) the incidence of non-zero trade flows follows a gravity equation; (3) only a small fraction of firms export; (4) exporters are larger than non-exporters; (5) most firms export a single product to a single country; (6) most exports are done by multi-product, multi-destination exporters.¹ These facts have proven to be very robust across datasets from various years in various countries.
We argue that the sparse nature of trade data is crucial to understanding these stylized facts. Trade data are collected through customs forms, one for each export shipment, specifying the country of destination and the product code. Typically the number of observations (that is, total shipments) is low relative to the number of possible classifications (country and product code pairs). For example, there were about 24 million shipments originating in the U.S. in 2000. However, there are 229 countries and 8,867 product codes with active trade, so a shipment can have more than 2 million possible classifications. We should then not be surprised to observe empty categories, or to learn that the U.S. does not export all products to all countries.
Given the sparsity of the data, how do we interpret a missing trade flow? Take the example of vessels for passenger and freight transport. Switzerland did not import a vessel from the United States in 2005. Being a landlocked country, it probably never will. At the same time, 130 of the 188 coastal countries did not import a vessel either: they have a positive demand for American vessels yet do not buy one every year.
In this paper we propose a statistical model to account for the sparsity of trade data. We formalize the assignment of shipments to categories as balls falling into bins. Each shipment constitutes a discrete unit (the ball), which, in turn, is allocated into mutually exclusive categories (the bins). This structure is inherent to disaggregated trade data: we observe a given number of shipments and each of them is classified into a unique category. Because we want an atheoretical account of the sparsity of the data, the model assigns balls to bins at random. That is, a ball falling in a particular bin is an independent and identically distributed random event whose probability distribution is determined solely by the distribution of bin sizes. In spite of its simplicity, the balls-and-bins model has a rich set of predictions.
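To make the mechanics concrete, the following is a minimal simulation sketch, not the authors' implementation. The function names, the toy counts (24,000 balls and 2,000 bins, a scaled-down echo of the roughly 12 shipments per classification implied by the figures above), and the Zipf-like bin sizes are all illustrative assumptions rather than values or calibrations taken from the paper.

```python
import random
from collections import Counter


def simulate_empty_share(n_balls, bin_sizes, seed=0):
    """Throw n_balls independently into bins; return the share of empty bins.

    Each ball lands in bin i with probability proportional to bin_sizes[i].
    """
    rng = random.Random(seed)
    bins = range(len(bin_sizes))
    # random.choices draws with replacement, i.e. i.i.d. assignment.
    hits = Counter(rng.choices(bins, weights=bin_sizes, k=n_balls))
    return 1 - len(hits) / len(bin_sizes)


def expected_empty_share(n_balls, bin_sizes):
    """Closed-form expected share of empty bins under i.i.d. assignment."""
    total = sum(bin_sizes)
    # Bin i stays empty only if all n_balls miss it: (1 - s_i) ** n_balls.
    return sum((1 - s / total) ** n_balls for s in bin_sizes) / len(bin_sizes)


# Hypothetical, skewed bin sizes (Zipf-like); purely illustrative.
toy_sizes = [1.0 / (i + 1) for i in range(2000)]
print("simulated share of empty bins:", simulate_empty_share(24000, toy_sizes))
print("expected share of empty bins: ", expected_empty_share(24000, toy_sizes))
```

Because assignment is random, bins with strictly positive size can still end up empty, which is the sense in which a zero in sparse data need not reflect an economic decision. Comparing the simulated share with the closed-form expectation is a quick sanity check on the sketch.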
After a number of balls have been thrown, some bins may end up empty and some will not. Among the latter, some will contain a large number of balls, some few. These are taken to be the model's predictions for the extensive and intensive margins, respectively. We can derive the relevant moments analytically. Given a number of balls and a bin size distribution, we show how to compute the
¹ The following is a necessarily incomplete list of references. Helpman, and Baldwin and Harrigan (2007) for facts 1 and 2; Haveman and Hummels (2004) and Hummels and Klenow (2001, 2005) for fact 1; Bernard and Jensen (1999) and for facts 3 and 4; Bernard, Jensen and Schott (2007) for facts 3 to 6; Bernard, Jensen, Redding and Schott (2007) for facts 2 to 6; and Eaton, Kramarz (2004, 2007) for facts 5 and 6. See the main text and the Appendix for further discussion.
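As an illustration of the kind of moment that can be derived in closed form, consider the standard occupancy calculation implied by the i.i.d. assignment described earlier. The notation is introduced here for illustration and is not taken from the source: $n$ is the number of balls, $K$ the number of bins, and $s_i$ the probability that a given ball lands in bin $i$, with $\sum_{i=1}^{K} s_i = 1$.

```latex
% Bin i is empty only if each of the n independent balls misses it.
\Pr(\text{bin } i \text{ is empty}) = (1 - s_i)^{n}

% By linearity of expectation, the expected number of nonempty bins is
\mathbb{E}\left[\#\{\text{nonempty bins}\}\right]
  = \sum_{i=1}^{K} \left[ 1 - (1 - s_i)^{n} \right]
```

The first expression speaks to the extensive margin: small bins (low $s_i$) are likely to remain empty unless $n$ is large, even though every bin has positive probability of receiving a ball.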