Compositional space constraints distort C mineralization models of organic residues
EXECUTIVE SUMMARY
Carbon mineralization of organic residues in soils is a complex process in which labile C is mineralized to end products while stable C accumulates in the soil. The CO2 evolved during C mineralization in soils contributes to global warming, whereas microbial byproducts and stable C contribute to soil aggregation and C sequestration. The capacity of a soil to sequester C physically, chemically and biochemically depends not only on soil minerals but also on the biochemical composition of the added organic residues. In C models, the biochemical and ash fractions of organic residues are often related to the mineralization of labile C using regression analysis. Using raw proportions in such models rests on three misconceptions that lead to numerical biases. First, because biochemical fractions are constrained to a closed compositional space (between 0 and 100%), confidence intervals about their means may extend below 0% or above 100% and are thus conceptually meaningless. Second, one fraction is redundant due to closure: in a composition made of D parts, one component can be deduced as the difference between 100% and the sum of the other components, leaving D-1 degrees of freedom. Third, C forms are often reported on different scales of measurement (fresh, dry or organic matter basis). As a result of redundancy and of the various scales of measurement, raw concentration data are plagued by spurious correlations. Although spurious correlations were first identified by Karl Pearson in 1897, no solution to this problem was proposed until the recent development of compositional data analysis. Spurious correlations may distort linear models such as multiple regression and principal component analysis. Our objective was to show the nature of spurious correlations among raw concentration data of organic residues and to elaborate a compositional model that avoids numerical biases. Isometric log ratio (ilr) coordinates were computed as balances between two sub-compositions defined by a sequential binary partition, arranged to reflect the system under study, i.e. CO2 = f(initial biochemical composition). Using a published dataset, we balanced C and N forms against ash, as well as more labile against more recalcitrant C and N forms. There were D-1 ilr coordinates, i.e. mutually orthogonal balances, consistent with the D-1 degrees of freedom left by closure. 
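The balance computation can be sketched as follows, assuming NumPy. The four part names and the three-step partition below are hypothetical, chosen only to mirror the kinds of contrasts described (organic forms against ash, C against N, more labile against more recalcitrant C); they are not the partition used with the published dataset. Each row of the sign matrix defines one balance, b = sqrt(rs/(r+s)) ln[g(x+)/g(x-)], where r and s count the parts in the numerator and denominator groups and g is the geometric mean.

```python
import numpy as np

# Hypothetical 4-part composition (dry-mass basis). Names and partition are
# illustrative only, NOT the partition used in the study.
PARTS = ["C_lab", "C_rec", "N", "ash"]

# Sequential binary partition (SBP): one row per balance,
# +1 = numerator group, -1 = denominator group, 0 = not involved.
SBP = np.array([
    [+1, +1, +1, -1],   # organic C and N forms vs ash
    [+1, +1, -1,  0],   # C forms vs N (a C/N balance)
    [+1, -1,  0,  0],   # more labile vs more recalcitrant C
])


def closure(x, total=1.0):
    """Rescale each row so that its parts sum to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=-1, keepdims=True)


def balances(x, sbp):
    """Return ilr balances (n x D-1) for compositions x (n x D) given an SBP.

    b = sqrt(r*s/(r+s)) * ln(g(x_plus) / g(x_minus)), where g is the
    geometric mean of the parts in each group.
    """
    logx = np.log(np.atleast_2d(x))
    out = np.empty((logx.shape[0], sbp.shape[0]))
    for k, row in enumerate(sbp):
        plus, minus = row > 0, row < 0
        r, s = plus.sum(), minus.sum()
        # Mean of logs = log of the geometric mean of each group.
        log_ratio = logx[:, plus].mean(axis=1) - logx[:, minus].mean(axis=1)
        out[:, k] = np.sqrt(r * s / (r + s)) * log_ratio
    return out


x = closure([[30.0, 15.0, 1.5, 8.0]])   # one hypothetical residue
b = balances(x, SBP)
print(b.shape)   # (1, 3): D - 1 coordinates for D = 4 parts
```

Because every balance is a log ratio, multiplying a whole composition by a constant (for example, switching from a proportion to percent) leaves the balances unchanged, and a D-part composition always yields D-1 of them.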
We showed that there were spurious correlations among components expressed on either a dry or an organic matter basis, because correlation coefficients differed in magnitude, sign and significance depending on the scale of measurement. In contrast, the ilr coordinates were free from spurious correlations by definition because they are orthogonal to each other. The ilr coordinates are also free to range over the whole real space (±∞), because ratios can be very high or very low depending on the numerator and denominator. The R² values of equations relating the labile C pool to raw proportions on a dry or organic mass basis, or to balances, were similarly high (0.95). However, the problem is one of interpretation. The dry and organic mass scales retained different independent variables, except for one shared variable. That variable was assigned a negative coefficient on the dry mass scale and a positive one on the organic mass scale, whereas the HEM fraction is part of the labile C pool. One must conclude that regression models on dry or organic mass scales are accurate but incoherent, even contradictory. On the other hand, using balances, the labile C pool was shown to increase with decreasing C/N ratio and with increasing ratios of more labile to more recalcitrant C forms, as expected from theory. Unlike raw concentration data, balances allow the relationships between the labile C pool and the composition of organic products to be interpreted consistently, once the intrinsic compositional nature of these data is recognized. Multivariate analyses should preferably be conducted using scale-invariant isometric log ratio transformations rather than raw proportions. Balances can be designed from sound theory on how C mineralization depends on the chemical and biochemical composition of organic products. However, the balances proposed here could be rearranged, or changed through amalgamation of parts, depending on the nature of the data and the hypotheses tested, as long as they remain orthogonal to each other. 
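The effect of the scale of measurement can be illustrated with a small simulation, assuming NumPy and purely hypothetical data (not the published dataset). Two organic fractions are drawn independently on a dry-mass basis; converting them to an organic-matter basis divides both by the same per-sample factor (1 - ash), which is the common-divisor situation behind Pearson's spurious correlation. The correlation between the raw fractions changes with the basis, whereas their balance does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical dry-mass fractions: two organic components drawn independently
# of each other, plus an ash fraction that varies widely between residues.
x_dry = rng.uniform(0.05, 0.15, n)
y_dry = rng.uniform(0.05, 0.15, n)
ash = rng.uniform(0.05, 0.60, n)

# Re-expressing on an organic-matter basis divides every organic fraction
# by the same per-sample factor (1 - ash).
x_om = x_dry / (1.0 - ash)
y_om = y_dry / (1.0 - ash)

# Pearson correlations of the raw fractions on each basis.
r_dry = np.corrcoef(x_dry, y_dry)[0, 1]
r_om = np.corrcoef(x_om, y_om)[0, 1]

# Balance between the two components (one part per group: r = s = 1).
b_dry = np.sqrt(0.5) * np.log(x_dry / y_dry)
b_om = np.sqrt(0.5) * np.log(x_om / y_om)

print(f"r on dry basis: {r_dry:+.2f}")   # near 0: drawn independently
print(f"r on OM basis:  {r_om:+.2f}")    # positive: induced by the shared divisor
print("balance identical on both bases:", np.allclose(b_dry, b_om))
```

The shared divisor creates a correlation that exists only on the organic-matter basis, while the log ratio cancels that divisor exactly, which is why balances are scale invariant.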
This paper warns carbon data analysts that the specific numerical properties of compositional data require a log ratio transformation before univariate or multivariate analyses are conducted. The ilr approach could provide an unbiased carbon index of the contribution of organic products to greenhouse gas emissions and to C sequestration in soils.