Local differential privacy for unbalanced multivariate nominal attributes
Human-Centric Computing and Information Sciences
Introduction At present, with the development of various integrated sensors and crowdsensing systems, crowdsourced information from all aspects can be collected and analyzed among various data attributes to better produce rich knowledge about a group, thus benefiting everyone in the crowdsourcing system  . Particularly, with high-dimensional heterogeneous data (data with unbalanced multivariate nominal attributes), there are many hidden rules and much hidden information behind the data that
... an be mined to provide better services for individuals or groups. For example, in the process of providing cloud services, user gender, age, and habits when using operating systems and browsers should be deeply explored to provide different special services to different user groups. In hospital staff, people's historical medical records Abstract Data with unbalanced multivariate nominal attributes collected from a large number of users provide a wealth of knowledge for our society. However, it also poses an unprecedented privacy threat to participants. Local differential privacy, a variant of differential privacy, is proposed to eliminate the privacy concern by aggregating only randomized values from each user, with the provision of plausible deniability. However, traditional local differential privacy algorithms usually assign the same privacy budget to attributes with different dimensions, leading to large data utility loss and high communication costs. To obtain highly accurate results while satisfying local differential privacy, the aggregator needs a reasonable privacy budget allocation scheme. In this paper, the Lagrange multiplier (LM) algorithm was used to transform the privacy budget allocation problem into a problem of calculating the minimum value from unconditionally constrained convex functions. The solution to the nonlinear equation obtained by the Cardano formula (CF) and Newton-Raphson (NS) methods was used as the optimal privacy budget allocation scheme. Then, we improved two popular local differential privacy mechanisms by taking advantage of the proposed privacy budget allocation techniques. Extension simulations on two different data sets with multivariate nominal attributes demonstrated that the scheme proposed in this paper can significantly reduce the estimation error under the premise of satisfying local differential privacy.