Managing data retention policies at scale

Jun Li, Sharad Singhal, Ram Swaminathan, Alan H. Karp
2011 12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011) and Workshops  
large-scale policy management; compliance and regulatory; data retention; encryption key store; cloud service Compliance with regulatory policies on data remains a key hurdle to cloud computing. Policies such as EU privacy, HIPAA, and PCI-DSS place requirements on data availability, integrity, migration, retention, and access, among many others. This paper proposes a policy management service that offers scalable management of data retention policies attached to data objects stored in a cloud
more » ... vironment. The management service includes a highly available and secure encryption key store to manage the encryption keys of data objects. By deleting the encryption key at a specified retention time associated with the data object, we effectively delete the data object and its copies stored in online and offline environments. To achieve scalability, our service uses Hadoop MapReduce to perform parallel management tasks, such as data encryption and decryption, key distribution and retention policy enforcement. A prototype deployed in a 16-machine Linux cluster currently supports 56 MB/sec for encryption, 76 MB/sec for decryption, 31, 000 retention policies/sec read and 15,000 retention policies/sec write. Abstract-Compliance with regulatory policies on data remains a key hurdle to cloud computing. Policies such as EU privacy, HIPAA, and PCI-DSS place requirements on data availability, integrity, migration, retention, and access, among many others. This paper proposes a policy management service that offers scalable management of data retention policies attached to data objects stored in a cloud environment. The management service includes a highly available and secure encryption key store to manage the encryption keys of data objects. By deleting the encryption key at a specified retention time associated with the data object, we effectively delete the data object and its copies stored in online and offline environments. To achieve scalability, our service uses Hadoop MapReduce to perform parallel management tasks, such as data encryption and decryption, key distribution and retention policy enforcement. A prototype deployed in a 16-machine Linux cluster currently supports 56 MB/sec for encryption, 76 MB/sec for decryption, 31,000 retention policies/sec read and 15,000 retention policies/sec write.
doi:10.1109/inm.2011.5990674 dblp:conf/im/LiSSK11 fatcat:w65v5lzp7jec5oebgvp5grrjd4