A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit <a rel="external noopener" href="http://www.cs.cmu.edu:80/~jbhatia/papers/jbhatia_TOSEM2016.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="Association for Computing Machinery (ACM)">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/7dwkv5m7lfbbpi6he2w2suk6b4" style="color: black;">ACM Transactions on Software Engineering and Methodology</a>
Privacy policies describe high-level goals for corporate data practices, and regulators require industries to make conspicuous, accurate privacy policies available to their customers. Consequently, software requirements must conform to those privacy policies. To help stakeholders extract privacy goals from policies, we introduce a semi-automated framework that combines crowdworker annotations, natural language typed dependency parses, and a reusable lexicon to improve goal-extraction coverage, precision, and recall. The framework evaluation consists of a five-policy corpus governing web and mobile information systems, yielding an average precision of 0.73 and recall of 0.83. The results show that no single framework element alone is sufficient to extract goals; however, the overall framework compensates for elemental limitations: human annotators are highly adaptive at discovering annotations in new texts, but those annotations can be inconsistent and incomplete; dependency parsers lack sophisticated, tacit knowledge, but they can perform exhaustive text searches for prospective requirements indicators; and while the lexicon may never completely saturate, its terms can be reliably reused to improve recall. Lexical reuse reduces false negatives by 41%, increasing the average recall to 0.85. Lastly, crowd workers identified and removed around 80% of false positives, which improves the average precision to 0.93.<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2907942">doi:10.1145/2907942</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/qdw3tplfk5d3hklnomsa6jpwey">fatcat:qdw3tplfk5d3hklnomsa6jpwey</a> </span>
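To make the lexicon element and the precision/recall figures concrete, here is a minimal illustrative sketch, not the paper's actual implementation: a hand-picked set of data-action verbs (a hypothetical stand-in for the paper's reusable lexicon) flags candidate data-practice statements, and simple counts show how converting false negatives into true positives raises recall. The confusion-matrix counts are chosen only to match the reported averages of 0.73 precision and 0.83 recall; the paper aggregates per policy, so this arithmetic does not reproduce its exact post-reuse figures.

```python
# Hypothetical mini-lexicon of data-action verbs; the paper's lexicon is
# larger and built from crowdworker annotations across policies.
LEXICON = {"collect", "share", "use", "disclose", "store", "retain"}

def flag_candidates(sentences):
    """Return sentences containing a lexicon term -- prospective privacy-goal indicators."""
    return [s for s in sentences if LEXICON & set(s.lower().rstrip(".").split())]

def precision_recall(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

policy = [
    "We collect your email address when you register.",
    "Our offices are located in Pittsburgh.",
    "We may share usage data with partners.",
]
goals = flag_candidates(policy)  # flags the two data-practice statements

# Illustrative counts matching the reported averages (0.73 precision, 0.83 recall).
p0, r0 = precision_recall(tp=83, fp=31, fn=17)
# Lexicon reuse recovers some false negatives as true positives, raising recall.
p1, r1 = precision_recall(tp=90, fp=31, fn=10)
```

The sketch shows the division of labor the abstract describes: an exhaustive, mechanical scan supplies high coverage, while recall improves as reused lexicon terms recover goals the initial pass missed.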
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170428190821/http://www.cs.cmu.edu:80/~jbhatia/papers/jbhatia_TOSEM2016.pdf" title="fulltext PDF download">Web Archive [PDF]</a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2907942">acm.org</a>