PClean: Bayesian Data Cleaning at Scale with Domain-Specific Probabilistic Programming
[article]
2020
arXiv
pre-print
Based on this view, we present PClean, a probabilistic programming language for leveraging dataset-specific knowledge to clean and normalize dirty data. ...
to state-of-the-art data-cleaning systems (unlike generic PPL inference given the same runtime); and scale to real-world datasets with millions of records. ...
This paper presents PClean, a domain-specific generative probabilistic programming language (PPL) for Bayesian data cleaning. ...
arXiv:2007.11838v4
fatcat:navjwv7vpfbzhaq4mugkbucvve
Cleaning Dirty Books: Post-OCR Processing for Previously Scanned Texts
[article]
2021
arXiv
pre-print
The inconsistencies found by aligning pairs of scans of the same underlying work provide training data to build models for detecting and correcting errors. ...
Substantial amounts of work are required to clean large collections of digitized books for NLP analysis, both because of the presence of errors in the scanned text and the presence of duplicate volumes ...
Methods such as PClean work off of Bayesian principles and probabilistic programming to identify likely errors in a specific domain. ...
arXiv:2110.11934v1
fatcat:y5bf6bzykngipdpauzxxhtrcbi
Cleaning Dirty Books: Post-OCR Processing for Previously Scanned Texts
2021
Findings of the Association for Computational Linguistics: EMNLP 2021
unpublished
The inconsistencies found by aligning pairs of scans of the same underlying work provide training data to build models for detecting and correcting errors. ...
Substantial amounts of work are required to clean large collections of digitized books for NLP analysis, both because of the presence of errors in the scanned text and the presence of duplicate volumes ...
Methods such as PClean work off of Bayesian principles and probabilistic programming to identify likely errors in a specific domain. ...
doi:10.18653/v1/2021.findings-emnlp.356
fatcat:usexxnmierglzhyneynjdos7mi
A risk-based maintenance methodology of industrial systems
2017
The technique combined the use of probabilistic data together with objective and subjective data. ...
At the other end of the scale is high volume, continuous manufacturing. ...
Bayesian network modelling also allows these influencing events to change and update depending on the influencing data available at any given time, thus changing the failure rate or probability of failure ...
doi:10.24377/ljmu.t.00005904
fatcat:ijf4no5omzei5k4pstarmqqy34