Outlier Detection for Text Data : An Extended Version [article]

Ramakrishnan Kannan, Hyenkyun Woo, Charu C. Aggarwal, Haesun Park
2017 arXiv   pre-print
Our approach has significant advantages over traditional methods for text outlier detection.  ...  The problem of outlier detection is extremely challenging in many domains such as text, in which the attribute values are typically non-negative, and most values are zero.  ...  license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.  ... 
arXiv:1701.01325v1 fatcat:ujl4qkjqdjasdosoprxbuhztv4

Novel data stream pattern mining report on the StreamKDD'10 workshop

Margaret H. Dunham, Michael Hahsler, Myra Spiliopoulou
2011 SIGKDD Explorations  
We are grateful to the members of the Program Committee for their thorough and dedicated work:  ...  In [3] the authors outlined the approach they are currently working on for the detection of outliers: they use a density based approach to create an outlier score for an incoming point and decide whether  ...  The best paper of the workshop (see below) was selected for inclusion in this issue, as an extended version. All other workshop papers are available in the ACM Digital Library.  ... 
doi:10.1145/1964897.1964912 fatcat:qi5q742ikzheholwezblhkktfi

A Method to Improve High-Resolution Sea Ice Drift Retrievals in the Presence of Deformation Zones

Jakob Griebel, Wolfgang Dierking
2017 Remote Sensing  
We suggest an adapted detection scheme that identifies linear deformation features (LDFs) in the drift vector field, and detects and replaces outliers after considering the presence of such LDFs in their  ...  Firstly, we extended a reliability assessment proposed in an earlier study, which is based on analyzing texture and correlation parameters of SAR image pairs, with the aim to reject unreliable pattern  ...  We used Copernicus Sentinel-1 data acquired over an area north of Fram Strait in January 2015.  ... 
doi:10.3390/rs9070718 fatcat:qf6o5j7qtreh7ooeuzw7j4hl2e

Noise Reduction and Content Retrieval from Web Pages

Surabhi Lingwal
2013 International Journal of Computer Applications  
This research work proposed an approach for removing the noises from a given web page which will improve the performance of web content mining.  ...  Web contents of different fields which can offer important information to users are available in the Web like multimedia data, structured, semistructured and unstructured data.  ...  Algorithm for Outlier Detection Filter Outliers This operation takes a Dataset as input and returns a new DataSet including only the data that fulfill a condition.  ... 
doi:10.5120/12729-9573 fatcat:ozlsoy2uhjek3jv4tivbipmabm

Text-based over-representation analysis of microarray gene lists with annotation bias

Hui Sun Leong, David Kipling
2009 Nucleic Acids Research  
A major challenge in microarray data analysis is the functional interpretation of gene lists.  ...  We report our explorations of whether ORA can be applied to a wider mining of free-text.  ...  ACKNOWLEDGEMENTS We thank Peter Giles for help with establishment of the web server for these methods.  ... 
doi:10.1093/nar/gkp310 pmid:19429895 pmcid:PMC2699530 fatcat:wepafria2fhrdibmzr3ae43y5e

Classification of Concept Drifting Data Streams Using Adaptive Novel-Class Detection

Aparna Yeshwantrao Ladekar, M. Y. Joshi
2016 International Journal of Computer Engineering in Research Trends  
Practically it is not possible to store and use all data for training purpose whenever required due to infinite length of data streams. Feature evolution frequently occurs in many text streams.  ...  In text streams new features like words or phrases may occur when stream progresses. New classes evolving in the data stream which occurs concept-evolution as a result.  ...  Act miner is extended version of mine class. Act miner addresses four major problem concept evolution, concept drift [14], limited labeled data instances, and novel-class detection.  ... 
doi:10.22362/ijcert/2016/v3/i9/48901 fatcat:2kvggmvrbbhcleur5lrvmqqtge

Small moving targets detection using outlier detection algorithms

Natasa Reljin, Samantha McDaniel, Dragoljub Pokrajac, Nebojsa Pejcic, Tia Vance, Aleksandar Lazarevic, Longin J. Latecki, Oliver E. Drummond
2010 Signal and Data Processing of Small Targets 2010  
cameras have shown promising results in using outlier detection for detection of small moving targets.  ...  Recent research in motion detection has shown that various outlier detection methods could be used for efficient detection of small moving targets.  ...  Incremental outlier detection algorithms, as a special class of outlier detection algorithms can detect if data is an outlier immediately after the data arrives in the database [5], which are especially  ... 
doi:10.1117/12.850550 fatcat:panrnsh6jbbfzpurwuifp6fh6e

BDQC: a general-purpose analytics tool for domain-blind validation of Big Data [article]

Eric Deutsch, Roger Kramer, Joseph Ames, Andrew Bauman, David S Campbell, Kyle Chard, Kristi Clark, Mike D'Arcy, Ivo Dinov, Rory Donovan, Ian Foster, Benjamin D Heavner (+13 others)
2018 bioRxiv   pre-print
We have developed a framework for Big Data Quality Control (BDQC) including an extensible set of heuristic and statistical analyses that identify deviations in data without regard to its meaning (domain-blind  ...  Such outliers may be symptoms of technology failure (e.g., truncated output of one step of a pipeline for a single genome) or may reveal unsuspected "signal" in the data (e.g., evidence of aneuploidy in  ...  No outliers were found. 2. Anomalies ('flags') were detected in specific files. In this case, a report is generated summarizing the evidence, as text or optionally as an interactive visualization.  ... 
doi:10.1101/258822 fatcat:qcb2cmwl2zefdpw4ylvrshvgk4

Detecting Errors in Numerical Linked Data Using Cross-Checked Outlier Detection [chapter]

Daniel Fleischhacker, Heiko Paulheim, Volha Bryl, Johanna Völker, Christian Bizer
2014 Lecture Notes in Computer Science  
Outlier detection used for identifying wrong values in data is typically applied to single datasets to search them for values of unexpected behavior.  ...  In a first step, we apply outlier detection methods to the property values extracted from a single repository, using a novel approach for splitting the data into relevant subsets.  ...  If an outlier detected in the first step is only a natural outlier, it does not show up as an outlier in the second step which allows for mitigating the problem of falsely marking natural outliers as wrong  ... 
doi:10.1007/978-3-319-11964-9_23 fatcat:wtbqiki75nexzh6435mvqvz6wm

An Adaptive Image-based Plagiarism Detection Approach

Norman Meuschke, Christopher Gondek, Daniel Seebacher, Corinna Breitinger, Daniel Keim, Bela Gipp
2018 Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries - JCDL '18  
Plagiarism detection systems available for productive use reliably identify copied text, or near-copies of text, but often fail to detect disguised forms of academic plagiarism, such as paraphrases, translations  ...  We propose an adaptive, scalable, and extensible image-based plagiarism detection approach suitable for analyzing a wide range of image similarities that we observed in academic documents.  ...  Our process integrates perceptual hashing, for which we extended the detection capabilities by including an extraction procedure for sub-images.  ... 
doi:10.1145/3197026.3197042 dblp:conf/jcdl/MeuschkeGSBKG18 fatcat:uesb4oemsjdrre5kyn7q5sle6u

A LoOP based outlier detection method for high dimensional fuzzy data set

Alireza Fakharzadeh Jahromi, Fateme Zarei
2017 Journal of Intelligent & Fuzzy Systems  
Despite the importance of fuzzy data and existence of many powerful methods for determining crisp outliers, there are few approaches for identifying outliers in fuzzy database.  ...  Next, by using the left and right scoring defuzzyfied method, a fuzzy data outlier degree is determined. Finally, the efficiency of the method in outlier detection is shown by numerical examples.  ...  LOF is an outlier detection method in which, as being introduced by [2], it became the basic method in identifying outlier data based on density; indeed, a newer and more developed versions of it were  ... 
doi:10.3233/jifs-151447 fatcat:xre5qpq5wjcpxdxb7towwdj5qu

A robust mean and variance test with application to epigenome-wide association studies [article]

James R Staley, Frank Windmeijer, Matthew Suderman, George Davey Smith, Kate Tilling
2020 bioRxiv   pre-print
Results: The extended Brown-Forsythe test and JLSsc had good statistical properties for both categorical and continuous exposures, without requiring transformation of the methylation levels.  ...  These tests can be used to detect associations not solely driven by a mean effect of the exposure on the outcome.  ...  using an extended version of the Brown-Forsythe test and for jointly testing mean and variability.  ... 
doi:10.1101/2020.02.06.926584 fatcat:whuc2gyfkvdvlinpwpzszm3f6e

Multimedia Semantic Integrity Assessment Using Joint Embedding Of Images And Text

Ayush Jaiswal, Ekraam Sabir, Wael AbdAlmageed, Premkumar Natarajan
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
Real world multimedia data is often composed of multiple modalities such as an image or a video with associated text (e.g. captions, user comments, etc.) and metadata.  ...  Our method is able to achieve F1 scores of 0.75, 0.89 and 0.94 on MAIM, Flickr30K and MS COCO, respectively, for detecting semantically incoherent media packages.  ...  Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation thereon.  ... 
doi:10.1145/3123266.3123385 dblp:conf/mm/JaiswalSAN17 fatcat:aq2sifpg6ncy3os42i5uvlp43a

An Open Corpus of Everyday Documents for Simplification Tasks

David Pellow, Maxine Eskenazi
2014 Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)  
Acknowledgments The authors would like to thank the anonymous reviewers for their detailed and helpful feedback and comments on the paper.  ...  An example of an entry for the Alabama Driver Manual is shown in Description of an Extended Corpus of Everyday Documents To meet the needs described in Section 3 the basic corpus will be extended  ...  Document Fields The extended corpus includes both original documents and their simplified versions.  ... 
doi:10.3115/v1/w14-1210 dblp:conf/acl-pitr/PellowE14 fatcat:xxwvn55gtfezrfslsa72pvxrnu

ICSOutlier: Unsupervised Outlier Detection for Low-Dimensional Contamination Structure

Aurore Archimbaud, Klaus Nordhausen, Anne Ruiz-Gazen
2018 The R Journal  
Detecting outliers in a multivariate and unsupervised context is an important and ongoing problem notably for quality control.  ...  In this particular context, the Invariant Coordinate Selection (ICS) method shows remarkable properties for identifying outliers that lie on a low-dimensional subspace in its first invariant components  ...  The authors wish to thank the editor and the two reviewers for their comments and suggestions which helped improve not only the paper but also the R package.  ... 
doi:10.32614/rj-2018-034 fatcat:4knosz2xfrbxlcluzdjmmbxmky