Clustering Relational Database Entities Using K-means

Farid Bourennani, Mouhcine Guennoun, Ying Zhu
2010 2010 Second International Conference on Advances in Databases, Knowledge, and Data Applications  
In this paper, we use K-means and Self-Organizing Maps for simultaneously processing textual and numerical data types by UV.  ...  We evaluate how the HDM-UV improves the clustering results of these two algorithms (SOM, K-means) by comparing them to the traditional homogeneous data processing.  ...  In this paper, the documents are relational database entities. More precisely, these entities are tables' columns in a relational database model.  ... 
doi:10.1109/dbkda.2010.32 dblp:conf/dbkda/BourennaniGZ10 fatcat:shiiypkubvcdtfk77bprhnhbdq

A Bayesian Nonparametric Model for Joint Relation Integration and Domain Clustering

Dazhuo Li, Fahim Mohammad, Eric Rouchka
2010 2010 Ninth International Conference on Machine Learning and Applications  
The approach is applied to clustering various relations in a gene database.  ...  However, discovering structures beyond the entity type level, e.g. clustering over relation concepts, remains a challenging task.  ...  The harmonic mean identity is used to approximate where $U_k^{(1)}, \dots, U_k^{(S)}$ are S samples drawn from the posterior distribution $\Pr(U_k \mid R_k)$.  ... 
doi:10.1109/icmla.2010.168 dblp:conf/icmla/LiMR10 fatcat:l67rpdwvczen3iibs3jgwv7q3u

Candidate gene prioritization using graph embedding [article]

Quan DO, Pierre LARMANDE
2020 bioRxiv   pre-print
We first introduced a dataset of rice genes created from several open-access databases.  ...  Finally, we evaluated the results using link prediction performance and vectors representation using some unsupervised learning techniques.  ...  To evaluate the K-Means clustering, we used the total distance from genes in the cluster to the centroid of the cluster.  ... 
doi:10.1101/2020.02.03.927913 fatcat:lc4juh7mxfctrh5thknqjm34aq

Extracting Entities and Topics from News and Connecting Criminal Records [article]

Quang Pham, Marija Stanojevic, Zoran Obradovic
2020 arXiv   pre-print
The goal of this paper is to summarize methodologies used in extracting entities and topics from a database of criminal records and from a database of newspapers.  ...  In addition, these models had also been used to successfully analyze entities related to people, organizations, and places (D Newman, 2006).  ...  K-MEANS CLUSTERING ON NEWS COLLECTION A clustering procedure for articles based on their contents and topics was designed and tested successfully with the use of TFIDF, K-Means, and DBSCAN algorithms.  ... 
arXiv:2005.00950v1 fatcat:3dkll6fivrbi3mg7oyhwfxz4x4

An automated entity–relationship clustering algorithm for conceptual database design

Madjid Tavana, Prafulla Joglekar, Michael A. Redmond
2007 Information Systems  
Hence, to improve their understandability and manageability, large ER diagrams need to be decomposed into smaller modules by clustering closely related entities and relationships.  ...  Entity-relationship (ER) modeling is a widely accepted technique for conceptual database design.  ...  While separable hierarchical chains can be useful in designing semantically meaningful clusters in relational databases, when numerous hierarchical chains overlap, and boundaries between overlapping chains  ... 
doi:10.1016/ fatcat:7rz64fo7mbdyph47rgc6jnvlri

Text and Data Quality Mining in CRIS

2019 Information  
Using TDM helps to better identify and eliminate errors, improve the process, develop the business, and make informed decisions. In addition, TDM increases understanding of the data and its context.  ...  There are six steps that describe the k-means algorithm: 1. Distribute all documents on k clusters. 2. Compute the mean vector for each cluster using the following formula.  ... 
doi:10.3390/info10120374 fatcat:rxjucuqthna45bq5om2bkz7lbu

DB2SNA: An All-in-One Tool for Extraction and Aggregation of Underlying Social Networks from Relational Databases [chapter]

Rania Soussi, Etienne Cuvelier, Marie-Aude Aufaure, Amine Louati, Yves Lechevallier
2012 Lecture Notes in Social Networks  
Then, we aggregate the resulting network using the k-SNAP algorithm which produces a summarized graph.  ...  The existing approaches focus on social networks extraction using web document. However a considerable amount of information is stored in relational databases.  ...  Relations (attributes, relations between entities) are generally represented in these models by means of labeled edges.  ... 
doi:10.1007/978-3-7091-1346-2_23 dblp:series/lnsn/SoussiCALL13 fatcat:zhpitabhzfcwnpjg52dqn5ux3i

An Efficient Semantic Ranked Keyword Search of Big Data Using Map Reduce

P. Srinivasa Rao, M. H. M. Krishna Prasad, K. Thammi Reddy
2015 International Journal of Database Theory and Application  
Information retrieval is fast becoming the prevailing form of information access, surpassing traditional database style searching.  ...  A system with ontology mimics the real world, where every task is laced with certain meaning as this is basic idea behind knowledge processing.  ...  K-means clustering algorithm is used to partition the terms associate with related concepts into k-equivalence classes. Now concepts and synonym set are assigned to each concept using word net.  ... 
doi:10.14257/ijdta.2015.8.6.05 fatcat:m3ija7oxyba7nkgpbvd2ghicwq

Research on Medical Question Answering System Based on Knowledge Graph

Zhixue Jiang, Chengying Chi, Yunyun Zhan
2021 IEEE Access  
This system locates the medical field, uses crawler technology to use vertical medical websites as data sources, and uses diseases as the core entity to construct a knowledge graph containing 44,000 knowledge  ...  It is stored in the Neo4j graph database, using rule-based matching methods and string-matching algorithms to construct a domain lexicon to classify and query questions.  ...  It proposes an improved k-means algorithm (k-means max-min) for entity clustering disambiguation to deal with almost the same text representation but different text meanings in a large number of texts.  ... 
doi:10.1109/access.2021.3055371 fatcat:2a5txdlbmjh3hcya42mtr24ycq

A Practioner's Guide to Evaluating Entity Resolution Results [article]

Matt Barnes
2015 arXiv   pre-print
Entity resolution (ER) is the task of identifying records belonging to the same entity (e.g. individual, group) across one or multiple databases.  ...  Some of these metrics are borrowed from multi-class classification and clustering domains, though some key differences exist differentiating entity resolution from general clustering.  ...  The harmonic mean of these metrics leads to the most frequently used entity resolution metric, pairwise $F_1$.  ... 
arXiv:1509.04238v1 fatcat:2ppaqd67qrac5phbxevp33nalm

Knowledge Assisted Analysis and Categorization for Semantic Video Retrieval [chapter]

Manolis Wallace, Thanos Athanasiadis, Yannis Avrithis
2004 Lecture Notes in Computer Science  
During retrieval, the context of the query is used to clarify the exact meaning of the query terms and to meaningfully guide the process of query expansion and index matching.  ...  We follow a fuzzy relational approach to knowledge representation, based on which we define and extract the context of either a multimedia document or a user query.  ...  topics that are related to a document d requires that the set of semantic entities that are related to it are clustered, according to their common meaning.  ... 
doi:10.1007/978-3-540-27814-6_65 fatcat:cyqtnkjt3rfonjluxu7vgclv2i

Catching the Drift – Indexing Implicit Knowledge in Chemical Digital Libraries [chapter]

Benjamin Köhncke, Sascha Tönnies, Wolf-Tilo Balke
2012 Lecture Notes in Computer Science  
However, since such clusters are generally too unspecific, containing chemical entities from different chemical classes, we further divide them into sub-clusters using fingerprint based similarity measures  ...  The restriction of each chemical class is somehow also related to the entities' reaction behavior, but further based on the chemist's implicit knowledge.  ...  Top-100 (left), Top-1000 (right) Since we are using k-means clustering we also had to find a suitable k. The aim is that each entity in a cluster has the same chemical class.  ... 
doi:10.1007/978-3-642-33290-6_41 fatcat:q7teqgc6wfdbxk7b2e4wdkftja

Identifying synonymy between relational phrases using word embeddings

Nhung T.H. Nguyen, Makoto Miwa, Yoshimasa Tsuruoka, Satoshi Tojo
2015 Journal of Biomedical Informatics  
We then apply the k-means algorithm on top of the distributional representations to cluster the phrases.  ...  Most of the previous work that has addressed this task of synonymy resolution uses similarity metrics between relational phrases based on textual strings or dependency paths, which, for the most part,  ...  For each value of k, we run k-means with 10 random seeds and calculate the mean scores.  ... 
doi:10.1016/j.jbi.2015.05.010 pmid:26004792 fatcat:coaa6fw2yvd7jfjm35dqaqs44y

Theoretical Limits of Record Linkage and Microclustering [article]

James E. Johndrow, Kristian Lum, David B. Dunson
2017 arXiv   pre-print
There has been substantial recent interest in record linkage, attempting to group the records pertaining to the same entities from a large database lacking unique identifiers.  ...  This can be viewed as a type of "microclustering," with few observations per cluster and a very large number of clusters.  ...  We used identical priors $\mu_k \sim \phi(0, 9)$ on the means for each component.  ... 
arXiv:1703.04955v1 fatcat:yoseisgeazdyrn4vrqrikzrnk4

Cluster-based concept invention for statistical relational learning

Alexandrin Popescul, Lyle H. Ungar
2004 Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '04  
We use clustering to derive new relations which augment database schema used in automatic generation of predictive features in statistical relational learning.  ...  More importantly, entities derived from clusters increase the expressivity of feature spaces by creating new first-class concepts which contribute to the creation of new features.  ...  We use k-means to derive cluster relations; any other hard clustering algorithm can be used for this purpose.  ... 
doi:10.1145/1014052.1014137 dblp:conf/kdd/PopesculU04 fatcat:tim4fe6oife5ll4yjoiragdaaa