MyLifeBits

Jim Gemmell, Gordon Bell, Roger Lueder
2006 Communications of the ACM  
...RSONAL DATABASE EVERYTHING By Jim Gemmell, Gordon Bell, and Roger Lueder. BY JAIME TEEVAN, WILLIAM JONES, AND BENJAMIN B. ...
doi:10.1145/1107458.1107460 fatcat:agbmiflsorh2zilwckcv523prm

Improving Entity Resolution with Global Constraints [article]

Jim Gemmell and Benjamin I. P. Rubinstein and Ashok K. Chandra
2011 arXiv pre-print
Some of the greatest advances in web search have come from leveraging socio-economic properties of online user behavior. Past advances include PageRank, anchor text, hubs-authorities, and TF-IDF. In this paper, we investigate another socio-economic property that, to our knowledge, has not yet been exploited: sites that create lists of entities, such as IMDB and Netflix, have an incentive to avoid gratuitous duplicates. We leverage this property to resolve entities across the different web ... and find that we can obtain substantial improvements in resolution accuracy. This improvement in accuracy also translates into robustness, which often reduces the amount of training data that must be labeled for comparing entities across many sites. Furthermore, the technique provides robustness when resolving sites that have some duplicates, even without first removing these duplicates. We present algorithms with very strong precision and recall, and show that max weight matching, while appearing to be a natural choice, turns out to have poor performance in some situations. The presented techniques are now being used in the back-end entity resolution system at a major Internet search engine.
arXiv:1108.6016v1 fatcat:islnpt7vvfdq3gdqpakcqtbxym

Scaling Multiple-Source Entity Resolution using Statistically Efficient Transfer Learning [article]

Sahand Negahban, Benjamin I. P. Rubinstein, Jim Gemmell
2012 arXiv pre-print
We consider a serious, previously unexplored challenge facing almost all approaches to scaling up entity resolution (ER) to multiple data sources: the prohibitive cost of labeling training data for supervised learning of similarity scores for each pair of sources. While there exists a rich literature describing almost all aspects of pairwise ER, this new challenge is arising now due to the unprecedented ability to acquire and store data from online sources, features driven by ER such as ... search verticals, and the uniqueness of noisy and missing data characteristics for each source. We show on real-world and synthetic data that for state-of-the-art techniques, the reality of heterogeneous sources means that the amount of labeled training data must scale quadratically in the number of sources, just to maintain constant precision/recall. We address this challenge with a brand-new transfer learning algorithm which requires far less training data (or equivalently, achieves superior accuracy with the same data) and is trained using fast convex optimization. The intuition behind our approach is to adaptively share structure learned about one scoring problem with all other scoring problems sharing a data source in common. We demonstrate that our theoretically motivated approach incurs no runtime cost while it can maintain constant precision/recall with the cost of labeling increasing only linearly with the number of sources.
arXiv:1208.1860v1 fatcat:qb7asbtwtngnnfyotvmcmpys5u

Principled Graph Matching Algorithms for Integrating Multiple Data Sources [article]

Duo Zhang and Benjamin I. P. Rubinstein and Jim Gemmell
2014 arXiv pre-print
Gemmell is with Trōv, USA.  ... 
arXiv:1402.0282v1 fatcat:ug3go42r7nhjneojy3xmcyojfq

MyLifeBits

Jim Gemmell, Gordon Bell, Roger Lueder, Steven Drucker, Curtis Wong
2002 Proceedings of the tenth ACM international conference on Multimedia - MULTIMEDIA '02  
ACKNOWLEDGMENTS. We are grateful for comments from Jim Gray and Ted Nelson. Victoria Rozycki helped digitize Gordon Bell's personal media. Figure 1: Timeline view of query results. ...
doi:10.1145/641007.641053 dblp:conf/mm/GemmellBLDW02 fatcat:zx5porakhrhmdn3krlt3iqtjxe

A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration [article]

Bo Zhao, Benjamin I. P. Rubinstein, Jim Gemmell, Jiawei Han
2012 arXiv pre-print
In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem. In this work, we propose a probabilistic graphical model that can automatically infer true records and source quality without any supervision. In contrast to previous methods, our principled approach leverages a generative process of two types of errors (false positive and false negative) by modeling two different aspects of source quality. In so doing, ours is also the first approach designed to merge multi-valued attribute types. Our method is scalable, due to an efficient sampling-based inference algorithm that needs very few iterations in practice and enjoys linear time complexity, with an even faster incremental variant. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches to the truth finding problem.
arXiv:1203.0058v1 fatcat:3pllovnmlzaijdnt2vqjmzf5nu

Principles of delay-sensitive multimedia data storage retrieval

Jim Gemmell, Stavros Christodoulakis
1992 ACM Transactions on Information Systems  
Now consider the buffer allocation requirements. ... introduced, and with r_t > r_C, there exists a playback algorithm using 'b=lk+d+l+' (29) sector-sized buffers. ...
doi:10.1145/128756.128758 fatcat:77udf6oirje7josnxbhea55ooq

MyLifeBits

Jim Gemmell, Gordon Bell, Roger Lueder, Steven Drucker, Curtis Wong
2002 Proceedings of the tenth ACM international conference on Multimedia - MULTIMEDIA '02  
ACKNOWLEDGMENTS. We are grateful for comments from Jim Gray and Ted Nelson. Victoria Rozycki helped digitize Gordon Bell's personal media. Figure 1: Timeline view of query results. ...
doi:10.1145/641043.641053 fatcat:2rrzq6ea5nctdfeixkptbmzuoy

The MyLifeBits lifetime store

Jim Gemmell, Roger Lueder, Gordon Bell
2003 Proceedings of the 2003 ACM SIGMM workshop on Experiential telepresence - ETP '03  
...aspects of MyLifeBits that remained to be studied, including privacy (and sc specification), effective communication of my life to others, social issues, and user interface issues. [3] ACKNOWLEDGMENTS Jim ...
doi:10.1145/982484.982500 fatcat:spjzzbht2bgbrdteqbb7e32ta4

A Bayesian approach to discovering truth from conflicting sources for data integration

Bo Zhao, Benjamin I. P. Rubinstein, Jim Gemmell, Jiawei Han
2012 Proceedings of the VLDB Endowment  
In practical data integration systems, it is common for the data sources being integrated to provide conflicting information about the same entity. Consequently, a major challenge for data integration is to derive the most complete and accurate integrated records from diverse and sometimes conflicting sources. We term this challenge the truth finding problem. We observe that some sources are generally more reliable than others, and therefore a good model of source quality is the key to solving the truth finding problem. In this work, we propose a probabilistic graphical model that can automatically infer true records and source quality without any supervision. In contrast to previous methods, our principled approach leverages a generative process of two types of errors (false positive and false negative) by modeling two different aspects of source quality. In so doing, ours is also the first approach designed to merge multi-valued attribute types. Our method is scalable, due to an efficient sampling-based inference algorithm that needs very few iterations in practice and enjoys linear time complexity, with an even faster incremental variant. Experiments on two real-world datasets show that our new method outperforms existing state-of-the-art approaches to the truth finding problem.
doi:10.14778/2168651.2168656 fatcat:z376bflf4za3vaesddqndmweuq

Challenges in using lifetime personal information stores

Gordon Bell, Jim Gemmell, Roger Lueder
2004 Proceedings of the 27th annual international conference on Research and development in information retrieval - SIGIR '04  
Extended Abstract. Within five years, our personal computers with terabyte disk drives will be able to store everything we read, write, and hear, and many of the images we see, including video. Vannevar Bush outlined such a system in his famous 1945 Memex article [1].
doi:10.1145/1008992.1008993 dblp:conf/sigir/BellGL04 fatcat:7lvfj5ccdjcudotthjm4a2iale

Digital memories in an era of ubiquitous computing and abundant storage

Mary Czerwinski, Douglas W. Gage, Jim Gemmell, Catherine C. Marshall, Manuel A. Pérez-Quiñones, Meredith M. Skeels, Tiziana Catarci
2006 Communications of the ACM  
Storage for digital memories has been explored by groups like MIT's Haystack (haystack.lcs.mit.edu/), Microsoft's Stuff I've Seen (research.microsoft.com/adapt/sis/), and MyLifeBits (see the article by Gemmell  ... 
doi:10.1145/1107458.1107489 fatcat:6tevuqjyxvbj3oagbqlxli7zey

Planz to put our digital information in its place

William Jones, Dawei Hou, Bhuricha Deen Sethanandha, Sheng Bi, Jim Gemmell
2010 Proceedings of the 28th of the international conference extended abstracts on Human factors in computing systems - CHI EA '10  
Planz provides a single, integrative document-like overlay to a folder hierarchy through the dynamic, on-demand assembly of XML fragments. This overlay provides a context in which to create or reference not only files but also email messages, web pages, and informal notes. This paper describes an evaluation of Planz over a period of several days during which participants compared their experiences on two projects: one involving "status quo" methods, a second involving Planz. Also ... is an architecture that extends on the front-end to provide additional overlays and on the back-end in support of additional information stores. Work on Planz is guided by a vision of "structural integrity": many tools, many modes of interaction applied to a common structure for the organization of and access to personal information.
doi:10.1145/1753846.1753866 dblp:conf/chi/JonesHSBG10 fatcat:yiqqeuyyezc4zorythsozm5xy4

Passive capture and ensuing issues for a personal lifetime store

Jim Gemmell, Lyndsay Williams, Ken Wood, Roger Lueder, Gordon Bell
2004 Proceedings of the 1st ACM workshop on Continuous archival and retrieval of personal experiences - CARPE'04
Jim Gray has provided many helpful suggestions. Figure 2: RSVP viewer for SenseCam images. The slider below the large image controls speed/direction. ...
doi:10.1145/1026653.1026660 fatcat:5lzxzycpmndcnaxf46sheoalla

Capturing digital lives

Fred Turner
2009 Nature  
Pioneering computer scientist Gordon Bell and his Microsoft colleague Jim Gemmell take a libertarian view in Total Recall. Digital media will free us to dip back into the past at will, they argue.  ...  Gemmell, has developed a suite of digital tools for recording, storing and searching everything from old family photographs to kerbside chats.  ... 
doi:10.1038/4611206a fatcat:d7o2znl5gvccne2clzfg5nsqgq
Showing results 1-15 out of 254 results