
A Survey on PageRank Computing

Pavel Berkhin
2005 Internet Mathematics  
Contrary to some references, const = (1 − c) but depends on a sink term in Equation (2.8).  ...  The authority weight of a page is an aggregated significance of the hubs that point to it ("beauty lies in the eye of the beholder"), while the hub weight of  ... 
doi:10.1080/15427951.2005.10129098 fatcat:ahbwdzfelzbfrcpzsoqae4w2za
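The snippet above mentions the teleportation constant and the sink (dangling-page) term in the PageRank iteration. A minimal power-iteration sketch, assuming a damping factor c = 0.85 and uniform teleportation; this illustrates the standard scheme, not code from the survey:

```python
import numpy as np

def pagerank(out_links, n, c=0.85, tol=1e-10, max_iter=1000):
    """Minimal PageRank power iteration.

    out_links: dict mapping a node to the list of nodes it links to.
    Sink pages (no out-links) redistribute their mass uniformly;
    this is the sink term the snippet refers to, and (1 - c)/n is
    the uniform teleportation constant.
    """
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = np.zeros(n)
        sink_mass = 0.0
        for u in range(n):
            succ = out_links.get(u, [])
            if succ:
                share = p[u] / len(succ)
                for v in succ:
                    new[v] += share
            else:
                sink_mass += p[u]
        # damping plus teleportation; folding the sink mass back in
        # keeps the vector a probability distribution
        new = c * (new + sink_mass / n) + (1.0 - c) / n
        if np.abs(new - p).sum() < tol:
            return new
        p = new
    return p
```

On a small graph with a dangling page, the result still sums to one because the dangling mass is redistributed each step.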

Bookmark-Coloring Algorithm for Personalized PageRank Computing

Pavel Berkhin
2006 Internet Mathematics  
We introduce a novel bookmark-coloring algorithm (BCA) that computes authority weights over the web pages utilizing the web hyperlink structure. The computed vector (BCV) is similar to the PageRank vector defined for a page-specific teleportation. Meanwhile, BCA is very fast, and BCV is sparse. BCA also has important algebraic properties. If several BCVs corresponding to a set of pages (called hub) are known, they can be leveraged in computing an arbitrary BCV via a straightforward algebraic  ... s and hub BCVs can be efficiently computed and encoded.
doi:10.1080/15427951.2006.10129116 fatcat:2ggnlhicrzdqzfpfzvmeb6p3uu
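The bookmark-coloring idea in the abstract above can be sketched as a paint-propagation process; the parameter names and the worklist scheme below are assumptions for illustration, not the paper's specification:

```python
from collections import defaultdict

def bookmark_coloring(out_links, seed, retain=0.15, eps=1e-6):
    """Sketch of the bookmark-coloring idea: paint starts at the
    seed page; each touched page retains a fraction `retain` and
    pushes the rest to its out-neighbors. Paint below `eps` is
    dropped, which keeps the resulting vector sparse."""
    color = defaultdict(float)   # accumulated (retained) paint
    wet = {seed: 1.0}            # paint not yet distributed
    while wet:
        u, amount = wet.popitem()
        color[u] += retain * amount
        succ = out_links.get(u, [])
        if not succ:
            continue
        share = (1.0 - retain) * amount / len(succ)
        if share < eps:
            continue  # drop tiny paint: this is the sparsity lever
        for v in succ:
            wet[v] = wet.get(v, 0.0) + share
    return dict(color)
```

Total retained paint never exceeds one, and pages never reached by paint simply do not appear in the output, which is the sparsity property the abstract emphasizes.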

Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems [article]

Dragomir Yankov, Pavel Berkhin, Lihong Li
2015 arXiv   pre-print
We analyze the problem of using Explore-Exploit techniques to improve precision in multi-result ranking systems such as web search, query autocompletion, and news recommendation. Adopting an exploration policy directly online, without understanding its impact on the production system, may have unwanted consequences: the system may sustain large losses, create user dissatisfaction, or collect exploration data which does not help improve ranking quality. An offline framework is thus necessary to let us decide which policy to apply, and how, in a production environment to ensure a positive outcome. Here, we describe such an offline framework. Using the framework, we study a popular exploration policy, Thompson sampling. We show that there are different ways of implementing it in multi-result ranking systems, each having a different semantic interpretation and leading to different results in terms of sustained click-through-rate (CTR) loss and expected model improvement. In particular, we demonstrate that Thompson sampling can act as an online learner optimizing CTR, which in some cases can lead to an interesting outcome: a lift in CTR during exploration. This observation is important for production systems, as it suggests that one can collect valuable exploration data to improve ranking performance in the long run and, at the same time, increase CTR while exploration lasts.
arXiv:1504.07662v1 fatcat:72v5xrckzzczdjqulzwud4ufza
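One common way to implement the Thompson sampling policy studied above is to keep a Beta-Bernoulli posterior per result, sample a CTR estimate for each item, and rank by the samples. The class below is a sketch of that variant; the names and API are invented for illustration, not the authors' implementation:

```python
import random

class ThompsonRanker:
    """Beta-Bernoulli Thompson sampling for a multi-result slate:
    sample a CTR estimate per item from its posterior and rank by
    the sampled values. This is one of several semantic variants
    the paper compares."""

    def __init__(self, items, prior=(1.0, 1.0)):
        # Beta(alpha, beta) posterior parameters per item
        self.posterior = {it: list(prior) for it in items}

    def rank(self, k):
        # one posterior draw per item, then order by the draws
        draws = {it: random.betavariate(a, b)
                 for it, (a, b) in self.posterior.items()}
        return sorted(draws, key=draws.get, reverse=True)[:k]

    def update(self, item, clicked):
        # Bernoulli feedback: clicked is 1 (click) or 0 (skip)
        a, b = self.posterior[item]
        self.posterior[item] = [a + clicked, b + (1 - clicked)]
```

Because each request draws fresh samples, under-explored items occasionally rank high, which is the exploration behavior whose CTR cost the offline framework measures.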


LiveMaps: Converting Map Images into Interactive Maps

Michael R. Evans, Dragomir Yankov, Pavel Berkhin, Pavel Yudin, Florin Teodorescu, Wei Wu
2017 Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '17  
Image search is a popular application on web search engines. Issuing a location-related query in image search engines often returns multiple images of maps among the top ranked results. Traditionally, clicking on such images either opens the image in a new browser tab or takes users to a web page containing the image. However, finding the area of intent on an interactive web map is a manual process. In this paper, we describe a novel system, LiveMaps, for analyzing and retrieving an appropriate viewport for a given image of a map. This allows annotation of images of maps returned by image search engines, allowing users to directly open a link to an interactive map centered on the location of interest. LiveMaps works in several stages. It first checks whether the input image represents a map. If yes, then the system attempts to identify what geographical area this map image represents. In the process, we use textual as well as visual information extracted from the image. Finally, we construct an interactive map object capturing the geographical area inferred for the image. Evaluation results on a dataset of highly ranked location images indicate that our system constructs very precise map representations while also achieving good levels of coverage.
doi:10.1145/3077136.3080673 dblp:conf/sigir/EvansYBYTW17 fatcat:zoeobgh6b5f4rl7h2tdeyov7ee

Learning Simple Relations: Theory and Applications [chapter]

Pavel Berkhin, Jonathan D. Becher
2002 Proceedings of the 2002 SIAM International Conference on Data Mining  
In addition to classic clustering algorithms, many different approaches to clustering are emerging for objects of special nature. In this article we deal with the grouping of rows and columns of a matrix with non-negative entries. Two rows (or columns) are considered similar if the corresponding cross-distributions are close. This grouping is a dual clustering of two sets of elements, row and column indices. The introduced approach is based on minimizing the reduction of mutual information contained in a matrix that represents the relationship between two sets of elements. Our clustering approach contains many parallels with K-Means clustering due to certain common algebraic properties. The obtained results have many applications, including the grouping of Web visit data.
doi:10.1137/1.9781611972726.25 dblp:conf/sdm/BerkhinB02 fatcat:rept7emfxzggjd2v6iamjoqjii
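The objective described in the abstract above, minimizing the reduction of mutual information when rows and columns are merged into clusters, can be written down directly. The sketch below computes that reduction for a given dual clustering; it illustrates the objective only, not the authors' optimization algorithm, and the function names are assumptions:

```python
import numpy as np

def mutual_information(P):
    """Mutual information I(X;Y) of a joint distribution matrix P
    (non-negative entries summing to one)."""
    px = P.sum(axis=1, keepdims=True)   # row marginals
    py = P.sum(axis=0, keepdims=True)   # column marginals
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / (px @ py)[mask])).sum())

def mi_loss(P, row_labels, col_labels, kr, kc):
    """Reduction in mutual information when rows and columns of P
    are aggregated into kr row clusters and kc column clusters.
    The dual clustering seeks labels that minimize this loss."""
    Q = np.zeros((kr, kc))
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            Q[r, c] += P[i, j]
    return mutual_information(P) - mutual_information(Q)
```

The loss is zero when clustering loses nothing (e.g., each row/column in its own cluster) and equals the full mutual information when everything is merged into a single cluster.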

Automating exploratory data analysis for efficient data mining

Jonathan D. Becher, Pavel Berkhin, Edmund Freeman
2000 Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '00  
Having access to large data sets for the purpose of predictive data mining does not guarantee good models, even when the size of the training data is virtually unlimited. Instead, careful data preprocessing is required, including data cleansing, handling missing values, attribute representation and encoding, and generating derived attributes. In particular, selecting the most appropriate subset of attributes to include is a critical step in building an accurate and efficient model. We describe an automated approach to the exploration, preprocessing, and selection of the optimal attribute subset, whose goal is to simplify the KDD process and dramatically shorten the time to build a model. Our implementation finds inappropriate and suspicious attributes, performs target dependency analysis, determines optimal attribute encoding, generates new derived attributes, and provides a flexible approach to attribute selection. We present results generated by an industrial KDD environment called the Accrue Decision Series on several real-world Web data sets.
doi:10.1145/347090.347179 dblp:conf/kdd/BecherBF00 fatcat:a6siexjp5jdondz2cebddk2k4y

Interoperability ranking for mobile applications

Dragomir Yankov, Pavel Berkhin, Rajen Subba
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
At present, most major app marketplaces perform ranking and recommendation based on search relevance features or marketplace "popularity" statistics. For instance, they check similarity between app descriptions and user search queries, or rank-order the apps according to statistics such as number of downloads, user ratings, etc. Rankings derived from such signals, important as they are, are insufficient to capture the dynamics of the apps ecosystem. Consider for example the questions: In a particular user context, is app A more likely to be launched than app B? Or does app C provide complementary functionality to app D? Answering these questions requires identifying and analyzing the dependencies between apps in the apps ecosystem. Ranking mechanisms that reflect such interdependences are thus necessary. In this paper we introduce the notion of interoperability ranking for mobile applications. Intuitively, apps with high rank are apps which are inferred to be somehow important to other apps in the ecosystem. We demonstrate how interoperability ranking can help answer the above questions and also provide the basis for solving several problems which are rapidly attracting the attention of both researchers and industry, such as building personalized real-time app recommender systems or intelligent mobile agents. We describe a set of methods for computing interoperability ranks and analyze their performance on real data from the Windows Phone app marketplace.
doi:10.1145/2484028.2484122 dblp:conf/sigir/YankovBS13 fatcat:cg5wabjwhbesjgiqz4nw7blhby

Fast and cost-efficient bid estimation for contextual ads

Ye Chen, Pavel Berkhin, Jie Li, Sharon Wan, Tak W. Yan
2012 Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion  
We study the problem of estimating the value of a contextual ad impression, based upon which an ad network bids on an exchange. The ad impression opportunity materializes into revenue only if the ad network wins the impression and a user clicks on the ads, both of which are rare events, especially in an open exchange for contextual ads. Given a low revenue expectation and the elusive nature of predicting weak-signal click-through rates, the computational cost incurred by bid estimation must be cautiously justified. We developed and deployed a novel impression valuation model, which is expected to reduce the computational cost by 95% and hence more than double the profit. Our approach is highly economized through a fast implementation of kNN regression that primarily leverages low-dimensional sell-side data (user and publisher). We also address the cold-start problem, or the exploration vs. exploitation requirement, by Bayesian smoothing using a beta prior, and adapt to the temporal dynamics using an autoregressive model.
doi:10.1145/2187980.2188085 dblp:conf/www/ChenBLWY12 fatcat:qjtrmbkvwjar7fuac3lqf275e4
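Two ingredients named in the abstract above, kNN regression over low-dimensional sell-side features and Bayesian smoothing with a beta prior, can be sketched as follows. The hyperparameters and feature layout are illustrative guesses, not the deployed model:

```python
import numpy as np

def smoothed_ctr(clicks, impressions, alpha=1.0, beta=100.0):
    """Posterior-mean CTR under a Beta(alpha, beta) prior: sparse
    segments shrink toward the prior rate alpha / (alpha + beta),
    which addresses the cold-start problem."""
    return (clicks + alpha) / (impressions + alpha + beta)

def knn_ctr(query, features, clicks, imps, k=5, alpha=1.0, beta=100.0):
    """kNN regression over low-dimensional features: pool the clicks
    and impressions of the k nearest historical segments, then apply
    the same smoothed posterior mean to the pooled counts."""
    dist = np.linalg.norm(features - query, axis=1)
    idx = np.argsort(dist)[:k]
    return smoothed_ctr(clicks[idx].sum(), imps[idx].sum(), alpha, beta)
```

With zero observations the estimate falls back to the prior mean, and as pooled impressions grow it converges to the empirical click rate.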

Real-time bidding algorithms for performance-based display ad allocation

Ye Chen, Pavel Berkhin, Bo Anderson, Nikhil R. Devanur
2011 Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11  
We describe a real-time bidding algorithm for performance-based display ad allocation. A central issue in performance display advertising is matching campaigns to ad impressions, which can be formulated as a constrained optimization problem that maximizes revenue subject to constraints such as budget limits and inventory availability. The current practice is to solve the optimization problem offline at a tractable level of impression granularity (e.g., the placement level), and to serve ads online based on the precomputed static delivery scheme. Although this offline approach takes a global view to achieve optimality, it fails to scale to ad delivery decision making at an individual impression level. Therefore, we propose a real-time bidding algorithm that enables fine-grained impression valuation (e.g., targeting users with real-time conversion data), and adjusts the value-based bid according to a real-time constraint snapshot (e.g., budget consumption level). Theoretically, we show that under a linear programming (LP) primal-dual formulation, the simple real-time bidding algorithm is indeed an online solver to the original primal problem by taking the optimal solution to the dual problem as input. In other words, the online algorithm guarantees the offline optimality given the same level of knowledge an offline optimization would have. Empirically, we develop and experiment with two real-time bid adjustment approaches to adapting to the non-stationary nature of the marketplace: one adjusts bids against the real-time constraint satisfaction level using control-theoretic methods, and the other adjusts bids also based on a statistically modeled historical bidding landscape. Finally, we show experimental results with real-world ad serving data.
doi:10.1145/2020408.2020604 dblp:conf/kdd/ChenBAD11 fatcat:oopxy57gtnbevo2jwgy365kh2y
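Under the LP primal-dual view described above, a natural online rule bids the dual-adjusted value of each impression and serves the campaign with the largest positive adjusted value. The sketch below illustrates that idea only; it is not the paper's exact formulation:

```python
def realtime_bid(values, duals):
    """Pick the campaign maximizing the dual-adjusted value
    v_j - beta_j, where v_j is campaign j's value for this
    impression and beta_j is the dual price of its budget
    constraint (taken from an offline LP solution).

    Returns (campaign_index, bid); (None, 0.0) means no campaign
    clears its dual price and the impression is skipped."""
    best, best_score = None, 0.0
    for j, (v, beta) in enumerate(zip(values, duals)):
        score = v - beta
        if score > best_score:
            best, best_score = j, score
    return best, best_score
```

The dual price acts as an opportunity cost: a budget-constrained campaign gets a high beta, so it only wins impressions it values well above that threshold.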

A new approach to geocoding

Pavel Berkhin, Michael R. Evans, Florin Teodorescu, Wei Wu, Dragomir Yankov
2015 Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '15  
Real-time geocoders help users find precise locations in online mapping systems. Geocoding unstructured queries can be difficult, as users may describe map locations by referencing several spatially co-located entities (e.g., a business near a street intersection). Serving these queries is important, as it provides new capabilities and allows for expanding into markets with less structured postal systems. Traditionally, this problem poses significant difficulties for online systems, where latency constraints prevent exhaustive join-based algorithms. Previous work in this area involved natural language processing to segment queries based on known rules, or purely spatial approaches that are difficult to maintain and may have high latency. In this paper, we present a new approach to geocoding, BingGC, that makes fulfillment of extremely diverse geocoding queries possible via a combination of traditional web search technologies and a novel algorithm that uses textual search and spatial joins to quickly find results. It allows resolution of up to s spatially co-located entities in a single query with no pre-computation or rule-based matching. We provide experimental analysis of our system compared against leading online geocoders.
doi:10.1145/2820783.2820827 dblp:conf/gis/BerkhinETWY15 fatcat:zzetlsaxejclzaqd64umnqroaa

Interactive path analysis of web site traffic

Pavel Berkhin, Jonathan D. Becher, Dee Jay Randall
2001 Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '01  
The goal of Path Analysis is to understand visitors' navigation of a Web site. The fundamental analysis component is a path. A path is a finite sequence of elements, typically representing URLs or groups of URLs. A full path is an abstraction of a visit or a session, which can contain attributes described below. Subpaths represent interesting subsequences of the full paths. Path Analysis provides user-configurable extraction, filtering, preprocessing, noise reduction, descriptive statistics, and detailed analysis of three basic specific objects: elements, (sub)paths, and couples of elements. In each case, lists of frequent objects -- subject to particular filtering and sorting -- are available. We call the corresponding interactive tools the Element, Path, and Couple Analyzers. We also allow in-depth exploration of individual elements, paths, and couples: Element Explorer investigates the composition and convergence of traffic through an element and allows conditioning based on the number of preceding/succeeding steps. Path Explorer visualizes the in and out flows of a path and the attrition rate along the path. Couple Explorer presents distinct paths connecting couple elements, along with measures of their association and some additional statistics.
doi:10.1145/502512.502574 fatcat:nxf4qblcpzffdozk7cqubfcrwy

Kinerja Algoritma Kmeans++ pada Pengelompokkan Dokumen Teks Pendek pada Abstrak di Jurusan Teknik Elektro Fakultas Teknik UNJ

Catur Rahma Sistiani, Widodo, Bambang Prasetya Padhi
2018 Pinter: Jurnal Pendidikan Teknik Informatika dan Komputer  
According to Berkhin, Pavel, as cited in Sri Andayani (2007), clustering is the division of data into groups whose objects share the same characteristics.  ... 
doi:10.21009/pinter.2.1.6 fatcat:y2i67va4bnbqvjegsgqov3f3xi
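The K-Means++ algorithm named in the title above differs from plain K-Means in its seeding step: each new center is drawn with probability proportional to the squared distance from each point to its nearest already-chosen center. A minimal sketch of that seeding, for illustration only, not the paper's code:

```python
import random
import numpy as np

def kmeans_pp_seeds(X, k, rng=None):
    """k-means++ seeding: the first center is chosen uniformly at
    random; each further center is drawn with probability
    proportional to the squared distance to the nearest center
    chosen so far, which spreads seeds across the data."""
    rng = rng or random.Random(0)
    centers = [X[rng.randrange(len(X))]]
    while len(centers) < k:
        # squared distance of every point to its nearest center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        # sample an index with probability proportional to d2
        cum = np.cumsum(d2 / d2.sum())
        i = int(np.searchsorted(cum, rng.random()))
        centers.append(X[min(i, len(X) - 1)])
    return np.array(centers)
```

These seeds then initialize the usual Lloyd iterations; the better spread typically yields faster convergence and lower clustering cost than uniform random seeding.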

Mine Blood Donors Information through Improved K-Means Clustering [article]

Bondu Venkateswarlu, Prof G.S.V.Prasad Raju
2013 arXiv   pre-print
Pavel Berkhin [11] performed extensive research in the field of data mining experiments and organized analysis of the blood bank repositories which is helpful to the health professionals for a better  ... 
arXiv:1309.2597v1 fatcat:jyuizvfpeba3xcgttxbhziz2um

Kajian Algoritma GDBScan, Clarans dan Cure untuk Spatial Clustering

Budi Setiyono, Imam Mukhlash
2005 Limits Journal of Mathematics and Its Applications  
Pavel Berkhin classifies clustering algorithms into several categories, including hierarchical methods, partitioning methods, grid-based methods, and methods based on co-occurrence of categorical data  ... 
doi:10.12962/j1829605x.v2i2.1373 fatcat:n5q3o3wfxnem7px6qgppbdughu

Mine Blood Donors Information through Improved K-Means Clustering

Bondu Venkateswarlu, Prasad Raju G.S.V
2013 International Journal of Computational Science and Information Technology  
Pavel Berkhin [11] performed extensive research in the field of data mining experiments and organized analysis of the blood bank repositories which is helpful to the health professionals for a better  ... 
doi:10.5121/ijcsity.2013.1302 fatcat:67vbyh7p2rdirlsxldod6gkzea