A Survey on PageRank Computing
2005
Internet Mathematics
Contrary to some references, const ≠ (1 − c) but depends on a sink term in Equation (2.8). ...
The authority weight of a page is an aggregated significance of the hubs that point to it ("beauty lies in the eye of the beholder"), while the hub weight of ...
doi:10.1080/15427951.2005.10129098
fatcat:ahbwdzfelzbfrcpzsoqae4w2za
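The sink-term remark above is easiest to see in a power-iteration sketch: when dangling (sink) pages are present, the probability mass they leak is re-injected, so the additive constant is (1 − c) plus the redistributed sink mass rather than (1 − c) alone. Below is a minimal illustration; the toy graph, the damping factor c = 0.85, and all variable names are our own assumptions, not the survey's notation.

```python
# Minimal PageRank power iteration with a dangling (sink) page.
# Assumed toy graph: node -> list of out-links; node 2 is a sink.
links = {0: [1, 2], 1: [2], 2: []}
n = len(links)
c = 0.85                      # damping factor
p = [1.0 / n] * n             # uniform start

for _ in range(100):
    new_p = [0.0] * n
    sink_mass = sum(p[u] for u, outs in links.items() if not outs)
    for u, outs in links.items():
        for v in outs:
            new_p[v] += c * p[u] / len(outs)
    # const = (1 - c) + c * sink_mass, i.e. it depends on the sink term
    const = (1.0 - c) + c * sink_mass
    new_p = [x + const / n for x in new_p]
    p = new_p

print(p, sum(p))              # sums to 1 because the sink mass is re-injected
```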
Bookmark-Coloring Algorithm for Personalized PageRank Computing
2006
Internet Mathematics
We introduce a novel bookmark-coloring algorithm (BCA) that computes authority weights over the web pages utilizing the web hyperlink structure. The computed vector (BCV) is similar to the PageRank vector defined for a page-specific teleportation. Meanwhile, BCA is very fast, and BCV is sparse. BCA also has important algebraic properties. If several BCVs corresponding to a set of pages (called hub) are known, they can be leveraged in computing arbitrary BCV via a straightforward algebraic
doi:10.1080/15427951.2006.10129116
fatcat:2ggnlhicrzdqzfpfzvmeb6p3uu
... s and hub BCVs can be efficiently computed and encoded.
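A rough sketch of the bookmark-coloring idea in its paint-spreading ("push") form follows; the retained-fraction convention, the tolerance eps, and all names are assumptions of ours, not the paper's notation. Each page keeps a (1 − c) share of the paint it receives and forwards the remaining c share uniformly along its out-links, which yields a sparse approximation of the page-specific PageRank vector.

```python
from collections import defaultdict

def bca(links, bookmark, c=0.85, eps=1e-6):
    """Sketch of bookmark-coloring: returns a sparse authority vector (BCV)."""
    bcv = defaultdict(float)       # paint retained by each page
    residual = defaultdict(float)  # paint still waiting to be spread
    residual[bookmark] = 1.0
    queue = [bookmark]
    while queue:
        u = queue.pop()
        paint = residual.pop(u, 0.0)
        if paint < eps:
            continue
        bcv[u] += (1.0 - c) * paint        # page keeps a (1 - c) share
        outs = links.get(u, [])
        if not outs:
            continue                        # sink: forwarded paint is dropped in this sketch
        share = c * paint / len(outs)
        for v in outs:
            if residual[v] < eps <= residual[v] + share:
                queue.append(v)             # activate v once its residual crosses eps
            residual[v] += share
    return dict(bcv)

links = {0: [1, 2], 1: [0], 2: [0]}
print(bca(links, bookmark=0))
```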
Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems
[article]
2015
arXiv
pre-print
We analyze the problem of using Explore-Exploit techniques to improve precision in multi-result ranking systems such as web search, query autocompletion and news recommendation. Adopting an exploration policy directly online, without understanding its impact on the production system, may have unwanted consequences - the system may sustain large losses, create user dissatisfaction, or collect exploration data which does not help improve ranking quality. An offline framework is thus necessary to
arXiv:1504.07662v1
fatcat:72v5xrckzzczdjqulzwud4ufza
let us decide what policy we should apply in a production environment, and how, to ensure a positive outcome. Here, we describe such an offline framework. Using the framework, we study a popular exploration policy - Thompson sampling. We show that there are different ways of implementing it in multi-result ranking systems, each having different semantic interpretation and leading to different results in terms of sustained click-through-rate (CTR) loss and expected model improvement. In particular, we demonstrate that Thompson sampling can act as an online learner optimizing CTR, which in some cases can lead to an interesting outcome: lift in CTR during exploration. The observation is important for production systems as it suggests that one can get both valuable exploration data to improve ranking performance in the long run, and at the same time increase CTR while exploration lasts.
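As a concrete, purely illustrative example of one way Thompson sampling can be plugged into a multi-result ranker, the sketch below keeps a Beta posterior over each candidate's click probability, samples from the posteriors, and orders results by the sampled values. The update rule, the slate size k = 3, and the candidate names are our assumptions, not the paper's setup.

```python
import random

class ThompsonRanker:
    """Beta-Bernoulli Thompson sampling over ranking candidates (sketch)."""
    def __init__(self, candidates):
        self.posterior = {c: [1.0, 1.0] for c in candidates}  # Beta(alpha, beta) priors

    def rank(self, k=3):
        sampled = {c: random.betavariate(a, b) for c, (a, b) in self.posterior.items()}
        return sorted(sampled, key=sampled.get, reverse=True)[:k]

    def update(self, shown, clicked):
        for c in shown:
            a, b = self.posterior[c]
            if c in clicked:
                self.posterior[c] = [a + 1.0, b]      # clicked: bump alpha
            else:
                self.posterior[c] = [a, b + 1.0]      # shown but not clicked: bump beta

ranker = ThompsonRanker(["doc_a", "doc_b", "doc_c", "doc_d"])
slate = ranker.rank()
ranker.update(shown=slate, clicked={"doc_b"} & set(slate))  # simulated feedback
print(slate)
```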
LiveMaps
2017
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '17
Image search is a popular application on web search engines. Issuing a location-related query in image search engines often returns multiple images of maps among the top ranked results. Traditionally, clicking on such images either opens the image in a new browser tab or takes users to a web page containing the image. However, finding the area of intent on an interactive web map is a manual process. In this paper, we describe a novel system, LiveMaps, for analyzing and retrieving an appropriate
doi:10.1145/3077136.3080673
dblp:conf/sigir/EvansYBYTW17
fatcat:zoeobgh6b5f4rl7h2tdeyov7ee
viewport for a given image of a map. This allows annotation of images of maps returned by image search engines, allowing users to directly open a link to an interactive map centered on the location of interest. LiveMaps works in several stages. It first checks whether the input image represents a map. If yes, then the system attempts to identify what geographical area this map image represents. In the process, we use textual as well as visual information extracted from the image. Finally, we construct an interactive map object capturing the geographical area inferred for the image. Evaluation results on a dataset of high-ranked location images indicate our system constructs very precise map representations while also achieving good levels of coverage.
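The staged flow described above can be summarized as a small pipeline skeleton; the function names, the placeholder classifiers, and the bounding-box viewport representation are illustrative stand-ins, not the LiveMaps implementation.

```python
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (south, west, north, east) in degrees

def looks_like_map(image_bytes: bytes) -> bool:
    # Stage 1 (placeholder): a classifier would decide whether the image is a map.
    return True

def infer_region(image_bytes: bytes, page_text: str) -> Optional[BBox]:
    # Stage 2 (placeholder): combine textual and visual signals to guess the area.
    return (47.5, -122.45, 47.75, -122.2)   # e.g. a Seattle-area box

def build_viewport(image_bytes: bytes, page_text: str) -> Optional[BBox]:
    """Sketch of the staged flow: classify, geolocate, then emit a viewport."""
    if not looks_like_map(image_bytes):
        return None
    return infer_region(image_bytes, page_text)

print(build_viewport(b"...", "map of Seattle"))
```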
Learning Simple Relations: Theory and Applications
[chapter]
2002
Proceedings of the 2002 SIAM International Conference on Data Mining
In addition to classic clustering algorithms, many different approaches to clustering are emerging for objects of special nature. In this article we deal with the grouping of rows and columns of a matrix with non-negative entries. Two rows (or columns) are considered similar if corresponding cross-distributions are close. This grouping is a dual clustering of two sets of elements, row and column indices. The introduced approach is based on the minimization of reduction of mutual information
doi:10.1137/1.9781611972726.25
dblp:conf/sdm/BerkhinB02
fatcat:rept7emfxzggjd2v6iamjoqjii
contained in a matrix that represents the relationship between two sets of elements. Our clustering approach has many parallels with K-Means clustering due to certain common algebraic properties. The obtained results have many applications, including the grouping of Web visit data.
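To make the objective concrete: the quantity being tracked is the mutual information of the row/column distribution, and a grouping is scored by how little of it is lost after rows and columns are collapsed into clusters. The sketch below computes that loss for a fixed assignment; the toy matrix and the cluster assignments are arbitrary examples, not the paper's algorithm.

```python
import math

def mutual_information(matrix):
    """Mutual information of the joint row/column distribution of a non-negative matrix."""
    total = sum(sum(row) for row in matrix)
    row_m = [sum(row) / total for row in matrix]
    col_m = [sum(matrix[i][j] for i in range(len(matrix))) / total
             for j in range(len(matrix[0]))]
    mi = 0.0
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            p = v / total
            if p > 0:
                mi += p * math.log(p / (row_m[i] * col_m[j]))
    return mi

def aggregate(matrix, row_groups, col_groups):
    """Collapse the matrix onto row and column clusters."""
    r, c = max(row_groups) + 1, max(col_groups) + 1
    out = [[0.0] * c for _ in range(r)]
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            out[row_groups[i]][col_groups[j]] += v
    return out

m = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
grouped = aggregate(m, row_groups=[0, 0, 1, 1], col_groups=[0, 0, 1, 1])
# The co-clustering objective: minimize this drop in mutual information.
print(mutual_information(m) - mutual_information(grouped))
```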
Automating exploratory data analysis for efficient data mining
2000
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '00
Having access to large data sets for the purpose of predictive data mining does not guarantee good models, even when the size of the training data is virtually unlimited. Instead, careful data preprocessing is required, including data cleansing, handling missing values, attribute representation and encoding, and generating derived attributes. In particular, the selection of the most appropriate subset of attributes to include is a critical step in building an accurate and efficient model. We
doi:10.1145/347090.347179
dblp:conf/kdd/BecherBF00
fatcat:a6siexjp5jdondz2cebddk2k4y
describe an automated approach to the exploration, preprocessing, and selection of the optimal attribute subset, whose goal is to simplify the KDD process and dramatically shorten the time to build a model. Our implementation finds inappropriate and suspicious attributes, performs target dependency analysis, determines optimal attribute encoding, generates new derived attributes, and provides a flexible approach to attribute selection. We present results generated by an industrial KDD environment called the Accrue Decision Series on several real-world Web data sets.
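One simple way to picture the target dependency analysis step is to score each candidate attribute by how much information it carries about the target and keep the top scorers. The scoring function and toy records below are our illustration only, not the Accrue Decision Series implementation.

```python
import math
from collections import Counter

def dependency_score(attribute, target):
    """Mutual information between a categorical attribute and the target (sketch)."""
    n = len(target)
    joint = Counter(zip(attribute, target))
    p_a = Counter(attribute)
    p_t = Counter(target)
    return sum((c / n) * math.log((c / n) / ((p_a[a] / n) * (p_t[t] / n)))
               for (a, t), c in joint.items())

rows = [  # toy records: (browser, country, converted?)
    ("chrome", "us", 1), ("chrome", "us", 1), ("safari", "us", 0),
    ("safari", "uk", 0), ("chrome", "uk", 1), ("safari", "uk", 0),
]
target = [r[2] for r in rows]
scores = {name: dependency_score([r[i] for r in rows], target)
          for i, name in enumerate(["browser", "country"])}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))  # keep top attributes
```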
Interoperability ranking for mobile applications
2013
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13
At present, most major app marketplaces perform ranking and recommendation based on search relevance features or marketplace "popularity" statistics. For instance, they check similarity between app descriptions and user search queries, or rank-order the apps according to statistics such as number of downloads, user ratings etc. Rankings derived from such signals, important as they are, are insufficient to capture the dynamics of the apps ecosystem. Consider for example the questions: In a
doi:10.1145/2484028.2484122
dblp:conf/sigir/YankovBS13
fatcat:cg5wabjwhbesjgiqz4nw7blhby
particular user context, is app A more likely to be launched than app B? Or does app C provide complementary functionality to app D? Answering these questions requires identifying and analyzing the dependencies between apps in the apps ecosystem. Ranking mechanisms that reflect such interdependences are thus necessary. In this paper we introduce the notion of interoperability ranking for mobile applications. Intuitively, apps with high rank are those inferred to be important to other apps in the ecosystem. We demonstrate how interoperability ranking can help answer the above questions and also provide the basis for solving several problems which are rapidly attracting the attention of both researchers and the industry, such as building personalized real-time app recommender systems or intelligent mobile agents. We describe a set of methods for computing interoperability ranks and analyze their performance on real data from the Windows Phone app marketplace.
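As a toy illustration of the kind of dependency signal such a ranking can build on (this is our own simplification, not one of the paper's methods): estimate how often one app is launched right after another from session logs, and score each app by how much incoming "dependency" mass it attracts.

```python
from collections import Counter, defaultdict

sessions = [                      # hypothetical app-launch sequences
    ["mail", "calendar", "maps"],
    ["browser", "maps", "mail"],
    ["mail", "maps"],
]

# Count app -> next-app transitions.
transitions = Counter()
out_totals = Counter()
for s in sessions:
    for a, b in zip(s, s[1:]):
        transitions[(a, b)] += 1
        out_totals[a] += 1

# Interoperability-style score: total transition probability flowing into each app.
score = defaultdict(float)
for (a, b), c in transitions.items():
    score[b] += c / out_totals[a]

print(sorted(score.items(), key=lambda kv: kv[1], reverse=True))
```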
Fast and cost-efficient bid estimation for contextual ads
2012
Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion
We study the problem of estimating the value of a contextual ad impression, based upon which an ad network bids on an exchange. The ad impression opportunity would materialize into revenue only if the ad network wins the impression and a user clicks on the ads, both rare events, especially in an open exchange for contextual ads. Given a low revenue expectation and the elusive nature of predicting weak-signal click-through rates, the computational cost incurred by bid estimation shall be
doi:10.1145/2187980.2188085
dblp:conf/www/ChenBLWY12
fatcat:qjtrmbkvwjar7fuac3lqf275e4
cautiously justified. We developed and deployed a novel impression valuation model, which is expected to reduce the computational cost by 95% and hence more than double the profit. Our approach is highly economized through a fast implementation of kNN regression that primarily leverages low-dimensional sell-side data (user and publisher). We also address the cold-start problem, or the exploration vs. exploitation requirement, by Bayesian smoothing using a Beta prior, and adapt to the temporal dynamics using an autoregressive model.
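The Bayesian-smoothing step can be pictured as follows: pool the clicks and impressions of an impression's nearest sell-side neighbors and shrink the raw CTR toward a Beta prior, so that sparse segments do not produce wild estimates. The prior parameters, the neighbor counts, and the value-per-click figure below are illustrative assumptions only.

```python
def smoothed_ctr(neighbors, alpha=2.0, beta=200.0):
    """Beta-prior smoothing of the CTR pooled over kNN neighbors (sketch)."""
    clicks = sum(n["clicks"] for n in neighbors)
    impressions = sum(n["impressions"] for n in neighbors)
    # Posterior mean of a Beta(alpha, beta) prior updated with the pooled counts.
    return (clicks + alpha) / (impressions + alpha + beta)

# Hypothetical k nearest neighbors in the (user, publisher) feature space.
knn = [
    {"clicks": 1, "impressions": 400},
    {"clicks": 0, "impressions": 250},
    {"clicks": 2, "impressions": 600},
]
ctr = smoothed_ctr(knn)
bid = ctr * 0.40        # expected impression value, assuming $0.40 value per click
print(round(ctr, 5), round(bid, 5))
```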
Real-time bidding algorithms for performance-based display ad allocation
2011
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11
We describe a real-time bidding algorithm for performance-based display ad allocation. A central issue in performance display advertising is matching campaigns to ad impressions, which can be formulated as a constrained optimization problem that maximizes revenue subject to constraints such as budget limits and inventory availability. The current practice is to solve the optimization problem offline at a tractable level of impression granularity (e.g., the placement level), and to serve ads
doi:10.1145/2020408.2020604
dblp:conf/kdd/ChenBAD11
fatcat:oopxy57gtnbevo2jwgy365kh2y
online based on the precomputed static delivery scheme. Although this offline approach takes a global view to achieve optimality, it fails to scale to ad delivery decision making at an individual impression level. Therefore, we propose a real-time bidding algorithm that enables fine-grained impression valuation (e.g., targeting users with real-time conversion data), and adjusts the value-based bid according to a real-time constraint snapshot (e.g., budget consumption level). Theoretically, we show that under a linear programming (LP) primal-dual formulation, the simple real-time bidding algorithm is indeed an online solver to the original primal problem by taking the optimal solution to the dual problem as input. In other words, the online algorithm guarantees the offline optimality given the same level of knowledge an offline optimization would have. Empirically, we develop and experiment with two real-time bid adjustment approaches to adapting to the non-stationary nature of the marketplace: one adjusts bids against the real-time constraint satisfaction level using control-theoretic methods, and the other adjusts bids based on a statistically modeled historical bidding landscape. Finally, we show experimental results with real-world ad serving data.
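A very small sketch of the control-theoretic flavor of bid adjustment follows: a plain proportional controller that scales bids to keep spend on a linear pacing schedule. The gain, the target schedule, and the simulated auction are our own assumptions, not the paper's algorithms.

```python
import random

budget, spend = 1000.0, 0.0
base_bid, multiplier, gain = 1.0, 1.0, 0.5
steps = 1000

for t in range(1, steps + 1):
    target_spend = budget * t / steps          # ideal linear pacing
    error = (target_spend - spend) / budget    # positive when behind schedule
    multiplier = max(0.1, multiplier + gain * error)
    bid = base_bid * multiplier
    clearing_price = random.uniform(0.5, 1.5)  # simulated exchange competition
    if bid >= clearing_price and spend + clearing_price <= budget:
        spend += clearing_price                # win and pay (second-price style)

print(round(spend, 2), "spent of", budget)
```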
A new approach to geocoding
2015
Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '15
Real-time geocoders help users find precise locations in online mapping systems. Geocoding unstructured queries can be difficult, as users may describe map locations by referencing several spatially co-located entities (e.g., a business near a street intersection). Serving these queries is important as it provides new capabilities and allows for expanding in markets with less structured postal systems. Traditionally, this problem poses significant difficulties for online systems where latency
doi:10.1145/2820783.2820827
dblp:conf/gis/BerkhinETWY15
fatcat:zzetlsaxejclzaqd64umnqroaa
constraints prevent exhaustive join-based algorithms. Previous work in this area involved natural language processing to segment queries based on known rules, or purely spatial approaches that are difficult to maintain and may have high latency. In this paper, we present a new approach to geocoding, BingGC, that makes fulfillment of extremely diverse geocoding queries possible via a combination of traditional web search technologies and a novel algorithm that uses textual search and spatial joins to quickly find results. It allows resolution of up to s spatially co-located entities in a single query with no pre-computation or rule-based matching. We provide experimental analysis of our system compared against leading online geocoders.
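The "textual search plus spatial join" idea can be sketched in a few lines: retrieve candidate entities for each query segment by name, then keep only combinations whose locations are mutually close. The toy index, the pre-segmented query, and the 500 m threshold are illustrative assumptions, not BingGC's actual pipeline.

```python
from itertools import product
from math import hypot

# Hypothetical entity index: name -> candidate (x, y) positions in meters.
index = {
    "starbucks": [(100.0, 200.0), (5000.0, 7000.0)],
    "5th ave":   [(150.0, 180.0), (9000.0, 1000.0)],
}

def geocode(segments, max_dist=500.0):
    """Join textually matched candidates, keeping spatially co-located combinations."""
    candidate_sets = [index.get(s, []) for s in segments]
    results = []
    for combo in product(*candidate_sets):
        if all(hypot(a[0] - b[0], a[1] - b[1]) <= max_dist
               for a in combo for b in combo):
            results.append(combo)
    return results

print(geocode(["starbucks", "5th ave"]))   # -> the pair near (100, 200) / (150, 180)
```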
Interactive path analysis of web site traffic
2001
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '01
The goal of Path Analysis is to understand visitors' navigation of a Web site. The fundamental analysis component is a path. A path is a finite sequence of elements, typically representing URLs or groups of URLs. A full path is an abstraction of a visit or a session, which can contain attributes described below. Subpaths represent interesting subsequences of the full paths. Path Analysis provides user-configurable extraction, filtering, preprocessing, noise reduction, descriptive statistics and
doi:10.1145/502512.502574
fatcat:nxf4qblcpzffdozk7cqubfcrwy
detailed analysis of three basic specific objects: elements, (sub)paths, and couples of elements. In each case, lists of frequent objects, subject to particular filtering and sorting, are available. We call the corresponding interactive tools Element, Path, and Couple Analyzers. We also allow in-depth exploration of individual elements, paths, and couples: Element Explorer investigates composition and convergence of traffic through an element and allows conditioning based on the number of preceding/succeeding steps. Path Explorer visualizes in and out flows of a path and the attrition rate along the path. Couple Explorer presents distinct paths connecting couple elements, along with measures of their association and some additional statistics.
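A small sketch of the path-centric statistics involved (frequent subpaths of a fixed length, and the attrition rate along a given path) is shown below; the sessions and the length-2 window are toy assumptions, not the tool's implementation.

```python
from collections import Counter

sessions = [                          # hypothetical visit paths (sequences of URLs)
    ["home", "products", "cart", "checkout"],
    ["home", "products", "cart"],
    ["home", "search", "products"],
]

# Frequent subpaths of length 2.
subpaths = Counter(tuple(s[i:i + 2]) for s in sessions for i in range(len(s) - 1))
print(subpaths.most_common(3))

# Attrition along a given path: fraction of sessions lost at each step.
path = ["home", "products", "cart", "checkout"]
reach = [sum(1 for s in sessions if s[:k] == path[:k]) for k in range(1, len(path) + 1)]
attrition = [1 - b / a if a else 0.0 for a, b in zip(reach, reach[1:])]
print(reach, attrition)
```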
Kinerja Algoritma Kmeans++ pada Pengelompokkan Dokumen Teks Pendek pada Abstrak di Jurusan Teknik Elektro Fakultas Teknik UNJ
2018
Pinter: Jurnal Pendidikan Teknik Informatika dan Komputer
Clustering: according to Berkhin, Pavel, as cited in Sri Andayani (2007), clustering means dividing data into groups whose objects share the same characteristics. ...
doi:10.21009/pinter.2.1.6
fatcat:y2i67va4bnbqvjegsgqov3f3xi
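Since the entry above applies K-Means++ to short-text clustering, here is a minimal sketch of the K-Means++ seeding step itself; the 2-D toy points and k = 2 are arbitrary choices, not the paper's data or configuration.

```python
import random

def kmeans_pp_seeds(points, k):
    """K-Means++ seeding: pick each new center with probability proportional to D^2."""
    centers = [random.choice(points)]
    while len(centers) < k:
        d2 = [min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers)
              for p in points]
        total = sum(d2)
        r, acc = random.uniform(0, total), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_pp_seeds(points, k=2))
```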
Mine Blood Donors Information through Improved K-Means Clustering
[article]
2013
arXiv
pre-print
Pavel Berkhin [11] performed extensive research in the field of data mining experiments and organized analysis of the blood bank repositories, which is helpful to health professionals for a better ...
arXiv:1309.2597v1
fatcat:jyuizvfpeba3xcgttxbhziz2um
Kajian Algoritma GDBScan, Clarans dan Cure untuk Spatial Clustering
2005
Limits Journal of Mathematics and Its Applications
Pavel Berkhin classifies clustering algorithms into several categories, among them hierarchical methods, partitioning methods, grid-based methods, and methods based on co-occurrence of categorical data ...
doi:10.12962/j1829605x.v2i2.1373
fatcat:n5q3o3wfxnem7px6qgppbdughu
Mine Blood Donors Information through Improved K-Means Clustering
2013
International Journal of Computational Science and Information Technology
Pavel Berkhin [11] performed extensive research in the field of data mining experiments and organized analysis of the blood bank repositories, which is helpful to health professionals for a better ...
doi:10.5121/ijcsity.2013.1302
fatcat:67vbyh7p2rdirlsxldod6gkzea