On a Model for Integrated Information [article]

Alessandro Epasto, Enrico Nardelli
2009 arXiv   pre-print
In this paper we give a thorough presentation of a model proposed by Tononi et al. for modeling integrated information, i.e. how much information is generated in a system transitioning from one state to the next by the causal interaction of its parts, above and beyond the information given by the sum of its parts. We also provide a more general formulation of this model, independent of the time chosen for the analysis and of the uniformity of the probability distribution at the initial time instant. Finally, we prove that integrated information is null for disconnected systems.
arXiv:1001.0063v1 fatcat:ztblgi2lkfcofp3pxi5heimdve

Fair Correlation Clustering [article]

Sara Ahmadian, Alessandro Epasto, Ravi Kumar, Mohammad Mahdian
2020 arXiv   pre-print
In this paper, we study correlation clustering under fairness constraints. Fair variants of k-median and k-center clustering have been studied recently, and approximation algorithms using a notion called fairlet decomposition have been proposed. We obtain approximation algorithms for fair correlation clustering under several important types of fairness constraints. Our results hinge on obtaining a fairlet decomposition for correlation clustering by introducing a novel combinatorial optimization problem. We define a fairlet decomposition with cost similar to the k-median cost and this allows us to obtain approximation algorithms for a wide range of fairness constraints. We complement our theoretical results with an in-depth analysis of our algorithms on real graphs where we show that fair solutions to correlation clustering can be obtained with limited increase in cost compared to the state-of-the-art (unfair) algorithms.
arXiv:2002.02274v2 fatcat:sriudmddbrcyvlcacznlkvt6na

Submodular Optimization over Sliding Windows [article]

Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, Morteza Zadimoghaddam
2016 arXiv   pre-print
Maximizing submodular functions under cardinality constraints lies at the core of numerous data mining and machine learning applications, including data diversification, data summarization, and coverage problems. In this work, we study this question in the context of data streams, where elements arrive one at a time, and we want to design low-memory and fast update-time algorithms that maintain a good solution. Specifically, we focus on the sliding window model, where we are asked to maintain a solution that considers only the last W items. In this context, we provide the first non-trivial algorithm that maintains a provable approximation of the optimum using space sublinear in the size of the window. In particular we give a (1/3 - ϵ)-approximation algorithm that uses space polylogarithmic in the spread of the values of the elements, Φ, and linear in the solution size k for any constant ϵ > 0. At the same time, processing each element only requires a polylogarithmic number of evaluations of the function itself. When a better approximation is desired, we show a different algorithm that, at the cost of using more memory, provides a (1/2 - ϵ)-approximation and allows a tunable trade-off between average update time and space. This algorithm matches the best known approximation guarantees for submodular optimization in insertion-only streams, a less general formulation of the problem. We demonstrate the efficacy of the algorithms on a number of real world datasets, showing that their practical performance far exceeds the theoretical bounds. The algorithms preserve high quality solutions in streams with millions of items, while storing a negligible fraction of them.
arXiv:1610.09984v1 fatcat:b5aw7falnfenjovpiezc4urga4
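
As a point of comparison for the abstract above, here is a minimal Python sketch of the naive baseline it improves on: keep the entire window of the last W items in memory and rerun the standard greedy heuristic for a coverage objective after every arrival. This is not the paper's algorithm, which uses space sublinear in the window size; the function names `greedy_max_coverage` and `sliding_window_baseline` and the toy stream are illustrative assumptions.

```python
from collections import deque


def greedy_max_coverage(sets, k):
    """Pick up to k sets greedily by marginal coverage gain."""
    chosen, covered = [], set()
    remaining = list(sets)
    for _ in range(k):
        # Set with the largest number of not-yet-covered elements.
        best = max(remaining, key=lambda s: len(s - covered), default=None)
        if best is None or not (best - covered):
            break  # nothing left that adds coverage
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen, covered


def sliding_window_baseline(stream, window, k):
    """Naive baseline: store the last `window` items and rerun greedy each step."""
    recent = deque(maxlen=window)  # old items fall out automatically
    for item in stream:
        recent.append(frozenset(item))
        yield greedy_max_coverage(recent, k)[1]


# Hypothetical toy stream: each item is a set of covered elements.
stream = [{1, 2}, {2, 3}, {4}, {5, 6, 7}, {1}]
coverage_sizes = [len(c) for c in sliding_window_baseline(stream, window=3, k=2)]
print(coverage_sizes)
```

The baseline stores all W items and re-evaluates the objective O(W · k) times per arrival, which is the cost the sublinear-space algorithms in the abstract are designed to avoid.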

Communities, Random Walks, and Social Sybil Defense

Lorenzo Alvisi, Allen Clement, Alessandro Epasto, Silvio Lattanzi, Alessandro Panconesi
2014 Internet Mathematics  
Alessandro Epasto is supported by the Google European Doctoral Fellowship in Algorithms, 2011.  ...  Alessandro Panconesi is partially supported by a Google Faculty Research Award and by the EU FET project MULTIPLEX 317532.  ... 
doi:10.1080/15427951.2013.865685 fatcat:zhgbxp6dqne6xj6wa4dmm7ev2q

Smooth Anonymity for Sparse Binary Matrices [article]

Hossein Esfandiari, Alessandro Epasto, Vahab Mirrokni, Andres Munoz Medina, Sergei Vassilvitskii
2022 arXiv   pre-print
When working with user data, providing well-defined privacy guarantees is paramount. In this work we aim to manipulate and share an entire sparse dataset with a third party privately. Differential privacy has emerged as the gold standard of privacy; however, when it comes to sharing sparse datasets, we prove, as one of our main results, that any differentially private mechanism that maintains a reasonable similarity with the initial dataset is doomed to have a very weak privacy guarantee. Hence we need to opt for other privacy notions, such as k-anonymity, that are better at preserving utility in this context. In this work we present a variation of k-anonymity, which we call smooth k-anonymity, and design simple algorithms that efficiently provide smooth k-anonymity. We further perform an empirical evaluation to back our theoretical guarantees, and show that our algorithm improves the performance in downstream machine learning tasks on anonymized data.
arXiv:2207.06358v1 fatcat:5fvwzsnu7ff2zdgmumtn4ui67i

Optimal Approximation – Smoothness Tradeoffs for Soft-Max Functions [article]

Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Manolis Zampetakis
2020 arXiv   pre-print
[EMZ17] Alessandro Epasto, Vahab Mirrokni, and Morteza Zadimoghaddam. Bicriteria distributed submodular maximization in a few rounds. In SPAA. ACM, 2017.  ... 
arXiv:2010.11450v1 fatcat:2z4noyrbq5gcvjmb47ylb3tw3q

Efficient Approximation for Restricted Biclique Cover Problems

Alessandro Epasto, Eli Upfal
2018 Algorithms  
Covering the edges of a bipartite graph by a minimum set of bipartite complete graphs (bicliques) is a basic graph theoretic problem, with numerous applications. In particular, it is used to characterize parsimonious models of a set of observations (each biclique corresponds to a factor or feature that relates the observations in the two sets of nodes connected by the biclique). The decision version of the minimum biclique cover problem is NP-Complete, and unless P = NP, the cover size cannot be approximated in general within less than a sub-linear factor of the number of nodes (or edges) in the graph. In this work, we consider two natural restrictions to the problem, motivated by practical applications. In the first case, we restrict the number of bicliques a node can belong to. We show that when this number is at least 5, the problem is still NP-hard. In contrast, we show that when nodes belong to no more than two bicliques, the problem has efficient approximations. The second model we consider corresponds to observing a set of independent samples from an unknown model, governed by a possibly large number of factors. The model is defined by a bipartite graph G = (L, R, E), where each node in L is assigned to an arbitrary subset of up to a constant f factors, while the nodes in R (the independent observations) are assigned to random subsets of the set of k factors, where k can grow with the size of the graph. We show that this practical version of the biclique cover problem is amenable to efficient approximations.
doi:10.3390/a11060084 fatcat:mf5haw4krrbyfpgycexkppx3si

Massively Parallel and Dynamic Algorithms for Minimum Size Clustering [article]

Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Peilin Zhong
2021 arXiv   pre-print
In this paper, we study the r-gather problem, a natural formulation of minimum-size clustering in metric spaces. The goal of r-gather is to partition n points into clusters such that each cluster has size at least r, and the maximum radius of the clusters is minimized. This additional constraint completely changes the algorithmic nature of the problem, and many clustering techniques fail. Also, previous dynamic and parallel algorithms do not achieve desirable complexity. We propose algorithms both in the Massively Parallel Computation (MPC) model and in the dynamic setting. Our MPC algorithm handles input points from the Euclidean space ℝ^d. It computes an O(1)-approximate solution of r-gather in O(log^ε n) rounds using total space O(n^{1+γ} · d) for arbitrarily small constants ε, γ > 0. In addition, our algorithm is fully scalable, i.e., there is no lower bound on the memory per machine. Our dynamic algorithm maintains an O(1)-approximate r-gather solution under insertions/deletions of points in a metric space with doubling dimension d. The update time is r · 2^{O(d)} · log^{O(1)} Δ and the query time is 2^{O(d)} · log^{O(1)} Δ, where Δ is the ratio between the largest and the smallest distance.
arXiv:2106.02685v1 fatcat:ei62ajztrjcgzm2cbnkle7pyh4
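
To make the r-gather objective in the abstract above concrete, the following Python sketch checks that a given clustering is feasible (every cluster has at least r points) and returns its cost (the maximum cluster radius). It only evaluates a candidate solution and does not implement the paper's MPC or dynamic algorithms. The function name `r_gather_cost`, the convention that each cluster is centered at one of the input points, and the sample points are assumptions made for illustration.

```python
import math


def r_gather_cost(points, clusters, r):
    """Return the maximum cluster radius of a feasible r-gather clustering.

    `clusters` maps a center index to the list of point indices assigned to it.
    Raises ValueError if the clusters do not partition the points or if some
    cluster has fewer than r members.
    """
    assigned = sorted(i for members in clusters.values() for i in members)
    if assigned != list(range(len(points))):
        raise ValueError("clusters must partition all points")
    worst = 0.0
    for center, members in clusters.items():
        if len(members) < r:
            raise ValueError(f"cluster at {center} has fewer than {r} points")
        for i in members:
            # Radius is measured from the (assumed) center point of the cluster.
            worst = max(worst, math.dist(points[center], points[i]))
    return worst


# Hypothetical toy instance in the plane: two well-separated groups of three.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 12)]
cost = r_gather_cost(points, {0: [0, 1, 2], 3: [3, 4, 5]}, r=3)
print(cost)
```

With r = 3 the two-cluster solution is feasible, whereas requesting r = 4 on the same clustering raises an error, which illustrates how the minimum-size constraint can force far-apart points into one cluster.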

Fair Hierarchical Clustering [article]

Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, Yuyan Wang
2020 arXiv   pre-print
As machine learning has become more prevalent, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering. In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to optimize a specific objective. For various natural objectives, we obtain simple, efficient algorithms to find a provably good fair hierarchical clustering. Empirically, we show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.
arXiv:2006.10221v2 fatcat:v3npk2q3mbdcxirxlvqmn33lqa

Signals from the crowd

Marco V. Barbera, Alessandro Epasto, Alessandro Mei, Vasile C. Perta, Julinda Stefa
2013 Proceedings of the 2013 conference on Internet measurement conference - IMC '13  
The ever-increasing ubiquity of WiFi access points, coupled with the diffusion of smartphones, suggests that Internet access at any time and anywhere will soon become (if it has not already) a reality. Even in the presence of 3G connectivity, our devices are built to switch automatically to WiFi networks so as to improve the user experience. Most of the time, this is achieved by recurrently broadcasting automatic connectivity requests (known as Probe Requests) to known access points (APs), e.g., "Home WiFi", "Campus WiFi", and so on. In a large gathering of people, the number of these probes can be very high. This scenario raises a natural question: "Can significant information on the social structure of a large crowd and on its socioeconomic status be inferred by looking at smartphone probes?". In this work we give a positive answer to this question. We organized a 3-month-long campaign, through which we collected around 11M probes sent by more than 160K different devices. During the campaign we targeted national and international events that attracted large crowds as well as other gatherings of people. Then, we present a simple and automatic methodology to build the underlying social graph of the smartphone users, starting from their probes. We do so for each of our target events, and find that they all feature social-network properties. In addition, we show that, by looking at the probes in an event, we can learn important sociological aspects of its participants: language, vendor adoption, and so on.
doi:10.1145/2504730.2504742 dblp:conf/imc/BarberaEMPS13 fatcat:wbbzkeutmfhg3bvwclfjaaod6i

Spreading rumours without the network

Paweł Brach, Alessandro Epasto, Alessandro Panconesi, Piotr Sankowski
2014 Proceedings of the second edition of the ACM conference on Online social networks - COSN '14  
In this paper we tackle the following question: is it possible to predict the characteristics of the evolution of an epidemic process in a social network on the basis of the degree distribution alone? We answer this question affirmatively for several diffusion processes (Push-Pull, Broadcast and SIR) by showing that it is possible to predict with good accuracy their average evolution. We do this by developing a space-efficient predictor that makes it possible to handle very large networks with very limited computational resources. Our experiments show that the prediction is surprisingly good for many instances of real-world networks. The class of real-world networks for which this happens can be characterized in terms of their neighbourhood function, which turns out to be similar to that of random networks. Finally, we analyse real instances of rumour spreading in Twitter and observe that our model describes their evolution qualitatively well.
doi:10.1145/2660460.2660472 dblp:conf/cosn/BrachEPS14 fatcat:tyk35crff5exlku3ajznbefjvu

Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank [article]

Alessandro Epasto, Vahab Mirrokni, Bryan Perozzi, Anton Tsitsulin, Peilin Zhong
2022 arXiv   pre-print
Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: the difference of only one edge may cause a big change in the PPR vector, potentially leaking private user data. In this work, we propose an algorithm which outputs an approximate PPR and has provably bounded sensitivity to input edges. In addition, we prove that our algorithm achieves similar accuracy to non-private algorithms when the input graph has large degrees. Our sensitivity-bounded PPR directly implies private algorithms for several tools of graph learning, such as differentially private (DP) PPR ranking, DP node classification, and DP node embedding. To complement our theoretical analysis, we also empirically verify the practical performance of our algorithms.
arXiv:2207.06944v1 fatcat:ux2ebslsxbaonbilbruxx4eozm
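
The edge-sensitivity claim in the abstract above can be observed on a tiny graph with a standard (non-private) power-iteration PPR. The Python sketch below is only an illustration of that sensitivity, not the sensitivity-bounded algorithm the paper proposes; the function name `personalized_pagerank`, the restart probability `alpha=0.15`, the treatment of dangling nodes, and the three-node example graphs are all assumptions.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Plain power iteration for PPR with restart probability alpha.

    `adj` maps each node to the list of its out-neighbours.
    """
    nodes = list(adj)
    p = {v: 0.0 for v in nodes}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart mass returns to the source
        for u in nodes:
            if adj[u]:
                share = (1 - alpha) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Assumed convention: a dangling node sends its walk to the source.
                nxt[source] += (1 - alpha) * p[u]
        p = nxt
    return p


# Hypothetical example: a path 0 - 1 - 2, then the same graph plus edge 0 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
ppr_path = personalized_pagerank(path, source=0)
ppr_triangle = personalized_pagerank(triangle, source=0)
l1_change = sum(abs(ppr_path[v] - ppr_triangle[v]) for v in path)
print(round(l1_change, 3))
```

On this toy input a single added edge moves a substantial share of the probability mass, which is the kind of change a differentially private mechanism has to mask with noise.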

Improved Sliding Window Algorithms for Clustering and Coverage via Bucketing-Based Sketches [article]

Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Peilin Zhong
2021 arXiv   pre-print
Streaming computation plays an important role in large-scale data analysis. The sliding window model is a model of streaming computation which also captures the recency of the data. In this model, data arrives one item at a time, but only the latest W data items are considered for a particular problem. The goal is to output a good solution at the end of the stream by maintaining a small summary during the stream. In this work, we propose a new algorithmic framework for designing efficient sliding window algorithms via bucketing-based sketches. Based on this new framework, we develop space-efficient sliding window algorithms for k-cover, k-clustering and diversity maximization problems. For each of the above problems, our algorithm achieves (1±ε)-approximation. Compared with the previous work, it improves both the approximation ratio and the space.
arXiv:2110.15533v1 fatcat:ysxbxvuyr5fnrk2vdllk33abye

Sliding Window Algorithms for k-Clustering Problems [article]

Michele Borassi, Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, Morteza Zadimoghaddam
2020 arXiv   pre-print
Epasto, S. Lattanzi, S. Vassilvitskii, and M. Zadimoghaddam. Submodular optimization over sliding windows.  ...  ., 2016, Epasto et al., 2017, graph sparsification [Crouch et al., 2013], minimizing the enclosing ball Wang et al.  ... 
arXiv:2006.05850v2 fatcat:q3423iziljc3hkzjy5ffn5yeje

Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection [article]

Sara Ahmadian, Vaggos Chatziafratis, Alessandro Epasto, Euiwoong Lee, Mohammad Mahdian, Konstantin Makarychev, Grigory Yaroslavtsev
2019 arXiv   pre-print
Hierarchical Clustering is an unsupervised data analysis method which has been widely used for decades. Despite its popularity, it had an underdeveloped analytical foundation and to address this, Dasgupta recently introduced an optimization viewpoint of hierarchical clustering with pairwise similarity information that spurred a line of work shedding light on old algorithms (e.g., Average-Linkage), but also designing new algorithms. Here, for the maximization dual of Dasgupta's objective (introduced by Moseley-Wang), we present polynomial-time 0.4246-approximation algorithms that use Max-Uncut Bisection as a subroutine. The previous best worst-case approximation factor in polynomial time was 0.336, improving only slightly over Average-Linkage which achieves 1/3. Finally, we complement our positive results by providing APX-hardness (even for 0-1 similarities), under the Small Set Expansion hypothesis.
arXiv:1912.06983v1 fatcat:bq3ppbj5ybdwrg6iwvmwubukpu