
Responsible Scoring Mechanisms Through Function Sampling [article]

Abolfazl Asudeh, H. V. Jagadish
2019 arXiv   pre-print
Human decision-makers often receive assistance from data-driven algorithmic systems that provide a score for evaluating objects, including individuals. The scores are generated by a function (mechanism) that takes a set of features as input and generates a score. The scoring functions are either machine-learned or human-designed and can be used for different decision purposes such as ranking or classification. Given the potential impact of these scoring mechanisms on individuals' lives and on society, it is important to make sure these scores are computed responsibly. Hence we need tools for responsible scoring mechanism design. In this paper, focusing on linear scoring functions, we highlight the importance of unbiased function sampling and perturbation in the function space for devising such tools. We provide unbiased samplers for the entire function space, as well as for a θ-vicinity around a given function. We then illustrate the value of these samplers for designing effective algorithms in three diverse problem scenarios in the context of ranking. Finally, as a fundamental method for designing responsible scoring mechanisms, we propose a novel approach for approximating the construction of the arrangement of hyperplanes. Despite the exponential complexity of an arrangement in the number of dimensions, using function sampling, our algorithm is linear in the number of samples and hyperplanes, and independent of the number of dimensions.
arXiv:1911.10073v1 fatcat:x4ug2gd7y5ctzpsui2rs5g5itq
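
The entry above relies on two primitives: drawing linear scoring functions uniformly at random from the whole function space, and drawing them from a θ-vicinity of a given function. The sketch below shows one standard way to realize both with numpy (Gaussian normalization plus rejection sampling); the function names and the rejection-based vicinity sampler are my illustration, not necessarily the samplers proposed in the paper.

```python
import numpy as np

def sample_unit_functions(d, n, rng=None):
    """Draw n weight vectors uniformly at random from the unit sphere in R^d.

    Normalizing i.i.d. Gaussian vectors gives a direction that is uniform
    over all linear scoring functions f(x) = w.x, since only the direction
    of w matters for the induced ranking.
    """
    rng = np.random.default_rng(rng)
    w = rng.standard_normal((n, d))
    return w / np.linalg.norm(w, axis=1, keepdims=True)

def sample_vicinity(w0, theta, n, rng=None):
    """Rejection-sample directions within angle theta (radians) of w0.

    Only a sketch: for small theta this wastes many candidates, and the
    paper's vicinity sampler is presumably more efficient.
    """
    rng = np.random.default_rng(rng)
    w0 = np.asarray(w0, dtype=float)
    w0 = w0 / np.linalg.norm(w0)
    accepted = []
    while len(accepted) < n:
        cand = sample_unit_functions(w0.size, n, rng)
        angles = np.arccos(np.clip(cand @ w0, -1.0, 1.0))
        accepted.extend(cand[angles <= theta])
    return np.asarray(accepted[:n])
```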

Fair Active Learning [article]

Hadis Anahideh and Abolfazl Asudeh and Saravanan Thirumuruganathan
2020 arXiv   pre-print
Machine learning (ML) is increasingly being used in high-stakes applications impacting society. Therefore, it is of critical importance that ML models do not propagate discrimination. Collecting accurate labeled data in societal applications is challenging and costly. Active learning is a promising approach to build an accurate classifier by interactively querying an oracle within a labeling budget. We design algorithms for fair active learning that carefully select data points to be labeled so as to balance model accuracy and fairness. Specifically, we focus on demographic parity, a widely used measure of fairness. Extensive experiments over benchmark datasets demonstrate the effectiveness of our proposed approach.
arXiv:2006.13025v2 fatcat:cjln3vpv45a7bnovgdhza2jiim
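
As a reference point for the fairness notion used above, demographic parity compares the positive-prediction rates of two demographic groups; a fair active learner trades an accuracy signal (e.g., predictive uncertainty) against the expected reduction of this gap when choosing which point to label next. A minimal sketch of the metric itself (names are illustrative, not the paper's API):

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute gap between the positive-prediction rates of two groups.

    y_pred: array of 0/1 predictions; group: array of 0/1 group labels.
    Demographic parity asks this gap to be (close to) zero.
    """
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())
```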

Designing Fair Ranking Schemes [article]

Abolfazl Asudeh, H. V. Jagadish, Julia Stoyanovich, Gautam Das
2018 arXiv   pre-print
Items from a database are often ranked based on a combination of multiple criteria. A user may have the flexibility to accept combinations that weigh these criteria differently, within limits. On the other hand, this choice of weights can greatly affect the fairness of the produced ranking. In this paper, we develop a system that helps users choose criterion weights that lead to greater fairness. We consider ranking functions that compute the score of each item as a weighted sum of (numeric) attribute values, and then sort items on their score. Each ranking function can be expressed as a vector of weights, or as a point in a multi-dimensional space. For a broad range of fairness criteria, we show how to efficiently identify regions in this space that satisfy these criteria. Using this identification method, our system is able to tell users whether their proposed ranking function satisfies the desired fairness criteria and, if it does not, to suggest the smallest modification that does. We develop user-controllable approximation and indexing techniques that are applied during preprocessing, and support sub-second response times during the online phase. Our extensive experiments on real datasets demonstrate that our methods are able to find solutions that satisfy fairness criteria effectively and efficiently.
arXiv:1712.09752v2 fatcat:rm5tamalofgyxgogjt64ir6sse
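
For orientation, the scoring model in this entry is a simple weighted sum: a ranking function is a weight vector w, the score of item x is w.x, and items are sorted by score. A toy check of one possible fairness criterion (a minimum share of a protected group in the top-k) is sketched below; the paper instead characterizes whole regions of the weight space that satisfy such criteria, rather than testing one vector at a time.

```python
import numpy as np

def rank_by_weights(X, w):
    """Return item indices sorted by the weighted-sum score w.x, best first."""
    return np.argsort(-(np.asarray(X) @ np.asarray(w)))

def topk_share_ok(order, protected, k, min_share):
    """Illustrative fairness criterion: at least `min_share` of the
    top-k items belong to the protected group (protected is a 0/1 array)."""
    return np.asarray(protected)[order[:k]].mean() >= min_share
```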

Crowdsourcing Pareto-Optimal Object Finding by Pairwise Comparisons [article]

Abolfazl Asudeh, Gensheng Zhang, Naeemul Hassan, Chengkai Li, Gergely V. Zaruba
2014 arXiv   pre-print
This is the first study on crowdsourcing Pareto-optimal object finding, which has applications in public opinion collection, group decision making, and information exploration. Departing from prior studies on crowdsourcing skyline and ranking queries, it considers the case where objects do not have explicit attributes and preference relations on objects are strict partial orders. The partial orders are derived by aggregating crowdsourcers' responses to pairwise comparison questions. The goal is to find all Pareto-optimal objects by the fewest possible questions. It employs an iterative question-selection framework. Guided by the principle of eagerly identifying non-Pareto-optimal objects, the framework only chooses candidate questions which must satisfy three conditions. This design is both sufficient and efficient, as it is proven to find a short terminal question sequence. The framework is further steered by two ideas: macro-ordering and micro-ordering. By different micro-ordering heuristics, the framework is instantiated into several algorithms with varying power in pruning questions. Experiment results using both a real crowdsourcing marketplace and simulations exhibited not only orders-of-magnitude reductions in questions when compared with a brute-force approach, but also close-to-optimal performance from the most efficient instantiation.
arXiv:1409.4161v1 fatcat:g6cng7klqbfulkwu6nvmozy4au
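
The target of the question-selection framework above is the set of Pareto-optimal objects under the aggregated strict partial order: objects that no other object is preferred to. Given the full preference relation, the definition itself is simple, as the sketch below shows; the paper's contribution is reaching this answer while asking as few pairwise-comparison questions as possible.

```python
def pareto_optimal(objects, preferred):
    """Objects to which no other object is preferred.

    `preferred` is a set of (a, b) pairs meaning "a is preferred to b"
    in the aggregated strict partial order.
    """
    dominated = {b for _, b in preferred}
    return [o for o in objects if o not in dominated]
```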

RRR: Rank-Regret Representative [article]

Abolfazl Asudeh and Azade Nazi and Nan Zhang and Gautam Das and H. V. Jagadish
2018 arXiv   pre-print
Selecting the best items in a dataset is a common task in data exploration. However, the concept of "best" lies in the eyes of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove "dominated" items and create a "representative" subset of the data set, comprising the "best items" in it. A Pareto-optimal representative is guaranteed to contain the best item of each possible ranking, but it can be as big as the full data. A smaller representative can be found if we relax the requirement to include the best item for every possible user, and instead just limit the users' "regret". Existing work defines regret as the loss in score by limiting consideration to the representative instead of the full data set, for any chosen ranking function. However, the score is often not a meaningful number and users may not understand its absolute value. Sometimes small ranges in score can include large fractions of the data set. In contrast, users do understand the notion of rank ordering. Therefore, alternatively, we consider the position of the items in the ranked list for defining the regret and propose the rank-regret representative as the minimal subset of the data containing at least one of the top-k of any possible ranking function. This problem is NP-complete. We use the geometric interpretation of items to bound their ranks on ranges of functions and to utilize combinatorial geometry notions for developing effective and efficient approximation algorithms for the problem. Experiments on real datasets demonstrate that we can efficiently find small subsets with small rank-regrets.
arXiv:1802.10303v2 fatcat:intbu4utwngt7dcnznb7filslu
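
The rank-regret of a subset can be checked empirically by sampling linear ranking functions: for each sampled weight vector, record the best rank any member of the subset attains, and take the worst such rank over all samples. A subset whose value stays at or below k behaves as a rank-regret representative on those samples. This Monte-Carlo check, sketched below, is only an illustration; the paper's algorithms bound ranks geometrically rather than by sampling alone.

```python
import numpy as np

def rank_regret(X, subset_idx, weight_samples):
    """Worst best-rank of the subset over sampled linear ranking functions.

    A value of 1 means the subset contains the top item for every sampled
    function; a value <= k means it hits the top-k for all of them.
    """
    X = np.asarray(X)
    worst = 1
    for w in weight_samples:
        order = np.argsort(-(X @ w))                      # best item first
        rank_of = {int(i): r for r, i in enumerate(order, start=1)}
        worst = max(worst, min(rank_of[i] for i in subset_idx))
    return worst
```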

Maximizing Fair Content Spread via Edge Suggestion in Social Networks [article]

Ian P. Swift, Sana Ebrahmi, Azade Nova, Abolfazl Asudeh
2022 arXiv   pre-print
Content spread inequity is a potential unfairness issue in online social networks, disparately impacting minority groups. In this paper, we view friendship suggestion, a common feature in social network platforms, as an opportunity to achieve an equitable spread of content. In particular, we propose to suggest a subset of potential edges (currently not existing in the network but likely to be accepted) that maximizes content spread while achieving fairness. Instead of re-engineering the existing systems, our proposal builds a fairness wrapper on top of the existing friendship suggestion components. We prove the problem is NP-hard and inapproximable in polynomial time unless P = NP. Therefore, allowing relaxation of the fairness constraint, we propose an algorithm based on LP-relaxation and randomized rounding with fixed approximation ratios on fairness and content spread. We provide multiple optimizations, further improving the performance of our algorithm in practice. In addition, we propose a scalable algorithm that dynamically adds subsets of nodes, chosen via iterative sampling, and solves smaller problems corresponding to these nodes. Besides theoretical analysis, we conduct comprehensive experiments on real and synthetic data sets. Across different settings, our algorithms found solutions with near-zero unfairness while significantly increasing the content spread. Our scalable algorithm could process a graph with half a million nodes on a single machine, reducing the unfairness to around 0.0004 while lifting content spread by 43%.
arXiv:2207.07704v1 fatcat:6t7h3a3tw5eu7eg4uumwjcm3ay
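
The LP-relaxation-plus-randomized-rounding idea mentioned above reduces, at its core, to solving a relaxed problem that assigns each candidate edge a fractional value and then keeping each edge with that probability. The generic rounding step is sketched below; the paper couples it with an LP that encodes content spread and the fairness constraint and proves approximation ratios, none of which is reproduced here.

```python
import numpy as np

def round_edges(frac, rng=None):
    """Keep each candidate edge (u, v) independently with probability frac[(u, v)].

    frac: dict mapping a candidate edge to its fractional LP value in [0, 1].
    """
    rng = np.random.default_rng(rng)
    return [edge for edge, x in frac.items() if rng.random() < x]
```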

MithraDetective: A System for Cherry-picked Trendlines Detection [article]

Yoko Nagafuchi, Yin Lin, Kaushal Mamgain, Abolfazl Asudeh, H. V. Jagadish, You Wu, Cong Yu
2020 arXiv   pre-print
Given a data set, misleading conclusions can be drawn from it by cherry-picking selected samples. One important class of conclusions is a trend derived from a data set of values over time. Our goal is to evaluate whether the 'trends' described by the extracted samples are representative of the true situation represented in the data. We demonstrate MithraDetective, a system to compute a support score to indicate how cherry-picked a statement is; that is, whether the reported trend is supported by the data. The system can also be used to discover more supported alternatives. MithraDetective provides an interactive visual interface for both tasks.
arXiv:2010.08807v1 fatcat:wtxbnpdftzdmpossvyvaecxyla
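
A support score of the kind MithraDetective computes can be illustrated as follows: fix windows of admissible start and end points for the trendline, and measure the fraction of endpoint choices under which the claimed trend still holds. A statement that only holds for a narrow, favorable choice of endpoints gets a low score. The brute-force sketch below is illustrative only; the underlying work develops far more efficient exact and approximate algorithms.

```python
import numpy as np

def support_score(series, start_window, end_window, claim="up"):
    """Fraction of admissible (start, end) endpoint pairs agreeing with the claim.

    series: 1-D array of values over time; start_window/end_window: (lo, hi)
    index ranges of admissible endpoints; claim: "up" or "down".
    """
    series = np.asarray(series, dtype=float)
    agree = total = 0
    for s in range(*start_window):
        for e in range(*end_window):
            if e <= s:
                continue
            total += 1
            delta = series[e] - series[s]
            agree += (delta > 0) if claim == "up" else (delta < 0)
    return agree / total if total else 0.0
```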

Assessing and Remedying Coverage for a Given Dataset [article]

Abolfazl Asudeh, Zhongjun Jin, H. V. Jagadish
2019 arXiv   pre-print
Data analysis impacts virtually every aspect of our society today. Often, this analysis is performed on an existing dataset, possibly collected through a process that the data scientists had limited control over. The existing data analyzed may not include the complete universe, but it is expected to cover the diversity of items in the universe. Lack of adequate coverage in the dataset can result in undesirable outcomes such as biased decisions and algorithmic racism, as well as creating vulnerabilities such as opening up room for adversarial attacks. In this paper, we assess the coverage of a given dataset over multiple categorical attributes. We first provide efficient techniques for traversing the combinatorial explosion of value combinations to identify any regions of attribute space not adequately covered by the data. Then, we determine the least amount of additional data that must be obtained to resolve this lack of adequate coverage. We confirm the value of our proposal through both theoretical analyses and comprehensive experiments on real data.
arXiv:1810.06742v2 fatcat:qe42jhnkj5af5hx6dfyr5ec44a
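
Concretely, lack of coverage over categorical attributes means that some combinations of attribute values have too few (or zero) representatives in the data. A brute-force check over a handful of attributes is sketched below; the paper's techniques prune this combinatorial space and additionally compute the least amount of extra data needed to remedy the uncovered regions, which the sketch does not attempt.

```python
from itertools import product

def uncovered_patterns(rows, attributes, domains, threshold):
    """Value combinations with fewer than `threshold` matching rows.

    rows: list of dicts (attribute -> value); attributes: names to check
    jointly; domains: dict attribute -> list of possible values.
    """
    counts = {combo: 0 for combo in product(*(domains[a] for a in attributes))}
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key in counts:
            counts[key] += 1
    return [combo for combo, n in counts.items() if n < threshold]
```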

On the Choice of Fairness: Finding Representative Fairness Metrics for a Given Context [article]

Hadis Anahideh, Nazanin Nezami, Abolfazl Asudeh
2021 arXiv   pre-print
in different stages of predictive modeling including pre-processing (Feldman et al. 2015; Kamiran and Calders 2012; Calmon et al. 2017), in-processing (Calders and Verwer 2010; Zafar et al. 2015; Asudeh ...
arXiv:2109.05697v1 fatcat:xtphjt65jvgnhojpvfq4lqfcd4

OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning [article]

Hantian Zhang, Xu Chu, Abolfazl Asudeh, Shamkant B. Navathe
2021 arXiv   pre-print
Machine learning (ML) is increasingly being used to make decisions in our society. ML models, however, can be unfair to certain demographic groups (e.g., African Americans or females) according to various fairness metrics. Existing techniques for producing fair ML models either are limited to the type of fairness constraints they can handle (e.g., preprocessing) or require nontrivial modifications to downstream ML training algorithms (e.g., in-processing). We propose a declarative system OmniFair for supporting group fairness in ML. OmniFair features a declarative interface for users to specify desired group fairness constraints and supports all commonly used group fairness notions, including statistical parity, equalized odds, and predictive parity. OmniFair is also model-agnostic in the sense that it does not require modifications to a chosen ML algorithm. OmniFair also supports enforcing multiple user-declared fairness constraints simultaneously, while most previous techniques cannot. The algorithms in OmniFair maximize model accuracy while meeting the specified fairness constraints, and their efficiency is optimized based on the theoretically provable monotonicity property regarding the trade-off between accuracy and fairness that is unique to our system. We conduct experiments on commonly used datasets that exhibit bias against minority groups in the fairness literature. We show that OmniFair is more versatile than existing algorithmic fairness approaches in terms of both supported fairness constraints and downstream ML models. OmniFair reduces the accuracy loss by up to 94.8% compared with the second best method. OmniFair also achieves similar running time to preprocessing methods, and is up to 270× faster than in-processing methods.
arXiv:2103.09055v1 fatcat:duqp5yo22nefdcqzqxix7apgzy
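
To make the declarative flavor concrete, a group-fairness constraint can be thought of as a triple (grouping attribute, metric, tolerance), where the metric is any function of predictions and group membership. The sketch below shows equalized odds written in that style together with a generic checker; this is my illustration of the idea, not OmniFair's actual interface.

```python
import numpy as np

def equalized_odds_gap(y_true, y_pred, group):
    """Largest gap across groups in true-positive and false-positive rates."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = []
    for label in (0, 1):                  # label 0 -> FPR gap, label 1 -> TPR gap
        mask = y_true == label
        gaps.append(abs(y_pred[mask & (group == 0)].mean()
                        - y_pred[mask & (group == 1)].mean()))
    return max(gaps)

def satisfies(constraint, y_true, y_pred, data):
    """Check one declarative constraint of the form (attribute, metric, eps).

    data: dict-like mapping attribute name -> array of group labels.
    """
    attribute, metric, eps = constraint
    return metric(y_true, y_pred, data[attribute]) <= eps

# Example (illustrative): require equalized odds within 0.05 on attribute "sex":
# satisfies(("sex", equalized_odds_gap, 0.05), y_true, y_pred, data)
```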

QR2: A Third-party Query Reranking Service Over Web Databases [article]

Yeshwanth D. Gunasekaran, Abolfazl Asudeh, Sona Hasani, Nan Zhang, Ali Jaoua, Gautam Das
2018 arXiv   pre-print
The ranked retrieval model has rapidly become the de-facto way for search query processing in web databases. Despite the extensive efforts on designing better ranking mechanisms, in practice, many such databases fail to address the diverse and sometimes contradicting preferences of users. In this paper, we present QR2, a third-party service that uses nothing but the public search interface of a web database and enables the on-the-fly processing of queries with any user-specified ranking function, no matter if the ranking function is supported by the database or not.
arXiv:1807.05258v1 fatcat:vmek4ciw2bcfva4knvimftc4ly
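
The end result QR2 delivers can be pictured as a client-side rerank: fetch candidate results through the database's public (top-k) search interface, then sort them locally by whatever scoring function the user specifies. The sketch below shows only that final local step; the actual difficulty the paper addresses is retrieving a sufficient candidate set through a restricted public interface, which is not shown here.

```python
def rerank(results, user_score):
    """Re-order results fetched from a web database's public search interface
    by a user-specified scoring function the backend need not support.

    results: list of dicts of item attributes; user_score: dict -> float.
    """
    return sorted(results, key=user_score, reverse=True)
```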

Fair Active Learning [article]

Hadis Anahideh and Abolfazl Asudeh and Saravanan Thirumuruganathan
2021 arXiv   pre-print
Machine learning (ML) is increasingly being used in high-stakes applications impacting society. Therefore, it is of critical importance that ML models do not propagate discrimination. Collecting accurate labeled data in societal applications is challenging and costly. Active learning is a promising approach to build an accurate classifier by interactively querying an oracle within a labeling budget. We design algorithms for fair active learning that carefully select data points to be labeled so as to balance model accuracy and fairness. We demonstrate the effectiveness and efficiency of our proposed algorithms over widely used benchmark datasets using demographic parity and equalized odds notions of fairness.
arXiv:2001.01796v5 fatcat:u2iufnu4qnh67og5v64hn75dta

Discovering the skyline of web databases

Abolfazl Asudeh, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das
2016 Proceedings of the VLDB Endowment  
ACKNOWLEDGMENTS The work of Abolfazl Asudeh, Saravanan Thirumuruganathan and Gautam Das was supported in part by the National Science Foundation under grant 1343976, the Army Research Office under grant  ... 
doi:10.14778/2904483.2904491 fatcat:c2pb2q5kw5gdpcqypevdllpbem

On detecting cherry-picked trendlines

Abolfazl Asudeh, H. V. Jagadish, You (Will) Wu, Cong Yu
2020 Proceedings of the VLDB Endowment  
PVLDB Reference Format: Abolfazl Asudeh, H. V. Jagadish, You (Will) Wu, Cong Yu. On Detecting Cherry-picked Trendlines. PVLDB, 13(6): 939-952, 2020.  ... 
doi:10.14778/3380750.3380762 fatcat:7hi7ar2owzf7xipajli557p5r4

Perturbation-based Detection and Resolution of Cherry-picking

Abolfazl Asudeh, You Wu, Cong Yu, H. V. Jagadish
2021 IEEE Data Engineering Bulletin  
In settings where an outcome, a decision, or a statement is made based on a single option among alternatives, it is common to cherry-pick the data to generate an outcome that is supported by the cherry-picked data but not in general. In this paper, we use perturbation as a technique to design a support measure to detect, and resolve, cherry-picking across different contexts. In particular, to demonstrate the general scope of our proposal, we study cherry-picking in two very different domains: (a) political statements based on trend-lines and (b) linear rankings. We also discuss sampling-based estimation as an effective and efficient approximation approach for detecting and resolving cherry-picking at scale.
dblp:journals/debu/Asudeh00J21 fatcat:ag4xfvni6nbhrc3cruiafpopcq
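
The sampling-based estimation mentioned at the end of this abstract amounts to a Monte-Carlo estimate of the support measure: draw random perturbations of the context (trendline endpoints, ranking weights, and so on) and report the fraction under which the statement still holds. A generic sketch, with hypothetical callables standing in for the domain-specific pieces:

```python
import random

def estimated_support(statement_holds, sample_perturbation, n_samples, seed=None):
    """Monte-Carlo estimate of a perturbation-based support measure.

    statement_holds: callable(perturbation) -> bool, True if the statement
    survives that perturbation; sample_perturbation: callable(rng) -> one
    random perturbation of the context (both hypothetical placeholders).
    """
    rng = random.Random(seed)
    hits = sum(statement_holds(sample_perturbation(rng)) for _ in range(n_samples))
    return hits / n_samples
```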