Comparison and benchmark of name-to-gender inference services

Lucía Santamaría, Helena Mihaljević
2018 PeerJ Computer Science  
We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names.  ...  The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person's gender from their name.  ...  We are indebted to Elian Carsenat for allowing us to freely access NamSor's gender and origin APIs.  ... 
doi:10.7717/peerj-cs.156 pmid:33816809 pmcid:PMC7924484 fatcat:cdatpsteonhahfjp6ewydorzze

To "See" is to Stereotype: Image Tagging Algorithms, Gender Recognition, and the Accuracy – Fairness Trade-off

Pinar Barlas, Kyriakos Kyriakou, Olivia Guest, Styliani Kleanthous, Jahna Otterbacher
2020 Zenodo  
We design a controlled experiment, to examine the interdependence between algorithmic recognition of context and the depicted person's gender.  ...  Machine-learned computer vision algorithms for tagging images are increasingly used by developers and researchers, having become popularized as easy-to-use "cognitive services."  ...  To make within-tagger/between-context and between-tagger comparisons on gender recognition accuracy (RQ3), we consider only the cases in which there is evidence the algorithm inferred (at least partially  ... 
doi:10.5281/zenodo.4028262 fatcat:n6cfjopz3ndb5jjhiwmsr64ykm

Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking [article]

Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela
2021 arXiv   pre-print
We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform.  ...  Under this paradigm, models are submitted to be evaluated in the cloud, circumventing the issues of reproducibility, accessibility, and backwards compatibility that often hinder benchmarking in NLP.  ...  We're grateful to Amanpreet Singh and Sujit Verma for useful feedback and engineering suggestions. We thank Eric Smith and April Bailey for helping with name lists for the fairness perturbation.  ... 
arXiv:2106.06052v1 fatcat:x6dknsfwvjhtxnynamslainykq

To "See" is to Stereotype

Pinar Barlas, Kyriakos Kyriakou, Olivia Guest, Styliani Kleanthous, Jahna Otterbacher
2021 Proceedings of the ACM on Human-Computer Interaction  
We design a controlled experiment, to examine the interdependence between algorithmic recognition of context and the depicted person's gender.  ...  Machine-learned computer vision algorithms for tagging images are increasingly used by developers and researchers, having become popularized as easy-to-use "cognitive services."  ...  To make within-tagger/between-context and between-tagger comparisons on gender recognition accuracy (RQ3), we consider only the cases in which there is evidence the algorithm inferred (at least partially  ... 
doi:10.1145/3432931 fatcat:adgcx37afvan3l2cu7yuer7esu

A global approach to the gender gap in mathematical, computing and natural sciences: How to measure it, how to reduce it?

Irvy M.A. Gledhill, Marie-Françoise Roy, Mei-Hung Chiu, Rachel Ivie, Silvina Ponce-Dawson, Helena Mihaljević
2019 South African Journal of Science  
The objectives of the project are to provide evidence on which interventions can be based, and to make available material on best practice that has been proven by test.  ...  In this Commentary, we describe one of three collaborative projects, the Gender Gap project, funded by the International Science Council (ISC) and the 11 partners of the project.  ...  Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the CoE-MaSS.  ... 
doi:10.17159/sajs.2019/a0305 fatcat:suoeq2vcv5dbxpcg3hna3ortsi

A global approach to the gender gap in mathematical, computing and natural sciences: How to measure it, how to reduce it?

Irvy M.A. Gledhill, Marie-Françoise Roy, Mei-Hung Chiu, Rachel Ivie, Silvina Ponce-Dawson, Helena Mihaljević
2019 South African Journal of Science  
The objectives of the project are to provide evidence on which interventions can be based, and to make available material on best practice that has been proven by test.  ...  In this Commentary, we describe one of three collaborative projects, the Gender Gap project, funded by the International Science Council (ISC) and the 11 partners of the project.  ...  Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the CoE-MaSS.  ... 
doi:10.17159/sajs.2019/a0305 fatcat:3unffl6ue5fgjgdttfntwhakdu

Cognitive Inference of Demographic Data by User Ratings [article]

Jinliang Xu, Shangguang Wang, Fangchun Yang, Rong N. Chang
2017 arXiv   pre-print
Cognitive inference of user demographics, such as gender and age, plays an important role in creating user profiles for adjusting marketing strategies and generating personalized recommendations because  ...  In this paper, we investigate the inference power of user ratings data, and propose a simple yet general cognitive inference model, called rating to profile (R2P), to infer user demographics from user  ...  ACKNOWLEDGMENT This work was supported in part by the National Science Foundation of China (61472047 and 61571066).  ... 
arXiv:1703.04216v2 fatcat:oigbzvfv7rbptjr7bpvx3h6oja

RuMedBench: A Russian Medical Language Understanding Benchmark [article]

Pavel Blinov, Arina Reshetnikova, Aleksandr Nesterov, Galina Zubkova, Vladimir Kokh
2022 arXiv   pre-print
The paper describes the open Russian medical language understanding benchmark covering several task types (classification, question answering, natural language inference, named entity recognition) on a  ...  A single-number metric expresses a model's ability to cope with the benchmark.  ...  In the absence of similar data and labeling, we translated MedNLI to Russian. First, each text is independently processed by two automatic translation services.  ... 
arXiv:2201.06499v2 fatcat:xuuww62durhnvjqjpe2uhkrqpi

Variomes: a high recall search engine to support the curation of genomic variants [article]

Emilie Pasche, Anaïs Mottaz, Deborah Caucheteur, Julien Gobeill, Pierre-André Michel, Patrick Ruch
2021 bioRxiv   pre-print
Experimental setting 1: The literature retrieval task is tuned and evaluated using the TREC Precision Medicine 2018 and 2019 benchmarks consisting respectively in 50 and 40 topics.  ...  Experimental setting 3: A comparison of Variomes with LitVar, a well-known search engine for genetic variants is performed.  ...  SVIP uses an integrated version of the Variomes services to support the curators of the clinical database.  ... 
doi:10.1101/2021.05.29.446224 fatcat:4kysyzabnnfqdgjx7aq4iqitve

Personalized Benchmarking with the Ludwig Benchmarking Toolkit [article]

Avanika Narayan, Piero Molino, Karan Goel, Willie Neiswanger, Christopher Ré
2021 arXiv   pre-print
We explore the trade-offs between inference latency and performance, relationships between dataset attributes and performance, and the effects of pretraining on convergence and robustness, showing how  ...  Unfortunately, these users cannot use standard benchmark results to perform such value-driven comparisons as traditional benchmarks evaluate models on a single objective (e.g. average accuracy) and fail  ...  Acknowledgments and Disclosure of Funding We are thankful to Michael Zhang, Laurel Orr, Sarah Hooper, Dan Fu, Arjun Desai and many other members of the Stanford AI Lab for helpful discussions and feedback  ... 
arXiv:2111.04260v1 fatcat:lsj2xfgn7zdsncb7zjmvrexnpu

A Cross-Platform Benchmark Framework for Mobile Semantic Web Reasoning Engines [chapter]

William Van Woensel, Newres Al Haider, Ahmad Ahmad, Syed S. R. Abidi
2014 Lecture Notes in Computer Science  
To tackle these challenges, it should be possible to benchmark mobile reasoning performance across different mobile platforms, with rule- and datasets of varying scale and complexity and existing reasoning  ...  In this paper, we present a cross-platform benchmark framework that supplies 1) a generic, standards-based Semantic Web layer on top of existing mobile reasoning engines; and 2) a benchmark engine to investigate  ...  age, gender and ethnicity).  ... 
doi:10.1007/978-3-319-11964-9_25 fatcat:dzru6gc7n5ge7hwxnysmpcqhiy

ICT and IT Initiatives in Public Governance − Benchmarking and Insights from Ethiopia

Premkumar Balaraman
2018 Business Ethics and Leadership  
Findings: Some of the innovative ICT/IT-based Global Benchmark E-Governance Services include E-Employment Services, GIS-based Emergency Assistance, Integrated Social Assistance System, Open government  ...  The face-to-face in-depth interviews revealed major issues like high levels of corruption, non-transparency, lack of skilled manpower in the ICT/IT domain, low levels of literacy, cultural and language barriers  ...  However, there is still a long way to go in delivering innovative ICT-based E-governance services and solutions on par with developed countries.  ... 
doi:10.21272/bel.2(1).14-31.2018 fatcat:pjdddh5ghbentkwdmw3mb44jd4

Association of Age, Gender, and Race with Intensity of End-of-Life Care for Medicare Beneficiaries with Cancer

Susan Miesfeldt, Kimberly Murray, Lee Lucas, Chiang-Hua Chang, David Goodman, Nancy E. Morden
2012 Journal of Palliative Medicine  
Purpose: To measure intensity of end-of-life (EOL) care for Medicare cancer patients and variations in care by age, gender, and race.  ...  Female gender was associated with lower ORs (0.82 to 0.86) for aggressive care, and an OR of 0.84 (95% CI 0.81-0.86) for late hospice enrollment.  ...  Authors KM Murray and CH Chang had full access to all of the data. All authors take responsibility for the integrity of the data and the accuracy of the data analysis.  ... 
doi:10.1089/jpm.2011.0310 pmid:22468739 pmcid:PMC3353746 fatcat:ausyec347beavohzutep4qyiwe

On Measuring Social Biases in Prompt-Based Multi-Task Learning [article]

Afra Feyza Akyürek, Sejin Paik, Muhammed Yusuf Kocyigit, Seda Akbiyik, Şerife Leman Runyun, Derry Wijaya
2022 arXiv   pre-print
We use an existing bias benchmark for the former BBQ and create the first bias benchmark in natural language inference BBNLI with hand-written hypotheses while also converting each benchmark into the other  ...  A large body of work within prompt engineering attempts to understand the effects of input forms and prompts in achieving superior performance.  ...  Government is authorized to reproduce and distribute reprints for Governmental purposes.  ... 
arXiv:2205.11605v1 fatcat:ofcldcog7bcedo4voonzh4zeme

Balancing the Personality of Programmer: Software Development Team Composition

Abdul Rehman Gilal, Jafreezal Jaafar, Mazni Omar, Shuib Basri, Izzatdin Abdul Aziz
2016 Malaysian Journal of Computer Science  
The experiments were divided into two segments: defining a balancing benchmark and validating the benchmark.  ...  team since adept and compatible team members, in terms of personality, are likely to ensure the success of software.  ...  Figure 2 shows the trait-to-trait comparison of male and female programmers.  ... 
doi:10.22452/mjcs.vol29no2.5 fatcat:nduv734wpfb4pgixxs7z3u5pxi