Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence [article]

Navid Rekabsaz, Robert West, James Henderson, Allan Hanbury
2021 arXiv   pre-print
Text corpora are widely used resources for measuring societal biases and stereotypes.  ...  We propose an alternative approach to bias measurement utilizing the smoothed first-order co-occurrence relations between the word and the representative concept words, which we derive by reconstructing  ...  Introduction Text data has been widely utilized for studying and monitoring societal phenomena, such as biases and stereotypes, commonly by exploiting co-occurrence statistics of words in text.  ... 
arXiv:1812.10424v4 fatcat:irlckgvcsbbpfcnsodfvhhpgou

Categorization in the Wild: Generalizing Cognitive Models to Naturalistic Data across Languages [article]

Lea Frermann, Mirella Lapata
2019 arXiv   pre-print
Models of category learning and representation, however, are typically tested on data from small-scale experiments involving small sets of concepts with artificially restricted features; and experiments  ...  We present a Bayesian cognitive model designed to jointly learn categories and their structured representation from natural language text which allows us to (a) evaluate performance on a large scale, and  ...  from thematically unconstrained corpora of natural text.  ... 
arXiv:1902.08830v1 fatcat:w5y7aavbbnfnjmj6xlmbbmc5fq

A Geometry-Driven Longitudinal Topic Model

Yu Wang, Conrad Hougen, Brandon Oselio, Walter Dempsey, Alfred Hero
2021 Harvard data science review  
A simple and scalable framework for longitudinal analysis of Twitter data is developed that combines latent topic models with computational geometric methods.  ...  Dimensionality reduction tools from computational geometry are applied to learn the intrinsic manifold on which the latent, temporal topics reside.  ...  ., word co-occurrences) from which the latent topics can be learned.  ... 
doi:10.1162/99608f92.b447c07e doaj:23143f3dd7e449d1be6552936c5d8e55 fatcat:vmvz4j3ax5eohp2hc2pg6kzteq

Exploring the Possibility of Peak Individualism, Humanity's Existential Crisis, and an Emerging Age of Purpose

Gabriel B. Grant
2017 Frontiers in Psychology  
ACKNOWLEDGMENTS This research was created in collaboration with Linda Kay Klein at Echoing Green as part of their Work on Purpose Program that inspires and equips those in the first decade of their careers  ...  The article was inspired by Linda's questions inquiring into how Echoing Green might measure the program's success.  ...  The graphs were made with the Google Books Ngram Viewer with a smoothing of 3. Figures 5, 6. These results are consistent across both the English corpora and the English 1M corpora.  ... 
doi:10.3389/fpsyg.2017.01478 pmid:28928689 pmcid:PMC5591862 fatcat:hr52fhjulvfrpcoinftffdhcdm

Gender Bias, Social Bias and Representation: 70 Years of B^Hollywood [article]

Kunal Khadilkar, Ashiqur R. KhudaBukhsh, Tom M. Mitchell
2021 arXiv   pre-print
While the number of lives Bollywood can potentially touch is massive, no comprehensive NLP study on the evolution of social and gender biases in Bollywood dialogues exists.  ...  In this project, we propose to analyze such trends over 70 years of Bollywood movies contrasting them with their Hollywood counterpart and critically acclaimed world movies.  ...  Conclusions In this paper, we analyzed how social biases and subtle gender biases get reflected on diachronic corpora of popular entertainment.  ... 
arXiv:2102.09103v1 fatcat:ku6inruvh5e2jandkbsfqw4odq

Semantic changes in harm-related concepts in English [chapter]

Ekaterina Vylomova, Nick Haslam
2021 Zenodo  
We then continue with a more detailed study in order to understand how exactly the concepts changed, and to do so employ and evaluate different types of semantic representations.  ...  Here we apply computational models in order to address the concept creep hypothesis.  ...  For each concept we first constructed a list of words which the concept most often co-occurred with within each time period.  ... 
doi:10.5281/zenodo.5040304 fatcat:w6gyf3twknf2jaotwu7jdui5na

Designing an Extensible Domain-Specific Web Corpus for "Layfication" [chapter]

Marina Santini, Arne Jönsson, Wiktor Strandqvist, Gustav Cederblad, Mikael Nyström, Marjan Alirezaie, Leili Lind, Eva Blomqvist, Maria Lindén, Annica Kristoffersson
2019 Advances in Systems Analysis, Software Engineering, and High Performance Computing  
Jiang and Yang (2013) used co-occurrence analysis to identify terms that co-occur frequently with a set of seed terms.  ...  Methodology It was decided not to create a full co-occurrence matrix, in order to streamline and accelerate the process. Only context windows were extracted for the desired target terms.  ... 
doi:10.4018/978-1-5225-7879-6.ch006 fatcat:tgaorpe5fvepnhl7j66mkp2taa

Survey of Computational Approaches to Lexical Semantic Change [article]

Nina Tahmasebi, Lars Borin, Adam Jatowt
2019 arXiv   pre-print
Understanding the characteristics of shifts in the meaning and in the use of words is useful for those who work with the content of historical texts, the interested general public, but also in and of itself  ...  The findings from automatic lexical semantic change detection, and the models of diachronic conceptual change are currently being incorporated in approaches for measuring document across-time similarity  ...  The method creates a first order co-occurrence matrix using positive pointwise mutual information scores for each of the two sub-corpora.  ... 
arXiv:1811.06278v2 fatcat:hk73n5sf6bezjf35uqtlh3zxre

Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms [article]

Alessandro Fabris, Alberto Purpura, Gianmaria Silvello, Gian Antonio Susto
2020 arXiv   pre-print
Similar biases were found encoded in Word Embeddings (WEs) learned from large online corpora.  ...  To the best of our knowledge, GSR is the first specifically tailored measure for IR, capable of quantifying representational harms.  ...  Based on co-occurrence with intrinsically gendered terms within the text corpora (such as woman and man), a genderedness score can be derived for each word in the embedding space.  ... 
arXiv:2009.01334v1 fatcat:7cwmnyak4ncsbeifjlmi7afvgm

Mining social media data for biomedical signals and health-related behavior [article]

Rion Brattig Correia, Ian B. Wood, Johan Bollen, Luis M. Rocha
2020 arXiv   pre-print
From cohort level discussions of a condition to planetary level analyses of sentiment, social media has provided scientists with unprecedented amounts of data to study human behavior and response associated  ...  with a variety of health conditions and medical treatments.  ...  This tool extended the original 72 terms in the POMS questionnaire to a dictionary of 964 words by looking at co-occurrences in Google's 4, and 5-gram corpora.  ... 
arXiv:2001.10285v1 fatcat:rbpjmvltkjderdrbzz4midfxey

Why the quantitative analysis of diachronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions

Alexander Koplenig
2015 Digital Scholarship in the Humanities  
Since the covariance measures whether values of x that are above/below average tend to co-occur with values of y that are above/below average, then by mathematical necessity, the correlation coefficient  ...  In the absence of information about the texts that the German Google Books corpus compiles, this analysis supports the argument that the corpus was strongly biased toward volumes published in Switzerland  ...  However, the autoregressive integrated moving average (ARIMA or ARMAX) models he uses in order to predict the relative frequency of a keyword on the basis of the POLITY2 score (a measure of the level of  ... 
doi:10.1093/llc/fqv030 dblp:journals/lalc/Koplenig17 fatcat:vapkalbpabdr5bvtbmlbojp7pq

Discovering and Interpreting Conceptual Biases in Online Communities [article]

Xavier Ferrer-Aran, Tom van Nuenen, Natalia Criado, Jose M. Such
2020 arXiv   pre-print
Recently, ML-based NLP methods such as word embeddings have been shown to learn such language biases with striking accuracy.  ...  Language carries implicit human biases, functioning both as a reflection and a perpetuation of stereotypes that people carry with them.  ...  Given a word embeddings model built from a text corpora, and two sets of words representing the attribute concepts (e.g. men/women) one wants to discover biases towards, a common approach in the literature  ... 
arXiv:2010.14448v1 fatcat:gdz7sfx45zddjgfrbs23dnwxxq

Mining Social Media Data for Biomedical Signals and Health-Related Behavior

Rion Brattig Correia, Ian B. Wood, Johan Bollen, Luis M. Rocha
2020 Annual Review of Biomedical Data Science  
From cohort-level discussions of a condition to population-level analyses of sentiment, social media have provided scientists with unprecedented amounts of data to study human behavior associated with  ...  Automatic methods to deal with language inconsistency include automatic topic modeling and word embedding techniques that cluster similar terms according to their co-occurrence patterns with other terms  ...  Diurnal and seasonal rhythms measured from Twitter data were found to be correlated with positive and negative sentiment as measured by LIWC.  ... 
doi:10.1146/annurev-biodatasci-030320-040844 pmid:32550337 pmcid:PMC7299233 fatcat:ae52gyu4rjebdd3s4mj75lafky

A narrowing of AI research? [article]

Joel Klinger, Juan Mateos-Garcia, Konstantinos Stathoulopoulos
2022 arXiv   pre-print
through the citations they receive and their collaborations with other institutions.  ...  The arrival of deep learning techniques able to infer patterns from large datasets has dramatically improved the performance of Artificial Intelligence (AI) systems.  ...  each sub-corpora, with 30 runs per comparison.  ... 
arXiv:2009.10385v4 fatcat:v4jxjhboxrcqxdfh2ppihzaj3q

A Multi-Lingually Applicable Journalist Toolset For The Big-Data Era

G. Kiomourtzis, G. Giannakopoulos, V. Karkaletsis, A. Kosmopoulos
2016 Zenodo  
The project FREME has received funding from the EU's Horizon 2020 programme under grant agreement No. 644 771.  ...  This research was supported in part by a Discovery and Innovation Research Seed award from the Office of the Vice Provost for Research at Cornell.  ...  clusters in a corpus based on the distribution of co-occurrence probabilities across word vectors.  ... 
doi:10.5281/zenodo.1242850 fatcat:nfkqg7jhjffdvgezdjzc6xxppa