Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence
[article]
2021
arXiv
pre-print
Text corpora are widely used resources for measuring societal biases and stereotypes. ...
We propose an alternative approach to bias measurement utilizing the smoothed first-order co-occurrence relations between the word and the representative concept words, which we derive by reconstructing ...
Introduction: Text data has been widely utilized for studying and monitoring societal phenomena, such as biases and stereotypes, commonly by exploiting co-occurrence statistics of words in text. ...
arXiv:1812.10424v4
fatcat:irlckgvcsbbpfcnsodfvhhpgou
Categorization in the Wild: Generalizing Cognitive Models to Naturalistic Data across Languages
[article]
2019
arXiv
pre-print
Models of category learning and representation, however, are typically tested on data from small-scale experiments involving small sets of concepts with artificially restricted features; and experiments ...
We present a Bayesian cognitive model designed to jointly learn categories and their structured representation from natural language text which allows us to (a) evaluate performance on a large scale, and ...
from thematically unconstrained corpora of natural text. ...
arXiv:1902.08830v1
fatcat:w5y7aavbbnfnjmj6xlmbbmc5fq
A Geometry-Driven Longitudinal Topic Model
2021
Harvard data science review
A simple and scalable framework for longitudinal analysis of Twitter data is developed that combines latent topic models with computational geometric methods. ...
Dimensionality reduction tools from computational geometry are applied to learn the intrinsic manifold on which the latent, temporal topics reside. ...
., word co-occurrences) from which the latent topics can be learned. ...
doi:10.1162/99608f92.b447c07e
doaj:23143f3dd7e449d1be6552936c5d8e55
fatcat:vmvz4j3ax5eohp2hc2pg6kzteq
Exploring the Possibility of Peak Individualism, Humanity's Existential Crisis, and an Emerging Age of Purpose
2017
Frontiers in Psychology
ACKNOWLEDGMENTS This research was created in collaboration with Linda Kay Klein at Echoing Green as part of their Work on Purpose Program that inspires and equips those in the first decade of their careers ...
The article was inspired by Linda's questions inquiring into how Echoing Green might measure the program's success. ...
The graphs were made with the Google Books Ngram Viewer with a smoothing of 3. Figures 5, 6. These results are consistent across both the English corpora and the English 1M corpora. ...
doi:10.3389/fpsyg.2017.01478
pmid:28928689
pmcid:PMC5591862
fatcat:hr52fhjulvfrpcoinftffdhcdm
Gender Bias, Social Bias and Representation: 70 Years of B^Hollywood
[article]
2021
arXiv
pre-print
While the number of lives Bollywood can potentially touch is massive, no comprehensive NLP study on the evolution of social and gender biases in Bollywood dialogues exists. ...
In this project, we propose to analyze such trends over 70 years of Bollywood movies contrasting them with their Hollywood counterpart and critically acclaimed world movies. ...
Conclusions In this paper, we analyzed how social biases and subtle gender biases get reflected on diachronic corpora of popular entertainment. ...
arXiv:2102.09103v1
fatcat:ku6inruvh5e2jandkbsfqw4odq
Semantic changes in harm-related concepts in English
[chapter]
2021
Zenodo
We then continue with a more detailed study in order to understand how exactly the concepts changed, and to do so employ and evaluate different types of semantic representations. ...
Here we apply computational models in order to address the concept creep hypothesis. ...
For each concept we first constructed a list of words which the concept most often co-occurred with within each time period. ...
doi:10.5281/zenodo.5040304
fatcat:w6gyf3twknf2jaotwu7jdui5na
Designing an Extensible Domain-Specific Web Corpus for "Layfication"
[chapter]
2019
Advances in Systems Analysis, Software Engineering, and High Performance Computing
Jiang and Yang (2013) used co-occurrence analysis to identify terms that co-occur frequently with a set of seed terms. ...
Methodology It was decided not to create a full co-occurrence matrix, in order to streamline and accelerate the process. Only context windows were extracted for the desired target terms. ...
doi:10.4018/978-1-5225-7879-6.ch006
fatcat:tgaorpe5fvepnhl7j66mkp2taa
Survey of Computational Approaches to Lexical Semantic Change
[article]
2019
arXiv
pre-print
Understanding the characteristics of shifts in the meaning and in the use of words is useful for those who work with the content of historical texts, the interested general public, but also in and of itself ...
The findings from automatic lexical semantic change detection, and the models of diachronic conceptual change are currently being incorporated in approaches for measuring document across-time similarity ...
The method creates a first order co-occurrence matrix using positive pointwise mutual information scores for each of the two sub-corpora. ...
arXiv:1811.06278v2
fatcat:hk73n5sf6bezjf35uqtlh3zxre
Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms
[article]
2020
arXiv
pre-print
Similar biases were found encoded in Word Embeddings (WEs) learned from large online corpora. ...
To the best of our knowledge, GSR is the first specifically tailored measure for IR, capable of quantifying representational harms. ...
Based on co-occurrence with intrinsically gendered terms within the text corpora (such as woman and man), a genderedness score can be derived for each word in the embedding space. ...
arXiv:2009.01334v1
fatcat:7cwmnyak4ncsbeifjlmi7afvgm
Mining social media data for biomedical signals and health-related behavior
[article]
2020
arXiv
pre-print
From cohort level discussions of a condition to planetary level analyses of sentiment, social media has provided scientists with unprecedented amounts of data to study human behavior and response associated ...
with a variety of health conditions and medical treatments. ...
This tool extended the original 72 terms in the POMS questionnaire to a dictionary of 964 words by looking at co-occurrences in Google's 4- and 5-gram corpora. ...
arXiv:2001.10285v1
fatcat:rbpjmvltkjderdrbzz4midfxey
Why the quantitative analysis of diachronic corpora that does not consider the temporal aspect of time-series can lead to wrong conclusions
2015
Digital Scholarship in the Humanities
Since the covariance measures whether values of x that are above/below average tend to co-occur with values of y that are above/below average, then by mathematical necessity, the correlation coefficient ...
In the absence of information about the texts that the German Google Books corpus compiles, this analysis supports the argument that the corpus was strongly biased toward volumes published in Switzerland ...
However, the autoregressive integrated moving average (ARIMA or ARMAX) models he uses in order to predict the relative frequency of a keyword on the basis of the POLITY2 score (a measure of the level of ...
doi:10.1093/llc/fqv030
dblp:journals/lalc/Koplenig17
fatcat:vapkalbpabdr5bvtbmlbojp7pq
Discovering and Interpreting Conceptual Biases in Online Communities
[article]
2020
arXiv
pre-print
Recently, ML-based NLP methods such as word embeddings have been shown to learn such language biases with striking accuracy. ...
Language carries implicit human biases, functioning both as a reflection and a perpetuation of stereotypes that people carry with them. ...
Given a word embeddings model built from a text corpus, and two sets of words representing the attribute concepts (e.g. men/women) one wants to discover biases towards, a common approach in the literature ...
arXiv:2010.14448v1
fatcat:gdz7sfx45zddjgfrbs23dnwxxq
Mining Social Media Data for Biomedical Signals and Health-Related Behavior
2020
Annual Review of Biomedical Data Science
From cohort-level discussions of a condition to population-level analyses of sentiment, social media have provided scientists with unprecedented amounts of data to study human behavior associated with ...
Automatic methods to deal with language inconsistency include automatic topic modeling and word embedding techniques that cluster similar terms according to their co-occurrence patterns with other terms ...
Diurnal and seasonal rhythms measured from Twitter data were found to be correlated with positive and negative sentiment as measured by LIWC. ...
doi:10.1146/annurev-biodatasci-030320-040844
pmid:32550337
pmcid:PMC7299233
fatcat:ae52gyu4rjebdd3s4mj75lafky
A narrowing of AI research?
[article]
2022
arXiv
pre-print
through the citations they receive and their collaborations with other institutions. ...
The arrival of deep learning techniques able to infer patterns from large datasets has dramatically improved the performance of Artificial Intelligence (AI) systems. ...
each sub-corpora, with 30 runs per comparison. ...
arXiv:2009.10385v4
fatcat:v4jxjhboxrcqxdfh2ppihzaj3q
A Multi-Lingually Applicable Journalist Toolset For The Big-Data Era
2016
Zenodo
The project FREME has received funding from the EU's Horizon 2020 programme under grant agreement No. 644 771. ...
This research was supported in part by a Discovery and Innovation Research Seed award from the Office of the Vice Provost for Research at Cornell. ...
clusters in a corpus based on the distribution of co-occurrence probabilities across word vectors. ...
doi:10.5281/zenodo.1242850
fatcat:nfkqg7jhjffdvgezdjzc6xxppa