Pagination: It's what you say, not how long it takes to say it [article]

Joshua Hailpern, Niranjan Damera Venkata, Marina Danilevsky
2014 arXiv   pre-print
Pagination - the process of determining where to break an article across pages in a multi-article layout is a common layout challenge for most commercially printed newspapers and magazines. To date, no one has created an algorithm that determines a minimal pagination break point based on the content of the article. Existing approaches for automatic multi-article layout focus exclusively on maximizing content (number of articles) and optimizing aesthetic presentation (e.g., spacing between
articles). However, disregarding the semantic information within the article can lead to overly aggressive cutting, thereby eliminating key content and potentially confusing the reader, or setting too generous of a break point, thereby leaving in superfluous content and making automatic layout more difficult. This is one of the remaining challenges on the path from manual layouts to fully automated processes that still ensure article content quality. In this work, we present a new approach to calculating a document minimal break point for the task of pagination. Our approach uses a statistical language model to predict minimal break points based on the semantic content of an article. We then compare 4 novel candidate approaches, and 4 baselines (currently in use by layout algorithms). Results from this experiment show that one of our approaches strongly outperforms the baselines and alternatives. Results from a second study suggest that humans are not able to agree on a single "best" break point. Therefore, this work shows that a semantic-based lower bound break point prediction is necessary for ideal automated document synthesis within a real-world context.
arXiv:1404.3233v1 fatcat:qg35zvhfaje5vacewxvw2ubp3q

ACES

Joshua Hailpern, Marina Danilevsky, Andrew Harris, Sunah Suh, Reed LaBotz, Karrie Karahalios
2013 Proceedings of the 2013 conference on Computer supported cooperative work - CSCW '13  
While conducting research focused on individuals with impairments is vitally important, such experiments often have high costs (time and money), and researchers may be limited in the instructions they can give, or participant feedback they can gather (due to the impairment). We present how an impairment emulation system (ACES) can be used by researchers in the behavioral sciences. By repurposing this new technology within the context of a "traditional" psychology experiment, we were able to
analyze impaired linguistic and communication in a manner that was not possible without a system such as ACES. Our experiment on 96 participants provided strong support for a theory in the aphasia psychology community, and uncovered new understandings of how people communicate when one interlocutor's speech is distorted with aphasia. These findings illustrate a new direction of HCI research that directly helps researchers in Psychology, Communication, and Speech and Hearing Science.
[...] benefits to test and validate theories, and run "traditional" Psychology experiments targeting individuals with impairments.
RELATED WORK
As this work leverages an aphasia emulator, we describe aphasia and other tools that have emulated non-language disorders. We then highlight the relevance of ACES-like solutions for researchers by briefly discussing how language and communication is examined in the existing technical and nontechnical literature.
Aphasia
Aphasia is a term that describes an acquired language disorder that impairs an individual's ability to produce and understand language [5] in both written and spoken forms [6]. Aphasia is associated with individuals that have brain damage (e.g. stroke), though the manifestation (symptoms and severity) can vary based on the location and type of damage to the brain. Based on the variety of aphasia "flavors," classification systems were created to help researchers, clinicians, and individuals [15, 42]. HCI research on aphasia has largely focused on remedying communication challenges via image-based communication in mobile phones [3], and day-to-day interaction [9, 1]. Also of note is the technology-based research to aid individuals with aphasia in speech therapy [34] and scheduling their daily activities [31].
doi:10.1145/2441776.2441835 dblp:conf/cscw/HailpernDHSLK13 fatcat:ogyhiosporfobpu53arazms4zm

ACES

Joshua Hailpern, Marina Danilevsky, Karrie Karahalios
2011 The proceedings of the 13th international ACM SIGACCESS conference on Computers and accessibility - ASSETS '11  
To an outsider it may appear as though an individual with aphasia has poor cognitive function. However, the problem resides in the individual's receptive and expressive language, and not in their ability to think. This misperception, paired with a lack of empathy, can have a direct impact on quality of life and medical care. Hailpern's 2011 paper on ACES demonstrated a novel system that enabled users (e.g., caregivers, therapists, family) to experience first hand the communication-distorting
effects of aphasia. While their paper illustrated the impact of ACES on empathy, it did not validate the underlying distortion emulation. This paper provides a validation of ACES' distortions through a Turing Test experiment with participants from the Speech and Hearing Science community. It illustrates that text samples generated with ACES distortions are generally not distinguishable from text samples originating from individuals with aphasia. This paper explores ACES distortions through a 'How Human' is it test, in which participants explicitly rate how human- or computer-like distortions appear to be.
doi:10.1145/2049536.2049553 dblp:conf/assets/HailpernDK11 fatcat:ewj6x276gfgfzkmbbl2gui5cqm

KERT: Automatic Extraction and Ranking of Topical Keyphrases from Content-Representative Document Titles [article]

Marina Danilevsky, Chi Wang, Nihit Desai, Jingyi Guo, Jiawei Han
2013 arXiv   pre-print
We introduce KERT (Keyphrase Extraction and Ranking by Topic), a framework for topical keyphrase generation and ranking. By shifting from the unigram-centric traditional methods of unsupervised keyphrase extraction to a phrase-centric approach, we are able to directly compare and rank phrases of different lengths. We construct a topical keyphrase ranking function which implements the four criteria that represent high quality topical keyphrases (coverage, purity, phraseness, and completeness).
The effectiveness of our approach is demonstrated on two collections of content-representative titles in the domains of Computer Science and Physics.
arXiv:1306.0271v1 fatcat:w6iib3arkjbvjpbbnama73ovxy

A Survey of the State of Explainable AI for Natural Language Processing [article]

Marina Danilevsky, Kun Qian, Ranit Aharonov, Yannis Katsis, Ban Kawas, Prithviraj Sen
2020 arXiv   pre-print
Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently
available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out the current gaps and encourage directions for future work in this important research area.
arXiv:2010.00711v1 fatcat:7si7hkcknzchbb7gdujew5sbiq

Constructing topical hierarchies in heterogeneous information networks

Chi Wang, Jialu Liu, Nihit Desai, Marina Danilevsky, Jiawei Han
2014 Knowledge and Information Systems  
A digital data collection (e.g., scientific publications, enterprise reports, news, and social media) can often be modeled as a heterogeneous information network, linking text with multiple types of entities. Constructing high-quality concept hierarchies that can represent topics at multiple granularities benefits tasks such as search, information browsing, and pattern mining. In this work we present an algorithm for recursively constructing multi-typed topical hierarchies. Contrary to
traditional text-based topic modeling, our approach handles both textual phrases and multiple types of entities by a newly designed clustering and ranking algorithm for heterogeneous network data, as well as mining and ranking topical patterns of different types. Our experiments on datasets from two different domains demonstrate that our algorithm yields high quality, multi-typed topical hierarchies.
doi:10.1007/s10115-014-0777-4 fatcat:exhvtiyr7nd4nhshh5kf2uoiau

Graph Regularized Transductive Classification on Heterogeneous Information Networks [chapter]

Ming Ji, Yizhou Sun, Marina Danilevsky, Jiawei Han, Jing Gao
2010 Lecture Notes in Computer Science  
A heterogeneous information network is a network composed of multiple types of objects and links. Recently, it has been recognized that strongly-typed heterogeneous information networks are prevalent in the real world. Sometimes, label information is available for some objects. Learning from such labeled and unlabeled data via transductive classification can lead to good knowledge extraction of the hidden network structure. However, although classification on homogeneous networks has been
studied for decades, classification on heterogeneous networks has not been explored until recently. In this paper, we consider the transductive classification problem on heterogeneous networked data which share a common topic. Only some objects in the given network are labeled, and we aim to predict labels for all types of the remaining objects. A novel graph-based regularization framework, GNetMine, is proposed to model the link structure in information networks with arbitrary network schema and arbitrary number of object/link types. Specifically, we explicitly respect the type differences by preserving consistency over each relation graph corresponding to each type of links separately. Efficient computational schemes are then introduced to solve the corresponding optimization problem. Experiments on the DBLP data set show that our algorithm significantly improves the classification accuracy over existing state-of-the-art methods.
doi:10.1007/978-3-642-15880-3_42 fatcat:r47krvy7tbgkvdponhgzcdh2za

ACES

Joshua Hailpern, Marina Danilevsky, Andrew Harris, Karrie Karahalios, Gary Dell, Julie Hengst
2011 Proceedings of the 2011 annual conference on Human factors in computing systems - CHI '11  
Individuals with aphasia, an acquired communication disorder, constantly struggle against a world that does not understand them. This lack of empathy and understanding negatively impacts their quality of life. While aphasic individuals may appear to have lost cognitive functioning, their impairment relates to receptive and expressive language, not to thinking processes. We introduce a novel system and model, Aphasia Characteristics Emulation Software (ACES), enabling users (e.g., caregivers,
speech therapists and family) to experience, firsthand, the communication-distorting effects of aphasia. By allowing neurologically typical individuals to "walk in another's shoes," we aim to increase patience, awareness and understanding. ACES was grounded in the communication science and psychological literature, and informed by an initial pilot study. Results from an evaluation of 64 participants indicate that ACES provides a rich experience that increases understanding and empathy for aphasia. We describe aphasia, empathy and how our work builds upon, and extends, the literature related to aphasia.
Aphasia
Aphasia is a term used to describe an acquired language disorder that is caused by damage to the left or dominant hemisphere of the brain and impairs an individual's ability to produce and understand language in both written and spoken forms [1]. The severity and pattern of aphasic symptoms vary, depending in part on the specific locations of brain damage. Clinical researchers have developed classification systems that identify different patterns or sub-types of aphasia. For example, diagnostic batteries [12,27] based on the Boston classification system are designed to categorize an individual's aphasia symptoms as either a type of non-fluent aphasia (Broca's, Transcortical Motor, Global) or fluent aphasia (Wernicke, Transcortical Sensory, Conduction, Anomic). Of particular interest to the goals of the current study is that all individuals with aphasia will display at least some difficulty with writing, and although writing may be more or less impaired than spoken language, the linguistic deficits in writing will be generally consistent with those of the person's spoken language [2]. Recent research focusing on issues of treatment and functional recovery in aphasia [5] has drawn attention to the need for clinical interventions to attend not only to the areas of deficits in the patient with aphasia, but also to the person's communicative and social systems more broadly. This paper focuses on increasing empathy for those interacting with aphasics.
* Half values are due to an even number of data points, where the value separating the higher half from the lower half lies between two different values, resulting in a median that is the average of the two values. For example, a data set of [1, 4, 4, 5, 5, 7] would have a median of 4.5, although 4.5 is not a possible value.
doi:10.1145/1978942.1979029 dblp:conf/chi/HailpernDHKDH11 fatcat:pgjqalkconhejcftrqrtwzdgby
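
The footnote in the ACES (CHI '11) entry above explains why a median can land on a half value. As a minimal sketch of that arithmetic (the helper name `median_by_hand` and the variable `ratings` are illustrative assumptions, not taken from the paper), the same example can be checked in Python:

```python
from statistics import median


def median_by_hand(values):
    """Return the median; for an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the middle element is the median.
        return ordered[mid]
    # Even count: the median lies between the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Example data set quoted in the footnote.
ratings = [1, 4, 4, 5, 5, 7]
print(median_by_hand(ratings))  # 4.5
print(median(ratings))          # 4.5, from the standard library
```

Here the two middle values are 4 and 5, so the median is their average, 4.5, even though no single response can take that value.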

SCENE: Structural Conversation Evolution NEtwork

Marina Danilevsky, Joshua Hailpern, Jiawei Han
2011 2011 International Conference on Advances in Social Networks Analysis and Mining  
Sections of this work are reprinted, with permission, from Marina Danilevsky, Joshua Hailpern, and Jiawei Han. SCENE: Structural Conversation Evolution NEtwork.  ...  Danilevsky was born in Moscow, Russia, and moved to the north suburbs of Chicago with her family at a young age.  ... 
doi:10.1109/asonam.2011.117 dblp:conf/asunam/DanilevskyHH11 fatcat:zbeqszydx5f5nj3dide32o4u4i

Walking in another's shoes

Joshua Hailpern, Marina Danilevsky, Karrie Karahalios
2010 Proceedings of the 12th international ACM SIGACCESS conference on Computers and accessibility - ASSETS '10  
The impact of living in a world that does not understand your impairment can be frustrating and a daunting task. Consider how an individual would feel if their family, friends, or doctors did not understand or were not even empathetic to daily struggles brought on by an acquired language disorder such as Aphasia. This work seeks to shed new light on aphasia by creating an instant message client which emulates the effects of aphasia. The goal of this new system is to raise awareness, teach, and
increase empathy for caregivers, family members, and doctors/therapists who work with this population on a daily basis.
doi:10.1145/1878803.1878880 dblp:conf/assets/HailpernDK10 fatcat:ox2mdy2pa5hxpi4dg7adqnzyma

Information graph model and application to online advertising

Marina Danilevsky, Eunyee Koh
2013 Proceedings of the 1st workshop on User engagement optimization - UEO '13  
We present an algorithm which adapts a graph-based ranking model to the context of the problem of improving the process of serving advertisements to users. We transform the ad-based clickstream data into a heterogeneous graph model which respects differences in feature types (e.g. geolocation features, or browser-history features). The heterogeneous network model generates meaningful rankings of features which are predictive for each ad, as demonstrated by our classifier's performance. We also
discuss how, in addition to serving as the basis for a classifier, this model may also provide an informative view of the data, which is not possible with black-box approaches, and which therefore makes it very suitable to the problem space of targeted ad serving.
doi:10.1145/2512875.2512878 dblp:conf/cikm/DanilevskyK13 fatcat:rb7r65fi2vdehi4b5fyr3dl3we

AMETHYST

Marina Danilevsky, Chi Wang, Fangbo Tao, Son Nguyen, Gong Chen, Nihit Desai, Lidan Wang, Jiawei Han
2013 Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '13  
In this demo we present AMETHYST, a system for exploring and analyzing a topical hierarchy constructed from a heterogeneous information network (HIN). HINs, composed of multiple types of entities and links are very common in the real world. Many have a text component, and thus can benefit from a high quality hierarchical organization of the topics in the network dataset. By organizing the topics into a hierarchy, AMETHYST helps understand search results in the context of an ontology, and
explain entity relatedness at different granularities. The automatically constructed topical hierarchy reflects a domain-specific ontology, interacts with multiple types of linked entities, and can be tailored for both free text and OLAP queries.
doi:10.1145/2487575.2487716 dblp:conf/kdd/DanilevskyWTNCDWH13 fatcat:z3r2hwkcb5dorcfj77y2z2qyxi

Constructing Topical Hierarchies in Heterogeneous Information Networks

Chi Wang, Marina Danilevsky, Jialu Liu, Nihit Desai, Heng Ji, Jiawei Han
2013 2013 IEEE 13th International Conference on Data Mining  
A digital data collection (e.g., scientific publications, enterprise reports, news, and social media) can often be modeled as a heterogeneous information network, linking text with multiple types of entities. Constructing high-quality concept hierarchies that can represent topics at multiple granularities benefits tasks such as search, information browsing, and pattern mining. In this work we present an algorithm for recursively constructing multi-typed topical hierarchies. Contrary to
traditional text-based topic modeling, our approach handles both textual phrases and multiple types of entities by a newly designed clustering and ranking algorithm for heterogeneous network data, as well as mining and ranking topical patterns of different types. Our experiments on datasets from two different domains demonstrate that our algorithm yields high quality, multi-typed topical hierarchies.
doi:10.1109/icdm.2013.53 dblp:conf/icdm/WangDLDJH13 fatcat:rfn4w3f7ufeg5fw7asip4exxfy

Ranking-based classification of heterogeneous information networks

Ming Ji, Jiawei Han, Marina Danilevsky
2011 Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11  
It has been recently recognized that heterogeneous information networks composed of multiple types of nodes and links are prevalent in the real world. Both classification and ranking of the nodes (or data objects) in such networks are essential for network analysis. However, so far these approaches have generally been performed separately. In this paper, we combine ranking and classification in order to perform more accurate analysis of a heterogeneous information network. Our intuition is that
highly ranked objects within a class should play more important roles in classification. On the other hand, class membership information is important for determining a quality ranking over a dataset. We believe it is therefore beneficial to integrate classification and ranking in a simultaneous, mutually enhancing process, and to this end, propose a novel ranking-based iterative classification framework, called RankClass. Specifically, we build a graph-based ranking model to iteratively compute the ranking distribution of the objects within each class. At each iteration, according to the current ranking results, the graph structure used in the ranking algorithm is adjusted so that the subnetwork corresponding to the specific class is emphasized, while the rest of the network is weakened. As our experiments show, integrating ranking with classification not only generates more accurate classes than the state-of-art classification methods on networked data, but also provides meaningful ranking of objects within each class, serving as a more informative view of the data than traditional classification.
doi:10.1145/2020408.2020603 dblp:conf/kdd/JiHD11 fatcat:rtmyo2v6ijc7fisz6qykdm5rku

Clustering in the Creative Industries: Insights from the Origins of Computer Software

Martin Campbell-Kelly, Marina Danilevsky, Daniel D. Garcia-Swartz, Shane Pederson
2010 Industry and Innovation  
We use several different sources (a 1970 Roster of Organizations in Data Processing and the 1960 and 1970 Censuses of Population) to study patterns of geographic clustering at the very origins of the software industry. We find a strong trend toward clustering of the industry in a few metropolitan areas. Furthermore, we uncover a tendency in the early software industry to agglomerate in close proximity to some of its main customers. This tendency holds even after controlling for region-specific
heterogeneity and for the potentially endogenous nature of the software customers' location decisions. We explore the factors that may have driven the observed clustering patterns and suggest directions for further research.
doi:10.1080/13662711003790593 fatcat:evcuwkorebf2npxdat5kjaa73u