22,843 Hits in 6.2 sec

Enriching feature engineering for short text samples by language time series analysis

Yichen Tang, Kelly Blincoe, Andreas W. Kempa-Liehr
2020 EPJ Data Science  
In this case study, we are extending feature engineering approaches for short text samples by integrating techniques which have been introduced in the context of time series classification and signal processing  ...  The resulting language time series can be characterised by collections of established time series feature extraction algorithms from time series analysis and signal processing.  ...  Funding: This research was supported by the Faculty of Engineering of the University of Auckland. Availability of data and materials: Data are available from [17] and [61].  ... 
doi:10.1140/epjds/s13688-020-00244-9 fatcat:aichvkviebgf3ov5oxoujl4qdu

ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients' disease history

Ayoub Bagheri, Arjan Sammani, Peter G. M. van der Heijden, Folkert W. Asselbergs, Daniel L. Oberski
2020 Journal of Intelligent Information Systems  
The ETM enriches text representation by incorporating probability distributions generated by an unsupervised algorithm into it.  ...  This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smoothen the semantic representations of short sentences.  ...  There are two reasons for this: (1) Feature engineering in the Crest and ETM approaches has been proposed especially for the short text classification problem. (2) The trained word vectors are not rich  ... 
doi:10.1007/s10844-020-00605-w fatcat:ho7ujxwy3fhijbyiiko66wkw5i

Fault Classification Method for Power Dispatching Log Based on Text Mining

Jia-Yi Zhu, Xiang-Yang Gong, Zhen-Hua Cai, Yu-Zhe Xie, Xia-Ming Ye, Yun Qiu
2018 DEStech Transactions on Engineering and Technology Research  
In power grid scheduling, as the basis for the dispatcher to record the operation status of the power grid, the scheduling log usually uses short text to record the current state of the power grid, accident  ...  After obtaining the TF-IDF feature of the text and creating the Word2Vec feature model, we compare three kinds of text classification algorithms, the nearest neighbor algorithm, Naive Bayes and support  ...  Acknowledgement This work was partially supported by the science and technology project of Zhejiang Electric Power Corporation (Grant No. 5211NB160006).  ... 
doi:10.12783/dtetr/pmsms2018/24915 fatcat:oyxq7wq6wvaltasl6dv5r4xeaq

Research on Sentiment Analysis Model of Short Text Based on Deep Learning

Zhou Gui Zhou, Jie Liu
2022 Scientific Programming  
neural network features in deep learning and learning the short text by combining shallow learning and deep learning.  ...  potential emotional features of short texts.  ...  Learning features enrich the textual feature representation of short texts. In the near future, deep learning has also been widely used in sentiment analysis of short texts.  ... 
doi:10.1155/2022/2681533 fatcat:ndquxxc2sjgj5kig4rea5u7ppu

Integration of Text and Graph-based Features for Detecting Mental Health Disorders from Voice [article]

Nasser Ghadiri, Rasoul Samani, Fahime Shahrokh
2022 arXiv   pre-print
In this paper, two methods are used to enrich voice analysis for depression detection: graph transformation of voice signals, and natural language processing of the transcript based on representational  ...  The results of experiments with the DAIC-WOZ dataset suggest that integration of text-based voice classification and learning from low level and graph-based voice signal features can improve the detection  ...  Mapping of Signals to Graphs: Mapping from a time series to a complex network was proposed by (Lacasa et al., 2008).  ... 
arXiv:2205.07006v1 fatcat:vlbsimfgondx3lktyywilwvcaa

Capturing non-functional properties through model interlinking

Mahdi Noorian, Ebrahim Bagheri, Weichang Du
2014 2014 IEEE 27th Canadian Conference on Electrical and Computer Engineering (CCECE)  
Then, using texts that are associated with the elements and through a semantics-enabled textual analysis process, the model elements will be semantically annotated with related ontological concepts.  ...  Researchers have argued that connecting intentional variability models such as goal models with feature variability models in a target domain can enrich feature models with valuable quality and non-functional  ...  Table 1 shows sample supporting texts for "3G" and "WiFi" features, and "WLAN" and "Cellular" tasks.  ... 
doi:10.1109/ccece.2014.6901063 dblp:conf/ccece/NoorianBD14 fatcat:x7yezxqumvaubhgbxdanmqb34e

Short Text Classification Improved by Feature Space Extension [article]

Yanxuan Li
2019 arXiv   pre-print
The difference between classifying short text and long documents is that short text is of shortness and sparsity.  ...  With the explosive development of mobile Internet, short text has been applied extensively.  ...  However, short text has a series of features, such as shortness, sparsity, lack of semantic and contextual information [1-2]. It brings challenges for traditional methods to achieve good performance.  ... 
arXiv:1904.01313v1 fatcat:k6gfczizc5dfzgns3xfzkocnry

Text Mining for News and Blogs Analysis [chapter]

Bettina Berendt
2016 Encyclopedia of Machine Learning and Data Mining  
models, time series, and stream mining methods.  ...  Many mining methods therefore enrich the text by, for example, the contents of referenced URLs (e.g. Abel et al., 2011).  ... 
doi:10.1007/978-1-4899-7502-7_833-1 fatcat:ywxnluhwdfd6ldwcxjqp4x2ed4

The Paralinguistic Function of Emojis in Twitter Communication

Yasmin Tantawi, Mary Beth Rosson
2019 Zenodo  
A manual content analysis was then conducted to ascertain the paralinguistic and emotional features of the emojis used in these tweets.  ...  We present our characterization of emoji usage in Twitter and discuss implications for the design of Twitter and other text-based communication tools.  ...  The paralinguistic features of spoken language are primarily auditory and visual in nature, and verbal text is neither auditory nor visual.  ... 
doi:10.5281/zenodo.3298638 fatcat:py5ja6b5mfcobdyxdf5tjomaka

A Study of Multilingual Toxic Text Detection Approaches under Imbalanced Sample Distribution

Guizhe Song, Degen Huang, Zhifeng Xiao
2021 Information  
Multilingual characteristics, lack of annotated data, and imbalanced sample distribution are the three main challenges for toxic comment analysis in a multilingual setting.  ...  Two models, multilingual bidirectional encoder representation from transformers (MBERT) and XLM-RoBERTa (XLM-R), are employed for pre-training through Masking Language Modeling (MLM) and Translation Language  ...  Deep neural networks, on the other hand, can address this challenge by capturing the text semantic information from raw text data, without manual feature engineering and also boost the detection performance  ... 
doi:10.3390/info12050205 fatcat:q54pa3gxmvcr7o7emv4s3xpvqq

Clustering of semantically enriched short texts

Marek Kozlowski, Henryk Rybinski
2018 Journal of Intelligent Information Systems  
In addition, we test the possibilities of improving the quality of clustering ultra-short texts by means of enriching them semantically.  ...  The paper is devoted to the issue of clustering small sets of very short texts.  ...  Acknowledgments We would like to thank three anonymous referees for their valuable and constructive comments, which helped us to improve the quality of this article.  ... 
doi:10.1007/s10844-018-0541-4 fatcat:eipabygtdrdr3ji7wqth6vic4a

Wikipedia-based Semantic Interpretation for Natural Language Processing

E. Gabrilovich, S. Markovitch
2009 The Journal of Artificial Intelligence Research  
Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts.  ...  We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text.  ...  Lee and Brandon Pincombe for making available their document similarity data.  ... 
doi:10.1613/jair.2669 fatcat:mwcky2jqx5e6zimhzgsbh5rffa

Serial Expression Analysis: a web tool for the analysis of serial gene expression data

Maria José Nueda, José Carbonell, Ignacio Medina, Joaquín Dopazo, Ana Conesa
2010 Nucleic Acids Research  
We have created the SEA (Serial Expression Analysis) suite to provide a complete web-based resource for the analysis of serial transcriptomics data.  ...  Serial transcriptomics experiments investigate the dynamics of gene expression changes associated with a quantitative variable such as time or dosage.  ...  designed for short series.  ... 
doi:10.1093/nar/gkq488 pmid:20525784 pmcid:PMC2896172 fatcat:k32krdkke5di7o66rlh77lfcmq

Application of Semantic Tagging to Academic Paper Services

Sumi Shin
2017 The International Journal of Engineering and Science  
If detailed explanation on the tagged words can also be viewed at the same time while reading a paper, in addition, readers' convenience and legibility can be improved simultaneously.  ...  Among 10 papers in 5 subjects, those with improved keyword matching accounted for 70%, 60%, 50%, 60% and 80% respectively.  ...  Ontology engineering in Semantic Web is primarily supported by languages such as RDF, RDFS and OWL [3].  ... 
doi:10.9790/1813-0601011216 fatcat:ksl623zprvg3lm3ovktmatkyea

Analysis of the Error Pattern of HMM based Bangla ASR

Shourin R. Aura, Md. J. Rahimi, Oli L. Baroi (Ahsanullah University of Science and Technology, Dhaka, Bangladesh)
2020 International Journal of Image Graphics and Signal Processing  
Finally, the results are analyzed to get the error pattern needed for future development.  ...  fast by sending voice and getting the result in the text format.  ...  Research on ASR by machine has attracted much attention over the last few decades. Bengali is largely spoken all over the world.  ...  Acknowledgements: The authors would like to thank Ahsanullah University of Science and Technology for supporting this work.  ... 
doi:10.5815/ijigsp.2020.01.01 fatcat:6yfrohf47fewzmwasgiqqo3xgy