Word-based self-indexes for natural language text

Antonio Fariña, Nieves R. Brisaboa, Gonzalo Navarro, Francisco Claude, Ángeles S. Places, Eduardo Rodríguez
2012 ACM Transactions on Information Systems  
Natural language texts are then regarded as sequences of words, not characters, to achieve word-based self-indexes.  ...  The inverted index supports efficient full-text searches on natural language text collections. It requires some extra space over the compressed text that can be traded for search speed.  ...  We have also discussed how to apply them for self-indexing natural language text.  ... 
doi:10.1145/2094072.2094073 fatcat:lj4bsjt6wzccdnwwox5qc3qvjm

Self-indexing Natural Language [chapter]

Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, Angeles S. Places, Eduardo Rodríguez
2008 Lecture Notes in Computer Science  
In this paper we explore the possibility of regarding natural language text as a string of words and applying a self-index to it.  ...  On natural language, a compressed inverted index over the compressed text already provides a reasonable alternative, in space and time, for indexed searching of words and phrases.  ...  Conclusions and Future Work We have shown that a self-index applied to natural language text, seen as a sequence of words rather than symbols, offers a very relevant alternative to the traditional inverted  ... 
doi:10.1007/978-3-540-89097-3_13 fatcat:uk37adf4f5bfndj5c35z5a7c2e

Word-Based Statistical Compressors as Natural Language Compression Boosters

Antonio Fariña, Gonzalo Navarro, José Paramá
2008 Data Compression Conference (DCC), Proceedings  
We show, for example, that the AF-FMindex coupled with Tagged Huffman coding is an attractive alternative index for natural language texts.  ...  Semistatic word-based byte-oriented compression codes are known to be attractive alternatives to compress natural language texts.  ...  We also included in the comparison another compressor called MPPM 7 [2] that basically maps words into 2-byte ids, which are later encoded with ppmdi.  ... 
doi:10.1109/dcc.2008.14 dblp:conf/dcc/FarinaNP08 fatcat:d5kv4fzt5fff3pksm36xvwhnae

Ontology Guided Semantic Self Learning Framework

Darshika N. Koggalahewa, Asoka S. Karunananda
2015 International Journal of Knowledge Engineering-IACSIT  
Ontology Guided Semantic Self Learning Framework is a software which is capable of learning from natural language sources.  ...  It introduced a novel approach of knowledge capturing from unstructured natural language text.  ...  The knowledge available in natural language text book will be extracted by relating the contents available in different levels of text book such as Table of contents, Chapters, and Indexes and represented  ... 
doi:10.7763/ijke.2015.v1.5 fatcat:urnd23am4bggligwixp7yuj7lu

Profiling a set of personality traits of text author: what our words reveal about us

Tatiana Litvinova, Pavel Seredin, Olga Litvinova, Olga Zagorovskaya
2016 Research in Language  
Here we have used the "Personality Corpus", which consists of Russian-language texts.  ...  The aim of the study is to assess the probability of self-destructive behaviour of an individual via formal parameters of their texts.  ...  For the Russian language, this is calculated according to the formula (Oborneva, 2005): Flesch index = 206.835 - 1.3 × ASL - 60.1 × ASW, where ASL is the average sentence length in words and ASW is the average number of syllables per word. 1.2. Gunning Index (or Fog Index).  ... 
doi:10.1515/rela-2016-0019 fatcat:relaj3lls5aw3ce6eqgxj7pjym

Smaller Self-indexes for Natural Language [chapter]

Nieves R. Brisaboa, Gonzalo Navarro, Alberto Ordóñez
2012 Lecture Notes in Computer Science  
Self-indexes for natural-language texts, where these are regarded as token (word or separator) sequences, achieve very attractive space and search time.  ...  Interestingly, self-indexes also offer improvements on natural language indexing [5].  ...  As a result, we show that we reduce both the space and the time of word-based self-indexes.  ... 
doi:10.1007/978-3-642-34109-0_39 fatcat:djlakjelgjbyncpdoguerxj7lu

Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus [article]

Chen Wu, Ming Yan
2022 arXiv   pre-print
Different from typical information retrieval tasks, code search requires to bridge the semantic gap between the programming language and natural language, for better describing intrinsic concepts and semantics  ...  Semantic code search is the task of retrieving relevant code snippet given a natural language query.  ...  Text Retrieval-based Code Search A conventional Lucene-based search tool could be used for code search.  ... 
arXiv:2201.11313v1 fatcat:er5p53ejsbenjaywdppdrhtcyu

TALM: Tool Augmented Language Models [article]

Aaron Parisi, Yao Zhao, Noah Fiedel
2022 arXiv   pre-print
In this work, we present Tool Augmented Language Models (TALM), combining a text-only approach to augment language models with non-differentiable tools, and an iterative "self-play" technique to bootstrap  ...  Transformer based language models (LMs) demonstrate increasing performance with scale across a wide variety of tasks.  ...  Acknowledgements The authors would like to thank Noam Shazeer for early brainstorming on the path towards this work. We also thank Igor Mordatch for discussions and feedback.  ... 
arXiv:2205.12255v1 fatcat:ypwec4ponffxdnuwdybtqsaady

Page 349 of Linguistics and Language Behavior Abstracts: LLBA Vol. 28, Issue 1 [page]

1994 Linguistics and Language Behavior Abstracts: LLBA  
subject index movement analysis, static discontinuity grammar framework; 9400830 natural language negation semantics, computational model; 9401348 natural language processing advances, artificial intelligence  ...  English language instruction, student needs, English for academic purposes, Warwick U; 9400322 self-directed learning/self-access, English as a second language; college students, U of Hong Kong; 9400386  ... 

Arabic Natural Language Processing: Models, systems and applications

Vito Pirrelli, Arsalane Zarghili
2017 Journal of King Saud University: Computer and Information Sciences  
a few general lessons we can learn from current research on Arabic Natural Language Processing.  ...  On reflection, in real language-based communication, noise is not simply overlaid on the message, but is actually PART of the message.  ...  Acknowledgements The original impulse of the present volume comes from the 1st International Workshop on Arabic Natural Language Processing, convened in Tetouan (Morocco)  ... 
doi:10.1016/j.jksuci.2017.04.004 fatcat:cfsf4xomjjhzraxy4ave2ked7a

Using Conceptual Graphs for Text Mining in Technical Support Services [chapter]

Michael Bogatyrev, Alexey Kolosoff
2011 Lecture Notes in Computer Science  
Text mining problems of natural text classification and fact extraction are important in developing information systems for Technical Support Services.  ...  An approach which is based on joining acquisition of conceptual graphs and keywords search technique is presented to their solution.  ...  Text filtering is standard and evidently necessary technique for e-mail texts in natural language.  ... 
doi:10.1007/978-3-642-21786-9_75 fatcat:s5jj3c6sprbc3lhj23stjngqvm

The Status and Trend of Chinese News Forecast Based on Graph Convolutional Network Pooling Algorithm

Xiao Han, Jing Peng, Tailai Peng, Rui Chen, Boyuan Hou, Xinran Xie, Zhe Cui
2022 Applied Sciences  
that cannot handle very long texts.  ...  Therefore, it can be concluded that our research has strong processing capabilities for analyzing and predicting the development trend of Chinese news events.  ...  We propose a new graph pooling method based on self-attention mechanism and the Graph U-Nets method.  ... 
doi:10.3390/app12020900 fatcat:qu3cf66hzvbmzl4ljwercrmo2u

A Hybrid Chinese Language Model based on a Combination of Ontology with Statistical Method

Dequan Zheng, Tiejun Zhao, Sheng Li, Hao Yu
2005 International Joint Conference on Natural Language Processing  
To evaluate the performance of this language model, we completed two groups of experiments on texts reordering for Chinese information retrieval and texts similarity computing.  ...  Compared with previous works, the proposed method improved the precision of nature language processing.  ...  After that, other approaches were put forward, such as the combination of statistical-based approach and rule-based approach [4, 5], self-adaptive language models [6], topic-based model [7]  ... 
dblp:conf/ijcnlp/ZhengZLY05 fatcat:fqnrrdqyejbptlrfwzdwkybyke

Automated COVID-19 Dialogue System Using a New Deep Learning Network

Khaldoon H. Alhussayni, Eman S. Alshamery
2021 Periodicals of Engineering and Natural Sciences (PEN)  
The interest in task-oriented dialogue systems has grown remarkably in healthcare, using natural language in the dialogue between patients and doctors.  ...  The encoder extracts important words using text normalization, resulting in two vectors: symptom vectors and doctor utterance vectors.  ...  Several NLP applications exist, including speech recognition, natural language understanding, dialogue systems, question answering, sentiment analysis, natural language generation, and natural language  ... 
doi:10.21533/pen.v9i2.1862 fatcat:ihwrog35fral5cbhqkbh6gmi4a

Segatron: Segment-Aware Transformer for Language Modeling and Understanding [article]

He Bai, Peng Shi, Jimmy Lin, Yuqing Xie, Luchen Tan, Kun Xiong, Wen Gao, Ming Li
2020 arXiv   pre-print
Transformers are powerful for sequence modeling. Nearly all state-of-the-art language models and pre-trained language models are based on the Transformer architecture.  ...  However, it distinguishes sequential tokens only with the token position index.  ...  We would like to thank Wei Zeng and his team in Peng Cheng Laboratory (PCL) for the computing resource support to this project.  ... 
arXiv:2004.14996v2 fatcat:7vbetkq6obgnvf4xwydpxa4z2q