About compression of vocabulary in computer oriented languages [article]

V. P. Maslov
2003 arXiv   pre-print
The author uses the entropy of the ideal Bose-Einstein gas to minimize losses in computer-oriented languages.  ...  This is a Pidgin language, which can serve as a computer oriented language and whose vocabulary is significantly simplified (the number of words is essentially decreased).  ...  etc.), the problem of "compressing" and economical coding of texts arises first of all.  ... 
arXiv:cs/0303002v2 fatcat:o2ncsq2pubh3lg6c4cy3n5afsu

Simple, Fast, and Efficient Natural Language Adaptive Compression [chapter]

Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, José R. Paramá
2004 Lecture Notes in Computer Science  
One of the most successful natural language compression methods is word-based Huffman.  ...  A one-pass adaptive variant of Huffman exists, but it is character-oriented and rather complex.  ...  The word-based Huffman byte oriented codes proposed in [7] obtain compression ratios on natural language close to 30% by coding with bytes instead of bits (in comparison to the bit oriented approach  ... 
doi:10.1007/978-3-540-30213-1_34 fatcat:6zyp5soayjcglf7b6nr7jjquvq

The Roumanian spelling checker ROMSP: the project overview

1995 Computer Science Journal of Moldova  
Problems of user interface engineering support by object oriented methods are of special interest.  ...  Aspects of the Roumanian spelling checker ROMSP are presented: effective vocabulary representation, similar words detection algorithms, automatic word inflection, the user interface, supporting tools,  ...  Of course, something intermediate is of interest. In our case, about 220,000 roots, 144 endings, and about 2,000 ending sets from 2^144 were sufficient.  ... 
doaj:f51d0ee9c11245b780c92578cd5ec0fd fatcat:ppfrn4ra5ndftltrhckdtw3qgi

A Two-Level Structure for Compressing Aligned Bitexts [chapter]

Joaquín Adiego, Nieves R. Brisaboa, Miguel A. Martínez-Prieto, Felipe Sánchez-Martínez
2009 Lecture Notes in Computer Science  
A bitext, or bilingual parallel corpus, consists of two texts, each one in a different language, that are mutual translations.  ...  Our strategy is based on a two-level structure for the vocabularies, and on the use of biwords, a pair of associated words, one from each language, as basic symbols to be encoded with an ETDC [2] compressor  ...  Moreover, to evaluate the effect of our strategy of using a biword-oriented model, we also implemented ETDC compression over the bitext using two different word-oriented models.  ... 
doi:10.1007/978-3-642-03784-9_11 fatcat:w22wobqwxncm3ap7lydw5xfj5e

Efficiently decodable and searchable natural language adaptive compression

Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, José R. Paramá
2005 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05  
We address the problem of adaptive compression of natural language text, focusing on the case where low bandwidth is available and the receiver has little processing power, as in mobile applications.  ...  Moreover, we show that our technique can be adapted to avoid decompression at all in cases where the receiver only wants to detect the presence of some keywords in the document, which is useful in scenarios  ...  For example, a language classification system might look for a small set of common words of each language and use it to classify the incoming compressed text, forwarding it to a specific directory or computer  ... 
doi:10.1145/1076034.1076076 dblp:conf/sigir/BrisaboaFNP05 fatcat:fw7diimi6fht3kpjrievlthtyy

New adaptive compressors for natural language text

N. R. Brisaboa, A. Fariña, G. Navarro, J. R. Parama
2008 Software, Practice & Experience  
Semistatic byte-oriented word-based compression codes have been shown to be an attractive alternative to compress natural language text databases, because of the combination of speed, effectiveness, and  ...  In particular, our recently proposed family of dense compression codes has been shown to be superior to the more traditional byte-oriented word-based Huffman codes in most aspects.  ...  A first pass over the text gathers global statistical information about the vocabulary (list of source symbols) in order to obtain a model of the text.  ... 
doi:10.1002/spe.882 fatcat:oltpsdjtezf53jqx2krvv3ex2e

A fast dynamic compression scheme for natural language texts

Ashutosh Gupta, Suneeta Agarwal
2010 Computers and Mathematics with Applications  
The aim of designing a dynamic version of WBTC is to adapt it for real-time transmission.  ...  The problem in the semi-static technique is to perform two passes over the source text, and therefore encoding cannot start before the whole first pass has been completed.  ...  In decompression, Dynamic WBTC is about 23%-80% faster than MLZW. Comparing DyWBTC with classical compressors, our DyWBTC outperforms in compression time.  ... 
doi:10.1016/j.camwa.2010.10.019 fatcat:4vvx463w7nbopjyf3slijyk7mq

Fast searching on compressed text allowing errors

Edleno Silva de Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-Yates
1998 Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '98  
We compress typical English texts to about 30% of their original size, against 40% and 35% for Compress and Gzip, respectively.  ...  The algorithm is based on a word-oriented shift-or algorithm and a fast Boyer-Moore-type filter. It concomitantly uses the vocabulary of the text available as part of the Huffman coding data.  ...  Araújo, who helped particularly with the algorithms for approximate searching in the text vocabulary.  ... 
doi:10.1145/290941.291013 dblp:conf/sigir/MouraNZB98 fatcat:lwmninhsczezxgkir3txt3fh6a

Measuring Perceptual and Linguistic Complexity in Multilingual Grounded Language Data

Nisha Pillai, Cynthia Matuszek, Francis Ferraro
2021 Proceedings of the ... International Florida Artificial Intelligence Research Society Conference  
The success of grounded language acquisition using perceptual data (e.g., in robotics) is affected by the complexity of both the perceptual concepts being learned and the language describing those concepts  ...  Our work illuminates core, quantifiable statistical differences in how language is used to describe different traits of objects, and the visual representation of those objects.  ...  To measure shape complexity, we compute the compression loss of detected edges.  ... 
doi:10.32473/flairs.v34i1.128450 fatcat:ep45qo7pwnfhjc7ppocklkqofm

(S,C)-Dense Coding: An Optimized Compression Code for Natural Language Text Databases [chapter]

Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, María F. Esteller
2003 Lecture Notes in Computer Science  
This work presents (s, c)-Dense Code, a new method for compressing natural language texts.  ...  We formally describe the (s, c)-Dense Code and show how to compute the parameters s and c that optimize the compression for a specific corpus.  ...  Therefore, the vocabulary will be slightly smaller than in the case of the Huffman code, where some information about the shape of the tree must be stored (even when a canonical Huffman tree is used).  ... 
doi:10.1007/978-3-540-39984-1_10 fatcat:edztgxtibzcjtj7dcc66ewfv7u

Instructional Design and Practice of Specialized English, an Application-oriented Graduate Course for Mechanical Engineering

Rong ZHANG, Jian-Jun WEI, Hua GUO
2017 DEStech Transactions on Social Science Education and Human Science  
The features of Specialized English, a graduate course for mechanical engineering for applied universities, are analyzed; The goal, contents, and corresponding application-oriented instructional strategy  ...  of the course are designed and determined based on application investigation; Explorational instruction practice with multiple methods have been carried out.  ...  Acknowledgement This research was financially supported by Innovation Project of Guangxi Graduate Education, 2015.  ... 
doi:10.12783/dtssehs/esem2017/15083 fatcat:adyuridocjb4la6ivpgk3sisv4

Fast and flexible word searching on compressed text

Edleno Silva de Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-Yates
2000 ACM Transactions on Information Systems  
We compress typical English texts to about 30% of their original size, against 40% and 35% for Compress and Gzip, respectively.  ...  We present a fast compression and decompression technique for natural language texts.  ...  Araújo, who helped particularly with the algorithms for approximate searching in the text vocabulary. We also thank the many comments of the referees that helped us to improve this work.  ... 
doi:10.1145/348751.348754 fatcat:gtwbmlqconbn5jz3le3ptj4fmm


Natalia Drobysheva
2018 Vìsnik Nacìonalʹnogo Avìacìjnogo Unìversitetu  
The general trend of the economy of linguistic means for the modern period is manifested in the aviation terminology in an effort to compress the form of the term, and the form of the nomenclature terms  ...  Conclusions: Aviation vocabulary develops along the lines of general trends in scientific and technical terminology.  ...  Anthropocentrism in lexicology takes a leading position and is understood as a way out of the boundaries of the language in the field of knowledge about the surrounding reality, about the person (language  ... 
doi:10.18372/2306-1472.77.13503 fatcat:f4f2d3d4qfcxnj4jlq7rungwqe

Page 5636 of Mathematical Reviews Vol. , Issue 2004g [page]

2004 Mathematical Reviews  
In particular, we propose a logic language with a clear and declarative semantics to specify the structural features of object oriented databases and authorizations associated with complex data objects  ...  A direct advantage of this approach is that we can formally specify and reason about authorizations on data objects without losing inheritance and abstraction features of object oriented databases.” 2004g  ... 

Speech-based Interaction

Cosmin Munteanu, Gerald Penn
2016 Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems - CHI EA '16  
-Goal orientation (system-initiated prompts or mixed initiative) -Informing the user of their progress toward achieving the goal Human-Computer Interaction and ASR • HCI needs to be aware of ASR's capabilities  ...  • All these employ not only ASR, but significantly more Natural Language Processing, and a good amount of Human-Computer Interaction -not all are dedicated to speech-based input!  ...  -Codecs (lossy formats, compression, non-linear representation) • Use lossless compression (e.g. flac codec or zip) if low bandwidth • Ideally use only uncompressed formats (wav or raw)!  ... 
doi:10.1145/2851581.2856689 dblp:conf/chi/MunteanuP16 fatcat:c2qfv4eukjhyti7brvvzuyv6je