Compression techniques for Chinese text

Phil Vines, Justin Zobel
1998 Software, Practice & Experience  
In this paper we survey proposals for compressing Chinese text, then examine in detail the application to Chinese text of the partial predictive matching compression technique (PPM).  ...  We propose several refinements to PPM to make it more effective for Chinese text, and, on our publicly-available test corpus of around 50 Mb of Chinese text documents, show that these refinements can significantly  ...  Acknowledgements We thank Alistair Moffat, for his advice and for the initial implementation of PPM used in our experiments, and George Fernandez.  ... 
doi:10.1002/(sici)1097-024x(1998100)28:12<1299::aid-spe203>3.0.co;2-e fatcat:jevbl2f4brbkvk7bgihtga24n4

Compression techniques for Chinese text

Phil Vines, Justin Zobel
1998 Software, Practice & Experience  
In this paper we survey proposals for compressing Chinese text, then examine in detail the application to Chinese text of the partial predictive matching compression technique (PPM).  ...  We propose several refinements to PPM to make it more effective for Chinese text, and, on our publicly-available test corpus of around 50 Mb of Chinese text documents, show that these refinements can significantly  ...  Acknowledgements We thank Alistair Moffat, for his advice and for the initial implementation of PPM used in our experiments, and George Fernandez.  ... 
doi:10.1002/(sici)1097-024x(1998100)28:12<1299::aid-spe203>3.3.co;2-5 fatcat:cjkvk2nemnhjpm5ezaumyehjki

A Syllable-Based Technique for Uyghur Text Compression

Wayit Abliz, Hao Wu, Maihemuti Maimaiti, Jiamila Wushouer, Kahaerjiang Abiderexiti, Tuergen Yibulayin, Aishan Wumaier
2020 Information  
To enable the coding scheme to process Uyghur texts mixed with other language symbols, we introduced a flag code in the compression process to distinguish the Unicode encodings that were not in the code  ...  Our compression schemes outperformed GZip, BZip2, and the LZW algorithm on short text and could be effectively applied to the compression of Uyghur short text for storage and applications.  ...  for B12 coding scheme.  ... 
doi:10.3390/info11030172 doaj:7e8010d68511489a801b50c86837fe55 fatcat:xlb4wxoxmrax5bl5h3irtqfo5y

A Compression-based Algorithm for Chinese Word Segmentation

W. J. Teahan, Yingying Wen, Rodger McNab, Ian H. Witten
2000 Computational Linguistics  
We describe a scheme that infers appropriate positions for word boundaries using an adaptive language model that is standard in text compression.  ...  This simple and general method performs well with respect to specialized schemes for Chinese language segmentation.  ...  The corrected version of Guo Jin's PH corpus and the Rocling corpus were provided by Julia Hockenmaier and Chris Brew at the University of Edinburgh and the Chinese Knowledge Information Processing Group  ... 
doi:10.1162/089120100561746 fatcat:k6l4pffv3jabdhrm35k3srzr3y

Detection Method of Data Integrity in Network Storage Based on Symmetrical Difference

Xiaona Ding
2020 Symmetry  
According to the automatic word segmentation, pos tagging and Chinese word segmentation, the feature analysis of text data was achieved.  ...  Combined with the accountability scheme of data security of the trusted third party, the trusted third party was taken as the core. The online state judgment was made for each user operation.  ...  For data recovery, this paper provides a strong anti-corruption ability by two rounds of coding for the original file.  ... 
doi:10.3390/sym12020228 fatcat:hcs2kh2iknhrjl2mpiobgi4g6y

DESIGN AND IMPLEMENTATION OF AN ALL-DIGITAL REAL-TIME UNDERWATER ACOUSTIC TRANSCEIVER USING DIGITAL SIGNAL PROCESSORS

Fu-Sheng Lu, Ching-Hsiang Tseng, Bin-Chong Wu
2008 Journal of Marine Science and Technology  
a modified noncoherent delay-locked loop for code acquisition and tracking, respectively.  ...  The test data include plain text and simple image files. The test result shows that reliable real-time UWA communications can be accomplished by the proposed transceiver.  ...  In the first experiment, an English text file and a Chinese text file whose contents are shown in Figs. 15 and 16 were transmitted using the implemented TMS320 C6711 DSK transmitter.  ... 
doi:10.51400/2709-6998.1995 fatcat:y6jp7koq35bq7mrtj7qfb77i4a

On Automatic Conversion from E-born PDF into Accessible EPUB3 and Audio-Embedded HTML5 [chapter]

Masakazu Suzuki, Katsuhito Yamaguchi
2020 Lecture Notes in Computer Science  
In the conversion, various local languages can be chosen for reading out STEM contents.  ...  As a promising method to make digital STEM books in PDF accessible, a new assistive technology to convert inaccessible PDF into accessible digital books in several different formats is shown.  ...  It shows that ChattyInfty3 is actually customizable for various local languages by making use of the localization scheme.  ... 
doi:10.1007/978-3-030-58796-3_48 fatcat:mokryqwslrcrfiggrlqu27yxse

Improved Research on Fuzzy Search over Encrypted Cloud Data Based on Keywords

Ping Zhang, Jianzhong Wang
2015 Journal of Computer and Communications  
A search strategy over encrypted cloud data based on keywords has been improved and has presented a method using different strategies on the client and the server to improve the search efficiency in this  ...  The client uses the Chinese and English to achieve the synonym construction of the keywords, the establishment of the fuzzy-syllable words and synonyms set of keywords and the implementation of fuzzy search  ...  In the paper [12], a fuzzy search scheme based on key words is proposed, which realizes the search for the Chinese fuzzy tone and synonymous keywords and uses the pseudo random function to protect the  ... 
doi:10.4236/jcc.2015.39010 fatcat:oojajqxk35eeziwdbqpe47xiui

Embedding adaptive arithmetic coder in chaos-based cryptography

Li Heng-Jian, Zhang Jia-Shu
2010 Chinese Physics B  
Compared with original arithmetic coding, simulation results on Calgary Corpus files show that the proposed scheme suffers from a reduction in compression performance less than 12% and is not susceptible  ...  In this study an adaptive arithmetic coder is embedded in the Baptista-type chaotic cryptosystem for implementing secure data compression.  ...  There are 18 distinct files of different types, including text, executable, geophysical data, and picture. There are two configurations for experiments.  ... 
doi:10.1088/1674-1056/19/5/050508 fatcat:ncyhu7juefca7jtsxsmgpxf32y

Chinese localisation of Evergreen: an open source integrated library system

Qing Zou, Guoying Liu, Lucy A. Tedd
2009 Program  
Indexing, searching, sorting and other locale related issues should be tackled not only language by language, but locale by locale.  ...  Design/methodology/approach - A Simplified Chinese version of Evergreen was implemented and tested and various issues such as encoding, indexing, searching, and sorting specifically associated with Simplified  ...  Coding Chinese script into computer is not a big problem any more.  ... 
doi:10.1108/00330330910934101 fatcat:z6tchqo2pfasjpb6zjzxn5cjby

BLOCK TIME STEP STORAGE SCHEME FOR ASTROPHYSICAL N-BODY SIMULATIONS

Maxwell Xu Cai (蔡栩), Yohai Meiron (林友海), M. B. N. Kouwenhoven (柯文, Paulina Assmann, Rainer Spurzem
2015 Astrophysical Journal Supplement Series  
As an urgent response to these challenges, in this paper we propose an adaptive storage scheme for simulation data, inspired by the block time step integration scheme found in a number of direct N-body  ...  As demonstrated by benchmarks, the proposed scheme is applicable to a wide variety of simulations.  ...  The funds from John Templeton Foundation were awarded in a grant to The University of Chicago which also managed the program in conjunction with  ...  We also implemented a vispy-based visualization script for  ... 
doi:10.1088/0067-0049/219/2/31 fatcat:zqcihg4qtrctfmvebv4nz3im5q

An Existential Review on Text Watermarking Techniques

Manmeet Kaur, Kamna Mahajan
2015 International Journal of Computer Applications  
Text watermarking has been an active area of research for several years. This paper presents a review of various text watermarking techniques described in literature.  ...  We also highlight the security issues like copyright protection, tamper detection and data hiding which need to be focussed for ensuring text security.  ...  A novel semi-fragile text watermarking scheme for content authentication of Chinese text documents was proposed by Xinmin Zhou et al. [3].  ... 
doi:10.5120/21330-4300 fatcat:tsg3suxqz5hu7b3plzyrtb2bbi

IEEE Access Special Section Editorial: Artificial Intelligence in Cybersecurity

Chi-Yuan Chen, Wei Quan, Nan Cheng, Shui Yu, Jong-Hyouk Lee, Gregorio Martínez Pérez, Hongke Zhang, Shiuhpyng Shieh
2020 IEEE Access  
The article ''DeepTAL: Deep learning for TDOA-based asynchronous localization security with measurement error and missing data,'' by Xue et al., proposes an improved localization algorithm for source localization  ...  The article ''Machine learning based file entropy analysis for ransomware detection in backup systems,'' by Lee et al., proposes to use machine learning for classifying infected files based on file entropy  ... 
doi:10.1109/access.2020.3021604 dblp:journals/access/ChenQCYLPZS20 fatcat:mjvkkgt3wjfo3ditb4eibdc3s4

A general compression algorithm that supports fast searching

Kimmo Fredriksson, Szymon Grabowski
2006 Information Processing Letters  
Acknowledgements We thank Sebastian Deorowicz for suggesting improvements to the final version of the manuscript.  ...  This requires that the text is locally decompressed for those codewords.  ...  For natural texts this scheme, however, cannot match, e.g., the original (s, c)-dense code in compression ratio, but this is the price we pay for removing the limitation to word based textual data.  ... 
doi:10.1016/j.ipl.2006.04.020 fatcat:2byrxiqeojcm3eatppj4i2yfsy

Early challenges to multilingualism on the Internet: the case of Han character-based scripts

Mark McLelland
2017 Internet Histories  
raised by "Han" character-based scripts such as Chinese, Japanese (and to a lesser extent, Korean).  ...  In this paper, I consider the orthographic factors that delayed the implementation of cross-platform protocols allowing for the input, display and transmission of character-based scripts across early computer  ...  Hence, the possibilities for easier text input, display and retrieval afforded by computers were of interest to East Asian governments and from the 1970s onward various schemes and protocols were explored  ... 
doi:10.1080/24701475.2017.1280889 fatcat:y3nkuljonjepncl6lbxygzwo2u