
Overlapping statistical word indexing

Yasushi Ogawa, Toru Matsuda
1997 Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '97  
We first propose a segmentation method for Japanese text which uses statistical information of characters.  ...  N-gram indexing, another conventional indexing method, suffers from increase in index size. This paper proposes a new statistical indexing method.  ...  This paper proposes a new method for statistical indexing of Japanese documents.  ... 
doi:10.1145/258525.258576 dblp:conf/sigir/OgawaM97 fatcat:kszpjgje2bfaxo7ljx2f3qgnqi


2006 British Journal of Clinical Pharmacology  
does not detect compounds other than those intended Statistical methods Statistical methods should be described clearly in the Methods section, with references when appropriate.  ...  , tables, and figures the numbers of tables and figures Summary The text must be preceded by a structured summary, including the following headings Aim(s) Methods.  ... 
doi:10.1111/j.1365-2125.2006.subindex_1.x fatcat:l737ydmh55afng4kxuzcvzrvc4


2007 British Journal of Clinical Pharmacology  
Statistical methods Statistical methods should be described clearly in the Methods section, with references when appropriate.  ...  For a prior year volume, this information is at the end of the microfilm. For microfiche users, the index and/or contents is contained on a separate fiche.  ... 
doi:10.1111/j.1365-2125.2007.02945.x fatcat:thoboc4zwjgvjfqdwgjvlgghoq


2008 British Journal of Clinical Pharmacology  
Statistical methods Statistical methods should be described clearly in the Methods section, with references when appropriate.  ...  For microfiche users, the index and/or contents is contained on a separate fiche.  ... 
doi:10.1111/j.1365-2125.2008.03210.x fatcat:5f7w2zjykraynjbuujo635lc3u


2008 British Journal of Clinical Pharmacology  
"...the standard text for professional medical statisticians" Aslib Book Guide. Encyclopaedic reference text for all aspects of advanced medical statistics. Explanation and implementation of statistical methods  ...  Statistical Methods in Medical Research P.  ... 
doi:10.1111/j.1365-2125.2008.03337.x fatcat:yyf7cmp3bfbffg2a5ihiewm6aa

Indexing and weighting of multilingual and mixed documents

Mohammed Mustafa, Izzedin Osman, Hussein Suleman
2011 Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference on Knowledge, Innovation and Leadership in a Diverse, Multidisciplinary Environment - SAICSIT '11  
New techniques that are better suited to the unique characteristics of this problem, in terms of indexing and weighting, are proposed.  ...  A new multilingual and mixed test collection containing mixed-language (Arabic and English) computer science documents and mixed-language queries has been created.  ...  ACKNOWLEDGMENTS Our thanks go to the CLIP research group at the University of Maryland -College Park, USA for their valuable comments.  ... 
doi:10.1145/2072221.2072240 dblp:conf/saicsit/MustafaOS11 fatcat:dewybrrh35gjblhv7opqv4lkau

Combination and boundary detection approaches on Chinese indexing

Christopher C. Yang, Johnny W.K. Luk, Stanley K. Yung, Jerome Yen
2000 Journal of the American Society for Information Science  
The segmented words can be submitted for indexing or new unknown words can be identified and submitted to a dictionary.  ...  A repository is an indexed collection of objects. Indexing is an important task for searching. The better the indexing, the better the searching result.  ...  For example, colleagues [1994, 1995] first segment the text using the maximum-matching approach and then utilize the statistical method to locate and propose candidates for the unknown words contained  ... 
doi:10.1002/(sici)1097-4571(2000)51:4<340::aid-asi4>;2-i fatcat:o7rmg3fdy5bgxebvgiyrs3s3e4

A new character-based indexing method using frequency data for Japanese documents

Ogawa Yasushi, Iwasaki Masajirou
1995 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '95  
A character based indexing is preferable for Japanese IR systems since Japanese words are not segmented.  ...  This paper proposes a new character indexing method to enhance our previous method which divided character pair index entries into disjoint groups based on character classes.  ...  This is a suitable method for Japanese texts, and has been widely used in retrieval systems for Japanese documents  ... 
doi:10.1145/215206.215347 dblp:conf/sigir/OgawaI95 fatcat:gnxviy4ryrhzlmnmbydlwrc7c4

The Indexing and Retrieval of Document Images: A Survey

David Doermann
1998 Computer Vision and Image Understanding  
In this paper, we provide a survey of methods developed by researchers to access and manipulate document images without the need for complete and accurate conversion.  ...  This is followed by a more comprehensive review of techniques for the direct characterization, manipulation, and retrieval, of images of documents containing text, graphics, and scene images.  ...  Fujisawa and Marukawa use a similar approach in which they use confusion statistics to generate an enhanced finite state machine for query terms in Japanese text [25] .  ... 
doi:10.1006/cviu.1998.0692 fatcat:xxl2ynzjkjbeddk2xwkds2g3jq

Pipeline and Data Parallel Hybrid Indexing Algorithm for Multi-core Platform [chapter]

Suqing Zhang, Jirui Li
2014 IFIP Advances in Information and Communication Technology  
The scale and growth rate of today's text collection bring new challenges for index construction.  ...  Evaluations showed this algorithm can improve index construction speed for multi-core platform.  ...  The scale and growth rate of text collection bring new challenges for index construction.  ... 
doi:10.1007/978-3-642-55355-4_28 fatcat:bjycntu5szelbckpboul3yy6ua

Extraction of newspaper headlines from microfilm for automatic indexing

Chew Lim Tan, Qing Hong Liu
2003 International Journal on Document Analysis and Recognition  
This paper proposes a document image analysis system that extracts newspaper headlines from microfilm images with the view to providing automatic indexing for news articles in the microfilm.  ...  To overcome the problem we propose a new effective method for separating characters from noisy background since conventional threshold selection techniques are inadequate to deal with this kind of images  ...  Acknowledgements: This project is supported in part by the Agency for Science, Technology and Research (A*STAR) and Ministry of Education, Singapore under grant R-252-000-071-112/303.  ... 
doi:10.1007/s10032-003-0111-2 fatcat:32kuoarwfvejtndxuvohsfi23y

Sub-Word Indexing and Blind Relevance Feedback for English, Bengali, Hindi, and Marathi IR

Johannes Leveling, Gareth J. F. Jones
2010 ACM Transactions on Asian Language Information Processing  
Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages.  ...  Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length.  ...  The authors are grateful to the reviewers for providing immensely useful feedback and suggestions.  ... 
doi:10.1145/1838745.1838749 fatcat:73ab5osnhfbw7mvcn5draecodu

Experiments in the Retrieval of Unsegmented Japanese Text at the NTCIR-2 Workshop

Paul McNamee
2001 NTCIR Conference on Evaluation of Information Access Technologies  
Our work with the Hopkins Automated Information Retriever for Combing Unstructured Text (HAIRCUT) system has made use of overlapping character n-grams in the indexing and retrieval of text.  ...  We found that 6-grams performed comparably with English words and that 2-grams and 3-grams perform equally well in Japanese text.  ...  Ogawa and Matsuda have studied a variety of n-gram methods for indexing Japanese text.  ... 
dblp:conf/ntcir/McNamee01 fatcat:c6qaboh64ncdfgc7phhvh4v3dm

Learning a Better Motif Index: Toward Automated Motif Extraction

W. Victor H. Yarlott, Mark A. Finlayson, Marc Herbstritt
2016 Workshop on Computational Models of Narrative  
Automatic extraction would enable the construction of a truly comprehensive motif index, which does not yet exist, as well as the automatic detection of motifs in cultural materials, opening up a new world  ...  We outline an experimental design, and report on our efforts to produce a structured form of Thompson's motif index, as well as a development annotation of motifs in a small collection of Russian folklore  ...  , or Japanese folk-literature [21] .  ... 
doi:10.4230/oasics.cmn.2016.7 dblp:conf/cmn/YarlottF16 fatcat:pkin77kamfbrvfpyrnfnbozdae

Index-Based Approach to Similarity Search in Protein and Nucleotide Databases

David Hoksza, Tomás Skopal
2007 Databases, Texts, Specifications, Objects  
On the other side, the results show MAMs could provide a basis for specialized access methods capable of precision/efficiency trade-off control.  ...  In this paper we propose an approach of exact and approximate indexing using several metric access methods (MAMs) in combination with the TriGen algorithm, in order to reduce the number of alignments (  ...  Fatima Cvrčková (Department of Plant Physiology, Faculty of Science, Charles University in Prague) for helping to get in touch with biologist point of view to the problem and contributing to the paper  ... 
dblp:conf/dateso/HokszaS07 fatcat:wzdlbyawtbdedfw4zhqpqtys5q