
Large-scale cluster-based retrieval experiments on Turkish texts

Ismail Sengor Altingovde, Rifat Ozcan, Huseyin Cagdas Ocalan, Fazli Can, Özgür Ulusoy
2007 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07  
We present cluster-based retrieval (CBR) experiments on the largest available Turkish document collection.  ...  Our experiments evaluate retrieval effectiveness and efficiency on both an automatically generated clustering structure and a manual classification of documents.  ...  INTRODUCTION This paper presents cluster-based retrieval (CBR) experiments using the largest Turkish information retrieval (IR) test collection in the literature.  ... 
doi:10.1145/1277741.1277961 dblp:conf/sigir/AltingovdeOOCU07 fatcat:ngxlnkzr3rdjteyzv4bhhuafbm


Ozgur Yilmazel
2010 Anadolu University Journal of Science and Technology. A : Applied Sciences and Engineering  
Our experiments showed that language-specific preprocessing techniques improve retrieval performance for all retrieval models.  ...  Three retrieval models supported by Lemur, chiefly the language modeling approach to information retrieval, and language-specific preprocessing techniques were investigated.  ...  Later they performed cluster-based retrieval (CBR) experiments on the same test collection (Altingovde et al., 2007).  ... 
doaj:42f4b8abecb14addb9c0c657fd17cfde fatcat:csoibu7ga5d5tntewz2oma7jbq

Matching ottoman words

Esra Ataer, Pinar Duygulu
2007 Proceedings of the 6th ACM international conference on Image and video retrieval - CIVR '07  
Similar words are then matched based on the similarity of the distributions of the visual terms. The experiments are carried out on printed and handwritten documents which included over 10,000 words.  ...  Due to these reasons, in this study we treat the problem as an image retrieval problem with the view that Ottoman words are images, and we propose a solution based on image matching techniques.  ...  Figure 11 shows retrieval results for some selected words on small-printed and large-printed data sets.  ... 
doi:10.1145/1282280.1282332 dblp:conf/civr/AtaerD07 fatcat:aimcriydpvgl3iysylukv3ukaa

Information retrieval on Turkish texts

Fazli Can, Seyit Kocberber, Erman Balcik, Cihan Kaynak, H. Cagdas Ocalan, Onur M. Vursavas
2008 Journal of the American Society for Information Science and Technology  
We study information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries.  ...  in Turkish IR.  ...  CONCLUSIONS AND FUTURE RESEARCH In this study we provide the first thorough investigation of information retrieval on Turkish texts using a large-scale test collection.  ... 
doi:10.1002/asi.20750 fatcat:rpnxvdc7gfd75lyrhcymfgwcde

A comparison of Relational Databases and information retrieval libraries on Turkish text retrieval

Ahmet Arslan, Ozgur Yilmazel
2008 2008 International Conference on Natural Language Processing and Knowledge Engineering  
The present work covers a comparison of the text retrieval performances of relational databases and IR Systems over a TREC-like test collection for Turkish.  ...  The effects of language specific preprocessing and different query lengths for different information retrieval systems are investigated.  ...  At SIGIR '07 they presented cluster-based retrieval (CBR) experiments on the same test collection [2].  ... 
doi:10.1109/nlpke.2008.4906748 dblp:conf/nlpke/ArslanY08 fatcat:imn5btfxizdg7hrfj3mlv2imsa

MURAL: Multimodal, Multitask Retrieval Across Languages [article]

Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
2021 arXiv   pre-print
We additionally show that MURAL's text representations cluster not only with respect to genealogical connections but also based on areal linguistics, such as the Balkan Sprachbund.  ...  On the Wikipedia Image-Text dataset, for example, MURAL-base improves zero-shot mean recall by 8.1% on average for eight under-resourced languages and by 6.8% on average when fine-tuning.  ...  We thank Zarana Parekh for helping us setup evaluation on the CxC dataset, Orhan Firat for providing guidance on vocabulary coverage, Yuqing Chen and Apu Shah for assisting with latency metrics of the  ... 
arXiv:2109.05125v1 fatcat:3ejouvqr4fau5hz7nbtz2qqwwq

Towards Zero-shot Cross-lingual Image Retrieval [article]

Pranav Aggarwal, Ajinkya Kale
2020 arXiv   pre-print
We also introduce a new objective function which tightens the text embedding clusters by pushing dissimilar texts from each other.  ...  We present a simple yet practical approach for building a cross-lingual image retrieval model which trains on a monolingual training dataset but can be used in a zero-shot cross-lingual fashion during  ...  These approaches are limited based on availability of large parallel language corpora.  ... 
arXiv:2012.05107v1 fatcat:k55hgclu7bavdesqozgzctyfjq

Music emotion classification for Turkish songs using lyrics

Ahmet Onur Durahim, Abide Coşkun Setirek, Birgül Başarır Özel, Hanife Kebapçı
2018 Pamukkale University Journal of Engineering Sciences  
Consequently, the form of music retrieval is changed from catalogue based searches to searches made based on emotion tags in order for easy and effective musical information access.  ...  Thereafter, Unigram, Bigram and Trigram word features are extracted from song lyrics after performing text preprocessing where stemming of the Turkish words is an essential part.  ...  Today, there are several music services which provide large scale music datasets for information extraction and most of the musical content is easily accessible [4] .  ... 
doi:10.5505/pajes.2017.15493 fatcat:mwxmp6jz3zaxzj6fd2b3vl5bm4

Turkish Information Retrieval: Past Changes Future [chapter]

Fazli Can
2006 Lecture Notes in Computer Science  
We briefly review the information explosion problem and information retrieval systems, convey the past and state of the art in Turkish information retrieval research, illustrate some recent developments  ...  One of the most exciting accomplishments of computer science in the lifetime of this generation is the World Wide Web. The Web is a global electronic publishing medium.  ...  Esen A. Özkarahan; my friend, teacher, Ph.D. advisor, mentor, and colleague; who traveled with me and introduced me to the field of information retrieval. I would like to thank Ismail Sengör  ... 
doi:10.1007/11890393_2 fatcat:3uykpsqqjbcfxe66pyvxd2e6qi

Towards Zero-shot Cross-lingual Image Retrieval and Tagging [article]

Pranav Aggarwal, Ritiz Tambi, Ajinkya Kale
2021 arXiv   pre-print
We also introduce a new objective function which tightens the text embedding clusters by pushing dissimilar texts away from each other.  ...  We present a simple yet practical approach for building a cross-lingual image retrieval model which trains on a monolingual training dataset but can be used in a zero-shot cross-lingual fashion during  ...  These approaches are limited based on availability of large parallel language corpora.  ... 
arXiv:2109.07622v1 fatcat:hvegymlhybgjhjqgj3ysnljmlu

A unified language model for large vocabulary continuous speech recognition of Turkish

Ebru Arısoy, Helin Dutağacı, Levent M. Arslan
2006 Signal Processing  
We have designed a Turkish dictation system for newspaper content transcription application. Turkish is an agglutinative language with free word order.  ...  These error rates are smaller compared to the traditional word-based model for newspaper content transcription application.  ...  Text corpora The lack of a large Turkish text corpus is an important drawback for research in Turkish language processing.  ... 
doi:10.1016/j.sigpro.2005.12.002 fatcat:kvh2k6pr5feg3nkoenuxnnifra

Joint PoS Tagging and Stemming for Agglutinative Languages [article]

Necva Bölücü, Burcu Can
2017 arXiv   pre-print
Part-of-speech tagging (PoS tagging) is one of these tasks that often suffers from sparsity.  ...  We present results for Turkish and Finnish as agglutinative languages and English as a morphologically poor language.  ...  Experiments & Results Data: We used three datasets for the experiments and evaluation: -Turkish: METU-Sabancı Turkish Treebank [ ] that consists of word tokens.  ... 
arXiv:1705.08942v1 fatcat:hauvrqaoobfgbly32rkw6ukuza

Automated Text Analysis and International Relations: The Introduction and Application of a Novel Technique for Twitter

Emre Hatipoğlu, Osman Zeki Gökçe, İnanç Arın, Yücel Saygın
2018 All Azimuth A journal of foreign policy and peace  
More specifically, we develop a clustering methodology based on Longest Common Subsequence Similarity Metric, which automatically groups tweets with similar content.  ...  To illustrate the usefulness of this technique, we present some of our findings from a project we conducted on Turkish sentiments on Twitter towards Syrian refugees.  ...  However, experiments show that tweets belonging to even very large clusters are similar to each other in content as well as to the representative tweet.  ... 
doi:10.20991/allazimuth.476852 fatcat:hnxdgvdqq5hbvpbvc5z7gwfs7a

Multilingual Visual Sentiment Concept Matching

Nikolaos Pappas, Miriam Redi, Mercan Topkara, Brendan Jou, Hongyi Liu, Tao Chen, Shih-Fu Chang
2016 Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval - ICMR '16  
Based on these representations, we design clustering schemes to group multilingual visual concepts, and evaluate them with novel metrics based on the crowdsourced sentiment annotations as well as visual  ...  We compare a variety of concept representations through a novel evaluation task based on the notion of visual semantic relatedness.  ...  large text corpora.  ... 
doi:10.1145/2911996.2912016 dblp:conf/mir/PappasRTJLCC16 fatcat:l3jgpy2nzbg2fdfi3ei7oj4kpa

Has Computational Linguistics Become More Applied? [chapter]

Kenneth Church
2009 Lecture Notes in Computer Science  
In this work we experiment by using natural language techniques such as lemmatizing and using manual and automatic thesauri for improving question based document retrieval.  ...  We are building a large-scale lexical resource for the biology domain, providing information about predicate-argument structure that has been bootstrapped from a biomedical corpus on the subject of E.  ...  of large scale annotated resources.  ... 
doi:10.1007/978-3-642-00382-0_1 fatcat:oddvfzds4nfwjam2ccqeaxe2y4