
Cross-Lingual Information Retrieve in Sogou Search

Jingfang Xu, Feifei Zhai, Zhengshan Xue
2017 Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '17  
In order to break the language barrier and connect Chinese people to the foreign language information in the world, Sogou has built a crosslingual information retrieval (CLIR) system named Sogou English  ...  (http://english.sogou.com), which enables Chinese people to search and browse foreign language information with Chinese.  ...  Sogou English is built based on the second largest search engine in China, Sogou Search. Besides, the neural machine translation (NMT) technology is adopted to do the translation part.  ... 
doi:10.1145/3077136.3096463 dblp:conf/sigir/XuZX17 fatcat:7oytbpkl4ffghiamkjxeoezvrq

SOGOU-2012-CRAWL

Stewart Whiting, Joemon M. Jose, Omar Alonso
2016 Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16  
In 2012, Sogou, a major Chinese web search engine released a large-scale query log containing 43.5M user interactions, including submitted queries and clicked web page search results.  ...  A real large-scale query log with accompanying crawl such as this offers several opportunities for reproducible information retrieval (IR) research, including query classification, intent modelling and  ...  Introduction: A great deal of research in information retrieval (IR) is based on analysing the past behaviour of real IR system users.  ... 
doi:10.1145/2911451.2914668 dblp:conf/sigir/WhitingJA16 fatcat:ribprbmywfdbfb65hsrbpd7ewy

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning [chapter]

Sean MacAvaney, Luca Soldaini, Nazli Goharian
2020 Lecture Notes in Computer Science  
While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages.  ...  In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents.  ...  This work was supported in part by ARCS Foundation.  ... 
doi:10.1007/978-3-030-45442-5_31 fatcat:vjxxtqp345an5jsdgrhpenm6u4

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning [article]

Sean MacAvaney, Luca Soldaini, Nazli Goharian
2019 arXiv   pre-print
While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages.  ...  In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents.  ...  While most of recent approaches have focused on ad hoc retrieval for English, some researchers have studied the problem of cross-lingual information retrieval.  ... 
arXiv:1912.13080v1 fatcat:3fsqiservbbcvfrrwg4krubrzu

CN-DBpedia: A Never-Ending Chinese Knowledge Extraction System [chapter]

Bo Xu, Yong Xu, Jiaqing Liang, Chenhao Xie, Bin Liang, Wanyun Cui, Yanghua Xiao
2017 Lecture Notes in Computer Science  
These knowledge bases play important roles in enabling machines to understand texts.  ...  However, most current knowledge bases are in English and non-English knowledge bases, especially Chinese ones, are still very rare.  ...  Moreover, some search engines such as Sogou even show the top-10 searched movies, songs, games, etc.  ... 
doi:10.1007/978-3-319-60045-1_44 fatcat:a2psegja5ngqreul7zhjappara

Cross-lingual Lexical Sememe Prediction

Fanchao Qi, Yankai Lin, Maosong Sun, Hao Zhu, Ruobing Xie, Zhiyuan Liu
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
Thus we present a task of cross-lingual lexical sememe prediction, aiming to automatically predict sememes for words in other languages.  ...  Experimental results on real-world datasets show that our proposed model achieves consistent and significant improvements as compared to baseline methods in cross-lingual sememe prediction.  ...  We will explore the effectiveness of our model in these tasks such as cross-lingual information retrieval. Figure 1: An example of HowNet.  ... 
doi:10.18653/v1/d18-1033 dblp:conf/emnlp/QiLSZX018 fatcat:tacr73frbvhp3muffmlhrjwkoa

CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark [article]

Yuan Yao, Qingxiu Dong, Jian Guan, Boxi Cao, Zhengyan Zhang, Chaojun Xiao, Xiaozhi Wang, Fanchao Qi, Junwei Bao, Jinran Nie, Zheni Zeng, Yuxian Gu (+23 others)
2021 arXiv   pre-print
Sogou-Log consists of search logs of Sogou.com, a major Chinese commercial search  ...  When constructing options, crowd-sourced annotators were asked to extract a sentence from the story  ...  NCLS: Neural cross-lingual summarization. In Proceedings of EMNLP-IJCNLP, pages 3054–3064.  ... 
arXiv:2112.13610v1 fatcat:eks56wvqtbhmfkq7wvs5n46lte

Analyzing chinese-english mixed language queries in a web search engine

Hengyi Fu, Shuheng Wu
2014 Proceedings of the American Society for Information Science and Technology  
and cross-lingual query expansion.  ...  Keywords Information retrieval, multilingual search query, search behavior, search topics, user intent.  ...  DATA COLLECTION AND RESEARCH METHOD This study uses queries submitted to the Sogou web search engine (http://www.sogou.com/), which is one of the most popular search engines in China.  ... 
doi:10.1002/meet.2014.14505101114 fatcat:wftyh2vzyveo5c4qpx6revdv44

Pre-training Methods in Information Retrieval [article]

Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo
2022 arXiv   pre-print
The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list to respond to the user's information need.  ...  In recent years, the resurgence of deep learning has greatly advanced this field and leads to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods  ... 
arXiv:2111.13853v3 fatcat:pilemnpphrgv5ksaktvctqdi4y

Exploiting bilingual information to improve web search

Wei Gao, John Blitzer, Ming Zhou, Kam-Fai Wong
2009 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - ACL-IJCNLP '09   unpublished
Web search quality can vary widely across languages, even for the same information need.  ...  We propose to exploit this variation in quality by learning a ranking function on bilingual queries: queries that appear in query logs for two languages but represent equivalent search interests.  ...  The thrust of our technique is using search ranking of one language and cross-lingual information to help ranking of another language.  ... 
doi:10.3115/1690219.1690296 fatcat:se5yimxnrfbyphxijq6ibmwkvy

A Survey on Text Classification: From Shallow to Deep Learning [article]

Qian Li, Hao Peng, Jianxin Li, Congying Xia, Renyu Yang, Lichao Sun, Philip S. Yu, Lifang He
2021 arXiv   pre-print
Text classification is the most fundamental and essential task in natural language processing.  ...  The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.  ...  information retrieval and mining technology -plays a vital role in managing text data.  ... 
arXiv:2008.00364v6 fatcat:a6zp52rtf5awlh253yp62wqt3a

Cross Language Information Retrieval System

Zhidan Yang, Zhiting Yang
2018 Proceedings of the 2nd International Forum on Management, Education and Information Technology Application (IFMEITA 2017)   unpublished
In the paper, we describe a Cross Language Information Retrieval System (CLIR), which allows user input English queries and search Chinese documents.  ...  We also explore the solution for online search engine, which can meet commercial requirements.  ...  A research show the retrieval effectiveness of EC and CE cross lingual search in google and yahoo is much lower than that of EE and CC monolingual search [14].  ... 
doi:10.2991/ifmeita-17.2018.63 fatcat:baohr5hgxzejroysfi3pyqzj24

A Survey on Text Classification: From Traditional to Deep Learning

Qian Li, Hao Peng, Jianxin Li, Congying Xia, Renyu Yang, Lichao Sun, Philip S. Yu, Lifang He
2022 ACM Transactions on Intelligent Systems and Technology  
Text classification is the most fundamental and essential task in natural language processing.  ...  The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.  ...  Text classification -as efficient information retrieval and mining technology -plays a vital role in managing text data.  ... 
doi:10.1145/3495162 fatcat:ehrzpu4eezf7lah6jm3gyksyaq

D4.1 Report on Multimodal Machine Translation

Stig-Arne Grönroos, Umut Sulubacak, Jörg Tiedemann
2018 Zenodo  
In MeMAD, multimodal translation is of particular interest in facilitating cross-lingual multimodal content retrieval, and is one of the main focuses of WP4.  ...  Multimodal machine translation involves drawing information from more than one modality (text, audio, and visuals), and is an emerging subject within the machine translation community.  ...  In addition the Finnish IT Center for Science (CSC) provided computational resources. We would also like to acknowledge the support by NVIDIA and their GPU grant.  ... 
doi:10.5281/zenodo.3690761 fatcat:n3b34ooubfayxphgyf6bli6bya

Deep Learning Based Text Classification: A Comprehensive Review [article]

Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, Jianfeng Gao
2021 arXiv   pre-print
In this paper, we provide a comprehensive review of more than 150 deep learning based models for text classification developed in recent years, and discuss their technical contributions, similarities,  ...  Deep learning based models have surpassed classical machine learning based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural  ...  [61] extended the hierarchical attention model to cross-lingual sentiment classification. In each language, a LSTM network is used to model the documents.  ... 
arXiv:2004.03705v3 fatcat:al5hstylsbhfpldvokuvlpomam