696 Hits in 2.9 sec

Pre-trained Language Model based Ranking in Baidu Search [article]

Lixin Zou, Shengqiang Zhang, Hengyi Cai, Dehong Ma, Suqi Cheng, Daiting Shi, Zhifan Zhu, Weiyue Su, Shuaiqiang Wang, Zhicong Cheng, Dawei Yin
2021 arXiv   pre-print
More recently, neural rankers fine-tuned from pre-trained language models (PLMs) establish state-of-the-art ranking effectiveness.  ...  In this work, we contribute a series of successfully applied techniques in tackling these exposed issues when deploying the state-of-the-art Chinese pre-trained language model, i.e., ERNIE, in the online  ...  With the recent significant progress of pre-training language models (PLMs) like BERT [13] and ERNIE [44] in many language understanding tasks, large-scale pre-trained models also demonstrate increasingly  ... 
arXiv:2105.11108v3 fatcat:dbvj65ugovaani4hsiwtl6bcdi
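The entry above describes fine-tuning a pre-trained language model as a neural ranker. As a rough illustration of that general cross-encoder pattern (not the paper's deployed ERNIE system), the sketch below scores query-document pairs jointly with a sequence-classification head; the model name, sequence length, and any training details are placeholder assumptions.

```python
# Minimal cross-encoder ranking sketch (illustrative, not the paper's model):
# the query and document are encoded together and the head emits a relevance score.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-chinese"  # placeholder; the paper fine-tunes ERNIE
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

def relevance_score(query: str, document: str) -> float:
    """Score one query-document pair with the cross-encoder."""
    inputs = tokenizer(query, document, truncation=True, max_length=256,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze().item()

# Rank candidate documents for a query by descending score.
docs = ["candidate document one ...", "candidate document two ..."]
ranked = sorted(docs, key=lambda d: relevance_score("user query", d), reverse=True)
```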

Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content

Weiming Wen, Songwen Su, Zhou Yu
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
We propose not to use the multimedia content but to find external information in other news platforms pivoting on it.  ...  Because this pre-trained model is for binary classification, only webpages labeled as real or fake in CCMR Baidu are involved.  ...  During the training process, we embed the sentences in the Fake News Challenge dataset using our pre-trained multilingual sentence embedding.  ... 
doi:10.18653/v1/d18-1385 dblp:conf/emnlp/WenSY18 fatcat:6jvnfohi6vd2zecmtge2umagdy
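The snippet above mentions embedding sentences with a pre-trained multilingual sentence encoder before classification. A minimal sketch of that step, assuming the sentence-transformers library and an off-the-shelf multilingual model rather than the authors' own embedding:

```python
# Sketch only: embed text from two languages into one shared space and compare
# by cosine similarity. The model name is an off-the-shelf multilingual encoder,
# not the embedding trained in the paper.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

claim_zh = "某条新闻在微博上被广泛转发。"   # claim from one platform/language
evidence_en = "The same story was reported by several English news sites."

emb = model.encode([claim_zh, evidence_en], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(f"cross-lingual similarity: {similarity:.3f}")
```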

Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content [article]

Weiming Wen and Songwen Su and Zhou Yu
2018 arXiv   pre-print
We propose not to use the multimedia content but to find external information in other news platforms pivoting on it.  ...  Because this pre-trained model is for binary classification, only webpages labeled as real or fake in CCMR Baidu are involved.  ...  During the training process, we embed the sentences in the Fake News Challenge dataset using our pre-trained multilingual sentence embedding.  ... 
arXiv:1808.04911v2 fatcat:3exzmmmuenbsbev4qoysxto5r4

Baidu Neural Machine Translation Systems for WMT19

Meng Sun, Bojian Jiang, Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang
2019 Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)  
Data selection, back translation, data augmentation, knowledge distillation, domain adaptation, model ensemble and re-ranking are employed and proven effective in our experiments.  ...  In this paper we introduce the systems Baidu submitted for the WMT19 shared task on Chinese↔English news translation.  ...  Acknowledgements We thank Shikun Feng at Baidu for providing the pre-trained language model. We thank the anonymous reviews for their careful reading and their thoughtful comments.  ... 
doi:10.18653/v1/w19-5341 dblp:conf/wmt/SunJXHWW19 fatcat:presm4xmzbgyrguli3d3udqiym
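Among the techniques listed for these WMT19 systems, back translation is the most mechanical: monolingual target-language text is translated back into the source language to create synthetic parallel data. A schematic sketch, assuming a hypothetical `translate(text, src, tgt)` helper standing in for whatever reverse model is available:

```python
# Schematic back-translation: generate synthetic source sentences for
# monolingual target-side data, then pair them as extra training data.
# `translate` is a hypothetical helper wrapping a target->source model.

def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder for a trained reverse translation model."""
    raise NotImplementedError

def back_translate(monolingual_tgt: list[str]) -> list[tuple[str, str]]:
    synthetic_pairs = []
    for tgt_sentence in monolingual_tgt:
        # Translate the target-language sentence back into the source language.
        synthetic_src = translate(tgt_sentence, src="en", tgt="zh")
        synthetic_pairs.append((synthetic_src, tgt_sentence))
    return synthetic_pairs

# The synthetic pairs are then mixed with the genuine parallel corpus for training.
```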

Pre-trained Language Model for Web-scale Retrieval in Baidu Search [article]

Yiding Liu, Guan Huang, Jiaxiang Liu, Weixue Lu, Suqi Cheng, Yukun Li, Daiting Shi, Shuaiqiang Wang, Zhicong Cheng, Dawei Yin
2021 arXiv   pre-print
In particular, we developed an ERNIE-based retrieval model, which is equipped with 1) expressive Transformer-based semantic encoders, and 2) a comprehensive multi-stage training paradigm.  ...  In this paper, we describe the retrieval system that we developed and deployed in Baidu Search.  ...  To the best of our knowledge, this is one of the largest applications of pre-trained language models for web-scale retrieval.  ... 
arXiv:2106.03373v4 fatcat:bkaz3q5dlrcr7nuxdrcrmgu3ti
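The entry above describes Transformer-based semantic encoders for web-scale retrieval. The usual dual-encoder pattern behind such systems, sketched below with an illustrative Hugging Face model rather than the deployed ERNIE encoder, encodes queries and documents independently so document vectors can be pre-computed and indexed offline:

```python
# Dual-encoder retrieval sketch: queries and documents are encoded separately,
# and relevance is an inner product between the two vectors. Model name and
# pooling choice are assumptions, not details of the deployed system.
import torch
from transformers import AutoTokenizer, AutoModel

encoder_name = "bert-base-chinese"  # placeholder encoder
tokenizer = AutoTokenizer.from_pretrained(encoder_name)
encoder = AutoModel.from_pretrained(encoder_name)

def encode(texts: list[str]) -> torch.Tensor:
    """Encode texts into L2-normalized [CLS] vectors."""
    batch = tokenizer(texts, padding=True, truncation=True, max_length=128,
                      return_tensors="pt")
    with torch.no_grad():
        cls = encoder(**batch).last_hidden_state[:, 0]  # [CLS] token
    return torch.nn.functional.normalize(cls, dim=-1)

doc_vectors = encode(["candidate document one ...", "candidate document two ..."])  # built offline
query_vector = encode(["user query"])                                               # computed online
scores = query_vector @ doc_vectors.T   # inner-product relevance
best = scores.argmax(dim=-1)
```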

TravelBERT: Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [article]

Hongyin Zhu, Hao Peng, Zhiheng Lyu, Lei Hou, Juanzi Li, Jinghui Xiao
2021 arXiv   pre-print
In this paper, we propose a heterogeneous knowledge language model (HKLM), a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text and well-structured  ...  Existing technologies expand BERT from different perspectives, e.g. designing different pre-training tasks, different semantic granularities and different model architectures.  ...  BERT BASE represents the common pretrained Chinese BERT model. TravelBERT C and TravelBERT K represent the use of plain text and HKLM to further pre-train the language model, respectively.  ... 
arXiv:2109.01048v2 fatcat:eaegcialtrbtdjuqxl46nuwedm

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [article]

Tao Shen, Yi Mao, Pengcheng He, Guodong Long, Adam Trischler, Weizhu Chen
2020 arXiv   pre-print
In this work, we aim at equipping pre-trained language models with structured knowledge. We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.  ...  In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training, to inject language models with structured knowledge via learning from raw text.  ...  A detailed comparison between our model and span-level pre-trained language models can be found in §3.4.  ... 
arXiv:2004.14224v1 fatcat:uqfcngb2hjho5n3tzljefeeewa

Will Baidu's "All in AI" Strategy Bring It Back to the High-Speed Growth Train?

2021 Journal of Applied Business and Economics  
Baidu has suffered a slowdown in recent years, mainly due to the decline of search ads and revenue in a mobile era where Baidu is lagging its competitors, such as Alibaba, Tencent, or ByteDance.  ...  Baidu is aggressively investing in artificial intelligence (AI) technologies and striving to be a leader in AI.  ...  Baidu is exploring revenue sharing or fee-based models with automakers in this area.  ... 
doi:10.33423/jabe.v23i4.4472 fatcat:sdusoapdvrajfacifyde6ydctu

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications [article]

Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu (+1 others)
2018 arXiv   pre-print
DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao; answers are manually generated. (2) question types: it provides  ...  We also organize a shared competition to encourage the exploration of more models. Since the release of the task, there are significant improvements over the baselines.  ...  ., 2016) is based on Bing logs (in English), and DuReader (this paper) is based on the logs of Baidu Search (in Chinese).  ... 
arXiv:1711.05073v4 fatcat:b5xrn4a5xbg2jghdszl57ztrim

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu (+1 others)
2018 Proceedings of the Workshop on Machine Reading for Question Answering  
DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao; answers are manually generated. (2) question types: it  ...  We also organize a shared competition to encourage the exploration of more models. Since the release of the task, there are significant improvements over the baselines.  ...  ., 2016) is based on Bing logs (in English), and DuReader (this paper) is based on the logs of Baidu Search (in Chinese).  ... 
doi:10.18653/v1/w18-2605 dblp:conf/acl/HeLLLZXLWWSLWW18 fatcat:oh27suoxmvaipjiavjpupblm74

A Flexible and Sentiment-Aware Framework for Entity Search [chapter]

Kerui Min, Chenghao Dong, Shiyuan Cai, Jianhao Chen
2016 Lecture Notes in Computer Science  
In this paper, we propose a flexible and sentiment-aware framework for entity search.  ...  Our approach achieved an average MAP score of 0.7044 in the NLPCC Baidu Challenge 2016 competition, obtaining 3rd place among 174 teams.  ...  In particular, the entity ranking includes a linear model that can incorporate various feature functions. The framework successfully demonstrated its effectiveness in the competition.  ... 
doi:10.1007/978-3-319-50496-4_63 fatcat:nul7ahiqnnc65diywcsaufsn44
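The snippet above mentions a linear model over feature functions for entity ranking. A minimal sketch of that idea, with entirely hypothetical feature functions and weights:

```python
# Linear entity-ranking sketch: the score is a weighted sum of feature
# functions of (query, entity). Features and weights here are hypothetical.

def text_match(query: str, entity: dict) -> float:
    """Fraction of query terms appearing in the entity description."""
    terms = query.split()
    return sum(t in entity["description"] for t in terms) / max(len(terms), 1)

def sentiment(query: str, entity: dict) -> float:
    """Placeholder sentiment feature (e.g., output of a sentiment classifier)."""
    return entity.get("sentiment", 0.0)

FEATURES = [(0.7, text_match), (0.3, sentiment)]  # weights would be learned in practice

def score(query: str, entity: dict) -> float:
    return sum(w * f(query, entity) for w, f in FEATURES)

entities = [{"name": "A", "description": "...", "sentiment": 0.2},
            {"name": "B", "description": "...", "sentiment": -0.1}]
ranked = sorted(entities, key=lambda e: score("some query", e), reverse=True)
```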

MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension [article]

Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen
2019 arXiv   pre-print
In this task, we adapted and unified 18 distinct question answering datasets into the same format.  ...  The best system achieved an average F1 score of 72.5 on the 12 held-out datasets, 10.7 absolute points higher than our initial baseline based on BERT.  ...  We are grateful to Baidu, Facebook, and Naver for providing funding for our workshop.  ... 
arXiv:1910.09753v2 fatcat:4evucot7wfgmzovt7n73zvtzu4
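The F1 figure quoted above is the standard token-overlap F1 used for extractive QA. A short sketch of how that metric is typically computed, simplified and without the full answer-normalization rules the official evaluation scripts apply:

```python
# Token-overlap F1 for extractive QA, simplified: precision and recall over
# the bag of tokens shared by the prediction and the gold answer.
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("in the park", "the park"))  # 0.8
```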

TinySearch – Semantics based Search Engine using Bert Embeddings [article]

Manish Patel
2019 arXiv   pre-print
Existing search engines use keyword matching or tf-idf based matching to map the query to web documents and rank them.  ...  In this paper, I have developed a semantics-oriented search engine using neural networks and BERT embeddings that can search for a query and rank the documents from the most meaningful to the least  ...  These search engines use a plethora of strategies to rank the documents, some of which are: keyword matching of the query with documents [1], tf-idf based vector-space models, and BM25-based vector-space models  ... 
arXiv:1908.02451v1 fatcat:mhgnyiql55gdrep7uezf5ix4ve
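The entry above contrasts keyword/tf-idf matching with embedding-based ranking. A minimal sketch of the classic tf-idf vector-space baseline using scikit-learn (illustrative, not the TinySearch implementation):

```python
# Classic tf-idf vector-space retrieval baseline: documents and the query are
# mapped to tf-idf vectors and ranked by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "BERT embeddings capture contextual meaning of words.",
    "BM25 is a probabilistic keyword ranking function.",
    "tf-idf weights terms by frequency and rarity.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)            # one row per document
query_vec = vectorizer.transform(["keyword ranking with BM25"])

scores = cosine_similarity(query_vec, doc_matrix).ravel()   # similarity to each doc
ranking = scores.argsort()[::-1]                            # best match first
```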

A Survey of Techniques for Constructing Chinese Knowledge Graphs and Their Applications

Tianxing Wu, Guilin Qi, Cheng Li, Meng Wang
2018 Sustainability  
In recent years, knowledge graphs have been widely applied in many kinds of applications, such as semantic search, question answering, knowledge management and so on.  ...  At the same time, the accumulated experience of China in developing knowledge graphs is also a good reference for developing non-English knowledge graphs.  ...  This technology overcomes the limitations of the traditional keyword-based search model and converts Web-based search into semantic search.  ... 
doi:10.3390/su10093245 fatcat:wrqgfkwfanfejnffn6nyr4nqbq

EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling [article]

Jue Wang, Haofan Wang, Jincan Deng, Weijia Wu, Debing Zhang
2021 arXiv   pre-print
While large-scale pre-training has made great strides in bridging the gap between vision and language, it still faces several challenges. First, the cost of pre-training is high.  ...  Second, there is no efficient way to handle the data noise that degrades model performance.  ...  Each word in the dictionary is used as a query to crawl image-text pairs from Chinese search engines (Baidu Pictures and Baidu Baike).  ... 
arXiv:2109.04699v2 fatcat:ezor5bfhnnbbbnwruibacpam3y
Showing results 1 — 15 of 696