
Dima

Ji Sun, Zeyuan Shang, Guoliang Li, Dong Deng, Zhifeng Bao
2017 Proceedings of the VLDB Endowment  
Data analysts in industries spend more than 80% of time on data cleaning and integration in the whole process of data analytics due to data errors and inconsistencies. It calls for effective query processing techniques to tolerate the errors and inconsistencies. In this paper, we develop a distributed in-memory similarity-based query processing system called Dima. Dima supports two core similarity-based query operations, i.e., similarity search and similarity join. Dima extends the SQL programming interface for users to easily invoke these two operations in their data analysis jobs. To avoid expensive data transformation in a distributed environment, we design selectable signatures where two records approximately match if they share common signatures. More importantly, we can adaptively select the signatures to balance the workload. Dima builds signature-based global indexes and local indexes to support efficient similarity search and join. Since Spark is one of the widely adopted distributed in-memory computing systems, we have seamlessly integrated Dima into Spark and developed effective query optimization techniques in Spark. To the best of our knowledge, this is the first full-fledged distributed in-memory system that can support similarity-based query processing. We demonstrate our system in several scenarios, including entity matching, web table integration and query recommendation.
doi:10.14778/3137765.3137810 fatcat:sbj6yphv2ndjfa32grnszrhoaq

K-Join: Knowledge-Aware Similarity Join

Zeyuan Shang, Yaxiao Liu, Guoliang Li, Jianhua Feng
2017 2017 IEEE 33rd International Conference on Data Engineering (ICDE)  
Similarity join is a fundamental operation in data cleaning and integration. Existing similarity-join methods utilize the string similarity to quantify the relevance but neglect the knowledge behind the data, which plays an important role in understanding the data. Thanks to public knowledge bases, e.g., Freebase and Yago, we have an opportunity to use the knowledge to improve similarity join. To address this problem, we study knowledge-aware similarity join, which, given a knowledge hierarchy and two collections of objects (e.g., documents), finds all knowledge-aware similar object pairs. To the best of our knowledge, this is the first study on knowledge-aware similarity join. There are two main challenges. The first is how to quantify the knowledge-aware similarity. The second is how to efficiently identify the similar pairs. To address these challenges, we first propose a new similarity metric to quantify the knowledge-aware similarity using the knowledge hierarchy. We then devise a filter-and-verification framework to efficiently identify the similar pairs. We propose effective signature-based filtering techniques to prune large numbers of dissimilar pairs and develop efficient verification algorithms to verify the candidates that are not pruned in the filter step. Experimental results on real-world datasets show that our method significantly outperforms baseline algorithms in terms of both efficiency and effectiveness.
doi:10.1109/icde.2017.18 dblp:conf/icde/ShangLLF17 fatcat:46d26u52ffcplouks4mpwkpf44
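
The filter-and-verification idea named in the abstract above can be illustrated with a minimal sketch. This is not the K-Join method: it swaps in plain Jaccard similarity over token sets and the standard prefix filter as the signature scheme, and it ignores the knowledge hierarchy entirely. The function names `jaccard` and `similarity_join` and the sample records are hypothetical.

```python
from collections import defaultdict
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def similarity_join(records, threshold):
    """Self-join: return (i, j) index pairs, i < j, with Jaccard >= threshold.

    Assumes 0 < threshold <= 1. Empty records are never paired.
    Filter step: each record's signatures are the tokens in its prefix
    under a global order (rare tokens first). Two records can only reach
    the threshold if their prefixes share a token.
    Verify step: compute the exact similarity for surviving candidates.
    """
    sets = [set(r) for r in records]

    # Global token order: document frequency, ties broken lexicographically.
    freq = defaultdict(int)
    for s in sets:
        for tok in s:
            freq[tok] += 1

    index = defaultdict(list)  # signature token -> ids of earlier records
    results = []
    for rid, s in enumerate(sets):
        ordered = sorted(s, key=lambda t: (freq[t], t))
        # Prefix length |s| - ceil(threshold * |s|) + 1. The small epsilon
        # guards against float round-up; a longer prefix is always safe.
        prefix_len = len(ordered) - ceil(threshold * len(ordered) - 1e-9) + 1

        candidates = set()
        for tok in ordered[:prefix_len]:
            candidates.update(index[tok])  # filter: shared signature
            index[tok].append(rid)

        for other in sorted(candidates):  # verify: exact similarity
            if jaccard(s, sets[other]) >= threshold:
                results.append((other, rid))
    return results


records = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "e"],
    ["x", "y", "z"],
    ["b", "c", "d", "a"],
]
pairs = similarity_join(records, 0.5)
```

In this sketch the third record shares no signature with any other record, so it is pruned without a single similarity computation. That pruning is the point of the filter step.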

How I Learned to Stop Worrying and Love Re-optimization [article]

Matthew Perron, Zeyuan Shang, Tim Kraska, Michael Stonebraker
2019 arXiv   pre-print
Cost-based query optimizers remain one of the most important components of database management systems for analytic workloads. Though modern optimizers select plans close to optimal performance in the common case, a small number of queries are an order of magnitude slower than they could be. In this paper we investigate why this is still the case, despite decades of improvements to cost models, plan enumeration, and cardinality estimation. We demonstrate why we believe that a re-optimization mechanism is likely the most cost-effective way to improve end-to-end query performance. We find that even a simple re-optimization scheme can improve the latency of many poorly performing queries. We demonstrate that re-optimization improves the end-to-end latency of the top 20 longest running queries in the Join Order Benchmark by 27%, realizing most of the benefit of perfect cardinality estimation.
arXiv:1902.08291v2 fatcat:kodfhpdykneejbmx5trjrqy5km

K-Join: Knowledge-Aware Similarity Join

Zeyuan Shang, Yaxiao Liu, Guoliang Li, Jianhua Feng
2016 IEEE Transactions on Knowledge and Data Engineering  
Similarity join is a fundamental operation in data cleaning and integration. Existing similarity-join methods utilize the string similarity to quantify the relevance but neglect the knowledge behind the data, which plays an important role in understanding the data. Thanks to public knowledge bases, e.g., Freebase and Yago, we have an opportunity to use the knowledge to improve similarity join. To address this problem, we study knowledge-aware similarity join, which, given a knowledge hierarchy and two collections of objects (e.g., documents), finds all knowledge-aware similar object pairs. To the best of our knowledge, this is the first study on knowledge-aware similarity join. There are two main challenges. The first is how to quantify the knowledge-aware similarity. The second is how to efficiently identify the similar pairs. To address these challenges, we first propose a new similarity metric to quantify the knowledge-aware similarity using the knowledge hierarchy. We then devise a filter-and-verification framework to efficiently identify the similar pairs. We propose effective signature-based filtering techniques to prune large numbers of dissimilar pairs and develop efficient verification algorithms to verify the candidates that are not pruned in the filter step. Experimental results on real-world datasets show that our method significantly outperforms baseline algorithms in terms of both efficiency and effectiveness.
doi:10.1109/tkde.2016.2601325 fatcat:jlpw37qbafhzleowq5xdurmkdu

Balance-aware distributed string similarity-based query processing system

Ji Sun, Zeyuan Shang, Guoliang Li, Dong Deng, Zhifeng Bao
2019 Proceedings of the VLDB Endowment  
Data analysts spend more than 80% of their time on data cleaning and integration in the whole process of data analytics due to data errors and inconsistencies. Similarity-based query processing is an important way to tolerate the errors and inconsistencies. However, similarity-based query processing is rather costly, and traditional databases cannot afford such an expensive requirement. In this paper, we develop a distributed in-memory similarity-based query processing system called Dima. Dima supports four core similarity operations, i.e., similarity selection, similarity join, top-k selection and top-k join. Dima extends SQL for users to easily invoke these similarity-based operations in their data analysis tasks. To avoid expensive data transmission in a distributed environment, we propose balance-aware signatures where two records are similar if they share common signatures, and we can adaptively select the signatures to balance the workload. Dima builds signature-based global indexes and local indexes to support similarity operations. Since Spark is one of the widely adopted distributed in-memory computing systems, we have seamlessly integrated Dima into Spark and developed effective query optimization techniques in Spark. To the best of our knowledge, this is the first full-fledged distributed in-memory system that can support complex similarity-based query processing on large-scale datasets. We have conducted extensive experiments on four real-world datasets. Experimental results show that Dima outperforms state-of-the-art studies by 1-3 orders of magnitude and has good scalability.
doi:10.14778/3329772.3329774 fatcat:sorbhypaijg6rjoevbvdhxyjbm

Transcriptomic encoding of sensorimotor transformation in the midbrain [article]

Zhiyong Xie, Mengdi Wang, Zeyuan Liu, Congping Shang, Changjiang Zhang, Le Sun, Huating Gu, Genxin Ran, Qing Pei, Qiang Ma, Meizhu Huang, Junjing Zhang (+5 others)
2021 bioRxiv   pre-print
The effectiveness and specificity of TeNT-mediated synaptic inactivation of SC neurons have been validated in earlier studies (Shang et al., 2018; Shang et al., 2019).  ...  In mice, a series of projection-defined SC circuits have been linked to sensory-triggered innate behaviors such as predator avoidance (Evans et al., 2018; Shang et al., 2018; Shang et al., 2015  ... 
doi:10.1101/2021.04.27.441692 fatcat:ehria5osmbcjzeycnig7n5wjy4

Towards Interactive Data Exploration [chapter]

Carsten Binnig, Fuat Basık, Benedetto Buratti, Ugur Cetintemel, Yeounoh Chung, Andrew Crotty, Cyrus Cousins, Dylan Ebert, Philipp Eichmann, Alex Galakatos, Benjamin Hättasch, Amir Ilkhechi (+10 others)
2019 Lecture Notes in Business Information Processing  
Enabling interactive visualization over new datasets at "human speed" is key to democratizing data science and maximizing human productivity. In this work, we first argue why existing analytics infrastructures do not support interactive data exploration and outline the challenges and opportunities of building a system specifically designed for interactive data exploration. Furthermore, we present the results of building IDEA, a new type of system for interactive data exploration that is specifically designed to integrate seamlessly with existing data management landscapes and allow users to explore their data instantly without expensive data preparation costs. Finally, we discuss other important considerations for interactive data exploration systems including benchmarking, natural language interfaces, as well as interactive machine learning.
doi:10.1007/978-3-030-24124-7_11 fatcat:ixxjchbhe5awrlcjvr5kocqgmq

External Reviewers

2020 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)  
Shang, MIT Xiaojun Shang, Stony Brook University  ...  University Sajjad Rahnama, UC Davis Alejandro Ranchal-Pedrosa, University of Sydney Thibault Rieutord, CEA LIST Johannes Sedlmeir, University of Bayreuth Avi Segal, Ben-Gurion University of the Negev, Israel Zeyuan  ... 
doi:10.1109/icdcs47774.2020.00009 fatcat:vsayezvvrvbu3a7aitd4kbzu5e

The Transmission and Practice of Chinese Medicine

Éric Marié
2011 China Perspectives  
medicine" ( 國醫大師 -guo yidashi ) was published by the Chinese government at the joint instigation of three central departments: The Ministry of Human Resources and Social Security ( 人力 資源和社會保障部 -renli zeyuan  ...  For the Shang Han Lun, he also followed the teachings of a friend of his uncle, Xie Shanghu, who had an excellent knowledge of this work and the corpus stemming from it.  ... 
doi:10.4000/chinaperspectives.5613 fatcat:jvm6ysetljeepg6jf4igycu7ee

La société locale vue à travers la statuaire domestique du Hunan

Alain Arrault
2010 Cahiers d Extrême-Asie  
Fig. 12: Grandfather, sieur Jiang Zeyuan 祖考蔣澤遠 (1818?-?), 1910, Y 1122009. Photograph: Yan Xinyuan.  ...  Excerpt from the Su shi wuxiu zupu 氏五修族譜, 1932, cited by Su Yejiang, "Shang Meishan Manwang chongpai" 梅山蠻王崇 , in ibid. 116. On this subject, see fig. 39 above. 117.  ... 
doi:10.3406/asie.2010.1350 fatcat:mjf6pvvz5bh7lf6pww7d4y4lra

Anti-Inflammatory Activities of Compounds Isolated from the Rhizome of Anemarrhena asphodeloides

Zeyuan Wang, Jianfeng Cai, Qing Fu, Lingping Cheng, Lehao Wu, Weiyue Zhang, Yan Zhang, Yu Jin, Chunzhi Zhang
2018 Molecules  
Plant Materials The rhizomes of A. asphodeloides were purchased from Lei Yun Shang Pharmaceutical store (Shanghai, China).  ... 
doi:10.3390/molecules23102631 fatcat:a745tqbyrbasjpcvej6m5a4yu4

Table of Contents

2020 2020 7th International Conference on Information Science and Control Engineering (ICISCE)  
University Changsha, China) Simulated ANNealing-Based Method to Optimize Routing Paths for City-Pair Airlines 1287 Jinglei Huang (State Key Laboratory of Air Traffic Management System and Technology), Zeyuan  ...  1395 Jie Luo (Beijing Institute of Spacecraft System Engineering), Zhe Xu (Beijing Institute of Spacecraft System Engineering), Hui Qiu (Beijing Institute of Spacecraft System Engineering), and Aihua Shang  ... 
doi:10.1109/icisce50968.2020.00004 fatcat:tfrsxa5nwjba3ib33wqou27wim

Derandomization Beyond Connectivity: Undirected Laplacian Systems in Nearly Logarithmic Space [article]

Jack Murtagh, Omer Reingold, Aaron Sidford, Salil Vadhan
2017 arXiv   pre-print
[ST2] Daniel A Spielman and Shang-Hua Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981–1025, 2011. [ST3] Daniel A Spielman and Shang-Hua Teng.  ...  [CKM+ ] Paul Christiano, Jonathan A Kelner, Aleksander Madry, Daniel A Spielman, and Shang-Hua Teng.  ... 
arXiv:1708.04634v1 fatcat:mooju4ooy5ezlpnhajkoyghcy4

Formująca się wielka strategia Chin: Ku dominacji w Azji Wschodniej, ale bez walki

Chong-Pin Lin, Uniwersytet Tamkang (Tajwan)
2006 Azja-Pacyfik  
While Jiang tried to engage local entrepreneurs against state officials, or to pit ordinary citizens against the authorities (yi shang wei zheng; yi min bi guan), Hu Jintao seems  ...  Relations], "Xuexi Shibao" (Study Times), Beijing: The Central Party School, October 2002, (material for internal use). 8 9 See the analysis: Yu Zeyuan, Jiefangjun yanjiu Sunzi bingfa ying dui  ... 
doi:10.15804/ap200605 fatcat:l7wxquavl5dj7kjh3v3tw77jhm

Model-Based Single Image Deep Dehazing [article]

Zhengguo Li, Chaobing Zheng, Haiyan Shu, Shiqian Wu
2021 arXiv   pre-print
IEEE Trans. on Image Processing, 24(11): 3522-3533, 2015. 1, 2, 3 [20] Zeyuan Chen, Yangchao Wang, Yang Yang, and Dong [9] Dana Berman, Tali Treibitz, and Shai Avidan.  ...  2020. 1, 3, 4, 5 [12] Bolun Cai, Xiangmin Xu, Kui Jia, Chunmei Qing, and Dacheng Tao, DehazeNet: an end-to-end system for [23] Shang-Hua  ... 
arXiv:2111.10943v2 fatcat:qp723mjwgra7bapqynv5pg35ua