1,239 Hits in 2.1 sec

BiBERT: Accurate Fully Binarized BERT [article]

Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu
2022 arXiv   pre-print
...  The sign function with a fixed 0 threshold is applied to the original definition of the binarized neural network (Rastegari et al., 2016) and is used by default in most binarization works (Qin et al., 2020; Liu et al., 2018).  ...  The matrix multiplication between an m-bit number and an n-bit number requires mn/64 FLOPs for a CPU with an instruction size of 64 bits.  ...
arXiv:2203.06390v1 fatcat:5vplx2ep6vd2bjkpt3sq2b3vza
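
A quick sanity check of the cost model quoted in the snippet, as a minimal Python sketch (the helper `matmul_flops` is our own illustration, not code from the paper): under the mn/64 rule, a fully binarized layer (m = n = 1) costs 1/1024 of its 32-bit counterpart.

    def matmul_flops(m_bits, n_bits, macs, word_size=64):
        """Estimated FLOPs for `macs` multiply-accumulates between an m-bit
        and an n-bit operand, following the mn/64 rule quoted above for a
        CPU with a 64-bit instruction size (illustrative helper)."""
        return m_bits * n_bits * macs / word_size

    # Full precision (32-bit x 32-bit) vs. fully binarized (1-bit x 1-bit):
    print(matmul_flops(32, 32, 10**6))  # 16000000.0 FLOPs
    print(matmul_flops(1, 1, 10**6))    # 15625.0 FLOPs, a 1024x reduction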

Learning Diverse Fashion Collocation by Neural Graph Filtering [article]

Xin Liu, Yongbin Sun, Ziwei Liu, Dahua Lin
2020 arXiv   pre-print
Fig. 1: Flexibility and diversity enabled by our fashion collocation framework.  ...
arXiv:2003.04888v1 fatcat:yc3nh3rlt5dk3brr3p5svfjlri

Full-Spectrum Out-of-Distribution Detection [article]

Jingkang Yang, Kaiyang Zhou, Ziwei Liu
2022 arXiv   pre-print
Inspired by Liu et al.  ...  A straightforward way is to collect auxiliary OOD data like Liu et al. [4] for building a contrastive objective.  ...
arXiv:2204.05306v1 fatcat:h2in6mczzre2rgm2ty3mh2qhvi

Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images [article]

Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang
2020 arXiv   pre-print
Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods. The current generative models heavily rely on datasets with multi-view images of the same person. Thus, their generated results are restricted by the scale and domain of the data source. To overcome these challenges, we propose a novel unsupervised framework that can synthesize photo-realistic rotated faces using only single-view image collections in the wild. Our key insight is that rotating faces in the 3D space back and forth, and re-rendering them to the 2D plane, can serve as a strong self-supervision. We leverage the recent advances in 3D face modeling and high-resolution GANs to constitute our building blocks. Since the 3D rotate-and-render operation on faces can be applied to arbitrary angles without losing details, our approach is extremely suitable for in-the-wild scenarios (i.e. no paired data are available), where existing methods fall short. Extensive experiments demonstrate that our approach has superior synthesis quality as well as identity preservation over state-of-the-art methods, across a wide range of poses and domains. Furthermore, we validate that our rotate-and-render framework can naturally act as an effective data augmentation engine for boosting modern face recognition systems, even on strong baseline models.
arXiv:2003.08124v1 fatcat:zsveqwdvgraj7m2d7wfq6citja
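
A minimal sketch of the rotate-back-and-forth self-supervision described in the abstract, assuming hypothetical stand-ins (fit_3d, rotate, render, G, loss_fn) for the paper's actual 3D-fitting, rendering and GAN modules:

    def rotate_and_render_step(image, fit_3d, rotate, render, G, loss_fn, angle):
        """One self-supervision step, loosely following the abstract: rotate
        the fitted 3D face away and back, re-render it to 2D, and train the
        generator G to restore the original image. All callables are
        hypothetical stand-ins, not the paper's actual modules."""
        mesh = fit_3d(image)                           # 3D face model from one view
        cycled = rotate(rotate(mesh, angle), -angle)   # back-and-forth 3D rotation
        degraded = render(cycled)                      # 2D re-rendering adds artifacts
        restored = G(degraded)                         # GAN repairs the rendering
        return loss_fn(restored, image)                # reconstruction supervises G

The supervision signal works because the back-and-forth-rotated rendering shares the pose of the input, so a pixel-level reconstruction loss is well-defined without any paired data.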

Person-in-Context Synthesis with Compositional Structural Space [article]

Weidong Yin, Ziwei Liu, Leonid Sigal
2020 arXiv   pre-print
Despite significant progress, controlled generation of complex images with interacting people remains difficult. Existing layout generation methods fall short of synthesizing realistic person instances, while pose-guided generation approaches focus on a single person and assume simple or known backgrounds. To tackle these limitations, we propose a new problem, Persons in Context Synthesis, which aims to synthesize diverse person instance(s) in consistent contexts, with user control over both. The context is specified by the bounding box object layout, which lacks shape information, while the pose of the person(s) is specified by keypoints, which are sparsely annotated. To handle the stark difference in input structures, we propose two separate neural branches to attentively composite the respective (context/person) inputs into a shared "compositional structural space", which encodes shape, location and appearance information for both context and person structures in a disentangled manner. This structural space is then decoded to the image space using a multi-level feature modulation strategy, and learned in a self-supervised manner from image collections and their corresponding inputs. Extensive experiments on two large-scale datasets (COCO-Stuff and Visual Genome) demonstrate that our framework outperforms state-of-the-art methods w.r.t. synthesis quality.
arXiv:2008.12679v1 fatcat:nwksbmdsc5g73b45h7xsdwdthm
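
As a rough illustration of the two-branch design in the abstract (all names here are hypothetical stand-ins, and the attentive composition is simplified to a sum):

    def synthesize_person_in_context(layout, keypoints, ctx_branch, person_branch, decoder):
        """Sketch: encode the box layout and the person keypoints with separate
        branches, composite them into a shared structural space, then decode
        to an image via feature modulation. Illustrative only."""
        ctx_feat = ctx_branch(layout)           # context boxes -> structural features
        person_feat = person_branch(keypoints)  # person pose -> structural features
        structural = ctx_feat + person_feat     # stands in for attentive composition
        return decoder(structural)              # multi-level feature modulation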

Vision-Infused Deep Audio Inpainting [article]

Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang
2019 arXiv   pre-print
We thank Yu Liu and Yu Xiong for their helpful assistance.  ... 
arXiv:1910.10997v1 fatcat:b7uwo3sx2rd6jafagi3ldk7x7q

One-shot Face Reenactment [article]

Yunxuan Zhang, Siwei Zhang, Yue He, Cheng Li, Chen Change Loy, Ziwei Liu
2019 arXiv   pre-print
To enable realistic shape (e.g. pose and expression) transfer, existing face reenactment methods rely on a set of target faces for learning subject-specific traits. However, in real-world scenarios end-users often have only one target face at hand, rendering existing methods inapplicable. In this work, we bridge this gap by proposing a novel one-shot face reenactment learning framework. Our key insight is that the one-shot learner should be able to disentangle and compose appearance and shape information for effective modeling. Specifically, the target face appearance and the source face shape are first projected into latent spaces with their corresponding encoders. Then these two latent spaces are associated by learning a shared decoder that aggregates multi-level features to produce the final reenactment results. To further improve the synthesis quality in mustache and hair regions, we additionally propose FusionNet, which combines the strengths of our learned decoder and the traditional warping method. Extensive experiments show that our one-shot face reenactment system achieves superior transfer fidelity as well as identity-preserving capability compared to alternatives. More remarkably, our approach trained with only one target image per subject achieves results competitive with those using a set of target images, demonstrating the practical merit of this work. Code, models and an additional set of reenacted faces have been publicly released at the project page.
arXiv:1908.03251v1 fatcat:tg5cejfuerfavbij3ryzyc4spe
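
The disentangle-and-compose idea above reduces to a simple dataflow; a hedged sketch with hypothetical encoder/decoder stand-ins (not the paper's code):

    def reenact(target_face, source_face, app_encoder, shape_encoder, decoder):
        """One-shot reenactment as described above: appearance comes from the
        single target face, shape (pose/expression) from the source face, and
        a shared decoder composes the two. All callables are illustrative."""
        appearance = app_encoder(target_face)  # identity/appearance latent
        shape = shape_encoder(source_face)     # pose + expression latent
        return decoder(appearance, shape)      # multi-level feature aggregation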

Fashion Landmark Detection in the Wild [article]

Ziwei Liu, Sijie Yan, Ping Luo, Xiaogang Wang, Xiaoou Tang
2016 arXiv   pre-print
Visual fashion analysis has attracted much attention in recent years. Previous work represented clothing regions by either bounding boxes or human joints. This work presents fashion landmark detection, or fashion alignment, which is to predict the positions of functional key points defined on fashion items, such as the corners of the neckline, hemline, and cuff. To encourage future studies, we introduce a fashion landmark dataset with over 120K images, where each image is labeled with eight landmarks. With this dataset, we study fashion alignment by cascading multiple convolutional neural networks in three stages. These stages gradually improve the accuracy of landmark predictions. Extensive experiments demonstrate the effectiveness of the proposed method, as well as its generalization ability to pose estimation. Fashion landmarks are also compared to clothing bounding boxes and human joints in two applications, fashion attribute prediction and clothes retrieval, showing that fashion landmarks are a more discriminative representation for understanding fashion images.
arXiv:1608.03049v1 fatcat:tvrfez4tlranvdff6s42zax55e
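
The three-stage cascade described above can be sketched as follows; the per-stage predictors are hypothetical stand-ins, each refining the estimates of the previous one:

    def detect_landmarks(image, stages):
        """Cascaded landmark detection: `stages` is a list of (hypothetical)
        per-stage predictors taking (image, current_landmarks). The first
        stage receives landmarks=None and predicts from scratch; later
        stages gradually refine the predictions."""
        landmarks = None
        for stage in stages:
            landmarks = stage(image, landmarks)
        return landmarks

Calling it with a list of three stage networks mirrors the three-stage setup the abstract describes.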

Unsupervised Landmark Learning from Unpaired Data [article]

Yinghao Xu, Ceyuan Yang, Ziwei Liu, Bo Dai, Bolei Zhou
2020 arXiv   pre-print
Recent attempts at unsupervised landmark learning leverage synthesized image pairs that are similar in appearance but different in poses. These methods learn landmarks by encouraging consistency between the original images and the images reconstructed from swapped appearances and poses. While synthesized image pairs are created by applying pre-defined transformations, they cannot fully reflect the real variances in both appearances and poses. In this paper, we aim to open the possibility of learning landmarks on unpaired data (i.e. unaligned image pairs) sampled from a natural image collection, so that they can differ in both appearance and pose. To this end, we propose a cross-image cycle consistency framework (C^3) which applies the swapping-reconstruction strategy twice to obtain the final supervision. Moreover, a cross-image flow module is further introduced to impose equivariance between estimated landmarks across images. Through comprehensive experiments, our proposed framework is shown to outperform strong baselines by a large margin. Besides quantitative results, we also provide visualization and interpretation of our learned models, which not only verifies the effectiveness of the learned landmarks, but also leads to important insights that are beneficial for future research.
arXiv:2007.01053v1 fatcat:aq4wuyg7drgczfyjaal57x45e4
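
A minimal sketch of applying the swapping-reconstruction strategy twice, as the abstract describes; encode, decode and loss_fn are hypothetical stand-ins for the paper's appearance/pose factorization:

    def c3_loss(img_a, img_b, encode, decode, loss_fn):
        """Cross-image cycle consistency sketch: swap appearance and pose
        between two unpaired images twice; the double swap should return the
        originals. encode -> (appearance, pose); decode is its inverse."""
        app_a, pose_a = encode(img_a)
        app_b, pose_b = encode(img_b)
        swapped_ab = decode(app_a, pose_b)   # first swap-reconstruction
        swapped_ba = decode(app_b, pose_a)
        app_ab, pose_ab = encode(swapped_ab)
        app_ba, pose_ba = encode(swapped_ba)
        cycled_a = decode(app_ab, pose_ba)   # second swap should restore img_a
        cycled_b = decode(app_ba, pose_ab)   # ... and img_b
        return loss_fn(cycled_a, img_a) + loss_fn(cycled_b, img_b)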

Thermomicrofluidics for biosensing applications

Fei Tian, Ziwei Han, Jinqi Deng, Chao Liu, Jiashu Sun
2021 View  
The accurate detection of biological systems containing biomolecules and bioparticles is increasingly important in diverse research fields, particularly in analytical and biological chemistry. Implementing thermophoresis in microfluidic devices (thermomicrofluidics) enables the manipulation and measurement of biomolecules and bioparticles in a label-free and high-precision manner, providing a promising avenue for biosensing. This review presents the fundamentals of thermophoresis and its coupling with other thermally induced physical phenomena in microfluidic setups. We overview the capabilities of thermomicrofluidics for diverse biosensing applications such as monitoring of biomolecular interactions, detection of nucleic acids, profiling of extracellular vesicles, and manipulation of cells. Biosensing by thermomicrofluidics provides insights into physiopathological processes and disease diagnostics. Current challenges and future directions of thermomicrofluidic detection are discussed.
doi:10.1002/viw.20200148 fatcat:bdylo2dl2vcuzknmg337bz6plu

Learning to Synthesize Fashion Textures [article]

Wu Shi, Tak-Wai Hui, Ziwei Liu, Dahua Lin, Chen Change Loy
2019 arXiv   pre-print
Existing unconditional generative models mainly focus on modeling general objects, such as faces and indoor scenes. Fashion textures, another important type of visual element around us, have not been extensively studied. In this work, we propose an effective generative model for fashion textures and also comprehensively investigate the key components involved: internal representation, latent space sampling and the generator architecture. We use the Gram matrix as a suitable internal representation for modeling realistic fashion textures, and further design two dedicated modules for modulating the Gram matrix into a low-dimensional vector. Since fashion textures are scale-dependent, we propose a recursive auto-encoder to capture the dependency between multiple granularity levels of texture features. Another important observation is that fashion textures are multi-modal. We fit and sample from a Gaussian mixture model in the latent space to improve the diversity of the generated textures. Extensive experiments demonstrate that our approach is capable of synthesizing more realistic and diverse fashion textures than other state-of-the-art methods.
arXiv:1911.07472v1 fatcat:6vyr2ih6bfce3c4v2of5au6c2a
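
The Gram-matrix representation mentioned above has a standard concrete form; a minimal NumPy version (our own illustration, not the paper's code):

    import numpy as np

    def gram_matrix(features):
        """Gram matrix of a feature map of shape (C, H, W): the CxC matrix
        of inner products between channel responses, which discards spatial
        layout and keeps texture statistics."""
        c, h, w = features.shape
        f = features.reshape(c, h * w)
        return f @ f.T / (h * w)  # normalize by spatial size

    feats = np.random.rand(64, 32, 32).astype(np.float32)
    g = gram_matrix(feats)  # shape (64, 64)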

Generalized Out-of-Distribution Detection: A Survey [article]

Jingkang Yang, Kaiyang Zhou, Yixuan Li, Ziwei Liu
2021 arXiv   pre-print
Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over control to humans when it detects unusual scenes or objects that it has never seen before and cannot make a safe decision about. This problem first emerged in 2017 and has since received increasing attention from the research community, leading to a plethora of methods, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems are closely related to OOD detection in terms of motivation and methodology. These include anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). Despite having different definitions and problem settings, these problems often confuse readers and practitioners, and as a result some existing studies misuse terms. In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. We then conduct a thorough review of each of the five areas by summarizing their recent technical developments. We conclude this survey with open challenges and potential research directions.
arXiv:2110.11334v1 fatcat:bfx67gnn6zcr5emwcrfzxs4tom
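
As one concrete instance of the classification-based family the survey covers, the classic maximum softmax probability (MSP) baseline scores inputs by prediction confidence; a minimal NumPy sketch (our own illustration, not from the survey):

    import numpy as np

    def msp_score(logits):
        """Maximum softmax probability: low confidence suggests an
        out-of-distribution input; threshold the score to flag OOD."""
        z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
        probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
        return probs.max(axis=-1)

    logits = np.array([[4.0, 0.1, 0.1],   # confident -> likely in-distribution
                       [0.7, 0.6, 0.5]])  # uncertain -> possibly OOD
    print(msp_score(logits))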

Self-Supervised Scene De-occlusion [article]

Xiaohang Zhan, Xingang Pan, Bo Dai, Ziwei Liu, Dahua Lin, Chen Change Loy
2020 arXiv   pre-print
Natural scene understanding is a challenging task, particularly when encountering images of multiple objects that are partially occluded. This obstacle arises from varying object ordering and positioning. Existing scene understanding paradigms are able to parse only the visible parts, resulting in incomplete and unstructured scene interpretation. In this paper, we investigate the problem of scene de-occlusion, which aims to recover the underlying occlusion ordering and complete the invisible parts of occluded objects. We make the first attempt to address the problem through a novel and unified framework that recovers hidden scene structures without using ordering or amodal annotations as supervision. This is achieved via the Partial Completion Network (PCNet)-mask (M) and -content (C), which learn to recover fractions of object masks and contents, respectively, in a self-supervised manner. Based on PCNet-M and PCNet-C, we devise a novel inference scheme to accomplish scene de-occlusion via progressive ordering recovery, amodal completion and content completion. Extensive experiments on real-world scenes demonstrate the superior performance of our approach over other alternatives. Remarkably, our approach, trained in a self-supervised manner, achieves results comparable to fully-supervised methods. The proposed scene de-occlusion framework benefits many applications, including high-quality and controllable image manipulation and scene recomposition (see Fig. 1), as well as the conversion of existing modal mask annotations to amodal mask annotations.
arXiv:2004.02788v1 fatcat:ppgf2zadlfegxfn23mhbqupfk4
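
A coarse sketch of the three-step inference scheme named in the abstract (progressive ordering recovery, amodal completion, content completion); order_fn, complete_mask and complete_content are hypothetical stand-ins for the PCNet-M/PCNet-C queries:

    def de_occlude(image, modal_masks, order_fn, complete_mask, complete_content):
        """Illustrative pipeline: recover pairwise occlusion order, then
        complete each occluded mask (amodal) and fill in its content."""
        n = len(modal_masks)
        # 1. Progressive ordering recovery: pairwise "does j occlude i?" queries.
        order = {(i, j): order_fn(image, modal_masks[i], modal_masks[j])
                 for i in range(n) for j in range(n) if i != j}
        # 2. Amodal completion: recover the full extent of each occluded object.
        amodal = [complete_mask(image, m, order) for m in modal_masks]
        # 3. Content completion: hallucinate the RGB content behind occluders.
        content = [complete_content(image, m) for m in amodal]
        return order, amodal, content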

Conditional Prompt Learning for Vision-Language Models [article]

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu
2022 arXiv   pre-print
See Liu et al. [34] for a more comprehensive survey. In computer vision, prompt learning is a nascent research direction that has only been explored very recently [27, 42, 56, 58, 62].  ...
arXiv:2203.05557v1 fatcat:tfflh77tavdbhkytwd4usznx2a

Semantic Facial Expression Editing using Autoencoded Flow [article]

Raymond Yeh, Ziwei Liu, Dan B Goldman, Aseem Agarwala
2016 arXiv   pre-print
High-level manipulation of facial expressions in images --- such as changing a smile to a neutral expression --- is challenging because facial expression changes are highly non-linear and vary depending on the appearance of the face. We present a fully automatic approach to editing faces that combines the advantages of flow-based face manipulation with the more recent generative capabilities of Variational Autoencoders (VAEs). During training, our model learns to encode the flow from one expression to another over a low-dimensional latent space. At test time, expression editing can be done simply using latent vector arithmetic. We evaluate our method on two applications: 1) single-image facial expression editing, and 2) facial expression interpolation between two images. We demonstrate that our method generates images of higher perceptual quality than previous VAE and flow-based methods.
arXiv:1611.09961v1 fatcat:dmhjtihakjccvl44si6qqvzqv4
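
The latent vector arithmetic mentioned above has a simple form; a hedged sketch with hypothetical stand-ins (encode_flow, apply_flow) for the trained model:

    def edit_expression(face, source_expr, target_expr, encode_flow, apply_flow, alpha=1.0):
        """Sketch: the flow between two reference expressions defines a
        direction in the latent space; scaling it by alpha applies a partial
        or full edit. Callables are illustrative stand-ins."""
        direction = encode_flow(source_expr, target_expr)  # latent flow code
        return apply_flow(face, alpha * direction)         # apply scaled flow

Intermediate alpha values in [0, 1] correspond to the expression-interpolation application the abstract mentions.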
Showing results 1 — 15 out of 1,239 results