
Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging [article]

Gongbo Liang, Connor Greenwell, Yu Zhang, Xiaoqin Wang, Ramakanth Kavuluru, Nathan Jacobs
2021 arXiv   pre-print
We use an image-text matching task to train a feature extractor and then fine-tune it in a transfer learning setting for a supervised task using a small labeled dataset.  ...  A key challenge in training neural networks for a given medical imaging task is often the difficulty of obtaining a sufficient number of manually labeled examples.  ...  Contrastive learning is a machine learning strategy that learns the general features of a dataset by comparing the similarity and dissimilarity between data samples from the same class  ...
arXiv:2010.03060v4 fatcat:oitmf7z7dzhwjlzpnhaxqroaui
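
A minimal sketch of the kind of image-text contrastive objective the entry above describes: a symmetric InfoNCE loss over paired image and report embeddings. The encoders and the batch-diagonal pairing are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def image_text_contrastive_loss(img_emb, txt_emb, temperature=0.07):
        """Symmetric InfoNCE over a batch of paired image/text embeddings.

        img_emb, txt_emb: (B, D) outputs of hypothetical image and text encoders;
        row i of each tensor comes from the same image-report pair (the positive).
        """
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.t() / temperature      # (B, B) similarity matrix
        targets = torch.arange(logits.size(0), device=logits.device)
        loss_i2t = F.cross_entropy(logits, targets)       # match each image to its text
        loss_t2i = F.cross_entropy(logits.t(), targets)   # and each text to its image
        return 0.5 * (loss_i2t + loss_t2i)

The feature extractor trained this way would then be fine-tuned on the small labeled downstream dataset, as the abstract outlines.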

DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [article]

Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Wen Zhang, Yin Fang, Jeff Z. Pan, Wenting Song, Huajun Chen
2022 arXiv   pre-print
Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images; (2) applied an attribute-level contrastive  ...  learning strategy to further enhance the model's discrimination on fine-grained visual characteristics against the attribute co-occurrence and imbalance; (3) proposed a multi-task learning policy for considering  ...  sampling; (ii) the linear weighted random sampling in Eq. (2); (iii) the non-linear weighted random sampling (simply based on squaring or softmax). • When it comes to the strategy for sampling contrastive  ...
arXiv:2207.01328v2 fatcat:x3hu4th3zngdbgijltig3hywry
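
The sampling variants listed in the last snippet (linear versus non-linear weighting by squaring or softmax) can be sketched in a few lines; the per-candidate scores below stand in for whatever relevance measure the paper uses, so this is illustrative rather than the published Eq. (2).

    import numpy as np

    def weighted_sample(scores, k, scheme="linear", rng=None):
        """Draw k candidate indices with probability proportional to a transform of `scores`."""
        rng = np.random.default_rng() if rng is None else rng
        scores = np.asarray(scores, dtype=float)
        if scheme == "linear":        # p_i proportional to s_i
            weights = scores
        elif scheme == "squared":     # p_i proportional to s_i^2, sharpens toward high scores
            weights = scores ** 2
        elif scheme == "softmax":     # p_i proportional to exp(s_i)
            weights = np.exp(scores - scores.max())
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        probs = weights / weights.sum()
        return rng.choice(len(scores), size=k, replace=False, p=probs)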

Knowledge-Augmented Contrastive Learning for Abnormality Classification and Localization in Chest X-rays with Radiomics using a Feedback Loop [article]

Yan Han, Chongyan Chen, Ahmed Tewfik, Benjamin Glicksberg, Ying Ding, Yifan Peng, Zhangyang Wang
2022 arXiv   pre-print
The key knob of our framework is a unique positive sampling approach tailored for medical images, by seamlessly integrating radiomic features as a knowledge augmentation.  ...  In this way, our framework constitutes a feedback loop for image and radiomic modality features to mutually reinforce each other.  ...  However, their method needs to apply a pre-trained ResNet on the images to generate attention for radiomic feature extraction and therefore relies on a multi-stage training heuristic.  ...
arXiv:2104.04968v5 fatcat:yhowuveo4beiheifihaw4sbyxa
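
A loose sketch of knowledge-augmented positive sampling in the spirit of the entry above: images whose radiomic feature vectors are most similar are paired as positives for the contrastive loss. The nearest-neighbour rule and the shape of the radiomic features are assumptions, not the authors' procedure.

    import torch
    import torch.nn.functional as F

    def radiomics_positive_indices(radiomic_feats):
        """For each image, return the index of the most similar other image in
        radiomic-feature space, to be used as its contrastive positive."""
        z = F.normalize(radiomic_feats, dim=-1)   # (B, R) radiomic feature vectors
        sim = z @ z.t()
        sim.fill_diagonal_(-float("inf"))         # exclude self-pairing
        return sim.argmax(dim=1)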

WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models [article]

Sha Yuan, Shuai Zhao, Jiahong Leng, Zhao Xue, Hanyu Zhao, Peiyu Liu, Zheng Gong, Wayne Xin Zhao, Junyi Li, Jie Tang
2022 arXiv   pre-print
We also release a base version of WuDaoMM with 5 million strongly correlated image-text pairs, which is sufficient to support common cross-modal model pre-training.  ...  In this work, we introduce a large-scale multi-modal corpus named WuDaoMM, containing more than 650M image-text pairs in total.  ...  The work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61806111, NSFC for Distinguished Young Scholar under Grant No. 61825602 and National Key  ...
arXiv:2203.11480v5 fatcat:vjgi4jll5felhjtvo6c6bqgtde

Structure Inducing Pre-Training [article]

Matthew B. A. McDermott, Brendan Yap, Peter Szolovits, Marinka Zitnik
2022 arXiv   pre-print
Based on this review, we introduce a descriptive framework for pre-training that allows for a granular, comprehensive understanding of how relational structure can be induced.  ...  on the distance or geometry between the pre-trained embeddings of two samples x_i and x_j.  ...  As their focus is on cross-modal pre-training of text and image alignment, it is orthogonal to our work. [122] General domain NLP and Computer Vision; This paper proposes a framework for simultaneous  ...
arXiv:2103.10334v3 fatcat:jlbzciqezjgs7a7tke4rhly4ty
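
The snippet mentions losses defined on the distance or geometry between the pre-trained embeddings of two samples x_i and x_j. A minimal sketch of one such pairwise term, a margin-based loss over an assumed binary relation label (the relational structure in the framework itself is richer than this):

    import torch
    import torch.nn.functional as F

    def pairwise_structure_loss(z_i, z_j, related, margin=1.0):
        """Pull related pairs together, push unrelated pairs at least `margin` apart.

        z_i, z_j: (B, D) embeddings of the paired samples.
        related:  (B,) float tensor, 1.0 if a pair is linked in the assumed
                  relation structure, 0.0 otherwise.
        """
        dist = F.pairwise_distance(z_i, z_j)
        pos = related * dist.pow(2)
        neg = (1.0 - related) * F.relu(margin - dist).pow(2)
        return (pos + neg).mean()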

EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling [article]

Jue Wang, Haofan Wang, Jincan Deng, Weijia Wu, Debing Zhang
2021 arXiv   pre-print
While large-scale pre-training has achieved great success in bridging the gap between vision and language, it still faces several challenges. First, the cost of pre-training is high.  ...  Rich extra non-paired single-modal text data is used to boost the generalization of the text branch.  ...  In contrast, few studies have focused on handling noise in cross-modal pre-training.  ...
arXiv:2109.04699v2 fatcat:ezor5bfhnnbbbnwruibacpam3y
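
Ensemble confident learning is aimed at filtering noisy image-text pairs before pre-training. A very rough sketch of that idea, keeping only pairs on which an ensemble of scorers agrees; the scorers, the aggregation, and the threshold are placeholders rather than the paper's actual procedure.

    import torch

    def filter_noisy_pairs(ensemble_scores, threshold=0.5):
        """Keep image-text pairs whose mean match score across an ensemble exceeds a threshold.

        ensemble_scores: (E, N) tensor of match scores from E independent scorers
        for N candidate pairs (both the scorers and the threshold are assumptions).
        """
        mean_score = ensemble_scores.mean(dim=0)
        keep = mean_score > threshold
        return keep.nonzero(as_tuple=True)[0]   # indices of retained pairs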

Survey: Transformer based Video-Language Pre-training [article]

Ludan Ruan, Qin Jin
2021 arXiv   pre-print
This survey aims to give a comprehensive overview of transformer-based pre-training methods for Video-Language learning.  ...  Finally, we analyze and discuss the current challenges and possible future research directions for Video-Language pre-training.  ...  samples for calculating contrastive losses.  ...
arXiv:2109.09920v1 fatcat:ixysz5k4vrbktmf6cqftttls7m

Resource-efficient domain adaptive pre-training for medical images [article]

Yasar Mehmood, Usama Ijaz Bajwa, Xianfang Sun
2022 arXiv   pre-print
In DAPT, models are initialized with the generic dataset pre-trained weights, and further pre-training is performed using a moderately sized in-domain dataset (medical images).  ...  The second one adopts a hybrid strategy (hybrid DAPT) by performing partial DAPT for a few epochs and then full DAPT for the remaining epochs.  ...  Most of these models are pre-trained on generic datasets like ImageNet (Deng et al., 2009), and many studies have used ImageNet pre-trained models for medical image analysis tasks (Haghighi, Taher,  ...
arXiv:2204.13280v1 fatcat:ubul5k7b5rfwrhopepzwhtqt7y
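
The hybrid DAPT schedule in the entry above (partial domain-adaptive pre-training for a few epochs, then full DAPT) can be sketched as unfreezing the backbone partway through training. The ResNet-50 backbone, the layer split, and the epoch boundary are illustrative assumptions, not the paper's configuration.

    import torch.nn as nn
    import torchvision.models as models

    def set_backbone_trainable(model: nn.Module, full: bool):
        """Partial DAPT: train only the last block and head; full DAPT: train everything."""
        for name, param in model.named_parameters():
            param.requires_grad = full or name.startswith(("layer4", "fc"))

    model = models.resnet50(weights="IMAGENET1K_V2")   # generic-dataset (ImageNet) initialization
    set_backbone_trainable(model, full=False)          # first few epochs: partial DAPT
    # ... pre-train on the moderately sized in-domain medical dataset ...
    set_backbone_trainable(model, full=True)           # remaining epochs: full DAPT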

Pre-training Methods in Information Retrieval [article]

Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo
2022 arXiv   pre-print
Considering the rapid progress of this direction, this survey aims to provide a systematic review of pre-training methods in IR.  ...  Owing to sophisticated pre-training objectives and huge model size, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of  ...
arXiv:2111.13853v3 fatcat:pilemnpphrgv5ksaktvctqdi4y

RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection [article]

Hangjie Yuan, Jianwen Jiang, Samuel Albanie, Tao Feng, Ziyuan Huang, Dong Ni, Mingqian Tang
2022 arXiv   pre-print
To address this gap, we propose Relational Language-Image Pre-training (RLIP), a strategy for contrastive pre-training that leverages both entity and relation descriptions.  ...  However, the design of an appropriate pre-training strategy for this task remains underexplored by existing approaches.  ...  Comparing cross-modal regional alignment pre-training with RLIP.  ... 
arXiv:2209.01814v1 fatcat:5h2fwmdxo5c3tm62hq4cxscykq

Unsupervised pre-training of graph transformers on patient population graphs [article]

Chantal Pellegrini, Nassir Navab, Anees Kazi
2022 arXiv   pre-print
Pre-training has shown success in different areas of machine learning, such as Computer Vision, Natural Language Processing (NLP), and medical imaging.  ...  In this paper, we propose novel unsupervised pre-training techniques designed for heterogeneous, multi-modal clinical data for patient outcome prediction inspired by masked language modeling (MLM), by  ...  In the medical domain, besides medical imaging, pre-training has been applied, for instance, to medical code data [21, 22, 23, 24] and textual EHR data [25].  ...
arXiv:2207.10603v1 fatcat:zvrspnxwynd5ncjlg3kh6v27ki
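
A minimal sketch of the MLM-inspired corruption the snippet alludes to, applied to a dense matrix of per-patient clinical features; the 15% rate and the zero mask value are conventional defaults, not necessarily the authors' choices.

    import torch

    def mask_features(x, mask_rate=0.15, mask_value=0.0):
        """Randomly mask a fraction of clinical feature entries for the model to reconstruct.

        x: (num_patients, num_features) float tensor; returns the corrupted input
        and the boolean mask marking the positions to reconstruct.
        """
        mask = torch.rand_like(x) < mask_rate
        return x.masked_fill(mask, mask_value), mask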

Self-supervised multimodal reconstruction pre-training for retinal computer-aided diagnosis

Álvaro S. Hervella, José Rouco, Jorge Novo, Marcos Ortega
2021 Expert systems with applications  
In particular, we explore the use of a multimodal reconstruction task between complementary retinal imaging modalities.  ...  Nevertheless, the latter is typically addressed using networks that were pre-trained on additional annotated data.  ...  Instead, they explore the use of synthetic complementary image modalities as an additional augmentation strategy in a contrastive learning instance discrimination setting.  ...
doi:10.1016/j.eswa.2021.115598 fatcat:jnqnrmbt6rdo5aovd62xnuyei4
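
The multimodal reconstruction pretext task, predicting one retinal modality from a complementary one, can be sketched as a small encoder-decoder trained with a pixel-wise loss; the toy network and the L1 objective below are assumptions for illustration, not the published architecture.

    import torch.nn as nn

    class ModalityTranslator(nn.Module):
        """Toy encoder-decoder that predicts one retinal modality from another."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1))

        def forward(self, source_modality):
            return self.decoder(self.encoder(source_modality))

    reconstruction_loss = nn.L1Loss()   # compared against the paired target modality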

GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning [article]

Benyuan Sun, Jin Dai, Zihao Liang, Congying Liu, Yi Yang, Bo Bai
2022 arXiv   pre-print
In this paper, we propose GPPF, a General Perception Pre-training Framework, which pre-trains a task-level dynamic network, composed of knowledge "legos" in each layer, on labeled multi-task and  ...  Pre-training over mixed multi-task, multi-domain, and multi-modal data remains an open challenge in vision perception pre-training.  ...  The idea of cross-modal pre-training has also been extended to more modalities [40, 1] to form more general pre-training architectures.  ...
arXiv:2208.02148v2 fatcat:2iecrxyd4jgvrhbwi2d4qgh53m

Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency [article]

Xiang Zhang, Ziyuan Zhao, Theodoros Tsiligkaridis, Marinka Zitnik
2022 arXiv   pre-print
To this end, we posit that time-frequency consistency (TF-C) -- embedding a time-based neighborhood of a particular example close to its frequency-based neighborhood and back -- is desirable for pre-training  ...  Motivated by TF-C, we define a decomposable pre-training model, where the self-supervised signal is provided by the distance between time and frequency components, each individually trained by contrastive  ...  for a generalizable pre-training strategy for time series.  ... 
arXiv:2206.08496v2 fatcat:db4oalkze5espftdexgo3oikm4
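
A minimal sketch of the time-frequency consistency idea: contrast each sample's time-domain embedding with the embedding of its own frequency-domain view (positive) against the other samples in the batch (negatives). The two encoders and the plain magnitude FFT are assumptions; the paper's full model has additional components.

    import torch
    import torch.nn.functional as F

    def tf_consistency_loss(x, time_enc, freq_enc, temperature=0.1):
        """Contrastive time-frequency consistency over a batch of time series.

        x: (B, T) real-valued time series; time_enc / freq_enc are hypothetical encoders.
        """
        z_t = F.normalize(time_enc(x), dim=-1)
        x_f = torch.abs(torch.fft.rfft(x, dim=-1))        # frequency-domain view
        z_f = F.normalize(freq_enc(x_f), dim=-1)
        logits = z_t @ z_f.t() / temperature
        targets = torch.arange(x.size(0), device=x.device)
        return F.cross_entropy(logits, targets)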

Retinal microaneurysms detection using adversarial pre-training with unlabeled multimodal images

Álvaro S. Hervella, José Rouco, Jorge Novo, Marcos Ortega
2021 Information Fusion  
In particular, we propose a novel adversarial multimodal pre-training consisting of the prediction of fluorescein angiography from retinography using generative adversarial networks.  ...  However, the detection of these lesions in retinography, the most widely available retinal imaging modality, remains a very challenging task.  ...  multimodal pre-training in medical image analysis [16].  ...
doi:10.1016/j.inffus.2021.10.003 fatcat:wnrq4bp7dzg25hzbsxsey7mfme
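
A compressed sketch of the adversarial pretext task described above: a generator predicts fluorescein angiography from retinography while a discriminator judges real versus predicted angiographies. The loss weighting and the reconstruction term are assumptions for illustration, not the published training setup.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    bce = nn.BCEWithLogitsLoss()

    def generator_step(generator, discriminator, retinography, angiography, l1_weight=100.0):
        """One generator update for retinography -> angiography translation."""
        fake = generator(retinography)
        d_out = discriminator(fake)
        adv = bce(d_out, torch.ones_like(d_out))   # try to fool the discriminator
        rec = F.l1_loss(fake, angiography)         # stay close to the real angiography
        return adv + l1_weight * rec
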
Showing results 1 — 15 out of 2,785 results