
Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards [article]

Youngeun Kwon, Minsoo Rhu
2022 arXiv   pre-print
Personalized recommendation models (RecSys) are one of the most popular machine learning workloads serviced by hyperscalers.  ...  In RecSys, the so-called embedding layers account for the majority of memory usage so current systems employ a hybrid CPU-GPU design to have the large CPU memory store the memory-hungry embedding layers  ...  We also appreciate the support from Samsung Electronics Co., Ltd. Minsoo Rhu is the corresponding author.  ... 
arXiv:2205.04702v1 fatcat:mvgjodrsrvesxodk2xnfvt3khy

Interactive GPU-based "Visulation" and Structure Analysis of 3-D Implicit Surfaces for Seismic Interpretation

Benjamin J. Kadlec
2010 Geophysics  
This work has been implemented on the GPU for increased performance and interaction.  ...  Most importantly, Henry has brought me into his personal life in a way that elevates the student-advisor relationship to something closer to friendship.  ...  A.1 Research Methodology Research will be conducted at TerraSpark Geosciences LLP with advising from Henry Tufo.  ... 
doi:10.1190/1.3426303 fatcat:ecwwmgogvvdr5dxvkql7bbuvuu

NNTrainer: Light-Weight On-Device Training Framework [article]

Ji Joong Moon, Parichay Kapoor, Ji Hoon Lee, Myung Joo Ham, Hyun Suk Lee
2022 arXiv   pre-print
We find such a trend as the opportunity to personalize intelligence services by updating neural networks with user data without exposing the data out of devices: on-device training.  ...  The evaluations show that NNTrainer can reduce memory consumption down to 1/28 without deteriorating accuracy or training time and effectively personalizes applications on devices.  ...  Various Applications and Personalization Figure 12 shows the memory consumed to train neural networks from scratch or partially for personalization with a batch size of 32 and compares between NNTrainer  ... 
arXiv:2206.04688v1 fatcat:knu6lw3cnzaldndvqrqpm247j4

Understanding Training Efficiency of Deep Learning Recommendation Models at Scale [article]

Bilge Acun, Matthew Murphy, Xiaodong Wang, Jade Nie, Carole-Jean Wu, Kim Hazelwood
2020 arXiv   pre-print
The goal of this paper is to explain the intricacies of using GPUs for training recommendation models, factors affecting hardware efficiency at scale, and learnings from a new scale-up GPU server design  ...  Meanwhile, when training state-of-the-art personal recommendation models, which consume the highest number of compute cycles at our large-scale datacenters, the use of GPUs came with various challenges  ...  The batch size of the model dictates the number of examples processed through a forward/backward pass during model training.  ... 
arXiv:2011.05497v1 fatcat:6nddgrsi25fbhiwxswnz263pda

Petals: Collaborative Inference and Fine-tuning of Large Models [article]

Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, Colin Raffel
2022 arXiv   pre-print
Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters.  ...  However, these techniques have innate limitations: offloading is too slow for interactive inference, while APIs are not flexible enough for research.  ...  A part of the experiments was conducted on a personal server of Elena Voita.  ... 
arXiv:2209.01188v1 fatcat:vztu7iwljjbnhd4ea4i7ipqbwy

Prediction of Hashtags for Images

Taurunika Shivashankaran, Vidyavardhaka College of Engineering
2020 International Journal of Engineering Research and  
In this paper, we have labored on constructing our personal dataset of images that can be used to predict appropriate hashtags for images.  ...  Hence, alternate methods to routinely generate training sets, like pairs of pictures and also tags are grasped.  ...  Batch size is known as the number of training examples in 1 Forward/1 Backward pass. (With increase in Batch size, required memory space increases.)  ... 
doi:10.17577/ijertv9is070419 fatcat:2jx3ljdoj5aztnjayx4pyzavvq

Machine Learning In Astroinformatics Using Massively Parallel Data Processing

Tomas Peterka, CSc. RNDr. Petr Skoda
2015 Zenodo  
NVIDIA DIGITS The NVIDIA Deep Learning GPU Training System (DIGITS) is a software built on top of cuDNN and Caffe and released to public on 14th of March 2015.  ...  The key features are • Visualize DNN topology and how training data activates your network • Manage training of many DNNs in parallel on multi-GPU systems • Simple setup and launch • Import a wide variety  ...  GPU graphical processing unit. GT/s giga transfers per second. ML machine learning.  ... 
doi:10.5281/zenodo.44728 fatcat:dyga25itxvaf3loliwrjgz7vde

PrivFT: Private and Fast Text Classification with Homomorphic Encryption [article]

Ahmad Al Badawi, Luong Hoang, Chan Fook Mun, Kim Laine, Khin Mi Mi Aung
2019 arXiv   pre-print
For inference, we train a supervised model and outline a system for homomorphic inference on encrypted user inputs with zero loss to prediction accuracy.  ...  Our system (named Private Fast Text (PrivFT)) performs two tasks: 1) making inference of encrypted user inputs using a plaintext model and 2) training an effective model using an encrypted dataset.  ...  , we focus on how to train a model from scratch using encrypted dataset.  ... 
arXiv:1908.06972v2 fatcat:qgr3yrqdxvdfpke2lpmqw3q2ra

CoNT: Contrastive Neural Text Generation [article]

Chenxin An, Jiangtao Feng, Kai Lv, Lingpeng Kong, Xipeng Qiu, Xuanjing Huang
2022 arXiv   pre-print
CoNT addresses bottlenecks that prevent contrastive learning from being widely adopted in generation tasks from three aspects -- the construction of contrastive examples, the choice of the contrastive  ...  It introduces a sequence-level training signal which is crucial to generation tasks that always rely on auto-regressive decoding.  ...  We train our model until the validation loss does not decrease. The total training hours using 4 GPUs is about 6 hours for small model and 12 hours for large model.  ... 
arXiv:2205.14690v1 fatcat:c37e2qw62ba5nlllg45lkvfbye

Studio2Shop: From Studio Photo Shoots to Fashion Articles

Julia Lasserre, Katharina Rasch, Roland Vollgraf
2018 Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods  
This paper focuses on finding pieces of clothing worn by a person in full-body or half-body images with neutral backgrounds.  ...  Solving this problem promises new means of making fashion searchable and helping shoppers find the articles they are looking for.  ...  A second observation is that learning our feature representation from scratch does not perform as well as a pre-trained feature representation. The reason is two-fold.  ... 
doi:10.5220/0006544500370048 dblp:conf/icpram/LasserreRV18 fatcat:35ozxjfaufdkphtloq7t44gz3y

Counter-Strike Deathmatch with Large-Scale Behavioural Cloning [article]

Tim Pearce, Jun Zhu
2021 arXiv   pre-print
This paper describes an AI agent that plays the popular first-person-shooter (FPS) video game 'Counter-Strike; Global Offensive' (CSGO) from pixel input.  ...  Our solution uses behavioural cloning - training on a large noisy dataset scraped from human play on online servers (4 million frames, comparable in size to ImageNet), and a smaller dataset of high-quality  ...  CSGO Environment CSGO is played from a first person perspective, with mechanics and controls that are standard across FPS games - the keyboard is used to move the player left/right/forward/backwards, while  ... 
arXiv:2104.04258v2 fatcat:5hstp4kdhba3fnprs5dskyyctq

How hard is it to cross the room? -- Training (Recurrent) Neural Networks to steer a UAV [article]

Klaas Kelchtermans, Tinne Tuytelaars
2017 arXiv   pre-print
This work explores the feasibility of steering a drone with a (recurrent) neural network, based on input from a forward looking camera, in the context of a high-level navigation task.  ...  Further, end-to-end training requires a lot of data which often is not available.  ...  view coming from a forward looking camera on a drone spawned in a room in which it should follow a certain trajectory.  ... 
arXiv:1702.07600v1 fatcat:ln7nsbs33jc3njka2vax6j2ukm

Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning [article]

Huy Phan and Oliver Y. Chén and Philipp Koch and Zongqing Lu and Ian McLoughlin and Alfred Mertins and Maarten De Vos
2020 IEEE Transactions on Biomedical Engineering   accepted
The networks are first trained in the source domain (i.e. the large database). The pretrained networks are then finetuned in the target domain (i.e. the small cohort) to complete knowledge transfer.  ...  Methods: We start from a generic end-to-end deep learning framework for sequence-to-sequence sleep staging and derive two networks as the means for transfer learning.  ...  We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research. We would like to thank Dr.  ... 
doi:10.1109/tbme.2020.3020381 pmid:32866092 arXiv:1907.13177v3 fatcat:domyetqu5rb3tj4svc53lstgje

Looking beyond appearances: Synthetic training data for deep CNNs in re-identification

Igor Barros Barbosa, Marco Cristani, Barbara Caputo, Aleksander Rognhaugen, Theoharis Theoharis
2018 Computer Vision and Image Understanding  
First, SOMAnet is based on the Inception architecture, departing from the usual siamese framework.  ...  Re-identification is generally carried out by encoding the appearance of a subject in terms of outfit, suggesting scenarios where people do not change their attire.  ...  We opted for extracting poses from a recording titled 'navigate', where the subject walks forwards, backwards and sideways.  ... 
doi:10.1016/j.cviu.2017.12.002 fatcat:nvghxjvt4bg6dab7neyxyua4dq

ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network [article]

David Gschwend
2020 arXiv   pre-print
Image Understanding is becoming a vital feature in ever more applications ranging from medical diagnostics to autonomous vehicles.  ...  Many applications demand for embedded solutions that integrate into existing systems with tight real-time and power constraints.  ...  model was rewritten from scratch.  ... 
arXiv:2005.06892v1 fatcat:tduahjb5w5cjromemahngmt3gy