Automatically scheduling Halide image processing pipelines
2016
ACM Transactions on Graphics
Halide programmers need only provide a high-level strategy for mapping an image processing pipeline to a parallel machine (a schedule), and the Halide compiler carries out the mechanical task of generating ...
Unfortunately, designing high-performance schedules for complex image processing pipelines requires substantial knowledge of modern hardware architecture and code-optimization techniques. ...
Parallelism Lab (supported by Oracle, AMD, Intel, and NVIDIA), and by equipment donations from NVIDIA. ...
doi:10.1145/2897824.2925952
fatcat:nr2o5mqsmncutdyxiuzrnont5e
A Stream Processing Framework for On-Line Optimization of Performance and Energy Efficiency on Heterogeneous Systems
2014
2014 IEEE International Parallel & Distributed Processing Symposium Workshops
Scheduling is automatically adapted on-line to continuously optimize performance and energy efficiency. ...
Scheduling should thus consider the performance of each processor as well as competing workloads and varying inputs. ...
For that reason and due to the merely moderate results of pipelining in [2] and [8], Pipeline is the only strategy not to implement automatic performance and energy optimization in the class itself ...
doi:10.1109/ipdpsw.2014.119
dblp:conf/ipps/RanftDP14
fatcat:7yca7spnsbbprmk67o4yvmgcxe
CPU-GPU heterogeneous implementations of depth-based foreground detection
2018
IEICE Electronics Express
... the relative performance between the CPU and GPU. ...
... by balancing the total workload between CPU and GPU. ...
Then, we applied the pipeline scheduling strategy by determining a computing device for each task and balancing the total execution times of CPU and GPU simultaneously. ...
doi:10.1587/elex.15.20170950
fatcat:hier7uoqrzhkdhyh6njkbzhf3i
Run-time Adaptation to Heterogeneous Processing Units for Real-time Stereo Vision
2012
2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems
On this basis, we develop and implement further strategies for heterogeneous systems and automatic adaptation to the hardware available at run-time. ...
Each approach is described with regard to, among other things, the propagation of data to processors and its relation to established methods. ...
Stereo vision is the process of recovering 3D structure from images of two side-by-side cameras, making it an important basis for environmental perception. ...
doi:10.1109/hpcc.2012.232
dblp:conf/hpcc/RanftD12
fatcat:ezz5rn7e3jb67nc247nujjruve
A Distributed Framework for Low-Latency OpenVX over the RDMA NoC of a Clustered Manycore
2018
2018 IEEE High Performance extreme Computing Conference (HPEC)
OpenVX is a standard proposed by the Khronos Group for cross-platform acceleration of computer vision and deep learning applications. ...
OpenVX abstracts the target processor architecture complexity and automates the implementation of processing pipelines through high-level optimizations. ...
In contrast to OpenCV, the Khronos OpenVX standard [3] proposes a graph-based approach for the structured design of computer vision pipelines, where images flow as arcs between nodes, and nodes correspond ...
doi:10.1109/hpec.2018.8547736
dblp:conf/hpec/HascoetDDN18
fatcat:ri7iejxc2ffmhk3sntlilwxniq
Halide
2017
Communications of the ACM
Its model is simple enough to do so often in only a few lines of code, and small changes generate efficient implementations for x86, ARM, Graphics Processors (GPUs), and specialized image processors, all ...
We propose a new programming language for image processing pipelines, called Halide, that separates the algorithm from its schedule. ...
Most notably, Zalman Stern, Steven Johnson, and Patricia Suriana are full-time developers on the project at Google and are responsible for a large amount of the current code. ...
doi:10.1145/3150211
fatcat:4vhmxunjofam7daaaeiw5ssc7a
Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels
2019
2019 IEEE International Conference on Embedded Software and Systems (ICESS)
For more complicated kernels and complete vision pipelines, the FPGA outperforms the others with energy/frame reduction ratios of 1.2-22.3×. ...
The VisionWorks library applies many optimization techniques to boost performance, such as buffer reuse, kernel fusion, efficient use of streaming and CUDA textures, automatic scheduling across processing ...
doi:10.1109/icess.2019.8782524
dblp:conf/icess/QasaimehDLVZJ19
fatcat:s2bfurzoi5cn3j523kmhk3ozcy
Rethinking Training from Scratch for Object Detection
[article]
2021
arXiv
pre-print
Specifically, we propose a new training pipeline for object detection that follows 'pre-training and fine-tuning', utilizing low-resolution images within the target dataset to pre-train the detector, then load ...
In this setting, we find that the widely adopted large-resizing strategy (e.g., resizing images to (1333, 800)) is important for fine-tuning but not necessary for pre-training. ...
... data diminishes the value of ImageNet pre-training. ...
Our aim is to set up a fast training pipeline for object detection. ...
arXiv:2106.03112v1
fatcat:tigh77gq2jbovjbci2r5cgbi6e
Programming Heterogeneous Systems from an Image Processing DSL
[article]
2016
arXiv
pre-print
Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented ...
that uses this code to automatically create the accelerator along with the "glue" code needed for the user's application to access this hardware. ...
C HLS tools raise the design level by decoupling clock timing and automatically scheduling pipelines and other resources. ...
arXiv:1610.09405v1
fatcat:p2qq2gcifnez7mtrswcl2h2vfy
A Unified Optimization Approach for CNN Model Inference on Integrated GPUs
2019
Proceedings of the 48th International Conference on Parallel Processing - ICPP 2019
ACKNOWLEDGMENT The authors thank the anonymous reviewers of the paper for valuable comments. ...
The authors are also grateful to Frank Chen and Long Gao for providing devices for experiments, and Tianqi Chen for technical assistance. The entire work was done at AWS. ...
As an overview of the pipeline, we wrote an optimized schedule template (Section 3.2.2), and then used AutoTVM [6] as well as graph tuner [26] to search for the best schedules for different workloads ( ...
doi:10.1145/3337821.3337839
dblp:conf/icpp/WangCLWZLW19
fatcat:ptvsneujwjdmhesvcrune7rqwy
Extending Halide to Improve Software Development for Imaging DSPs
2017
ACM Transactions on Architecture and Code Optimization (TACO)
Recent advances in automatic scheduling may enable programmers to write efficient imaging code for any CPU, GPU or DSP target with Halide support, without detailed knowledge about any of these architectures ...
doi:10.1145/3106343
fatcat:lhwzjx4levafvbbxxl4mrzn2o4
Halide
2013
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation - PLDI '13
Image processing pipelines combine the challenges of stencil computations and stream programs. ...
We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline ...
Acknowledgments Eric Chan provided feedback and inspiration throughout the design of Halide, and helped compare our local Laplacian filters implementation to his in Camera Raw. ...
doi:10.1145/2491956.2462176
dblp:conf/pldi/Ragan-KelleyBAPDA13
fatcat:tr3fzvh5arbbbo4nn2iqpivdaa
Halide
2013
SIGPLAN notices
Image processing pipelines combine the challenges of stencil computations and stream programs. ...
We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline ...
Acknowledgments Eric Chan provided feedback and inspiration throughout the design of Halide, and helped compare our local Laplacian filters implementation to his in Camera Raw. ...
doi:10.1145/2499370.2462176
fatcat:afs2mud2unentdmcazyg2qhiqq
Scalable analysis of Big pathology image data cohorts using efficient methods and high-performance computing strategies
2015
BMC Bioinformatics
Conclusions: Our work demonstrates efficient CBIR algorithms and high performance computing can be leveraged for efficient analysis of large microscopy images to meet the challenges of clinically salient ...
Results: The proposed tools and methods take advantage of state-of-the-art parallel machines and efficient content-based image searching strategies. ...
Contract OCI-0910735, and the Nautilus system at the University of Tennessee's Center for Remote Data Analysis and Visualization supported by NSF Award ARRA-NSF-OCI-0906324. ...
doi:10.1186/s12859-015-0831-6
pmid:26627175
pmcid:PMC4667532
fatcat:lnnhszkk4vhi7d3yoi74t37e2m
Whale: Efficient Giant Model Training over Heterogeneous GPUs
[article]
2022
arXiv
pre-print
We present Whale, a general and efficient distributed training framework for giant models. ...
The Whale runtime utilizes those annotations and performs graph optimizations to transform a local deep learning DAG graph for distributed multi-GPU execution. ...
We would also like to thank the M6 team and all users of Whale for their help and suggestions. ...
arXiv:2011.09208v3
fatcat:zetefqb6o5htlhgp7gajhubxuy