
Open Source Hardware

Frank Hannig, Jurgen Teich
2021 · Computer · Institute of Electrical and Electronics Engineers (IEEE)
The authors of this article, Hannig and Teich, provide us with an overview of a deep and rich topic that might well warrant its own column. [...]
doi:10.1109/mc.2021.3099046 · fatcat:bdvewr4qdfa4xlyc6nozs25vii
Full text (Web Archive PDF): https://web.archive.org/web/20211001013914/https://ieeexplore.ieee.org/ielx7/2/9548012/09548130.pdf?tp=&arnumber=9548130&isnumber=9548012&ref=

Massively Parallel Processor Architectures for Resource-aware Computing [article]

Vahid Lari, Alexandru Tanase, Frank Hannig, Jürgen Teich
2014-05-12 · arXiv · pre-print
We present a class of massively parallel processor architectures called invasive tightly coupled processor arrays (TCPAs). The presented processor class is a highly parameterizable template that can be tailored before runtime to fulfill customers' requirements such as performance, area cost, and energy efficiency. These programmable accelerators are well suited for domain-specific computing in the areas of signal, image, and video processing, as well as other streaming processing [...]. To overcome future scaling issues (e.g., power consumption, reliability, resource management, as well as application parallelization and mapping), TCPAs are inherently designed to support self-adaptivity and resource awareness at the hardware level. Here, we follow a recently introduced resource-aware parallel computing paradigm called invasive computing, in which an application can dynamically claim, execute, and release resources. Furthermore, we show how invasive computing can be used as an enabler for power management. Finally, we introduce ideas on how to realize fault-tolerant loop execution on such massively parallel architectures by employing on-demand spatial redundancy at the processor array level.
arXiv:1405.2907v1 · fatcat:uardefsilngdnbllkg5qkmsxha
Full text (Web Archive PDF): https://web.archive.org/web/20200826150134/https://arxiv.org/ftp/arxiv/papers/1405/1405.2907.pdf
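Invasive computing lets an application claim resources, execute on them, and release them at run time (in the invasive computing literature these three steps are known as invade, infect, and retreat). The minimal sketch below models that life cycle on a one-dimensional array of processing elements (PEs). The class `ProcessorArray` and its methods `claim`, `execute`, and `release` are hypothetical illustrations under a simple "contiguous block or nothing" assumption; they do not reproduce the TCPA hardware or its actual invasion protocol.

```python
from typing import Callable, List, Optional


class ProcessorArray:
    """Hypothetical model of a 1-D processor array whose PEs can be claimed and released."""

    def __init__(self, num_pes: int) -> None:
        self.owner: List[Optional[str]] = [None] * num_pes  # None means the PE is free

    def claim(self, app: str, count: int) -> List[int]:
        """Claim `count` contiguous free PEs for `app`; return [] if that is impossible."""
        if count <= 0:
            return []
        run: List[int] = []
        for idx, owner in enumerate(self.owner):
            if owner is None:
                run.append(idx)
            else:
                run = []  # an occupied PE breaks the contiguous run
            if len(run) == count:
                for pe in run:
                    self.owner[pe] = app
                return run
        return []  # no contiguous free block is large enough

    def execute(self, pes: List[int], kernel: Callable[[int], int]) -> List[int]:
        """Run `kernel` once per claimed PE (sequentially here, in parallel on real hardware)."""
        return [kernel(pe) for pe in pes]

    def release(self, app: str) -> None:
        """Free every PE currently held by `app`."""
        self.owner = [None if o == app else o for o in self.owner]


array = ProcessorArray(8)
a = array.claim("A", 3)            # [0, 1, 2]
b = array.claim("B", 4)            # [3, 4, 5, 6]
c = array.claim("C", 2)            # only PE 7 is free, so the claim fails: []
print(a, b, c, array.execute(a, lambda pe: pe * pe))
array.release("A")
print(array.claim("C", 2))         # [0, 1] once A has released its PEs
```

The all-or-nothing contiguous claim is only one possible policy; a real resource manager could also grant fewer PEs than requested or non-contiguous regions.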

Generation of Distributed Loop Control [chapter]

Marcus Bednara, Frank Hannig, Jürgen Teich
2002 · Lecture Notes in Computer Science · Springer Berlin Heidelberg
We present a new methodology for controlling the space-time behavior of VLSI and FPGA-based processor arrays. The main idea is to generate simple local control elements that control the activity of each attached processor element. Each control element propagates a "start" and a "stop execution" signal to its neighbors. We show that our control mechanism is much more efficient than existing approaches because (1) only two control signals (start/stop) are required, (2) no [...] of the computation space is necessary, and (3) propagating just one start/stop signal locally saves energy, since processing elements are active only between the time they receive the start signal and the time they receive the stop signal. Our methodology is applicable to one- and multi-dimensional processor arrays and is based on local control signal propagation. We provide a theoretical analysis of the overhead caused by the control structure.
doi:10.1007/3-540-45874-3_9 · fatcat:qplvjhoqcvhw5hmuqvbihr23qe
Full text (Web Archive PDF): https://web.archive.org/web/20170808230128/https://www12.informatik.uni-erlangen.de/publications/hannig/papers/BHT02.pdf
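The energy argument in point (3) can be made concrete with a small timing model. The sketch assumes a one-dimensional array in which both signals enter at PE 0 and advance one PE per cycle, so each PE computes only inside its own start-to-stop window. The hop delay, the function names, and the "all PEs always on" baseline are illustrative assumptions; the paper's control elements and the multi-dimensional case are not reproduced.

```python
from typing import List


def active_schedule(num_pes: int, start_cycle: int, stop_cycle: int, hop_delay: int = 1) -> List[range]:
    """For each PE, the range of cycles in which it is active.

    PE i receives the start signal at start_cycle + i * hop_delay and the stop
    signal at stop_cycle + i * hop_delay; it is active in between (stop cycle excluded).
    """
    return [range(start_cycle + i * hop_delay, stop_cycle + i * hop_delay) for i in range(num_pes)]


def active_pes(schedule: List[range], cycle: int) -> List[int]:
    """Indices of the PEs that are active in the given cycle."""
    return [pe for pe, window in enumerate(schedule) if cycle in window]


sched = active_schedule(num_pes=4, start_cycle=0, stop_cycle=3)
for t in range(7):
    print(t, active_pes(sched, t))     # the active "wave" moves one PE to the right per cycle

# PE-cycles spent active vs. keeping all four PEs powered for the whole run.
busy = sum(len(w) for w in sched)      # 4 PEs * 3 cycles = 12
total = len(sched) * sched[-1].stop    # 4 PEs * 6 cycles = 24
print(busy, total)
```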

Utilizing Explainable AI for Quantization and Pruning of Deep Neural Networks [article]

Muhammad Sabih, Frank Hannig, Juergen Teich
2020-08-20 · arXiv · pre-print
For many applications, utilizing DNNs (Deep Neural Networks) requires implementing them on a target architecture in a manner optimized for energy consumption, memory requirements, throughput, etc. DNN compression is used to reduce the memory footprint and complexity of a DNN before its deployment on hardware. Recent efforts to understand and explain AI (Artificial Intelligence) methods have led to a new research area termed explainable AI. Explainable AI methods allow us to better understand the inner workings of DNNs, such as the importance of different neurons and features. The concepts from explainable AI provide an opportunity to improve DNN compression methods such as quantization and pruning in several ways that have not been sufficiently explored so far. In this paper, we utilize explainable AI methods, mainly the DeepLIFT method. We use these methods for (1) pruning of DNNs, including structured and unstructured pruning of CNN filters as well as pruning of weights in fully connected layers; (2) non-uniform quantization of DNN weights using a clustering algorithm, also referred to as weight sharing; and (3) integer-based mixed-precision quantization, where each layer of a DNN may use a different number of integer bits. We evaluate on typical image classification datasets with common deep learning image classification models. In all three cases, we demonstrate significant improvements as well as new insights and opportunities from the use of explainable AI in DNN compression.
arXiv:2008.09072v1 · fatcat:r7ypi5tgrrdtxmetn3iyqufdxy
Full text (Web Archive PDF): https://web.archive.org/web/20200822131019/https://arxiv.org/pdf/2008.09072v1.pdf
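Item (2), weight sharing, replaces every weight by the centroid of its cluster, so a layer only needs a small codebook plus one short index per weight. The sketch below shows that idea with plain one-dimensional k-means and evenly spaced initial centroids. It does not include the DeepLIFT-based guidance the paper proposes, and the function name `share_weights` and the example weights are hypothetical.

```python
from typing import List, Tuple


def share_weights(weights: List[float], k: int, iters: int = 20) -> Tuple[List[float], List[int]]:
    """Cluster 1-D weights with k-means; return (codebook, codebook index per weight)."""
    lo, hi = min(weights), max(weights)
    # Linear initialization: k centroids spread evenly over the weight range.
    if k > 1:
        centroids = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    else:
        centroids = [sum(weights) / len(weights)]
    assign = [0] * len(weights)
    for _ in range(iters):
        # Assignment step: each weight goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: abs(w - centroids[c])) for w in weights]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [w for w, a in zip(weights, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assign


w = [-0.9, -0.8, -0.1, 0.0, 0.1, 0.7, 0.8, 0.9]
codebook, idx = share_weights(w, k=3)
shared = [codebook[i] for i in idx]   # the weights as actually stored: 3 distinct values
print([round(c, 3) for c in codebook], idx)
```

With `k = 3`, each index fits in 2 bits, which is where the memory saving of weight sharing comes from.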

HipaccVX: Wedding of OpenVX and DSL-based Code Generation [article]

M. Akif Özkan, Burak Ok, Bo Qiao, Jürgen Teich, Frank Hannig
2020-08-26 · arXiv · pre-print
Writing programs for heterogeneous platforms optimized for high performance is hard, since this requires the code to be tuned at a low level with architecture-specific optimizations that are often based on fundamentally different programming paradigms and languages. OpenVX promises to solve this issue for computer vision applications with a royalty-free industry standard based on a graph-execution model. Yet OpenVX's algorithm space is constrained to a small set of vision functions. This hinders accelerating computations that are not included in the standard. In this paper, we analyze OpenVX vision functions to find an orthogonal set of computational abstractions. Based on these abstractions, we couple an existing Domain-Specific Language (DSL) back end to the OpenVX environment and provide language constructs that let the programmer define user-defined nodes. In this way, we enable optimizations that cannot be detected in OpenVX graph implementations using the standard computer vision functions. These optimizations can double the throughput on an Nvidia GTX GPU and decrease the resource usage of a Xilinx Zynq FPGA by 50% for our benchmarks. Finally, we show that our proposed compiler framework, called HipaccVX, can achieve better results than the state-of-the-art approaches Nvidia VisionWorks and Halide-HLS.
arXiv:2008.11476v1 · fatcat:e4yyu4ei7nayjma5rpt6p3nmei
Full text (Web Archive PDF): https://web.archive.org/web/20200901135545/https://arxiv.org/pdf/2008.11476v1.pdf
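One generic example of an optimization that becomes visible once nodes are described at the computational level is fusing consecutive point operators, so the image is traversed once instead of once per node and no intermediate image is materialized. The sketch below illustrates only that general idea on nested Python lists. It is not HipaccVX's compiler pass, and `apply`, `fuse`, `scale`, and `clamp` are hypothetical names.

```python
from typing import Callable, List

Image = List[List[int]]
PointOp = Callable[[int], int]


def apply(op: PointOp, img: Image) -> Image:
    """One full pass over the image, applying a per-pixel operator."""
    return [[op(p) for p in row] for row in img]


def fuse(*ops: PointOp) -> PointOp:
    """Compose point operators (applied left to right) into one per-pixel operator."""
    def fused(p: int) -> int:
        for op in ops:
            p = op(p)
        return p
    return fused


def scale(p: int) -> int:
    return p * 2          # hypothetical "brighten" node


def clamp(p: int) -> int:
    return min(p, 255)    # hypothetical saturation node


img = [[10, 200], [130, 0]]
unfused = apply(clamp, apply(scale, img))       # two passes plus an intermediate image
single_pass = apply(fuse(scale, clamp), img)    # one pass, no intermediate image
print(unfused == single_pass, single_pass)      # True [[20, 255], [255, 0]]
```

Fusion is only valid here because both nodes are point operators; local operators such as convolutions need neighborhood handling that this sketch does not address.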

High-Speed Event-Driven RTL Compiled Simulation [chapter]

Alexey Kupriyanov, Frank Hannig, Jürgen Teich
2004 · Lecture Notes in Computer Science · Springer Berlin Heidelberg
In this paper, we present a new approach for generating high-speed, optimized, event-driven register transfer level (RTL) compiled simulators. The generation of the simulators is part of our BUILDABONG [7] framework, which aims at architecture and compiler co-generation for special-purpose processors. The main focus of the paper is on transforming a given architecture's circuit into a graph and applying an essential graph decomposition algorithm to it, which splits the graph into subgraphs denoting the minimal subsets of sequential elements that have to be reevaluated during each simulation cycle. As a second optimization, we present a partitioning algorithm that introduces intermediate registers to minimize the number of evaluations of combinational nodes during a simulation cycle. We show the simulator's superior performance compared to an existing commercial simulator. Finally, we demonstrate the pertinence of our approach by simulating a MIPS processor.
doi:10.1007/978-3-540-27776-7_53 · fatcat:ooza2zziefckbl6m7oir4f4s64
Full text (Web Archive PDF): https://web.archive.org/web/20070611181723/http://www12.informatik.uni-erlangen.de/publications/hannig/papers/KHT04a.pdf
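Event-driven simulation saves work by re-evaluating only those combinational nodes that have at least one input that actually changed. The sketch below shows that principle on a tiny, hypothetical gate netlist kept in topological order. It does not implement the paper's graph decomposition or its register-insertion partitioning, and the netlist and `simulate` are illustrative only.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical netlist in topological order: (node name, function, input signal names).
NETLIST: List[Tuple[str, Callable[..., int], List[str]]] = [
    ("n1", lambda a, b: a & b, ["a", "b"]),
    ("n2", lambda c: 1 - c, ["c"]),
    ("out", lambda x, y: x | y, ["n1", "n2"]),
]


def simulate(values: Dict[str, int], changed: Set[str]) -> int:
    """Propagate events through the netlist; `values` is updated in place.

    A node is re-evaluated only if one of its inputs is in `changed`. If its output
    really changes, the node itself becomes an event for its fan-out.
    Returns the number of node evaluations performed.
    """
    evaluations = 0
    for name, fn, inputs in NETLIST:
        if not changed.intersection(inputs):
            continue  # no event on any input: keep the previous value
        evaluations += 1
        new = fn(*(values[i] for i in inputs))
        if new != values.get(name):
            values[name] = new
            changed.add(name)
    return evaluations


state = {"a": 0, "b": 0, "c": 0}
print(simulate(state, {"a", "b", "c"}), state["out"])  # 3 1  (initial full evaluation)
state["a"] = 1
print(simulate(state, {"a"}), state["out"])            # 1 1  (n1 stays 0, event stops there)
state["b"] = 1
print(simulate(state, {"b"}), state["out"])            # 2 1  (n1 changes, out does not)
```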

Design Space Exploration for Massively Parallel Processor Arrays [chapter]

Frank Hannig, Jürgen Teich
2001 · Lecture Notes in Computer Science · Springer Berlin Heidelberg
In this paper, we describe an approach for optimizing dedicated co-processors that are implemented either in hardware (ASIC) or in configware (FPGA). Such massively parallel co-processors are typically part of a heterogeneous hardware/software system. Each co-processor is a massively parallel system consisting of an array of processing elements (PEs). To decide whether to map a computationally intensive task into hardware, existing approaches try to optimize either for performance or for cost, with the other objective being a secondary goal. Our approach presented here, instead, a) considers multiple objectives simultaneously: for a given specification, we explore space-time mappings leading to different degrees of parallelism and cost, and different optimal hardware solutions. b) We show that the hardware cost may be efficiently determined in terms of the chosen space-time mapping by using state-of-the-art techniques in polyhedral theory. c) Finally, we introduce ideas to drastically reduce the dimension and size of the search space of mapping candidates. d) The feasibility of our approach is shown for two realistic examples.
doi:10.1007/3-540-44743-1_5 · fatcat:3erpp4dk3bgqhc2epjdfz3p42e
Full text (Web Archive PDF): https://web.archive.org/web/20170829054019/https://www12.informatik.uni-erlangen.de/publications/hannig/papers/HT01.pdf
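Considering several objectives simultaneously means keeping every mapping that no other mapping beats in all objectives, that is, the Pareto front, rather than a single optimum. The sketch below filters (latency, cost) pairs down to that front. The candidate names and numbers are invented for illustration and are not results from the paper, and the polyhedral cost estimation is not modeled.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, float]  # (mapping name, latency, hardware cost)


def dominates(p: Candidate, q: Candidate) -> bool:
    """p dominates q if it is no worse in both objectives and strictly better in one."""
    return p[1] <= q[1] and p[2] <= q[2] and (p[1] < q[1] or p[2] < q[2])


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


mappings = [
    ("1 PE", 64.0, 1.0),
    ("4 PEs", 16.0, 4.0),
    ("4 PEs, poor schedule", 20.0, 5.0),   # dominated by "4 PEs"
    ("16 PEs", 4.0, 16.0),
    ("8 PEs", 16.0, 8.0),                  # same latency as "4 PEs" at higher cost
]
print([name for name, _, _ in pareto_front(mappings)])   # ['1 PE', '4 PEs', '16 PEs']
```

This quadratic filter is fine for a handful of candidates; the search-space reduction mentioned in c) matters precisely because real mapping spaces are far larger.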

Automatic Optimization of Hardware Accelerators for Image Processing [article]

Oliver Reiche, Konrad Häublein, Marc Reichenbach, Frank Hannig, Jürgen Teich, Dietmar Fey
2015-02-26 · arXiv · pre-print
In the domain of image processing, real-time constraints are often imposed. In particular, in safety-critical applications such as X-ray computed tomography in medical imaging or advanced driver assistance systems in the automotive domain, timing is of utmost importance. A common approach to maintaining the real-time capabilities of compute-intensive applications is to offload those computations to dedicated accelerator hardware, such as Field Programmable Gate Arrays (FPGAs). Programming such architectures is a challenging task with respect to the typical FPGA-specific design criteria: achievable overall algorithm latency and resource usage of FPGA primitives (BRAM, FF, LUT, and DSP). High-Level Synthesis (HLS) dramatically simplifies this task by enabling algorithms to be described in well-known higher-level languages (C/C++) and synthesized automatically by HLS tools. However, algorithm developers still need expert knowledge about the target architecture in order to achieve satisfying results. Therefore, in previous work, we have shown that elevating the description of image algorithms to an even higher abstraction level, by using a Domain-Specific Language (DSL), can significantly reduce the complexity of designing such algorithms for FPGAs. To give the developer even more control over the common trade-off between latency and resource usage, we present an automatic optimization process in which these criteria are analyzed and fed back to the DSL compiler in order to generate code that is closer to the desired design specifications. Finally, we generate code for stereo block matching algorithms and compare it with handwritten implementations to quantify the quality of our results.
arXiv:1502.07448v1 · fatcat:krg6zewh75hyxp47lhei4ikq3i
Full text (Web Archive PDF): https://web.archive.org/web/20200905064606/https://arxiv.org/ftp/arxiv/papers/1502/1502.07448.pdf
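The feedback of latency and resource estimates into code generation can be pictured as a search over a code-generation parameter. The sketch assumes a deliberately simple, hypothetical cost model: a parallelization factor `u` divides the number of cycles spent streaming pixels and multiplies DSP usage, and the search picks the fastest power-of-two factor that fits a DSP budget. Real HLS reports, and the paper's actual optimization process, are not modeled; all names and constants are assumptions.

```python
from typing import Optional, Tuple


def estimate(pixels: int, u: int, dsp_per_lane: int, base_latency: int) -> Tuple[int, int]:
    """Hypothetical model of u parallel lanes: (latency in cycles, DSPs used)."""
    latency = base_latency + -(-pixels // u)   # pipeline depth + ceil(pixels / u)
    dsps = u * dsp_per_lane
    return latency, dsps


def choose_factor(pixels: int, dsp_budget: int, dsp_per_lane: int = 9,
                  base_latency: int = 20) -> Optional[int]:
    """Lowest-latency power-of-two factor u whose DSP usage fits the budget (None if none fits)."""
    best: Optional[Tuple[int, int]] = None     # (latency, u)
    u = 1
    while u <= pixels:
        latency, dsps = estimate(pixels, u, dsp_per_lane, base_latency)
        if dsps <= dsp_budget and (best is None or latency < best[0]):
            best = (latency, u)
        u *= 2
    return None if best is None else best[1]


print(choose_factor(pixels=640 * 480, dsp_budget=80))   # 8: 72 DSPs, while u = 16 would need 144
```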

Automatic FIR Filter Generation for FPGAs [chapter]

Holger Ruckdeschel, Hritam Dutta, Frank Hannig, Jürgen Teich
2005 · Lecture Notes in Computer Science · Springer Berlin Heidelberg
This paper presents a new tool for the automatic generation of highly parallelized Finite Impulse Response (FIR) filters. In this approach, we follow our PARO design methodology. PARO is a design system project for modeling, transformation, optimization, and synthesis of massively parallel VLSI architectures. During the design flow, the FIR filter generator employs the following advanced transformations: (a) hierarchical partitioning in order to balance the amount of local memory with external communication, and (b) partial localization to achieve higher throughput and smaller latencies. Furthermore, our filter generator allows for design space exploration to tackle trade-offs between cost and speed. Finally, synthesizable VHDL code is generated and mapped to an FPGA, and the results are compared with those of a commercial filter generator.
doi:10.1007/11512622_7 · fatcat:vifqb55j3jhu3k6nlu3zhbcjda
Full text (Web Archive PDF): https://web.archive.org/web/20070418024422/http://www12.informatik.uni-erlangen.de/publications/hannig/papers/SAMOS2005.pdf
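An FIR filter computes each output sample as a weighted sum of the most recent inputs, y[n] = sum over k of h[k] * x[n - k]. The plain direct-form model below is the kind of software reference one might use to check a generated hardware filter's outputs. It says nothing about the partitioning or localization transformations the tool applies, and the function name `fir` is hypothetical.

```python
from typing import List, Sequence


def fir(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0."""
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


# A unit impulse reproduces the coefficients (the filter's impulse response).
print(fir([1, 0, 0, 0, 0], [0.25, 0.5, 0.25]))   # [0.25, 0.5, 0.25, 0.0, 0.0]
# A constant input settles to input * sum(h) once the delay line is full.
print(fir([4, 4, 4, 4], [0.25, 0.5, 0.25]))      # [1.0, 3.0, 4.0, 4.0]
```

Each output needs len(h) multiply-accumulate operations; a parallel hardware filter spreads exactly these operations over space and time.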

Controller Synthesis for Mapping Partitioned Programs on Array Architectures [chapter]

Hritam Dutta, Frank Hannig, Jürgen Teich
2006 · Lecture Notes in Computer Science · Springer Berlin Heidelberg
Processor arrays can be used as accelerators for many dataflow-dominant applications. Innately, these applications have almost no control flow, but applying sophisticated partitioning and scheduling techniques to handle large-scale problems and to balance local memory requirements with I/O bandwidth has the disadvantage of a more complex control flow. Thus, efficient control path synthesis is one of the greatest challenges when compiling algorithms onto processor arrays. This paper presents an efficient methodology for automated control path synthesis when mapping partitioned algorithms onto processor arrays. The major advantages of the presented methodology are (a) control generation for different partitioning techniques and arbitrary parallelepiped tiles, (b) combined use of a global and a local control strategy in order to reduce the control overhead, and (c) up to 90 percent reduction in control path area and resources compared to existing approaches.
Footnote 1: $J \oplus K = \{\, i = j + P \cdot k \mid j \in J \wedge k \in K \wedge P \in \mathbb{Z}^{n \times n} \,\}$
Footnote 2: $J \oplus K \oplus L = \{\, i = j + P_{LS} \cdot k + P_{GS} \cdot l \mid j \in J \wedge k \in K \wedge l \in L \wedge P_{LS}, P_{GS} \in \mathbb{Z}^{n \times n} \,\}$
doi:10.1007/11682127_13 · fatcat:2ri566quzneadmx7p6pwjhwgwa
Full text (Web Archive PDF): https://web.archive.org/web/20170705093822/https://www12.informatik.uni-erlangen.de/publications/pub2005/TR03_2005.pdf
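The first set formula appended to this abstract describes tiling: every iteration point i is split into an intra-tile offset j and a tile index k with i = j + P·k. The sketch below checks this for the special case of rectangular tiles, where P is a diagonal matrix, on a small 2-D iteration space, confirming that every point has exactly one (j, k) pair. Arbitrary parallelepiped tiles (a general P) and the two-level form with P_LS and P_GS are not handled, and all names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[int, int]


def decompose(i: Point, tile: Point) -> Tuple[Point, Point]:
    """Split i into (j, k) with i = j + P*k, where P = diag(tile)."""
    j = (i[0] % tile[0], i[1] % tile[1])     # offset inside the tile (j in J)
    k = (i[0] // tile[0], i[1] // tile[1])   # index of the tile (k in K)
    return j, k


def compose(j: Point, k: Point, tile: Point) -> Point:
    """Inverse mapping: i = j + P*k."""
    return (j[0] + tile[0] * k[0], j[1] + tile[1] * k[1])


N, M, tile = 6, 4, (3, 2)                    # a 6 x 4 iteration space cut into 3 x 2 tiles
points: List[Point] = list(product(range(N), range(M)))
pairs = [decompose(i, tile) for i in points]
assert all(compose(j, k, tile) == i for i, (j, k) in zip(points, pairs))
assert len(set(pairs)) == len(points)        # each (j, k) pair is used exactly once
print(sorted({k for _, k in pairs}))         # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In the control synthesis setting, j selects what a processor does within a tile and k determines which tile is being processed, which is why partitioning introduces the extra control flow mentioned in the abstract.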

Code Generation for High-Level Synthesis of Multiresolution Applications on FPGAs [article]

Moritz Schmid, Oliver Reiche, Christian Schmitt, Frank Hannig, Jürgen Teich
2014-08-20 · arXiv · pre-print
Multiresolution Analysis (MRA) is a mathematical method based on working on a problem at different scales. One of its applications is medical imaging, where processing at multiple scales, based on the concept of Gaussian and Laplacian image pyramids, is a well-known technique. It is often applied to reduce noise while preserving image detail at different levels of granularity without modifying the filter kernel. In scientific computing, multigrid methods are a popular choice, as they are asymptotically optimal solvers for elliptic Partial Differential Equations (PDEs). Because such algorithms have a very high computational complexity that would overwhelm CPUs in the presence of real-time constraints, application-specific processors come into consideration for implementation. Despite huge advances in improving productivity in the respective fields, designers are still required to have detailed knowledge about coding techniques and the targeted architecture to achieve efficient solutions. Recently, the HIPAcc framework was proposed as a means for automatic code generation of image processing algorithms based on a Domain-Specific Language (DSL). From the same code base, it is possible to generate code for efficient implementations on several accelerator technologies, including different types of Graphics Processing Units (GPUs) as well as reconfigurable logic (FPGAs). In this work, we demonstrate the ability of HIPAcc to generate code for implementing multiresolution applications on FPGAs and embedded GPUs.
arXiv:1408.4721v1 · fatcat:qoj4yo6gejac7gj7jjhc6qex6u
Full text (Web Archive PDF): https://web.archive.org/web/20200906004424/https://arxiv.org/ftp/arxiv/papers/1408/1408.4721.pdf
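A Laplacian pyramid stores, at each level, the detail lost when a signal is blurred and downsampled, so the original can be rebuilt from the coarsest level plus the stored details. The sketch works on 1-D signals for brevity and assumes a [1, 2, 1]/4 blur with clamped borders, decimation by 2, and linear-interpolation upsampling. These filter choices are assumptions for illustration, not HIPAcc's kernels, and the same structure carries over to 2-D images.

```python
from typing import List

Signal = List[float]


def blur(x: Signal) -> Signal:
    """[1, 2, 1] / 4 smoothing with clamped borders."""
    n = len(x)
    return [(x[max(i - 1, 0)] + 2 * x[i] + x[min(i + 1, n - 1)]) / 4 for i in range(n)]


def down(x: Signal) -> Signal:
    """Gaussian-pyramid step: blur, then keep every second sample."""
    return blur(x)[::2]


def up(x: Signal, n: int) -> Signal:
    """Upsample to length n by linear interpolation between coarse samples."""
    out: Signal = []
    for i in range(n):
        pos = i / 2
        lo = min(int(pos), len(x) - 1)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - int(pos)
        out.append((1 - frac) * x[lo] + frac * x[hi])
    return out


def build_pyramid(x: Signal, levels: int) -> List[Signal]:
    """Return [detail_0, ..., detail_{levels-1}, coarsest approximation]."""
    pyramid: List[Signal] = []
    for _ in range(levels):
        coarse = down(x)
        pyramid.append([a - b for a, b in zip(x, up(coarse, len(x)))])  # Laplacian level
        x = coarse
    pyramid.append(x)
    return pyramid


def reconstruct(pyramid: List[Signal]) -> Signal:
    """Invert build_pyramid: upsample and add back the details, coarse to fine."""
    x = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        x = [a + b for a, b in zip(up(x, len(detail)), detail)]
    return x


signal = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
pyr = build_pyramid(signal, levels=2)
print([len(level) for level in pyr])   # [8, 4, 2]
rebuilt = reconstruct(pyr)
print(max(abs(a - b) for a, b in zip(rebuilt, signal)) < 1e-9)   # True
```

Reconstruction is exact (up to floating-point rounding) for any choice of blur and interpolation, because each detail level is defined as exactly what the upsampled coarse level misses; denoising then amounts to modifying the detail levels before reconstructing.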

Mastering Software Variant Explosion for GPU Accelerators [chapter]

Richard Membarth, Frank Hannig, Jürgen Teich, Mario Körner, Wieland Eckert
2013 · Lecture Notes in Computer Science · Springer Berlin Heidelberg
Mapping algorithms efficiently to the target hardware poses a challenge for algorithm designers. This is particularly true for heterogeneous systems hosting accelerators like graphics cards. While algorithm developers have profound knowledge of the application domain, they often lack the detailed insight into the underlying accelerator hardware needed to exploit the provided processing power. Therefore, this paper introduces a rule-based, domain-specific optimization engine for [...] the most appropriate code variant for different Graphics Processing Unit (GPU) accelerators. The optimization engine relies on knowledge fused from the application domain and the target architecture. It is embedded into a framework that allows imaging algorithms to be designed in a Domain-Specific Language (DSL). We show that this makes it possible to keep one common description of an algorithm in the DSL and to select the optimal target code variant for different GPU accelerators and target languages like CUDA and OpenCL.
doi:10.1007/978-3-642-36949-0_15 · fatcat:73m6mnyjqncsxcbfseqmjbfqta
Full text (Web Archive PDF): https://web.archive.org/web/20121114202209/http://www12.informatik.uni-erlangen.de/publications/membarth/membarth2012msv.pdf
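A rule-based engine of this kind maps facts about the kernel and the target device to a code variant. The sketch below shows only that mechanism: an ordered list of (condition, decision) rules evaluated against a description of the target, where the first matching rule wins. Every attribute, rule, and variant name in it is hypothetical and is not taken from the paper's knowledge base.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Target:
    """Hypothetical facts about the device and the kernel being compiled."""
    language: str           # "CUDA" or "OpenCL"
    has_texture_cache: bool
    local_mem_kib: int
    window_size: int        # side length of the kernel's local operator window


Rule = Tuple[Callable[[Target], bool], str]

# Ordered rules: the first condition that holds decides the memory-access variant.
MEMORY_RULES: List[Rule] = [
    (lambda t: t.window_size >= 5 and t.local_mem_kib >= 16, "stage tile in local/shared memory"),
    (lambda t: t.language == "CUDA" and t.has_texture_cache, "read through texture cache"),
    (lambda t: True, "plain global-memory loads"),  # fallback, always matches
]


def select_variant(target: Target, rules: List[Rule] = MEMORY_RULES) -> str:
    """Return the decision of the first rule whose condition holds for `target`."""
    return next(decision for condition, decision in rules if condition(target))


print(select_variant(Target("CUDA", True, 48, 3)))      # read through texture cache
print(select_variant(Target("OpenCL", False, 32, 7)))   # stage tile in local/shared memory
print(select_variant(Target("OpenCL", False, 8, 3)))    # plain global-memory loads
```

Keeping the rules as data separates the domain and architecture knowledge from the selection logic, so new devices or languages only require new rules.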

Defragmenting the Module Layout of a Partially Reconfigurable Device [article]

Jan van der Veen, Sandor P. Fekete, Ali Ahmadinia, Christophe Bobda, Frank Hannig, Juergen Teich
2005-05-02 · arXiv · pre-print
Modern generations of field-programmable gate arrays (FPGAs) allow for partial reconfiguration. In an online context, where the sequence of modules to be loaded on the FPGA is unknown beforehand, repeated insertion and deletion of modules leads to progressive fragmentation of the available space, making defragmentation an important issue. We address this problem by proposing an online and an offline component for the defragmentation of the available space. We consider defragmenting the module layout on a reconfigurable device. This corresponds to solving a two-dimensional strip packing problem. Problems of this type are NP-hard in the strong sense, and previous algorithmic results are rather limited. Based on a graph-theoretic characterization of feasible packings, we develop a method that can solve two-dimensional defragmentation instances of practical size to optimality. Our approach is validated on a set of benchmark instances.
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/cs/0505005v1">arXiv:cs/0505005v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/dmoqr2l3e5ev3ebit3akqk2eve">fatcat:dmoqr2l3e5ev3ebit3akqk2eve</a> </span>
<a target="_blank" rel="noopener" href="https://archive.org/download/arxiv-cs0505005/cs0505005.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> File Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/34/d4/34d419c63c71cfb856998e433d333357605a03c6.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/cs/0505005v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>
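The fragmentation effect described in the abstract above (repeated insertion and deletion of modules leaving free space scattered in unusable pieces) can be illustrated with a deliberately simplified sketch. It assumes a hypothetical one-dimensional, column-based device model with first-fit placement and a naive "slide everything left" compaction. This is not the authors' method, which treats the two-dimensional strip packing formulation and solves it to optimality via a graph-theoretic characterization; the class and method names (`ColumnDevice`, `insert`, `delete`, `compact`) are illustrative only.

```python
from typing import Dict, List, Optional, Tuple


class ColumnDevice:
    """Hypothetical 1D device model: `width` columns, each module occupies a
    contiguous run of columns. A simplification, not the paper's 2D model."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.modules: Dict[str, Tuple[int, int]] = {}  # name -> (start, length)

    def free_gaps(self) -> List[Tuple[int, int]]:
        """Return maximal free intervals as (start, length), left to right."""
        gaps: List[Tuple[int, int]] = []
        cursor = 0
        for start, length in sorted(self.modules.values()):
            if start > cursor:
                gaps.append((cursor, start - cursor))
            cursor = start + length
        if cursor < self.width:
            gaps.append((cursor, self.width - cursor))
        return gaps

    def total_free(self) -> int:
        return sum(length for _, length in self.free_gaps())

    def insert(self, name: str, length: int) -> Optional[int]:
        """First-fit online placement; returns the start column, or None when
        no single contiguous gap is large enough."""
        for start, gap_len in self.free_gaps():
            if gap_len >= length:
                self.modules[name] = (start, length)
                return start
        return None

    def delete(self, name: str) -> None:
        del self.modules[name]

    def compact(self) -> None:
        """Naive offline defragmentation: slide modules left, keeping their order.
        Ignores relocation cost, which a real system would have to account for."""
        cursor = 0
        for name, (_, length) in sorted(self.modules.items(), key=lambda kv: kv[1][0]):
            self.modules[name] = (cursor, length)
            cursor += length


if __name__ == "__main__":
    dev = ColumnDevice(10)
    for name in ("A", "B", "C"):
        dev.insert(name, 3)          # A at 0, B at 3, C at 6
    dev.delete("A")
    dev.delete("C")
    print(dev.free_gaps(), dev.total_free())   # 7 columns free, but split
    print(dev.insert("D", 5))                  # no contiguous gap of 5
    dev.compact()
    print(dev.insert("D", 5))                  # fits after compaction
```

In this toy run, seven columns are free after the deletions, yet a five-column module cannot be placed until compaction merges the gaps, which is the situation the online and offline defragmentation components are meant to address.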

Special issue on heterogeneous real-time image processing

Dietmar Fey, Frank Hannig
<span title="">2018</span> <i title="Springer Nature"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ihxuyfsqpjeoxn5hsmj4p7gmpy" style="color: black;">Journal of Real-Time Image Processing</a> </i> &nbsp;
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s11554-018-0763-2">doi:10.1007/s11554-018-0763-2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/q2e4hg43eng4losfaw7ykkkkiq">fatcat:q2e4hg43eng4losfaw7ykkkkiq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20180730025601/https://link.springer.com/content/pdf/10.1007%2Fs11554-018-0763-2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/74/1f/741f16cec13959599af64240ecf737c111f502b4.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s11554-018-0763-2"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>

Hardware Cost Analysis for Weakly Programmable Processor Arrays

Dmitrij Kissler, Frank Hannig, Alexey Kupriyanov, Jürgen Teich
<span title="">2006</span> <i title="IEEE"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/tpji4zllfrgvhci7agbxqfolze" style="color: black;">2006 International Symposium on System-on-Chip</a> </i> &nbsp;
Growing complexity and speed requirements in modern application areas such as wireless communication and multimedia in embedded devices demand flexible and efficient parallel hardware architectures. The inherent parallelism in these application fields has to be reflected at the hardware level to achieve high performance. Coarse-grained reconfigurable architectures support a high degree of parallelism at multiple levels. In this paper, a technology-independent hardware cost analysis is performed for a new class of highly parameterizable coarse-grained reconfigurable architectures called weakly programmable processor arrays.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/issoc.2006.321996">doi:10.1109/issoc.2006.321996</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/issoc/KisslerHKT06.html">dblp:conf/issoc/KisslerHKT06</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ira6n7n3qfac7ib5j7vd7gszgm">fatcat:ira6n7n3qfac7ib5j7vd7gszgm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20070824110102/http://www12.informatik.uni-erlangen.de/publications/pub2006/KHKT06b.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/16/9d/169dad5863876c3772f7d569f7fd86d1be40c99f.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/issoc.2006.321996"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>
Showing results 1 to 15 out of 417 results