Direct Mapping of Low-Latency Asynchronous Controllers From STGs
Danil Sokolov, Alexander Bystrov, Alex Yakovlev
2007
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A method for automated synthesis of low latency asynchronous controllers using direct mapping is presented. The idea of direct mapping is that a graph specification of a system is translated into a circuit netlist by mapping the graph nodes into circuit elements and the graph arcs into circuit interconnects. The key feature of this approach is its low algorithmic complexity and direct correspondence between the elements of the initial specification and the components of the resultant circuit.
... our method the synthesis starts from an initial specification in the form of a Signal Transition Graph (STG). This STG is split into a device and an environment, which synchronise via a communication net that models wires. The device is represented as a tracker and a bouncer. The tracker follows the state of the environment and provides reference points to the device outputs. The bouncer interfaces to the environment and generates output events in response to the input events according to the state of the tracker. This two-level architecture provides an efficient interface to the environment and is convenient for subsequent mapping into a circuit netlist. A set of optimisation heuristics is developed to reduce the latency and size of the control circuit. As a result of this work, a software tool called OptiMist has been developed. Its low algorithmic complexity allows large specifications to be synthesised, which is not possible in acceptable time for tools based on state-space exploration. OptiMist successfully interfaces with a conventional EDA design flow for simulation, timing analysis and place-and-route.
Introduction
The two main approaches to the design of asynchronous controllers are logic synthesis [7] and direct mapping [15, 18]. Logic synthesis works with low-level system specifications which capture the behaviour of the system at the level of signal transitions. In this approach Boolean equations for the output signals of the circuit are derived using the next-state functions [5]. In order to find the next-state functions, all possible orders of the events must be explored. Such an exploration may result in a state space which is exponentially large with respect to the size of the initial specification. Circuit optimisation often involves analysis and recalculation of the whole state space. The logic synthesis approach is now well developed and supported by public tools (Petrify [7], Minimalist [12], 3D [4]).
However, this approach suffers from excessive computational complexity and memory requirements, and thus cannot be applied to large specifications. There is no transparent correspondence between the elements of the original specification, the intermediate representation of the state space and the components of the resultant circuit, which complicates the checking of circuit functionality.
The main idea of the direct mapping approach is that a graph specification of a system is translated into a circuit netlist in such a way that the graph nodes correspond to the circuit elements and the graph arcs correspond to the interconnects. Direct mapping can typically be divided into three independent operations: translation, optimisation and mapping. Firstly, a system specification is translated into an intermediate graph representation convenient for subsequent mapping. Then, peephole optimisation is usually applied to the intermediate representation of the system. Finally, the optimised graph is mapped into a circuit netlist implementation. In a practical design flow, however, some operations can be merged together or omitted altogether, e.g. optimisation is often performed together with mapping, and there are cases when the circuit implementation is obtained directly from the initial specification without converting it into an intermediate form.
The key feature of the direct mapping approach is its low algorithmic complexity. The use of heuristic-based local optimisation (as opposed to state-space global optimisation in the logic synthesis approach) also contributes to the computational simplicity of the method. The transparent correspondence between the elements of the initial specification and the components of the resultant circuit is advantageous for checking the functional correctness of the implementation.
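The node-to-element and arc-to-interconnect correspondence described above can be illustrated with a minimal Python sketch. The function name `direct_map` and the `elem_` naming scheme are hypothetical and chosen only for illustration; this is not the netlist format or the algorithm of OptiMist. The sketch only shows why the cost of such a mapping is a single pass over nodes and arcs.

```python
from typing import Dict, List, Tuple


def direct_map(nodes: List[str], arcs: List[Tuple[str, str]]) -> Dict[str, object]:
    """Map each graph node to one circuit element and each arc to one wire.

    Hypothetical illustration: element and wire naming is invented here.
    """
    known = set(nodes)
    # One circuit element per graph node.
    elements = {n: f"elem_{n}" for n in nodes}
    wires = []
    # One interconnect per graph arc.
    for src, dst in arcs:
        if src not in known or dst not in known:
            raise ValueError(f"arc ({src}, {dst}) refers to an unknown node")
        wires.append((elements[src], elements[dst]))
    return {"elements": elements, "wires": wires}


# A three-node cycle maps to three elements connected in a ring.
netlist = direct_map(["p1", "p2", "p3"], [("p1", "p2"), ("p2", "p3"), ("p3", "p1")])
```

Because every node and arc is visited exactly once, the work grows linearly with the size of the graph, in contrast to approaches that enumerate the state space.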
Notwithstanding all these advantages, this approach is insufficiently studied, and existing techniques for direct mapping often produce large circuits with an inefficient interface to the environment.
The direct mapping approach originates from [16], where a method of one-relay-per-row realisation of an asynchronous sequential circuit is proposed. This approach is further developed in [36], where the idea of the 1-hot state assignment is described. The 1-hot state assignment is then used in the method of concurrent circuit synthesis presented in [15]. The underlying model in this method is an Augmented Finite State Machine (AFSM), which is an FSM with added facilities, including timing mechanisms for delaying state changes. These circuits have inputs that are logic values (signal levels as opposed to signal transitions), which is advantageous for low-level interfacing. These circuits use a separate set-reset flip-flop for every local state, which is set to 1 during a transition into the state, and which in turn resets to 0 the flip-flops of all its predecessor local states. The main disadvantages of this approach are the fundamental mode assumptions and the use of local state variables as outputs. The latter are convenient for implementing event flows but require an additional level of flip-flops if each of those events controls just one switching phase of an external signal (either from 0 to 1 or from 1 to 0).
Another direct mapping method, proposed in [28], works for the whole class of 1-safe Petri nets. However, it produces control circuits whose operation uses a 2-phase (non-return-to-zero) signalling protocol. This results in lower performance than can be achieved in 4-phase circuits.
The approach of [18] is based on distributors and also uses the 1-hot state assignment, though with a different implementation of local states. In this method every place of a Petri net is associated with a David cell (DC) [9].
The circuit diagram of a single DC is shown in Figure 1 (a). The state of its output r denotes the marking of the associated Petri net place. DCs can be coupled using a 4-phase handshake protocol, so that the interface a1, r of the previous stage DC is connected to the interface a, r1 of the next stage, as shown in Figure 1 (b). This DC structure corresponds to the Petri net shown in Figure 1 (c). The circuits built of DCs by this approach are speed-independent [24] and do not need ...
... presents a method based on the idea of [3] and extends it with a set of optimisation algorithms and heuristics. In the proposed method a system specification is first split into a device STG and an environment STG. These synchronise via a communication net, which models wires. The device STG is considered separately. It consists of a tracker and a bouncer. The tracker follows the state of the environment and is used as a reference point by the device outputs. The bouncer interfaces to the environment and generates output events in response to the input events according to the state of the tracker. This two-level device architecture provides an efficient interface to the environment and is convenient for subsequent mapping into a circuit netlist. The method and its optimisation heuristics are implemented in a software tool called OptiMist. The speed-independent circuits obtained by this method have a two-level architecture, which contributes to a low-latency interface to the environment. The computation time of the OptiMist tool grows linearly with the specification size, which allows the method to be applied to large STGs.
The rest of the paper is organised as follows. Firstly, Section 2 defines the terminology and behavioural models which are used in the paper. Secondly, our direct mapping method and its justification are presented in Section 3.
The optimisation heuristics and algorithms implemented in the OptiMist software tool are described in Section 6. The OptiMist design flow is illustrated on a simple example in Section 7. The method and the optimisation heuristics are evaluated in Section 8 using a set of benchmarks.
Background
This section provides an introduction to asynchronous circuits, their delay models, operation modes, classes and common signalling protocols. The Petri net behavioural model, which is widely used for specification, verification and synthesis of asynchronous circuits, is also presented in this section.
Asynchronous circuits
Circuits containing no global clock are called asynchronous circuits [36]. These circuits may make use of timing assumptions both within the circuit and in its interaction with the environment. Based on these assumptions, asynchronous circuits can be divided into several classes. This section gives an overview of asynchronous circuits using the classification presented in [19, 10].
Delay models
An asynchronous circuit can be considered as an interconnection of two types of components, gates and delay elements, by means of wires. A gate computes a set of output variables (often a single output variable) as a discrete logical function of its input variables. A delay element produces a single output that is a delayed version of its input. Each wire connects an output of a single gate or delay element to inputs of one or more gates or delay elements. Primary inputs and outputs of a circuit can be considered as gates computing the identity function.
There are two major models of a delay element: the pure delay model and the inertial delay model. A pure delay element transmits each signal event on its input to its output with some delay, regardless of the shape of the signal's waveform. In contrast, an inertial delay element alters the shape of its input waveform by attenuating short pulses, i.e. it filters out pulses of a duration less than some threshold period.
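The difference between the two delay element models can be sketched in Python. This is a simplified illustration under stated assumptions: a binary signal is represented as a time-ordered list of `(time, value)` change events with strictly alternating values, and the names `pure_delay`, `inertial_delay` and `threshold` are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Event = Tuple[float, int]  # (time of change, new signal value)


def pure_delay(events: List[Event], delay: float) -> List[Event]:
    """Pure delay: every input event reaches the output, shifted by `delay`."""
    return [(t + delay, v) for t, v in events]


def inertial_delay(events: List[Event], delay: float, threshold: float) -> List[Event]:
    """Inertial delay: pulses shorter than `threshold` are filtered out.

    Assumes `events` is sorted by time and values strictly alternate.
    """
    kept: List[Event] = []
    for t, v in events:
        if kept and t - kept[-1][0] < threshold:
            # The previous value lasted less than the threshold: drop that
            # pulse. The signal is back at its earlier value, so the current
            # event is not emitted either.
            kept.pop()
        else:
            kept.append((t, v))
    return [(t + delay, v) for t, v in kept]


# A 1-unit pulse followed by a 5-unit pulse, with threshold 2 and delay 2.
wave = [(0, 1), (1, 0), (5, 1), (10, 0)]
delayed_pure = pure_delay(wave, 2)
delayed_inertial = inertial_delay(wave, 2, 2)
```

With these inputs the pure delay element passes both pulses, while the inertial delay element suppresses the short first pulse and passes only the second.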
Delay elements are also characterised by their timing models. In a fixed delay model, a delay is assumed to have a fixed value. In a bounded delay model, a delay may have any value in a given timing interval. In an unbounded delay model, a delay may take an arbitrary finite value.
Operation modes
The interaction of a device circuit with its environment can be characterised by the circuit operation mode. The device and its environment together form a closed system. If the environment is allowed to respond to a device's outputs ...
NCL-EECE-MSD-TR-2006-110, University of Newcastle upon Tyne
... the branches is negligible [2]. Asynchronous circuits with isochronic forks are called Quasi-Delay-Insensitive (QDI) circuits [20]. In contrast, in DI circuits, delays on the different fork branches are completely independent and may vary considerably.
Speed-Independent (SI) circuits are guaranteed to work correctly in input-output mode regardless of gate delays, assuming that wire delays are negligible. This means that whenever a signal changes its value, all gates it is connected to see that change immediately. The SI circuits introduced in [24] only considered deterministic input and output behaviour. This class has been extended in [1] to include circuits with a limited form of non-determinism.
Self-timed circuits, described in [33], are built out of a group of elements. Each element may be an SI circuit, or a circuit whose correct operation relies on local timing assumptions. However, no timing assumptions are made on the communication between elements, and the circuit operates in input-output mode. If both internal and external timing assumptions are used to optimise the designs, then such circuits are called timed [26].
Signalling protocols
Asynchronous circuit signalling schemes are based on a protocol called handshake, involving requests, which are used to initiate an action, and corresponding acknowledgements, which are used to signal completion of that action.
These control signals provide all of the necessary sequence controls for computational events in the system. For example, consider the interaction of two modules, a sender A and a receiver B. A request is sent from A to B, indicating that A is requesting some action from B. When B completes the action, it acknowledges the request by sending an acknowledge signal from B to A. Most asynchronous signalling protocols require a strict alternation of request and acknowledge events. These ideas can be extended to interfaces shared by more than two subsystems.
There are several ways to encode the handshake events onto specific control wires. The most commonly used handshake protocols are four-phase and two-phase. In the four-phase protocol, also called return-to-zero, four signal transitions (two on the request and two on the acknowledgement) are required to complete a handshake. In the two-phase protocol, also called non-return-to-zero, every request-acknowledgement pair of transitions indicates a new handshake.
Behavioural models
This section introduces the formal models used for the specification and verification of asynchronous circuits. First, the basic concepts of the Petri net (PN) model are presented. PNs extend the Finite State Machine (FSM) model with a notion of concurrency, which makes them especially convenient for the specification and verification of asynchronous circuits. The formal definitions and notation in this section are based on the work introduced in [8, 25, 27, 30].
Petri nets
The Petri net model, first defined in [29], is a graphical and mathematical representation of discrete distributed systems. Petri nets are used to describe and study concurrent, asynchronous, distributed, parallel and non-deterministic systems. As a graphical tool, PNs can be used as a visual communication aid similar to flow charts, block diagrams, and networks.
In addition, tokens are used in these nets to simulate the dynamic and concurrent activities of systems. As a mathematical tool, a PN allows state equations, algebraic equations, and other mathematical models governing the behaviour of systems to be set up.
A Petri Net (PN) is formally defined as a tuple $PN = \langle P, T, F, M_0 \rangle$ comprising finite disjoint sets of places $P$ and transitions $T$, arcs denoting the flow relation $F \subseteq (P \times T) \cup (T \times P)$, and an initial marking $M_0$. There is an arc between $x \in P \cup T$ and $y \in P \cup T$ iff $(x, y) \in F$. An arc from a place to a transition is called a consuming arc, and an arc from a transition to a place is called a producing arc. The preset of a node $x \in P \cup T$ is defined as $\bullet x = \{y \mid (y, x) \in F\}$, and the postset as $x \bullet = \{y \mid (x, y) \in F\}$. It is assumed that $\bullet t \neq \emptyset \neq t \bullet$, $\forall t \in T$. The pre-preset of a node $x \in P \cup T$ is defined as $\bullet\bullet x = \bigcup_{y \in \bullet x} \bullet y$, and the post-postset as $x \bullet\bullet = \bigcup_{y \in x \bullet} y \bullet$.
A place $p$ such that $|p \bullet| > 1$ is called a choice place, i.e. it has more than one transition in its postset. A choice place $p$ is called free choice if $\forall t \in p \bullet : |\bullet t| = 1$, i.e. each transition in its postset has exactly one preset place. A choice place $p$ is called controlled choice if $\exists t \in p \bullet : |\bullet t| > 1$, i.e. there is at least one transition in its postset which has more than one preset place. Note that a controlled choice whose postset transitions all have the same preset places can be transformed into a free choice. A place $p$ such that $|\bullet p| > 1$ is called a merge place. A transition $t$ such that $|t \bullet| > 1$ is called a fork, and a transition $t$ such that $|\bullet t| > 1$ is called a join.
The dynamic behaviour of a PN is defined as a token game, changing markings according to the enabling and firing rules of its transitions. A marking is a mapping $M : P \to \mathbb{N}$ denoting the number of tokens in each place, with $\mathbb{N} = \{0, 1\}$ for 1-safe PNs. A transition $t$ is enabled iff $M(p) > 0$, $\forall p \in \bullet t$. The evolution of a PN is possible by firing the enabled transitions.
Firing of a transition $t$ results in a new marking $M'$ such that $\forall p \in P$:
$$M'(p) = \begin{cases} M(p) - 1 & \text{if } p \in \bullet t, \\ M(p) + 1 & \text{if } p \in t \bullet, \\ M(p) & \text{otherwise,} \end{cases}$$
i.e. for an enabled transition $t$ one token is removed from each preset place and one token is produced in each postset place. A marking $M'$ is reachable from a marking $M$ if there exists a firing sequence $\sigma = t_0 \ldots t_n$ starting at marking $M$ and finishing at $M'$. The set of markings reachable from $M$ is denoted by $[M\rangle$. The set of markings reachable from the initial marking $M_0$ is called the reachability set of a PN.
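The definitions above (presets, postsets, enabling, firing and the reachability set) can be made concrete with a small Python sketch. The class `PetriNet`, its method names and the example net are hypothetical and serve only as an illustration; the breadth-first enumeration terminates only for bounded nets, and it is exactly the kind of state-space exploration whose cost can grow exponentially with the size of the net.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


class PetriNet:
    def __init__(self, places: Iterable[str], transitions: Iterable[str],
                 flow: Iterable[Tuple[str, str]], m0: Dict[str, int]):
        self.places = list(places)
        self.transitions = list(transitions)
        self.flow = set(flow)          # flow relation F
        self.m0 = dict(m0)             # initial marking M0

    def preset(self, x: str) -> Set[str]:
        return {y for (y, z) in self.flow if z == x}

    def postset(self, x: str) -> Set[str]:
        return {z for (y, z) in self.flow if y == x}

    def enabled(self, marking: Dict[str, int], t: str) -> bool:
        # t is enabled iff every preset place holds at least one token.
        return all(marking[p] > 0 for p in self.preset(t))

    def fire(self, marking: Dict[str, int], t: str) -> Dict[str, int]:
        if not self.enabled(marking, t):
            raise ValueError(f"transition {t} is not enabled")
        new = dict(marking)
        for p in self.preset(t):       # consume one token per preset place
            new[p] -= 1
        for p in self.postset(t):      # produce one token per postset place
            new[p] += 1
        return new

    def reachability_set(self) -> Set[Tuple[int, ...]]:
        # Breadth-first exploration of markings reachable from M0.
        key = lambda m: tuple(m[p] for p in self.places)
        seen = {key(self.m0)}
        queue = deque([self.m0])
        while queue:
            m = queue.popleft()
            for t in self.transitions:
                if self.enabled(m, t):
                    nxt = self.fire(m, t)
                    if key(nxt) not in seen:
                        seen.add(key(nxt))
                        queue.append(nxt)
        return seen


# Example net: t1 is a fork (p0 -> p1, p2) and t2 is a join (p1, p2 -> p0).
net = PetriNet(
    places=["p0", "p1", "p2"],
    transitions=["t1", "t2"],
    flow=[("p0", "t1"), ("t1", "p1"), ("t1", "p2"),
          ("p1", "t2"), ("p2", "t2"), ("t2", "p0")],
    m0={"p0": 1, "p1": 0, "p2": 0},
)
after_t1 = net.fire(net.m0, "t1")
reachable = net.reachability_set()
```

In this example only `t1` is enabled initially, and firing it moves the token from `p0` into both `p1` and `p2`, after which `t2` returns the net to its initial marking.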
doi:10.1109/tcad.2006.884416
fatcat:ll37pame4vbcpc4unumcwerfme