Architecture and Instruction Set of the C6x Processor [chapter]

Digital Signal Processing and Applications with the TMS320C6713 and TMS320C6416 DSK  
Hardback); 0-471-22112-0 (Electronic) 62 Architecture and Instruction Set of the C6x Processor 11.72 mV.With an 8-bit ADC, 2 8 or 256 different levels can represent the input signal. An ADC with a larger word length such as a 16-bit ADC (currently very common) can reduce the quantization error, yielding a higher resolution. The more bits that an ADC has, the better it can represent an input signal. The TMS320C30 floating-point processor was introduced in the late 1980s. The C31, C32, and the
more » ... 31, C32, and the more recent C33 are all members of the C3x family of floatingpoint processors [2, 3] . The C4x floating-point processors, introduced subsequently, are code-compatible with the C3x processors and are based on the modified Harvard architecture [4] . The TMS320C6201 (C62x), announced in 1997, is the first member of the C6x family of fixed-point digital signal processors. Unlike the previous fixed-point processors, C1x, C2x, and C5x, the C62x is based on a very-long-instruction-word (VLIW) architecture, still using separate memory spaces for instructions and data as with the Harvard architecture. The VLIW architecture has simpler instructions, but more are needed for a task than with a conventional DSP architecture. The C62x is not code-compatible with the previous generation of fixed-point processors. Subsequently, the TMS320C6701 (C67x) floating-point processor was introduced as another member of the C6x family of processors. The instruction set of the C62x fixed-point processor is a subset of the instruction set of the C67x processor. Appendix A contains a list of instructions available on the C6x processors. A recent addition to the family of the C6x processors is the fixed-point C64x. An application-specific integrated circuit (ASIC) has a DSP core with customized circuitry for a specific application. A C6x processor can be used as a standard general-purpose digital signal processor programmed for a specific application. Specific-purpose digital signal processors are the modem, echo canceler, and others. A fixed-point processor is better for devices that use batteries, such as cellular phones, since it uses less power than does an equivalent floating-point processor. The fixed-point processors, C1x, C2x, and C5x are 16-bit processors with limited dynamic range and precision. The C6x fixed-point processor is a 32-bit processor with improved dynamic range and precision. In a fixed-point processor, it is necessary to scale the data. Overflow, which occurs when an operation such as the addition of two numbers produces a result with more bits than can fit within a processor's register, becomes a concern. A floating-point processor is generally more expensive since it has more "real estate" or is a larger chip because of additional circuitry necessary to handle integer as well as floating-point arithmetic. Several factors, such as cost, power consumption, and speed, come into play when choosing a specific digital signal processor. The C6x processors are particularly useful for applications requiring intensive computations. Family members of the C6x include both fixed-point (e.g., C62x, C64x) and floating-point processors (e.g., C67x). Other digital signal processors are also available, from companies such as Motorola and Analog Devices [5] . Other architectures include the Super Scalar, which requires special hardware to determine which instructions are executed in parallel. The burden is then on the TMS320C6x Architecture 63 processor more than on the programmer as in the VLIW architecture. It does not execute necessarily the same group of instructions, and as a result, it is difficult to time. Thus, it is rarely used in DSP. TMS320C6x ARCHITECTURE The TMS320C6711 onboard the DSK is a floating-point processor based on the VLIW architecture [6-9]. Internal memory includes a two-level cache architecture with 4 kB of level 1 program cache (L1P), 4 kB of level 1 data cache (L1D), and 64 kB of RAM or level 2 cache for data/program allocation (L2). It has a glueless (direct) interface to both synchronous memories (SDRAM and SBSRAM) and asynchronous memories (SRAM and EPROM). Synchronous memory requires clocking but provides a compromise between static SRAM and dynamic SDRAM, with SRAM being faster but more expensive than DRAM. On-chip peripherals include two multichannel buffered serial ports (McBSPs), two timers, a 16-bit host port interface (HPI), and a 32-bit external memory interface (EMIF). It requires 3.3 V for I/O and 1.8 V for the core (internal). Internal buses include a 32-bit program address bus, a 256-bit program data bus to accommodate eight 32-bit instructions, two 32-bit data address buses, two 64-bit data buses, and two 64-bit store data buses. With a 32-bit address bus, the total memory space is 2 32 = 4 GB, including four external memory spaces: CE0, CE1, CE2, and CE3. Figure 3 .1 shows a functional block diagram of the C6711 processor included with CCS. Independent memory banks on the C6x allow for two memory accesses within one instruction cycle. Two independent memory banks can be accessed using two FIGURE 3.1. Functional block diagram of TMS320C6x (Courtesy of Texas Instruments). FIGURE 3.2. Internal memory block diagram (Courtesy of Texas Instruments). Functional Units 65 lates (MACs) per second. With six of the eight functional units in Figure 3.1 (not the .D units described below) capable of handling floating-point operations, it is possible to perform 900 million floating-point operations per second (MFLOPS). Operating at 150 MHz, this translates to 1200 million instructions per second (MIPS) with a 6.67-ns instruction cycle time. FUNCTIONAL UNITS The CPU consists of eight independent functional units divided into two data paths A and B, as shown in Figure 3 .1. Each path has a unit for multiply operations (.M), for logical and arithmetic operations (.L), for branch, bit manipulation, and arithmetic operations (.S), and for loading/storing and arithmetic operations (.D). The .S and .L units are for arithmetic, logical, and branch instructions. All data transfers make use of the .D units. The arithmetic operations, such as subtract or add (SUB or ADD), can be performed by all the units except the .M units (one from each data path). The eight functional units consist of four floating/fixed-point ALUs (two .L and two .S), two fixed-point ALUs (.D units), and two floating/fixed-point multipliers (.M units). Each functional unit can read directly from or write directly to the register file
doi:10.1002/9780470238141.ch3 fatcat:wmsx2doaujaifk7chkh33shmma