Bo Bayles
University of Missouri-Rolla: Computer Engineering 313
RISC Processor Design Part 2

RISC Data Path

Introduction

The processor whose instruction set was described in Part 1 can be implemented by specifying how data moves within it. One part of this definition is the description of the data path.

Instructions and Registers

Part 1 (Instruction Set and Register Definitions) defined binary encodings for the instructions and CPU registers. Table 1 summarizes these encodings, along with the function codes used by the logic and comp instructions.
Binary Encoding   Instruction Opcode   Register Name   Logic Function Code   Comp Function Code
0000              add                  $zero           AND                   >
0001              addi                 $a0             OR                    <
0010              sub                  $a1             NAND                  >=
0011              logic                $a2             NOR                   <=
0100              comp                 $a3             XOR                   ==
0101              ujmp                 $v0             XNOR                  !=
0110              cjmp                 $v1             -                     -
0111              lw                   $mar            -                     -
1000              sw                   $mdr            -                     -
1001              rol                  $t0             -                     -
1010              call                 $t1             -                     -
1011              ret                  $t2             -                     -
1100              halt                 $t3             -                     -
1101              -                    $at             -                     -
1110              -                    $cond           -                     -
1111              -                    $ra             -                     -

Table 1 - Instruction, Register, and Function Encoding
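
As an illustration of how these encodings might be used, the Python sketch below splits a 16-bit instruction word into four 4-bit fields and looks up the opcode and register names from Table 1. It assumes the opcode occupies bits [15:12], which Table 2 implies (it uses bits [11:0] for operands) but does not state; the names `field` and `decode` are hypothetical.

```python
# Sketch only: assumes the opcode sits in bits [15:12] of each instruction word.
OPCODES = ["add", "addi", "sub", "logic", "comp", "ujmp", "cjmp",
           "lw", "sw", "rol", "call", "ret", "halt"]  # 0000-1100; 1101-1111 unused

REGISTERS = ["$zero", "$a0", "$a1", "$a2", "$a3", "$v0", "$v1", "$mar",
             "$mdr", "$t0", "$t1", "$t2", "$t3", "$at", "$cond", "$ra"]


def field(word: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] of a word, in the style of im_q[hi:lo] in Table 2."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int) -> tuple[int, int, int, int]:
    """Return (opcode, bits[11:8], bits[7:4], bits[3:0]) of a 16-bit word."""
    return field(word, 15, 12), field(word, 11, 8), field(word, 7, 4), field(word, 3, 0)


# For add, Table 2 uses [11:8] as rs, [7:4] as rt and [3:0] as rd.
op, rs, rt, rd = decode(0x0129)
print(OPCODES[op], REGISTERS[rd], REGISTERS[rs], REGISTERS[rt])  # add $t0 $a0 $a1
```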

Data Path Components

The RISC CPU is composed of various hardware devices with input and output ports. Each input port is fed either by the Control Unit or by an output port of another device; control inputs such as the write enables (rf_we, dm_we) and the ALU operation select (alu_ctrl) come from the Control Unit. The devices that make up the processor include:

Program Counter (PC)

A 16-bit register that holds the location of the next instruction to be executed by the CPU.
Inputs: pc_d_in[15:0], pc_clk
Outputs: pc_d_out[15:0]

Instruction Memory (IM)

A 16-bit word-addressable RAM that holds the instructions to be executed by the CPU.
Inputs: im_a_in[15:0]
Outputs: im_d_out[15:0]

Register File (RF)

A register file of sixteen 16-bit registers.
Inputs: rf_rs[3:0], rf_rt[3:0], rf_rd[3:0], rf_d_in[15:0], rf_we, rf_clk
Outputs: rf_ra[15:0], rf_rb[15:0]
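
A minimal behavioural model of the register file, as a sketch under stated assumptions: the two read ports are combinational, a write happens only at a clock edge with rf_we asserted, and $zero (register 0) always reads as 0. The text does not specify that last point, and the class and method names are hypothetical.

```python
MASK16 = 0xFFFF


class RegisterFile:
    """Sixteen 16-bit registers with two read ports and one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 16

    def read(self, rf_rs: int, rf_rt: int) -> tuple[int, int]:
        """Return (rf_ra, rf_rb), the registers selected by rf_rs and rf_rt."""
        return self.regs[rf_rs], self.regs[rf_rt]

    def clock_edge(self, rf_rd: int, rf_d_in: int, rf_we: bool) -> None:
        """Commit rf_d_in to register rf_rd if rf_we is asserted."""
        if rf_we and rf_rd != 0:  # assumption: writes to $zero are ignored
            self.regs[rf_rd] = rf_d_in & MASK16


rf = RegisterFile()
rf.clock_edge(rf_rd=1, rf_d_in=5, rf_we=True)  # $a0 <- 5 at the clock edge
print(rf.read(rf_rs=1, rf_rt=0))               # (5, 0)
```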

Main ALU (ALU)

A multi-function Arithmetic Logic Unit that operates on 16-bit data.
Inputs: alu_op1[15:0], alu_op2[15:0], alu_ctrl[3:0]
Outputs: alu_d[15:0], alu_zero, alu_carry
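
The text does not give the alu_ctrl encoding, so the sketch below only models how the three outputs might be produced for addition and subtraction on 16-bit operands. Treating alu_carry on subtraction as "no borrow" is an assumption taken from common two's-complement practice, not something this design specifies; the function names are hypothetical.

```python
MASK16 = 0xFFFF


def alu_add(op1: int, op2: int) -> tuple[int, bool, bool]:
    """16-bit add; returns (alu_d, alu_zero, alu_carry)."""
    total = (op1 & MASK16) + (op2 & MASK16)
    alu_d = total & MASK16
    return alu_d, alu_d == 0, total > MASK16


def alu_sub(op1: int, op2: int) -> tuple[int, bool, bool]:
    """16-bit op1 - op2 computed as op1 + ~op2 + 1.

    Assumption: alu_carry = True means "no borrow" (op1 >= op2 unsigned).
    """
    total = (op1 & MASK16) + (~op2 & MASK16) + 1
    alu_d = total & MASK16
    return alu_d, alu_d == 0, total > MASK16


print(alu_add(0xFFFF, 0x0001))  # (0, True, True)
print(alu_sub(0x0003, 0x0005))  # (65534, False, False)
```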

Data Memory (DM)

A 16-bit word-addressable RAM that stores program data.
Inputs: dm_a_in[15:0], dm_d_in[15:0], dm_we, dm_clk
Outputs: dm_d_out[15:0]
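
Because data memory is word-addressable, each address selects one whole 16-bit word, so consecutive addresses are consecutive words rather than bytes. The sketch below models that, together with the lw and sw transfers listed in Table 2 (address from rf_ra, the register named by bits [11:8]; store data from rf_rb; load result written to the register named by bits [3:0]). Writing on a clock edge when dm_we is set is an assumption, and all names are hypothetical.

```python
MASK16 = 0xFFFF


class DataMemory:
    """Word-addressable RAM: each address holds one complete 16-bit word."""

    def __init__(self, size: int = 1 << 16) -> None:
        self.words = [0] * size

    def read(self, dm_a_in: int) -> int:
        return self.words[dm_a_in & MASK16]  # dm_d_out

    def clock_edge(self, dm_a_in: int, dm_d_in: int, dm_we: bool) -> None:
        if dm_we:  # write only when the enable is asserted
            self.words[dm_a_in & MASK16] = dm_d_in & MASK16


def sw(instr: int, regs: list[int], dm: DataMemory) -> None:
    """sw per Table 2: dm_a <- rf_ra (rs = [11:8]), dm_d <- rf_rb (rt = [7:4])."""
    rs, rt = (instr >> 8) & 0xF, (instr >> 4) & 0xF
    dm.clock_edge(regs[rs], regs[rt], dm_we=True)


def lw(instr: int, regs: list[int], dm: DataMemory) -> None:
    """lw per Table 2: dm_a <- rf_ra (rs = [11:8]), rf_d <- dm_q (rd = [3:0])."""
    rs, rd = (instr >> 8) & 0xF, instr & 0xF
    regs[rd] = dm.read(regs[rs])


regs = [0] * 16
regs[1], regs[2] = 0x0040, 0xBEEF  # $a0 = address, $a1 = data
dm = DataMemory()
sw(0x8120, regs, dm)               # store $a1 at word address $a0
lw(0x7109, regs, dm)               # load that word into $t0
print(hex(regs[9]))                # 0xbeef
```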

Program Counter ALU (PCALU)

A single-function Arithmetic Logic Unit that operates on 16-bit data and determines the next contents of the Program Counter.
Inputs: pcAlu_current[15:0], pcAlu_inc[15:0]
Outputs: pcAlu_d_out[15:0]
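
Per Table 2, the PC ALU always adds pcAlu_inc to pcAlu_current; instructions differ only in what they feed it: 01h for sequential execution, 00h for halt (the PC stays put), a sign-extended offset for ujmp and cjmp, and rf_ra with 01h for ret. A sketch, assuming wrap-around 16-bit arithmetic, under which adding the sign-extended offset FFFEh behaves like subtracting 2. The function name is hypothetical.

```python
MASK16 = 0xFFFF


def pc_alu(pc_alu_current: int, pc_alu_inc: int) -> int:
    """pcAlu_d_out = pcAlu_current + pcAlu_inc, wrapped to 16 bits."""
    return (pc_alu_current + pc_alu_inc) & MASK16


print(hex(pc_alu(0x0010, 0x0001)))  # sequential: 0x11
print(hex(pc_alu(0x0010, 0x0000)))  # halt, PC unchanged: 0x10
print(hex(pc_alu(0x0010, 0xFFFE)))  # offset of -2: 0xe
```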

Sign Extender (SE)

A sign-extending unit that extends 8-bit signed data to 16-bit signed data.
Inputs: se_d_in[7:0]
Outputs: se_d_out[15:0]
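
Sign extension copies bit 7 of the 8-bit input into bits [15:8] of the output, so a negative 8-bit value stays negative at 16 bits. A short sketch (the function name is hypothetical):

```python
def sign_extend_8_to_16(se_d_in: int) -> int:
    """Copy bit 7 of an 8-bit value into bits [15:8] (se_d_out)."""
    se_d_in &= 0xFF
    if se_d_in & 0x80:  # negative in 8-bit two's complement
        return se_d_in | 0xFF00
    return se_d_in


print(hex(sign_extend_8_to_16(0x7F)))  # 0x7f   (+127 stays +127)
print(hex(sign_extend_8_to_16(0xFE)))  # 0xfffe (-2 stays -2)
```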

Control Unit (CU)

A finite state machine that drives the select lines of the data path's multiplexers, asserts the read and write enable signals, and selects the ALU operation. The Control Unit will be defined further in Part 4.

Register Transfer Data Path Description

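Each instruction in the instruction set can be defined as a series of data transfers along the data path. For example, the add instruction proceeds as follows: the program counter addresses instruction memory (im_a is driven by pc_q); instruction bits [11:8], [7:4], and [3:0] select the source registers rs and rt and the destination register rd; the ALU adds rf_ra and rf_rb; the sum is written back to the register file (rf_d is driven by alu_q); and the PC ALU adds 01h to pc_q to produce the next PC value. A similar trace through the transfers was done for each instruction.

These transfers and control signal assertions are detailed in Table 2. The rows of Table 2 correspond to each data path component's input signals, and the columns correspond to each instruction. An asterisk (*) marks an input that is not used by that instruction. If a row contains more than one source signal, a multiplexer is needed to gate the correct signal onto that input.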
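
The add trace can be written out as a single-cycle step, following the add column of Table 2 line by line. This Python sketch makes simplifying assumptions: instruction memory and the register file are plain lists, the opcode is taken to have already been decoded as add, and the register-file write is modelled as happening at the end of the cycle. The function name is hypothetical.

```python
MASK16 = 0xFFFF


def step_add(pc: int, imem: list[int], regs: list[int]) -> int:
    """Execute one add instruction at pc; update regs, return the next PC."""
    im_q = imem[pc]                   # im_a <- pc_q
    rf_rs = (im_q >> 8) & 0xF         # rf_rs <- im_q[11:8]
    rf_rt = (im_q >> 4) & 0xF         # rf_rt <- im_q[7:4]
    rf_rd = im_q & 0xF                # rf_rd <- im_q[3:0]
    rf_ra, rf_rb = regs[rf_rs], regs[rf_rt]
    alu_q = (rf_ra + rf_rb) & MASK16  # alu_op1 <- rf_ra, alu_op2 <- rf_rb
    next_pc = (pc + 0x01) & MASK16    # pcAlu_current <- pc_q, pcAlu_inc <- 01h
    regs[rf_rd] = alu_q               # rf_d <- alu_q, committed at the clock edge
    return next_pc                    # pc_d <- pcAlu_q


regs = [0] * 16
regs[1], regs[2] = 3, 4               # $a0 = 3, $a1 = 4
imem = [0x0129]                       # add: rs = $a0, rt = $a1, rd = $t0
pc = step_add(0, imem, regs)
print(pc, regs[9])                    # 1 7
```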
Input          add         addi        sub         logic       comp        ujmp        cjmp        lw
Opcode         0000        0001        0010        0011        0100        0101        0110        0111
im_a           pc_q        pc_q        pc_q        pc_q        pc_q        pc_q        pc_q        pc_q
rf_rs          im_q[11:8]  im_q[3:0]   im_q[11:8]  im_q[3:0]   im_q[11:8]  *           0Eh         im_q[11:8]
rf_rt          im_q[7:4]   *           im_q[7:4]   im_q[7:4]   im_q[7:4]   *           0Eh         *
rf_rd          im_q[3:0]   im_q[3:0]   im_q[3:0]   im_q[3:0]   0Eh         *           0Eh         im_q[3:0]
rf_d           alu_q       alu_q       alu_q       alu_q       alu_q       *           alu_q       dm_q
se_d           *           im_q[11:4]  *           *           *           im_q[11:4]  im_q[11:4]  *
alu_op1        rf_ra       rf_ra       rf_ra       rf_ra       rf_ra       *           rf_ra       *
alu_op2        rf_rb       se_q        rf_rb       rf_rb       rf_rb       *           rf_rb       *
dm_a           *           *           *           *           *           *           *           rf_ra
dm_d           *           *           *           *           *           *           *           *
pcAlu_inc      01h         01h         01h         01h         01h         se_q        se_q/01h    01h
pcAlu_current  pc_q        pc_q        pc_q        pc_q        pc_q        pc_q        pc_q        pc_q
pc_d           pcAlu_q     pcAlu_q     pcAlu_q     pcAlu_q     pcAlu_q     pcAlu_q     pcAlu_q     pcAlu_q

Input          sw          rol         call        ret         halt        -           -           -
Opcode         1000        1001        1010        1011        1100        1101        1110        1111
im_a           pc_q        pc_q        pc_q        pc_q        pc_q        *           *           *
rf_rs          im_q[11:8]  im_q[3:0]   *           0Fh         *           *           *           *
rf_rt          im_q[7:4]   *           *           *           *           *           *           *
rf_rd          *           im_q[3:0]   0Fh         *           *           *           *           *
rf_d           *           alu_q       pc_q        *           *           *           *           *
se_d           *           *           *           *           *           *           *           *
alu_op1        *           rf_ra       *           *           *           *           *           *
alu_op2        *           *           *           *           *           *           *           *
dm_a           rf_ra       *           *           *           *           *           *           *
dm_d           rf_rb       *           *           *           *           *           *           *
pcAlu_inc      01h         01h         *           01h         00h         *           *           *
pcAlu_current  pc_q        pc_q        *           rf_ra       pc_q        *           *           *
pc_d           pcAlu_q     pcAlu_q     im_q[11:0]  pcAlu_q     pcAlu_q     *           *           *

Table 2: Data path definitions (register-transfer level)
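
The multiplexer rule given with Table 2 (a row with more than one distinct source needs a multiplexer) can be applied mechanically. The Python sketch below encodes the used columns of Table 2 (opcodes 0000 to 1100), ignores * entries, splits the se_q/01h cell into its two sources, and reports which inputs need a multiplexer and how many inputs each one has. The names `TABLE2` and `mux_sources` are hypothetical.

```python
# Used columns of Table 2, opcodes 0000-1100 in order:
# add addi sub logic comp ujmp cjmp lw sw rol call ret halt
TABLE2 = {
    "im_a": "pc_q pc_q pc_q pc_q pc_q pc_q pc_q pc_q pc_q pc_q pc_q pc_q pc_q",
    "rf_rs": "im_q[11:8] im_q[3:0] im_q[11:8] im_q[3:0] im_q[11:8] * 0Eh im_q[11:8] im_q[11:8] im_q[3:0] * 0Fh *",
    "rf_rt": "im_q[7:4] * im_q[7:4] im_q[7:4] im_q[7:4] * 0Eh * im_q[7:4] * * * *",
    "rf_rd": "im_q[3:0] im_q[3:0] im_q[3:0] im_q[3:0] 0Eh * 0Eh im_q[3:0] * im_q[3:0] 0Fh * *",
    "rf_d": "alu_q alu_q alu_q alu_q alu_q * alu_q dm_q * alu_q pc_q * *",
    "se_d": "* im_q[11:4] * * * im_q[11:4] im_q[11:4] * * * * * *",
    "alu_op1": "rf_ra rf_ra rf_ra rf_ra rf_ra * rf_ra * * rf_ra * * *",
    "alu_op2": "rf_rb se_q rf_rb rf_rb rf_rb * rf_rb * * * * * *",
    "dm_a": "* * * * * * * rf_ra rf_ra * * * *",
    "dm_d": "* * * * * * * * rf_rb * * * *",
    "pcAlu_inc": "01h 01h 01h 01h 01h se_q se_q/01h 01h 01h 01h * 01h 00h",
    "pcAlu_current": "pc_q pc_q pc_q pc_q pc_q pc_q pc_q pc_q pc_q pc_q * rf_ra pc_q",
    "pc_d": "pcAlu_q pcAlu_q pcAlu_q pcAlu_q pcAlu_q pcAlu_q pcAlu_q pcAlu_q pcAlu_q pcAlu_q im_q[11:0] pcAlu_q pcAlu_q",
}


def mux_sources(table: dict[str, str]) -> dict[str, list[str]]:
    """Map each input that has more than one distinct source to its sources."""
    muxes = {}
    for signal, row in table.items():
        sources = set()
        for cell in row.split():
            if cell != "*":                      # * = input unused by this instruction
                sources.update(cell.split("/"))  # "se_q/01h" means either source
        if len(sources) > 1:
            muxes[signal] = sorted(sources)
    return muxes


for signal, sources in mux_sources(TABLE2).items():
    print(f"{signal}: {len(sources)}-input mux from {', '.join(sources)}")
```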

Data Path Illustration


Figure 1: Low-resolution data path schematic

Figure 1 is a rough sketch of how the inside of the processor might be laid out: it shows the connections between the components and the signals going to and from the Control Unit, which will be specified in Part 4. A larger, more detailed block diagram is given in Figure 2. The connections between components, and the multiplexers that gate signals onto component input ports, follow directly from Table 2.

Discussion of Tradeoffs

Cost, speed, and complexity are closely related in designing a processor. In order to reduce the complexity of the data path, some modifications were made to the instruction set specified in Part 1.

Simplifying the instruction formats means fewer multiplexers are needed on the data path to select input sources, which reduces both cost and propagation delay. Simplifying rol to a single-position rotate also reduces complexity, at the expense of programming convenience: a multiple-position rotate can still be implemented by repeating the single rotate in a loop.
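
A multiple-position rotate built from single-position rotates might look like the following Python sketch. It assumes the simplified rol rotates a 16-bit value left by exactly one bit, with bit 15 wrapping around to bit 0; the function names are hypothetical, and the loop stands in for the loop a programmer would write in this processor's assembly.

```python
MASK16 = 0xFFFF


def rol1(value: int) -> int:
    """Single-position 16-bit rotate left: bit 15 wraps into bit 0."""
    value &= MASK16
    return ((value << 1) | (value >> 15)) & MASK16


def rol_n(value: int, n: int) -> int:
    """Rotate left by n positions using n % 16 single rotates."""
    for _ in range(n % 16):
        value = rol1(value)
    return value


print(hex(rol1(0x8001)))      # 0x3
print(hex(rol_n(0x1234, 4)))  # 0x2341
```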

A single-cycle design was chosen over a multi-cycle design to reduce complexity and cost. Here, simplification lowers cost (fewer components) at the expense of processor speed; a pipelined design would execute instructions at a faster rate. Simplification does, however, decrease engineering complexity significantly: designing the forthcoming Control Unit will be much simpler, and coordinating clock timing will not be a problem, since each instruction executes in a single cycle.

To reduce cost, the main ALU also performs the function of a comparator in addition to the regular arithmetic and logic operations, so the ALU and comparator are shared. This makes the ALU somewhat more complex, but it reduces the number of destinations for instruction arguments. The ALU and PC ALU remain separate components, however: this reduces the number of multiplexers needed to gate ALU operand signals, and it allows an instruction to execute in the same clock cycle in which the program counter is incremented.
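
To illustrate the shared ALU acting as a comparator, the Python sketch below evaluates the comp function codes from Table 1, produces a 1/0 result of the kind the comp column of Table 2 writes into $cond, and then picks the cjmp increment from that flag. Several details are assumptions rather than part of the design as written: comparisons are signed two's-complement, a true result is 1, function code 0011 means "less than or equal", and cjmp branches when $cond is nonzero. All names are hypothetical.

```python
import operator

MASK16 = 0xFFFF

# Comp function codes from Table 1; 0011 is assumed to mean "<=".
COMP_FUNCS = {
    0b0000: operator.gt,
    0b0001: operator.lt,
    0b0010: operator.ge,
    0b0011: operator.le,
    0b0100: operator.eq,
    0b0101: operator.ne,
}


def to_signed(value: int) -> int:
    """Interpret a 16-bit word as two's complement (assumption)."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def alu_compare(op1: int, op2: int, func: int) -> int:
    """Return 1 if 'op1 <func> op2' holds, else 0: the value written to $cond."""
    return 1 if COMP_FUNCS[func](to_signed(op1), to_signed(op2)) else 0


def cjmp_increment(cond: int, se_q: int) -> int:
    """Select pcAlu_inc for cjmp: se_q if $cond is nonzero, otherwise 01h."""
    return se_q if cond != 0 else 0x01


cond = alu_compare(0xFFFF, 0x0001, 0b0001)      # -1 < 1 (signed)
print(cond, hex(cjmp_increment(cond, 0xFFFC)))  # 1 0xfffc
```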

In a multi-cycle data path, the choice between edge-triggered and latching registers would be important. For the single-cycle design, however, the decision matters less, since a register will not need to be read from and written to in the same clock cycle. Edge-triggered registers were chosen for this processor's data path in case later revisions of the design are pipelined.

Conclusion

Defining the data path is an important step toward implementing the processor. Once the Control Unit is defined, the transfer routines described above can be written in VHDL. The VHDL code will be simulated before it is synthesized into an FPGA circuit.

Figure 2: Data path block diagram based on Table 2