| Binary Encoding | Instruction Opcode | Register Name | Logic Function Code | Comp Function Code |
|---|---|---|---|---|
| 0000 | add | $zero | AND | > |
| 0001 | addi | $a0 | OR | < |
| 0010 | sub | $a1 | NAND | >= |
| 0011 | logic | $a2 | NOR | <= |
| 0100 | comp | $a3 | XOR | == |
| 0101 | ujmp | $v0 | XNOR | != |
| 0110 | cjmp | $v1 | - | - |
| 0111 | lw | $mar | - | - |
| 1000 | sw | $mdr | - | - |
| 1001 | rol | $t0 | - | - |
| 1010 | call | $t1 | - | - |
| 1011 | ret | $t2 | - | - |
| 1100 | halt | $t3 | - | - |
| 1101 | - | $at | - | - |
| 1110 | - | $cond | - | - |
| 1111 | - | $ra | - | - |
Table 1: Instruction, register, and function encoding
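As an illustration of how the encodings in Table 1 might be used, here is a minimal Python sketch that splits a 16-bit instruction word into four 4-bit fields and looks up the opcode mnemonic. The names `OPCODES`, `REGISTERS`, `split_fields`, and `opcode_name` are hypothetical, and the sketch assumes the opcode occupies bits [15:12] of the word, which Table 1 itself does not state.

```python
# Mnemonic tables transcribed from Table 1; the list index is the 4-bit encoding.
OPCODES = ["add", "addi", "sub", "logic", "comp", "ujmp", "cjmp", "lw",
           "sw", "rol", "call", "ret", "halt", None, None, None]
REGISTERS = ["$zero", "$a0", "$a1", "$a2", "$a3", "$v0", "$v1", "$mar",
             "$mdr", "$t0", "$t1", "$t2", "$t3", "$at", "$cond", "$ra"]


def split_fields(word):
    """Split a 16-bit instruction word into four 4-bit fields.

    ASSUMPTION: the opcode sits in bits [15:12]; the remaining three
    nibbles are returned from most to least significant.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    return ((word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)


def opcode_name(word):
    """Return the mnemonic for a word, or None for an unassigned opcode."""
    return OPCODES[split_fields(word)[0]]
```

For example, `opcode_name(0xC000)` looks up encoding 1100 and yields `"halt"`, while the three unassigned encodings map to `None`.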
| Component | Description |
|---|---|
| Program Counter (pc) | A 16-bit register that holds the address of the next instruction to be executed by the CPU. |
| Instruction Memory (im) | 16-bit word-addressable RAM that holds the instructions to be executed by the CPU. |
| Register File (rf) | A register file composed of sixteen 16-bit registers. |
| ALU (alu) | A multi-function Arithmetic Logic Unit that operates on 16-bit data. |
| Data Memory (dm) | 16-bit word-addressable RAM that stores program data. |
| PC ALU (pcAlu) | A single-function Arithmetic Logic Unit that operates on 16-bit data and controls the content of the Program Counter. |
| Sign Extender (se) | A sign-extending unit that extends 8-bit signed data to 16-bit signed data. |

| Input / Instruction | 0000 (add) | 0001 (addi) | 0010 (sub) | 0011 (logic) | 0100 (comp) | 0101 (ujmp) | 0110 (cjmp) | 0111 (lw) |
|---|---|---|---|---|---|---|---|---|
| im_a | pc_q | pc_q | pc_q | pc_q | pc_q | pc_q | pc_q | pc_q |
| rf_rs | im_q[11:8] | im_q[3:0] | im_q[11:8] | im_q[3:0] | im_q[11:8] | * | 0Eh | im_q[11:8] |
| rf_rt | im_q[7:4] | * | im_q[7:4] | im_q[7:4] | im_q[7:4] | * | 0Eh | * |
| rf_rd | im_q[3:0] | im_q[3:0] | im_q[3:0] | im_q[3:0] | 0Eh | * | 0Eh | im_q[3:0] |
| rf_d | alu_q | alu_q | alu_q | alu_q | alu_q | * | alu_q | dm_q |
| se_d | * | im_q[11:4] | * | * | * | im_q[11:4] | im_q[11:4] | * |
| alu_op1 | rf_ra | rf_ra | rf_ra | rf_ra | rf_ra | * | rf_ra | * |
| alu_op2 | rf_rb | se_q | rf_rb | rf_rb | rf_rb | * | rf_rb | * |
| dm_a | * | * | * | * | * | * | * | rf_ra |
| dm_d | * | * | * | * | * | * | * | * |
| pcAlu_inc | 01h | 01h | 01h | 01h | 01h | se_q | se_q/01h | 01h |
| pcAlu_current | pc_q | pc_q | pc_q | pc_q | pc_q | pc_q | pc_q | pc_q |
| pc_d | pcAlu_q | pcAlu_q | pcAlu_q | pcAlu_q | pcAlu_q | pcAlu_q | pcAlu_q | pcAlu_q |

| Input / Instruction | 1000 (sw) | 1001 (rol) | 1010 (call) | 1011 (ret) | 1100 (halt) | 1101 (-) | 1110 (-) | 1111 (-) |
|---|---|---|---|---|---|---|---|---|
| im_a | pc_q | pc_q | pc_q | pc_q | pc_q | * | * | * |
| rf_rs | im_q[11:8] | im_q[3:0] | * | 0Fh | * | * | * | * |
| rf_rt | im_q[7:4] | * | * | * | * | * | * | * |
| rf_rd | * | im_q[3:0] | 0Fh | * | * | * | * | * |
| rf_d | * | alu_q | pc_q | * | * | * | * | * |
| se_d | * | * | * | * | * | * | * | * |
| alu_op1 | * | rf_ra | * | * | * | * | * | * |
| alu_op2 | * | * | * | * | * | * | * | * |
| dm_a | rf_ra | * | * | * | * | * | * | * |
| dm_d | rf_rb | * | * | * | * | * | * | * |
| pcAlu_inc | 01h | 01h | * | 01h | 00h | * | * | * |
| pcAlu_current | pc_q | pc_q | * | rf_ra | pc_q | * | * | * |
| pc_d | pcAlu_q | pcAlu_q | im_q[11:0] | pcAlu_q | pcAlu_q | * | * | * |
Table 2: Data path definitions (register-transfer level)
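The program-counter rows of Table 2 can be read as a small next-address function. The following Python sketch is one possible reading, not the processor's actual implementation. The function names `sign_extend8` and `next_pc` are hypothetical. The sketch assumes that the opcode is in im_q[15:12], that cjmp selects se_q when the value of $cond is nonzero and 01h otherwise, and that the "don't care" opcodes simply advance by one.

```python
def sign_extend8(value):
    """Extend an 8-bit two's-complement value to 16 bits (the se unit)."""
    value &= 0xFF
    return value | 0xFF00 if value & 0x80 else value


def next_pc(word, pc, cond=0, ra=0):
    """Compute pc_d for one instruction from the pcAlu rows of Table 2.

    ASSUMPTIONS: opcode in im_q[15:12]; cjmp takes se_q when cond != 0;
    unassigned opcodes ('*' in Table 2) are treated as pc + 1.
    """
    opcode = (word >> 12) & 0xF
    se_q = sign_extend8((word >> 4) & 0xFF)    # se_d = im_q[11:4]
    if opcode == 0b1010:                        # call: pc_d = im_q[11:0]
        return word & 0x0FFF
    if opcode == 0b0101:                        # ujmp: pcAlu_inc = se_q
        inc, current = se_q, pc
    elif opcode == 0b0110:                      # cjmp: pcAlu_inc = se_q or 01h
        inc, current = (se_q if cond else 1), pc
    elif opcode == 0b1011:                      # ret: pcAlu_current = rf_ra ($ra)
        inc, current = 1, ra
    elif opcode == 0b1100:                      # halt: pcAlu_inc = 00h
        inc, current = 0, pc
    else:                                       # all other rows: pc_q + 01h
        inc, current = 1, pc
    return (current + inc) & 0xFFFF             # pcAlu_q, wrapped to 16 bits
```

Under these assumptions, a ujmp whose 8-bit field is FEh moves the program counter back by two words, because the sign extender turns FEh into FFFEh before the PC ALU adds it.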
Figure 1: Low-resolution data path schematic
Figure 1 is a rough sketch of the processor's internal layout. It shows the connections between the components and the signals going to and from the Control Unit, which will be specified in Part 4. A larger, more detailed schematic is shown in Figure 2. The connections between components, and the placement of the multiplexers that gate the components' input ports, come from Table 2.
Cost, speed, and complexity are closely related in processor design. To reduce the complexity of the data path, some modifications were made to the instruction set specified in Part 1.
By simplifying the instruction formats, fewer multiplexers need to be used on the data path to control input sources. This improves speed and reduces cost. Simplifying rol also reduces hardware complexity at the expense of programming convenience: a multiple-bit shift can still be implemented by performing a single shift in a loop.
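The loop-based replacement for a multiple-bit rol can be sketched in Python as follows. This is only an illustration: the helper names `rol1` and `rol_n` are hypothetical, and the sketch assumes the simplified rol rotates a 16-bit value left by exactly one bit per execution.

```python
def rol1(value):
    """Rotate a 16-bit value left by one bit (ASSUMED single-step rol)."""
    value &= 0xFFFF
    # The bit shifted out of position 15 re-enters at position 0.
    return ((value << 1) | (value >> 15)) & 0xFFFF


def rol_n(value, count):
    """Rotate left by `count` bits by repeating the single-bit rotate."""
    for _ in range(count % 16):   # 16 rotations return the original value
        value = rol1(value)
    return value & 0xFFFF
```

On the processor itself the loop body would cost several instructions per bit (the rotate, a counter update, and a conditional jump), which is the programming overhead traded for the simpler data path.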
A single-cycle design was chosen over a multi-cycle design to reduce complexity and cost. In this case, simplification lowers cost (fewer components) at the expense of processor speed, since a pipelined design would execute instructions at a faster rate. The simplification significantly decreases engineering complexity, however: designing the forthcoming Control Unit will be much simpler, and coordinating clock timing will not be a problem since each instruction should execute in a single cycle.
To reduce cost, the main ALU also serves as the comparator, in addition to performing the regular arithmetic and logic operations. Sharing makes the ALU somewhat more complex, but it reduces the number of destinations for instruction arguments. The ALU and PC ALU are separate components, however, because this reduces the number of multiplexers needed to gate ALU operand signals and allows an instruction to execute in the same clock cycle in which the program counter is incremented.
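A shared ALU and comparator of this kind might behave like the Python sketch below. It covers only a representative subset of the function codes in Table 1 and rests on several assumptions that the text does not confirm: comparisons are signed, a true comparison produces 1 and a false one 0 (Table 2 routes the comp result to register 0Eh, $cond), and the function code is passed in separately as `func`. The names `alu`, `to_signed`, `LOGIC_FUNCS`, and `COMP_FUNCS` are hypothetical.

```python
import operator

MASK = 0xFFFF


def to_signed(value):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    value &= MASK
    return value - 0x10000 if value & 0x8000 else value


# Subset of the function codes listed in Table 1.
LOGIC_FUNCS = {0b0000: operator.and_, 0b0001: operator.or_, 0b0100: operator.xor}
COMP_FUNCS = {0b0000: operator.gt, 0b0001: operator.lt, 0b0010: operator.ge,
              0b0100: operator.eq, 0b0101: operator.ne}


def alu(opcode, func, op1, op2):
    """Return alu_q for the arithmetic, logic, and comp opcodes."""
    if opcode in (0b0000, 0b0001):      # add, addi
        return (op1 + op2) & MASK
    if opcode == 0b0010:                # sub
        return (op1 - op2) & MASK
    if opcode == 0b0011:                # logic: func selects the gate
        return LOGIC_FUNCS[func](op1, op2) & MASK
    if opcode == 0b0100:                # comp: ASSUMED signed, result 1 or 0
        return int(COMP_FUNCS[func](to_signed(op1), to_signed(op2)))
    raise ValueError("opcode does not use the main ALU in this sketch")
```

Because the comparison result leaves through the same alu_q output as every other operation, the register file needs no extra write-data source for comp, which is the saving in destinations described above.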
In a multi-cycle data path, the choice of edge-triggered registers over latching registers would be important. For the single-cycle design, however, the decision matters little, since registers will not need to be read from and written to in the same clock cycle. Edge-triggered registers were chosen for this processor's data path in case later revisions of the design are pipelined.
Figure 2: Data path block diagram based on Table 2