“I want to take responsibility for my work to the very end” – Fio Piccolo
Better late than never? This blog post describes a project we worked on at the end of the fall semester 2021, which was a little over 5 months (edit: 1 year 10 months) ago. In the interest of documentation and demonstrating what we accomplished, we decided to write this blog post.
Overview
About EE126
Tufts’ EE126 course is a lab-based computer architecture and RTL development deep-dive. The coursework is structured around a single long-term lab project: the development of a LEGv8 (an ARMv8-based ISA) processor. First, we learned about and built the individual components of the CPU. Next, a single-cycle design was developed and tested. Finally, a pipelined design was realized, incorporating hazard detection and forwarding capabilities. Throughout the process, ModelSim (and GHDL for the adventurous) was used.
The three of us took the class in the Fall of 2021, and by the end we were hungry for more. Unfortunately, there wasn’t enough time at the end of the semester for the class to have final projects, so we decided to take matters into our own hands and do a final project just between us. Over the course of 12 hours on a Sunday, we put together this demo and had a lot of fun doing so.
About the Project
The final lab of EE126 left us with a functional LEGv8 CPU, but no way to run any useful programs with it. Test programs consisted of data and instruction memory entities which were hard coded.
The idea for the project was to take the processor we had made, synthesize it to an FPGA, write an assembler, and run a basic program on it. This involved several steps: adding new components and testing them, debugging the problems that come from synthesizing code which had previously only been simulated, and writing an assembler that targets our processor and outputs machine code in the format our design needs.
Toolchain and Hardware
We used Vivado as our toolchain and a Nexys A7 board, as they were what we had available and what we were familiar with. The open source FPGA toolchain (Yosys, GHDL, etc.) was considered, but we decided not to use it because we lacked the time to find an adequate board. The Nexys A7 had a lot of features that were desirable to us for this project, primarily its built-in buttons, LEDs, switches, and seven segment displays. This allowed us to give our processor a lot of functionality out of the box without needing to breadboard too much.
Architecture
Processor Implementation
The LEGv8 instruction set was the basis for our design. In previous labs, we implemented most of the instructions, including multiple opcodes within each instruction format. Given the simplicity of the architecture and the coherent setup of opcodes (the opcode bits almost directly correspond to a number of control unit outputs), adding more capabilities was simple. Several more LEG instructions were implemented, most notably, bit shifting.
A simplified diagram of Fio’s CPU is shown below:
Notable features:
- Fio has a five-stage pipeline with full forwarding and hazard detection logic. The forwarding logic allows instruction “results” to be passed back to the register file without the need for stalling. The Hazard Detection Unit (HDU) detects whether stalling the CPU is necessary to allow data to propagate.
- Aside from the HDU, CPU control is accomplished using several components. First, the Control Unit detects the opcode and instruction format and generates the appropriate control logic signals. The ALU also has an intermediary opcode translation component called ALU Control.
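To make the forwarding and stalling decisions a little more concrete, here is a minimal Python sketch of the textbook conditions for a five-stage LEGv8 pipeline. It is not Fio's actual VHDL: the `PipelineReg` class, the field names, and the function names are all hypothetical, and the conditions are the standard ones from the LEGv8 textbook datapath rather than anything taken from our source.

```python
from dataclasses import dataclass

XZR = 31  # register 31 always reads as zero in LEGv8, so it is never forwarded


@dataclass
class PipelineReg:
    """Hypothetical snapshot of the control fields held in a pipeline register."""
    reg_write: bool = False  # the instruction will write a register
    mem_read: bool = False   # the instruction is a load
    rd: int = XZR            # destination register number


def forward_select(src: int, ex_mem: PipelineReg, mem_wb: PipelineReg) -> str:
    """Choose where one ALU operand that reads register `src` should come from."""
    if ex_mem.reg_write and ex_mem.rd != XZR and ex_mem.rd == src:
        return "EX/MEM"   # newest result, from the instruction directly ahead
    if mem_wb.reg_write and mem_wb.rd != XZR and mem_wb.rd == src:
        return "MEM/WB"   # result from the instruction two ahead
    return "REGFILE"      # no data hazard, use the value read during decode


def must_stall(id_ex: PipelineReg, rn: int, rm: int) -> bool:
    """Load-use hazard: the instruction in decode needs a value still being loaded."""
    return id_ex.mem_read and id_ex.rd in (rn, rm)
```

For example, `forward_select(7, PipelineReg(True, False, 5), PipelineReg(True, False, 7))` picks `"MEM/WB"`, which is the situation an `ADDI X6, X7, 0` is in when it executes two instructions after an `ADD X7, X6, X5`. A load followed immediately by an instruction that reads the loaded register is the one case forwarding alone cannot cover, which is why `must_stall` exists.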
Memory Map
All of the Nexys’ switches, buttons, LEDs, and displays were mapped into the data memory (DMEM). For instance, writing a 0x4 to a display is as simple as writing b0100 to a specific memory address. The processor would have to handle converting to decimal internally.
Addresses | Function | Description |
---|---|---|
x000 - x0FF | RAM | Byte addr, 64 bit words |
x100 - x10F | Switches | 1 bit word, 63 bit padding |
x110 - x11F | LEDs | 1 bit word, 63 bit padding |
x120 - x127 | Seven seg displays | 4 bit word, 60 bit padding |
x130 - x134 | Buttons | 1 bit word, 63 bit padding |
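As a rough illustration of how an address resolves against the table above, here is a small Python sketch of the decode step. The `REGIONS` list and the `decode` function are hypothetical names for this example, and it assumes each address in a peripheral range selects exactly one device (one switch, one LED, one display digit, one button); the real decode lives in the DMEM hardware.

```python
# (first address, last address inclusive, function), copied from the memory map
REGIONS = [
    (0x000, 0x0FF, "RAM"),
    (0x100, 0x10F, "Switches"),
    (0x110, 0x11F, "LEDs"),
    (0x120, 0x127, "Seven seg displays"),
    (0x130, 0x134, "Buttons"),
]


def decode(addr):
    """Return (function, index within that range), or None if the address is unmapped."""
    for first, last, name in REGIONS:
        if first <= addr <= last:
            return name, addr - first
    return None
```

With this, `decode(304)` lands on the first button, which matches the `ADDI X23, XZR, 304` in the demo program, and an address such as `0x128` falls in a gap between regions and decodes to nothing.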
The instruction memory (IMEM) was a simple 2D array containing the machine code to execute, and was loaded into the processor before synthesis.
The CPU and memory units were connected as shown here.
Assembler
As no processor is complete without a toolchain, we also designed and built a simple assembler. This meant the design process for programs would start with ponyo, to ensure the program worked at a baseline, before moving on to being run through the assembler, and finally to being synthesized onto the board for a real hardware test. Charles primarily focused on building the assembler, as he was most familiar with that area and already had a lot of good ideas for how to complete it.
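To give a feel for the core job of an assembler like this, here is a minimal Python sketch that encodes a single I-format line into a 32-bit word. It is not the assembler we wrote: `parse_reg` and `encode_i_format` are hypothetical names, only `ADDI` and `SUBI` are covered, and it assumes the standard LEGv8 I-format layout (10-bit opcode, 12-bit unsigned immediate, Rn, Rd) with no label handling.

```python
# 10-bit I-format opcodes from the LEGv8 reference card
I_OPCODES = {"ADDI": 0b1001000100, "SUBI": 0b1101000100}


def parse_reg(token: str) -> int:
    """Turn a register name such as 'X16' or 'XZR' into its 5-bit number."""
    token = token.strip().upper()
    if token == "XZR":
        return 31
    if not token.startswith("X") or not token[1:].isdigit():
        raise ValueError(f"bad register: {token}")
    number = int(token[1:])
    if number > 30:
        raise ValueError(f"bad register: {token}")
    return number


def encode_i_format(line: str) -> int:
    """Encode e.g. 'ADDI X16, XZR, 43 // comment' as a 32-bit instruction word."""
    code = line.split("//")[0].strip()            # drop any trailing comment
    mnemonic, operands = code.split(None, 1)      # 'ADDI', 'X16, XZR, 43'
    rd, rn, imm = (part.strip() for part in operands.split(","))
    immediate = int(imm, 0)                       # accepts decimal or 0x-prefixed hex
    if not 0 <= immediate < (1 << 12):
        raise ValueError(f"immediate out of range: {imm}")
    opcode = I_OPCODES[mnemonic.upper()]
    # Field layout: opcode[31:22] | immediate[21:10] | Rn[9:5] | Rd[4:0]
    return (opcode << 22) | (immediate << 10) | (parse_reg(rn) << 5) | parse_reg(rd)
```

Calling `hex(encode_i_format("ADDI X16, XZR, 43"))` gives `0x9100aff0`. A full assembler also needs a first pass to resolve labels such as `fib` into branch offsets, plus a way to emit the words in whatever format the IMEM expects; both are left out here.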
Demo
Here is a demo of our processor calculating the Fibonacci sequence, incrementing at the press of a button. It’s running at a low clock speed on purpose to show it changing the numbers on the seven segment displays.
ADDI X16, XZR, 43 // Number of fib iterations
ADDI X5, XZR, 0 // Starting fib number (1)
ADDI X6, XZR, 1 // Starting fib number (2)
ADDI X23, XZR, 304 // Address of button to use
ADDI X24, XZR, 0
fib:
ADD X7, X6, X5 // Calc next fib number
ADDI X5, X6, 0
ADDI X6, X7, 0
SUBI X16, X16, 1 // Decrement fib iterator
ADDI X0, X5, 0
print_start:
ADDI X9, X9, 8 // Total number of display digits
ADDI X10, X22, 0 // Reset display digit pointer to 0
print_x:
STUR X0, [X10, 0] // Store digit into display
SUBI X9, X9, 1
ADDI X10, X10, 1 // Increment display digit pointer
LSR X0, X0, 4 // Get next digit to display
CBNZ X9, print_x // Break when displayed all 8 digits
btn_wait: // Button debouncing logic
LDUR X25, [X23, 1]
CBNZ X25, 3
ADDI X24, X25, 0
B btn_wait
CBZ X24, 3
ADDI X24, X25, 0
B btn_wait
CBNZ X16, fib // Stop if iteration count reached
B 0 // Stop
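For readers who would rather not trace the assembly by hand, here is a small Python model of what the `fib` and `print_x` loops compute. It is only a sketch: `fib_values` and `display_digits` are hypothetical names, the button wait is left out, and it assumes each display address keeps only the low 4 bits of the stored word, as the "4 bit word" entry in the memory map suggests.

```python
def fib_values(iterations=43):
    """Yield the value copied into X0 on each pass through the `fib` block."""
    x5, x6 = 0, 1
    for _ in range(iterations):
        # Mirrors: ADD X7, X6, X5 / ADDI X5, X6, 0 / ADDI X6, X7, 0
        x5, x6 = x6, x5 + x6
        yield x5


def display_digits(value, digits=8):
    """Split a value into 4-bit digits, least significant first, like `print_x`."""
    out = []
    for _ in range(digits):
        out.append(value & 0xF)  # assumed: the display keeps the low 4 bits of the store
        value >>= 4              # LSR X0, X0, 4
    return out
```

The first five values are 1, 1, 2, 3, 5, and the 43rd is 433494437 (0x19D699A5), which still fits in the eight 4-bit digits the loop writes.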
Future Work
In the weeks after the completion of Fio, some work was started on further expanding the CPU’s capabilities. Adding decimal multiplication functionality and expanded memory operations to the CPU were the top priorities. However, these new features were never completed. Given the modularity of the architecture, such features would be relatively simple to implement in the future.
Conclusion
We wanted to go above and beyond, to push our limits and knowledge, and so it felt really good to have the project we had been working on the entire semester be realized. For those interested, the source code is available here, licensed under the MIT license.