Fio: A LEG Ahead

Margret Riegert, Zev Pogrebin, Charles DePalma, 20 Sep 2023

blog@eowyn.net - views

Overview

Toolchain and Hardware

Architecture

Demo

Future Work

Conclusion

“I want to take responsibility for my work to the very end” – Fio Piccolo

Better late than never? This blog post describes a project we worked on at the end of the fall semester 2021, which was a little over 5 months (edit: 1 year 10 months) ago. In the interest of documentation and demonstrating what we accomplished, we decided to write this blog post.

Overview

About EE126

Tufts’ EE126 course is a lab-based computer architecture and RTL development deep-dive. The coursework is structured around a single long-term lab project: the development of a LEGv8 (an ARMv8-based ISA) processor. First, we learned about and built the individual components of the CPU. Next, we developed and tested a single-cycle design. Finally, we built a pipelined design incorporating hazard detection and forwarding. Throughout the process, we used ModelSim (and GHDL for the adventurous).

The three of us took the class in the fall of 2021, and by the end we were hungry for more. Unfortunately, there wasn’t enough time at the end of the semester for the class to have final projects. So we decided to take it into our own hands and do a final project just between us. Over the course of 12 hours on a Sunday, we put together this demo and had a lot of fun doing so.

About the Project

The final lab of EE126 left us with a functional LEGv8 CPU, but no way to run any useful programs with it. Test programs consisted of hard-coded data and instruction memory entities.

The idea for the project was to take the processor we had made, synthesize it to an FPGA, write an assembler, and run a basic program on it. This included several steps, from adding new components and testing them, to debugging overall problems that come from synthesizing code which had only been simulated previously, to writing an assembler specifically targeting the processor that would output in the format necessary for execution on our design.

Toolchain and Hardware

We used Vivado as our toolchain and a Nexys A7 board, as they were what we had available and what we were familiar with. We considered the open source FPGA toolchain (Yosys, GHDL, etc.) but decided against it because we didn’t have time to find an adequate board. The Nexys A7 had a lot of features that were desirable for this project, primarily its built-in buttons, LEDs, switches, and seven segment displays. This allowed us to give our processor a lot of functionality out of the box without needing to breadboard too much.

Architecture

Processor Implementation

The LEGv8 instruction set was the basis for our design. In previous labs, we implemented most of the instructions, including multiple opcodes within each instruction format. Given the simplicity of the architecture and the coherent setup of opcodes (the opcode bits almost directly correspond to a number of control unit outputs), adding more capabilities was simple. Several more LEG instructions were implemented, most notably, bit shifting.

A simplified diagram of Fio’s CPU is shown below:

[Figure: simplified diagram of Fio’s CPU]

Notable features:

Fio has a five stage pipeline with full forwarding and hazard detection logic. The forwarding logic passes instruction “results” directly to the later instructions that need them, before those results are written back to the register file, so most data dependencies do not require a stall. The Hazard Detection Unit (HDU) detects the remaining cases where stalling the CPU is necessary to allow data to propagate.
Aside from the HDU, CPU Control is accomplished using several components. First, the Control Unit detects the opcode and instruction format and generates the appropriate control logic signals. The ALU also has an intermediary opcode translation component called ALU Control.
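The forwarding and stall decisions described above can be modeled in a few lines. This is a minimal sketch of the textbook conditions for a five stage pipeline, not Fio's actual RTL: the `StageRegs` class, its field names, and both function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

XZR = 31  # register 31 always reads as zero in LEGv8, so it is never forwarded


@dataclass
class StageRegs:
    """Hypothetical subset of pipeline-register fields needed for the checks."""
    reg_write: bool = False  # instruction in this stage writes a register
    mem_read: bool = False   # instruction in this stage is a load
    rd: int = XZR            # destination register number


def forward_select(src_reg: int, ex_mem: StageRegs, mem_wb: StageRegs) -> str:
    """Choose where one ALU operand should come from."""
    if ex_mem.reg_write and ex_mem.rd != XZR and ex_mem.rd == src_reg:
        return "EX/MEM"  # the most recent result takes priority
    if mem_wb.reg_write and mem_wb.rd != XZR and mem_wb.rd == src_reg:
        return "MEM/WB"
    return "REGFILE"


def must_stall(id_ex: StageRegs, rn: int, rm: int) -> bool:
    """Load-use hazard: a load in EX produces a register the next instruction reads."""
    return id_ex.mem_read and id_ex.rd != XZR and id_ex.rd in (rn, rm)
```

A load followed immediately by a dependent instruction is the one case forwarding alone cannot cover, because the loaded value only exists after the memory stage; that is the situation `must_stall` flags.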

Memory Map

All of the Nexys’ switches, buttons, LEDs, and displays were mapped into the data memory (DMEM). For instance, writing 0x4 to a display digit is as simple as writing b0100 to a specific memory address. The processor would have to handle converting to decimal internally.

Addresses	Function	Description
`x000` - `x0FF`	RAM	Byte addr, 64 bit words
`x100` - `x10F`	Switches	1 bit word, 63 bit padding
`x110` - `x11F`	LEDs	1 bit word, 63 bit padding
`x120` - `x127`	Seven seg displays	4 bit word, 60 bit padding
`x130` - `x134`	Buttons	1 bit word, 63 bit padding
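To make the table concrete, here is a small sketch of how an address could be resolved to a region and an index within it. The address ranges are copied from the table above; the `REGIONS` list and the `decode` function are hypothetical names for illustration and do not come from the project's source.

```python
# Address ranges copied from the memory map table (inclusive bounds).
REGIONS = [
    (0x000, 0x0FF, "RAM"),
    (0x100, 0x10F, "Switches"),
    (0x110, 0x11F, "LEDs"),
    (0x120, 0x127, "Seven seg displays"),
    (0x130, 0x134, "Buttons"),
]


def decode(addr):
    """Return (region name, offset within the region), or None if unmapped."""
    for low, high, name in REGIONS:
        if low <= addr <= high:
            return name, addr - low
    return None
```

For example, `decode(304)` resolves to the first button, since 304 is 0x130 in decimal.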

The instruction memory (IMEM) was a simple 2D array containing the machine code to execute, and was loaded into the design before synthesis.

The CPU and memory units were connected as shown here.

[Figure: connections between the CPU and the memory units]

Assembler

As no processor is complete without a toolchain, we also designed and built a simple assembler. This meant the design process for programs would start with ponyo, to ensure the program worked at a baseline, before moving on to being run through the assembler, and finally to being synthesized onto the board for a real hardware test. Charles primarily focused on building the assembler, as he was most familiar with that area and already had a lot of good ideas for how to complete it.
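As a rough idea of what the core of such an assembler does, the sketch below encodes a single I-format instruction into a 32 bit word. It is not the project's assembler: the function names are hypothetical, only `ADDI` and `SUBI` are handled, and the layout (10 bit opcode, 12 bit immediate, 5 bit Rn, 5 bit Rd) and opcode values are the standard LEGv8 ones rather than anything confirmed from the Fio source. The output format the real assembler used is not described here, so the sketch just returns an integer.

```python
import re

# Standard LEGv8 10-bit opcodes for two I-format instructions.
I_OPCODES = {"ADDI": 0b1001000100, "SUBI": 0b1101000100}


def reg_number(name: str) -> int:
    """Map X0..X30 or XZR to its 5-bit register number."""
    name = name.upper()
    if name == "XZR":
        return 31
    if not re.fullmatch(r"X([0-9]|[12][0-9]|30)", name):
        raise ValueError(f"unknown register: {name}")
    return int(name[1:])


def encode_i_format(line: str) -> int:
    """Encode e.g. 'ADDI X16, XZR, 43' as opcode | imm12 | Rn | Rd."""
    code = line.split("//", 1)[0]              # drop a trailing comment
    mnemonic, operands = code.split(None, 1)
    rd, rn, imm = [part.strip() for part in operands.split(",")]
    value = int(imm, 0)
    if not 0 <= value < 4096:
        raise ValueError("immediate does not fit in 12 bits")
    opcode = I_OPCODES[mnemonic.upper()]
    return (opcode << 22) | (value << 10) | (reg_number(rn) << 5) | reg_number(rd)


print(hex(encode_i_format("ADDI X16, XZR, 43 // Number of fib iterations")))
# prints 0x9100aff0
```

A full assembler would also need the other instruction formats and a first pass that resolves labels to branch offsets.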

Demo

Here is a demo of our processor calculating the Fibonacci sequence, advancing to the next number at the press of a button. It’s running at a low clock speed on purpose to show it changing the numbers on the seven segment displays.

    ADDI X16, XZR, 43 // Number of fib iterations
    ADDI X5, XZR, 0   // Starting fib number (1)
    ADDI X6, XZR, 1   // Starting fib number (2)
    ADDI X23, XZR, 304 // Address of button to use
    ADDI X24, XZR, 0
fib:
    ADD  X7, X6, X5   // Calc next fib number
    ADDI X5, X6, 0
    ADDI X6, X7, 0
    SUBI X16, X16, 1  // Decrement fib iterator
    ADDI X0, X5, 0

print_start:
    ADDI X9,  X9,  8  // Total number of display digits
    ADDI X10, X22, 0  // Reset display digit pointer to 0
print_x:
    STUR X0, [X10, 0] // Store digit into display
    SUBI X9,  X9,  1
    ADDI X10, X10, 1  // Increment display digit pointer
    LSR  X0,  X0,  4  // Get next digit to display
    CBNZ X9, print_x  // Break when displayed all 8 digits

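btn_wait:             // Button debouncing logic (X24 holds the previous button state)
    LDUR X25, [X23, 1] // Read the current button state
    CBNZ X25, 3       // Pressed: skip ahead 3 instructions to the CBZ check
    ADDI X24, X25, 0  // Not pressed: remember the state
    B btn_wait
    CBZ  X24, 3       // Previously released, so this is a new press: skip ahead 3
    ADDI X24, X25, 0  // Still held from before: remember the state
    B btn_wait

    CBNZ X16, fib     // Loop while iterations remain
    B 0               // Stop (branch to self)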
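For readers who want to check what the program should show, here is a small software model of its arithmetic. It is only a sketch: the function names are hypothetical, it ignores the button wait entirely, and it assumes each display digit keeps just the low 4 bits of the value stored to it (consistent with the 4 bit word in the memory map), so numbers appear in hexadecimal with the least significant digit written first.

```python
def fib_values(iterations):
    """Values copied into X0 on each pass through the fib loop."""
    x5, x6 = 0, 1
    shown = []
    for _ in range(iterations):
        x7 = x6 + x5      # ADD  X7, X6, X5
        x5, x6 = x6, x7   # ADDI X5, X6, 0 / ADDI X6, X7, 0
        shown.append(x5)  # ADDI X0, X5, 0
    return shown


def display_digits(value, digits=8):
    """Digits written by print_x, least significant first."""
    out = []
    for _ in range(digits):
        out.append(value & 0xF)  # assumed: the display keeps the low 4 bits
        value >>= 4              # LSR X0, X0, 4
    return out


print(fib_values(6))         # prints [1, 1, 2, 3, 5, 8]
print(display_digits(0x1F))  # prints [15, 1, 0, 0, 0, 0, 0, 0]
```

With the iteration count of 43, the last value in this model is 433494437, which still fits in eight hexadecimal digits.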

Future Work

In the weeks after the completion of Fio, we started some work on further expanding the CPU’s capabilities. Adding decimal multiplication and expanded memory operations were the top priorities. However, these new features were never completed. Given the modularity of the architecture, such features would be relatively simple to implement in the future.

Conclusion

We wanted to go above and beyond, to push our limits and knowledge, and so it felt really good to have the project we had been working on the entire semester be realized. For those interested, the source code is available here, licensed under the MIT license.
