The cover image is licensed under the Creative Commons Attribution-Share Alike 4.0 International license and is copyright Thomas Nguyen.
I’ve been curious recently about diving deeper into how a CPU works, and possibly trying to design my own. I began by wondering how a 4-bit CPU would work, and while doing some research I came across the Intel 4004. It’s almost 50 years old at this point, and at first I thought it’d be easy to figure out how it worked. I quickly realized this was not the case.
It turns out that around the chip’s 35th anniversary, Intel released its schematics and manuals under a non-commercial license. A small but fervent community has sprouted from this: Reece is working to build the processor out of discrete transistors on their very thorough and knowledgeable blog, the Arduino-compatible Retroshield 4004 lets you use a real 4004 processor in a more modern system, and there is the unofficial anniversary project. Before researching, I had assumed that everyone had forgotten about this processor. How wrong I was!
As a first step, I decided to try writing an assembler for this machine. There is an assembler for the 4004 here that I used to verify my results. Maybe in the future I’ll get around to disassembling the CPU itself further, as my original plan was to recreate it in VHDL and put it on a board like the Upduino. I also decided to use Python, as it let me work quickly without getting bogged down by the pain points around strings in languages like C. For the most part, this was written in a day back in December 2020 (three months isn’t too late for a blog post, right?).
There were a few different types of instructions that I needed to handle:
- 1-byte instructions that don’t take any parameters (NOP, CLB, etc.)
- 1-byte instructions that take a register, register pair, or data
- 2-byte instructions that take registers, addresses, and/or data
I started writing the assembler with a dictionary at the beginning of the program that mapped every instruction to its opcode. It was then easy to set up a command-line interface, read a file, and split that file into a list of lines that could be parsed. If the instruction on a line wasn’t in the dictionary, the program would error out; if it was, the corresponding opcode would be written to an output file. At this point, all the 1-byte instructions that didn’t take parameters were done.
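A minimal sketch of that first stage might look like the following. It is not the actual asm-4004 source: SIMPLE_OPCODES only covers a handful of the 4004’s parameterless instructions, and assemble_simple is a hypothetical name. The opcode values come from the MCS-4 instruction set and agree with the listing later in this post (CLC is f1, DAA is fb, and so on).

```python
import sys

# A subset of the 4004's parameterless 1-byte instructions and their opcodes.
SIMPLE_OPCODES = {
    "nop": 0x00,
    "wrm": 0xE0,
    "rdm": 0xE9,
    "adm": 0xEB,
    "clb": 0xF0,
    "clc": 0xF1,
    "iac": 0xF2,
    "daa": 0xFB,
}


def assemble_simple(lines):
    """Translate lines that each hold one parameterless mnemonic into bytes."""
    out = bytearray()
    for lineno, line in enumerate(lines, start=1):
        mnemonic = line.strip().lower()
        if not mnemonic:
            continue  # ignore blank lines
        if mnemonic not in SIMPLE_OPCODES:
            # Error out on anything the dictionary doesn't know about.
            sys.exit(f"line {lineno}: unknown instruction {mnemonic!r}")
        out.append(SIMPLE_OPCODES[mnemonic])
    return bytes(out)


if __name__ == "__main__":
    print(assemble_simple(["CLC", "RDM", "ADM", "DAA", "WRM"]).hex())
```

Run directly, this prints f1e9ebfbe0. Writing the result to a file instead is a matter of opening the output path in "wb" mode and writing the returned bytes.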
The next step was to add comments. This was done by searching each line for a semicolon (;) and throwing out everything after it, which means a comment can either be on its own line or at the end of a line.
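Stripping comments this way only takes a line or two of Python. The helper names below (strip_comment, clean_lines) are illustrative, and dropping lines that end up empty is an assumption about how comment-only lines get handled.

```python
def strip_comment(line):
    """Drop everything from the first ';' onward, then trim whitespace."""
    return line.split(";", 1)[0].strip()


def clean_lines(text):
    """Strip comments from every line and drop lines that are left empty."""
    stripped = (strip_comment(line) for line in text.splitlines())
    return [line for line in stripped if line]


if __name__ == "__main__":
    source = "; 16-DIGIT DECIMAL ADDITION ROUTINE\nLDM 0 ; LOAD 0 TO AC\nCLC\n"
    print(clean_lines(source))  # ['LDM 0', 'CLC']
```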
Instructions that took arguments (1-byte and 2-byte) were next. Each line was split on spaces, and each part was checked against what the instruction expected: hex and decimal numbers were converted into their binary equivalents, and register and register-pair names were translated into theirs.
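As an illustration (a sketch, not the assembler’s real code), the operand parsing and bit packing for the argument-taking instructions used in the example program further down might look like this. The encodings follow the 4004 instruction formats: a register or register-pair number is ORed into the low nibble of the first byte, and JUN/JMS carry a 12-bit address split across both bytes. The accepted spellings (r6 or 6 for a register, p2 or 2 for a pair, 0x-prefixed hex or plain decimal for numbers), splitting operands on commas, and every function name here are assumptions; symbols are not handled.

```python
def parse_number(token):
    """Parse a decimal literal ('48') or a 0x-prefixed hex literal ('0x30')."""
    token = token.strip().lower()
    return int(token, 16) if token.startswith("0x") else int(token, 10)


def parse_indexed(token, prefix, limit, what):
    """Accept e.g. 'r6' or '6' (prefix 'r') and check the value is below limit."""
    token = token.strip().lower()
    value = parse_number(token[1:] if token.startswith(prefix) else token)
    if not 0 <= value < limit:
        raise ValueError(f"{what} out of range: {token}")
    return value


def reg(token):
    return parse_indexed(token, "r", 16, "register")  # 16 index registers


def pair(token):
    return parse_indexed(token, "p", 8, "register pair")  # 8 register pairs


def fit(value, bits, what):
    """Ensure an immediate value fits in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{what} does not fit in {bits} bits: {value}")
    return value


def encode(mnemonic, ops):
    """Encode one argument-taking 4004 instruction with numeric operands."""
    m = mnemonic.lower()
    # 1-byte forms: opcode in the high nibble, operand in the low nibble.
    if m == "inc":
        return bytes([0x60 | reg(ops[0])])
    if m == "xch":
        return bytes([0xB0 | reg(ops[0])])
    if m == "ldm":
        return bytes([0xD0 | fit(parse_number(ops[0]), 4, "data")])
    if m == "src":
        return bytes([0x21 | (pair(ops[0]) << 1)])
    # 2-byte forms: the second byte is 8-bit data or an in-page address.
    if m == "fim":
        return bytes([0x20 | (pair(ops[0]) << 1), fit(parse_number(ops[1]), 8, "data")])
    if m == "isz":
        return bytes([0x70 | reg(ops[0]), fit(parse_number(ops[1]), 8, "address")])
    if m == "jcn":
        return bytes([0x10 | fit(parse_number(ops[0]), 4, "condition"),
                      fit(parse_number(ops[1]), 8, "address")])
    if m in ("jun", "jms"):
        # 12-bit address: high nibble in the first byte, low byte in the second.
        addr = fit(parse_number(ops[0]), 12, "address")
        return bytes([(0x40 if m == "jun" else 0x50) | (addr >> 8), addr & 0xFF])
    raise ValueError(f"unsupported instruction: {mnemonic}")


def assemble_line(line):
    """Split 'FIM p2, 48' into a mnemonic and comma-separated operands."""
    mnemonic, _, rest = line.strip().partition(" ")
    ops = [op.strip() for op in rest.split(",") if op.strip()]
    return encode(mnemonic, ops)


if __name__ == "__main__":
    print(assemble_line("FIM p2, 48").hex())  # 2430
```

The 2430 output matches the 24 30 that asm-4004 produces for the same line in the listing further down.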
Symbols were implemented by adding an extra pass, run before the translation pass, that collected the names and values of all the symbols. Those values were then substituted in during the translation pass. One limitation I ran into, though, was that symbols had to be placed on their own line; otherwise the assembler would try to parse them as instructions.
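The symbol-collecting pass can be sketched as below, under a few assumptions. As described above, a label sits alone on its own line; to tell a label apart from a parameterless instruction like CLC, the sketch checks the token against a set of the 4004’s 46 mnemonics. NAME=value lines define constants, and the program counter advances by two for the five 2-byte instructions (JCN, FIM, JUN, JMS, ISZ) and by one otherwise. Pseudo-instructions are ignored, and collect_symbols and substitute are hypothetical names.

```python
# The 46 instruction mnemonics of the Intel 4004.
MNEMONICS = {
    "nop", "jcn", "fim", "src", "fin", "jin", "jun", "jms", "inc", "isz",
    "add", "sub", "ld", "xch", "bbl", "ldm", "wrm", "wmp", "wrr", "wpm",
    "wr0", "wr1", "wr2", "wr3", "sbm", "rdm", "rdr", "adm", "rd0", "rd1",
    "rd2", "rd3", "clb", "clc", "iac", "cmc", "cma", "ral", "rar", "tcc",
    "dac", "tcs", "stc", "daa", "kbp", "dcl",
}
# Instructions that occupy two bytes of ROM; everything else is one byte.
TWO_BYTE = {"jcn", "fim", "jun", "jms", "isz"}


def parse_number(token):
    """Parse a decimal literal or a 0x-prefixed hex literal."""
    token = token.strip()
    return int(token, 16) if token.lower().startswith("0x") else int(token, 10)


def collect_symbols(lines):
    """First pass: map each symbol name (lowercased) to its value.

    Expects comment-free lines. A line holding a single token that is not a
    mnemonic is treated as a label for the current address.
    """
    symbols = {}
    pc = 0  # address of the next instruction byte
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if "=" in text:  # constant definition such as CN=5
            name, value = text.split("=", 1)
            symbols[name.strip().lower()] = parse_number(value)
            continue
        tokens = text.split()
        first = tokens[0].lower()
        if len(tokens) == 1 and first not in MNEMONICS:  # label on its own line
            symbols[first] = pc
            continue
        pc += 2 if first in TWO_BYTE else 1
    return symbols


def substitute(line, symbols):
    """Second-pass helper: replace symbolic operands with decimal values."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    resolved = [str(symbols.get(op.lower(), op)) for op in operands]
    return f"{mnemonic} {', '.join(resolved)}".strip()
```

Fed the 16-digit addition routine shown below (with comments stripped), this sketch yields the same addresses as the symbol table asm-4004 prints, such as adi at 0x7 and overfl at 0x11, and substitute turns "JCN CN, XXX" into "JCN 5, 21".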
Taking inspiration from other assemblers, I added *=pos and .byte X pseudo-instructions, where *=pos jumps to position pos in the ROM, and .byte X inserts X bytes of NOP.
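One way to model these two pseudo-instructions is to keep a write pointer into a 256-byte ROM image, as in the sketch below. RomWriter is a hypothetical class, and pre-filling the image with 0x00 (the NOP opcode) and raising an error on overflow are assumptions, not necessarily what asm-4004 does.

```python
def parse_number(token):
    """Parse a decimal literal or a 0x-prefixed hex literal."""
    token = token.strip().lower()
    return int(token, 16) if token.startswith("0x") else int(token, 10)


class RomWriter:
    """Hypothetical helper that tracks where the next byte goes in one ROM."""

    def __init__(self, size=256):
        self.rom = bytearray(size)  # starts as all 0x00, the NOP opcode
        self.pos = 0

    def emit(self, data):
        """Write bytes at the current position and advance past them."""
        end = self.pos + len(data)
        if end > len(self.rom):
            raise ValueError(f"ROM overflow at position {self.pos}")
        self.rom[self.pos:end] = data
        self.pos = end

    def pseudo(self, line):
        """Handle '*=pos' and '.byte X'; return False for any other line."""
        text = line.strip().lower()
        if text.startswith("*="):
            pos = parse_number(text[2:])
            if not 0 <= pos <= len(self.rom):
                raise ValueError(f"position out of range: {pos}")
            self.pos = pos  # jump to pos
            return True
        if text.startswith(".byte"):
            count = parse_number(text[len(".byte"):])
            self.emit(bytes(count))  # insert `count` NOP (0x00) bytes
            return True
        return False
```

Because instructions and padding both go through emit, the pointer always reflects the next free address, which is also what a "ROM is full" check needs.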
At this point I started adding more CLI functionality, such as a printout that looks something like this, listing all the symbols and their values alongside the code, its location in the ROM, and the resulting hexadecimal output.
$ # example file from reference manual
$ cat src/16_dig_add.asm
; 16-DIGIT DECIMAL ADDITION ROUTINE
; TAKEN FROM MCS-4 MANUAL
ADDITN
FIM r0, 0 ; IR(0-1) = 0
FIM p2, 48 ; IR(4)=3, IR(5)=0
LDM 0 ; LOAD 0 TO AC
XCH 6 ; EXCHANGE C(AC) and IR(6)
CLC ; CLEAR CARRY REG
ADI
SRC p2 ; DEFINE RAM ADDRESS $<1
RDM ; READ RAM TO AC
SRC p0 ; DEFINE RAM ADDRESS
ADM ; ADD C(RAM) TO AC, CARRY ENABLED
DAA ; DECIMAL ADDRESS ACC
WRM ; WRITE AC TO RAM
INC 1 ; INCREMENT IR(1)
INC 5 ; INCREMENT IR(5)
ISZ 6, ADI ; IR(6)=IR(6)+1, SKIP IF C(IR(6))=0
OVERFL
JCN CN, XXX ; TEST CARRY, JUMP IF
JUN NEXT ; SEE NOTE $<2
XXX
LDM 0 ; LOAD AC WITH 0
XCH 10 ; EXCHANGE IR(10) AND AC
OVFL1
FIM p1, 216 ; IR(1)=8, IR(2)=13 [x]
JMS PRINT
ISZ 10, OVFL1 ;IR(10)=IR(10)+1, SKIP IF IR(10)=0
FIM p2, 0 ; SET IR(4-5)=0
JMS CLRRAM ; CLEAR RAM DATA
JUN NEXT ; SEE NOTE $<2
CN=5
; DUMMY ARGUMENTS
CLRRAM=0xC0
NEXT=0x80
PRINT=0xE8
; $<1 RAM ADDRESSING DEFINE AS TO STANDARDS IN
; SPEC SHEET.
; BITS NUMBERED FROM LEFT TO RIGHT MSB TO LSB
; 0 1 2 3 4 5 6 7
; BITS 0-1 SELECT RAM CHIP 1 OF 4
; BITS 2-3 SELECT RAM REGISTER 1 OF 4
; BITS 4-7 SELECT REGISTER CHARACTER 1 OF 16
;
; $<2 NEXT, PRINT, AND CLRRAM ARE ADDRESS TAGS USED FOR
; ASSEMBLY
; NEXT CAN BE THE RETURN POINT OF THIS ROUTINE
; CLRRAM AND PRINT ARE ROUTINES CALLED BY THIS PROGRAM
$ # running asm-4004 with said file, printing out the resulting code
$ ./asm-4004 -i src/16_dig_add.asm -v -m
Input file: src/16_dig_add.asm
Output file: src/16_dig_add.bin
Symbols:
additn 0x0
adi 0x7
overfl 0x11
xxx 0x15
ovfl1 0x17
cn 0x5
clrram 0xc0
next 0x80
print 0xe8
additn
$000 20 00 fim r0, 0
$002 24 30 fim p2, 48
$004 d0 ldm 0
$005 b6 xch 6
$006 f1 clc
adi
$007 25 src p2
$008 e9 rdm
$009 21 src p0
$00A eb adm
$00B fb daa
$00C e0 wrm
$00D 61 inc 1
$00E 65 inc 5
$00F 76 07 isz 6, adi
overfl
$011 15 15 jcn cn, xxx
$013 40 80 jun next
xxx
$015 d0 ldm 0
$016 ba xch 10
ovfl1
$017 22 d8 fim p1, 216
$019 50 e8 jms print
$01B 7a 17 isz 10, ovfl1
$01D 24 00 fim p2, 0
$01F 50 c0 jms clrram
$021 40 80 jun next
cn=5
clrram=0xc0
next=0x80
print=0xe8
20002430d0b6f125e921ebfbe06165760715154080d0ba22d850e87a17240050c04080
The terminal output uses color to separate the different elements and make it easier to read. For testing purposes, I also added the -m option, which prints the machine code in hex in addition to saving it to a file.
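Producing rows like the ones in that listing is mostly string formatting. The sketch below approximates it and is not the tool’s code: listing_line and format_listing are hypothetical names, the colors are left out, and padding the data column to five characters so the source column lines up is an assumption. The -m style string is every instruction’s bytes concatenated and hex-encoded. bytes.hex with a separator argument needs Python 3.8 or newer.

```python
def listing_line(address, data, source):
    """Format one listing row such as '$00F 76 07 isz 6, adi'.

    The address is three uppercase hex digits (4004 program addresses are
    12 bits), and the data column is padded to fit a 2-byte instruction.
    """
    return f"${address:03X} {data.hex(' '):<5} {source.lower()}"


def format_listing(rows):
    """Build the listing text and the -m style hex string.

    `rows` is an iterable of (address, bytes, source line) tuples.
    """
    rows = list(rows)
    listing = "\n".join(listing_line(a, d, s) for a, d, s in rows)
    machine_code = b"".join(d for _, d, _ in rows).hex()
    return listing, machine_code


if __name__ == "__main__":
    rows = [
        (0x000, bytes([0x20, 0x00]), "FIM r0, 0"),
        (0x004, bytes([0xD0]), "LDM 0"),
        (0x00F, bytes([0x76, 0x07]), "ISZ 6, ADI"),
    ]
    listing, machine_code = format_listing(rows)
    print(listing)
    print(machine_code)  # 2000d07607
```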
There was a lot of testing to work out as many bugs in the assembler as I could, checking its output against other assemblers I found online. Unfortunately, I did not get the chance to test the output in a simulator. I also added a warning to tell the user once they’ve used up all the memory in one ROM. I did toy around with the idea of a pseudo-instruction that would allow connecting multiple ROMs together so they could share symbols, but that proved to be out of scope for this small project. Who needs more than 256 bytes?
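A check along these lines would be enough for the ROM warning. It assumes a single 4001 ROM of 256 bytes; rom_warning and report are hypothetical names, and the message wording is made up for illustration. The 16-digit addition routine above ends at address $023, so it uses 35 bytes and would not trigger a warning.

```python
import sys

ROM_SIZE = 256  # a single 4001 ROM stores 256 bytes


def rom_warning(used, rom_size=ROM_SIZE):
    """Return a warning if `used` bytes fill or overflow one ROM, else None."""
    if used > rom_size:
        return f"warning: program needs {used} bytes, but one ROM holds {rom_size}"
    if used == rom_size:
        return f"warning: all {rom_size} bytes of the ROM are in use"
    return None


def report(used):
    """Print the warning, if any, to stderr so it stays out of the -m output."""
    message = rom_warning(used)
    if message:
        print(message, file=sys.stderr)
```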
As I keep messing around with the Intel 4004, I think going through the process of writing this assembler will prove to have been a useful endeavor. I’ve definitely learned a lot about how the processor works at a higher level, and I’ve been itching to write my own ISA for fun ever since. The source code for the assembler is available on GitHub here and is MIT licensed.