An assembler for LC-3
Purpose
This lab involves implementing a basic assembler for the LC-3 assembly language. My goal is to create a toy assembler that is capable of translating the assembler code into machine code.
Principles
Reading Instructions
The first step to achieve our goal is to find a way reading the instructions. In my case, I will read an instruction section by section based on spaces or some other punctuation marks.
This method allows me to break each instruction into different parts and interpret each instruction according to the meaning of each part. Another advantage of this is that I can quickly find similarities between different instructions and design some general functions to handle the translation task of different instructions.
The code for reading one part of an instruction is as follows.
1 | //Read one part of a instruction (between tow space characters) |
According to the requirements of the assembly code format of the experiment, a single instruction can be thus decomposed into multiple components. This achievement marks the first step in breaking down a line of instructions into different elements (such as labels, opcodes, operands, etc.) and subsequently performing specific operations based on the unique characteristics of each component.
The First Pass
The process of translate the assembly program consists of two pass. The aim of the first pass is to build a symbol table of those label appeared in program. To build the symbol table, I designed two data structures for the label and the table.
1 | //The data structure of label |
And then I can build the table while the first time the program traverses those instructions.
Since each label must appear once at the beginning of a line of code, I only need to read the first part of each code on the first traverse, and then determine that it is not part of the LC-3 instruction set, such a note must be a label, the last thing I need to do is to record its name and address.
To do this, I have to construct the LC-3 instruction set in memory, which is also quite simple. All I need is just a liner table which stores those 15 instructions.
There are few things need to be noticed. Because of the .STRINGZ
instruction and .BLKW
instruction may occupy more than one word in memory, we need to consider the change they make during the process of building the symbol table.
After doing all these things, I can implement the first pass.
1 | int index = 0; //The position of the instruction reading point |
The above code shows how to create a symbol table. And of course I considered the influence of .STRINGZ
and .BLKW
.
Translation of a Single Instruction
For the translation of a single instruction, the point is to classify the instructions, find the same or similar format instructions, and take roughly the same treatment for them.
In my case, I treat the ADD
instruction and the AND
instruction as the same class instruction, the memory access instructions (e.g. LD
, ST
, LDR
, STI
) as one type, and the pseudo (e.g. .ORIG
, .FILL
) instructions as one type.
When these instructions are classified, we can make corresponding operations according to each part of the instruction. For example, for the instructions ADD R0, R0, #5
, I can take the first part of it, identify its opcode is 0001
, and then read the second part of it, identify it to register R0
, and convert it to binary 000
. Then read the third part, convert to 000
, and finally read #5
, convert it to the five-digit immediate number 00101
.
Since the AND
directive has the same format as the ADD
directive, I can handle them in the same way. Only in the process of implementing these functions, I also need to implement some additional functions such as base conversion, condition judgment and so on.
The instructions to access memory basically involve all of the above procedures, and I will show this part of my program. The following code shows the way I deal with the memory access instructions.
1 | void Instrucion_LS(const char instruction[], char machine_code[], char Part[], int index){ |
I will explain some of the functions not mentioned in the above program in the next section.
Procedure
According to the textbook, we can translate the assembler program in a two pass process.
- Read the instructions from a
.asm
file. - First pass to create the symbol table.
- Sectond pass to translate every instruction.
- Stores the machine code as a
.text
file.
In the process of translating a single instruction, the biggest problem is to realize the conversion between data in different bases. Since machine code is a 16-bit binary string, and assembly code may contain decimal or hexadecimal numbers, my program needs to be able to correctly identify them and convert them to a binary string of corresponding bits. For example, for the code ADD R0, R7, xA
, my program needs to be able to recognize R0
, R7
, the two decimal numbers representing registers, and then convert them into the 3-bit binary string 000
and 111
, for xA
, My code needs to be able to recognize this immediate hexadecimal number and convert it to the 5-bit binary number 01010
.
So I designed the following function transform
, whose function is to recognize an incoming decimal or hexadecimal number, convert it into a binary string requiring an extended number, and return a corresponding decimal unsigned number.
1 | unsigned short transform(const char input[], char output[], int expend){ |
Using this function, I can implement the register, immediate number and PC-Offeset translation in assembly code.
For the linear table I mentioned in section 2.2, I designed the following data structure.
1 | typedef struct INSTRUCTION |
Now I just need to traverse through the table to determine what an instruction is and find its corresponding opcode.
Results
To test my program, I wrote a short assembler program including most of the characteristics of LC-3 instructions.
1 | .ORIG x3000 |
Here is my program’s translation of above assembler program.
1 | 0011000000000000 .ORIG x3000 |
Load the machine code into the LC-3 simulator, the running result is Hello world!USTCUUTC
, which meets the expectation.