A while ago I decided the learn the basics of the Assembly language for my x64 processor. Assembly has a reputation for being difficult and elitist which is keeping many people from trying it out. But once you learn the basics, it is actually really simple. In fact, in many ways it is much simpler than other programming languages. Using Linux as our host system, we can make system calls to the kernel for many operations like reading and writing standard input/output and files. Also the x86 instruction set allows us to perform string operations and "loops" directly with a single instruction, condensing many common operations into just a few lines of code.
I am writing this as a "beginner's guide" with the assumption that you have some knowledge of C (or similar language), but don't have knowledge of Assembly yet. That is also the place where I started my journey into the world of Assembly. I am not suggesting using Assembly as your daily language in projects where you need to be productive, but I think learning the basics is a valuable tool for understanding how a computer works at a lower level, and will make you a better programmer in high-level languages.
When writing Assembly, we are dealing with and manipulating the CPU directly. The CPU contains "memory locations" called registers and programming in Assembly consists mainly of moving data in and out of those registers and performing operations on the data while it is in a register, such as comparison or math operations like addition or subtraction. Additionally we have jump instructions that can jump to arbitrary place in the code (which is really just a memory address) and continue execution from that point. It works much like the goto statement in C. This conveniently handles all the cases where we would normally use if, while, for, or a function call.
To turn human-readable code into machine-readable code, we need an assembler. This is similar to compiler in C but simpler because the code we are writing is already closer to machine code to begin with. We will be using NASM, or Netwide Assembler, for these examples. It should be available as nasm package in all common Linux distributions. We will also be using the GNU linker, ld, which is in the binutils package, and the GDB debugger.
Make sure those are installed before proceeding if you wish to have an interactive experience:
$ nasm --version NASM version 2.15.05 compiled on Sep 24 2020 $ ld --version GNU ld (GNU Binutils) 2.36.1 $ gdb --version GNU gdb (GDB) 10.2
Before getting into the tedious technical details, lets start with a "Hello World" example that all software developers are familiar with. I guess it is something of a tradition that all writings about programming languages have to start with this specific example which outputs the text "Hello, World!" on the screen.
;; Hello World in Assembly .data text db `Hello, World!\n` .text _start _start: mov rax, 1 mov rdi, 1 mov rsi, text mov rdx, 14 syscall mov rax, 60 mov rdi, 0 syscall
.data is used to define initialized data like strings or more generally a sequence of bytes. The
db stands for define bytes. Strings inside backquotes support escape sequences like the newline character shown here. Strings can also be delimited by normal single or double quotes.
.text is used to define code. Code is limited by labels which are just words followed by a colon. They mark a specific location in code that we can jump to. The
_start is a special label which represents the starting point of the whole program, similar to the main function in C. The
global keyword declares a label to be visible from outside the file. This would be similar to declaring it "public" in some other languages.
The code consists of performing two system calls. Each system call has a unique number and we store it in the RAX register. Number 1 corresponds to write and number 60 corresponds to exit. The second argument to write, stored in RDI, is the file where we want to write. Number 1 for files corresponds to standard output (stdout). The third argument is the text to write and 14 is the number of bytes to write.
If you save the file as hello.asm then you can assemble it with NASM into an object file:
$ nasm -f elf64 -o hello.o hello.asm
We use ELF64 as the format to specify we are assembling a 64-bit Linux application. Next we use the linker to turn the object file into a proper executable:
$ ld hello.o -o hello
If you have a larger program that consists of multiple source files, you can link multiple object files together into an executable just like you would in C. You can also mix object files from C and Assembly and link them all together into an executable.
Now we should have an executable ready to run, which does what we expect:
$ ./hello Hello, World!
If you check the file sizes, they are really tiny:
$ ls -l -rwxr-xr-x 1 pekka pekka 8840 2021-08-01 11:46 hello -rw-r--r-- 1 pekka pekka 230 2021-08-01 11:28 hello.asm -rw-r--r-- 1 pekka pekka 848 2021-08-01 11:42 hello.o
If you struggle with having "bloat" in your applications, this method of writing software is a good way to eliminate some of that. Next we must dwell a little deeper into some technical details to write meaningful programs that do something more interesting. But I will keep it simple, I promise!
Registers are the heart and soul of your CPU. Here is a table of some important registers that you should know about. RAX to RBP and R8 to R15 are called general-purpose registers. You can freely use all general-purpose registers with some caveats explained below. Registers XMM0 to XMM15 are used for working with floating-point values.
|128 bit||64 bit||32 bit||16 bit||8 bit MSB||8 bit LSB||Notes|
|RBX||EBX||BX||BH||BL||Base index (arrays)|
|RCX||ECX||CX||CH||CL||Counter (loops, strings)|
|RSI||ESI||SI||SIL||Source index (strings)|
|RDI||EDI||DI||DIL||Destination index (strings)|
|RSP||ESP||SP||SPL||Stack pointer (top of stack)|
|RBP||EBP||BP||BPL||Stack base pointer (bottom of stack)|
|R8 to R15||R8D to R15D||R8W to R15W||R8B to R15B||8 additional general purpose registers|
|RIP||EIP||IP||Instruction pointer (program counter)|
|XMM0 to XMM15||16 floating-point registers|
Since we are working with a 64-bit system and writing 64-bit code, we are mostly concerned with the column showing 64-bit registers starting with RAX. The 32-, 16- and 8-bit registers refer to the lower portions of the same 64-bit register. For example EAX refers to the lowest 32 bits of RAX, not a separate 32-bit register.
FLAGS is a special register which is used to test various conditions after comparison or arithmetic instruction is executed. For example, it can be used to check if the result of the last instruction is zero or not, if the result is even or odd, or if it caused an overflow. Here are some flags:
- CF - Carry flag
- PF - Parity flag
- ZF - Zero flag
- SF - Sign flag
- OF - Overflow flag
- AF - Adjust flag
- IF - Interrupt flag
Most of the time you don't need to deal with the FLAGS register explicitly and you can use conditional jumps instead like jump-if-zero, jump-if-signed or jump-if-overflow that implicitly check the corresponding flag.
There are lots of instructions in the x86-64 instruction set, but you only need a handful to get started with writing awesome code.
mov instruction is used to move values into and out of registers. I would say it is the most used instruction. It takes destination and source as arguments:
;; Move value 5 into RAX register: mov rax, 5 ;; Move value of RBX into RAX: mov rax, rbx
We can also have "pointers" by having a memory address in a register and then access that memory location directly. The notation for this is square brackets around the register name:
;; Move the value at memory address where RBX is pointing into RAX: mov rax, [rbx]
Jump instructions allow us to jump to the location of a given label and continue execution from that point. The most simple case is the unconditional jump instruction
jmp which always jumps without checking any conditions:
myLabel: jmp myLabel
The above code causes an infinite loop because the code jumps to
Compare and Conditional Jump
Often it is more useful to jump only if a given condition is true, like the if statement in C. We can use the
cmp instruction to compare two values. It takes the form
cmp a, b where a is always a register and b is a register or a value. The compare instruction sets status flags in the FLAGS register according to the result.
After the compare instruction has been executed, we can perform a conditional jump based on the result of the comparison. Here are some instructions for conditional jumps:
|Instruction (signed)||Instruction (unsigned)||After
|je||a = b|
|jne||a != b|
|jg||ja||a > b|
|jge||jae||a >= b|
|jl||jb||a < b|
|jle||jbe||a <= b|
|jz||a = 0|
|jnz||a != 0|
|jo||Overflow flag is set|
|jno||Overflow flag is not set|
|js||Sign flag is set|
|jns||Sign flag is not signed|
Note that some instructions have different versions for dealing with signed and unsigned values. Here is an example of compare and jump:
cmp rax, rbx jg myLabel
This code would jump to
myLabel if the value in RAX is greater than the value in RBX.
The stack is a block of memory reserved for your application by the operating system. It operates as a Last-In-First-Out (LIFO) data structure. New items can be added on the top of the stack using the
push instruction. Reading from the stack using the
pop instruction always returns the last item that was added.
You can use the stack to store values temporarily. For example something like this:
push rax ; push the current value of rax on stack ;; use rax for something else pop rax ; pop the last value from stack into rax
Note that it is generally better to use registers instead of stack when possible. Registers are located on the CPU itself while the stack is located in the RAM memory, so it is significantly faster to move values between two registers than access the stack.
The stack pointer (RSP) and stack base pointer (RBP) registers are used to manage the stack. Generally you don't need to use these registers directly.
Arithmetic functions take a register as the first argument and a register or a value as the second argument. The first register is the subject of the operation and the result is stored in it. Note that reg below refers to any register while rax refers to the specific RAX register. Generally these functions also set the status flags in the FLAGS register so you can immediately execute a conditional jump without using a
cmp instruction in between.
|add a, b||a = a + b|
|sub a, b||a = a - b|
|mul reg||imul reg||rax = rax * reg|
|div reg||idiv reg||rax = rax / reg|
|neg reg||reg = -reg|
|inc reg||reg = reg + 1|
|dec reg||reg = reg - 1|
|adc a, b||a = a + b + CF|
|sbb a, b||a = a - b - CF|
mov rax, 3 mov rbx, 2 add rax, rbx ;; rax will contain the value 5
Division comes with a two caveats. First, RDX is used together with RAX as RDX:RAX to make a 128-bit value and then that value is divided. Second, the remainder of the division is stored in RDX. For example:
mov rax 7 mov rdx 0 mov rcx 3 div rcx ;; rax will contain value 2 ;; rdx will contain value 1
An excellent reference for x86/x64 instructions is maintained by Félix Cloutier. Use it to look up any instructions and how to use them.
Functions / Procedures / Subroutines
Just like in other languages, in order to write a larger program in Assembly, you need to divide it into functions that can be called independently. They are also called procedures or subroutines but I am not quite sure which is the most accurate term in this context. A function in Assembly is just a label followed by instructions (the function body) and finally a
A function is "called" with the
call instruction which jumps to the given label and pushes a return address on the stack. The
ret instruction then reads the return address from the stack and jumps to it.
Calling conventions were established in order to have a standard for how registers are used in function calls. Note that calling convention is specific to your operating system. This article deals with Linux specifically, but you can see the Windows calling convention for comparison.
Function arguments are set in registers according to the following table:
|Argument #||Register in user-space||Register in kernel-space|
All registers for arguments are same in user-space and kernel-space except the fourth. You should use the user-space convention in your own functions and the kernel-space convention when performing system calls.
Another part of calling convention determines who is responsible for restoring the original value into a register if it is modified. For this purpose registers are divided into caller-saved and callee-saved registers. The values of caller-saved registers must be saved by the parent procedure (caller), and possibly restored after the call into a subroutine has finished. The values of callee-saved registers must not be changed by the subroutine (callee), or if changed, must be restored to their original value before the call returns.
The registers RBX, RSP, RBP and R12 to R15 are callee-saved. All other registers are caller-saved and may be modified freely by the subroutine.
The return value of a procedure call is stored in RAX for integers and in XMM0 for floating-point values.
System calls are functions on the Linux kernel that we can call from our programs. They are identified by a number which is stored in the RAX register and the system call is invoked with the
Here are a few system calls for working with files and exiting the program:
|Name||Id||Arg 1||Arg 2||Arg 3|
You can find a list of all supported system calls and their numbers in the file /usr/include/asm/unistd_64.h on your system.
On Linux it is convenient to use standard input and output to print text on the screen or read user input. Here are the file descriptor numbers for them:
- 0: standard input (stdin)
- 1: standard output (stdout)
- 2: standard error (stderr)
Well, that was a lot of technical stuff! But if you are still reading, you can perform a deep sigh of relief, because now we are ready to write some real programs instead of just learning about registers and conventions.
Lets modify our original "Hello World" program a bit. This time we will read user input and split our code into separate functions:
;; Improved Hello World .data text1 db "What is your name? " text2 db "Hello, " .bss input resb 16 .text _start _start: mov rdi, text1 mov rsi, 19 call print ; print text1 call read ; read input mov rdi, text2 mov rsi, 7 call print ; print text2 mov rdi, input mov rsi, 16 call print ; print input call finish finish: mov rax, 60 ; exit mov rdi, 0 ; exit code syscall read: mov rax, 0 ; read mov rdi, 0 ; stdin mov rsi, input ; buffer mov rdx, 16 ; length syscall ret print: mov rax, 1 ; write mov rdx, rsi ; length mov rsi, rdi ; buffer mov rdi, 1 ; stdin syscall ret
We have a new section here called
.bss. It is used to define uninitialized data. The
resb stands for reserve bytes which simply reserves a block of memory that can be used for reading and writing. We reserve a buffer of 16 bytes to hold user input.
Save the file as hello2.asm, assemble and run it:
$ nasm -f elf64 -o hello2.o hello2.asm $ ld hello2.o -o hello2 $ ./hello2 What is your name? Hacker Hello, Hacker
Now we are cooking! And it's not that many lines of code considering we can already handle input and output by leveraging the kernel.
However, there are still a few things we should improve.
Using plain numbers in code is not very convenient and we can make the code more readable by using constants. Constants can be defined in NASM using the
equ expression. For example:
STDERR equ 2 SYS_EXIT equ 60
NASM supports macros which allow us to replace code similar to macros in the C preprocessor. This allows us to give inline arguments to function calls and wrap the
mov instructions inside the macro.
Here is an example of a macro which allows us to call a function that takes 2 arguments:
%macro call2 3 mov rdi, %2 mov rsi, %3 call %1 %endmacro
The arguments to the macro are referenced by
%2 and so on. The number 3 after the name means it takes 3 arguments.
Lets create a separate file called macro.asm where we can store convenient macros and include them in other files. This is similar to a header-file in C:
%macro call1 2 mov rdi, %2 call %1 %endmacro %macro call2 3 mov rdi, %2 mov rsi, %3 call %1 %endmacro %macro call3 4 mov rdi, %2 mov rsi, %3 mov rdx, %4 call %1 %endmacro
Finally, we should split our application into generic library-code that can be used anywhere and into application-code that is specific to our "Hello World" application.
Another inconvenient part of our application is that we have to manually give the length of the string when printing it. It would be better to have a print function that just takes a null-terminated string and calculates the length automatically.
With these improvements in mind, lets first create a separate file called lib.asm and define a few generic functions:
;; File descriptors STDIN equ 0 STDOUT equ 1 STDERR equ 2 ;; Syscalls SYS_READ equ 0 SYS_WRITE equ 1 SYS_EXIT equ 60 ;; Export functions exit, gets, prints, strlen .text ;; Exit the program ;; Inputs: RDI = exit code exit: mov rax, SYS_EXIT syscall ; rdi is passed on unchanged ret ;; Read input from stdin ;; Note that result includes newline character ;; Inputs: RDI = buffer, RSI = length gets: mov rax, SYS_READ mov rdx, rsi ; arg3: length mov rsi, rdi ; arg2: buffer mov rdi, STDIN ; arg1: file syscall ret ;; Print null-terminated string to stdout ;; Inputs: RDI prints: mov rsi, rdi ; arg2: string (strlen does not modify rsi) call strlen ; length into rax mov rdi, STDOUT ; arg1: stdout mov rdx, rax ; arg3: length mov rax, SYS_WRITE ; syscall id syscall ret ;; Calculate length of null-terminated string ;; Inputs: RDI strlen: xor rax, rax mov rcx, -1 cld repne scasb ; loop until [rdi] != rax mov rax, rcx add rax, 2 neg rax ret
Now we introduce the
strlen function which contains a few new instructions. The
xor instruction at the beginning is used to set the register to zero. The really interesting part is the
repne scasb line. The
scasb compares the input string one byte at a time to the value in RAX and the
repne repeats it as long as the result is not equal. So we continue until a zero is found, thus giving us the length of a null-terminated string.
This means that we don't need to use a jump instruction to create a loop. Instead we can use special instructions provided by the CPU to run the loop directly on the CPU. I think that is really cool!
Now we have the final version of our Hello World application:
;; Final version of Hello World ;; Include a header file "macro.asm" ;; Functions defined in library file exit, gets, prints .data ;; Terminate strings with null text1 db `What is your name? \0` text2 db `Hello, \0` .bss input resb 16 .text _start _start: call1 prints, text1 call2 gets, input, 16 call1 prints, text2 call1 prints, input call exit
We include our header file with the
%include directive. We also need to use
extern to introduce functions which are defined in another file. Strings need to be null-terminated for our
strlen function to work correctly.
Now we need to assemble both files and link them together into an executable:
$ nasm -f elf64 -o lib.o lib.asm $ nasm -f elf64 -o hello3.o hello3.asm $ ld lib.o hello3.o -o hello3 $ ./hello3 What is your name? Superman Hello, Superman
Our app still works and the code is starting to look pretty neat!
Debugging with GDB
In higher-level languages it is often easy to perform basic debugging by printing some debug information on the screen or a log file. While possible in Assembly, it's simply not practical. Therefore one of the most important tools to go with Assembly development is the GDB debugger. With the debugger you can execute one instruction at a time and view the values of the registers to see exactly what is happening. Keep it always with you like a trusted friend when working in Assembly.
How to use GDB would require a completely separate article but here are some basics. Use the
-g flag to include debug information in the assembled files:
$ nasm -f elf64 -g -o lib.o lib.asm $ nasm -f elf64 -g -o hello3.o hello3.asm $ ld lib.o hello3.o -o hello3
Next start gdb and give the executable as argument:
$ gdb hello3 GNU gdb (GDB) 10.2 Copyright (C) 2021 Free Software Foundation, Inc. Reading symbols from hello3... (gdb)
You will get the debugger prompt shown as (gdb) where you can type commands. Normally we want to set a breakpoint at some label and when the code reaches it we want to pause execution and inspect the internal state of the program.
Set a breakpoint using
break and the name of a label. For example:
(gdb) break strlen Breakpoint 1 at 0x401033: file lib.asm, line 48.
Now start the program using
(gdb) run Starting program: hello3 Breakpoint 1, strlen () at lib.asm:48 48 xor rax, rax (gdb)
The execution pauses at the breakpoint and GDB displays the next instruction to be executed as
xor rax, rax which is the first instruction of the strlen function.
You can use the command
info r to display the values of all registers:
(gdb) info r rax 0x0 0 rbx 0x0 0 rcx 0x0 0 rdx 0x0 0 rsi 0x402000 4202496 rdi 0x402000 4202496 rbp 0x0 0x0 rsp 0x7fffffffe6a0 0x7fffffffe6a0 r8 0x0 0 r9 0x0 0 r10 0x0 0 r11 0x0 0 r12 0x0 0 r13 0x0 0 r14 0x0 0 r15 0x0 0 rip 0x401033 0x401033 <strlen> eflags 0x202 [ IF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0
step command to execute one instruction at a time. For example, if you step twice and then check the registers again, you will see that the value of RCX was indeed set to -1:
(gdb) step 49 mov rcx, -1 (gdb) step 50 cld (gdb) info r rcx rcx 0xffffffffffffffff -1
Step a few more times to execute the
repne scasb instruction:
(gdb) step 51 repne scasb ; loop until [rdi] != rax (gdb) step 52 mov rax, rcx (gdb) info r rcx rcx 0xffffffffffffffeb -21
Now we see the value in the RCX register is -21. This is really awesome!
continue to continue execution at full speed until the next breakpoint:
(gdb) continue Continuing. What is your name? Bob Breakpoint 1, strlen () at lib.asm:48 48 xor rax, rax (gdb)
Execution is paused at the next breakpoint which occurs at strlen again.
quit to exit the debugger.
I will add a few more examples of some simple functions I wrote. Please take these with a caution that they have not been extensively tested and may contain bugs. But they might be useful to get some ideas of what you can do with Assembly.
;; Compare two memory locations ;; Inputs: RDI, RSI, RDX = length memcmp: mov rcx, rdx cld repe cmpsb ; repeat until equal or rcx=0 jz .equal jns .negres mov rax, 1 ret .negres: mov rax, -1 ret .equal: mov rax, 0 ret ;; Copy bytes from source to destination ;; Inputs: RDI = destination, RSI = source, RDX = length memcpy: mov rcx, rdx cld rep movsb ; copy RCX bytes from RSI to RDI ret ;; Set bytes to specified value ;; Inputs: RDI = destination, RSI = value to set, RDX = length memset: mov rax, rsi mov rcx, rdx cld rep stosb ret ;; Compare two strings ;; Inputs: RDI, RSI strcmp: mov al, [rdi] cmp al, [rsi] jne .noteq test al, al jz .equal inc rdi inc rsi jmp strcmp .equal: mov rax, 0 ret .noteq: js .negres ; check sign flag (SF) mov rax, 1 ret .negres: mov rax, -1 ret ;; Copy null-terminated string ;; Inputs: RDI = destination, RSI = source strcpy: cld .loop: lodsb stosb test al, al jnz .loop ret
One useful thing to know is that if a label name starts with a dot, it is treated like a private sub-label under the main label. This means that you can have sub-labels with the same name under multiple parent-labels and it does not cause a namespace conflict.
Well, this has been quite a long post and it's time to wrap it up. It's difficult to be concise but at the same time include all the relevant information.
Hope you have fun exploring some things shown here.