Getting Started with x64 Assembly

A while ago I decided the learn the basics of the Assembly language for my x64 processor. Assembly has a reputation for being difficult and elitist which is keeping many people from trying it out. But once you learn the basics, it is actually really simple. In fact, in many ways it is much simpler than other programming languages. Using Linux as our host system, we can make system calls to the kernel for many operations like reading and writing standard input/output and files. Also the x86 instruction set allows us to perform string operations and "loops" directly with a single instruction, condensing many common operations into just a few lines of code.

I am writing this as a "beginner's guide" with the assumption that you have some knowledge of C (or similar language), but don't have knowledge of Assembly yet. That is also the place where I started my journey into the world of Assembly. I am not suggesting using Assembly as your daily language in projects where you need to be productive, but I think learning the basics is a valuable tool for understanding how a computer works at a lower level, and will make you a better programmer in high-level languages.

When writing Assembly, we are dealing with and manipulating the CPU directly. The CPU contains "memory locations" called registers and programming in Assembly consists mainly of moving data in and out of those registers and performing operations on the data while it is in a register, such as comparison or math operations like addition or subtraction. Additionally we have jump instructions that can jump to arbitrary place in the code (which is really just a memory address) and continue execution from that point. It works much like the goto statement in C. This conveniently handles all the cases where we would normally use if, while, for, or a function call.

Assembler

To turn human-readable code into machine-readable code, we need an assembler. This is similar to compiler in C but simpler because the code we are writing is already closer to machine code to begin with. We will be using NASM, or Netwide Assembler, for these examples. It should be available as nasm package in all common Linux distributions. We will also be using the GNU linker, ld, which is in the binutils package, and the GDB debugger.

Make sure those are installed before proceeding if you wish to have an interactive experience:

$ nasm --version
NASM version 2.15.05 compiled on Sep 24 2020
$ ld --version
GNU ld (GNU Binutils) 2.36.1
$ gdb --version
GNU gdb (GDB) 10.2

Hello, World!

Before getting into the tedious technical details, lets start with a "Hello World" example that all software developers are familiar with. I guess it is something of a tradition that all writings about programming languages have to start with this specific example which outputs the text "Hello, World!" on the screen.

;; Hello World in Assembly

section .data
    text db `Hello, World!\n`

section .text

global _start
_start:
    mov rax, 1
    mov rdi, 1
    mov rsi, text
    mov rdx, 14
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

Section .data is used to define initialized data like strings or more generally a sequence of bytes. The db stands for define bytes. Strings inside backquotes support escape sequences like the newline character shown here. Strings can also be delimited by normal single or double quotes.

Section .text is used to define code. Code is limited by labels which are just words followed by a colon. They mark a specific location in code that we can jump to. The _start is a special label which represents the starting point of the whole program, similar to the main function in C. The global keyword declares a label to be visible from outside the file. This would be similar to declaring it "public" in some other languages.

The code consists of performing two system calls. Each system call has a unique number and we store it in the RAX register. Number 1 corresponds to write and number 60 corresponds to exit. The second argument to write, stored in RDI, is the file where we want to write. Number 1 for files corresponds to standard output (stdout). The third argument is the text to write and 14 is the number of bytes to write.

If you save the file as hello.asm then you can assemble it with NASM into an object file:

$ nasm -f elf64 -o hello.o hello.asm

We use ELF64 as the format to specify we are assembling a 64-bit Linux application. Next we use the linker to turn the object file into a proper executable:

$ ld hello.o -o hello

If you have a larger program that consists of multiple source files, you can link multiple object files together into an executable just like you would in C. You can also mix object files from C and Assembly and link them all together into an executable.

Now we should have an executable ready to run, which does what we expect:

$ ./hello
Hello, World!

If you check the file sizes, they are really tiny:

$ ls -l
-rwxr-xr-x 1 pekka pekka 8840 2021-08-01 11:46 hello
-rw-r--r-- 1 pekka pekka  230 2021-08-01 11:28 hello.asm
-rw-r--r-- 1 pekka pekka  848 2021-08-01 11:42 hello.o

If you struggle with having "bloat" in your applications, this method of writing software is a good way to eliminate some of that. Next we must dwell a little deeper into some technical details to write meaningful programs that do something more interesting. But I will keep it simple, I promise!

Registers

Registers are the heart and soul of your CPU. Here is a table of some important registers that you should know about. RAX to RBP and R8 to R15 are called general-purpose registers. You can freely use all general-purpose registers with some caveats explained below. Registers XMM0 to XMM15 are used for working with floating-point values.

128 bit	64 bit	32 bit	16 bit	8 bit MSB	8 bit LSB	Notes
	RAX	EAX	AX	AH	AL	Accumulator
	RBX	EBX	BX	BH	BL	Base index (arrays)
	RCX	ECX	CX	CH	CL	Counter (loops, strings)
	RDX	EDX	DX	DH	DL	Extend accumulator
	RSI	ESI	SI		SIL	Source index (strings)
	RDI	EDI	DI		DIL	Destination index (strings)
	RSP	ESP	SP		SPL	Stack pointer (top of stack)
	RBP	EBP	BP		BPL	Stack base pointer (bottom of stack)
	R8 to R15	R8D to R15D	R8W to R15W		R8B to R15B	8 additional general purpose registers
	RIP	EIP	IP			Instruction pointer (program counter)
	RFLAGS	EFLAGS	FLAGS			Flags register
XMM0 to XMM15						16 floating-point registers

Since we are working with a 64-bit system and writing 64-bit code, we are mostly concerned with the column showing 64-bit registers starting with RAX. The 32-, 16- and 8-bit registers refer to the lower portions of the same 64-bit register. For example EAX refers to the lowest 32 bits of RAX, not a separate 32-bit register.

FLAGS is a special register which is used to test various conditions after comparison or arithmetic instruction is executed. For example, it can be used to check if the result of the last instruction is zero or not, if the result is even or odd, or if it caused an overflow. Here are some flags:

CF - Carry flag
PF - Parity flag
ZF - Zero flag
SF - Sign flag
OF - Overflow flag
AF - Adjust flag
IF - Interrupt flag

Most of the time you don't need to deal with the FLAGS register explicitly and you can use conditional jumps instead like jump-if-zero, jump-if-signed or jump-if-overflow that implicitly check the corresponding flag.

Instructions

There are lots of instructions in the x86-64 instruction set, but you only need a handful to get started with writing awesome code.

Move

The mov instruction is used to move values into and out of registers. I would say it is the most used instruction. It takes destination and source as arguments:

;; Move value 5 into RAX register:
mov rax, 5

;; Move value of RBX into RAX:
mov rax, rbx

We can also have "pointers" by having a memory address in a register and then access that memory location directly. The notation for this is square brackets around the register name:

;; Move the value at memory address where RBX is pointing into RAX:
mov rax, [rbx]

Unconditional Jump

Jump instructions allow us to jump to the location of a given label and continue execution from that point. The most simple case is the unconditional jump instruction jmp which always jumps without checking any conditions:

myLabel:
    jmp myLabel

The above code causes an infinite loop because the code jumps to myLabel indefinitely.

Compare and Conditional Jump

Often it is more useful to jump only if a given condition is true, like the if statement in C. We can use the cmp instruction to compare two values. It takes the form cmp a, b where a is always a register and b is a register or a value. The compare instruction sets status flags in the FLAGS register according to the result.

After the compare instruction has been executed, we can perform a conditional jump based on the result of the comparison. Here are some instructions for conditional jumps:

Instruction (signed)	Instruction (unsigned)	After `cmp a, b`, jump if
je		a = b
jne		a != b
jg	ja	a > b
jge	jae	a >= b
jl	jb	a < b
jle	jbe	a <= b
jz		a = 0
jnz		a != 0
jo		Overflow flag is set
jno		Overflow flag is not set
js		Sign flag is set
jns		Sign flag is not signed

Note that some instructions have different versions for dealing with signed and unsigned values. Here is an example of compare and jump:

cmp rax, rbx
jg myLabel

This code would jump to myLabel if the value in RAX is greater than the value in RBX.

Stack

The stack is a block of memory reserved for your application by the operating system. It operates as a Last-In-First-Out (LIFO) data structure. New items can be added on the top of the stack using the push instruction. Reading from the stack using the pop instruction always returns the last item that was added.

You can use the stack to store values temporarily. For example something like this:

push rax   ; push the current value of rax on stack
;; use rax for something else
pop rax    ; pop the last value from stack into rax

Note that it is generally better to use registers instead of stack when possible. Registers are located on the CPU itself while the stack is located in the RAM memory, so it is significantly faster to move values between two registers than access the stack.

The stack pointer (RSP) and stack base pointer (RBP) registers are used to manage the stack. Generally you don't need to use these registers directly.

Arithmetic

Arithmetic functions take a register as the first argument and a register or a value as the second argument. The first register is the subject of the operation and the result is stored in it. Note that reg below refers to any register while rax refers to the specific RAX register. Generally these functions also set the status flags in the FLAGS register so you can immediately execute a conditional jump without using a cmp instruction in between.

Instruction	Instruction (signed)	Notes
add a, b		a = a + b
sub a, b		a = a - b
mul reg	imul reg	rax = rax * reg
div reg	idiv reg	rax = rax / reg
neg reg		reg = -reg
inc reg		reg = reg + 1
dec reg		reg = reg - 1
adc a, b		a = a + b + CF
sbb a, b		a = a - b - CF

Example:

mov rax, 3
mov rbx, 2
add rax, rbx
;; rax will contain the value 5

Division comes with a two caveats. First, RDX is used together with RAX as RDX:RAX to make a 128-bit value and then that value is divided. Second, the remainder of the division is stored in RDX. For example:

mov rax 7
mov rdx 0
mov rcx 3
div rcx
;; rax will contain value 2
;; rdx will contain value 1

Instruction Reference

An excellent reference for x86/x64 instructions is maintained by Félix Cloutier. Use it to look up any instructions and how to use them.

Functions / Procedures / Subroutines

Just like in other languages, in order to write a larger program in Assembly, you need to divide it into functions that can be called independently. They are also called procedures or subroutines but I am not quite sure which is the most accurate term in this context. A function in Assembly is just a label followed by instructions (the function body) and finally a ret instruction.

A function is "called" with the call instruction which jumps to the given label and pushes a return address on the stack. The ret instruction then reads the return address from the stack and jumps to it.

Calling Convention

Calling conventions were established in order to have a standard for how registers are used in function calls. Note that calling convention is specific to your operating system. This article deals with Linux specifically, but you can see the Windows calling convention for comparison.

Function arguments are set in registers according to the following table:

Argument #	Register in user-space	Register in kernel-space
1	RDI	RDI
2	RSI	RSI
3	RDX	RDX
4	RCX	R10
5	R8	R8
6	R9	R9

All registers for arguments are same in user-space and kernel-space except the fourth. You should use the user-space convention in your own functions and the kernel-space convention when performing system calls.

Another part of calling convention determines who is responsible for restoring the original value into a register if it is modified. For this purpose registers are divided into caller-saved and callee-saved registers. The values of caller-saved registers must be saved by the parent procedure (caller), and possibly restored after the call into a subroutine has finished. The values of callee-saved registers must not be changed by the subroutine (callee), or if changed, must be restored to their original value before the call returns.

The registers RBX, RSP, RBP and R12 to R15 are callee-saved. All other registers are caller-saved and may be modified freely by the subroutine.

The return value of a procedure call is stored in RAX for integers and in XMM0 for floating-point values.

System Calls

System calls are functions on the Linux kernel that we can call from our programs. They are identified by a number which is stored in the RAX register and the system call is invoked with the syscall instruction.

Here are a few system calls for working with files and exiting the program:

Name	Id	Arg 1	Arg 2	Arg 3
read	0	file descriptor	buffer	length
write	1	file descriptor	buffer	length
open	2	filename	flags	mode
close	3	file descriptor
exit	60	exit code

You can find a list of all supported system calls and their numbers in the file /usr/include/asm/unistd_64.h on your system.

Standard Input/Output

On Linux it is convenient to use standard input and output to print text on the screen or read user input. Here are the file descriptor numbers for them:

0: standard input (stdin)
1: standard output (stdout)
2: standard error (stderr)

Hello, Again!

Well, that was a lot of technical stuff! But if you are still reading, you can perform a deep sigh of relief, because now we are ready to write some real programs instead of just learning about registers and conventions.

Lets modify our original "Hello World" program a bit. This time we will read user input and split our code into separate functions:

;; Improved Hello World

section .data
    text1 db "What is your name? "
    text2 db "Hello, "

section .bss
    input resb 16

section .text

global _start
_start:
    mov rdi, text1
    mov rsi, 19
    call print                  ; print text1
    call read                   ; read input
    mov rdi, text2
    mov rsi, 7
    call print                  ; print text2
    mov rdi, input
    mov rsi, 16
    call print                  ; print input
    call finish

finish:
    mov rax, 60                 ; exit
    mov rdi, 0                  ; exit code
    syscall

read:
    mov rax, 0                  ; read
    mov rdi, 0                  ; stdin
    mov rsi, input              ; buffer
    mov rdx, 16                 ; length
    syscall
    ret

print:
    mov rax, 1                  ; write
    mov rdx, rsi                ; length
    mov rsi, rdi                ; buffer
    mov rdi, 1                  ; stdin
    syscall
    ret

We have a new section here called .bss. It is used to define uninitialized data. The resb stands for reserve bytes which simply reserves a block of memory that can be used for reading and writing. We reserve a buffer of 16 bytes to hold user input.

Save the file as hello2.asm, assemble and run it:

$ nasm -f elf64 -o hello2.o hello2.asm
$ ld hello2.o -o hello2
$ ./hello2
What is your name? Hacker
Hello, Hacker

Now we are cooking! And it's not that many lines of code considering we can already handle input and output by leveraging the kernel.

However, there are still a few things we should improve.

Constants

Using plain numbers in code is not very convenient and we can make the code more readable by using constants. Constants can be defined in NASM using the equ expression. For example:

STDERR equ 2
SYS_EXIT equ 60

Macros

NASM supports macros which allow us to replace code similar to macros in the C preprocessor. This allows us to give inline arguments to function calls and wrap the mov instructions inside the macro.

Here is an example of a macro which allows us to call a function that takes 2 arguments:

%macro call2 3
    mov rdi, %2
    mov rsi, %3
    call %1
%endmacro

The arguments to the macro are referenced by %1, %2 and so on. The number 3 after the name means it takes 3 arguments.

Lets create a separate file called macro.asm where we can store convenient macros and include them in other files. This is similar to a header-file in C:

%macro call1 2
    mov rdi, %2
    call %1
%endmacro

%macro call2 3
    mov rdi, %2
    mov rsi, %3
    call %1
%endmacro

%macro call3 4
    mov rdi, %2
    mov rsi, %3
    mov rdx, %4
    call %1
%endmacro

Library Functions

Finally, we should split our application into generic library-code that can be used anywhere and into application-code that is specific to our "Hello World" application.

Another inconvenient part of our application is that we have to manually give the length of the string when printing it. It would be better to have a print function that just takes a null-terminated string and calculates the length automatically.

With these improvements in mind, lets first create a separate file called lib.asm and define a few generic functions:

;; File descriptors
STDIN equ 0
STDOUT equ 1
STDERR equ 2

;; Syscalls
SYS_READ equ 0
SYS_WRITE equ 1
SYS_EXIT equ 60

;; Export functions
global exit, gets, prints, strlen

section .text

;; Exit the program
;; Inputs: RDI = exit code
exit:
    mov rax, SYS_EXIT
    syscall                     ; rdi is passed on unchanged
    ret

;; Read input from stdin
;; Note that result includes newline character
;; Inputs: RDI = buffer, RSI = length
gets:
    mov rax, SYS_READ
    mov rdx, rsi                ; arg3: length
    mov rsi, rdi                ; arg2: buffer
    mov rdi, STDIN              ; arg1: file
    syscall
    ret

;; Print null-terminated string to stdout
;; Inputs: RDI
prints:
    mov rsi, rdi                ; arg2: string (strlen does not modify rsi)
    call strlen                 ; length into rax
    mov rdi, STDOUT             ; arg1: stdout
    mov rdx, rax                ; arg3: length
    mov rax, SYS_WRITE          ; syscall id
    syscall
    ret

;; Calculate length of null-terminated string
;; Inputs: RDI
strlen:
    xor rax, rax
    mov rcx, -1
    cld
    repne scasb                 ; loop until [rdi] != rax
    mov rax, rcx
    add rax, 2
    neg rax
    ret

Now we introduce the strlen function which contains a few new instructions. The xor instruction at the beginning is used to set the register to zero. The really interesting part is the repne scasb line. The scasb compares the input string one byte at a time to the value in RAX and the repne repeats it as long as the result is not equal. So we continue until a zero is found, thus giving us the length of a null-terminated string.

This means that we don't need to use a jump instruction to create a loop. Instead we can use special instructions provided by the CPU to run the loop directly on the CPU. I think that is really cool!

Hello, Final!

Now we have the final version of our Hello World application:

;; Final version of Hello World

;; Include a header file
%include "macro.asm"

;; Functions defined in library file
extern exit, gets, prints

section .data
    ;; Terminate strings with null
    text1 db `What is your name? \0`
    text2 db `Hello, \0`

section .bss
    input resb 16

section .text

global _start
_start:
    call1 prints, text1
    call2 gets, input, 16
    call1 prints, text2
    call1 prints, input
    call exit

We include our header file with the %include directive. We also need to use extern to introduce functions which are defined in another file. Strings need to be null-terminated for our strlen function to work correctly.

Now we need to assemble both files and link them together into an executable:

$ nasm -f elf64 -o lib.o lib.asm
$ nasm -f elf64 -o hello3.o hello3.asm
$ ld lib.o hello3.o -o hello3
$ ./hello3
What is your name? Superman
Hello, Superman

Our app still works and the code is starting to look pretty neat!

Debugging with GDB

In higher-level languages it is often easy to perform basic debugging by printing some debug information on the screen or a log file. While possible in Assembly, it's simply not practical. Therefore one of the most important tools to go with Assembly development is the GDB debugger. With the debugger you can execute one instruction at a time and view the values of the registers to see exactly what is happening. Keep it always with you like a trusted friend when working in Assembly.

How to use GDB would require a completely separate article but here are some basics. Use the -g flag to include debug information in the assembled files:

$ nasm -f elf64 -g -o lib.o lib.asm
$ nasm -f elf64 -g -o hello3.o hello3.asm
$ ld lib.o hello3.o -o hello3

Next start gdb and give the executable as argument:

$ gdb hello3
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
Reading symbols from hello3...
(gdb)

You will get the debugger prompt shown as (gdb) where you can type commands. Normally we want to set a breakpoint at some label and when the code reaches it we want to pause execution and inspect the internal state of the program.

Set a breakpoint using break and the name of a label. For example:

(gdb) break strlen
Breakpoint 1 at 0x401033: file lib.asm, line 48.

Now start the program using run:

(gdb) run
Starting program: hello3

Breakpoint 1, strlen () at lib.asm:48
48	    xor rax, rax
(gdb)

The execution pauses at the breakpoint and GDB displays the next instruction to be executed as xor rax, rax which is the first instruction of the strlen function.

You can use the command info r to display the values of all registers:

(gdb) info r
rax            0x0                 0
rbx            0x0                 0
rcx            0x0                 0
rdx            0x0                 0
rsi            0x402000            4202496
rdi            0x402000            4202496
rbp            0x0                 0x0
rsp            0x7fffffffe6a0      0x7fffffffe6a0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0x0                 0
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
rip            0x401033            0x401033 <strlen>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

Use the step command to execute one instruction at a time. For example, if you step twice and then check the registers again, you will see that the value of RCX was indeed set to -1:

(gdb) step
49	    mov rcx, -1
(gdb) step
50	    cld
(gdb) info r rcx
rcx            0xffffffffffffffff  -1

Step a few more times to execute the repne scasb instruction:

(gdb) step
51	    repne scasb                 ; loop until [rdi] != rax
(gdb) step
52	    mov rax, rcx
(gdb) info r rcx
rcx            0xffffffffffffffeb  -21

Now we see the value in the RCX register is -21. This is really awesome!

Use continue to continue execution at full speed until the next breakpoint:

(gdb) continue
Continuing.
What is your name? Bob

Breakpoint 1, strlen () at lib.asm:48
48	    xor rax, rax
(gdb)

Execution is paused at the next breakpoint which occurs at strlen again.

Use quit to exit the debugger.

More Examples

I will add a few more examples of some simple functions I wrote. Please take these with a caution that they have not been extensively tested and may contain bugs. But they might be useful to get some ideas of what you can do with Assembly.

;; Compare two memory locations
;; Inputs: RDI, RSI, RDX = length
memcmp:
    mov rcx, rdx
    cld
    repe cmpsb                  ; repeat until equal or rcx=0
    jz .equal
    jns .negres
    mov rax, 1
    ret
.negres:
    mov rax, -1
    ret
.equal:
    mov rax, 0
    ret

;; Copy bytes from source to destination
;; Inputs: RDI = destination, RSI = source, RDX = length
memcpy:
    mov rcx, rdx
    cld
    rep movsb                   ; copy RCX bytes from RSI to RDI
    ret

;; Set bytes to specified value
;; Inputs: RDI = destination, RSI = value to set, RDX = length
memset:
    mov rax, rsi
    mov rcx, rdx
    cld
    rep stosb
    ret

;; Compare two strings
;; Inputs: RDI, RSI
strcmp:
    mov al, [rdi]
    cmp al, [rsi]
    jne .noteq
    test al, al
    jz .equal
    inc rdi
    inc rsi
    jmp strcmp
.equal:
    mov rax, 0
    ret
.noteq:
    js .negres                  ; check sign flag (SF)
    mov rax, 1
    ret
.negres:
    mov rax, -1
    ret

;; Copy null-terminated string
;; Inputs: RDI = destination, RSI = source
strcpy:
    cld
.loop:
    lodsb
    stosb
    test al, al
    jnz .loop
    ret

One useful thing to know is that if a label name starts with a dot, it is treated like a private sub-label under the main label. This means that you can have sub-labels with the same name under multiple parent-labels and it does not cause a namespace conflict.

Conclusions

Well, this has been quite a long post and it's time to wrap it up. It's difficult to be concise but at the same time include all the relevant information.

As for the benefits of learning Assembly, I think it's really useful to understand how a CPU works. Higher-level languages hide all of this away and most developers writing software today have no idea what is happening under the surface. In addition to increasing your knowledge, consider the obvious benefits for your social life. Just imagine standing in a room full of JavaScript developers and casually dropping that you write Assembly. That is some serious "geek cred" right there.

Hope you have fun exploring some things shown here.