Section 2: x86-64 Instruction Set - The Assembly Blueprint

Section 2 Summary

Building on foundational knowledge, this section dives deeper into the x86-64 instruction set, focusing on common registers, operand types, and the role of assembler directives. You'll learn how instructions map to real CPU architecture.

Lesson 2.1: x86-64 Architecture Overview

Learning Objectives

Explain the design principles behind x86-64 (register extensions, backwards compatibility).
Identify key hardware features relevant to assembly (e.g., expanded register set).

Prerequisites

Basic knowledge of x86 registers (from Section 1).
Familiarity with 32-bit vs. 64-bit architectures.

Key Concepts

Extended Registers: rax, rbx, rcx, rdx, r8–r15
RIP-relative addressing: A new feature in x86-64
Compatibility: 64-bit mode vs. legacy mode

Detailed Explanation

x86-64 Evolution

x86-64 (also known as AMD64 or Intel 64) extends the x86 architecture to 64 bits while maintaining backward compatibility:

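Extended Address Space: 64-bit pointers can theoretically address 16 EiB (2^64 bytes), although current CPUs implement fewer address bits
More Registers: 16 general-purpose registers instead of 8
Wider Registers: General-purpose registers, the instruction pointer (RIP), and the flags register (RFLAGS) are extended to 64 bits
Improved Calling Conventions: More arguments are passed in registers instead of on the stack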

Register Extensions

In 64-bit mode, additional registers r8 through r15 are available, each 64 bits wide:

Register	64-bit	32-bit	16-bit	8-bit
Extended #8	R8	R8D	R8W	R8B
Extended #9	R9	R9D	R9W	R9B
...	...	...	...	...
Extended #15	R15	R15D	R15W	R15B

RIP-Relative Addressing

RIP-relative addressing encodes a memory operand as a signed 32-bit displacement from the address of the next instruction, which makes it useful for position-independent code:

; Example of RIP-relative addressing (NASM syntax; GAS Intel syntax writes [rip + some_label])
mov rax, [rel some_label]  ; Load 8 bytes from some_label, encoded as an offset from RIP
lea rbx, [rel data_table]  ; Get address of data_table
...
some_label:
    dq 0x11223344
data_table:
    dq 0x1000, 0x2000, 0x3000
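
The address calculation behind a RIP-relative operand can be modeled in a few lines of C. This is a minimal sketch, not a description of any assembler or CPU implementation; the function name rip_relative_target and its parameters (instruction address, instruction length, displacement) are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Model of how a RIP-relative operand is resolved.
 * insn_addr and insn_len are hypothetical inputs: where the instruction
 * starts and how many bytes it occupies. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp32)
{
    /* RIP already points at the next instruction when the operand is resolved. */
    uint64_t next_rip = insn_addr + insn_len;

    /* The 32-bit displacement is sign-extended to 64 bits before the add. */
    return next_rip + (uint64_t)(int64_t)disp32;
}
```

For instance, a 7-byte instruction at 0x401000 with displacement 0x10 refers to 0x401017, and a displacement of -7 refers back to the instruction's own first byte.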

Operating Modes

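Long Mode: Native x86-64 operation under a 64-bit OS, split into two sub-modes (64-bit mode and compatibility mode)
64-bit Mode: Sub-mode of long mode with full 64-bit registers and addressing
Compatibility Mode: Sub-mode of long mode that runs 16-bit and 32-bit protected-mode code under a 64-bit OS
Legacy Mode: The pre-64-bit operating modes (real, protected, and virtual-8086 mode)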

Memory Model

x86-64 uses a flat memory model with virtual addressing:

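Virtual Address Space: Each process has its own virtual address space, addressed with 64-bit pointers
Canonical Addresses: Most CPUs implement 48 bits of virtual address (57 with 5-level paging); bits 63 through 47 must all be equal, so the upper bits are a sign extension of bit 47
Segmentation: Largely unused in 64-bit mode; the CS, DS, ES, and SS bases are treated as zero, while FS and GS keep a base (commonly used for thread-local storage)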
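
The canonical-address rule can be checked with simple bit arithmetic. The sketch below assumes a 48-bit virtual address space (4-level paging); the helper name is_canonical48 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is canonical for a 48-bit virtual address space:
 * bits 63..47 must be all zeros or all ones. */
bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;            /* the 17 bits 63..47 */
    return upper == 0 || upper == 0x1FFFF;  /* all clear or all set */
}
```

Under this assumption, 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are canonical, while 0x0000800000000000 is not.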

Exercises & Practice Problems

Question: Which registers are new in x86-64 as opposed to x86?

Answer: r8, r9, r10, r11, r12, r13, r14, r15. These provide 8 additional general-purpose registers beyond the original 8.

Exercise: Inspect code compiled as position-independent (-fPIC) and observe RIP-relative addressing in the disassembly.

Solution: Compile with gcc -fPIC -c file.c, then run objdump -d -M intel file.o (without -M intel, objdump prints AT&T syntax). Look for instructions like mov rax, [rip+0x...] or lea rdi, [rip+0x...].

Recommended Resources

Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 1 (Architecture)
Position-Independent Code (GCC)

Lesson 2.2: Commonly Used Registers in x86-64

Learning Objectives

Differentiate between general-purpose, index, base pointer, and stack pointer registers.
Describe how each register is commonly used in code.

Prerequisites

Understanding of basic register usage and CPU architecture.

Key Concepts

rbp (base pointer) often used for stack frames
rsp (stack pointer) points to the top of the stack
rsi, rdi often used for string operations

Detailed Explanation

General-Purpose Registers and Their Conventional Uses

Register	Traditional Name	Common Use	Calling Convention Role (System V AMD64)
RAX	Accumulator	Arithmetic operations, return values	Return value
RBX	Base	General storage, base pointer	Callee-saved
RCX	Counter	Loop counters, string operations	4th function argument
RDX	Data	Data operations, I/O	3rd function argument
RSI	Source Index	String/memory operations source	2nd function argument
RDI	Destination Index	String/memory operations destination	1st function argument
RSP	Stack Pointer	Points to top of stack	Stack management
RBP	Base Pointer	Frame pointer for stack frames	Callee-saved

Stack Management

In typical C function prologues, rbp is used to create a stable reference for local variables:

; Typical function prologue
push rbp         ; save old base pointer
mov rbp, rsp     ; set new base pointer to current stack top
sub rsp, 32      ; reserve 32 bytes for local variables

; Function body can now reference locals as [rbp-8], [rbp-16], etc.
mov [rbp-8], rax    ; store local variable
mov rbx, [rbp-16]   ; load local variable

; Typical function epilogue  
mov rsp, rbp     ; restore stack pointer
pop rbp          ; restore old base pointer
ret              ; return to caller

System V AMD64 Calling Convention

In the System V AMD64 calling convention, integer and pointer arguments are passed in registers in this order (floating-point arguments use xmm0 through xmm7 instead):

RDI - First argument
RSI - Second argument
RDX - Third argument
RCX - Fourth argument
R8 - Fifth argument
R9 - Sixth argument
Stack - Additional arguments

; Example function call: func(1, 2, 3, 4, 5, 6, 7, 8)
mov rdi, 1          ; 1st argument
mov rsi, 2          ; 2nd argument  
mov rdx, 3          ; 3rd argument
mov rcx, 4          ; 4th argument
mov r8, 5           ; 5th argument
mov r9, 6           ; 6th argument
push 8              ; 8th argument (pushed first)
push 7              ; 7th argument 
call func
add rsp, 16         ; clean up stack (2 * 8 bytes)
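
The argument order above can be written down as a small lookup. This is only an illustrative sketch for integer and pointer arguments; sysv_int_arg_location is a hypothetical helper name, not part of any ABI library.

```c
#include <stddef.h>

/* Where the System V AMD64 ABI places the n-th integer or pointer
 * argument (numbered from 1). Hypothetical helper for illustration. */
const char *sysv_int_arg_location(unsigned n)
{
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

    if (n == 0)
        return NULL;                         /* arguments are numbered from 1 */
    if (n <= sizeof regs / sizeof regs[0])
        return regs[n - 1];                  /* first six go in registers */
    return "stack";                          /* 7th and later are pushed */
}
```

For the call func(1, 2, 3, 4, 5, 6, 7, 8), arguments 1 through 6 map to rdi through r9, and arguments 7 and 8 map to "stack".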

Extended Registers (R8-R15)

The additional registers provide more flexibility:

R8, R9: Used for 5th and 6th function arguments
R10, R11: Caller-saved scratch registers
R12-R15: Callee-saved general-purpose registers

Exercises & Practice Problems

Question: Which register typically holds the return address after a call instruction?

Answer: The return address is pushed onto the stack at [rsp]. The CPU automatically uses it on ret. No single register permanently holds the return address.

Exercise: Write a short function in C, compile with -O0, and disassemble. Identify usage of rbp and rsp.

Solution:

// example.c
int add(int a, int b) {
    int result = a + b;
    return result;
}

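// Compile: gcc -O0 -c example.c
// Disassemble: objdump -d -M intel example.o
// Look for push rbp / mov rbp,rsp at the start and pop rbp (or leave) / ret at the end.
// A small leaf function like this may keep locals below rsp (the red zone) without a sub rsp.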

Recommended Resources

System V AMD64 ABI Documentation
Intel Developer Manual, Vol. 2

Lesson 2.3: Immediate, Register, and Memory Operands

Learning Objectives

Recognize and interpret different operand types (immediate, register, memory).
Explain how each operand type is used in assembly instructions.

Prerequisites

Basic knowledge of instruction format.
Familiarity with memory addresses and registers.

Key Concepts

Immediate Operands: Hardcoded values (e.g., mov rax, 5)
Register Operands: The CPU's internal registers
Memory Operands: References to addresses in RAM

Detailed Explanation

Immediate Operands

Immediate operands are constant values encoded directly into the instruction:

The operand is part of the instruction encoding (e.g., add rax, 0x10)
Can be decimal, hexadecimal, or binary
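Size depends on the instruction: most instructions accept an 8-, 16-, or 32-bit immediate (sign-extended for 64-bit operations); only mov to a 64-bit register can encode a full 64-bit immediate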

; Immediate operand examples
mov rax, 42          ; Load immediate decimal value
mov rbx, 0xFF        ; Load immediate hexadecimal value  
add rcx, 1000        ; Add immediate value to register
cmp rdx, 0           ; Compare register with immediate zero
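
Most 64-bit instructions such as add and cmp encode at most a 32-bit immediate that the CPU sign-extends, so a constant outside that range must first be loaded into a register with mov. A minimal C sketch of that range check follows; fits_simm32 is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if value can be encoded as a sign-extended 32-bit immediate. */
bool fits_simm32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

For example, 1000 and -1 fit, while 0x80000000 and 0x100000000 do not.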

Register Operands

Fastest operand type (no memory access required)
Can name the 8-, 16-, 32-, or 64-bit portion of the same physical register
Writing to a 32-bit register zeroes the upper 32 bits of the full 64-bit register

; Register operand examples
mov rax, rbx         ; Copy value from RBX to RAX
add rsi, rdi         ; Add RDI to RSI
xor rcx, rcx         ; Clear RCX (common idiom: XOR register with itself)
inc r8               ; Increment R8 by 1

Memory Operands

Memory operands reference locations in RAM using various addressing modes:

Direct Memory Addressing

mov rax, [0x401000]      ; Load from absolute address
mov [global_var], rbx    ; Store to named memory location

Register Indirect

mov rax, [rbx]           ; Load from address in RBX
mov [rcx], rdx           ; Store RDX to address in RCX

Base + Displacement

mov rax, [rbp-8]         ; Load from RBP minus 8 (local variable)
mov [rsp+16], rbx        ; Store to RSP plus 16

Base + Index * Scale + Displacement

mov eax, [rbx + rcx*4]           ; Array of 32-bit elements: base + index*4
mov rdx, [rsi + rdi*8 + 16]      ; 64-bit elements that start 16 bytes past the base
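
The general addressing form computes base + index * scale + displacement, where scale is 1, 2, 4, or 8. The following C sketch models that calculation; effective_address is a hypothetical helper, and the assert simply rejects scales the encoding cannot express.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the address computed by [base + index*scale + disp]. */
uint64_t effective_address(uint64_t base, uint64_t index, unsigned scale, int32_t disp)
{
    /* Only these four scale factors can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    /* The displacement is sign-extended; the sum wraps modulo 2^64. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

With base 0x1000, index 3, and scale 4, the result is 0x100C, the address of element 3 in an array of 4-byte items.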

Operand Size Considerations

Operation	Size	Effect on 64-bit Register
`mov al, 5`	8-bit	Modifies lowest 8 bits only
`mov ax, 5`	16-bit	Modifies lowest 16 bits only
`mov eax, 5`	32-bit	Clears upper 32 bits, sets lower 32
`mov rax, 5`	64-bit	Sets entire 64-bit register
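
The effect of each operand size on the full register can be modeled with masks. In this sketch the three helper names are hypothetical; each takes the old 64-bit register value and returns the value after the write.

```c
#include <stdint.h>

/* 8-bit write (e.g. mov al, v): upper 56 bits are preserved. */
uint64_t write_low8(uint64_t reg, uint8_t v)
{
    return (reg & ~UINT64_C(0xFF)) | v;
}

/* 16-bit write (e.g. mov ax, v): upper 48 bits are preserved. */
uint64_t write_low16(uint64_t reg, uint16_t v)
{
    return (reg & ~UINT64_C(0xFFFF)) | v;
}

/* 32-bit write (e.g. mov eax, v): upper 32 bits are cleared. */
uint64_t write_low32(uint64_t reg, uint32_t v)
{
    (void)reg;          /* the old value does not survive a 32-bit write */
    return v;
}
```

Starting from 0x1122334455667788, writing 5 gives 0x1122334455667705 for the 8-bit case, 0x1122334455660005 for the 16-bit case, and 0x0000000000000005 for the 32-bit case.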

Combining Operand Types

; Different operands in a single snippet
mov rax, 10        ; immediate to register
add rax, rbx       ; register to register  
mov [rcx], rax     ; register to memory
inc QWORD [rdx+8]  ; memory operand with increment

Exercises & Practice Problems

Question: What is the difference between mov rax, [rbx] and mov [rbx], rax?

Answer: The first loads from memory at address in rbx into rax; the second stores the contents of rax into memory at address in rbx. Direction matters!

Exercise: Write a snippet that loads a value from memory into rax, adds an immediate, and stores it back to memory. Disassemble and explain each instruction.

Solution:

mov rax, [rbp-8]    ; Load from local variable (memory operand)
add rax, 100        ; Add immediate value 100  
mov [rbp-8], rax    ; Store back to same location (memory operand)

Recommended Resources

Intel Developer Manual, Vol. 2: Instruction Set Reference
x86 Assembly Guide

Lesson 2.4: Basic Arithmetic and Logical Instructions

Learning Objectives

Demonstrate usage of arithmetic instructions (add, sub, mul, imul, div, idiv).
Apply logical instructions (and, or, xor, not) in example code.

Prerequisites

Familiarity with integer arithmetic and boolean logic in high-level languages.
Understanding of CPU flags.

Key Concepts

Unsigned vs. Signed Operations: mul vs. imul; div vs. idiv
Flag Effects: Overflow flag (OF), carry flag (CF), zero flag (ZF)

Detailed Explanation

Arithmetic Instructions

Addition and Subtraction

; Basic arithmetic
add rax, rbx        ; rax = rax + rbx
sub rcx, 10         ; rcx = rcx - 10
inc rdx             ; rdx = rdx + 1 (shorter encoding than add rdx, 1; leaves CF unchanged)
dec rsi             ; rsi = rsi - 1

Multiplication

MUL (unsigned): Treats operands as unsigned integers
IMUL (signed): Treats operands as signed integers

; Unsigned multiplication (one-operand form)
mov rax, 6
mul rbx             ; rdx:rax = rax * rbx (128-bit result; rdx is overwritten)

; Signed multiplication (more common)
mov rax, 6
imul rax, rbx       ; rax = rax * rbx (low 64 bits of the product)
imul rax, rbx, 10   ; rax = rbx * 10 (three-operand form)
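
The one-operand mul produces a 128-bit product split across rdx (high half) and rax (low half). A portable C sketch of that split is shown here, built from 32-bit partial products; mul_u64_high and mul_u64_low are hypothetical helper names.

```c
#include <stdint.h>

/* High 64 bits of the unsigned 128-bit product a * b (what mul leaves in rdx). */
uint64_t mul_u64_high(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits 0..63   */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits 32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits 32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits 64..127 */

    /* Sum of everything that lands in bits 32..63, including carries. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Low 64 bits of the product (what mul leaves in rax). */
uint64_t mul_u64_low(uint64_t a, uint64_t b)
{
    return a * b;                /* unsigned arithmetic wraps modulo 2^64 */
}
```

For 6 * 7 the high half is 0 and the low half is 42; for 0xFFFFFFFFFFFFFFFF * 2 the high half is 1, which is why rdx cannot be assumed to keep its old value after mul.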

Division

Division is more complex and requires setup:

; Unsigned division: divide rdx:rax by rbx
mov rax, 100        ; dividend (low part)
xor rdx, rdx        ; clear high part (important!)
mov rbx, 7          ; divisor (must be nonzero, or the CPU raises a divide error)
div rbx             ; rax = quotient (14), rdx = remainder (2)

; Signed division: divide rdx:rax by rbx
mov rax, -100       ; dividend
cqo                 ; sign-extend rax into rdx (important!)
mov rbx, 7          ; divisor
idiv rbx            ; rax = quotient (-14), rdx = remainder (-2)
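
The setup steps can be mirrored in C to see what values end up where. This is a sketch with hypothetical helper names: cqo_high models what cqo writes to rdx, and the other two rely on C's / and % operators, which truncate toward zero just as idiv does. Callers are assumed to pass a nonzero divisor and to avoid INT64_MIN divided by -1, both of which fault on the CPU.

```c
#include <stdint.h>

/* Value cqo places in rdx: all ones if rax is negative, otherwise zero. */
uint64_t cqo_high(int64_t rax)
{
    return rax < 0 ? UINT64_MAX : 0;
}

/* Quotient as idiv computes it: truncated toward zero. */
int64_t idiv_quotient(int64_t dividend, int64_t divisor)
{
    return dividend / divisor;
}

/* Remainder as idiv computes it: same sign as the dividend. */
int64_t idiv_remainder(int64_t dividend, int64_t divisor)
{
    return dividend % divisor;
}
```

Dividing -100 by 7 gives a quotient of -14 and a remainder of -2, and cqo_high(-100) is all ones, which is the high half idiv needs to see a negative 128-bit dividend.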

Logical Instructions

Bitwise Operations

; Boolean logic operations
and rax, rbx        ; rax = rax & rbx (bitwise AND)
or rcx, rdx         ; rcx = rcx | rdx (bitwise OR)  
xor rsi, rdi        ; rsi = rsi ^ rdi (bitwise XOR)
not r8              ; r8 = ~r8 (bitwise NOT)

Bit Shifting

; Shift operations  
shl rax, 1          ; shift left by 1 (multiply by 2)
shr rbx, 2          ; logical shift right by 2 (unsigned divide by 4)
sar rcx, 3          ; arithmetic shift right by 3 (signed divide by 8, rounding toward negative infinity)
rol rdx, 4          ; rotate left by 4 bits
ror rsi, 1          ; rotate right by 1 bit
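
An arithmetic right shift and a signed division by the same power of two agree for non-negative values but can differ for negative ones, because sar rounds toward negative infinity while idiv truncates toward zero. The sketch below models sar portably; sar64 is a hypothetical name and the shift count is assumed to be in the range 0 to 63.

```c
#include <stdint.h>

/* Portable model of sar on a 64-bit value (n must be 0..63). */
int64_t sar64(int64_t x, unsigned n)
{
    uint64_t ux = (uint64_t)x;

    if (x < 0) {
        /* Shift the complement, then complement again (-v - 1 equals ~v),
         * which has the effect of shifting in one bits from the top. */
        return -(int64_t)(~ux >> n) - 1;
    }
    return (int64_t)(ux >> n);
}
```

For example, sar64(-7, 3) is -1, whereas the C expression -7 / 8 (which matches idiv) is 0.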

Flag Effects

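Most arithmetic and logical instructions update the CPU flags (not leaves them unchanged, inc and dec leave CF unchanged, and and/or/xor/test clear CF and OF):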

Flag	Meaning	Set When
ZF	Zero Flag	Result is zero
SF	Sign Flag	Result is negative (MSB set)
CF	Carry Flag	Carry out of, or borrow into, the most significant bit (unsigned overflow)
OF	Overflow Flag	Signed overflow occurred
PF	Parity Flag	Even number of 1-bits in the low byte of the result

Common Patterns and Idioms

; Clear a register (shorter encoding than mov rax, 0)
xor rax, rax        ; rax = 0 (xor eax, eax has the same effect and is one byte shorter)

; Test if register is zero
test rax, rax       ; sets flags based on rax & rax
je zero_label       ; jump if rax was zero

; Multiply by power of 2 (faster than imul)
shl rax, 3          ; rax = rax * 8 (2^3)

; Check if number is even
test rax, 1         ; test lowest bit
jz even_label       ; jump if even (ZF set when bit 0 is clear)

Exercises & Practice Problems

Question: After xor rax, rax, what is the value of rax?

Answer: Zero (because XOR of a register with itself clears the register). This is a common idiom for zeroing registers.

Exercise: Demonstrate a signed division by -2 using idiv. Observe sign extension requirements in rdx.

Solution:

mov rax, 20         ; dividend  
mov rbx, -2         ; divisor
cqo                 ; sign-extend rax into rdx (critical!)
idiv rbx            ; rax = -10, rdx = 0

Recommended Resources

Intel Developer Manual, Vol. 2: Instruction Set Reference
Art of Assembly Language

Lesson 2.5: Understanding Directives and Pseudo-instructions

Learning Objectives

Identify common assembler directives (.data, .text, .bss).
Distinguish between real instructions and pseudo-instructions handled by the assembler.

Prerequisites

Basic familiarity with assembly source structure.
Comfort with reading assembly listings that include directives.

Key Concepts

Assembler Directives: .global, .section, .org, .align
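Pseudo-instructions: Statements the assembler handles itself rather than encoding as a CPU instruction (for example NASM's db, resb, equ, and times); macros can additionally expand to several real instructions
Data Definition Directives: db, dw, dd, dq in NASM; .byte, .word, .long, .quad in GAS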

Detailed Explanation

Section Directives

Directives provide information to the assembler, not the CPU. They organize code and data into sections:

Common Sections

.text: Contains executable code
.data: Contains initialized data
.bss: Contains uninitialized data (allocated but not stored in executable)
.rodata: Contains read-only data (constants)

; NASM syntax example
section .data
    msg db "Hello, World!", 0    ; null-terminated string
    count dd 42                  ; 32-bit integer
    array dq 1, 2, 3, 4, 5      ; array of 64-bit integers

section .bss  
    buffer resb 1024            ; reserve 1024 bytes
    temp_var resq 1             ; reserve 1 quadword (8 bytes)

section .text
    global _start
_start:
    ; executable code goes here

# GAS syntax example  
.section .data
    msg: .asciz "Hello, World!"  # null-terminated string
    count: .long 42             # 32-bit integer  
    array: .quad 1, 2, 3, 4, 5  # array of 64-bit integers

.section .bss
    .lcomm buffer, 1024         # reserve 1024 bytes
    .lcomm temp_var, 8          # reserve 8 bytes

.section .text
    .globl _start
_start:
    # executable code goes here

Symbol Management Directives

.global/.globl: Makes symbol visible to linker
.extern: Declares external symbol
.local: Makes symbol local to current file

# Symbol visibility (GAS syntax; on x86, GAS comments start with #)
.global main        # make 'main' visible to linker
.extern printf      # declare external function
.local helper_func  # local function, not exported

Data Definition Directives

NASM	GAS	Size	Description
`db`	`.byte`	1 byte	Define byte(s)
`dw`	`.word`	2 bytes	Define word(s)
`dd`	`.long`	4 bytes	Define doubleword(s)
`dq`	`.quad`	8 bytes	Define quadword(s)

Alignment Directives

Alignment ensures data is placed at memory addresses that are multiples of certain values:

# Alignment examples (GAS syntax)
.align 16           # align next item to 16-byte boundary (on x86)
data1: .quad 0x1234567890ABCDEF

.balign 4           # align to 4-byte boundary (always a byte count)
data2: .long 42
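
What an alignment directive does to the current location can be expressed as rounding an address up to the next multiple of a power of two. The sketch below assumes power-of-two alignments only; align_up is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two). */
uint64_t align_up(uint64_t addr, uint64_t alignment)
{
    /* A power of two has exactly one bit set. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Add the largest possible padding, then clear the low bits. */
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

For example, align_up(0x1001, 16) is 0x1010, and an address that is already aligned, such as 0x1000, is returned unchanged.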

Pseudo-instructions

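Assemblers also accept statements that are not CPU instructions. NASM calls data and reservation statements such as db and resb pseudo-instructions, and its macro facility lets a single name expand to several real instructions: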

; NASM macro example
%macro PUSH_ALL 0
    push rax
    push rbx  
    push rcx
    push rdx
%endmacro

%macro POP_ALL 0
    pop rdx
    pop rcx
    pop rbx
    pop rax  
%endmacro

; Usage:
PUSH_ALL            ; expands to multiple push instructions
; ... do work ...
POP_ALL             ; expands to multiple pop instructions

Conditional Assembly

; Conditional compilation
%ifdef DEBUG
    call debug_print
%endif

%if PLATFORM = 64
    mov rax, rbx    ; 64-bit version
%else  
    mov eax, ebx    ; 32-bit version
%endif

Practical Example

; Complete NASM program structure
section .data
    prompt db "Enter a number: ", 0
    result_msg db "Result: %d", 10, 0
    
section .bss
    input_buffer resb 16
    
section .text
    global main
    extern printf, scanf

main:
    push rbp
    mov rbp, rsp
    
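    ; Print prompt
    lea rdi, [rel prompt]   ; 1st argument: address of the format string
    xor eax, eax            ; variadic call: AL = number of vector registers used (0)
    call printf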
    
    ; ... program logic ...
    
    leave
    ret

Exercises & Practice Problems

Exercise: Write a small assembly file that defines a data string in .data and prints it in _start.

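Solution: Follow the structure of the practical example above, but use _start as the entry point and system calls instead of C library calls. On Linux, load rax with 1 (write), rdi with 1 (stdout), rsi with the address of the string, and rdx with its length, then execute syscall; finish with rax = 60 (exit), rdi = 0, and another syscall.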

Question: What's the difference between .bss and .data sections?

Answer: .bss is for uninitialized data (allocated at runtime but not stored in the executable file), while .data is for initialized data (stored in the executable file). This saves file space for large uninitialized arrays.