Section 1: Fundamentals - The Assembly Blueprint

Section 1 Summary

This section introduces the basics of low-level programming, the role of the assembler, and essential CPU concepts. By the end, you will have the foundational vocabulary and context needed for reading assembly code.

Lesson 1.1: Introduction to Low-level Programming Concepts

Learning Objectives

Define what assembly language is in the context of low-level programming.
Distinguish between high-level and low-level code, and explain why assembly is necessary.

Prerequisites

Basic programming knowledge (e.g., in C/C++).
Understanding of what a compiler does.

Key Concepts

Low-level vs. High-level: Assembly is close to machine code and hardware.
Abstraction Layers: Higher-level languages vs. direct hardware instructions.
Assembly Language: A human-readable representation of machine instructions.

Detailed Explanation

Assembly language is a symbolic representation of the instructions executed directly by a CPU. Each instruction is encoded as a specific binary pattern: an opcode plus any bytes that describe its operands. Reading assembly bridges the gap between what high-level code says and what the CPU does.

Consider this progression from high-level to low-level:

// High-level C code
int result = a + b;

// Compiler translates to assembly (simplified)
mov eax, [a]      ; Load value of 'a' into EAX register
add eax, [b]      ; Add value of 'b' to EAX
mov [result], eax ; Store result back to memory

// Assembly translates to machine code (hexadecimal)
8B 45 FC          ; mov eax, [ebp-4]
03 45 F8          ; add eax, [ebp-8]
89 45 F4          ; mov [ebp-12], eax
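
To make the byte sequences above less opaque, here is a minimal Python sketch that splits each one into its opcode, ModRM fields, and signed 8-bit displacement. The helper name `decode_modrm_disp8` and the register table are assumptions made for this illustration. It covers only the 32-bit `[base+disp8]` form used in these three instructions, not the full x86 encoding.

```python
# Register numbering used by the 3-bit fields of the ModRM byte (32-bit names).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_modrm_disp8(code: bytes):
    """Split a 3-byte 'opcode, ModRM, disp8' encoding into its fields.

    Hypothetical helper: handles only mod == 0b01 (base register plus
    8-bit displacement) without a SIB byte.
    """
    opcode, modrm, disp = code
    mod = modrm >> 6             # top 2 bits: addressing mode
    reg = (modrm >> 3) & 0b111   # middle 3 bits: register operand
    rm = modrm & 0b111           # low 3 bits: base register
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [base+disp8] without SIB is supported")
    if disp >= 0x80:             # disp8 is a signed byte
        disp -= 0x100
    return opcode, REG32[reg], REG32[rm], disp

for raw in ("8B 45 FC", "03 45 F8", "89 45 F4"):
    print(raw, "->", decode_modrm_disp8(bytes.fromhex(raw)))
```

All three instructions share the ModRM byte 45 (register eax, base ebp); only the opcode and the displacement change.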

Why Learn Assembly?

Performance Analysis: Understanding what the CPU actually executes
Debugging: Low-level debugging often requires reading disassembly
Reverse Engineering: Analyzing software without source code
System Programming: Writing drivers, operating systems, embedded code
Security: Vulnerability research and exploit development

Exercises & Practice Problems

Question: What is the difference between assembly language and machine language?

Answer: Assembly language uses symbolic mnemonics (like mov, add) and human-readable register names. Machine language is the binary encoding (opcodes) that the CPU executes directly. Assembly is translated to machine language by an assembler.

Question: Why might a developer need to read assembly code in real-world scenarios?

Answer: For performance tuning (understanding compiler optimizations), debugging complex issues (especially when stepping through disassembly), reverse engineering (analyzing software without source), security research (finding vulnerabilities), and embedded/systems programming where direct hardware control is needed.

Recommended Resources

Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture
"Introduction to Assembly Language" – MIT OpenCourseWare
"Programming from the Ground Up" by Jonathan Bartlett

Lesson 1.2: Role of the Assembler in the Development Process

Learning Objectives

Understand the role of assemblers in converting assembly to machine code.
Distinguish between assemblers, compilers, and disassemblers.

The Development Toolchain

In the classic toolchain, the compiler emits assembly text and the assembler turns it into object code (many modern compilers perform this step internally):

Source Code (.c, .cpp, .rs, etc.)
           ↓ [Compiler]
Assembly Code (.s, .asm)
           ↓ [Assembler]
Object Code (.o, .obj)
           ↓ [Linker]
Executable (.exe, ELF, Mach-O)

What Assemblers Do

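Translation: Convert mnemonics and operands to their binary encodings
Symbol Resolution: Resolve labels defined in the file to addresses and record external symbols for the linker
Addressing: Calculate memory addresses and offsets (for example, the distance to a jump target)
Object File Generation: Create relocatable object files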
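
The translation and symbol-resolution steps can be sketched with a two-pass assembler for a made-up instruction set. Everything in this example is hypothetical (the four mnemonics, their opcodes, and the one-byte absolute addresses); it is not x86 and not how any particular assembler is implemented. It only shows why two passes make forward label references easy to handle.

```python
# Toy instruction set (hypothetical, not x86): mnemonic -> (opcode, has_operand)
OPCODES = {"nop": (0x00, False), "inc": (0x01, False),
           "jmp": (0x10, True), "jnz": (0x11, True)}

def assemble(source: str) -> bytes:
    """Two-pass assembly: collect label addresses, then emit bytes."""
    lines = [ln.split(";")[0].strip() for ln in source.splitlines()]
    lines = [ln for ln in lines if ln]

    # Pass 1: symbol resolution. Record the address each label refers to.
    symbols, address = {}, 0
    for ln in lines:
        if ln.endswith(":"):
            symbols[ln[:-1]] = address
        else:
            _, has_operand = OPCODES[ln.split()[0]]
            address += 2 if has_operand else 1

    # Pass 2: translation. Emit opcodes and replace labels by addresses.
    out = bytearray()
    for ln in lines:
        if ln.endswith(":"):
            continue
        parts = ln.split()
        opcode, has_operand = OPCODES[parts[0]]
        out.append(opcode)
        if has_operand:
            out.append(symbols[parts[1]])  # one-byte absolute address
    return bytes(out)

program = """
start:
    inc
    jnz end      ; forward reference, known only after pass 1
    jmp start
end:
    nop
"""
print(assemble(program).hex(" "))
```

A real assembler additionally writes relocation entries for symbols it cannot resolve itself, so the linker can fill them in later.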

Common Assemblers

Assembler	Platform	Syntax	Use Case
NASM	Cross-platform	Intel	Learning, x86 development
GAS (as)	Unix/Linux	AT&T (default), Intel via .intel_syntax	GNU toolchain, GCC output
MASM	Windows	Intel	Microsoft development
YASM	Cross-platform	Intel	NASM-compatible

Syntax Differences: Intel vs AT&T

Two major syntax styles exist for x86 assembly:

Aspect	Intel Syntax	AT&T Syntax
Operand Order	`mov dest, src`	`mov src, dest`
Register Prefix	None: `eax`	Percent: `%eax`
Immediate Prefix	None: `5`	Dollar: `$5`
Memory Addressing	`[base+index*scale+disp]`	`disp(base,index,scale)`
Operand Size	Keyword: `DWORD PTR` (NASM: `dword`)	Mnemonic suffix: `movl` (l=long)

Key Differences

Operand Order: Intel (dest, src) vs AT&T (src, dest)
Prefixes: AT&T uses % for registers, $ for immediates
Suffixes: AT&T uses b/w/l/q for byte/word/long/quad
Memory: Intel [base+offset] vs AT&T offset(base)
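
The four differences listed above are mechanical enough to automate for simple cases. The sketch below converts two-operand, 32-bit Intel-syntax instructions to AT&T syntax. The function names are hypothetical, and the converter deliberately handles only plain registers, integer immediates, and `[base]` or `[base±disp]` memory operands.

```python
import re

REGS32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"}

def operand_to_att(op: str) -> str:
    """Convert one Intel-syntax operand to AT&T form (small subset only)."""
    op = op.strip()
    if op in REGS32:
        return "%" + op                        # register prefix
    mem = re.fullmatch(r"\[(\w+)(?:([+-])(\w+))?\]", op)
    if mem:
        base, sign, disp = mem.groups()
        disp_text = "" if disp is None else ("-" if sign == "-" else "") + disp
        return f"{disp_text}(%{base})"         # disp(base)
    return "$" + op                            # anything else: immediate

def intel_to_att(line: str) -> str:
    """Convert 'op dest, src' (32-bit operands) to 'opl src, dest'."""
    mnemonic, operands = line.split(None, 1)
    dest, src = operands.split(",")
    return f"{mnemonic}l {operand_to_att(src)}, {operand_to_att(dest)}"

print(intel_to_att("mov ecx, [ebp-4]"))   # movl -4(%ebp), %ecx
```

The `l` suffix is hard-coded because the sketch assumes 32-bit operands throughout; a fuller converter would derive the suffix from the register width or the size keyword.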

Exercises & Practice Problems

Question: Convert this Intel syntax to AT&T syntax: mov eax, [ebx+8]

Answer: movl 8(%ebx), %eax - Note the reversed operand order, register prefix %, and memory addressing format.

Question: What's the difference between an assembler and a disassembler?

Answer: An assembler converts assembly language to machine code, while a disassembler does the reverse - converts machine code back to assembly language. Disassemblers are used for reverse engineering and debugging.

Lesson 1.3: CPU Architecture - Registers, Memory, and Execution Model

Learning Objectives

Understand the x86-64 register set and their purposes.
Grasp the basic CPU execution model and memory hierarchy.

x86-64 Register Set

x86-64 provides 16 general-purpose registers, each 64 bits wide:

64-bit	32-bit	16-bit	8-bit	Purpose
RAX	EAX	AX	AL	Accumulator (arithmetic, return values)
RBX	EBX	BX	BL	Base (general purpose)
RCX	ECX	CX	CL	Counter (loops, string operations)
RDX	EDX	DX	DL	Data (arithmetic, I/O)
RSI	ESI	SI	SIL	Source Index (string operations)
RDI	EDI	DI	DIL	Destination Index
RSP	ESP	SP	SPL	Stack Pointer
RBP	EBP	BP	BPL	Base Pointer (frame pointer)
R8-R15	R8D-R15D	R8W-R15W	R8B-R15B	Additional general-purpose registers

Special Purpose Registers

RIP: Instruction Pointer (program counter)
RFLAGS: Status flags (zero, carry, sign, etc.)
Segment Registers: CS, DS, ES, FS, GS, SS

Memory Hierarchy

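Understanding the memory hierarchy explains why compilers try to keep values in registers and why memory operands cost more. The sizes below are typical orders of magnitude and vary by CPU:

CPU Registers     (fastest, smallest)
    ↓
L1 Cache         (very fast, ~32KB)
    ↓
L2 Cache         (fast, ~256KB)
    ↓
L3 Cache         (slower than L2, ~8MB, usually shared between cores)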
    ↓
Main Memory      (slower, GBs)
    ↓
Storage          (slowest, TBs)

Basic CPU Execution Cycle

Fetch: Get instruction from memory at RIP address
Decode: Interpret the instruction opcode and operands
Execute: Perform the operation
Write-back: Store results to registers/memory
Update RIP: Point to next instruction

; Example execution trace
mov rax, 42          ; 1. Fetch this instruction
                     ; 2. Decode: move immediate 42 to RAX
                     ; 3. Execute: load value 42
                     ; 4. Write-back: RAX = 42
                     ; 5. RIP += instruction_length

add rax, 8           ; Next instruction...
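
The same cycle can be modelled as a short interpreter loop. This is a toy model under stated assumptions: instructions are Python tuples rather than encoded bytes, every instruction has length 1 (so `rip` is simply a list index), and only `mov` and `add` with an immediate are supported. Real CPUs overlap these stages through pipelining, which this sketch ignores.

```python
def run(program, max_steps=100):
    """Run a list of (mnemonic, dest, immediate) tuples on a toy machine."""
    regs = {"rax": 0, "rip": 0}
    trace = []
    for _ in range(max_steps):
        if regs["rip"] >= len(program):
            break
        instr = program[regs["rip"]]             # 1. fetch at rip
        mnemonic, dest, imm = instr              # 2. decode
        if mnemonic == "mov":                    # 3. execute
            result = imm
        elif mnemonic == "add":
            result = (regs[dest] + imm) & 0xFFFF_FFFF_FFFF_FFFF
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        regs[dest] = result                      # 4. write-back
        regs["rip"] += 1                         # 5. advance rip
        trace.append((mnemonic, regs["rax"], regs["rip"]))
    return regs, trace

regs, trace = run([("mov", "rax", 42), ("add", "rax", 8)])
print(regs)    # rax ends at 50
```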

Exercises & Practice Problems

Question: What's the difference between RAX, EAX, AX, and AL?

Answer: They refer to different portions of the same register: RAX (64-bit), EAX (lower 32 bits), AX (lower 16 bits), AL (lower 8 bits). When you write to EAX, the upper 32 bits of RAX are zeroed automatically. Writes to AX or AL, by contrast, leave the remaining upper bits of RAX unchanged.
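
The aliasing can be checked with plain integer masks. In this sketch, RAX is modelled as a Python integer, and the function names are assumptions made for the example. It shows both how the narrower names read part of RAX and how a write through each name changes the full 64-bit value.

```python
MASK64 = (1 << 64) - 1

def read_views(rax: int) -> dict:
    """Values seen through the RAX, EAX, AX and AL names."""
    rax &= MASK64
    return {"rax": rax, "eax": rax & 0xFFFF_FFFF,
            "ax": rax & 0xFFFF, "al": rax & 0xFF}

def write_eax(rax: int, value: int) -> int:
    """32-bit write: the upper 32 bits of RAX become zero."""
    return value & 0xFFFF_FFFF

def write_ax(rax: int, value: int) -> int:
    """16-bit write: bits 16..63 of RAX are preserved."""
    return (rax & (MASK64 ^ 0xFFFF)) | (value & 0xFFFF)

def write_al(rax: int, value: int) -> int:
    """8-bit write: bits 8..63 of RAX are preserved."""
    return (rax & (MASK64 ^ 0xFF)) | (value & 0xFF)

rax = 0x1122334455667788
print(hex(write_eax(rax, 0xAABBCCDD)))   # 0xaabbccdd
print(hex(write_ax(rax, 0xBEEF)))        # 0x112233445566beef
print(hex(write_al(rax, 0x01)))          # 0x1122334455667701
```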

Question: Why does x86-64 have both general-purpose and special-purpose registers?

Answer: General-purpose registers (RAX, RBX, etc.) can hold operands for most instructions, while special-purpose registers (RIP, RFLAGS, the segment registers) have fixed functions that the CPU depends on for proper operation. RSP is encoded as a general-purpose register, but push, pop, call, and ret use it implicitly as the stack pointer, so in practice it is reserved for that role.

Lesson 1.4: Basic Assembly Instructions

Learning Objectives

Classify different instruction types (data movement, arithmetic, etc.).
Apply fundamental assembly instructions in short code snippets.

Prerequisites

Understanding of registers and CPU architecture.
Familiarity with basic binary and hexadecimal notation.

Key Concepts

Data Movement: mov, push, pop
Arithmetic: add, sub, mul, div
Logical/Bitwise: and, or, xor, not
Comparison/Testing: cmp, test

Detailed Explanation

Assembly instructions can be categorized into several fundamental types:

Data Movement Instructions

The mov instruction copies data from one place to another
push and pop work with the stack
lea (Load Effective Address) computes an address expression and stores the address itself, without reading memory

Arithmetic Instructions

Arithmetic instructions compute a result from their operands, write it to the destination, and update the status flags
add, sub perform basic arithmetic
mul, imul for multiplication (unsigned/signed)
div, idiv for division (unsigned/signed)

Logical Instructions

Logical instructions perform bitwise operations
and, or, xor for boolean logic
not for bitwise complement (inverts every bit; arithmetic negation is neg)
shl, shr for bit shifting

Comparison Instructions

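cmp subtracts the second operand from the first, discards the result, and sets flags (in rflags) for subsequent conditional jumps
test performs a bitwise AND, discards the result, and sets flags; test rax, rax is a common check for zero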

; Example instruction sequence
mov rax, 10         ; rax = 10
add rax, 5          ; rax = 15
cmp rax, 20         ; computes 15 - 20, sets flags (ZF=0, SF=1, CF=1); rax stays 15
sub rax, 3          ; rax = 12 (and overwrites the flags set by cmp)
and rax, 0xF        ; rax = 12 & 15 = 12 (no change in this case)
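
The sequence above can be replayed in Python to check the register values by hand. This is a sketch rather than an emulator: RAX is a plain integer masked to 64 bits, and `cmp_flags` is a hypothetical helper that models only ZF, SF, and CF for a 64-bit `cmp`.

```python
MASK64 = (1 << 64) - 1

def cmp_flags(a: int, b: int) -> dict:
    """Flags produced by 'cmp a, b': computes a - b and discards the result."""
    result = (a - b) & MASK64
    return {
        "ZF": result == 0,                    # operands are equal
        "SF": bool(result >> 63),             # top bit of the result
        "CF": (a & MASK64) < (b & MASK64),    # unsigned borrow
    }

rax = 10                       # mov rax, 10
rax = (rax + 5) & MASK64       # add rax, 5
flags = cmp_flags(rax, 20)     # cmp rax, 20 (rax is not modified)
rax = (rax - 3) & MASK64       # sub rax, 3
rax &= 0xF                     # and rax, 0xF
print(rax, flags)
```

The `flags` value here is a snapshot taken right after `cmp`. On real hardware, the following `sub` and `and` would overwrite those flags, which is why a conditional jump normally follows `cmp` immediately.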

Flag Effects

Many instructions affect the CPU flags register (RFLAGS):

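ZF (Zero Flag): Set if the result is zero
SF (Sign Flag): Set if the most significant bit of the result is 1 (negative when interpreted as signed)
CF (Carry Flag): Set if unsigned overflow occurs (a carry out of, or a borrow into, the most significant bit)
OF (Overflow Flag): Set if signed overflow occurs (the result does not fit in the signed range)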
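
The four flag definitions can be written out as a small function for `add`. The function name and the `bits` parameter are assumptions made for this example. The width is adjustable so that the 8-bit cases are easy to verify by hand, and the same rules apply at 64 bits.

```python
def add_flags(a: int, b: int, bits: int = 64) -> dict:
    """Result and ZF/SF/CF/OF for an addition of two 'bits'-wide values."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b                  # unbounded sum
    result = full & mask          # what the destination register receives
    return {
        "result": result,
        "ZF": result == 0,
        "SF": bool(result & sign),
        "CF": full > mask,        # carry out of the top bit
        # Signed overflow: both inputs have a sign bit that the result lacks.
        "OF": bool((a ^ result) & (b ^ result) & sign),
    }

print(add_flags(0x7F, 0x01, bits=8))   # 127 + 1: OF and SF set, CF clear
print(add_flags(0xFF, 0x01, bits=8))   # 255 + 1: CF and ZF set, OF clear
```

The two printed cases show that CF and OF are independent: the first overflows only as a signed addition, the second only as an unsigned one.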

Exercises & Practice Problems

Question: After executing add rax, rbx, which flags might be set?

Answer: Zero flag (ZF) if the result is 0, sign flag (SF) if the result is negative, overflow flag (OF) if there is signed overflow, carry flag (CF) if there is unsigned overflow.

Exercise: Write assembly that computes 3 * 5 + 2. Verify in a debugger that rax holds the correct result.

Solution:

mov rax, 3          ; rax = 3
mov rbx, 5          ; rbx = 5  
imul rax, rbx       ; rax = 3 * 5 = 15
add rax, 2          ; rax = 15 + 2 = 17

Question: What's the difference between mul and imul?

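Answer: mul performs unsigned multiplication, while imul performs signed multiplication. mul has only a one-operand form, which (for 64-bit operands) multiplies RAX by the operand and places the 128-bit product in RDX:RAX; imul also has two- and three-operand forms that keep only the low half of the product. The low half is the same for signed and unsigned inputs; the upper half and the CF/OF flags differ.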
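
A quick way to see the difference is to compute the full 128-bit product both ways. The sketch below models the one-operand 64-bit forms, returning the pair (RDX, RAX); the function names are hypothetical and flags are not modelled.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def to_signed64(x: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def mul64(rax: int, operand: int):
    """One-operand mul: unsigned 64x64 -> 128-bit product as (rdx, rax)."""
    product = (rax & MASK64) * (operand & MASK64)
    return product >> 64, product & MASK64

def imul64(rax: int, operand: int):
    """One-operand imul: signed 64x64 -> 128-bit product as (rdx, rax)."""
    product = (to_signed64(rax) * to_signed64(operand)) & MASK128
    return product >> 64, product & MASK64

all_ones = MASK64                 # 2**64 - 1 unsigned, -1 signed
print([hex(v) for v in mul64(all_ones, 2)])
print([hex(v) for v in imul64(all_ones, 2)])
```

Both calls return the same low half in RAX; only the high half in RDX depends on whether the bit pattern is read as a large unsigned number or as -1.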

Recommended Resources

Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2: Instruction Set Reference
x86 Instruction Set Reference

Lesson 1.5: Instruction Format and Syntax

Learning Objectives

Describe how x86-64 instructions are structured (mnemonic + operands).
Interpret opcodes, mnemonics, and operands in various disassembly outputs.

Prerequisites

Knowledge of x86-64 instruction classification.
Familiarity with a disassembler (e.g., objdump, gdb).

Key Concepts

Opcode: The machine code that represents the instruction
Mnemonic: Human-readable name (e.g., mov, add)
Operands: Register, immediate, or memory references
AT&T vs Intel Syntax: Differences in operand order, prefix usage, etc.

Detailed Explanation

Instruction Structure

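In assembly source, an x86-64 instruction has this general shape (the prefix and every operand are optional; most instructions take zero to three operands):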

[PREFIX] MNEMONIC [OPERAND1], [OPERAND2], [OPERAND3]
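
As an illustration of that format, the following sketch splits one line of Intel-syntax source into prefix, mnemonic, and operand list. The function name and the short prefix set are assumptions made for the example. It treats `;` as the comment character and does not attempt to validate mnemonics.

```python
KNOWN_PREFIXES = {"lock", "rep", "repe", "repne"}

def parse_instruction(line: str):
    """Split an Intel-syntax line into (prefix, mnemonic, operands)."""
    text = line.split(";", 1)[0].strip()       # drop a trailing comment
    if not text:
        return None                            # blank or comment-only line
    words = text.split(None, 1)
    prefix = None
    if words[0].lower() in KNOWN_PREFIXES and len(words) > 1:
        prefix = words[0].lower()
        words = words[1].split(None, 1)
    mnemonic = words[0].lower()
    operands = [op.strip() for op in words[1].split(",")] if len(words) > 1 else []
    return prefix, mnemonic, operands

print(parse_instruction("mov rax, 42        ; load a constant"))
print(parse_instruction("lock add [rbx], rax"))
print(parse_instruction("ret"))
```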

Syntax Differences: Intel vs AT&T

Two major syntax styles exist for x86 assembly:

Examples Side by Side

; Intel Syntax (NASM, MASM, disassemblers)
mov eax, ebx          ; destination, source
add eax, 5            ; register, immediate  
mov eax, [ebx+4]      ; memory addressing
mov DWORD PTR [eax], 42   ; explicit size (MASM form; NASM writes: mov dword [eax], 42)

# AT&T Syntax (GAS, GCC output)
movl %ebx, %eax       # source, destination
addl $5, %eax         # immediate, register
movl 4(%ebx), %eax    # memory addressing  
movl $42, (%eax)      # explicit size via the l suffix

Operand Types

Register Operands: CPU registers (rax, rbx, etc.)
Immediate Operands: Constant values embedded in instruction
Memory Operands: References to memory locations
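
When reading disassembly, telling the three operand types apart is mostly pattern recognition. The sketch below encodes those patterns for Intel syntax. The function names are hypothetical, and the rules are intentionally simple: anything ending in a bracketed expression is memory, known register names are registers, and anything Python can parse as an integer is an immediate.

```python
def _register_names() -> set:
    """Build the general-purpose register names of x86-64 (all widths)."""
    names = set()
    for base in ("ax", "bx", "cx", "dx"):
        names |= {"r" + base, "e" + base, base, base[0] + "l", base[0] + "h"}
    for base in ("si", "di", "sp", "bp"):
        names |= {"r" + base, "e" + base, base, base + "l"}
    for n in range(8, 16):
        names |= {f"r{n}", f"r{n}d", f"r{n}w", f"r{n}b"}
    return names

REGISTERS = _register_names()

def classify_operand(op: str) -> str:
    """Return 'register', 'immediate', 'memory' or 'unknown'."""
    text = op.strip().lower()
    if "[" in text and text.endswith("]"):
        return "memory"                 # e.g. [rbx+8], DWORD PTR [rbp-0x4]
    if text in REGISTERS:
        return "register"
    try:
        int(text, 0)                    # accepts 42, -5, 0x2a
        return "immediate"
    except ValueError:
        return "unknown"                # labels, NASM-style 2Ah, etc.

for op in ("rax", "0x2a", "DWORD PTR [rbp-0x4]", "my_label"):
    print(op, "->", classify_operand(op))
```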

Machine Code Representation

Assembly instructions are encoded as machine code (opcodes):

Assembly:     mov eax, 42
Machine Code: B8 2A 00 00 00
              │  └─────────── 32-bit immediate value (42), least significant byte first
              └─── Opcode for "mov eax, immediate"
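
The encoding above generalizes to the other 32-bit registers: the opcode byte is B8 plus the register number, followed by the immediate in little-endian order. The sketch below reproduces it with Python's `struct` module. The function name is an assumption made for the example, and only this one instruction form is covered.

```python
import struct

# Register numbers added to the base opcode B8.
REG32_INDEX = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
               "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov_r32_imm32(reg: str, value: int) -> bytes:
    """Encode 'mov r32, imm32' as opcode B8+r plus a little-endian imm32."""
    opcode = 0xB8 + REG32_INDEX[reg]
    return bytes([opcode]) + struct.pack("<I", value & 0xFFFF_FFFF)

print(encode_mov_r32_imm32("eax", 42).hex(" "))           # b8 2a 00 00 00
print(encode_mov_r32_imm32("ebx", 0x12345678).hex(" "))   # bb 78 56 34 12
```

The second line makes the byte order visible: the immediate 0x12345678 appears in memory as 78 56 34 12.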

Disassembly Output Examples

Different tools show assembly in various formats:

objdump (AT&T syntax):

$ objdump -d program
  400546: 48 89 e5                mov    %rsp,%rbp
  400549: c7 45 fc 2a 00 00 00    movl   $0x2a,-0x4(%rbp)

gdb (can switch between syntaxes):

(gdb) set disassembly-flavor intel
(gdb) disas
   0x400546: mov    rbp,rsp
   0x400549: mov    DWORD PTR [rbp-0x4],0x2a
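
Each objdump line carries three pieces of information: the address, the raw instruction bytes, and the disassembled text. The sketch below separates them with a regular expression. The pattern and the function name are assumptions based on the line layout shown above, not an official description of objdump's output format.

```python
import re

# address ':' hex byte pairs (each followed by a space) then the instruction text
LINE_RE = re.compile(r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2} )+)\s*(.*)$")

def parse_objdump_line(line: str):
    """Return (address, raw_bytes, text) for one disassembly line, or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    address = int(match.group(1), 16)
    raw = bytes.fromhex(match.group(2))
    text = " ".join(match.group(3).split())    # collapse column padding
    return address, raw, text

address, raw, text = parse_objdump_line("  400546: 48 89 e5    mov    %rsp,%rbp")
print(hex(address), raw.hex(" "), text)
print("next instruction at", hex(address + len(raw)))
```

Adding the byte count to the address gives the address of the next instruction, which is a handy consistency check when reading a listing.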

Exercises & Practice Problems

Question: In Intel syntax, is the destination typically on the left or right?

Answer: The destination is on the left (dest, src). This is the opposite of AT&T syntax where source comes first.

Question: What does the 'l' suffix mean in AT&T syntax (e.g., movl)?

Answer: The 'l' suffix indicates "long" or 32-bit operation. Other suffixes are 'b' (byte/8-bit), 'w' (word/16-bit), and 'q' (quad/64-bit).

Recommended Resources

GNU Assembler (AT&T syntax) documentation
NASM (Intel syntax) documentation
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2: Instruction Set Reference