x86 Assembly
Assembly Programming
- Move instructions, registers, and operands
- Memory addressing modes
- swap example: 32-bit vs 64-bit
- Arithmetic operations
- Condition codes
- Conditional and unconditional branches
- Loops
- Switch statements
Three Basic Kinds of Instructions
- Transfer data between memory and register
Loaddata from memory into register%reg = Mem[address]
Storeregister data into memoryMem[address] = %reg
Memory is indexed just like an array[]
- Perform arithmetic function on register or memory data
c = a + b;
- Transfer control
- Unconditional jumps to/ from procedures
- Conditional branches
Moving Data: IA32
- IA32 has 8 registers. 6 are general purpose and 2 special purpose(the last 2)
| Register | Size(32-bit) | Usage (mostly obsolete) |
|---|---|---|
| %eax | 32 | Accumulator for operands and results data |
| %ecx | 32 | Counter for string and loop operations |
| %edx | 32 | I/O pointer |
| %ebx | 32 | Pointer to data in the data segment |
| %esi | 32 | Source pointer for string operations |
| %edi | 32 | Destination pointer for string operations |
| %esp | 32 | Stack pointer |
| %ebp | 32 | Pointer to the base of the current stack frame |
- Moving Data
-
movx Source, Dest -
x is one of {b, w, l}
-
movl Source, Dest:- Move 4-byte “long word”
-
movw Source, Dest:- Move 2-byte “word”
-
movb Source, Dest:- Move 1-byte “byte”
-
- Lots of these in typical code
Moving Data: IA32 Continue…
- Moving Data
movl Source, Dest:
- Operand Types
- Immediate: Constant integer data
- Example:
$0x400, $-533 - Like C constant, but prefixed with
$ - Encoded with 1, 2, or 4 bytes
- Example:
- Register: One of 8 integer registers
- Example:
%eax,%edx - But
%espand%ebpreserced for special use - Others have special uses for particular instructions
- Example:
- Memory: 4 consecutive bytes of memory at address given by register
- Simplest example: (
%eax) - Various other “address modes”
- Simplest example: (
- Immediate: Constant integer data
movl Operand Combination
| Operand | Source | Destination | Src, Dest | C Analog |
|---|---|---|---|---|
mov1 |
Imm | Reg | movl $0x4, %eax |
var_a = 0x4; |
mov1 |
Imm | Mem | movl $-147,(%eax) |
*p_a = -147; |
mov1 |
Reg | Reg | movl %eax, %edx |
var_d = var_a; |
mov1 |
Reg | Mem | movl %eax, (%edx) |
*p_d = var_a; |
mov1 |
Mem | Reg | movl (%eax), %edx |
var_d = *p_a; |
Cannot do memory-memory transfer with a single instruction.
Memory Addressing Modes: Basic
- Indirect (R) Mem[Reg[R]]
- Register R specifies the momory address
- `movl (%ecx), %eax
- Register R specifies the momory address
- Displacement D(R) Mem[Reg[R]+D]
- Register R specifies a memory address
- (e.g. the start of some memory region)
- Constant displacement D specifies the offset from that address
movl 8 (%ebp), %edx
- Register R specifies a memory address
Using Basic Addressing Modes
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
swap:
// Set Up
pushl %ebp
movl %esp, %ebp
pushl %ebx
// Body
movl 12(%ebp), %ecx
movl 8(%ebp), %edx
movl (%ecx), %eax
movl (%edx), %ebx
movl %eax, (%edx)
movl %ebx, (%ecx)
// Finish
movl -4(%ebp), %ebx
movl %ebp, %esp
popl %ebp
ret
Understanding Swap
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
| Rgister | Value |
|---|---|
| %ecx | yp |
| %edx | xp |
| %eax | t1 |
| %ebx | t0 |

x86-64 Integer Registers
| 64-bit | 32-bit |
|---|---|
| %rax | %eax |
| %rbx | %ebx |
| %rcx | %ecx |
| %rdx | %edx |
| %rsi | %esi |
| %rdi | %edi |
| %rsp | %esp |
| %rbp | %ebp |
| %r8 | %r8d |
| %r9 | %r9d |
| %r10 | %r10d |
| %r11 | %r11d |
| %r12 | %r12d |
| %r13 | %r13d |
| %r14 | %r14d |
| %r15 | %r15d |
32-bit vs. 64-bit operands
-
Long word
l(4 Bytes) <-> Quad wordq(8 Bytes) -
New instruction forms:
movl -> movqaddl -> addqsall -> salq- etc
-
x86-64 can still use 32-bit instructions that generate 32-bit results
- Higher-order bits of destination register are just set to 0
- Example:
addl
Swap Ints in 64-bit Mode
package main
func swap(xp *int32, yp *int32) {
t0 := *xp
t1 := *yp
*xp = t1
*yp = t0
}
func main() {
var a, b int32 = 5, 10
swap(&a, &b)
}
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
swap:
movl (%rdi), %edx
movl (%rsi), %eax
movl %eax, (%rdi)
movl %edx,(%rsi)
retq
- Arguments passed in registers
- First
(xp)in%rdi, second(yp)in%rsi - 64-bit pointers
- First
- No stack operations required
- 32-bit data
- Data held in registers
%eaxand%edx movloperation (the l refers to data width, not address width)
- Data held in registers
Swap Long Ints in 64-bit Mode
void swap_1
(long int *xp, long int *yp)
{
long int t0 = *xp;
long int t1 = *yp;
*xp = t1;
*yp = t0;
}
```asm
swap_1:
movq (%rdi), %rdx
movq (%rsi), %rax
movq %rax, (%rdi)
movq %rdx, (%rsi)
retq
- 64-bit data
- Data held in registers
%raxandrdx movqoperationqstands for quad-word
- Data held in registers
Complete Memory Addressing Modes
- The addresses used for accessing memory in
mov(and other) instructions can be computed in several different ways. - Most General Form:
D(Rb, Ri, S) Mem[Reg[Rb] + S*Reg[Ri] + D]- D: Constant “displacement” 1, 2, or 4 bytes
- Rb: Base register: Any of the 8/16 integer registers
- Ri: Index register: Any, except for
%espor%rsp- Unlikely you’d use
%ebpeither
- Unlikely you’d use
- S: Scale: 1, 2, 4, or 8
- Special Cases: can use any combination of D, Rb, Ri and S
(Rb, Ri) Mem[Reg[Rb] + Reg[Ri]]D(Rb, Ri) Mem[Reg[Rb] + Reg[Ri] + D](Rb, Ri, S) Mem[Reg[Rb]+S*Reg[Ri]]
Address Computetion Examples
| Register | Address |
|---|---|
| %edx | 0xf000 |
| %ecx | 0x100 |
(Rb, Ri) Mem[Reg[Rb] + Reg[Ri]]D(Ri, S) Mem[S*Reg[Ri] + D](Rb, Ri, S) Mem[Reg[Rb]+S*Reg[Ri]]D(Rb) Mem[Reg[Rb]+D]
| Expression | Address Computation | Address |
|---|---|---|
| 0x8(%edx) | 0xf000 + 0x8 | 0xf008 |
| (%edx, %ecx) | 0xf000 + 0x100 | 0xf100 |
| (%edx, %ecx, 4) | 0xf000 + 4*0x100 | 0xf400 |
| 0x80 (, %edx, 2) | 2*0xf000 + 0x80 | 0x1e080 |
Address Computation Instruction
- leal Src, Dest
- Src is address mode expression
- Set Dest to address computed by expression
- (lea stands for load effective address)
- Example: leal (%edx, %ecx,4) , %eax
%eax = %edx + 4 * ecx
- Uses
- Computing addresses without a memory reference
- E.g., translation of
p = &x[i];
- E.g., translation of
- Computing arithmetic expressions of the form
x + k*ik = 1, 2, 4, or 8
- Computing addresses without a memory reference
Some Arithmetic Operations
- Two Operand (Binary) Instructions:
| Format | Computation |
|---|---|
| addl Src, Dest | Dest = Dest + Src |
| subl Src, Dest | Dest = Dest - Src |
| imull Src,Dest | Dest = Dest*Src |
| sall Src, Dest | Dest = Dest«Src |
| sarl Src, Dest | Dest = Dest»Src |
| shrl Src, Dest | Dest = Dest » Src Logical |
| xorl Src, Dest | Dest = Dest ^ Src |
| andl Src, Dest | Dest = Dest & Src |
| orl Src, Dest | Dest = Dest |
-
Watch out for argumant order. (especially
subl) -
No distinction between signed and unsigned int
-
One Operand (Unary) Instructions
incl Dest Dest = Dest + 1
decl Dest Dest = Dest - 1
negl Dest Dest = -Dest
notl Dest Dest = ~Dest
- See Section 3.5.5 of text for more instructions: mull, cltd, idivl, divl
Using leal for Arithmetic Expressions
int arith
(int x, int y, int z)
{
int t1 = x + y;
int t2 = z + t1;
int t3 = x + 4;
int t4 = y * 48;
int t5 = t3 + t4;
int rval = t2 * t5;
return rval;
}
movl 8(%ebp), %eax # eax = x
movl 12(%ebp), %edx # edx = y
leal (%edx, %eax), %ecx # ecx = x + y (t1)
leal (%edx, %edx, 2), %edx # edx = y + 2*y = 3*y
sall $4, %edx # edx = 48*y (t4)
addl 16(%ebp), %ecx # ecx = z+t1 (t2)
leal 4(%edx,%eax), %eax # eax = 4+t4+x (t5)
imull %ecx, %eax $ eax = t5*t2 (rval)
- Note
- Instructions in different order from C code
- Some expressions require multiple instructions
- Some instructions cover multiple expressions
- Get exact same code when compile
- (x+y+x)(x+4+48y)
Example 2
int logical (int x, int y)
{
int t1 = x^y;
int t2 = t1 >> 17;
int mask = (1<<13) - 7;
int rval = t2 & mask;
return rval;
}
logical:
// Setup
push1 %ebp
movl *esp, %ebp
// Body
movl 8(%ebp), %eax
xorl 12(%ebp), %eax
sarl $17, %eax
andl $8185, %eax
Finish
movl %ebp, %esp
popl %ebp
ret
movl 8(%ebp), %eax eax = x
xorl 12(%ebp), %eax eax = x ^ y (t1)
sarl $17, %eax eax = t1>>17(t2)
andl $8185, %eax eax = t2 & 8185
Conditionals and Control Flow
- A condition branch is sufficient to implement most control flow constructions offered in hight level languages
- if (condition) then {…} else {…}
- while (condition) {…}
- do {…} while (condition)
- for (initialization; condition; iterative) {…}
- Unconditional branches implement some related control flow constructs
- break, continue
- In x86, we’ll refer to branches as *“jumps” (either conditional or unconditional)
Jumping
- jX Instructions
- Jump to different part of code depending on condition codes
| jX | Condition | Description |
|---|---|---|
| jmp | 1 | Unconditional |
| je | ZF == 1 | Equal (Zero) |
| jne | ~ZF | Not Equal (Not Zero) |
| js | SF | Negative |
| jns | ~SF | Nonnegative |
| jg | ~ZF && SF == OF | Greater (Signed) |
| jge | ~(SF ^ OF) | Greater or Equal (Signed) |
| jl | (SF ^ OF) | Less (Signed) |
| jle | ZF | (SF ^ OF) |
| ja | ~CF & ~ZF | Above (Unsigned) |
| jb | CF | Below (Unsigned) |
Processor State (IA32, Partial)
- Information about current executing program
- Temporary data (%eax, …)
- Location of runtime stack (%ebp, %esp)
- Location of current code control point (%eip)
- Status of recent tests (CF, ZF, SF, OF)
Condition Codes (Implicit Settings)
-
Single-bit registers
CF Carry Flag (for unsigned)
ZF Zero flag
SF Sign Flag (for signed)
OF Overflow Flag (for signed) -
Implicitly set (think of it as side effect) by arithmetic operations
- Example:
addl/addq Src,Dest <-> t = a+b - CF set if carry out from most significant bit (unsigned overflow)
- ZF set if t == 0
- SF set if t < 0 (as signed)
- OF set of two’s complement (signed) overflow
(a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
- Example:
-
Not set by lea instuction (beware)
Condition Codes (Explicit Settings: Compare)
-
Single-bit registers
CF Carry Flag (for unsigned)
ZF Zero flag
SF Sign Flag (for signed)
OF Overflow Flag (for signed) -
Explicit Setting by Compare Instruction
cmpl/cmpq Src2,Src1- `cmpl b, a like computing a-b without setting destination
-
CF set if carry out from most significant bit (used for unsigned comparisons)
-
ZF set if a == b
-
SF set if (a-b) < 0 (as signed)
-
OF sef if two’s complement (signed) overflow
(a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b) > 0)
Condition Codes (Explicit Settings: Test)
-
Single-bit registers
CF Carry Flag (for unsigned)
ZF Zero flag
SF Sign Flag (for signed)
OF Overflow Flag (for signed) -
Explicit Setting by Test instruction
testl /testq Src2,Src1- test1 b, a like computing a & b without setting destination
- Sets condition codes based on value of Src1 & Src2
- Useful to have one of the operands be a mask
- ZF set if a&b == 0
- SF set if a&b < 0
- testl %eax, %eax
- Sets SF and ZF, check if wax is +, 0, -
Reading Condition Codes
- SetX Instructions
- Set a single byte to 0 or 1 based on combinations of condition codes
| SetX_____ | Condition | Description |
|---|---|---|
| sete | ZF | Equal (Zero) |
| setne | ~ZF | Not Equal (Not Zero) |
| sets | SF | Negative |
| setns | ~SF | Nonnegative |
| setg | ~ZF && SF == OF | Greater (Signed) |
| setge | ~(SF ^ OF) | Greater or Equal (Signed) |
| setl | (SF ^ OF) | Less (Signed) |
| setle | ZF | (SF ^ OF) |
| seta | ~CF & ~ZF | Above (Unsigned) |
| setb | CF | Below (Unsigned) |
Reading Condition Codes (Cont.)
- SetX Instructions:
- Set single byte to 0 or 1 based on combination of condition codes
- One of 8 addressable byte registers
- Does not alter remaining 3 bytes
- Typically use
movzblto finish job
int gt (int x, int y)
{
return x > y;
}
Body: y at 12(%ebp), x at 8(%ebp)
movl 12(%ebp) , %eax # eax = y
cmpl %eax, 8(%ebp) # Compare x and y
setg %al # al = x > y
movzbl %al, %eax # Zero rest of %eax
Conditional Branches
- jX Instructions
- Jump to different part of code depending on condition codes
| jX | Condition | Description |
|---|---|---|
| jmp | 1 | Unconditional |
| je | ZF == 1 | Equal (Zero) |
| jne | ~ZF | Not Equal (Not Zero) |
| js | SF | Negative |
| jns | ~SF | Nonnegative |
| jg | ~ZF && SF == OF | Greater (Signed) |
| jge | ~(SF ^ OF) | Greater or Equal (Signed) |
| jl | (SF ^ OF) | Less (Signed) |
| jle | ZF | (SF ^ OF) |
| ja | ~CF & ~ZF | Above (Unsigned) |
| jb | CF | Below (Unsigned) |
Conditional Branch Example
int absdiff(int x, int y)
{
int result;
if (x > y) {
result = x - y;
} else {
result = y - x;
}
return result;
}
absdiff:
push1 %ebp
mov1 %esp, %ebp
mov1 8(%ebp), %edx
mov1 12(%ebp), %eax
cmp1 %eax, %edx
jle .L7
sub1 %eax, %edx
mov1 %eax, %edx
.L8:
leave
ret
.L7:
sub1 %edx, %eax
jmp .L8
int goto_ad(int x, int y)
{
int result;
if (x <= y) goto Else;
result = x - y;
Exit:
return result;
Else:
result = y - x;
goto Exit;
}
- C allows “goto” as means of transferring control
- Closer to machine-level programming style
- Generally considered bad coding style
Conditional Expression
val = Test ? Then-Expr : Else-Expr;
result = x>y ? x-y : y-x;
- The above is equivalent to…
if (Test)
val = Then-Expr;
else
val = Else-Expr;
- Goto Version
nt = !Test;
if (nt) goto Else;
val = Then-Exp;
Done:
...
Else:
val = Else-Expr;
goto Done;
- Test is expression returning integer
- = 0 interpreted as false
- != 0 interpreted as true
- Create separate code regions for then & else expressions
- Execute appropriate one
- How might you make this more efficient? using conditional move instruction
Conditionals: x86-64
cmovCsrc, dest- Move value from src to dest if condition C holds
- More efficient than conditional braching (simple control flow)
- But overhead: both branches are evaluated
int absdiff(int x, int y)
{
int result;
if (x > y) {
result = x - y;
} else {
result = y - x;
}
return result;
}
absdiff: # x in %edi, y in %esi
movl %edi, %eax # eax = x
movl %esi, %edx # edx = y
subl %esi, %eax # eax = x-y
subl %edi, %edx # edx = y-x
cmpl %esi, %edi # x : y
cmovle %edx, %eax # eax=edx if <=
ret
PC Relative Addressing
- PC relative branches are relocatable
- Absolute branches are not
Loops
while (sum != 0) {
<loop body>
}
loopTop: cmpl $0, %eax
je loopDone
<Loop body code>
jmp loopTop
loopDone:
- How to compile other loops should be straightforward
- The only slightly tricky part is to be sure where the conditional branch occurs: top or bottom of the loop
- How would for(i=0; i<100;i++) be implemented?
Do-While loop
- C
int fact_do(int x)
{
int result = 1;
do {
result *= x;
x = x - 1;
} whle (x > 1);
return result;
}
- Goto Version
int fact_goto(int x)
{
int result = 1;
loop:
result *= x;
x = x - 1;
if (x > 1) goto loop;
return result;
}
-
Use backward branch to continue looping
-
Only take branch when “while” condition holds
-
Do-while loop compilation
Registers:
%edx x
%eax result
fact_goto:
push1 %ebp # Setup
movl %esp, %ebp # Setup
movl $1, %eax # eax = 1
movl 8(%ebp), %edx # edx = x
.L11:
imull %edx,%eax # result *= x
decl %edx # x--
cmpl $1, %edx # compare x : 1
jg .L11 # if > goto loop
movl %ebp, %esp # Finish
popl %ebp # Finish
ret # Finish
- Do- While Translation
# c code
do
Body
while (Test);
# goto version
loop:
Body
if (Test)
goto loop
- Body
{
Statement1;
Statement2;
----
Statementn;
}
- Test returns integer
- = 0 interpreted as false
- != 0 interpreted as true
While Loops
# C code
int fact_while(int x)
{
int result = 1;
while (x > 1) {
result *= x;
x = x - 1;
};
return result;
}
# goto version
int fact_while_goto(int x)
{
int result = 1;
goto middle;
loop:
result *= x;
x = x - 1;
middle:
if (x>1)
goto loop;
return result;
}
- Used by GCC for both IA32 & x86-64
- First iteration jumps over body computation within loop straight to test
# x in %edx, result in %eax
jmp .L34 # goto Middle
.L35: # Loop:
imull %edx, %eax # result *= x
decl %edx # x--
.L34: # Middle:
cmpl $1, %edx # x:1
jg .L35 # if > goto loop
# While version
Init;
while (Test) {
Body
Update;
}
# Goto version
Init;
goto middle;
loop:
Body
Update;
middle:
if (Test)
goto loop;
done:
For loop
/* Compute x raised to nonnegative power p */
int ipwr_for(int x, unsigned int p)
{
int result;
for (result = 1; p != 0; p = p>>1) {
if (p & 0x1)
result *= x;
x *= x;
}
return result;
}
-
Algorithm:
-
Exploit bit representations
$$ p = p_0 + 2p_1 + 2^{6}p_2+…2^{n-1}p_{n-1} $$
- Gives:
$$ x^p = z_0 . {z_1}^2 . ({z_2}^2)^2 … (…(({z_{n-1}}^2)^2)…)^2 $$ $$ z_i = 1 when p_i = 0 $$ $$ z_i = x when p_i = 1 $$
-
Complexity O(logp)
-
General form(for loop)
for (Init; Test; Update)
Body
# Goto version
Init;
goto middle;
loop:
Body
Update;
middle:
if (Test)
goto loop;
done:
for (result = 1; p != 0; p = p>>1)
{
if (p & ox1)
result *= x;
x = x*x;
}
# goto version
result = 1;
goto middle;
loop:
if (p & 0x1)
result *= x;
x = x*x;
p = p>>1;
middle:
if (p != 0)
goto loop;
done:
Switch Statements
long switch_eg (unsigned long x, long y, long z)
{
long w = 1;
switch(x) {
case 1:
w = y*x;
break;
case 2:
w = y/x;
/* Fall through */
case 3:
w += z;
break;
case 5:
case 6:
w -= z;
break;
default:
w = 2;
}
return w;
}
- Muliple case labes
- 5, 6
- Fall through cases
- 2
- Missing cases
- 4
- Lots to manage, we need a jump table
Jump Table
- Jump Table translation
target = JTab[x];
goto *target;
.section .rodata
.align 4
.L62:
.long .L61 # x = 0
.long .L56 # x = 1
.long .L57 # x = 2
.long .L58 # x = 3
.long .L61 # x = 4
.long .L60 # x = 5
.long .L60 # x = 6
switch (x) {
case 1: // .L56
w = y*z;
break;
case 2: // .L57
w = y/x;
/* Fall through */
case 3: // .L58
w += z;
break;
case 5:
case 6: // .L60
w -= z;
break;
default: // .L61
w - 2;
}
long switch_eg (unsigned long x, long y, long z)
{
long w = 1;
switch(x) {
case 1:
w = y*x;
break;
case 2:
w = y/x;
/* Fall through */
case 3:
w += z;
break;
case 5:
case 6:
w -= z;
break;
default:
w = 2;
}
return w;
}
- Assembly example
switch_eg:
push1 %ebp # Setup
movl %esp, %ebp # Setup
pushl %ebx # Setup
movl $1, %ebx # w = 1
movl 8(%ebp), %edx # edx = x
movl 16(%ebp), %ecx # ecx = z
cmpl $6, 8(%edx) # x: 6
ja .L61 # if > goto default
jmp *.L62(,%edx,4) # goto JTab[x]
-
Label
.L61becomes address0x08048630 -
Label
.L62becomes address0x080488dc -
Assembly code
switch_eg:
...
ja .L61 # if > goto default
jmp *.L62(,%edx,4) # goto JTab[x]
- Disassembled Object code
08048610 <switch_eg>:
...
08048622: 77 0c ja 08048630
08048624: ff 24 95 dc 88 04 08 jmp *0x080488dc(,%edx,4)
- Jump Table
- Doesn’t show up in disassembled code
- Can inspect using GDB
gdb asm-cntl
(gdb) x/7xw 0x080488dc
- Examine 7 hexadecimal format “words” (4-bytes each)
- Use command “help x” to get format documentation
- Jump tables are used when you have few case values.