How x86_64 addresses memory

Jun 13, 2020     Tags: programming, x86    

Today I’m going to write up one small (and yet still remarkably complicated) fragment of x86_64’s instruction semantics: memory addressing.

Specifically, I’m going to write up the different ways in which x86_64 allows the user to address memory via just one instruction: mov.

I won’t attempt to cover other instructions that can touch memory (which is pretty much all of them, thanks CISC), ones that write massive chunks of memory (looking at you, fxsave), or any adjacent subjects (code models, position independent code, binary relocations). I also won’t even try to cover historical addressing modes or modes that work when an x86_64 processor isn’t in 64-bit mode (i.e., any modes other than long mode with 64-bit code).

Some constraints

Despite (or perhaps thanks to?) the legacy hell that is x86_64’s instruction encoding, there are some constraints on how memory is addressed.

First, the good news:

  - Effective addresses are computed from 64-bit general-purpose registers, and (just about) any GPR can serve as the base or index register1.

Now, the bad news:

  - Displacements (and most immediates) are still limited to 32 bits, which, as we’ll see, mostly rules out absolute addressing.

“Scale-Index-Base-Displacement” addressing

I call this mode “Scale-Index-Base-Displacement” because I have no idea what else to call it.

As far as I can tell, neither Intel nor AMD actually considers this to be a singular mode; instead, they refer to it as a general collection of related modes with a wide variety of different encodings.

But we’re not talking about encodings today: we’re talking about semantics, and semantically each of these related modes boils down to some combination of four parameters:

  - a base register
  - an index register
  - a scale factor (1, 2, 4, or 8)
  - a displacement (a constant encoded directly into the instruction)

Various combinations of the four (including all four) are valid. Here are the valid combinations, in roughly increasing order of complexity:

  - Displacement
  - Base
  - Base + Index
  - Base + Displacement
  - Base + Index + Displacement
  - Base + (Index * Scale)
  - (Index * Scale) + Displacement
  - Base + (Index * Scale) + Displacement

Let’s go through them one by one.

Displacement

This is arguably the simplest addressing mechanism in the x86 family: the displacement field is treated as an absolute memory address.

Unfortunately, it’s also almost completely useless on x86_64. Remember that note about displacements almost always being 32 bits? That means you can’t represent an absolute address, since an absolute x86_64 address is 64 bits (really 48, but whatever) and just won’t fit in the displacement.

There’s one exception to this: x86_64 allows for a 64-bit displacement with the a* registers (al, ax, eax, and rax).

In Intel syntax:

; store the qword at 0x00000000000000ff into rax
mov rax, [0xff]
; store the dword at 0x00000000000000ff into eax
mov eax, [0xff]
; store the word at 0x00000000000000ff into ax
mov ax, [0xff]
; store the byte at 0x00000000000000ff into al
mov al, [0xff]

gas (the GNU assembler) refers to these as movabs in both 32-bit and 64-bit modes.

Why would I (or my compiler) use this mode?

First of all, for code model reasons that aren’t relevant to this post. Eli Bendersky has a fantastic blog post on those.

More concretely: most programs have at least a few static addresses that are determined at compile-time, like global variables.

For example, this trivial program:

extern long var;

void f(long x) { var = x; }

…yields:

f:
        mov     rax, rdi
        movabs  QWORD PTR [var], rax
        ret

(View it on Godbolt.)

Note: The above example was originally misleading; many thanks to haberman on HN for pointing out the error and offering a correct example.

Base

Addressing via the base register adds one layer of indirection over absolute addressing: instead of an absolute address encoded into the instruction’s displacement field, an address is loaded from the specified general-purpose register (any GPR! Hooray!).

This indirection allows us to do absolute addressing with an arbitrary destination register via the following pattern:

; store the immediate (not displacement) into rbx
mov rbx, 0xacabacabacabacab

; store the qword at the address stored in rbx into rcx
mov rcx, [rbx]

…but we have relatively few reasons to do that, given the richer addressing modes we’re about to see.

Why would I (or my compiler) use this mode?

Because sometimes we have a calculated address already lying around from another operation, and we just want to use it.

Compiler output is full of examples of this; here’s a typical one:

mov rax, qword ptr [rax]

Base + Index

This is just like addressing via the base register, except that we also add in the value of the index register.

For example:

; store the qword in rcx into the memory address computed
; as the sum of the values in rax and rbx
mov [rax + rbx], rcx

Why would I (or my compiler) use this mode?

I had a hard time contriving an example for this, which of course means that my coworkers immediately found one:

int foo(char * buf, int index) {
  return buf[index];
}

…which yields:

push    rbp
mov     rbp, rsp
mov     qword ptr [rbp - 8], rdi
mov     dword ptr [rbp - 12], esi
mov     rax, qword ptr [rbp - 8]  ; rax is buf
movsxd  rcx, dword ptr [rbp - 12] ; rcx is index
movsx   eax, byte ptr [rax + rcx] ; store buf[index] into eax
pop     rbp
ret

(View it on Godbolt.)

This is obvious in retrospect: Base + Index is perfect for modeling array accesses where neither the array’s starting address nor the offset into the array is fixed at compile-time.

Base + Displacement

More indirection! In case you haven’t guessed it, calculating the effective address with both the base register and the displacement field corresponds to two operations:

  1. Loading the value stored in the base register
  2. Adding that value to the value of the displacement field

Then, we take that sum and use it as our effective address. By way of example:

; add 0xcafe to the value stored in rax
; then, store the qword at the computed address into rbx
mov rbx, [rax + 0xcafe]

Why would I (or my compiler) use this mode?

As we’ve seen with Base + Index, some addressing modes naturally reflect C-like array semantics.

Base + Displacement can be thought of in a similar manner, but for structure semantics: the base register holds the address to the beginning of the structure, and the displacement field holds the fixed offset into that structure.

For example, the following:

struct foo {
    long a;
    long b;
};

long bar(struct foo *foobar) {
    return foobar->b;
}

assembles as:

push    rbp
mov     rbp, rsp
mov     qword ptr [rbp - 8], rdi
mov     rax, qword ptr [rbp - 8] ; rax is foobar
mov     rax, qword ptr [rax + 8] ; rax + 8 is foobar->b; store back into rax
pop     rbp
ret

(View it on Godbolt.)

This also makes sense if you think about the stack construction and layout at the beginning of every function as a custom structure: accesses like [rbp - N] are basically stack->objN.
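
As a quick sketch of that mental model (hypothetical, unoptimized output, not any particular compiler's): a function that spills two qword arguments to the stack treats them just like fields of an implicit stack structure:

push rbp
mov rbp, rsp
mov qword ptr [rbp - 8], rdi  ; the first "field": stack->obj1
mov qword ptr [rbp - 16], rsi ; the second "field": stack->obj2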

Base + Index + Displacement

If the last mode makes sense to you, then this one is the logical next step: it’s semantically identical, except that we also add the value of the index register.

Just as above, but with one more register:

; add 0xcafe to the values stored in rax and rcx
; then, store the qword at the computed address into rbx
mov rbx, [rax + rcx + 0xcafe]

Why would I (or my compiler) use this mode?

Just as Base + Index naturally models an array access and Base + Displacement naturally models structure access, Base + Index + Displacement naturally models structure access within an array!

I had a hard time getting clang to emit one of these on Godbolt, but eventually got one with -O1:

struct foo {
    long a;
    long b;
};

long square(struct foo foos[], long i) {
    struct foo x = foos[i];
    return x.b;
}

assembles to the very terse:

shl     rsi, 4
mov     rax, qword ptr [rdi + rsi + 8] ; rdi is foos, rsi is i, 8 is the field offset
ret

(View it on Godbolt.)

Base + (Index * Scale)

Our first multiplication!

The scale field is like displacement in that it’s a constant factor that’s encoded into our instruction. Unlike displacement, however, scale is extremely constrained: it’s only two bits wide, meaning that it can only be 1 of 4 possible values: 1, 2, 4, or 8.

As the name implies, the scale field is used to scale (i.e., multiply) another field. In particular, it always scales the index register — scale cannot be used without index.
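
For example (a sketch with arbitrarily chosen registers):

; multiply the value in rcx by 8 and add it to the value in rax,
; then store the qword at the computed address into rbx
mov rbx, [rax + 8*rcx]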

Why would I (or my compiler) use this mode?

Among many other things, Base + (Index * Scale) naturally models accesses into an array of pointers (distinct from an array of laid-out structures, like above!):

struct foo {
    long a;
    long b;
};

long bar(struct foo *foos[], long i) {
    struct foo *x = foos[i];
    return x->b;
}

assembles to:

mov     rax, qword ptr [rdi + 8*rsi] ; rdi is foos, rsi is i, 8 is the scale (pointer-sized!)
mov     rax, qword ptr [rax + 8]
ret

(View it on Godbolt.)

(Index * Scale) + Displacement

Let’s keep going. This is almost identical to the last mode, except that we’ve swapped the base register out for the displacement field. No particular complexity there.
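
For example (again with arbitrarily chosen registers and displacement):

; multiply the value in rcx by 4 and add 0xcafe,
; then store the qword at the computed address into rbx
mov rbx, [4*rcx + 0xcafe]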

Why would I (or my compiler) use this mode?

(Index * Scale) + Displacement naturally models a specialized case of array access: when the array is statically addressable (e.g., a global) and the element size is computable via the scale.

For example:

int tbl[10];

int foo(int i) {
    return tbl[i];
}

assembles to:

movsxd  rax, edi
mov     eax, dword ptr [4*rax + tbl] ; rax is i, 4 is the scale (sizeof(int) == 4)
ret

(View it on Godbolt.)

Base + (Index * Scale) + Displacement

Now we’re cooking with gas. This is the final and most complex x86_64 addressing form, but there’s absolutely nothing conceptually special about it: it’s just one more arithmetic operation on top of the three-parameter addressing modes.
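
By way of a quick sketch (arbitrary registers and displacement) before the compiled example:

; multiply the value in rcx by 8, add the value in rax and 0xcafe,
; then store the qword at the computed address into rbx
mov rbx, [rax + 8*rcx + 0xcafe]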

Why would I (or my compiler) use this mode?

Base + (Index * Scale) + Displacement naturally models a two-dimensional array access:

long tbl[10][10];

long foo(long i, long j) {
    return tbl[i][j];
}

assembles to:

lea     rax, [rdi + 4*rdi]                 ; rax = 5*i
shl     rax, 4                             ; rax = 80*i, i.e. i * sizeof(tbl[0])
mov     rax, qword ptr [rax + 8*rsi + tbl] ; load tbl + 80*i + 8*j
ret

(View it on Godbolt.)

RIP-relative addressing

The addressing mode documented above is almost identical to its historical x86_32 equivalent — its biggest changes are allowing 64-bit GPRs and (sometimes) 64-bit displacements.

Where x86_64 really diverges is in its addition of a brand new addressing mode, best known as “RIP-relative” addressing.

Why is it called “RIP-relative”? Because it encodes a displacement relative to the RIP register’s value (specifically the RIP of the next instruction, not the current one). This is usually represented with the familiar [Base + Displacement] syntax, except that the base register is now rip instead of a GPR:

; store the qword at (rip + 16) into rax, where rip is
; the address of the instruction *after* this mov
mov rax, [rip + 16]

Why would I (or my compiler) use this mode?

For reasons that I originally said that I wouldn’t go into in this blog post: position-independent code and code models.

We’ll make a brief exception: using RIP-relative addressing makes position-independent code smaller and simpler, and is a natural fit for the “small” (and default) code model, where all code and data needs to be addressable within a 32-bit offset.

For example, the following when compiled with -O1 and -fpic:

long tbl[10];

int foo(int i) {
    return tbl[i];
}

requires just two movs on x86_64:

foo:
        mov     rax, qword ptr [rip + tbl@GOTPCREL]
        mov     rax, qword ptr [rax + 8*rdi]
        ret

…but three and some additional boilerplate on x86_32:

foo:
        call    .L0$pb
.L0$pb:
        pop     eax
.Ltmp0:
        add     eax, offset _GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb)
        mov     ecx, dword ptr [esp + 4]
        mov     eax, dword ptr [eax + tbl@GOT]
        mov     eax, dword ptr [eax + 4*ecx]
        ret

One last catch: segmentation

x86_64 almost killed segmentation. Almost. Segment registers are no longer necessary thanks to the flat address space, but they still show up in a few places:

  - fs, which userspace on Linux uses for thread-local storage (as in the example below)
  - gs, which the kernel uses for its own per-CPU data (via swapgs; see the second footnote)

So, unfortunately, we still need to care about these. The good news is that caring about them isn’t too bad: they essentially boil down to adding the value in the segment register2 to the rest of the address calculation.
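
In (simplified) Intel syntax, that extra term shows up as a segment prefix on an otherwise ordinary effective address. A sketch, not real compiler output:

; add the segment base (here, FS.base) to (rax + 0x10),
; then store the qword at the computed address into rbx
mov rbx, fs:[rax + 0x10]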

By way of example with a thread-local variable:

int __thread x = 0;

int foo(void) {
    int *y = &x;
    return *y;
}

assembles to:

push    rbp
mov     rbp, rsp
mov     rax, qword ptr fs:[0]    ; grab the base address of the thread-local storage area
lea     rax, [rax + x@TPOFF]     ; calculate the effective address of x within the TLS
mov     qword ptr [rbp - 8], rax ; store the address of x into y
mov     rax, qword ptr [rbp - 8]
mov     eax, dword ptr [rax]
pop     rbp
ret

(View it on Godbolt.)


  1. Our very first gotcha: this is true when using 64-bit registers for addressing, but not when using 32-bit registers. When addressing with 32-bit registers we can use any 32-bit GPR as an index except esp, thanks to an encoding quirk (the bit pattern that would indicate esp (0b100) is instead used to indicate that no index register is present at all). 

  2. Not actually, as pointed out by haberman: in 32-bit modes the segment register’s value corresponds to a GDT offset, while in 64-bit modes the value is unused and is replaced with the FS.base and GS.base MSRs. The SWAPGS page on the OSDev Wiki has the details.