Stack allocation, padding, and alignment

I've been trying to gain a deeper understanding of how compilers generate machine code, and more specifically how GCC deals with the stack. In doing so I've been writing simple C programs, compiling them into assembly and trying my best to understand the outcome. Here's a simple program and the output it generates:

asmtest.c :

void main() {
    char buffer[5];
}

asmtest.s :

pushl   %ebp
movl    %esp, %ebp
subl    $24, %esp
leave
ret

What's puzzling to me is why 24 bytes are being allocated for the stack. I know that because of how the processor addresses memory, the stack has to be allocated in increments of 4, but if this were the case, we should only move the stack pointer by 8 bytes, not 24. For reference, a buffer of 17 bytes produces a stack pointer moved 40 bytes and no buffer at all moves the stack pointer 8. A buffer between 1 and 16 bytes inclusive moves ESP 24 bytes.

Now assuming the 8 bytes is a necessary constant (what is it needed for?), this means that we're allocating in chunks of 16 bytes. Why would the compiler be aligning in such a way? I'm using an x86_64 processor, but even a 64bit word should only require an 8 byte alignment. Why the discrepancy?

For reference I'm compiling this on a Mac running 10.5 with gcc 4.0.1 and no optimizations enabled.


It's a gcc feature controlled by -mpreferred-stack-boundary=n where the compiler tries to keep items on the stack aligned to 2^n . If you changed n to 2 , it would only allocate 8 bytes on the stack. The default value for n is 4 ie it will try to align to 16-byte boundaries.

Why there's the "default" 8 bytes and then 24=8+16 bytes is because the stack already contains 8 bytes for leave and ret , so the compiled code must adjust the stack first by 8 bytes to get it aligned to 2^4=16.


The SSEx family of instructions REQUIRES packed 128-bit vectors to be aligned to 16 bytes - otherwise you get a segfault trying to load/store them. Ie if you want to safely pass 16-byte vectors for use with SSE on the stack, the stack needs to be consistently kept aligned to 16. GCC accounts for that by default.


I found this site, which has some decent explanation at the bottom of the page about why the stack might be larger. Scale the concept up to a 64bit machine and it might explain what you are seeing.

链接地址: http://www.djcxy.com/p/14102.html

上一篇: 调试指令IP指向0时的指针

下一篇: 堆栈分配,填充和对齐