Have I understood correctly?

2018-06-19 04:54:53

I am trying to dig a bit deeper into memory allocation and addressing, and I have also run into the concept of stack alignment and, more generally, memory alignment. I would like to check whether I have understood all the concepts correctly. My questions all refer to present-day computers and processors, such as the ones in our laptops. I should point out that I have read lots of other questions on Stack Overflow, and most of my current knowledge comes from them.

My first doubts are related to the concept of a memory word. The memory word defines not only the register and bus size but also the basic memory unit (e.g., 64 bits on a 64-bit architecture, 32 bits on a 32-bit architecture, etc.). However, as far as I know, each address references exactly 1 byte of memory, regardless of the size of a memory word. So we might say that each byte has its own address. But:

1) Is it correct that the CPU is not able to access a single byte, but instead accesses the entire word in which that byte is contained? Thus, if a specific byte is requested (for example, I access the address of a char directly), does it access the entire word and perform some calculation to strip the rest and return the exact byte?

2) Is it then correct that the CPU can in practice access ONLY whole memory words, and that each memory word starts at an even address that is a multiple of the word size?

So, for example, on a 64-bit architecture a memory word is 8 bytes, so the (example) address 0x2710 (10000 in base 10) would be the start of a memory word. If I try to access 0x2711, the CPU will access 0x2710 through 0x2717 and then extract only the single byte. Correct?!

Second, as I said before, I ran into memory alignment. At first it caused me some confusion. Please help me understand whether I got it right. The problem is essentially related to performance or, in some cases, to specific SSE instructions that need 16-byte alignment. In the first case, for example, if (on a 64-bit architecture) an 8-byte datum (e.g., a long int) is stored across 2 memory words, the CPU needs 2 accesses instead of one.

3) So, take the following example:

0x2710 | .... |
0x2711 | .... |
0x2712 | .... |
0x2713 | .... |
0x2714 | data |
0x2715 | data |
0x2716 | data |
0x2717 | data |
0x2718 | data |
0x2719 | data |
0x271A | data |
0x271B | data |
0x271C | .... |
0x271D | .... |
0x271E | .... |
0x271F | .... |

In this case the data is not aligned. Correct? With alignment, the CPU would have stored the data starting at 0x2710 or, if the bytes up to 0x2713 were occupied, it would have inserted padding and then stored the 8-byte data starting at 0x2718. Right?

4) Thus, memory alignment essentially consists of storing multi-byte data only at addresses that are multiples of the desired unit in bytes (usually the memory word itself, but also other custom units, e.g., using -mpreferred-stack-boundary on GCC). I said "multi-byte" data because a one-byte datum will always fit inside a single word. Is all this correct?

5) Is memory alignment applied by the compiler? That is, is it applied in the binary (assembly) code, or is it applied in some way by the CPU when storing data? And isn't it a big waste of memory? I mean, if it is always applied, every multi-byte datum may also imply padding! It could be an enormous waste of memory space!

That's all! Thanks, really thanks in advance!

Data alignment is a hardware architecture optimization and is specific to the CPU. It can be faster and simpler to fetch 8 bytes from memory in a single fetch, then discard the extra data and shift the data around inside the CPU. It can also make the bus simpler by ignoring the bottom 3 bits of the address (byte offsets 0-7 within the word), saving three signal lines between the CPU/MMU and the memory bus. Fewer address lines make routing the signals on the PCB easier, with less RF noise.
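As a rough illustration of the "fetch the whole word, then shift and discard" idea, here is a small Python model. It assumes an 8-byte word, little-endian byte order, and a bytes object standing in for memory; fetch_word and load_byte are hypothetical helper names for this sketch, not a description of how any particular CPU is wired.

```python
WORD = 8  # assumed word size in bytes (64-bit bus)

def fetch_word(memory: bytes, addr: int) -> int:
    """Model one aligned bus fetch: the low 3 address bits are ignored."""
    base = addr & ~(WORD - 1)
    return int.from_bytes(memory[base:base + WORD], "little")

def load_byte(memory: bytes, addr: int) -> int:
    """Fetch the containing word, then shift and mask to isolate one byte."""
    word = fetch_word(memory, addr)
    lane = addr & (WORD - 1)           # byte position inside the word
    return (word >> (8 * lane)) & 0xFF

memory = bytes(range(16))              # toy memory: byte i holds the value i
print(hex(0x2711 & ~(WORD - 1)))       # the word containing 0x2711 starts at 0x2710
print(load_byte(memory, 9))            # byte at address 9 of the toy memory
```

Whether a real core does this in the load unit, the cache, or the memory controller varies by design; the sketch only shows the arithmetic.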

However, if the CPU uses 8-byte alignment and an 8-byte value is stored at an address that is not aligned, accessing that value requires 2 fetches, which hurts execution performance. If the programmer or compiler is aware of the data alignment, they can arrange the data to avoid a double fetch. This may waste some memory to save CPU cycles, which is fine if memory is cheap and time is not. Or, if memory is not cheap, the programmer can override the default data alignment using #pragma pack(1), which tells the compiler to pack structure members with no alignment padding.
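The memory-versus-cycles trade-off can be observed from Python through ctypes, which lays out structures using the platform's C rules. This is only a sketch: the structure and field names are made up for illustration, _pack_ = 1 is the ctypes counterpart of #pragma pack(1), and the _layout_ line is an assumption about newer Python versions (3.14 and later ask for it to be explicit; older versions treat it as an ordinary unused attribute).

```python
import ctypes

class Natural(ctypes.Structure):
    # Default layout: padding is inserted so that `value` is aligned.
    _fields_ = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint64)]

class Packed(ctypes.Structure):
    _pack_ = 1        # counterpart of #pragma pack(1)
    _layout_ = "ms"   # explicit layout for Python 3.14+; unused on older versions
    _fields_ = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint64)]

print("natural:", Natural.value.offset, ctypes.sizeof(Natural))
print("packed: ", Packed.value.offset, ctypes.sizeof(Packed))   # 1 9
```

On a typical x86-64 system the natural layout puts value at offset 8 and the structure occupies 16 bytes, while the packed layout puts it at offset 1 for a total of 9 bytes, at the cost of a misaligned 8-byte field.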

The stack is usually aligned to make it easier to push and pop using generic instructions. In this case, it is used to make life simpler at the cost of wasting a trivial amount of memory.

3) The CPU does not decide where to store the data, the programmer/compiler (sometimes the OS) makes that decision. The CPU is perfectly capable of reading any sized data from any address, but not necessarily in a single operation. Poorly aligned data will require more fetches and more time. Some CPUs will fault on misaligned operations (the Motorola 68000 and many low-cost microcontrollers), but most CPUs with MMUs will handle it internally.

4) Not quite. The important thing is that the multi-byte data not span an alignment boundary. In the case of 8-byte alignment with a 2-byte value, the value could be stored at addresses 0x1000, 0x1001, 0x1002, 0x1003, 0x1004, 0x1005, 0x1006 without requiring multiple fetches. Only storing it at 0x1007 would cause a problem as the CPU would need to fetch 0x1000[..0x1007] and 0x1008[..0x100F] to read the entire value.
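The rule in point 4 reduces to a little integer arithmetic. The sketch below counts how many aligned 8-byte blocks a value touches; fetches_needed is a hypothetical name, and the model assumes one fetch per aligned block, which ignores caches and wider buses.

```python
def fetches_needed(addr: int, size: int, boundary: int = 8) -> int:
    """Count how many aligned blocks of `boundary` bytes the value touches."""
    first_block = addr // boundary
    last_block = (addr + size - 1) // boundary
    return last_block - first_block + 1

# A 2-byte value at each address from 0x1000 to 0x1007:
for addr in range(0x1000, 0x1008):
    print(hex(addr), fetches_needed(addr, 2))
```

Only 0x1007 yields 2. The same function reports 2 for the 8-byte value at 0x2714 from the question and 1 once it is moved to 0x2718.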

5) Yes, some memory may be wasted, but not much. There is no performance hit to reading 8 bytes when you only wanted 1. If your code has eight char values, the compiler can place them so they are all in the same 8-byte word, since a char has no alignment requirement of its own. This results in no wasted space and no performance hit.
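To get a feel for how much padding actually appears, the following sketch again uses ctypes to apply the platform's C layout rules. The three structures are invented for illustration; the exact sizes of the mixed ones depend on the ABI, so only the eight-char case is annotated with its output.

```python
import ctypes

class EightChars(ctypes.Structure):
    # Eight one-byte fields: no padding is ever needed between them.
    _fields_ = [(f"c{i}", ctypes.c_char) for i in range(8)]

class Mixed(ctypes.Structure):
    # char, 8-byte integer, char: padding appears after each char.
    _fields_ = [("a", ctypes.c_char), ("x", ctypes.c_int64), ("b", ctypes.c_char)]

class Grouped(ctypes.Structure):
    # Same fields, largest first: the two chars share the trailing word.
    _fields_ = [("x", ctypes.c_int64), ("a", ctypes.c_char), ("b", ctypes.c_char)]

print(ctypes.sizeof(EightChars), ctypes.alignment(EightChars))   # 8 1
print(ctypes.sizeof(Mixed), ctypes.sizeof(Grouped))
```

On a typical x86-64 system the last line prints 24 and 16, which shows that ordering fields from largest to smallest recovers most of the padding without giving up alignment.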

Every platform has a set of conventions called the ABI, or Application Binary Interface. These are usually described in a document provided by the platform developer. The conventions cover a number of topics, and alignment rules are one of them. More than one platform may exist on a given hardware architecture; an example is x64, where there are two major ABIs, the Microsoft ABI (used on Windows) and the System V ABI (used on Linux).

The alignment rules are usually dictated by the hardware. For example, some hardware architectures are simply incapable of transferring misaligned data between the CPU core and memory. Others are capable of doing so but incur a performance penalty for every such transfer.

To produce a program that complies with the ABI of the target platform, compiler toolchains cooperate with the operating system. For example, the OS guarantees that an executable file section will always be loaded at an address that satisfies the most stringent alignment requirement imposed by the ABI. When a linker generates a section containing aligned objects, it relies on that. A C compiler annotates sections in object files with their alignment requirements so that the linker can use that information to lay things out accordingly when composing a single file out of multiple compilation units.
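A much simplified model of that layout step is sketched below. It assumes each input is just a (name, size, alignment) triple with a power-of-two alignment; the section names are hypothetical, and a real linker handles far more (symbols, relocations, linker scripts).

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)

def lay_out(sections):
    """Place (name, size, alignment) sections back to back, padding as needed."""
    placed, cursor = {}, 0
    for name, size, alignment in sections:
        cursor = align_up(cursor, alignment)
        placed[name] = cursor
        cursor += size
    return placed, cursor

# Hypothetical contributions from three compilation units.
sections = [("a.o:.data", 5, 4), ("b.o:.data", 16, 16), ("c.o:.data", 3, 1)]
placed, total = lay_out(sections)
print(placed, total)
```

Here the second input lands at offset 16 after 11 bytes of padding, and the total is 35 bytes. These offsets yield correctly aligned addresses only if the combined section itself is loaded at a multiple of 16, which is the guarantee the linker relies on from the OS.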

When it comes to the stack, different strategies exist. On some platforms, a function is always required to consume stack in multiples of the most stringent alignment requirement. If the compiler can rely on that, it lays out a function's stack frame accordingly.

However, on some platforms the stack alignment requirement is not so stringent. For example, an AVX data type such as __m256 is aligned at 32 bytes, but requiring every function to consume stack in multiples of 32 bytes was felt to be too lavish, since the type is used relatively rarely. This means that, when compiling a function that places an __m256 on the stack, a compiler cannot in general rely on the stack being sufficiently aligned at function entry. The compiler will then insert code in the prologue that realigns the stack pointer, growing the stack a little further when it is not already aligned. Obviously, that's a trade-off: if the platform requires a stricter alignment, programs waste more stack space; if the requirement is too lax, compilers need to emit alignment code, which inflates the code and hampers performance.
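The arithmetic behind such a prologue can be sketched in a few lines. The model below assumes a downward-growing stack and a power-of-two alignment; realign_stack and the entry address are hypothetical, and a real prologue also saves the original stack pointer (typically through a frame pointer) so it can be restored on return, which is omitted here.

```python
def realign_stack(sp: int, frame_size: int, alignment: int = 32) -> int:
    """Reserve frame_size bytes below sp, then round down to the alignment."""
    sp -= frame_size
    return sp & ~(alignment - 1)    # same effect as `and rsp, -32` on x86-64

entry_sp = 0x7FFF_FFFF_E008         # hypothetical stack pointer at function entry
new_sp = realign_stack(entry_sp, 32)
print(hex(new_sp), new_sp % 32, entry_sp - new_sp)
```

In this run the function asked for 32 bytes but the stack grew by 40: the extra 8 bytes are the price of realigning, and the masking instruction itself is the extra code mentioned above.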

Link: http://www.djcxy.com/p/54090.html
