Why do we use heap to store memory?

2018-06-29 13:22:55

If this sounds like a kindergarden question then forgive me ;) But, in C++ a heap is used for memory allocation since... ever (at least the 80's). Is it the best algorithm for the job, or did we just get stuck with it (as happened with javascript...)? Do all (non embedded) OS's use a heap to store memory?

Edit: So, what structures/algorithms ARE used. And how is it (if) related to the heap algorithm?

No need to compare with "stack" allocation (it's all over the web), or discuss the C++ semantics - A tl;dr of what a memory heap is today?

It sounds like you are confusing a heap with the "heap". The heap data structure is rarely, if ever, used for dynamic memory allocation.

Now, re: why dynamic memory allocation? Sometimes you don't know how much memory you will need, and don't want to just allocate a massive buffer just in case. Allocating on the heap allows the amount of storage space you have to change at runtime.

In the context of memory allocation, heap is not a data structure or an algorithm. It is more consistent with the english-language definition which is essentially an unstructured group of things in some place.

Historically, early computer systems had a very small amount of memory, and early operating systems only managed a small amount. Amounts like 64K (that's kilobytes, not megabytes or gigabytes) were actually a significant amount of memory in the early days, and early operating systems were designed with ability to support no more than 640K to run programs. That is total RAM.

It was actually a bit of an innovation when some computer had two or more separate banks of physical memory chips. One of these was used for running programs in memory, and the others used to store data needed by the program in a manner that allowed it to be accessed more quickly than reading it from disk. These two areas of memory came to be referred to as stack and heap respectively. The heap needed to be accessed with the help of special device drivers. The amount of such heap memory available was usually (not always) much larger than the stack memory.

In practice, in early implementations of C and C++, static and automatic variables typically used the stack memory, and dynamically allocated memory ( malloc() , etc) used the heap. And the names have stuck even though the distinction is now largely academic (eg heap and stack limits are set as logical quotas, rather than reflecting physically available banks of memory).

The correct term for "heap", in modern C and C++, is therefore "dynamically allocated memory". Dynamic memory allocation (eg functions like malloc() ) does not necessarily use any area of memory that is called the "heap" (although, clearly, a host system must use some data structure to track memory allocation and deallocation).

In the case, the "heap" in question isn't the same as the data structure known as a heap.

Typically "heap" refers to the memory managed by malloc / realloc / free . These are typically used quite rarely (if at all) in reasonably written C++.

For dynamic allocation in C++, you more often use new and delete (at least indirectly, such as via std::allocator<T> being used by a container). Some people sometimes refer to this as the "heap" as well, but people trying to be more proper more often refer to it as the "free store" (that's the phrasing used in the standard).

Either way, however, there's rarely (if ever) an actual heap involved though. Neither specifies (or is intended to refer to) the data structure(s) used to manage the memory.

For what it's worth, the closest thing to an actual heap I've seen used for managing memory is with a "buddy system" allocator (but I've seen relatively few of these in actual use, even though Knuth goes into quite a bit of detail about them.

As to what structures are used, one extremely common structure is just a linked list of blocks. Each block will at least record its own size.

The common elementary algorithms are best-fit, worst-fit and first-fit.

Best fit means finding the smallest free block large enough to satisfy a request. To do this, you typically keep your list of free blocks sorted in ascending order by size, so the first block that's large enough is also the best fit.

Worst fit means always starting from the largest block. To do this, you typically keep the list of free blocks sorted in descending order by size. This way, either the first block in the list is the largest so you always either use it, or the allocation fails (or you need to do something like allocation more from the OS). In most cases you still need to do some list traversal though, because you split the largest block into two: the one you're allocating to the user, and the other of what's left, which you then re-insert into the list in order by its new size.

first fist means you walk through the list and use the first block that can satisfy the allocation.

In all cases, you typically have a policy for block splitting. That is, if an allocation asks for a smaller size than the block you select, you have a choice between leaving that block as it is, and just giving the user a little extra memory, or else splitting that block into two: one that's used to satisfy the allocation, and the other goes back into the free list. In most cases, you try to avoid creating minuscule blocks, so unless the "left over" part is greater than some particular size, you just leave the original block intact.

If they don't think much about things, most people's first instinct is to use best-fit. The problem with best fit is that when you split a block, it produces the smallest left-over piece, so you tend to end up with lots of tiny blocks that can't satisfy any allocations. If you do use it, you typically want to set a fairly high threshold where you'll just keep a block intact instead of splitting it.

Worst fit tries to counteract that. Although it's likely to split the largest block, it tends to leave the largest left over block, so it's the most likely to be usable.

There are also hybrid, such as exact-fit, worst fit. That is, instead of always using the largest block, you first look for a block that fits exactly (or close enough that you wouldn't split that block), and only if that fails, you split the biggest block.

If you're keeping the free blocks in some order, there's also the obvious modification of using a tree of some sort or other to store the blocks, so you can find a block in roughly logarithmic time instead of linear time.

链接地址: http://www.djcxy.com/p/82504.html

上一篇: 命令行参数在哪里？

下一篇: 为什么我们使用堆来存储内存？