What can explain heap corruption on a call to free()?
I have been debugging a crash for days now, that occurs in the depths of OpenSSL (discussion with the maintainers here). I took some time investigating so I'll try to make this question interesting and informative.
First, to give some context, my minimal sample that reproduces the crash is as follows:
#include <openssl/crypto.h>
#include <openssl/ec.h>
#include <openssl/objects.h>
#include <openssl/pem.h>
#include <openssl/err.h>
#include <openssl/engine.h>
int main()
{
    ERR_load_crypto_strings();
    OpenSSL_add_all_algorithms();
    ENGINE_load_builtin_engines();
    EC_GROUP* group = EC_GROUP_new_by_curve_name(NID_sect571k1);
    EC_GROUP_set_point_conversion_form(group, POINT_CONVERSION_UNCOMPRESSED);
    EC_KEY* eckey = EC_KEY_new();
    EC_KEY_set_group(eckey, group);
    EC_KEY_generate_key(eckey);
    BIO* out = BIO_new(BIO_s_file());
    BIO_set_fp(out, stdout, BIO_NOCLOSE);
    PEM_write_bio_ECPrivateKey(out, eckey, NULL, NULL, 0, NULL, NULL); // <= CRASH.
}
Basically, this code generates an elliptic curve key and tries to write it to stdout. Similar code can be found in openssl.exe ecparam and on wikis online. It works fine on Linux (valgrind reports no errors at all). It only crashes on Windows (Visual Studio 2013, x64). I made sure the proper runtimes were linked (/MD in my case, for all dependencies).
Fearing no evil, I recompiled OpenSSL in x64 debug (this time linking everything with /MDd) and stepped through the code to find the offending set of instructions. My search led me to this code (in OpenSSL's tasn_fre.c file):
static void asn1_item_combine_free(ASN1_VALUE **pval, const ASN1_ITEM *it, int combine)
{
    // ... some code, not really relevant.
    tt = it->templates + it->tcount - 1;
    for (i = 0; i < it->tcount; tt--, i++) {
        ASN1_VALUE **pseqval;
        seqtt = asn1_do_adb(pval, tt, 0);
        if (!seqtt) continue;
        pseqval = asn1_get_field_ptr(pval, seqtt);
        ASN1_template_free(pseqval, seqtt);
    }
    if (asn1_cb)
        asn1_cb(ASN1_OP_FREE_POST, pval, it, NULL);
    if (!combine) {
        OPENSSL_free(*pval); // <= CRASH OCCURS ON free()
        *pval = NULL;
    }
    // Some more code...
}
For those not too familiar with OpenSSL and its ASN.1 routines, what this for-loop does is go through all the elements of a sequence (starting with the last element) and "delete" them (more on that later).
Right before the crash happens, a sequence of 3 elements is deleted (at *pval, which is 0x00000053379575E0). Looking at the memory, one can see the following things happen:
The sequence is 12 bytes long, each element being 4 bytes long (in this case, 2, 5, and 10). On each loop iteration, elements are "deleted" by OpenSSL (in this context, neither delete nor free is called: they are just set to a specific value). Here is how the memory looks after one iteration:
The last element here was set to ff ff ff 7f, which I assume is OpenSSL's way of ensuring no key information leaks when the memory is deallocated later.
Right after the loop (and before the call to OPENSSL_free()), the memory is as follows:
All elements were set to ff ff ff 7f, and asn1_cb is NULL so no call is made. The next thing that happens is the call to OPENSSL_free(*pval).
This call to free() on what seems to be a valid, allocated block of memory fails and causes the execution to abort with the message "HEAP CORRUPTION DETECTED".
Curious about this, I hooked into malloc, realloc and free (as OpenSSL permits) to ensure this wasn't a double free or a free on never-allocated memory. It turns out the memory at 0x00000053379575E0 really is a 12-byte block that was indeed allocated (and never freed before).
I'm at a loss to figure out what happens here: from my research, it seems that free() fails on a pointer that was legitimately returned by malloc(). In addition, this memory location was being written to a couple of instructions earlier without any problem, which supports the hypothesis that the memory is correctly allocated.
I know it's hard, if not impossible, to debug remotely without all the information but I have no idea what my next steps should be.
So my question is: how exactly is this "HEAP CORRUPTION" detected by Visual Studio's debugger? What are all the possible causes for it when originating from a call to free()?
In general the possibilities include:
Writing outside the malloc()-ed area. malloc() and friends put extra bookkeeping information in here, such as the size, and probably a sanity-check, which you will fail by overwriting.
Freeing something that wasn't malloc()-ed.
Freeing something that's already been free()-d.
I could finally find the problem and solve it.
It turned out that some instruction was writing bytes past the end of the allocated heap buffer (hence the 0x00000000 instead of the expected 0xfdfdfdfd).
In debug mode, this overwrite of the memory guards remains undetected until the memory is freed with free() or reallocated with realloc(). This is what caused the HEAP CORRUPTION message I faced.
I expect that in release mode, this could have had dramatic effects, like overwriting a valid memory block used somewhere else in the application.
For future reference, for people facing similar issues, here is how I did it:
OpenSSL provides a CRYPTO_set_mem_ex_functions() function, defined like so:
int CRYPTO_set_mem_ex_functions(void *(*m) (size_t, const char *, int),
                                void *(*r) (void *, size_t, const char *, int),
                                void (*f) (void *));
This function lets you hook in and replace the memory allocation and freeing functions within OpenSSL. The nice thing is the addition of the const char *, int parameters, which are filled in for you by OpenSSL and contain the filename and line number of the allocation.
Armed with this information, it was easy to find out the place where the memory block was allocated. I could then step through the code while looking at the memory inspector waiting for the memory block to be corrupted.
In my case, what happened was:
if (!combine) {
    *pval = OPENSSL_malloc(it->size); // <== The allocation is here.
    if (!*pval) goto memerr;
    memset(*pval, 0, it->size);
    asn1_do_lock(pval, 0, it);
    asn1_enc_init(pval, it);
}
for (i = 0, tt = it->templates; i < it->tcount; tt++, i++) {
    pseqval = asn1_get_field_ptr(pval, tt);
    if (!ASN1_template_new(pseqval, tt))
        goto memerr;
}
ASN1_template_new() is called on the 3 sequence elements to initialize them.
It turns out that ASN1_template_new() in turn calls asn1_item_ex_combine_new(), which does this:
if (!combine)
    *pval = NULL;
Since pval is an ASN1_VALUE**, *pval is a pointer, so this instruction writes 8 bytes on Windows x64 systems instead of the intended 4 bytes, corrupting the memory just past the last element of the list.
For the full discussion on how this problem was solved upstream, see this thread.