Why would fclose hang / deadlock? (Windows)
I have a directory change monitor process that reads updates from files within a set of directories. I have another process (a test program) that performs small writes to many files in those directories. Figure about 100 directories with 10 files in each, and about 500 files being modified per second.
After running for a while, the directory monitor process hangs on a call to fclose() in a method that is basically tailing the file. In this method, I fopen() the file, check that the handle is valid, do a few seeks and reads, and then call fclose(). These reads are all performed by the same thread in the process. After the hang, the thread never progresses.
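For readers who want to picture the pattern, here is a minimal sketch of an open/seek/read/close tailing step. It is not the code from the monitor process: the helper name read_new_bytes, its parameters, and the truncation handling are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): read bytes that were
   appended to `path` after position `*offset`, then advance `*offset`.
   Returns the number of bytes read, or -1 on failure. */
long read_new_bytes(const char *path, long *offset, char *buf, size_t cap)
{
    FILE *fp;
    long end;
    long got = -1;

    assert(path != NULL && offset != NULL && buf != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;                    /* file missing or not accessible */

    if (fseek(fp, 0, SEEK_END) == 0 && (end = ftell(fp)) >= 0) {
        if (end < *offset)
            *offset = 0;              /* assumption: file was truncated, restart */
        if (fseek(fp, *offset, SEEK_SET) == 0) {
            got = (long)fread(buf, 1, cap, fp);
            *offset += got;
        }
    }

    if (fclose(fp) != 0)              /* the call that hangs in the question */
        return -1;
    return got;
}
```

Nothing in a routine of this shape touches the heap directly; the buffers that fopen() allocates and fclose() frees belong to the C runtime, which is why a hang inside the free path points at state damaged elsewhere in the process.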
I couldn't find any good information on why fclose() might deadlock instead of returning some kind of error code. The documentation does mention _fclose_nolock(), but it doesn't seem to be available to me (Visual Studio 2003).
The hang occurs for both debug and release builds. In a debug build, I can see that fclose() calls _free_base(), which hangs before returning. Some kind of call into kernel32.dll => ntdll.dll => KernelBase.dll => ntdll.dll is spinning. Here's the assembly from ntdll.dll that loops indefinitely:
77CEB83F cmp dword ptr [edi+4Ch],0
77CEB843 lea esi,[ebx-8]
77CEB846 je 77CEB85E
77CEB848 mov eax,dword ptr [edi+50h]
77CEB84B xor dword ptr [esi],eax
77CEB84D mov al,byte ptr [esi+2]
77CEB850 xor al,byte ptr [esi+1]
77CEB853 xor al,byte ptr [esi]
77CEB855 cmp byte ptr [esi+3],al
77CEB858 jne 77D19A0B
77CEB85E mov eax,200h
77CEB863 cmp word ptr [esi],ax
77CEB866 ja 77CEB815
77CEB868 cmp dword ptr [edi+4Ch],0
77CEB86C je 77CEB87E
77CEB86E mov al,byte ptr [esi+2]
77CEB871 xor al,byte ptr [esi+1]
77CEB874 xor al,byte ptr [esi]
77CEB876 mov byte ptr [esi+3],al
77CEB879 mov eax,dword ptr [edi+50h]
77CEB87C xor dword ptr [esi],eax
77CEB87E mov ebx,dword ptr [ebx+4]
77CEB881 lea eax,[edi+0C4h]
77CEB887 cmp ebx,eax
77CEB889 jne 77CEB83F
Any ideas what might be happening here?
I posted this as a comment, but I realize this could be an answer in its own right...
Based on the disassembly, my guess is you've overwritten some internal heap structure maintained by ntdll, and it is looping forever iterating through a linked list.
In particular, at the start of the loop, the current list node seems to be in ebx. At the end of the loop, the expected last node is contained in eax (or terminator, if you like: it looks a bit like these are circular lists where the last node is the same as the first, with the address of that node being edi+0C4h, as computed by the lea just before the compare). Probably cmp ebx, eax never finds the two equal, because a heap corruption has introduced a cycle in the list.
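To illustrate the failure mode I have in mind, here is a small sketch. It is not ntdll's code: the struct node type, the count_nodes function, and the max_steps guard are made up for the example. It walks a circular list until it gets back to the head, which is the same shape as the loop in the disassembly, and it shows that one overwritten next pointer is enough to make the termination test unreachable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration; the real heap structures differ. */
struct node {
    struct node *next;
};

/* Walk a circular list starting after `head` until we arrive back at `head`.
   Returns the number of nodes visited, or -1 if `head` was not reached
   within `max_steps` (a guard the loop in the question does not have). */
long count_nodes(const struct node *head, long max_steps)
{
    const struct node *cur;
    long n = 0;

    assert(head != NULL);

    for (cur = head->next; cur != head; cur = cur->next) {
        if (cur == NULL || ++n > max_steps)
            return -1;        /* broken link or cycle that skips the head */
    }
    return n;
}
```

With an intact list head -> a -> b -> c -> head, the walk ends after three nodes. If a stray write makes c point back at b, the walk bounces between b and c and the comparison against head never succeeds, which without the step limit is an infinite loop.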
I don't think this has anything to do with locks, otherwise we would see some atomic instructions (e.g. lock cmpxchg, xchg, etc.) or calls to other synchronization functions.
I had the same problem with a file close function. In my case, I solved it by placing the close call directly in the body of the calling function instead of wrapping it in its own function.
I also suspected (1) duplicated file names and (2) Windows scheduling: the file I/O may not have completed before the next task's thread started. Windows scheduling and multithreading happen behind the curtain, so this is hard to verify, but I had a similar issue when I tried to save a lot of data as ASCII in a loop. Saving as binary solved it in that case.
My environment, IDE: Visual Studio 2015, OS: Windows 7, language: C++