How do I track down a Heisenbug in some Python code?

Quick background: we have a large source base written in Python. It is a compiler for a domain specific language, and internally everything is represented as directed graphs. These digraphs are built up from sets, and so we use the builtin set type in Python.

The problem is that we didn't originally realise that Python actively uses the lack of ordering guarantee in a set object to use a faster non-deterministic implementation. So when you iterate over the objects in a set (as we do frequently) the order they are returned in is weakly random. It doesn't change on every execution, but it does change frequently. From what I've seen debugging our code it seems as if a hash of the source code acts as a seed for the random number generator. So changing the code, even in a path that is not executed, causes the set iterator to change the order that elements are generated in.

Our basic debugging strategy is to dump prints into where we think the error is, refining them based on the output until we find the bug. Not very elegant but for most things it largely works. The basic problems is that adding/changing any print statement triggers different behaviour and a wildly different execution path. We can't afford to print/log everything as it slows down the compiler's execution time from about 20s (managable) to about an hour (not so managable).

How would you diagnose a problem that only occurs infrequently and disappears when you start to look for it?

Edit for clarification : Several answers suggest ways to fix the ordering of the sets. As torkildr says below "The problem here isn't that sets behave strangely, the problem is that your program behaves as if it doesn't". This is exactly the problem, but the solution is not to use deterministic sets. This would simply mask the behaviour. The problem is to find why our program behaves this way and fix that behaviour. The algorithms that we use should work on graphs represented as unordered sets. They don't. I need to find out where these bugs and why they occur.

Problem solved : It turns out that on the implementation of Python that I'm using (2.6 on OS-X) if the relationship between the __eq__ and __hash__ methods of the objects being stored in the set is not quite a valid ordering then the system exhibits the weakly random behaviour described. There must be some code in the C implementation of set.add() that uses something from the random module to build the representation. This causes a dependency on the system entropy pool which changes the ordering on disk writes.

No direct answers, but reading kriss' follow-up question caused the insight to solve this problem so he gets the vote.


Why not just change nothing in the code that affects set output ordering and use pdb instead of adding prints ? Does setting a breakpoint also change set ordering ? If not, pdb will allow you to inspect internal variables.

Your description of your problem also lead to some mysteries. How do you detect there is a bug ? If this detection can be done at run time a possible strategy would be to run pdb from your code (as simple as import pdb; pdb.set_trace() ) as soon as you see it (using run time ifs), not in a subsequent execution after changing the code. This way you wouldn't have to change the code at all to debug (but maybe to remove those assertions later when the code will be debugged and rock strong).

By the way you should also unit test all your code when writing it, then it'll be much less likely to hide subtile bugs.


I don't really see why you would want to look at the problem further when you obviously have an idea what the root of the problem is.

The main issue her is: You need deterministic, ordered graphs; Python sets doesn't give you this. Why wouldn't you change implementation to some list/set implementation that does?

When you say it changes whenever you change the code, this doesn't sound at all surprising, when you consider that the set algorithm probably is optimized to use the fastest possible storage pattern it see fit. In other words, you do some other operation, store a variable, print a line, and the memory changes.

edit: To sum it up; The problem here isn't that sets behave strangely, the problem is that your program behaves as if it doesn't.


Raymond Hettinger has written a recipe for ordered sets. Its Big-O running times for all methods are the same as for sets. You might want to try changing all your sets over to ordersets.

链接地址: http://www.djcxy.com/p/48088.html

上一篇: 有没有办法自行下载Xcode?

下一篇: 如何在一些Python代码中追踪Heisenbug?