How to access a data structure from a currently running Python process on Linux?

I have a long-running Python process that is generating more data than I planned for. My results are stored in a list that will be serialized (pickled) and written to disk when the program completes -- if it gets that far. But at this rate, it's more likely that the list will exhaust the 1+ GB of free RAM and the process will crash, taking all my results with it.

I plan to modify my script to write results to disk periodically, but I'd like to save the results of the currently-running process if possible. Is there some way I can grab an in-memory data structure from a running process and write it to disk?

I found code.interact(), but since my code doesn't already contain that hook, it doesn't seem useful here (see the question "Method to peek at a Python program running right now").

I'm running Python 2.5 on Fedora 8. Any thoughts?

Thanks a lot.

Shahin


There is not much you can do for a running program. The only thing I can think of is to attach the gdb debugger, stop the process, and examine the memory. Alternatively, make sure that your system is set up to save core dumps, then kill the process with kill -SEGV <pid>. You should then be able to open the core dump with gdb and examine it at your leisure.

There are some gdb macros that will let you examine Python data structures and execute Python code from within gdb, but for these to work you need to have compiled Python with debug symbols enabled, and I doubt that is your case. Creating a core dump first and then recompiling Python with symbols will NOT work, since all the addresses will have changed from the values in the dump.

Here are some links for introspecting python from gdb:

http://wiki.python.org/moin/DebuggingWithGdb

http://chrismiles.livejournal.com/20226.html

or google for 'python gdb'

NB: to make Linux create core dumps, use the ulimit command. The limit applies to the current shell and to processes started from it.

ulimit -a will show you what the current limits are set to.

ulimit -c unlimited will enable core dumps of any size.
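
For reference, the same limit can be inspected from inside Python with the standard resource module. This is only a minimal sketch, written for a current Python 3 rather than the 2.5 in the question; the helper names are made up for illustration, and it reports the limit of the process that runs it, not of some other process.

```python
import resource


def core_dump_limits():
    """Return the (soft, hard) core-file size limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render one limit value the way a person would read it."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % limit


def enable_core_dumps():
    """Raise the soft limit up to the hard limit (the most an unprivileged process may do)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = core_dump_limits()
    print("soft:", describe(soft), "hard:", describe(hard))
```

A soft limit of 0 means no core file will be written, which is worth checking before sending the signal.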


While certainly not very pretty, you could try to access your process's data through the proc filesystem, under /proc/[pid-of-your-process]. The proc filesystem exposes a lot of per-process information, such as currently open file descriptors, memory maps and so on. With a bit of digging you might be able to access the data you need.
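
As a rough idea of what that digging looks like, here is a sketch (Python 3, helper names invented for illustration) that parses /proc/<pid>/maps and lists the writable regions, which is where the heap holding the list would live. It only locates address ranges; what sits in them is raw interpreter memory, not ready-made Python objects, and reading another process's entries needs the appropriate permissions.

```python
def parse_maps_line(line):
    """Split one line of /proc/<pid>/maps into (start, end, perms, path)."""
    # Fields: address range, permissions, offset, device, inode, optional pathname.
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    perms = fields[1]
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, perms, path


def writable_regions(pid):
    """Return (start, end, path) for every writable mapping of the given process."""
    regions = []
    with open("/proc/%d/maps" % pid) as maps:
        for line in maps:
            start, end, perms, path = parse_maps_line(line)
            if "w" in perms:
                regions.append((start, end, path))
    return regions
```

Calling writable_regions(pid) with the pid of the running script would show, among others, a region whose path is [heap].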

Still, I suspect you would be better off approaching this from within Python and doing some runtime logging and debugging.
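
One way to build that in for future runs is a signal handler that pickles the results on demand. The sketch below assumes Python 3 and a module-level list called results; that name, the file name and the choice of SIGUSR1 are all placeholders, not anything from the original script. It does nothing for a process that was started without it.

```python
import os
import pickle
import signal

results = []  # hypothetical stand-in for the script's in-memory list


def dump_results(signum, frame, path="results.partial.pkl"):
    """Pickle the current results to disk; usable as a signal handler."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(results, f, pickle.HIGHEST_PROTOCOL)
    # Rename only after the write succeeded, so a previous good dump
    # is never replaced by a half-written file.
    os.replace(tmp, path)


def install_dump_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGUSR1, dump_results)
```

With install_dump_handler() called at startup, running kill -USR1 <pid> from another shell writes a snapshot while the script keeps going.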


+1 Very interesting question.

I don't know how well this might work for you (especially since I don't know if you'll reuse the pickled list in the program), but I would suggest this: as you write to disk, print out the list to STDOUT. When you run your python script (I'm guessing also from command line), redirect the output to append to a file like so:

python myScript.py >> logFile

This should store all the lists in logFile. This way, you can always take a look at what's in logFile and you should have the most up to date data structures in there (depending on where you call print).
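
If the data needs to be loaded back into Python later, a variation on the same idea is to append each result to a file as a separate pickle instead of printing it. This is a sketch under assumptions (Python 3, made-up function names), not the asker's code:

```python
import pickle


def append_result(path, item):
    """Append one result to the file as its own pickle record."""
    with open(path, "ab") as f:
        pickle.dump(item, f, pickle.HIGHEST_PROTOCOL)


def load_results(path):
    """Read back every record that was appended to the file."""
    items = []
    with open(path, "rb") as f:
        while True:
            try:
                items.append(pickle.load(f))
            except EOFError:
                # No more records in the file.
                break
    return items
```

Since every item goes to disk as soon as it is produced, the script no longer has to keep the whole list in memory.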

Hope this helps
