How can I find out what is preventing a Linux task from being scheduled?

2018-06-30 10:07:47

I have an embedded Linux system with several single-threaded user processes. One of these very occasionally fails to be scheduled, even though there is work waiting for it to do. How can I find out what is preventing the process (task/thread) from being scheduled?

I used strace -p <pid> to trace the process's system calls when it hangs, and got this:

...
ioctl(13, 0x40104604, 0xffff6ecf08)     = 0
_newselect(13, [8 9 10 11 12], [], [], {0, 0}) = 0 (Timeout)
_newselect(13, [8 9 10 11 12], [], [], {0, 0}) = 0 (Timeout)
_newselect(13, [8 9 10 11 12], [], [], {0, 15000}) = 0 (Timeout)
_newselect(13, [8 9 10 11 12], [], [], {0, 19000}) = 1 (in [12], left {0, 705})
read(12, "3$GPZDA,072522.038,06,01,1980,,*"..., 1600) = 32
_newselect(13, [8 9 10 11 12], [], [], {0, 0}) = 0 (Timeout)
_newselect(13, [8 9 10 11 12], [], [], {0, 0}) = 0 (Timeout)
_newselect(13, [8 9 10 11 12], [], [], {0, 15000}

The last select() call (_newselect() in the strace output) does not return after its 15 ms timeout. It appears that a context switch occurs inside the select, after which the task does not run again for a very long time (tens of seconds). When the task eventually resumes, it runs normally again.
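To make the overrun easy to spot, the check amounts to timing each select() call against its own timeout. Here is a minimal sketch of that idea in Python. It is for illustration only and is not the code in the real process: the names timed_select and overran, the 1-second slack, and the pipe used as a stand-in descriptor are all assumptions.

```python
import os
import select
import time


def timed_select(rlist, timeout_s):
    """Run select() on rlist and return (ready_fds, elapsed_seconds)."""
    start = time.monotonic()
    ready, _, _ = select.select(rlist, [], [], timeout_s)
    return ready, time.monotonic() - start


def overran(elapsed_s, timeout_s, slack_s=1.0):
    """True if the call took more than slack_s longer than its timeout.

    slack_s is an arbitrary threshold chosen for this sketch.
    """
    return elapsed_s - timeout_s > slack_s


if __name__ == "__main__":
    # Hypothetical stand-in for the real descriptors (8..12 in the strace).
    read_fd, write_fd = os.pipe()
    ready, elapsed = timed_select([read_fd], 0.015)  # 15 ms, as in the trace
    if overran(elapsed, 0.015):
        print("select() overran its timeout: %.3f s" % elapsed)
```

A monotonic clock is used so that a wall-clock adjustment (for example from the GPS time source) cannot be mistaken for a stall.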

I rebuilt the kernel with ftrace enabled, and enabled the sched_switch tracer, and got this output when the process resumes:

...
<idle>-0     [000] 10876.339906:      0:120:R   + [000]  1385:120:R ems
<idle>-0     [000] 10876.339915:      0:120:R ==> [000]  1385:120:R ems
   ems-1385  [000] 10876.340006:   1385:120:S ==> [000]     0:120:R <idle>
<idle>-0     [000] 10876.340300:      0:120:R ==> [000]  1379:100:R gps
   gps-1379  [000] 10876.340453:   1379:100:R   + [000]  1377:120:R dgs
...

The process of interest is gps (pid 1379), which resumes here in the second-last line after a 37-second period of inactivity. (The duration of the inactivity is known from debug printfs in the process itself.) Note that there is no '+' line to indicate that the task has just become ready. I'm assuming that happened 37 seconds earlier (of course the trace does not go back that far!). Instead, the task just starts running, with no indication of why it had been held up.
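For reference, this is how I am reading the trace: a '+' line is a wakeup of the task on the right-hand side, and a '==>' line is a switch to it. A rough Python sketch of that check, run offline on a saved trace, is below. The regular expression is only my guess at the line layout based on the excerpt above, and the names parse_line and wakeup_to_run are made up for this example.

```python
import re

# Layout assumed from the excerpt: task-pid [cpu] ts: pid:prio:state OP [cpu] pid:prio:state comm
LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):"
    r"\s+(?P<prev_pid>\d+):(?P<prev_prio>\d+):(?P<prev_state>\S+)"
    r"\s+(?P<op>\+|==>)\s+\[\d+\]"
    r"\s+(?P<next_pid>\d+):(?P<next_prio>\d+):(?P<next_state>\S+)\s+(?P<comm>.*)$"
)

SAMPLE = """\
<idle>-0     [000] 10876.339906:      0:120:R   + [000]  1385:120:R ems
<idle>-0     [000] 10876.339915:      0:120:R ==> [000]  1385:120:R ems
   ems-1385  [000] 10876.340006:   1385:120:S ==> [000]     0:120:R <idle>
<idle>-0     [000] 10876.340300:      0:120:R ==> [000]  1379:100:R gps
   gps-1379  [000] 10876.340453:   1379:100:R   + [000]  1377:120:R dgs
"""


def parse_line(line):
    """Return (timestamp, op, target_pid) or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return float(m.group("ts")), m.group("op"), int(m.group("next_pid"))


def wakeup_to_run(lines, pid):
    """Return (wakeup_ts, switch_in_ts) for the first switch to pid.

    wakeup_ts is None when no '+' line for pid appears before the switch,
    and switch_in_ts is None when pid is never switched in.
    """
    wakeup = None
    for line in lines:
        event = parse_line(line)
        if event is None or event[2] != pid:
            continue
        ts, op, _ = event
        if op == "+":
            wakeup = ts
        else:
            return wakeup, ts
    return wakeup, None


if __name__ == "__main__":
    print(wakeup_to_run(SAMPLE.splitlines(), 1379))
```

On the excerpt, ems (1385) has a wakeup followed by a switch a few microseconds later, whereas gps (1379) has a switch with no wakeup in the captured window.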

Once the hang has lasted a few seconds, I've tried elevating the task's priority from another process using setpriority(PRIO_PROCESS, <pid>, -20). That is why the priority appears as 100 in the above trace instead of the default 120. It made no difference, so I don't believe the issue is priority-related.
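As I understand it, the priority column in the trace for a normal (non-real-time) task is 120 plus the nice value, which is how nice -20 shows up as 100. The sketch below spells that out in Python and wraps the equivalent of the setpriority() call. The helper names nice_to_trace_prio and boost are made up for this example, and the real monitoring process is not written this way.

```python
import os

DEFAULT_PRIO = 120  # value shown in the trace for a task with nice 0


def nice_to_trace_prio(nice):
    """Map a nice value (-20..19) to the priority seen in the trace.

    Assumes a normal time-sharing task, not a real-time one.
    """
    if not -20 <= nice <= 19:
        raise ValueError("nice value out of range: %d" % nice)
    return DEFAULT_PRIO + nice


def boost(pid, nice=-20):
    """Set pid's nice value and return the value read back.

    Lowering a nice value normally requires root or CAP_SYS_NICE.
    """
    os.setpriority(os.PRIO_PROCESS, pid, nice)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    print(nice_to_trace_prio(-20))
```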

What can I do now to find out what is causing the task suspension? I'm not that familiar with debugging in kernel space. Are there any other tools in the ftrace suite that can be run on a single pid to see what it's doing? Any other kernel debug tools? I can recognise when the problem has occurred, but only after a few seconds have passed. So I can trigger or stop any data capture at that point, but tracing events that happened earlier than that is tricky.

The kernel is version 2.6.33, if that helps. Upgrading to a later version isn't a practical proposition for various reasons.

Any advice or suggestions on how to debug this further are very welcome!
