Userspace Thread Latency during IO operations
I'm working on a project that runs on an embedded Linux kernel, and I'm seeing thread latency problems when accessing flash memory.
My application is multithreaded, and some threads have to complete a given task in less than 500 ms. The problem is that these threads are sometimes "frozen" for more than 1 second, so the 500 ms deadline is exceeded.
This behaviour seems to be linked to flash writes, since it also occurs when I run a "dd" command from the shell to write continuously to the flash memory.
I tried various configurations:
Using the ftrace tool, I could see that during the "freeze" some threads and processes are still running, with a lot of "idle" task time between the other tasks (each idle task timeslot lasts more than 20 ms):
I don't understand:
I suspect something linked to I/O management in the kernel, as if the kernel preempted every non-I/O thread in order to do all the I/O-related work (network, files, ...).
Does anybody have an idea of what might cause this latency?
My kernel settings:
Edit:
As I'm not an expert, I'm sharing an ftrace capture (to be viewed with kernelshark): https://drive.google.com/file/d/0B6pJb20-D0D2NHZBUHJVRlV0aDg/view?usp=sharing
Maybe it will help you see what is really happening on my system.
In this capture I used an external "dd" command to reproduce behaviour similar to what I see with my application under nominal conditions.
The "hole" (the "freeze", during which my application's custom ftrace markers no longer appear) is at timestamps:
Another, smaller "hole":
I think this may be because the kernel has decided it must flush some filesystem metadata, or do other filesystem housekeeping, and stalls your process until it has done enough.
I had similar problems and used multi-threading and a userland buffer to absorb the stalls. See my old question and answer here.
An update on this topic: we think we found the root cause of the freezes.
My company hired a Linux expert for 2 days.
Thanks to him we found that:
The freezes were caused by the fact that all flash accesses are blocked while the kernel is flushing data.
This especially affected our logger module (used for timestamping...), which calls the syslog() function.
But syslog() also blocks the calling process, even though the syslogd daemon is the process that actually accesses the flash... (we suspect that the Unix sockets used for syslog communication block until resources are available, like a shell pipe '|' when writing a lot of logs to a file located in flash).
The solution was to separate the real-time threads from all flash access by moving logging and flash access into an isolated thread (with a custom non-blocking message queue as the communication channel).
And it seems to work!
I hadn't read blueshift's answer before, but it seems he was right ;-)
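For anyone who wants to try the isolated-thread approach described above, here is a minimal sketch in C. It is not our actual code: the names (log_queue, logq_try_push, log_writer) and the sizes are hypothetical, and it assumes that dropping a message when the queue is full is acceptable. The idea is that real-time threads only copy a message into memory, while a single low-priority thread is the only one that calls syslog() and can therefore stall on the flash.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>

#define LOGQ_SLOTS   64   /* hypothetical capacity, tune for your log rate */
#define LOGQ_MSG_MAX 128  /* hypothetical maximum message length */

typedef struct {
    char msgs[LOGQ_SLOTS][LOGQ_MSG_MAX];
    size_t head, tail, count;
    unsigned long dropped;      /* messages discarded because the queue was full */
    bool stop;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
} log_queue;

void logq_init(log_queue *q)
{
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Called from real-time threads. It never waits for free space:
 * when the queue is full the message is dropped and counted. */
bool logq_try_push(log_queue *q, const char *msg)
{
    bool ok = false;
    assert(q != NULL && msg != NULL);
    pthread_mutex_lock(&q->lock);   /* held only for a short copy, never across I/O */
    if (q->count < LOGQ_SLOTS) {
        snprintf(q->msgs[q->tail], LOGQ_MSG_MAX, "%s", msg);
        q->tail = (q->tail + 1) % LOGQ_SLOTS;
        q->count++;
        ok = true;
        pthread_cond_signal(&q->not_empty);
    } else {
        q->dropped++;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Called from the writer thread. Blocks until a message is available;
 * returns false once the queue has been stopped and fully drained. */
bool logq_pop(log_queue *q, char *out)
{
    bool ok;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->stop)
        pthread_cond_wait(&q->not_empty, &q->lock);
    ok = q->count > 0;
    if (ok) {
        memcpy(out, q->msgs[q->head], LOGQ_MSG_MAX);
        q->head = (q->head + 1) % LOGQ_SLOTS;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

void logq_stop(log_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Writer thread: the only place that calls syslog(), so only this
 * thread can stall while the kernel is flushing data to flash. */
void *log_writer(void *arg)
{
    log_queue *q = arg;
    char msg[LOGQ_MSG_MAX];
    while (logq_pop(q, msg))
        syslog(LOG_INFO, "%s", msg);
    return NULL;
}
```

To use it, call logq_init() once, start the writer with pthread_create(&tid, NULL, log_writer, &queue), and replace direct syslog() calls in the real-time threads with logq_try_push(). The mutex is shared with the writer thread, but the writer releases it before calling syslog(), so a real-time thread should only ever wait for the duration of a memory copy. If even that is too much, a lock-free ring buffer would be the next step.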