Thursday, December 21, 2017

Processes in uninterruptible state (D) - What to do with them?

In the previous entry I talked about the infamous zombie state and how those processes just can't be killed. Now it's time for something even scarier than the Z state, [suspense sounds].... the D state [/suspense sounds].

As its name suggests, processes in uninterruptible state (D in ps/top/etc) cannot be interrupted, hence they can't be killed either!!! And unlike a process in Z state, a process in D state may well be holding on to memory you can't get back!

How do you get into D state?


Most of the time people associate processes in D state with IO activity, and that's not wrong; in fact, that's exactly what man ps says:

PROCESS STATE CODES
       Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a
       process:

               D    uninterruptible sleep (usually IO)
               R    running or runnable (on run queue)
               S    interruptible sleep (waiting for an event to complete)
               T    stopped, either by a job control signal or because it is being traced
               W    paging (not valid since the 2.6.xx kernel)
               X    dead (should never be seen)
               Z    defunct ("zombie") process, terminated but not reaped by its parent

However, processes that go into D state due to IO usually come back to R once the IO operation that was holding them completes (unless something goes nuts). Now, there are some scenarios that don't involve IO and can also lead to D state, for example when using serialization mechanisms like mutexes. When you try to acquire a mutex that is already locked you have two options: either you go into interruptible sleep (S) or you go into uninterruptible sleep (D). Which one you get depends on what the code is supposed to do and whether you want to be able to receive signals and such while waiting for the mutex to become available. If you are curious about mutexes in the kernel have a look at this or a shorter and perhaps more human version here.
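Just to make that difference concrete, here is a minimal kernel-side sketch (my_mutex and the function names are made up purely for illustration) showing the two flavours:

#include <linux/mutex.h>

static DEFINE_MUTEX(my_mutex);

/* Waits in D (TASK_UNINTERRUPTIBLE): signals are not delivered
 * until the mutex is finally acquired. */
static void grab_uninterruptible(void)
{
        mutex_lock(&my_mutex);
}

/* Waits in S (TASK_INTERRUPTIBLE): a signal wakes the process up
 * and the call returns -EINTR without taking the mutex. */
static int grab_interruptible(void)
{
        return mutex_lock_interruptible(&my_mutex);
}

The first one is what sits behind most D states caused by locking; the second is what you'd pick when you want the waiting process to remain responsive to signals.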

So how do you detect a process in D state?


That's rather simple! You can use the regular tools like ps and top and look for the State field/column. For example:

juan@test:~$ ps aux|grep test
root     20592  0.0  0.0   4220   656 pts/2    S+   21:36   0:00 ./test
root     20593  0.0  0.0   4216   640 pts/24   D+   21:36   0:00 ./test
juan     20982  0.0  0.0  15964  2268 pts/27   R+   23:26   0:00 grep --color=auto test
juan@test:~$

In this case, process 20593 is in D+ state, so it is in uninterruptible sleep and is a foreground process.
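If instead of grepping for a particular program you want to scan the whole system for processes currently in D state, something like this does the trick (output trimmed to the relevant line, it will obviously vary on your machine):

juan@test:~$ ps -eo state,pid,user,cmd | grep '^D'
D   20593 root     ./test
juan@test:~$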

What to do with a process in D state? How do you find out what is actually blocking it?


This is where things get interesting. As I mentioned before, D state is usually temporary, and most of the time, if you see a process in that state, by the time you run ps again it will have changed to either R or S. BUT... if you believe the process is simply refusing to leave D state, then what can you do?

Lots of people answer this question with "I kill the process"... well, that's exactly why it's called UNinterruptible: it will absolutely ignore your most violent attempts to kill it. Any signals you send to the process will be queued and delivered eventually once the process leaves the D state, if it ever does of course.

Fortunately there are a few things that can help you debug this lovely situation and prevent it from happening in the future.

Have a look at the process stack:

This would be an initial approach: by having a look at the process stack you can get an idea of what code path took the process into D state. Luckily, looking at the stack isn't that hard nowadays, and you can do it through /proc:

juan@test:~$ sudo cat /proc/20593/stack
[<ffffffffc04ee045>] dev_open+0x15/0x40 [ebbcharmutex]
[<ffffffff811f0a7f>] chrdev_open+0x9f/0x1d0
[<ffffffff811e9527>] do_dentry_open+0x1f7/0x340
[<ffffffff811ead87>] vfs_open+0x57/0x60
[<ffffffff811faccc>] do_last+0x4ec/0x1190
[<ffffffff811fb9f0>] path_openat+0x80/0x600
[<ffffffff811fd10a>] do_filp_open+0x3a/0x90
[<ffffffff811eb109>] do_sys_open+0x129/0x280
[<ffffffff811eb27e>] SyS_open+0x1e/0x20
[<ffffffff817b788d>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
juan@test:~$

Nice! So the process is blocked in kernel space and got there by running an open syscall (SyS_open+0x1e/0x20) on some sort of character device (chrdev_open+0x9f/0x1d0), and there's a module intervening here called [ebbcharmutex]. Since this is a stack trace, we know with high certainty that the process is blocked inside the dev_open call, and that the code for that function belongs to the ebbcharmutex module.
For more details about ebbcharmutex, I'd suggest going to the source (a really nice website about Linux and embedded systems); anyway, here is a piece of the code:
/** @brief The device open function that is called each time the device is opened
 *  This will only increment the numberOpens counter in this case.
 *  @param inodep A pointer to an inode object (defined in linux/fs.h)
 *  @param filep A pointer to a file object (defined in linux/fs.h)
 */
static int dev_open(struct inode *inodep, struct file *filep){

//   if(!mutex_trylock(&ebbchar_mutex)){                  // Try to acquire the mutex (returns 0 on fail)
//      printk(KERN_ALERT "EBBChar: Device in use by another process");
//      return -EBUSY;
//  }
   mutex_lock(&ebbchar_mutex);
   numberOpens++;
   printk(KERN_INFO "EBBChar: Device has been opened %d time(s)\n", numberOpens);
   return 0;
}

As you can see, the very first (uncommented) line of the function calls mutex_lock() in its uninterruptible fashion, hence this is where our process is stuck.
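Note that the commented-out mutex_trylock() lines show one way around this: fail straight away with -EBUSY instead of sleeping. Another option, sketched below (this is NOT the actual ebbcharmutex code, just an illustration), is to wait interruptibly so the open can be aborted by a signal:

static int dev_open(struct inode *inodep, struct file *filep){
   /* Sleep in S (interruptible) rather than D: a signal aborts
    * the open() instead of leaving the process stuck forever. */
   if (mutex_lock_interruptible(&ebbchar_mutex))
      return -ERESTARTSYS;
   numberOpens++;
   printk(KERN_INFO "EBBChar: Device has been opened %d time(s)\n", numberOpens);
   return 0;
}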

Now what? Well... technically there's not much you can do to unlock the process; all you can try to do is identify who might be holding the lock and not releasing it (of course this requires some serious understanding of the problem and of the code itself). In my simple scenario you can see that I had two test processes, one of them in S+ state (scroll up, I'm not going to paste it twice :P). If we check the open files for that process we can see it opened /dev/ebbchar, so it is the one holding the lock:
root@test:/home/juan# lsof -p 20592
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
test    20592 root  cwd    DIR    8,1     4096 466847 /home/juan/ModTest/exploringBB/extras/kernel/ebbcharmutex
test    20592 root  rtd    DIR    8,1     4096      2 /
test    20592 root  txt    REG    8,1     9102 466927 /home/juan/ModTest/exploringBB/extras/kernel/ebbcharmutex/test
test    20592 root  mem    REG    8,1  1857312 583879 /lib/x86_64-linux-gnu/libc-2.19.so
test    20592 root  mem    REG    8,1   149120 583895 /lib/x86_64-linux-gnu/ld-2.19.so
test    20592 root    0u   CHR  136,2      0t0      5 /dev/pts/2
test    20592 root    1u   CHR  136,2      0t0      5 /dev/pts/2
test    20592 root    2u   CHR  136,2      0t0      5 /dev/pts/2
test    20592 root    3u   CHR  250,0      0t0  69224 /dev/ebbchar
root@test:/home/juan#

If you can't really figure out why that process isn't releasing the lock, then since it is in S+ state you can kill it, and this should free the lock, hence releasing the other, stuck process:

root@test:/home/juan# kill -9 20592
root@test:/home/juan# ps aux|grep test
root     20593  0.0  0.0   4220   640 pts/24   S+   05:39   0:00 ./test
root     21460  0.0  0.0  15964  2140 pts/26   S+   12:06   0:00 grep --color=auto test
root@test:/home/juan#

Perfect, now process 20593 is in S state!

The kernel doesn't like processes in D state either:


Can't blame it, right? A long time in D state is usually a sign of some kind of issue, either high load or locking problems. Because of this, the kernel has a hung task detector with four tunables to detect and act on processes stuck in D state. Here you can see the default setup on Ubuntu:

root@test:/home/juan# cat /proc/sys/kernel/hung_task_timeout_secs
120
root@test:/home/juan# cat /proc/sys/kernel/hung_task_panic
0
root@test:/home/juan# cat /proc/sys/kernel/hung_task_check_count
4194304
root@test:/home/juan# cat /proc/sys/kernel/hung_task_warnings
0
root@test:/home/juan#

What do these values mean? (For more precision read this; right after the list there's a quick example of how to tune them.)
  • hung_task_timeout_secs: if a task stays in uninterruptible state without being scheduled for more than this number of seconds, a warning will be reported.
  • hung_task_panic: tells the kernel whether it should panic or not when a hung task is detected: 0 for no panic, 1 for panic. This could be used, for example, to force the system to reboot.
  • hung_task_check_count: the upper bound on the number of tasks the detector will check on each pass.
  • hung_task_warnings: the maximum number of warnings to report for hung tasks; every time a task is detected as hung this number gets decremented, and once it reaches 0 no more reports will be issued.
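These are regular sysctls living under /proc/sys/kernel, so tuning them is just a matter of writing to the corresponding file (or using sysctl). For example, to make the kernel panic whenever a hung task is detected and to double the timeout, something along these lines would do (the values here are purely illustrative):

root@test:/home/juan# echo 1 > /proc/sys/kernel/hung_task_panic
root@test:/home/juan# echo 240 > /proc/sys/kernel/hung_task_timeout_secs
root@test:/home/juan#

To make a change survive a reboot you would put the equivalent line (e.g. kernel.hung_task_panic = 1) in /etc/sysctl.conf or a file under /etc/sysctl.d/.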
What does a warning report look like? Oh... they are lovely:

Dec 20 19:57:25 test kernel: [78751.283729] INFO: task test:20593 blocked for more than 120 seconds.
Dec 20 19:57:25 test kernel: [78751.283737]       Tainted: G           OE  3.19.0-51-generic #58~14.04.1-Ubuntu
Dec 20 19:57:25 test kernel: [78751.283739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 20 19:57:25 test kernel: [78751.283741] test            D ffff88003610bb68     0 20593  18722 0x00000000
Dec 20 19:57:25 test kernel: [78751.283748]  ffff88003610bb68 ffff88007ce9f5c0 0000000000013e80 ffff88003610bfd8
Dec 20 19:57:25 test kernel: [78751.283753]  0000000000013e80 ffff8800b771f5c0 ffff88007ce9f5c0 ffff88007ce9f5c0
Dec 20 19:57:25 test kernel: [78751.283756]  ffffffffc04f0100 ffffffffc04f0104 ffff88007ce9f5c0 00000000ffffffff
Dec 20 19:57:25 test kernel: [78751.283760] Call Trace:
Dec 20 19:57:25 test kernel: [78751.283773]  [<ffffffff817b3a29>] schedule_preempt_disabled+0x29/0x70
Dec 20 19:57:25 test kernel: [78751.283778]  [<ffffffff817b5745>] __mutex_lock_slowpath+0x95/0x100
Dec 20 19:57:25 test kernel: [78751.283781]  [<ffffffff817b57b5>] ? mutex_lock+0x5/0x37
Dec 20 19:57:25 test kernel: [78751.283785]  [<ffffffff817b57d3>] mutex_lock+0x23/0x37
Dec 20 19:57:25 test kernel: [78751.283790]  [<ffffffffc04ee045>] ? dev_open+0x15/0x40 [ebbcharmutex]
Dec 20 19:57:25 test kernel: [78751.283793]  [<ffffffffc04ee045>] dev_open+0x15/0x40 [ebbcharmutex]
Dec 20 19:57:25 test kernel: [78751.283798]  [<ffffffff811f0a7f>] chrdev_open+0x9f/0x1d0
Dec 20 19:57:25 test kernel: [78751.283802]  [<ffffffff811e9527>] do_dentry_open+0x1f7/0x340
Dec 20 19:57:25 test kernel: [78751.283805]  [<ffffffff811f09e0>] ? cdev_put+0x30/0x30
Dec 20 19:57:25 test kernel: [78751.283808]  [<ffffffff811ead35>] ? vfs_open+0x5/0x60
Dec 20 19:57:25 test kernel: [78751.283812]  [<ffffffff811ead87>] vfs_open+0x57/0x60
Dec 20 19:57:25 test kernel: [78751.283815]  [<ffffffff811faccc>] do_last+0x4ec/0x1190
Dec 20 19:57:25 test kernel: [78751.283820]  [<ffffffff811fb9f0>] path_openat+0x80/0x600
Dec 20 19:57:25 test kernel: [78751.283825]  [<ffffffff810a6ef5>] ? update_stats_wait_end+0x5/0xd0
Dec 20 19:57:25 test kernel: [78751.283828]  [<ffffffff817b38b5>] ? _cond_resched+0x5/0x40
Dec 20 19:57:25 test kernel: [78751.283832]  [<ffffffff811fd10a>] do_filp_open+0x3a/0x90
Dec 20 19:57:25 test kernel: [78751.283836]  [<ffffffff81209cd7>] ? __alloc_fd+0xa7/0x130
Dec 20 19:57:25 test kernel: [78751.283839]  [<ffffffff811fd0d5>] ? do_filp_open+0x5/0x90
Dec 20 19:57:25 test kernel: [78751.283843]  [<ffffffff811eb109>] do_sys_open+0x129/0x280
Dec 20 19:57:25 test kernel: [78751.283846]  [<ffffffff811eafe5>] ? do_sys_open+0x5/0x280
Dec 20 19:57:25 test kernel: [78751.283850]  [<ffffffff811eb27e>] SyS_open+0x1e/0x20
Dec 20 19:57:25 test kernel: [78751.283853]  [<ffffffff817b788d>] system_call_fastpath+0x16/0x1b

The report provides an even more detailed stack trace along with some other details. Now, if you look at the top of the stack trace (__mutex_lock_slowpath), it becomes crystal clear that the process is stuck trying to acquire a mutex.

Summing this up:


Processes stuck in D state are rare and can be an indication of system issues like intense IO, high load, or locking problems. Because they sleep uninterruptibly, signals don't wake them up and so can't be delivered, which means the process can't be killed. Fortunately, there are a few things you can do to gather more information and get a chance to troubleshoot the problem.
