Bug #46269
closed
ceph-fuse: ceph-fuse process is terminated by the logrotate task and, more seriously, an Uninterruptible Sleep process is left behind
Description
1. Reproduce the problem as follows:
(1) Step 1:
In terminal_1, prepare the command "killall -q -1 ceph-fuse", taken from /etc/logrotate.d/ceph-common.
(2) Step 2:
In terminal_2, run the command "ceph-fuse -m monip:port mountpoint".
(3) Step 3:
Switch to terminal_1 immediately and run "killall -q -1 ceph-fuse" repeatedly.
2. You will find the following anomalies:
The ceph-fuse process exits abnormally, and a mount process is left in Uninterruptible Sleep (state D), as shown below:
root@***:~# ps -aux | grep mount
root 271493 0.0 0.0 26484 1252 ? D Jun28 0:00 mount -i -o remount mountpoint
The ceph-fuse abnormal exit log is as follows:
7fe99f769680 0 pidfile_write: ignore empty --pid-file
7fe99f769680 -1 init, newargv = 0x5583ca2243a0 newargc=9
7fe9952d3700 1 client.63156060 handle_mds_map epoch 29934
7fe999adc700 -1 received signal: Hangup pid: 1062451 from PID: 1062486 task name: killall -q -1 ceph-fuse UID: 0
7fe9912cb700 1 client.63156060 using remount_cb
7fe990aca700 -1 fuse_ll: do_init: safe_write failed with error (32) Broken pipe
7fe990aca700 -1 fuse_ll: do_init: safe_write failed with error (32) Broken pipe
7fe990aca700 -1 *** Caught signal (Aborted) **
in thread 7fe990aca700 thread_name:ceph-fuse
1: (()+0x6ddf14) [0x5583c1086f14]
2: (()+0x110e0) [0x7fe99d7fb0e0]
3: (gsignal()+0xcf) [0x7fe99c5affff]
4: (abort()+0x16a) [0x7fe99c5b142a]
5: (()+0x1ed03b) [0x5583c0b9603b]
6: (()+0x1536c) [0x7fe99f0c736c]
7: (()+0x165a1) [0x7fe99f0c85a1]
8: (()+0x12d48) [0x7fe99f0c4d48]
9: (()+0x74a4) [0x7fe99d7f14a4]
10: (clone()+0x3f) [0x7fe99c665d0f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
3. The reason
The parent ceph-fuse process does not register a SIGHUP signal handler.
In more detail:
While ceph-fuse is starting, if logrotate runs "killall -q -1 ceph-fuse .." before safe_read_exact in the parent ceph-fuse process has completed, the following chain of events occurs:
(1) The parent ceph-fuse process exits abnormally, because it does not handle SIGHUP.
(2) The child ceph-fuse process then asserts, because safe_write in do_init fails (error 32, Broken pipe, in the log above).
(3) The "mount -i -o remount" command invoked from remount_cb is left as an Uninterruptible Sleep process.
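The failure mechanism can be illustrated outside of Ceph with a small Python sketch. This is not ceph-fuse code. It assumes a Linux/POSIX system and models the parent and child only as two processes joined by a pipe. The name simulate_unhandled_sighup and the pipe names are hypothetical. The parent blocks in a read (standing in for safe_read_exact) with the default SIGHUP action, and the child writes to the pipe (standing in for safe_write in do_init) after the parent has been killed.

```python
import errno
import os
import signal


def simulate_unhandled_sighup():
    """Return (signal that killed the parent, errno seen by the child's write)."""
    report_r, report_w = os.pipe()   # child -> caller: errno of the write
    go_r, go_w = os.pipe()           # caller -> child: "parent has been reaped"

    parent_pid = os.fork()
    if parent_pid == 0:
        # Stand-in for the parent ceph-fuse process: SIGHUP keeps its default action.
        signal.signal(signal.SIGHUP, signal.SIG_DFL)
        init_r, init_w = os.pipe()   # hypothetical startup status pipe
        me = os.getpid()
        if os.fork() == 0:
            # Stand-in for the child ceph-fuse process.
            try:
                signal.signal(signal.SIGPIPE, signal.SIG_IGN)
                for fd in (init_r, go_w, report_r):
                    os.close(fd)
                os.kill(me, signal.SIGHUP)       # plays the role of "killall -q -1"
                os.read(go_r, 1)                 # wait until the parent is gone
                try:
                    os.write(init_w, b"\0")      # stand-in for safe_write in do_init
                    code = 0
                except OSError as exc:
                    code = exc.errno
                os.write(report_w, str(code).encode())
            finally:
                os._exit(0)
        try:
            os.close(init_w)
            os.read(init_r, 1)                   # stand-in for safe_read_exact
        finally:
            os._exit(0)

    os.close(report_w)
    os.close(go_r)
    _, status = os.waitpid(parent_pid, 0)        # parent is terminated by SIGHUP
    os.write(go_w, b"1")
    os.close(go_w)
    reported = os.read(report_r, 16)
    os.close(report_r)
    return os.WTERMSIG(status), int(reported)
```

On Linux the child's write fails with EPIPE, which is the same error 32 (Broken pipe) that do_init reports in the log. The sketch does not reproduce the abort or the stuck remount, only the signal and pipe interaction that precedes them.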
4. Solution
Register a SIGHUP signal handler in the parent ceph-fuse process.
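The effect of the proposed fix can be sketched in the same simplified model. Again, this is a Python illustration under stated assumptions (Linux/POSIX, a hypothetical function name startup_with_sighup_handler, a no-op handler), not the actual ceph-fuse change. The stand-in parent installs a SIGHUP handler before it waits on the startup pipe, so a SIGHUP that arrives during startup no longer terminates it.

```python
import os
import signal


def startup_with_sighup_handler():
    """Return (exited normally?, exit code) for a parent that handles SIGHUP."""
    parent_pid = os.fork()
    if parent_pid == 0:
        exit_code = 1
        try:
            # Hypothetical handler: accept SIGHUP and do nothing during startup.
            signal.signal(signal.SIGHUP, lambda signum, frame: None)
            init_r, init_w = os.pipe()           # hypothetical startup status pipe
            me = os.getpid()
            if os.fork() == 0:
                # Stand-in for the child ceph-fuse process.
                try:
                    os.close(init_r)
                    os.kill(me, signal.SIGHUP)   # plays the role of "killall -q -1"
                    os.write(init_w, b"\0")      # report successful initialization
                finally:
                    os._exit(0)
            os.close(init_w)
            # Python retries a read interrupted by a handled signal (PEP 475),
            # so the parent still receives the child's status byte.
            result = os.read(init_r, 1)
            exit_code = 0 if result == b"\0" else 1
        finally:
            os._exit(exit_code)

    _, status = os.waitpid(parent_pid, 0)
    return os.WIFEXITED(status), os.WEXITSTATUS(status)
```

Here the parent survives the SIGHUP, reads the status byte, and exits with code 0. What the real handler should do with the signal while startup is still in progress (ignore it, or defer the log reopen) is a design choice this sketch does not address.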