Bug #46269

ceph-fuse: the ceph-fuse process is terminated by the logrotate task and, more seriously, a process is left in uninterruptible sleep

Added by hongsong wu 13 days ago. Updated 13 days ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
ceph-fuse
Labels (FS):
Pull request ID:
Crash signature:

Description

1. Steps to reproduce:

(1) Step 1:
Open terminal_1 and prepare the command "killall -q -1 ceph-fuse", taken from /etc/logrotate.d/ceph-common.

(2) Step 2:
Open terminal_2 and run the command "ceph-fuse -m monip:port mountpoint".

(3) Step 3:
Switch to terminal_1 immediately and run "killall -q -1 ceph-fuse" repeatedly.

2. Observed behavior:
The ceph-fuse process exits abnormally, and a mount process is left in uninterruptible sleep (state D), as shown below:

root@***:~#  ps -aux  | grep mount
root      271493  0.0  0.0  26484  1252 ?        D    Jun28   0:00 mount -i -o remount mountpoint

The log of the abnormal ceph-fuse exit is shown below:

7fe99f769680  0 pidfile_write: ignore empty --pid-file
7fe99f769680 -1 init, newargv = 0x5583ca2243a0 newargc=9
7fe9952d3700  1 client.63156060 handle_mds_map epoch 29934
7fe999adc700 -1 received  signal: Hangup pid: 1062451 from  PID: 1062486 task name: killall -q -1 ceph-fuse  UID: 0

7fe9912cb700  1 client.63156060 using remount_cb
7fe990aca700 -1 fuse_ll: do_init: safe_write failed with error (32) Broken pipe
7fe990aca700 -1 fuse_ll: do_init: safe_write failed with error (32) Broken pipe
7fe990aca700 -1 *** Caught signal (Aborted) **
  in thread 7fe990aca700 thread_name:ceph-fuse

 1: (()+0x6ddf14) [0x5583c1086f14]
 2: (()+0x110e0) [0x7fe99d7fb0e0]
 3: (gsignal()+0xcf) [0x7fe99c5affff]
 4: (abort()+0x16a) [0x7fe99c5b142a]
 5: (()+0x1ed03b) [0x5583c0b9603b]
 6: (()+0x1536c) [0x7fe99f0c736c]
 7: (()+0x165a1) [0x7fe99f0c85a1]
 8: (()+0x12d48) [0x7fe99f0c4d48]
 9: (()+0x74a4) [0x7fe99d7f14a4]
 10: (clone()+0x3f) [0x7fe99c665d0f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

3. Root cause
The SIGHUP signal handler is not registered in the parent ceph-fuse process.

In more detail:
While ceph-fuse is starting, if the system runs the logrotate task ("killall -q -1 ceph-fuse ..") before safe_read_exact in the parent ceph-fuse process has completed, the following chain of failures occurs:
(1) The parent ceph-fuse process exits abnormally because it does not handle the SIGHUP signal.
(2) The child ceph-fuse process then hits an assertion and aborts because the safe_write call in do_init fails.
(3) The "mount -i -o remount" command invoked from remount_cb is then left in uninterruptible sleep.
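
The "safe_write failed with error (32) Broken pipe" line in the log is what a write to a pipe returns once its reader has gone away. The following is a minimal standalone sketch of that effect, not code from ceph-fuse: the function name errno_after_reader_gone is hypothetical, and ignoring SIGPIPE is an assumption made here so that the write returns EPIPE instead of killing the process.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper: returns the errno produced by writing to a pipe that
// has no reader, 0 if the write unexpectedly succeeds, or -1 if setup fails.
int errno_after_reader_gone() {
  int fds[2];
  if (pipe(fds) != 0)
    return -1;

  // Assumption: SIGPIPE is ignored, so write() reports EPIPE instead of
  // terminating the process. The previous disposition is restored below.
  auto old_handler = signal(SIGPIPE, SIG_IGN);

  close(fds[0]);  // the reader (standing in for the dead parent) is gone

  int status = 0;
  ssize_t w = write(fds[1], &status, sizeof(status));
  int err = (w < 0) ? errno : 0;

  close(fds[1]);
  signal(SIGPIPE, old_handler);
  return err;
}
```

On Linux the returned value is EPIPE, which is 32, matching the error number in the log above.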

4. Solution

Register the SIGHUP signal handler in the parent ceph-fuse process.
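
As an illustration of the idea only (this is not the patch from the pull request), the sketch below installs a SIGHUP handler in the parent before forking, then waits for the child's status on a pipe while a SIGHUP arrives. All names here (on_sighup, install_sighup_handler, read_exact, wait_for_child_status) are hypothetical, and the child sending SIGHUP to its parent is a stand-in for "killall -q -1 ceph-fuse".

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical flag; a real handler would typically request a log reopen.
static volatile sig_atomic_t got_sighup = 0;

static void on_sighup(int) { got_sighup = 1; }

// Replace the default SIGHUP action (terminate) with a handler.
bool install_sighup_handler() {
  struct sigaction sa {};
  sa.sa_handler = on_sighup;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;  // no SA_RESTART: read() may return EINTR, retried below
  return sigaction(SIGHUP, &sa, nullptr) == 0;
}

// Read exactly len bytes, retrying when a signal interrupts the read.
// Returns 0 on success, -1 on error or premature end of file.
int read_exact(int fd, void* buf, size_t len) {
  char* p = static_cast<char*>(buf);
  while (len > 0) {
    ssize_t r = read(fd, p, len);
    if (r < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    if (r == 0)
      return -1;
    p += r;
    len -= static_cast<size_t>(r);
  }
  return 0;
}

// Simulated startup: the parent waits for the child's status while SIGHUP
// is sent to it. Returns the status reported by the child, or -1 on failure.
int wait_for_child_status() {
  int fds[2];
  if (pipe(fds) != 0)
    return -1;
  if (!install_sighup_handler())  // registered before fork()
    return -1;

  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    close(fds[0]);
    kill(getppid(), SIGHUP);  // stand-in for "killall -q -1 ceph-fuse"
    int status = 0;
    ssize_t w = write(fds[1], &status, sizeof(status));
    _exit(w == static_cast<ssize_t>(sizeof(status)) ? 0 : 1);
  }

  close(fds[1]);
  int status = -1;
  int r = read_exact(fds[0], &status, sizeof(status));
  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return r == 0 ? status : -1;
}
```

With the handler installed, the parent survives the signal and still receives the child's status. Without the call to install_sighup_handler, the default SIGHUP action would terminate the parent before the read completes, which is the failure described in this report.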

History

#1 Updated by Xiubo Li 13 days ago

  • Assignee set to Xiubo Li

#2 Updated by Patrick Donnelly 13 days ago

  • Status changed from New to Fix Under Review
  • Assignee changed from Xiubo Li to hongsong wu
  • Pull request ID set to 35844
  • Affected Versions deleted (v16.0.0)
