Bug #17050
closed: sending signal 1 (SIGHUP) terminates the osd
Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
1) Set up the cluster using ceph-deploy or ceph-ansible.
2) Send signal 1 to an osd; the osd terminates immediately.
[ubuntu@magna011 ~]$ ps -eaf | grep ceph
ceph 8170 1 0 03:31 ? 00:01:51 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
ceph 9202 1 0 03:32 ? 00:01:50 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
ceph 10253 1 0 03:32 ? 00:01:46 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
root 14015 13970 0 21:01 pts/1 00:00:00 tail -f /var/log/ceph/ceph-osd.3.log
ubuntu 14021 13869 0 21:02 pts/0 00:00:00 grep --color=auto ceph
[ubuntu@magna011 ~]$ sudo kill 1 8170
[ubuntu@magna011 ~]$ ps -eaf | grep ceph
ceph 9202 1 0 03:32 ? 00:01:51 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
ceph 10253 1 0 03:32 ? 00:01:46 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
root 14061 13970 0 21:06 pts/1 00:00:00 vi /var/log/ceph/ceph-osd.3.log
ubuntu 14110 13869 0 21:10 pts/0 00:00:00 grep --color=auto ceph
2016-08-16 04:32:29.284659 7ff46b68d700 0 -- 10.8.128.11:6801/8170 >> 10.8.128.34:6809/142391 pipe(0x7ff4a160a000 sd=112 :6801 s=2 pgs=3 cs=1 l=0 c=0x7ff4a0b6db00).fault with nothing to send, going to standby
2016-08-16 04:32:30.063222 7ff46c9a0700 0 -- 10.8.128.11:6801/8170 >> 10.8.128.34:6801/141940 pipe(0x7ff4a156e800 sd=109 :6801 s=2 pgs=4 cs=1 l=0 c=0x7ff4a0b6e400).fault with nothing to send, going to standby
2016-08-16 21:02:56.456235 7ff46d7a7700 -1 osd.3 100 *** Got signal Terminated ***
2016-08-16 21:02:56.456250 7ff46d7a7700 0 osd.3 100 prepare_to_stop telling mon we are shutting down
2016-08-16 21:02:56.605331 7ff4810cf700 0 osd.3 100 got_stop_ack starting shutdown
2016-08-16 21:02:56.605343 7ff46d7a7700 0 osd.3 100 prepare_to_stop starting shutdown
2016-08-16 21:02:56.605352 7ff46d7a7700 -1 osd.3 100 shutdown
2016-08-16 21:02:56.605556 7ff46d7a7700 20 osd.3 100 kicking pg 1.7d
2016-08-16 21:02:56.605564 7ff46d7a7700 30 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] lock
2016-08-16 21:02:56.605587 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] on_shutdown
2016-08-16 21:02:56.605614 7ff46d7a7700 15 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] publish_stats_to_osd 100:39
2016-08-16 21:02:56.605629 7ff46d7a7700 15 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] requeue_ops
2016-08-16 21:02:56.605651 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] cancel_copy_ops
2016-08-16 21:02:56.605667 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] cancel_flush_ops
2016-08-16 21:02:56.605674 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] cancel_proxy_ops
2016-08-16 21:02:56.605683 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] clear_primary_state
Updated by Kefu Chai over 7 years ago
- Status changed from New to Rejected
Quoting the OPTIONS section of the kill man page:
<pid> [...]
Send signal to every <pid> listed.
So
[ubuntu@magna011 ~]$ sudo kill 1 8170
treated both 1 and 8170 as PIDs and sent the default signal, SIGTERM, to "init" (PID 1) and to osd.3 (PID 8170). That matches the "Got signal Terminated" line in the osd.3 log.
Instead, you might want
sudo kill -1 8170
or
sudo kill -SIGHUP 8170
Both work as expected. The log then reads
2016-08-17 17:04:59.943012 7f71fd8e4700 -1 received signal: Hangup from PID: 3258 task name: -bash UID: 1000
I am closing this issue. Please feel free to reopen it if I am wrong.
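The difference between the two invocations comes down to how the argument list is split into a signal and a list of PIDs. A minimal Python sketch of that rule follows. The function `parse_kill_args` is hypothetical and deliberately simplified: it is not the real `kill` parser, and it ignores options such as `-s`, `-l`, and negative PIDs for process groups.

```python
import signal


def parse_kill_args(args):
    """Split a simplified `kill` argument list into (signal, pids).

    Assumption: only a leading `-<number>`, `-<NAME>`, or `-SIG<NAME>`
    is treated as a signal; every other argument is taken as a PID.
    """
    signum = signal.SIGTERM  # default when no signal is given
    rest = list(args)
    if rest and rest[0].startswith("-") and len(rest[0]) > 1:
        spec = rest.pop(0)[1:]
        if spec.isdigit():
            signum = signal.Signals(int(spec))
        else:
            name = spec if spec.startswith("SIG") else "SIG" + spec
            signum = signal.Signals[name]
    return signum, [int(pid) for pid in rest]


if __name__ == "__main__":
    for argv in (["1", "8170"], ["-1", "8170"], ["-SIGHUP", "8170"]):
        sig, pids = parse_kill_args(argv)
        print("kill", " ".join(argv), "->", sig.name, "to", pids)
```

Under this rule, `1 8170` yields SIGTERM for PIDs 1 and 8170, while `-1 8170` and `-SIGHUP 8170` both yield SIGHUP for PID 8170 only.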