Bug #17050

closed

Sending signal 1 (SIGHUP) terminates the OSD

Added by Vasu Kulkarni over 7 years ago. Updated over 7 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

1) Set up the cluster using ceph-deploy or ceph-ansible.
2) Send signal 1 to an OSD; the OSD terminates immediately.

[ubuntu@magna011 ~]$ ps -eaf | grep ceph
ceph      8170     1  0 03:31 ?        00:01:51 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
ceph      9202     1  0 03:32 ?        00:01:50 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
ceph     10253     1  0 03:32 ?        00:01:46 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
root     14015 13970  0 21:01 pts/1    00:00:00 tail -f /var/log/ceph/ceph-osd.3.log
ubuntu   14021 13869  0 21:02 pts/0    00:00:00 grep --color=auto ceph

[ubuntu@magna011 ~]$ sudo kill 1 8170

[ubuntu@magna011 ~]$ ps -eaf | grep ceph
ceph      9202     1  0 03:32 ?        00:01:51 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
ceph     10253     1  0 03:32 ?        00:01:46 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
root     14061 13970  0 21:06 pts/1    00:00:00 vi /var/log/ceph/ceph-osd.3.log
ubuntu   14110 13869  0 21:10 pts/0    00:00:00 grep --color=auto ceph

2016-08-16 04:32:29.284659 7ff46b68d700  0 -- 10.8.128.11:6801/8170 >> 10.8.128.34:6809/142391 pipe(0x7ff4a160a000 sd=112 :6801 s=2 pgs=3 cs=1 l=0 c=0x7ff4a0b6db00).fault with nothing to send, going to standby
2016-08-16 04:32:30.063222 7ff46c9a0700  0 -- 10.8.128.11:6801/8170 >> 10.8.128.34:6801/141940 pipe(0x7ff4a156e800 sd=109 :6801 s=2 pgs=4 cs=1 l=0 c=0x7ff4a0b6e400).fault with nothing to send, going to standby
2016-08-16 21:02:56.456235 7ff46d7a7700 -1 osd.3 100 *** Got signal Terminated ***
2016-08-16 21:02:56.456250 7ff46d7a7700  0 osd.3 100 prepare_to_stop telling mon we are shutting down
2016-08-16 21:02:56.605331 7ff4810cf700  0 osd.3 100 got_stop_ack starting shutdown
2016-08-16 21:02:56.605343 7ff46d7a7700  0 osd.3 100 prepare_to_stop starting shutdown
2016-08-16 21:02:56.605352 7ff46d7a7700 -1 osd.3 100 shutdown
2016-08-16 21:02:56.605556 7ff46d7a7700 20 osd.3 100  kicking pg 1.7d
2016-08-16 21:02:56.605564 7ff46d7a7700 30 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] lock
2016-08-16 21:02:56.605587 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] on_shutdown
2016-08-16 21:02:56.605614 7ff46d7a7700 15 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] publish_stats_to_osd 100:39
2016-08-16 21:02:56.605629 7ff46d7a7700 15 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean]  requeue_ops
2016-08-16 21:02:56.605651 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] cancel_copy_ops
2016-08-16 21:02:56.605667 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] cancel_flush_ops
2016-08-16 21:02:56.605674 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] cancel_proxy_ops
2016-08-16 21:02:56.605683 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] on_change
2016-08-16 21:02:56.605756 7ff46d7a7700 10 osd.3 pg_epoch: 100 pg[1.7d( v 87'38 (0'0,87'38] local-les=58 n=0 ec=57 les/c/f 58/58/0 57/57/57) [3] r=0 lpr=57 crt=82'35 lcod 82'36 mlcod 82'36 active+clean] clear_primary_state
#1

Updated by Kefu Chai over 7 years ago

  • Status changed from New to Rejected

Quoting the kill man page:

OPTIONS
<pid> [...]
Send signal to every <pid> listed.

So,

[ubuntu@magna011 ~]$ sudo kill 1 8170

treats both 1 and 8170 as PIDs and sends the default signal, SIGTERM, to "init" (PID 1) and to osd.3. That matches the "Got signal Terminated" line in the OSD log above.

Instead, you might want to use

sudo kill -1 8170

or

sudo kill -SIGHUP 8170

Both work as expected. The log then reads:

2016-08-17 17:04:59.943012 7f71fd8e4700 -1 received  signal: Hangup from  PID: 3258 task name: -bash  UID: 1000

I am closing this issue; please feel free to reopen it if I am wrong.
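The point about the leading dash can be checked safely without touching a real OSD. The following is a minimal sketch, not part of the original report: a throwaway background subshell stands in for the OSD (an assumption for illustration) and records which signal it received. The variable names `out`, `pid`, and `received` are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for the OSD: a background subshell that records
# the signal it receives in a temporary file and then exits.
out=$(mktemp)

(
  trap 'echo HUP > "$out"; exit 0' HUP
  trap 'echo TERM > "$out"; exit 0' TERM
  while :; do sleep 1; done
) &
pid=$!

sleep 1   # give the subshell time to install its traps

# With the leading dash, "1" is parsed as the signal number (SIGHUP).
kill -1 "$pid"

# Without the dash, "kill 1 $pid" would treat 1 and $pid as two PIDs and
# send the default SIGTERM to each. It is deliberately NOT run here,
# because it would also signal PID 1.

wait "$pid"
received=$(cat "$out")
echo "received: $received"
rm -f "$out"
```

To see which name a signal number maps to before sending it, `kill -l 1` prints the name for signal 1.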
