Bug #4703
closed
ceph health hangs when upgrading from bobtail to next branch
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description
Upgrading from bobtail to next [ceph version 0.60-451-g3888a12] works fine when all daemons are restarted at once [sudo service ceph -a restart after the upgrade]. Strangely, upgrading the monitors first, then the OSDs and the MDS, causes the ceph health command to hang, and all I see in the monitor logs is:
2013-04-10 14:28:01.475513 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:01.475527 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47636 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:01.475572 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47636 s=1 pgs=0 cs=0 l=0).fault
2013-04-10 14:28:01.476053 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:01.476063 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47637 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:01.676928 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:01.676942 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47638 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:02.077732 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:02.077746 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47639 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:02.878605 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:02.878618 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47640 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:03.475230 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:03.475243 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47641 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:05.475256 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:05.475269 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47642 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:07.475500 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:07.475513 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47643 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:09.475658 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:09.475672 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47644 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:11.475794 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:11.475807 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47645 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:13.475955 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:13.475969 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47646 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:15.476074 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:15.476088 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47647 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:17.476239 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:17.476253 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47648 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:19.476374 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:19.476387 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47649 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:21.476527 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:21.476540 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47653 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:23.476658 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:23.476671 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47654 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:25.476815 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:25.476828 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47655 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:26.475037 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:26.475129 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:26.475180 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:26.475210 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:26.475255 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475623 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475656 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475663 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475670 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475676 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:37.477708 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:37.477722 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47699 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:39.477872 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:39.477884 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47700 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:41.475841 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.475869 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.475876 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.475883 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.475892 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.478003 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:41.478017 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47705 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:43.478101 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption

ubuntu@burnupi13:~$ sudo service ceph stop mon.a
=== mon.a ===
Stopping Ceph mon.a on burnupi13...kill 8059...done
ubuntu@burnupi13:~$ sudo service ceph start mon.a
=== mon.a ===
Starting Ceph mon.a on burnupi13...
Invalid argument: /var/lib/ceph/mon/ceph-a/store.db: does not exist (create_if_missing is false)
starting mon.a rank 1 at 10.214.134.26:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 4775b5a9-2dd6-4ca1-90a8-0928d3115b79
Starting ceph-create-keys on burnupi13...
ubuntu@burnupi13:~$ sudo service ceph stop osd.1
=== osd.1 ===
Stopping Ceph osd.1 on burnupi13...kill 8273...done
ubuntu@burnupi13:~$ sudo service ceph start osd.1
=== osd.1 ===
Starting Ceph osd.1 on burnupi13...
starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
ubuntu@burnupi13:~$ sudo service ceph restart osd.2
=== osd.2 ===
=== osd.2 ===
Stopping Ceph osd.2 on burnupi13...kill 8394...done
=== osd.2 ===
Starting Ceph osd.2 on burnupi13...
starting osd.2 at :/0 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
ubuntu@burnupi13:~$ sudo service ceph restart mds.a
=== mds.a ===
=== mds.a ===
Stopping Ceph mds.a on burnupi13...kill 8189...done
=== mds.a ===
Starting Ceph mds.a on burnupi13...
starting mds.a at :/0
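The monitor log above repeats a handful of message kinds many times, so a quick tally can make the pattern easier to see. The following is only a rough sketch: the function names (classify, summarize), the category labels, and the regular expression are illustrative assumptions, not part of Ceph, and the sample lines are copied from the log in this report.

```python
import re
from collections import Counter

# Leading "date time thread-id level" prefix of a monitor log entry,
# as it appears in the log pasted in this report (assumed format).
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d+) ([0-9a-f]+) (\d+) (.*)$"
)

def classify(message):
    """Bucket a message into one of the kinds seen in this report (labels are made up)."""
    if "verify_reply coudln't decrypt" in message:  # the typo is in the log text itself
        return "cephx decrypt failure"
    if "failed verifying authorize reply" in message:
        return "authorize reply rejected"
    if "we are not in quorum" in message:
        return "not in quorum"
    if message.endswith(".fault"):
        return "pipe fault"
    return "other"

def summarize(lines):
    """Count log entries per kind; lines that do not look like entries are skipped."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            counts[classify(match.group(5))] += 1
    return counts

# Four entries copied verbatim from the monitor log in this report.
sample = [
    "2013-04-10 14:28:01.475513 7ff0c8332700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption",
    "2013-04-10 14:28:01.475527 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47636 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply",
    "2013-04-10 14:28:01.475572 7ff0c8332700 0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47636 s=1 pgs=0 cs=0 l=0).fault",
    "2013-04-10 14:28:26.475037 7ff0bf7fe700 1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum",
]

summary = summarize(sample)
for kind, count in summary.most_common():
    print(f"{count:3d}  {kind}")
```

Feeding the full monitor log file to summarize would show how the decrypt failures against 10.214.134.24 alternate with the "not in quorum" messages while mon.a stays in the probing state.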
Also, when restarting the first monitor, the command output contains "Invalid argument: /var/lib/ceph/mon/ceph-a/store.db: does not exist (create_if_missing is false)". Do we really need to print this?
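For anyone trying to reproduce this, the staged restart described above could be scripted roughly as follows. This is a sketch under assumptions: the daemon names are the ones visible in the shell session on burnupi13 (a real cluster would list every monitor, OSD, and MDS), the helper names and the 30-second timeout are arbitrary illustrative choices, and the script defaults to a dry run that only prints the commands.

```python
import subprocess

# Restart order from the report: monitors first, then OSDs, then the MDS.
# Daemon names are taken from the pasted shell session; adjust per cluster.
STAGES = [
    ("monitors", ["mon.a"]),
    ("osds", ["osd.1", "osd.2"]),
    ("mds", ["mds.a"]),
]

def restart_commands(stages=STAGES):
    """Return the restart commands in staged order, without running them."""
    return [
        ["sudo", "service", "ceph", "restart", daemon]
        for _, daemons in stages
        for daemon in daemons
    ]

def health_responds(timeout_s=30):
    """Run `ceph health`; return False if it does not finish within timeout_s seconds."""
    try:
        subprocess.run(["ceph", "health"], timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False

def reproduce(dry_run=True):
    """Print each restart command; execute it and probe health only when dry_run is False."""
    for cmd in restart_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    if not dry_run:
        print("ceph health responded:", health_responds())

if __name__ == "__main__":
    reproduce(dry_run=True)
```

Calling reproduce(dry_run=False) on a test cluster would run the restarts for real and report whether ceph health returned before the timeout, which is the hang described in this bug.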