Bug #4703

closed

ceph health hangs when upgrading from bobtail to next branch

Added by Tamilarasi muthamizhan about 11 years ago. Updated about 11 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: Monitor
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Upgrading from bobtail to next [ceph version 0.60-451-g3888a12] works fine when all daemons are restarted at once [sudo service ceph -a restart after the upgrade]. Strangely, upgrading the monitors first, then the OSDs and the MDS, causes the ceph health command to hang, and all I see in the monitor logs is:

2013-04-10 14:28:01.475513 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:01.475527 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47636 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:01.475572 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47636 s=1 pgs=0 cs=0 l=0).fault
2013-04-10 14:28:01.476053 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:01.476063 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47637 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:01.676928 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:01.676942 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47638 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:02.077732 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:02.077746 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47639 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:02.878605 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:02.878618 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47640 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:03.475230 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:03.475243 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47641 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:05.475256 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:05.475269 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47642 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:07.475500 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:07.475513 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47643 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:09.475658 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:09.475672 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47644 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:11.475794 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:11.475807 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47645 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:13.475955 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:13.475969 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47646 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:15.476074 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:15.476088 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47647 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:17.476239 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:17.476253 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47648 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:19.476374 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:19.476387 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47649 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:21.476527 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:21.476540 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47653 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:23.476658 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:23.476671 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47654 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:25.476815 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:25.476828 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47655 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:26.475037 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:26.475129 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:26.475180 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:26.475210 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:26.475255 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475623 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475656 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475663 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475670 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:36.475676 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:37.477708 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:37.477722 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47699 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:39.477872 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:39.477884 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47700 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:41.475841 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.475869 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.475876 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.475883 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.475892 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-10 14:28:41.478003 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-04-10 14:28:41.478017 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47705 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-04-10 14:28:43.478101 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption

ubuntu@burnupi13:~$ sudo service ceph stop mon.a
=== mon.a === 
Stopping Ceph mon.a on burnupi13...kill 8059...done
ubuntu@burnupi13:~$ sudo service ceph start mon.a
=== mon.a === 
Starting Ceph mon.a on burnupi13...
Invalid argument: /var/lib/ceph/mon/ceph-a/store.db: does not exist (create_if_missing is false)
starting mon.a rank 1 at 10.214.134.26:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 4775b5a9-2dd6-4ca1-90a8-0928d3115b79
Starting ceph-create-keys on burnupi13...
ubuntu@burnupi13:~$ sudo service ceph stop osd.1
=== osd.1 === 
Stopping Ceph osd.1 on burnupi13...kill 8273...done
ubuntu@burnupi13:~$ sudo service ceph start osd.1
=== osd.1 === 
Starting Ceph osd.1 on burnupi13...
starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
ubuntu@burnupi13:~$ sudo service ceph restart osd.2
=== osd.2 === 
=== osd.2 === 
Stopping Ceph osd.2 on burnupi13...kill 8394...done
=== osd.2 === 
Starting Ceph osd.2 on burnupi13...
starting osd.2 at :/0 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
ubuntu@burnupi13:~$ sudo service ceph restart mds.a
=== mds.a === 
=== mds.a === 
Stopping Ceph mds.a on burnupi13...kill 8189...done
=== mds.a === 
Starting Ceph mds.a on burnupi13...
starting mds.a at :/0

Also, when restarting the first monitor, the command output includes "Invalid argument: /var/lib/ceph/mon/ceph-a/store.db: does not exist (create_if_missing is false)". Do we really need to print this?
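
The monitor log excerpt above repeats a small set of messages many times. For triage, it may help to tally them instead of reading every line. The following is a minimal sketch, not part of Ceph: the category names, the `summarize` helper and the `SAMPLE` list are hypothetical, and the match strings are copied verbatim from the log lines in this report (including the misspelled "coudln't").

```python
from collections import Counter

# Hypothetical category names mapped to substrings copied from the log
# excerpt in this report. The spelling "coudln't" matches the log output.
PATTERNS = {
    "cephx_decrypt_failure": "verify_reply coudln't decrypt",
    "authorize_reply_failure": "failed verifying authorize reply",
    "pipe_fault": ").fault",
    "not_in_quorum": "we are not in quorum",
}


def summarize(lines):
    """Count how many lines fall into each category in PATTERNS."""
    counts = Counter()
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
                break  # each line is counted under at most one category
    return counts


# A few lines taken from the excerpt above, used only as sample input.
SAMPLE = [
    "2013-04-10 14:28:01.475513 7ff0c8332700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption",
    "2013-04-10 14:28:01.475527 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47636 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply",
    "2013-04-10 14:28:01.475572 7ff0c8332700  0 -- 10.214.134.26:6789/0 >> 10.214.134.24:6789/0 pipe(0x3057410 sd=21 :47636 s=1 pgs=0 cs=0 l=0).fault",
    "2013-04-10 14:28:26.475037 7ff0bf7fe700  1 mon.a@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum",
]

if __name__ == "__main__":
    for name, count in summarize(SAMPLE).items():
        print(f"{name}: {count}")
```

To run it against a real monitor log, pass an open file object to `summarize` in place of `SAMPLE`. The log path depends on the local setup and is not given in this report.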
