
Bug #4448

closed

ceph upgrade from bobtail to master fails

Added by Tamilarasi muthamizhan about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Immediate
Category: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Upgrading from bobtail to the master branch worked fine, but when the ceph daemons were restarted, the monitors crashed.

2013-03-14T17:44:16.820 DEBUG:teuthology.orchestra.run:Running [10.214.131.10]: '/tmp/cephtest/tamil@ubuntu-2013-03-14_17-41-11/enable-coredump ceph-coverage /tmp/cephtest/tamil@ubuntu-2013-03-14_17-41-11/archive/coverage ceph mds set_max_mds 1'
2013-03-14T17:44:16.824 INFO:teuthology.task.ceph.mon.a.err:Invalid argument: /var/lib/ceph/mon/ceph-a/store.db: does not exist (create_if_missing is false)
2013-03-14T17:44:16.827 INFO:teuthology.task.ceph.mon.a.err:mon/Monitor.cc: In function 'bool Monitor::StoreConverter::needs_conversion()' thread 7faf33f22780 time 2013-03-14 17:44:17.097212
2013-03-14T17:44:16.827 INFO:teuthology.task.ceph.mon.a.err:mon/Monitor.cc: 4097: FAILED assert(0 == "non-existent store in mon data directory")
2013-03-14T17:44:16.828 INFO:teuthology.task.ceph.mon.a.err: ceph version 0.58-637-g7370b55 (7370b5564606474f11b9ac5afb7cc60e0ac36ed1)
2013-03-14T17:44:16.828 INFO:teuthology.task.ceph.mon.a.err: 1: (Monitor::StoreConverter::needs_conversion()+0x544) [0x49edb4]
2013-03-14T17:44:16.828 INFO:teuthology.task.ceph.mon.a.err: 2: (main()+0x7a1) [0x486491]
2013-03-14T17:44:16.829 INFO:teuthology.task.ceph.mon.a.err: 3: (__libc_start_main()+0xed) [0x7faf31fdc76d]
2013-03-14T17:44:16.829 INFO:teuthology.task.ceph.mon.a.err: 4: ceph-mon() [0x48983d]
2013-03-14T17:44:16.829 INFO:teuthology.task.ceph.mon.a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-03-14T17:44:16.829 INFO:teuthology.task.ceph.mon.a.err:2013-03-14 17:44:17.097995 7faf33f22780 -1 mon/Monitor.cc: In function 'bool Monitor::StoreConverter::needs_conversion()' thread 7faf33f22780 time 2013-03-14 17:44:17.097212
2013-03-14T17:44:16.830 INFO:teuthology.task.ceph.mon.a.err:mon/Monitor.cc: 4097: FAILED assert(0 == "non-existent store in mon data directory")
2013-03-14T17:44:16.831 INFO:teuthology.task.ceph.mon.a.err:
2013-03-14T17:44:16.832 INFO:teuthology.task.ceph.mon.a.err: ceph version 0.58-637-g7370b55 (7370b5564606474f11b9ac5afb7cc60e0ac36ed1)
2013-03-14T17:44:16.832 INFO:teuthology.task.ceph.mon.a.err: 1: (Monitor::StoreConverter::needs_conversion()+0x544) [0x49edb4]
2013-03-14T17:44:16.832 INFO:teuthology.task.ceph.mon.a.err: 2: (main()+0x7a1) [0x486491]
2013-03-14T17:44:16.832 INFO:teuthology.task.ceph.mon.a.err: 3: (__libc_start_main()+0xed) [0x7faf31fdc76d]
2013-03-14T17:44:16.833 INFO:teuthology.task.ceph.mon.a.err: 4: ceph-mon() [0x48983d]
2013-03-14T17:44:16.833 INFO:teuthology.task.ceph.mon.a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-03-14T17:44:16.833 INFO:teuthology.task.ceph.mon.a.err:
2013-03-14T17:44:16.834 INFO:teuthology.task.ceph.mon.a.err: 0> 2013-03-14 17:44:17.097995 7faf33f22780 -1 mon/Monitor.cc: In function 'bool Monitor::StoreConverter::needs_conversion()' thread 7faf33f22780 time 2013-03-14 17:44:17.097212
2013-03-14T17:44:16.834 INFO:teuthology.task.ceph.mon.a.err:mon/Monitor.cc: 4097: FAILED assert(0 == "non-existent store in mon data directory")
2013-03-14T17:44:16.834 INFO:teuthology.task.ceph.mon.a.err:
2013-03-14T17:44:16.835 INFO:teuthology.task.ceph.mon.a.err: ceph version 0.58-637-g7370b55 (7370b5564606474f11b9ac5afb7cc60e0ac36ed1)
2013-03-14T17:44:16.835 INFO:teuthology.task.ceph.mon.a.err: 1: (Monitor::StoreConverter::needs_conversion()+0x544) [0x49edb4]
2013-03-14T17:44:16.835 INFO:teuthology.task.ceph.mon.a.err: 2: (main()+0x7a1) [0x486491]
2013-03-14T17:44:16.836 INFO:teuthology.task.ceph.mon.a.err: 3: (__libc_start_main()+0xed) [0x7faf31fdc76d]
2013-03-14T17:44:16.836 INFO:teuthology.task.ceph.mon.a.err: 4: ceph-mon() [0x48983d]
2013-03-14T17:44:16.836 INFO:teuthology.task.ceph.mon.a.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-03-14T17:44:16.837 INFO:teuthology.task.ceph.mon.a.err:
2013-03-14T17:44:16.837 INFO:teuthology.task.ceph.mon.a.err:terminate called after throwing an instance of 'ceph::FailedAssertion'

I have copied the logs to burnupi06.front.sepia.ceph.com:/home/ubuntu/upgrade_bug

Actions #1

Updated by Ian Colle about 11 years ago

  • Priority changed from High to Urgent
Actions #2

Updated by Sage Weil about 11 years ago

  • Priority changed from Urgent to Immediate
Actions #3

Updated by Sage Weil about 11 years ago

I can't reproduce this by doing the upgrade manually.

Actions #4

Updated by Joao Eduardo Luis about 11 years ago

  • Status changed from New to In Progress

I can reproduce this rather reliably using the install.upgrade task.

The error message is quite misleading, and the whole assert doesn't help.

What is in fact happening is that the previous monitor is still running when the v0.58 monitor is started. During 'needs_conversion()', 'MonitorStore::mount()' is called. It can't acquire the store lock, and that brings the process down. When I wrote that code I assumed the only way the mount could fail was if the store didn't exist, hence the misleading message.
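The failure above conflates two different conditions: "the store is missing" and "another process holds the store lock". Below is a minimal Python sketch, under stated assumptions, of how a startup check could report them separately. It is not Ceph's code. The lock-file name "lock", the use of flock(), and the names open_store, StoreMissingError and StoreLockedError are all hypothetical illustrations.

```python
import errno
import fcntl
import os


class StoreMissingError(Exception):
    """Hypothetical: the mon data directory does not exist."""


class StoreLockedError(Exception):
    """Hypothetical: another process (e.g. an old ceph-mon) holds the store lock."""


def open_store(mon_data: str, lock_name: str = "lock") -> int:
    """Open and exclusively lock a hypothetical monitor store.

    Returns the locked file descriptor. The caller keeps it open for as
    long as it uses the store, and closing it releases the lock.
    """
    if not os.path.isdir(mon_data):
        raise StoreMissingError(f"non-existent store in mon data directory: {mon_data}")
    lock_path = os.path.join(mon_data, lock_name)  # assumed lock-file name
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # flock() locks belong to the open file description, so a second
        # open() of the same file conflicts even within a single process.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            raise StoreLockedError(
                f"{lock_path} is locked; is another ceph-mon still running?"
            ) from e
        raise
    return fd
```

In this sketch, an upgrade that starts the new monitor while the old one is still alive gets a StoreLockedError that names the real problem, not the "non-existent store" message seen in the log.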

Also, almost every time I reproduce this with the install.upgrade task, I am able to trigger #4394, which happens whenever /var/lib/ceph/mon/ceph-foo is removed from under a running monitor.

I'm about to push a patch with a fixed message, but afaict there's not much more to do on the monitor side.

Actions #5

Updated by Greg Farnum about 11 years ago

Sounds like maybe the package uninstalls aren't killing the daemons. Is this a feature of Ubuntu upgrades, or just another post-rm thing we forgot and need to add in?

Actions #6

Updated by Joao Eduardo Luis about 11 years ago

I've been bumping into several issues with the upgrade task. Should we create a different bug on teuthology to address them, or keep on documenting them here and move this one to the teuthology project?

Actions #7

Updated by Ian Colle about 11 years ago

  • Project changed from Ceph to teuthology
Actions #8

Updated by Tamilarasi muthamizhan about 11 years ago

  • Status changed from In Progress to Resolved
  • Assignee changed from Joao Eduardo Luis to Tamilarasi muthamizhan

This was actually happening because ceph.restart was using DaemonState.restart(), which only starts the daemons instead of restarting them. I've fixed this in the ceph.restart task, so it shouldn't happen again.
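The difference between "start only" and a real restart can be shown with a toy model. The following is a hedged Python sketch, not teuthology's DaemonState API. StoreLock, FakeDaemon, restart_start_only() and the fake PIDs are all hypothetical. The store lock is modelled as a single holder slot, so a start without a preceding stop collides with the still-running old process, as in the crash above.

```python
class StoreLock:
    """Hypothetical stand-in for the exclusive lock on a mon data directory."""

    def __init__(self) -> None:
        self.holder = None  # fake PID of the process holding the lock, if any


class FakeDaemon:
    """Hypothetical daemon wrapper; NOT teuthology's real DaemonState."""

    _next_pid = 1000  # fake PID counter shared by all instances

    def __init__(self, lock: StoreLock) -> None:
        self.lock = lock
        self.pid = None

    def start(self) -> None:
        FakeDaemon._next_pid += 1
        pid = FakeDaemon._next_pid
        if self.lock.holder is not None:
            # Mirrors the reported crash: the new process cannot take the lock.
            raise RuntimeError(f"pid {pid}: store locked by pid {self.lock.holder}")
        self.lock.holder = pid
        self.pid = pid

    def stop(self) -> None:
        # Stopping the process releases the lock it held.
        if self.pid is not None and self.lock.holder == self.pid:
            self.lock.holder = None
        self.pid = None

    def restart_start_only(self) -> None:
        # The behaviour described in the report: launch without stopping first.
        self.start()

    def restart(self) -> None:
        # Fixed behaviour: stop (releasing the lock), then start again.
        self.stop()
        self.start()
```

With this model, calling restart_start_only() on a running daemon raises RuntimeError and leaves the old process holding the lock, while restart() succeeds and the new PID becomes the lock holder.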
