Bug #22266

mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)

Added by Sage Weil 8 months ago. Updated 6 months ago.

Status: Pending Backport
Priority: Urgent
Assignee:
Category: Correctness/Safety
Target version: -
Start date: 11/28/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Component(RADOS):

Description

2017-11-28T16:02:48.792 INFO:tasks.ceph.mgr.x.smithi106.stderr:/build/ceph-12.2.1-838-gacb0271/src/mgr/PyModuleRegistry.cc: In function 'int PyModuleRegistry::init(const MgrMap&)' thread 7f9148335700 time 2017-11-28 16:02:48.618200
2017-11-28T16:02:48.792 INFO:tasks.ceph.mgr.x.smithi106.stderr:/build/ceph-12.2.1-838-gacb0271/src/mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)
2017-11-28T16:02:48.795 INFO:tasks.ceph.mgr.x.smithi106.stderr: ceph version 12.2.1-838-gacb0271 (acb02717f6e96f96d4128bbebd946238d3c79291) luminous (stable)
2017-11-28T16:02:48.795 INFO:tasks.ceph.mgr.x.smithi106.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558bcc270fa2]
2017-11-28T16:02:48.795 INFO:tasks.ceph.mgr.x.smithi106.stderr: 2: (PyModuleRegistry::init(MgrMap const&)+0xd24) [0x558bcc113e34]
2017-11-28T16:02:48.795 INFO:tasks.ceph.mgr.x.smithi106.stderr: 3: (MgrStandby::handle_mgr_map(MMgrMap*)+0x1c6) [0x558bcc1279e6]
2017-11-28T16:02:48.796 INFO:tasks.ceph.mgr.x.smithi106.stderr: 4: (MgrStandby::ms_dispatch(Message*)+0x254) [0x558bcc1287d4]
2017-11-28T16:02:48.796 INFO:tasks.ceph.mgr.x.smithi106.stderr: 5: (DispatchQueue::entry()+0xf4a) [0x558bcc58c17a]
2017-11-28T16:02:48.796 INFO:tasks.ceph.mgr.x.smithi106.stderr: 6: (DispatchQueue::DispatchThread::entry()+0xd) [0x558bcc32ad7d]
2017-11-28T16:02:48.796 INFO:tasks.ceph.mgr.x.smithi106.stderr: 7: (()+0x76ba) [0x7f914efee6ba]
2017-11-28T16:02:48.796 INFO:tasks.ceph.mgr.x.smithi106.stderr: 8: (clone()+0x6d) [0x7f914e05a3dd]
2017-11-28T16:02:48.796 INFO:tasks.ceph.mgr.x.smithi106.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/a/yuriw-2017-11-28_15:44:54-rados-luminous-distro-basic-smithi/1901404

Related issues

Duplicated by RADOS - Bug #22415: 'pg dump' fails after mon rebuild Duplicate 12/12/2017
Duplicated by mgr - Bug #22682: "PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)" in rados-luminous-distro-basic-smithi Duplicate 01/15/2018
Copied to RADOS - Backport #22275: luminous: mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0) Resolved

History

#1 Updated by John Spray 8 months ago

The monitor is sending the manager a MgrMap with epoch zero. It's happening immediately after a monitor restart, which correlates with the rebuild_mondb task executing.

I think the manager is correct to assert on this: the rebuild_mondb task should be stopping other services if it's going to make map epochs go back in time.
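
To make the condition above concrete, here is a minimal sketch of the kind of epoch check being discussed. It is not the code in PyModuleRegistry.cc and not the fix from the linked PRs; MapSketch, MapCheck and check_map are hypothetical names invented for illustration, and only the epoch field is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a cluster map; only the epoch is modelled here.
struct MapSketch {
  std::uint64_t epoch;
};

// Possible outcomes when a daemon receives a new map.
enum class MapCheck { Ok, Uninitialized, WentBackwards };

// Classify an incoming map against the newest epoch seen so far.
// last_seen == 0 is taken to mean "no map received yet" (an assumption
// of this sketch, matching the assert(map.epoch > 0) in the report).
inline MapCheck check_map(const MapSketch& incoming, std::uint64_t last_seen) {
  if (incoming.epoch == 0) {
    return MapCheck::Uninitialized;   // the case that trips the assert
  }
  if (incoming.epoch < last_seen) {
    return MapCheck::WentBackwards;   // e.g. after a mon store rebuild
  }
  return MapCheck::Ok;
}
```

A caller could assert on Uninitialized, as the manager does here, or choose to ignore the message and wait for a later map; which behaviour is right is exactly what this ticket is about.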

#2 Updated by Sage Weil 8 months ago

/a/yuriw-2017-11-27_23:31:26-rados-luminous-distro-basic-smithi/1897131

reproducible!

#3 Updated by Kefu Chai 8 months ago

  • Assignee set to Kefu Chai

#4 Updated by Kefu Chai 8 months ago

  • Status changed from Verified to Testing

#5 Updated by Kefu Chai 8 months ago

  • Status changed from Testing to Need Review

#6 Updated by Kefu Chai 8 months ago

  • Project changed from mgr to RADOS
  • Category set to Correctness/Safety
  • ceph-qa-suite rados added

#7 Updated by Kefu Chai 8 months ago

  • Copied to Backport #22275: luminous: mgr/PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0) added

#8 Updated by Josh Durgin 8 months ago

  • Status changed from Need Review to Pending Backport

#9 Updated by Kefu Chai 8 months ago

  • Status changed from Pending Backport to Resolved

#10 Updated by Sage Weil 7 months ago

  • Status changed from Resolved to Verified

/a/sage-2017-12-19_06:01:05-rados-wip-sage2-testing-2017-12-18-2147-distro-basic-smithi/1979661

saw this again on master!

#11 Updated by Kefu Chai 6 months ago

  • Status changed from Verified to Need Review

#12 Updated by Sage Weil 6 months ago

  • Status changed from Need Review to Resolved

#13 Updated by Kefu Chai 6 months ago

  • Status changed from Resolved to Pending Backport
  • Backport set to luminous

#14 Updated by Kefu Chai 6 months ago

  • Backport changed from luminous to luminous,mimic-dev1

#15 Updated by Kefu Chai 6 months ago

  • Backport changed from luminous,mimic-dev1 to luminous

#16 Updated by Kefu Chai 6 months ago

  • Duplicated by Bug #22415: 'pg dump' fails after mon rebuild added

#17 Updated by Nathan Cutler 6 months ago

  • Related to Bug #22682: "PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)" in rados-luminous-distro-basic-smithi added

#18 Updated by Nathan Cutler 6 months ago

Master PR for second round of backporting: https://github.com/ceph/ceph/pull/19780

Luminous backport PR: https://github.com/ceph/ceph/pull/20116

There will be no backport tracker issue for this backport because it's a follow-on fix for an issue (i.e. this one) for which an earlier fix was already backported in the v12.2.2 cycle.

#19 Updated by Kefu Chai 6 months ago

  • Related to deleted (Bug #22682: "PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)" in rados-luminous-distro-basic-smithi)

#20 Updated by Kefu Chai 6 months ago

  • Duplicated by Bug #22682: "PyModuleRegistry.cc: 139: FAILED assert(map.epoch > 0)" in rados-luminous-distro-basic-smithi added
