Bug #24612

FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init()

Added by Neha Ojha over 1 year ago. Updated 10 months ago.

Status: Resolved
Priority: Urgent
Category: Correctness/Safety
Target version: -
Start date: 06/21/2018
Due date:
% Done: 0%
Source:
Tags:
Backport: mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:

Description

2018-06-21T05:40:08.919 INFO:tasks.ceph.mon.b.smithi141.stderr:/build/ceph-14.0.0-586-g47c0cd8/src/mon/OSDMonitor.cc: In function 'void OSDMonitor::prune_init()' thread 7f9878853700 time 2018-06-21 05:40:08.918368
2018-06-21T05:40:08.919 INFO:tasks.ceph.mon.b.smithi141.stderr:/build/ceph-14.0.0-586-g47c0cd8/src/mon/OSDMonitor.cc: 1736: FAILED assert(osdmap_manifest.pinned.empty())
2018-06-21T05:40:08.920 INFO:tasks.ceph.mon.b.smithi141.stderr: ceph version 14.0.0-586-g47c0cd8 (47c0cd87f07352f00bb964a25af79776e61a07c6) nautilus (dev)
2018-06-21T05:40:08.921 INFO:tasks.ceph.mon.b.smithi141.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f9882bcc6b2]
2018-06-21T05:40:08.921 INFO:tasks.ceph.mon.b.smithi141.stderr: 2: (()+0x2e6877) [0x7f9882bcc877]
2018-06-21T05:40:08.922 INFO:tasks.ceph.mon.b.smithi141.stderr: 3: (OSDMonitor::prune_init()+0x1cd) [0x55bf478be41d]
2018-06-21T05:40:08.922 INFO:tasks.ceph.mon.b.smithi141.stderr: 4: (OSDMonitor::do_prune(std::shared_ptr<MonitorDBStore::Transaction>)+0x236) [0x55bf478e3266]
2018-06-21T05:40:08.922 INFO:tasks.ceph.mon.b.smithi141.stderr: 5: (OSDMonitor::encode_pending(std::shared_ptr<MonitorDBStore::Transaction>)+0x7b) [0x55bf478e440b]
2018-06-21T05:40:08.923 INFO:tasks.ceph.mon.b.smithi141.stderr: 6: (PaxosService::propose_pending()+0x113) [0x55bf478b18d3]
2018-06-21T05:40:08.923 INFO:tasks.ceph.mon.b.smithi141.stderr: 7: (C_MonContext::finish(int)+0x39) [0x55bf477528a9]
2018-06-21T05:40:08.923 INFO:tasks.ceph.mon.b.smithi141.stderr: 8: (Context::complete(int)+0x9) [0x55bf4778a9a9]
2018-06-21T05:40:08.923 INFO:tasks.ceph.mon.b.smithi141.stderr: 9: (SafeTimer::timer_thread()+0x18b) [0x7f9882bc902b]
2018-06-21T05:40:08.924 INFO:tasks.ceph.mon.b.smithi141.stderr: 10: (SafeTimerThread::entry()+0xd) [0x7f9882bca5ed]
2018-06-21T05:40:08.924 INFO:tasks.ceph.mon.b.smithi141.stderr: 11: (()+0x76ba) [0x7f98824c86ba]
2018-06-21T05:40:08.925 INFO:tasks.ceph.mon.b.smithi141.stderr: 12: (clone()+0x6d) [0x7f98811ec41d]
2018-06-21T05:40:08.925 INFO:tasks.ceph.mon.b.smithi141.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/a/nojha-2018-06-21_00:18:52-rados-wip-24487-distro-basic-smithi/2686256


Related issues

Duplicated by RADOS - Bug #25181: /mon/OSDMonitor.cc: 1821: FAILED assert(osdmap_manifest.pinned.empty()) Duplicate 07/30/2018
Copied to RADOS - Backport #35071: mimic: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init() Resolved

History

#1 Updated by Josh Durgin about 1 year ago

  • Priority changed from Normal to Urgent

#2 Updated by Sage Weil about 1 year ago

  • Status changed from New to Verified

/a/sage-2018-07-31_21:57:28-rados-wip-sage-testing-2018-07-31-1436-distro-basic-smithi/2844443
/a/sage-2018-07-30_13:46:50-rados-wip-sage3-testing-2018-07-28-1512-distro-basic-smithi/2838737

#3 Updated by Sage Weil about 1 year ago

  • Duplicated by Bug #25181: /mon/OSDMonitor.cc: 1821: FAILED assert(osdmap_manifest.pinned.empty()) added

#4 Updated by Sage Weil about 1 year ago

/a/sage-2018-08-15_15:49:39-rados-wip-sage2-testing-2018-08-15-0731-distro-basic-smithi/2908178

#5 Updated by Joao Eduardo Luis about 1 year ago

  • Assignee set to Joao Eduardo Luis

#6 Updated by Joao Eduardo Luis about 1 year ago

  • Category set to Correctness/Safety
  • Status changed from Verified to In Progress

#7 Updated by Joao Eduardo Luis about 1 year ago

https://github.com/ceph/ceph/pull/23742

Currently missing: a reproducer. Reproducing this may not be trivial, because several conditions must all hold to trigger the bug:

1. do_prune() must have been called, and the in-memory state must have been changed;
2. the transaction containing the update must be postponed because Paxos is busy;
3. an election must be called before the Paxos transaction is committed.

While triggering an election is trivial enough, ensuring the first two conditions hold is not, and would likely require instrumenting the code to force them. I think we're better off relying on the suites, which seem to reproduce this every now and then.

FWIW, the cause appears to be that we would not clear

#8 Updated by Sage Weil about 1 year ago

  • Status changed from In Progress to Pending Backport
  • Backport set to mimic

#9 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #35071: mimic: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init() added

#10 Updated by Nathan Cutler 10 months ago

  • Status changed from Pending Backport to Resolved
