Bug #24612

closed

FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init()

Added by Neha Ojha almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Urgent
Assignee: Joao Eduardo Luis
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-06-21T05:40:08.919 INFO:tasks.ceph.mon.b.smithi141.stderr:/build/ceph-14.0.0-586-g47c0cd8/src/mon/OSDMonitor.cc: In function 'void OSDMonitor::prune_init()' thread 7f9878853700 time 2018-06-21 05:40:08.918368
2018-06-21T05:40:08.919 INFO:tasks.ceph.mon.b.smithi141.stderr:/build/ceph-14.0.0-586-g47c0cd8/src/mon/OSDMonitor.cc: 1736: FAILED assert(osdmap_manifest.pinned.empty())
2018-06-21T05:40:08.920 INFO:tasks.ceph.mon.b.smithi141.stderr: ceph version 14.0.0-586-g47c0cd8 (47c0cd87f07352f00bb964a25af79776e61a07c6) nautilus (dev)
2018-06-21T05:40:08.921 INFO:tasks.ceph.mon.b.smithi141.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f9882bcc6b2]
2018-06-21T05:40:08.921 INFO:tasks.ceph.mon.b.smithi141.stderr: 2: (()+0x2e6877) [0x7f9882bcc877]
2018-06-21T05:40:08.922 INFO:tasks.ceph.mon.b.smithi141.stderr: 3: (OSDMonitor::prune_init()+0x1cd) [0x55bf478be41d]
2018-06-21T05:40:08.922 INFO:tasks.ceph.mon.b.smithi141.stderr: 4: (OSDMonitor::do_prune(std::shared_ptr<MonitorDBStore::Transaction>)+0x236) [0x55bf478e3266]
2018-06-21T05:40:08.922 INFO:tasks.ceph.mon.b.smithi141.stderr: 5: (OSDMonitor::encode_pending(std::shared_ptr<MonitorDBStore::Transaction>)+0x7b) [0x55bf478e440b]
2018-06-21T05:40:08.923 INFO:tasks.ceph.mon.b.smithi141.stderr: 6: (PaxosService::propose_pending()+0x113) [0x55bf478b18d3]
2018-06-21T05:40:08.923 INFO:tasks.ceph.mon.b.smithi141.stderr: 7: (C_MonContext::finish(int)+0x39) [0x55bf477528a9]
2018-06-21T05:40:08.923 INFO:tasks.ceph.mon.b.smithi141.stderr: 8: (Context::complete(int)+0x9) [0x55bf4778a9a9]
2018-06-21T05:40:08.923 INFO:tasks.ceph.mon.b.smithi141.stderr: 9: (SafeTimer::timer_thread()+0x18b) [0x7f9882bc902b]
2018-06-21T05:40:08.924 INFO:tasks.ceph.mon.b.smithi141.stderr: 10: (SafeTimerThread::entry()+0xd) [0x7f9882bca5ed]
2018-06-21T05:40:08.924 INFO:tasks.ceph.mon.b.smithi141.stderr: 11: (()+0x76ba) [0x7f98824c86ba]
2018-06-21T05:40:08.925 INFO:tasks.ceph.mon.b.smithi141.stderr: 12: (clone()+0x6d) [0x7f98811ec41d]
2018-06-21T05:40:08.925 INFO:tasks.ceph.mon.b.smithi141.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/a/nojha-2018-06-21_00:18:52-rados-wip-24487-distro-basic-smithi/2686256


Related issues: 2 (0 open, 2 closed)

Has duplicate: RADOS - Bug #25181: /mon/OSDMonitor.cc: 1821: FAILED assert(osdmap_manifest.pinned.empty()) (Duplicate, 07/30/2018)

Copied to: RADOS - Backport #35071: mimic: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init() (Resolved, Nathan Cutler)
Actions #1

Updated by Josh Durgin almost 6 years ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by Sage Weil over 5 years ago

  • Status changed from New to 12

/a/sage-2018-07-31_21:57:28-rados-wip-sage-testing-2018-07-31-1436-distro-basic-smithi/2844443
/a/sage-2018-07-30_13:46:50-rados-wip-sage3-testing-2018-07-28-1512-distro-basic-smithi/2838737

Actions #3

Updated by Sage Weil over 5 years ago

  • Has duplicate Bug #25181: /mon/OSDMonitor.cc: 1821: FAILED assert(osdmap_manifest.pinned.empty()) added
Actions #4

Updated by Sage Weil over 5 years ago

/a/sage-2018-08-15_15:49:39-rados-wip-sage2-testing-2018-08-15-0731-distro-basic-smithi/2908178

Actions #5

Updated by Joao Eduardo Luis over 5 years ago

  • Assignee set to Joao Eduardo Luis
Actions #6

Updated by Joao Eduardo Luis over 5 years ago

  • Category set to Correctness/Safety
  • Status changed from 12 to In Progress
Actions #7

Updated by Joao Eduardo Luis over 5 years ago

https://github.com/ceph/ceph/pull/23742

Currently missing: a reproducer. Reproducing this may not be trivial, because several conditions must all hold to trigger the bug:

1. do_prune() must have been called, and the in-memory state must have been changed;
2. the transaction containing the update must be postponed because Paxos is busy;
3. an election must be called before the Paxos transaction is committed.

While triggering an election is trivial enough, ensuring the first two conditions is not, and would likely require instrumenting the code to force them. I think we're better off relying on the suites, which seem to reproduce this every now and then.
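
For illustration only, the three conditions above can be modelled with a toy state machine. This is a rough sketch, not Ceph code: `ManifestModel`, `OsdMonitorModel`, and every member below are hypothetical names, and the `reset_in_memory_state` flag is an assumed mitigation used only to contrast the two outcomes. It is not claimed to be what the linked pull request does.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-in for the monitor's in-memory prune state.
struct ManifestModel {
  std::set<std::uint64_t> pinned;  // epochs pinned by the last prune pass
};

// Toy model of the monitor service. None of these members are Ceph's API;
// they only mirror the sequence of events described in the comment.
struct OsdMonitorModel {
  ManifestModel manifest;
  bool txn_pending = false;  // a prune transaction is queued but uncommitted

  // Mirrors the failing check: a new prune pass expects no pinned epochs.
  bool prune_init_precondition_holds() const {
    return manifest.pinned.empty();
  }

  // Condition 1: a prune pass changes the in-memory state.
  // Condition 2: its transaction is only queued, because Paxos is busy.
  void do_prune_with_busy_paxos() {
    assert(prune_init_precondition_holds());
    manifest.pinned = {1, 50, 100};  // arbitrary example epochs
    txn_pending = true;
  }

  // Condition 3: an election happens before the transaction commits,
  // so the queued transaction is dropped.
  void on_election(bool reset_in_memory_state) {
    txn_pending = false;
    if (reset_in_memory_state) {
      manifest.pinned.clear();  // assumed mitigation, for contrast only
    }
  }
};

// Returns true when the next prune pass would trip the assertion.
bool next_prune_would_assert(bool reset_in_memory_state) {
  OsdMonitorModel mon;
  mon.do_prune_with_busy_paxos();
  mon.on_election(reset_in_memory_state);
  return !mon.prune_init_precondition_holds();
}
```

In this model, `next_prune_would_assert(false)` reports the stale state that the next prune pass would trip over, while `next_prune_would_assert(true)` does not. The real monitor has far more state than this, which is part of why forcing conditions 1 and 2 would need instrumentation.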

FWIW, the cause appears to be that we would not clear

Actions #8

Updated by Sage Weil over 5 years ago

  • Status changed from In Progress to Pending Backport
  • Backport set to mimic
Actions #9

Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #35071: mimic: FAILED assert(osdmap_manifest.pinned.empty()) in OSDMonitor::prune_init() added
Actions #10

Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved