Bug #10422

closed

osdmap full crc mismatch due to divergent cache_target_{dirty,full}_ratio_micro

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

All tests failed in run http://pulpito.front.sepia.ceph.com/teuthology-2014-12-22_09:28:15-upgrade:dumpling-firefly-x:stress-split-next-distro-basic-multi/

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-22_09:28:15-upgrade:dumpling-firefly-x:stress-split-next-distro-basic-multi/672412/

2014-12-22T11:28:03.278 INFO:tasks.ceph.mon.c.plana58.stderr:    -4> 2014-12-22 11:28:03.139305 7f4b30faa700 -1 mon.c@2(peon).osd e1679 previous osdmap reload, raw crc 3487354266
2014-12-22T11:28:03.278 INFO:tasks.ceph.mon.c.plana58.stderr:    -3> 2014-12-22 11:28:03.139316 7f4b30faa700 -1 mon.c@2(peon).osd e1679 reencode of that is 0
2014-12-22T11:28:03.278 INFO:tasks.ceph.mon.c.plana58.stderr:    -2> 2014-12-22 11:28:03.139372 7f4b30faa700 -1 mon.c@2(peon).osd e1679 again raw crc is 0
2014-12-22T11:28:03.278 INFO:tasks.ceph.mon.c.plana58.stderr:    -1> 2014-12-22 11:28:03.139376 7f4b30faa700 -1 mon.c@2(peon).osd e1679  full_crc 818170385
2014-12-22T11:28:03.278 INFO:tasks.ceph.mon.c.plana58.stderr:     0> 2014-12-22 11:28:03.175230 7f4b30faa700 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7f4b30faa700 time 2014-12-22 11:28:03.139384
2014-12-22T11:28:03.279 INFO:tasks.ceph.mon.c.plana58.stderr:mon/OSDMonitor.cc: 251: FAILED assert(0 == "got mismatched crc encoding full map")
2014-12-22T11:28:03.279 INFO:tasks.ceph.mon.c.plana58.stderr:
2014-12-22T11:28:03.279 INFO:tasks.ceph.mon.c.plana58.stderr: ceph version 0.90-586-g4047f2d (4047f2ddef22794fad1e97bb852151e4c6b3cf82)
2014-12-22T11:28:03.279 INFO:tasks.ceph.mon.c.plana58.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x79831f]
2014-12-22T11:28:03.279 INFO:tasks.ceph.mon.c.plana58.stderr: 2: (OSDMonitor::update_from_paxos(bool*)+0x2dcb) [0x5eab0b]
2014-12-22T11:28:03.279 INFO:tasks.ceph.mon.c.plana58.stderr: 3: (PaxosService::refresh(bool*)+0x3c6) [0x5cba06]
2014-12-22T11:28:03.279 INFO:tasks.ceph.mon.c.plana58.stderr: 4: (Monitor::refresh_from_paxos(bool*)+0x34b) [0x576fab]
2014-12-22T11:28:03.279 INFO:tasks.ceph.mon.c.plana58.stderr: 5: (Paxos::do_refresh()+0x4c) [0x5b386c]
2014-12-22T11:28:03.280 INFO:tasks.ceph.mon.c.plana58.stderr: 6: (Paxos::handle_commit(MMonPaxos*)+0x243) [0x5ba7d3]
2014-12-22T11:28:03.280 INFO:tasks.ceph.mon.c.plana58.stderr: 7: (Paxos::dispatch(PaxosServiceMessage*)+0x22b) [0x5c30ab]
2014-12-22T11:28:03.280 INFO:tasks.ceph.mon.c.plana58.stderr: 8: (Monitor::dispatch(MonSession*, Message*, bool)+0x8c3) [0x598443]
2014-12-22T11:28:03.280 INFO:tasks.ceph.mon.c.plana58.stderr: 9: (Monitor::_ms_dispatch(Message*)+0x2a4) [0x598884]
2014-12-22T11:28:03.280 INFO:tasks.ceph.mon.c.plana58.stderr: 10: (Monitor::ms_dispatch(Message*)+0x32) [0x5b2b62]
2014-12-22T11:28:03.280 INFO:tasks.ceph.mon.c.plana58.stderr: 11: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x88a087]
2014-12-22T11:28:03.280 INFO:tasks.ceph.mon.c.plana58.stderr: 12: (DispatchQueue::entry()+0x44a) [0x88722a]
2014-12-22T11:28:03.280 INFO:tasks.ceph.mon.c.plana58.stderr: 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x780dad]
2014-12-22T11:28:03.280 INFO:tasks.ceph.mon.c.plana58.stderr: 14: (()+0x7e9a) [0x7f4b375a2e9a]
2014-12-22T11:28:03.281 INFO:tasks.ceph.mon.c.plana58.stderr: 15: (clone()+0x6d) [0x7f4b35d4c31d]
2014-12-22T11:28:03.281 INFO:tasks.ceph.mon.c.plana58.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2014-12-22T11:28:03.281 INFO:tasks.ceph.mon.c.plana58.stderr:
2014-12-22T11:28:03.281 INFO:tasks.ceph.mon.c.plana58.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
2014-12-22T11:28:03.281 INFO:tasks.ceph.mon.c.plana58.stderr:*** Caught signal (Aborted) **
2014-12-22T11:28:03.281 INFO:tasks.ceph.mon.c.plana58.stderr: in thread 7f4b30faa700
2014-12-22T11:28:03.281 INFO:tasks.ceph.mon.c.plana58.stderr: ceph version 0.90-586-g4047f2d (4047f2ddef22794fad1e97bb852151e4c6b3cf82)
2014-12-22T11:28:03.282 INFO:tasks.ceph.mon.c.plana58.stderr: 1: ceph-mon() [0x8df645]
2014-12-22T11:28:03.282 INFO:tasks.ceph.mon.c.plana58.stderr: 2: (()+0xfcb0) [0x7f4b375aacb0]
2014-12-22T11:28:03.282 INFO:tasks.ceph.mon.c.plana58.stderr: 3: (gsignal()+0x35) [0x7f4b35c8e0d5]
2014-12-22T11:28:03.282 INFO:tasks.ceph.mon.c.plana58.stderr: 4: (abort()+0x17b) [0x7f4b35c9183b]
2014-12-22T11:28:03.282 INFO:tasks.ceph.mon.c.plana58.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f4b365e069d]
2014-12-22T11:28:03.282 INFO:tasks.ceph.mon.c.plana58.stderr: 6: (()+0xb5846) [0x7f4b365de846]
2014-12-22T11:28:03.282 INFO:tasks.ceph.mon.c.plana58.stderr: 7: (()+0xb5873) [0x7f4b365de873]
2014-12-22T11:28:03.282 INFO:tasks.ceph.mon.c.plana58.stderr: 8: (()+0xb596e) [0x7f4b365de96e]
2014-12-22T11:28:03.283 INFO:tasks.ceph.mon.c.plana58.stderr: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x259) [0x7984f9]
2014-12-22T11:28:03.283 INFO:tasks.ceph.mon.c.plana58.stderr: 10: (OSDMonitor::update_from_paxos(bool*)+0x2dcb) [0x5eab0b]
2014-12-22T11:28:03.283 INFO:tasks.ceph.mon.c.plana58.stderr: 11: (PaxosService::refresh(bool*)+0x3c6) [0x5cba06]
2014-12-22T11:28:03.283 INFO:tasks.ceph.mon.c.plana58.stderr: 12: (Monitor::refresh_from_paxos(bool*)+0x34b) [0x576fab]
2014-12-22T11:28:03.283 INFO:tasks.ceph.mon.c.plana58.stderr: 13: (Paxos::do_refresh()+0x4c) [0x5b386c]
2014-12-22T11:28:03.283 INFO:tasks.ceph.mon.c.plana58.stderr: 14: (Paxos::handle_commit(MMonPaxos*)+0x243) [0x5ba7d3]
2014-12-22T11:28:03.283 INFO:tasks.ceph.mon.c.plana58.stderr: 15: (Paxos::dispatch(PaxosServiceMessage*)+0x22b) [0x5c30ab]
2014-12-22T11:28:03.283 INFO:tasks.ceph.mon.c.plana58.stderr: 16: (Monitor::dispatch(MonSession*, Message*, bool)+0x8c3) [0x598443]
2014-12-22T11:28:03.283 INFO:tasks.ceph.mon.c.plana58.stderr: 17: (Monitor::_ms_dispatch(Message*)+0x2a4) [0x598884]
2014-12-22T11:28:03.284 INFO:tasks.ceph.mon.c.plana58.stderr: 18: (Monitor::ms_dispatch(Message*)+0x32) [0x5b2b62]
2014-12-22T11:28:03.284 INFO:tasks.ceph.mon.c.plana58.stderr: 19: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x88a087]
2014-12-22T11:28:03.284 INFO:tasks.ceph.mon.c.plana58.stderr: 20: (DispatchQueue::entry()+0x44a) [0x88722a]
2014-12-22T11:28:03.284 INFO:tasks.ceph.mon.c.plana58.stderr: 21: (DispatchQueue::DispatchThread::entry()+0xd) [0x780dad]
2014-12-22T11:28:03.284 INFO:tasks.ceph.mon.c.plana58.stderr: 22: (()+0x7e9a) [0x7f4b375a2e9a]
2014-12-22T11:28:03.284 INFO:tasks.ceph.mon.c.plana58.stderr: 23: (clone()+0x6d) [0x7f4b35d4c31d]
Actions #1

Updated by Sage Weil over 9 years ago

  • Subject changed from "ceph::__ceph_assert_fail(char const*, char const*, int, char const*)" in upgrade:dumpling-firefly-x:stress-split-next-distro-basic-multi run to "osdmap full crc mismatch due to divergent cache_target_{dirty,full}_ratio_micro"
  • Assignee set to Sage Weil

In a long-past epoch these values diverge. Strangely, the incremental for that epoch changes the pool info for a different pool. The firefly mon is the outlier. Later, when it is upgraded to master, it hits the crc mismatch because of the old full map difference.
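To make the failure mode concrete, here is a toy model of the kind of check behind the "got mismatched crc encoding full map" assertion. It is a sketch under loudly stated assumptions: `FullMap`, `Incremental`, `make_incremental` and `apply_and_verify` are hypothetical names, not Ceph's `OSDMap` API, and the bitwise CRC-32C is only a stand-in for the monitor's checksum. The leader applies an incremental, encodes the result and ships its crc; each mon applies the same incremental to its own previous full map and compares. A field that diverged in an old epoch is carried along silently, and the mismatch only surfaces when a mon that performs the check re-encodes its copy, even if the current incremental touches a different pool.

```cpp
// Toy model (hypothetical types, not Ceph's real OSDMap/Incremental classes).
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Bitwise CRC-32C (Castagnoli), used here only as a stand-in checksum.
uint32_t crc32c(uint32_t crc, const std::vector<uint8_t>& data) {
  for (uint8_t byte : data) {
    crc ^= byte;
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

// Toy "full map": named 64-bit fields, encoded in key order.
using FullMap = std::map<std::string, uint64_t>;

std::vector<uint8_t> encode(const FullMap& m) {
  std::vector<uint8_t> out;
  for (const auto& [key, value] : m) {
    out.insert(out.end(), key.begin(), key.end());
    out.push_back(0);  // key terminator
    for (int i = 0; i < 8; ++i)
      out.push_back(static_cast<uint8_t>(value >> (8 * i)));  // little-endian
  }
  return out;
}

// Toy "incremental": fields to set, plus the leader's crc of the resulting map.
struct Incremental {
  FullMap changes;
  uint32_t full_crc;
};

// Leader side: apply the changes, encode, and record the crc of the new full map.
Incremental make_incremental(FullMap& leader_map, const FullMap& changes) {
  for (const auto& [k, v] : changes) leader_map[k] = v;
  return {changes, crc32c(0xFFFFFFFFu, encode(leader_map))};
}

// Peon side: apply the same changes to the LOCAL copy and compare crcs.
// Returning false corresponds to the point where the real monitor asserts.
bool apply_and_verify(FullMap& local_map, const Incremental& inc) {
  for (const auto& [k, v] : inc.changes) local_map[k] = v;
  return crc32c(0xFFFFFFFFu, encode(local_map)) == inc.full_crc;
}
```

Because the crc covers the whole re-encoded map, any stale field from an earlier epoch breaks verification, regardless of which keys the current incremental touches.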

Actions #2

Updated by Sage Weil over 9 years ago

  • Status changed from New to 12

This was super creepy before, but now each time I reproduce it, the maps diverge very early on simply because dumpling != firefly. In particular, the problem is this change in epoch 57:

root@plana59:~# ceph-dencoder type OSDMap::Incremental import /tmp/inc decode dump_json
{ "epoch": 57,
  "fsid": "2e6d593b-0c42-404a-81d0-4e5141f5fc37",
  "modified": "2014-12-23 09:27:05.911777",
  "new_pool_max": 17,
  "new_flags": -1,
  "new_max_osd": -1,
  "new_pools": [
        { "pool": 17,
          "flags": 1,
          "flags_names": "hashpspool",
          "type": 1,
          "size": 2,
          "min_size": 1,
          "crush_ruleset": 0,
          "object_hash": 2,
          "pg_num": 8,
          "pg_placement_num": 8,
          "crash_replay_interval": 0,
          "last_change": "57",
          "last_force_op_resend": "0",
          "auid": 0,
          "snap_mode": "selfmanaged",
          "snap_seq": 0,
          "snap_epoch": 0,
          "pool_snaps": [],
          "removed_snaps": "[]",
          "quota_max_bytes": 0,
          "quota_max_objects": 0,
          "tiers": [],
          "tier_of": -1,
          "read_tier": -1,
          "write_tier": -1,
          "cache_mode": "none",
          "target_max_bytes": 0,
          "target_max_objects": 0,
          "cache_target_dirty_ratio_micro": 400000,
          "cache_target_full_ratio_micro": 800000,
          "cache_min_flush_age": 0,
          "cache_min_evict_age": 0,
          "erasure_code_profile": "",
          "hit_set_params": { "type": "none"},
          "hit_set_period": 0,
          "hit_set_count": 0,
          "min_read_recency_for_promote": 1,
          "stripe_width": 0,
          "expected_num_objects": 0}],
  "new_pool_names": [
        { "pool": 17,
          "name": "test-rados-api-plana20-16513-1"}],
  "old_pools": [],
  "new_up_osds": [],
  "new_weight": [],
  "osd_state_xor": [],
  "new_pg_temp": [],
  "primary_temp": [],
  "new_up_thru": [],
  "new_lost": [],
  "new_last_clean_interval": [],
  "new_blacklist": [],
  "old_blacklist": [],
  "new_xinfo": [],
  "new_uuid": [],
  "erasure_code_profiles": {},
  "old_erasure_code_profiles": []}

Note the ratio values, which are populated on the primary from config options. Dumpling doesn't understand those fields, so the dumpling mons don't encode them the same way.
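A minimal sketch of why the mons end up disagreeing, assuming a simplified feature-gated encoding. The `kFeatureTiering` bit, this `PoolInfo` struct and its field layout are hypothetical; Ceph's real `pg_pool_t` uses versioned encoding rather than this exact scheme. A mon that only ever sees the pre-tiering encoding keeps the ratio fields at their zero defaults, so once both copies are re-encoded in the newer format, the bytes (and therefore the crcs) differ.

```cpp
// Toy feature-gated encoding (hypothetical names and layout).
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical feature bit standing in for "this mon understands tiering".
constexpr uint64_t kFeatureTiering = 1ull << 0;

struct PoolInfo {
  uint32_t pg_num = 0;
  uint32_t cache_target_dirty_ratio_micro = 0;  // stays 0 if never decoded
  uint32_t cache_target_full_ratio_micro = 0;
};

void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

uint32_t get_u32(const std::vector<uint8_t>& in, std::size_t& pos) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[pos++]) << (8 * i);
  return v;
}

// The tiering fields are written only when the encoding targets a peer
// that is assumed to understand them.
std::vector<uint8_t> encode(const PoolInfo& p, uint64_t features) {
  std::vector<uint8_t> out;
  put_u32(out, p.pg_num);
  if (features & kFeatureTiering) {
    put_u32(out, p.cache_target_dirty_ratio_micro);
    put_u32(out, p.cache_target_full_ratio_micro);
  }
  return out;
}

// Decode whatever was written; absent trailing fields keep their defaults.
PoolInfo decode(const std::vector<uint8_t>& in) {
  PoolInfo p;
  std::size_t pos = 0;
  p.pg_num = get_u32(in, pos);
  if (in.size() >= pos + 8) {
    p.cache_target_dirty_ratio_micro = get_u32(in, pos);
    p.cache_target_full_ratio_micro = get_u32(in, pos);
  }
  return p;
}
```

In this model the leader's copy holds 400000/800000 while the old mon's copy holds 0/0; nothing complains until both are encoded with the tiering fields present.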

One workaround here is to zero those values unless every mon in the quorum has the tiering feature bit. That avoids this particular instantiation of the problem.
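The workaround could look roughly like the following sketch. The feature bit, `quorum_features` and `sanitize_new_pool` are hypothetical names, not the actual fix. The quorum's common features are the bitwise AND of each member's bits, and the config-derived ratios are cleared whenever that intersection lacks the tiering bit, so every mon ends up holding the same (zero) values.

```cpp
// Sketch of the "zero unless the whole quorum supports it" workaround.
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kFeatureTiering = 1ull << 0;  // hypothetical feature bit

struct PoolInfo {
  uint32_t cache_target_dirty_ratio_micro = 0;
  uint32_t cache_target_full_ratio_micro = 0;
};

// Features shared by every mon in the quorum (bitwise AND of each mon's bits).
// An empty quorum yields all bits set; a real monitor would never get here.
uint64_t quorum_features(const std::vector<uint64_t>& mon_features) {
  uint64_t common = ~0ull;
  for (uint64_t f : mon_features) common &= f;
  return common;
}

// Before proposing a new pool, drop config-derived tiering values if any
// quorum member would not encode them, keeping all full maps identical.
void sanitize_new_pool(PoolInfo& p, uint64_t quorum_feats) {
  if (!(quorum_feats & kFeatureTiering)) {
    p.cache_target_dirty_ratio_micro = 0;
    p.cache_target_full_ratio_micro = 0;
  }
}
```

This only removes one known source of divergence; it does not protect against other fields that a mixed-version quorum might encode differently.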

A long-term solution is to stop encoding the full map separately on each mon and instead put the leader's encoded full map in the paxos transaction. That would eliminate this entire category of bugs and simplify things considerably.
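A sketch of the long-term idea, assuming a simple key/value store and transaction type; `Transaction`, `MonStore`, `propose` and the `osdmap_full_<epoch>` key names are hypothetical. The leader encodes the full map once and commits the bytes alongside the incremental, so every mon stores identical bytes no matter which version of the encoder it runs.

```cpp
// Sketch: ship the leader's encoded full map inside the committed transaction.
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical stand-in for a paxos transaction: a set of key -> value puts.
struct Transaction {
  std::map<std::string, Bytes> puts;
};

// Hypothetical per-mon store. Applying a commit copies bytes verbatim;
// nothing is decoded or re-encoded locally, so there is nothing to diverge.
struct MonStore {
  std::map<std::string, Bytes> kv;
  void apply(const Transaction& t) {
    for (const auto& [k, v] : t.puts) kv[k] = v;
  }
};

// Leader: encode the new full map ONCE and commit it with the incremental.
Transaction propose(uint32_t epoch, const Bytes& inc_bytes, const Bytes& full_bytes) {
  Transaction t;
  t.puts["osdmap_inc_" + std::to_string(epoch)] = inc_bytes;
  t.puts["osdmap_full_" + std::to_string(epoch)] = full_bytes;
  return t;
}
```

The trade-off is bandwidth: each commit now carries a full map in addition to the incremental.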

Actions #3

Updated by Sage Weil over 9 years ago

FWIW, I'm concerned that now that the osdmap crc checks will notice (by killing a mon!) when the encoding doesn't match the primary's, we will uncover sources of full map inconsistency across mons that we haven't reproduced in QA.

One option would be to ask users to do a mon scrub before upgrading to ensure it is safe.

Another option is a one-time broadcast of the full osdmap across mons to ensure they are consistent.

The simplest option is to do that on every osdmap change.
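The scrub and broadcast options both come down to comparing each mon's stored full map for an epoch against the leader's and, for the broadcast variant, overwriting the copies that differ. The sketch below assumes each mon's copy is available as raw bytes in one place; `find_divergent` and `repair_from_leader` are hypothetical helpers, not Ceph's mon scrub implementation, and CRC-32C again stands in for whatever checksum is actually used.

```cpp
// Sketch: detect and repair divergent full-map copies for one epoch.
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Bitwise CRC-32C (Castagnoli), a stand-in checksum.
uint32_t crc32c(uint32_t crc, const Bytes& data) {
  for (uint8_t byte : data) {
    crc ^= byte;
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

// copies: mon name -> that mon's stored full-map bytes for the same epoch.
// Returns the mons whose crc differs from the leader's copy.
std::vector<std::string> find_divergent(const std::map<std::string, Bytes>& copies,
                                        const std::string& leader) {
  const uint32_t want = crc32c(0xFFFFFFFFu, copies.at(leader));
  std::vector<std::string> bad;
  for (const auto& [name, bytes] : copies)
    if (crc32c(0xFFFFFFFFu, bytes) != want) bad.push_back(name);
  return bad;
}

// One-time "broadcast": overwrite every divergent copy with the leader's bytes.
void repair_from_leader(std::map<std::string, Bytes>& copies, const std::string& leader) {
  for (const std::string& name : find_divergent(copies, leader))
    copies[name] = copies.at(leader);
}
```

Running the check per epoch before an upgrade corresponds to the scrub option; running the repair on every commit is effectively the "simplest option" of always shipping the full map.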

Let's assume an osdmap is 10 MB (relatively huge). If paxos commits every 2 s (relatively quickly), that's 5 MB/s * 5 mons -> 25 MB/s of total 'wasted' bandwidth. Not a big price to pay, I think, to eliminate this complexity...
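The bandwidth estimate above generalizes as follows, where $S$ is the full osdmap size, $T$ the interval between paxos commits and $N$ the number of mons (symbols introduced here for illustration; the counting of $N$ follows the estimate as written):

```latex
B_{\text{total}} = \frac{S}{T} \times N
  = \frac{10\ \text{MB}}{2\ \text{s}} \times 5
  = 5\ \text{MB/s} \times 5
  = 25\ \text{MB/s}
```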

Actions #4

Updated by David Zafman over 9 years ago

Yuri probably meant to add this here:

Looks like this issue was caught in http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-21_17:13:02-upgrade:firefly-x-next-distro-basic-multi/671671/ job:

Assertion: mon/OSDMonitor.cc: 251: FAILED assert(0 == "got mismatched crc encoding full map")
ceph version 0.90-584-g243b9e4 (243b9e4350555a323ccc9f8bb5fb364cce98dff5)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7ccefb]
 2: (OSDMonitor::update_from_paxos(bool*)+0x22fc) [0x61e9bc]
 3: (PaxosService::refresh(bool*)+0x19a) [0x5fedca]
 4: (Monitor::refresh_from_paxos(bool*)+0x183) [0x5a71f3]
 5: (Paxos::do_refresh()+0x2e) [0x5e8cbe]
 6: (Paxos::handle_commit(MMonPaxos*)+0x283) [0x5f05e3]
 7: (Paxos::dispatch(PaxosServiceMessage*)+0x1bb) [0x5f7abb]
 8: (Monitor::dispatch(MonSession*, Message*, bool)+0x8b0) [0x5c8e40]
 9: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x5c9246]
 10: (Monitor::ms_dispatch(Message*)+0x23) [0x5e7d53]
 11: (DispatchQueue::entry()+0x649) [0x910889]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7b5c5d]
 13: (()+0x8182) [0x7faba1b8a182]
 14: (clone()+0x6d) [0x7faba00f4efd]
Actions #5

Updated by David Zafman over 9 years ago

Yuri probably meant to add this here:

And one more in run http://pulpito.ceph.com/teuthology-2014-12-21_17:15:02-upgrade:dumpling-firefly-x:parallel-next-distro-basic-vps/

Jobs ['671677', '671681']

Assertion: mon/OSDMonitor.cc: 251: FAILED assert(0 == "got mismatched crc encoding full map")
ceph version 0.90-584-g243b9e4 (243b9e4350555a323ccc9f8bb5fb364cce98dff5)
 1: (OSDMonitor::update_from_paxos(bool*)+0x2818) [0x5fdf58]
 2: (PaxosService::refresh(bool*)+0x193) [0x5d0733]
 3: (Monitor::refresh_from_paxos(bool*)+0x153) [0x58f703]
 4: (Paxos::do_refresh()+0x4c) [0x5bb17c]
 5: (Paxos::handle_commit(MMonPaxos*)+0x243) [0x5c7793]
 6: (Paxos::dispatch(PaxosServiceMessage*)+0x23b) [0x5c8d0b]
 7: (Monitor::dispatch(MonSession*, Message*, bool)+0x7fb) [0x59a62b]
 8: (Monitor::_ms_dispatch(Message*)+0x1f6) [0x59a966]
 9: (Monitor::ms_dispatch(Message*)+0x32) [0x5b9412]
 10: (DispatchQueue::entry()+0x4fa) [0x8b4ffa]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x87b40d]
 12: (()+0x79d1) [0x7ffa0624d9d1]
 13: (clone()+0x6d) [0x7ffa04fd586d]
Actions #6

Updated by Sage Weil over 9 years ago

  • Status changed from 12 to Fix Under Review
Actions #7

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved