Bug #12908

closed

osd/osd_types.cc: 459: FAILED assert(m_seed < old_pg_num) (asyncmsgr encoding problem?)

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

dzafman-2015-08-28_21:56:45-rados-wip-zafman-testing---basic-multi/1036649/remote/burnupi14/log/ceph-osd.2.log.gz
/a/sage-2015-08-30_05:52:47-rados-master---basic-multi/1039310

2015-08-31 00:40:05.229587 7f435c791700 10 osd.3 1164 do_waiters -- finish
2015-08-31 00:40:05.229590 7f435c791700 20 osd.3 1164 _dispatch 0x7f4368453c00 pg_query(0.277,0.284 epoch 1164) v3
2015-08-31 00:40:05.229631 7f435c791700  7 osd.3 1164 handle_pg_query from osd.5 epoch 1164
2015-08-31 00:40:05.229635 7f435c791700 15 osd.3 1164 require_same_or_newer_map 1164 (i am 1164) 0x7f4368453c00
2015-08-31 00:40:05.229652 7f435c791700 10 osd.3 pg_epoch: 1164 pg[0.277( empty local-les=0 n=0 ec=1 les/c 1118/1119 1164/1164/1156) [3,5]/[5,0] r=-1 lpr=1164 pi=1116-1163/7 crt=0'0 remapped NOTIFY] handle_query query(info 0'0) from replica 5
2015-08-31 00:40:05.229680 7f435c791700 10 osd.3 pg_epoch: 1164 pg[0.277( empty local-les=0 n=0 ec=1 les/c 1118/1119 1164/1164/1156) [3,5]/[5,0] r=-1 lpr=1164 pi=1116-1163/7 crt=0'0 remapped NOTIFY] old_peering_msg reply_epoch 4 query_epoch 4 last_peering_reset 1164
2015-08-31 00:40:05.229709 7f435c791700 15 osd.3 1164 project_pg_history 0.284 from 3 to 1164, start ec=1 les/c 1114/1115 1164/1164/988
2015-08-31 00:40:05.282767 7f435c791700 -1 osd/osd_types.cc: In function 'bool pg_t::is_split(unsigned int, unsigned int, std::set<pg_t>*) const' thread 7f435c791700 time 2015-08-31 00:40:05.230201
osd/osd_types.cc: 459: FAILED assert(m_seed < old_pg_num)

 ceph version 9.0.3-1035-g5462635 (54626351679fe312d5b96cc0304755ae5f1ece40)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f4363c110a5]
 2: (()+0x40523c) [0x7f43637ae23c]
 3: (spg_t::is_split(unsigned int, unsigned int, std::set<spg_t, std::less<spg_t>, std::allocator<spg_t> >*) const+0x6e) [0x7f4363703aee]
 4: (OSD::project_pg_history(spg_t, pg_history_t&, unsigned int, std::vector<int, std::allocator<int> > const&, int, std::vector<int, std::allocator<int> > const&, int)+0x389) [0x7f43636c3e79]
 5: (OSD::handle_pg_query(std::shared_ptr<OpRequest>)+0xb01) [0x7f43636c9f71]
 6: (OSD::dispatch_op(std::shared_ptr<OpRequest>)+0x108) [0x7f43636de0a8]
 7: (OSD::_dispatch(Message*)+0x225) [0x7f43636df165]
 8: (OSD::ms_dispatch(Message*)+0x21f) [0x7f43636df81f]
 9: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x7f4363d18407]
 10: (EventCenter::process_events(int)+0x6fc) [0x7f4363cd257c]
 11: (Worker::entry()+0xf0) [0x7f4363cb0da0]
 12: (()+0x7df5) [0x7f4361d20df5]
 13: (clone()+0x6d) [0x7f43605c91ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #1

Updated by Sage Weil over 8 years ago

  • Subject changed from osd/osd_types.cc: 459: FAILED assert(m_seed < old_pg_num) to osd/osd_types.cc: 459: FAILED assert(m_seed < old_pg_num) (asyncmsgr encoding problem?)

The problem is that the "from" epoch is 3 here, but it is filled in from

    bool valid_history = project_pg_history(
      pgid, history, it->second.epoch_sent,
      up, up_primary, acting, acting_primary);

and the epoch was 1164 on the sender.
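One way to read the crash, as a rough sketch rather than the actual Ceph code: project_pg_history walks the maps from the "from" epoch forward, and is_split asserts that the PG seed is below the pg_num in effect at each older epoch. PG 0.284 has seed 0x284 (644), so a bogus "from" of 3 can reach a map where pg_num is still smaller than the seed. The pg_num history below, and the helper names pg_num_at and seed_valid_over_range, are hypothetical and only illustrate the precondition.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical helper: pg_num in effect at `epoch`, taken from the most
// recent history entry at or before that epoch (0 if there is none).
uint32_t pg_num_at(const std::map<uint32_t, uint32_t>& history, uint32_t epoch) {
  auto it = history.upper_bound(epoch);
  if (it == history.begin()) return 0;
  return std::prev(it)->second;
}

// Hypothetical check mirroring the asserted precondition
// (m_seed < old_pg_num) for every epoch in [from, to].
bool seed_valid_over_range(uint32_t seed,
                           const std::map<uint32_t, uint32_t>& history,
                           uint32_t from, uint32_t to) {
  for (uint32_t e = from; e <= to; ++e) {
    if (seed >= pg_num_at(history, e)) return false;  // the assert would fire here
  }
  return true;
}
```

With an assumed history of pg_num 8 from epoch 1 and 1024 from epoch 900, seed 0x284 passes for the range 1164..1164 but fails for 3..1164, which matches the shape of the failure in the log.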

AsyncMessenger: maybe the encoding is getting goofed? In MOSDPGQuery we pass feature bits in to the _pg_list encoding; if the features were zeroed we wouldn't encode the sent_epoch...
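To illustrate the suspicion, here is a minimal sketch of feature-gated encoding. It is not the MOSDPGQuery code: the feature bit FEATURE_QUERY_EPOCH, the Query struct, and the encode/decode helpers are all hypothetical. It only shows how a zeroed feature mask on the sender can silently drop a field, so the receiver ends up with some other epoch than the one the sender intended.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical feature bit; not a real Ceph constant.
constexpr uint64_t FEATURE_QUERY_EPOCH = 1ull << 0;

// Hypothetical, simplified query payload.
struct Query {
  uint32_t type = 0;
  uint32_t epoch_sent = 0;
};

static void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

static uint32_t get_u32(const std::vector<uint8_t>& in, std::size_t& off) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[off + i]) << (8 * i);
  off += 4;
  return v;
}

// Encoder: the newer field is written only if the feature mask allows it.
std::vector<uint8_t> encode_query(const Query& q, uint64_t features) {
  std::vector<uint8_t> out;
  put_u32(out, q.type);
  if (features & FEATURE_QUERY_EPOCH) put_u32(out, q.epoch_sent);
  return out;
}

// Decoder: if the field is absent, fall back to a caller-supplied epoch.
Query decode_query(const std::vector<uint8_t>& in, uint32_t fallback_epoch) {
  Query q;
  std::size_t off = 0;
  q.type = get_u32(in, off);
  q.epoch_sent = (in.size() - off >= 4) ? get_u32(in, off) : fallback_epoch;
  return q;
}
```

In this sketch, encoding a query with epoch_sent = 1164 and a features value of 0 produces a 4-byte payload, and the decoder reports the fallback epoch instead of 1164.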

Actions #2

Updated by Sage Weil over 8 years ago

  • Status changed from New to 7

Actions #3

Updated by Sage Weil over 8 years ago

  • Status changed from 7 to Resolved

Merged probable fix.
