Bug #8646
OSD: assert in share_map() when marked down by an OSDMap (closed)
Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
    0> 2014-06-09 18:06:22.922629 7fcfab369700 -1 osd/OSD.cc: In function 'void OSDService::share_map(entity_name_t, Connection*, epoch_t, OSDMapRef&, epoch_t*)' thread 7fcfab369700 time 2014-06-09 18:06:22.921311
osd/OSD.cc: 4781: FAILED assert(osd->is_active() || osd->is_stopping())

 ceph version andisk-sprint-2-drop-3-390-g2dbd85c (2dbd85c94cf27a1ff0419c5ea9359af7fe30e9b6)
 1: (OSDService::share_map(entity_name_t, Connection*, unsigned int, std::tr1::shared_ptr<OSDMap const>&, unsigned int*)+0x58f) [0x6351df]
 2: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x182) [0x635442]
 3: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x346) [0x635ce6]
 4: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ce) [0xa4a1ce]
 5: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa4c420]
 6: (()+0x8182) [0x7fcfc4a7d182]
 7: (clone()+0x6d) [0x7fcfc2e1e30d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
This is from a custom build, but the same issue exists in master. OSD::dequeue_op() calls share_map(), but an op can be dequeued after the OSD has already changed its state to STATE_WAITING_FOR_HEALTHY (for example, after a new OSDMap marks it down). share_map() then fails its assert(osd->is_active() || osd->is_stopping()) check. I think the fix is simply to call share_map() only when the OSD is actually in STATE_ACTIVE.