Bug #17857

closed

osd/osd_types.h: 4287: FAILED assert(rwstate.empty())

Added by Kefu Chai over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):


Related issues 2 (0 open, 2 closed)

Related to RADOS - Bug #17830: osd-scrub-repair.sh is failing (intermittently?) on Jenkins (Can't reproduce, David Zafman, 11/08/2016)
Has duplicate Ceph - Bug #17860: osd/osd_types.h: 4287: FAILED assert(rwstate.empty()) (Duplicate, 11/10/2016)

Actions #1

Updated by Kefu Chai over 7 years ago

ctest -R test-erasure-eio.sh

failed in TEST_rados_get_bad_size_shard_0

2016-11-11 15:23:07.863392 7f8f4bdef700 -1 /var/ceph/ceph/src/osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f8f4bdef700 time 2016-11-11 15:23:07.833415
/var/ceph/ceph/src/osd/osd_types.h: 4287: FAILED assert(rwstate.empty())

 ceph version Development (no_version)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x562ee684c720]
 2: (ObjectContext::~ObjectContext()+0x41) [0x562ee609923b]
 3: (SharedLRU<hobject_t, ObjectContext, hobject_t::ComparatorWithDefault, std::hash<hobject_t> >::Cleanup::operator()(ObjectContext*)+0x40) [0x562ee611431a]
 4: (std::_Sp_counted_deleter<ObjectContext*, SharedLRU<hobject_t, ObjectContext, hobject_t::ComparatorWithDefault, std::hash<hobject_t> >::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x33) [0x562ee6120315]
 5: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x42) [0x562ee5d89930]
 6: (std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()+0x27) [0x562ee5d88e87]
 7: (std::__shared_ptr<ObjectContext, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()+0x1c) [0x562ee5eed062]
 8: (std::shared_ptr<ObjectContext>::~shared_ptr()+0x18) [0x562ee5eed0a4]
 9: (SharedLRU<hobject_t, ObjectContext, hobject_t::ComparatorWithDefault, std::hash<hobject_t> >::clear()+0xd3) [0x562ee60bbc19]
 10: (ReplicatedPG::on_change(ObjectStore::Transaction*)+0x96d) [0x562ee606c907]
 11: (PG::start_peering_interval(std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> > const&, int, std::vector<int, std::allocator<int> > const&, int, ObjectStore::Transaction*)+0xea4) [0x562ee5f2bf6e]
 12: (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x514) [0x562ee5f3288e]
Actions #2

Updated by Kefu Chai over 7 years ago

We failed to release the read lock on the object being recovered, so rwstate is still non-empty when the ObjectContext is destroyed and the assert fires.

The read lock is acquired by obc->get_recovery_read() in ReplicatedPG::prep_object_replica_pushes().
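
To illustrate the invariant behind this assert, here is a minimal, self-contained sketch. It is not Ceph code and not the fix that was merged for this bug: ObjectContextSketch, RWState, RecoveryReadGuard and prep_push_sketch are hypothetical, heavily simplified stand-ins, and the early-exit error path is an assumption made for the example. The sketch only shows why a recovery read lock that is taken but never dropped trips an assert(rwstate.empty()) style check in a destructor, and how tying the release to scope exit avoids that on every return path.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the per-object lock state: only counts readers.
struct RWState {
  int readers = 0;
  bool empty() const { return readers == 0; }
};

// Hypothetical stand-in for an object context. As in the report, the
// destructor asserts that no lock is still held.
struct ObjectContextSketch {
  RWState rwstate;

  bool get_recovery_read() {
    ++rwstate.readers;
    return true;
  }

  void drop_recovery_read() {
    assert(rwstate.readers > 0);
    --rwstate.readers;
  }

  ~ObjectContextSketch() { assert(rwstate.empty()); }
};

// Scope guard (an assumption of this sketch): takes the recovery read lock
// on construction and drops it on destruction, including on early returns.
class RecoveryReadGuard {
 public:
  explicit RecoveryReadGuard(std::shared_ptr<ObjectContextSketch> obc)
      : obc_(std::move(obc)), held_(obc_->get_recovery_read()) {}

  ~RecoveryReadGuard() {
    if (held_) obc_->drop_recovery_read();
  }

  RecoveryReadGuard(const RecoveryReadGuard&) = delete;
  RecoveryReadGuard& operator=(const RecoveryReadGuard&) = delete;

 private:
  std::shared_ptr<ObjectContextSketch> obc_;  // initialized before held_
  bool held_;
};

// Simulated push preparation with an error path. Returns false when the
// simulated read fails; the guard releases the lock either way.
bool prep_push_sketch(const std::shared_ptr<ObjectContextSketch>& obc,
                      bool read_fails) {
  RecoveryReadGuard guard(obc);
  if (read_fails) {
    return false;  // early exit: without the guard the lock would leak here
  }
  return true;
}
```

With this shape, calling prep_push_sketch(obc, true) leaves obc->rwstate empty, so destroying the last reference to obc does not assert. Calling get_recovery_read() directly and returning early without drop_recovery_read() would leave readers at 1 and hit the destructor assert, which is the pattern described in the comment above.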

Actions #3

Updated by Kefu Chai over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Kefu Chai
Actions #4

Updated by Kefu Chai over 7 years ago

  • Status changed from In Progress to Fix Under Review
Actions #5

Updated by Loïc Dachary over 7 years ago

https://github.com/ceph/ceph/pull/11979/commits/8854cca4164f9184cc549ba0b90b44515933de8c disables test-erasure-eio.sh until this is fixed, because it fails most of the time. To verify that the fix is good, it is recommended to run the test in a loop on a heavily loaded machine. For instance, running ./run-make-check.sh at the same time is enough to recreate conditions similar to Jenkins on a machine such as rex001.

Actions #6

Updated by Kefu Chai over 7 years ago

  • Subject changed from ceph-helpers.sh:1103: wait_for_clean: returned 1 to osd/osd_types.h: 4287: FAILED assert(rwstate.empty())
  • Description updated (diff)
Actions #7

Updated by Kefu Chai over 7 years ago

  • Has duplicate Bug #17860: osd/osd_types.h: 4287: FAILED assert(rwstate.empty()) added
Actions #8

Updated by Kefu Chai over 7 years ago

  • Related to Bug #17830: osd-scrub-repair.sh is failing (intermittently?) on Jenkins added
Actions #9

Updated by Kefu Chai over 7 years ago

  • Status changed from Fix Under Review to Resolved