Bug #22752
snapmapper inconsistency, crash on luminous
Status: closed
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
from Stefan Priebe on ceph-devel ML:
Date: Tue, 16 Jan 2018 02:23:17 +0100
From: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
To: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Ceph Luminous - pg is down due to src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)

Hello,

currently one of my clusters is missing a whole pg due to all 3 osds being down.

All of them fail with:

0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1 /build/ceph/src/osd/SnapMapper.cc: In function 'void SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&, MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)' thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
/build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)

ceph version 12.2.2-93-gd6da8d7 (d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x561f9ff0b1e2]
 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t, std::less<snapid_t>, std::allocator<snapid_t> > const&, MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b) [0x561f9fb76f3b]
 3: (PG::update_snap_map(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
 4: (PG::append_log(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t, ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t> const&, eversion_t const&, eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92) [0x561f9fc314b2]
 7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4) [0x561f9fc374f4]
 ...
A few things:
1. Where did the inconsistency come from?
2. The OSD shouldn't crash in this case. We should make the trimmer etc. tolerant of the inconsistency, and make scrub repair it.
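
To illustrate point 2, here is a minimal sketch of the "tolerate instead of assert" idea. It is a toy model only, not Ceph's actual SnapMapper code or the eventual fix: `ToySnapMapper`, its in-memory `std::map` index, and the `inconsistencies()` counter are all hypothetical names made up for this example. The only detail taken from the report is that the failing check expects the lookup to return -2, which is `-ENOENT` on Linux, i.e. "no existing mapping for this object".

```cpp
#include <cassert>
#include <cerrno>
#include <iostream>
#include <map>
#include <set>
#include <string>

// Toy stand-in for an object -> snaps index (hypothetical; not Ceph's
// real SnapMapper, its key encoding, or its transaction handling).
class ToySnapMapper {
public:
  // Returns 0 and fills *out if oid is mapped, -ENOENT otherwise.
  int get_snaps(const std::string& oid, std::set<unsigned>* out) const {
    auto it = index_.find(oid);
    if (it == index_.end()) return -ENOENT;
    if (out) *out = it->second;
    return 0;
  }

  // Tolerant add: a pre-existing entry is logged, counted and overwritten
  // instead of being treated as a fatal invariant violation.
  // Returns true if a stale entry was found.
  bool add_oid(const std::string& oid, const std::set<unsigned>& snaps) {
    std::set<unsigned> old;
    int r = get_snaps(oid, &old);
    assert(r == 0 || r == -ENOENT);  // the toy lookup has no other results
    bool stale = (r == 0);
    if (stale) {
      std::cerr << "snap mapper: " << oid << " already mapped to "
                << old.size() << " snap(s); overwriting\n";
      ++inconsistencies_;  // something a scrub-like pass could report
    }
    index_[oid] = snaps;
    return stale;
  }

  unsigned inconsistencies() const { return inconsistencies_; }

private:
  std::map<std::string, std::set<unsigned>> index_;
  unsigned inconsistencies_ = 0;
};
```

In this sketch, adding the same object twice leaves the second set of snaps in place and bumps the counter rather than aborting the process. Whether overwriting is the right recovery in the real OSD, and how scrub should repair the on-disk state, is exactly the open question in this ticket.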