Bug #22752

snapmapper inconsistency, crash on luminous

Added by Sage Weil almost 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

From Stefan Priebe on the ceph-devel mailing list:


Date: Tue, 16 Jan 2018 02:23:17 +0100
From: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
To: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Ceph Luminous - pg is down due to src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)

Hello,

currently one of my clusters is missing a whole pg due to all 3 osds
being down.

All of them fail with:
    0> 2018-01-16 02:05:33.353293 7f944dbfe700 -1
/build/ceph/src/osd/SnapMapper.cc: In function 'void
SnapMapper::add_oid(const hobject_t&, const std::set<snapid_t>&,
MapCacher::Transaction<std::basic_string<char>, ceph::buffer::list>*)'
thread 7f944dbfe700 time 2018-01-16 02:05:33.349946
/build/ceph/src/osd/SnapMapper.cc: 246: FAILED assert(r == -2)

 ceph version 12.2.2-93-gd6da8d7
(d6da8d77a4b2220e6bdd61e4bdd911a9cd91946c) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x561f9ff0b1e2]
 2: (SnapMapper::add_oid(hobject_t const&, std::set<snapid_t,
std::less<snapid_t>, std::allocator<snapid_t> > const&,
MapCacher::Transaction<std::string, ceph::buffer::list>*)+0x64b)
[0x561f9fb76f3b]
 3: (PG::update_snap_map(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
ObjectStore::Transaction&)+0x38f) [0x561f9fa0ae3f]
 4: (PG::append_log(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&, eversion_t, eversion_t,
ObjectStore::Transaction&, bool)+0x538) [0x561f9fa31018]
 5: (PrimaryLogPG::log_operation(std::vector<pg_log_entry_t,
std::allocator<pg_log_entry_t> > const&,
boost::optional<pg_hit_set_history_t> const&, eversion_t const&,
eversion_t const&, bool, ObjectStore::Transaction&)+0x64) [0x561f9fb25d64]
 6: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xa92)
[0x561f9fc314b2]
 7:
(ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a4)
[0x561f9fc374f4]
...

A few things:

1. Where did the inconsistency come from?
2. The OSD shouldn't crash in this case. We should make the trimmer etc. tolerant of the inconsistency, and make scrub repair it.
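
To illustrate point 2: the backtrace shows the OSD aborting at assert(r == -2) in SnapMapper::add_oid, and -2 is -ENOENT on Linux, so the assertion presumably expects that no mapping exists yet for the object. Below is a minimal sketch of the "tolerant" alternative, written against a toy in-memory mapper. ToySnapMapper, ObjectId, SnapId and the boolean return value are hypothetical names invented for this example; this is not Ceph's SnapMapper API and not necessarily how the actual fix was implemented.

```cpp
#include <cassert>
#include <cerrno>
#include <iostream>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for hobject_t and snapid_t; not Ceph's real types.
using ObjectId = std::string;
using SnapId = unsigned long;

// Toy in-memory model of an object -> snapshots mapping (an assumption for
// illustration; the real SnapMapper persists its data through MapCacher).
class ToySnapMapper {
  std::map<ObjectId, std::set<SnapId>> by_object_;

public:
  // Returns 0 and fills *out if oid is mapped, -ENOENT otherwise.
  int get_snaps(const ObjectId& oid, std::set<SnapId>* out) const {
    auto it = by_object_.find(oid);
    if (it == by_object_.end())
      return -ENOENT;
    if (out)
      *out = it->second;
    return 0;
  }

  // Tolerant add: a pre-existing (stale) entry is logged and overwritten
  // instead of aborting the process. Returns true if an inconsistency was
  // found, so a caller could flag the object for scrub/repair.
  bool add_oid(const ObjectId& oid, const std::set<SnapId>& snaps) {
    // Illustrative precondition: plain programmer errors may still assert.
    assert(!oid.empty());

    std::set<SnapId> old;
    const int r = get_snaps(oid, &old);
    const bool stale = (r != -ENOENT);
    if (stale) {
      std::cerr << "add_oid: " << oid << " already mapped to " << old.size()
                << " snap(s); overwriting stale entry\n";
    }
    by_object_[oid] = snaps;  // insert or replace
    return stale;
  }
};
```

With this shape, adding the same object twice replaces the stale entry and reports the inconsistency to the caller rather than taking the daemon down; deciding what to do with that report (for example, queueing a repair) is left open here.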


Related issues

Copied to RADOS - Backport #23500: luminous: snapmapper inconsistency, crash on luminous (Resolved)

History

#2 Updated by Sage Weil almost 2 years ago

  • Backport set to luminous

#3 Updated by Greg Farnum almost 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Assignee set to Sage Weil

#4 Updated by Kefu Chai over 1 year ago

  • Status changed from Fix Under Review to Pending Backport

#5 Updated by Nathan Cutler over 1 year ago

  • Copied to Backport #23500: luminous: snapmapper inconsistency, crash on luminous added

#6 Updated by Nathan Cutler over 1 year ago

  • Status changed from Pending Backport to Resolved
