Bug #41913

closed

With the autoscaler operating, stopping an OSD can lead to COT crashing instead of operating on the PG

Added by David Zafman over 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
% Done: 0%
Regression: No
Severity: 3 - minor
Pull request ID: 30501

Description

While qa/standalone/scrub/osd-unexpected-clone.sh runs

    objectstore_tool $dir $osd "$JSON" set-attr _ $dir/_ || return 1

the autoscaler causes ceph-objectstore-tool to crash:

#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x000055dfd4c366a4 in reraise_fatal (signum=6) at /home/dzafman/ceph/src/global/signal_handler.cc:81
#2  0x000055dfd4c37769 in handle_fatal_signal (signum=6) at /home/dzafman/ceph/src/global/signal_handler.cc:326
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#5  0x00007efc11071801 in __GI_abort () at abort.c:79
#6  0x00007efc13bff73c in ceph::__ceph_abort (file=0x55dfd51d0ea0 "/home/dzafman/ceph/src/os/bluestore/BlueStore.cc", line=3625,
    func=0x55dfd51d3068 "BlueStore::OnodeRef BlueStore::Collection::get_onode(const ghobject_t&, bool, bool)", msg="abort() called") at /home/dzafman/ceph/src/common/assert.cc:196
#7  0x000055dfd4a51697 in BlueStore::Collection::get_onode (this=0x55dfd8067f80, oid=..., create=true, is_createop=false) at /home/dzafman/ceph/src/os/bluestore/BlueStore.cc:3625
#8  0x000055dfd4aafe0f in BlueStore::_txc_add_transaction (this=0x55dfd818e000, txc=0x55dfd8fbe340, t=0x55dfd80ca6e0) at /home/dzafman/ceph/src/os/bluestore/BlueStore.cc:12339
#9  0x000055dfd4aae5ca in BlueStore::queue_transactions (this=0x55dfd818e000, ch=..., tls=std::vector of length 1, capacity 1 = {...}, op=..., handle=0x0)
    at /home/dzafman/ceph/src/os/bluestore/BlueStore.cc:12124
#10 0x000055dfd41e347a in ObjectStore::queue_transaction (this=0x55dfd818e000, ch=..., t=..., op=..., handle=0x0) at /home/dzafman/ceph/src/os/ObjectStore.h:220
#11 0x000055dfd41c2ea0 in do_set_attr (store=0x55dfd818e000, coll=..., ghobj=..., key="_", fd=39) at /home/dzafman/ceph/src/tools/ceph_objectstore_tool.cc:2309
#12 0x000055dfd41d0b71 in main (argc=7, argv=0x7fff3281b4c8) at /home/dzafman/ceph/src/tools/ceph_objectstore_tool.cc:4088

src/os/bluestore/BlueStore.cc
3613    BlueStore::OnodeRef BlueStore::Collection::get_onode(
3614      const ghobject_t& oid,
3615      bool create,
3616      bool is_createop)
3617    {
3618      ceph_assert(create ? ceph_mutex_is_wlocked(lock) : ceph_mutex_is_locked(lock));
3619
3620      spg_t pgid;
3621      if (cid.is_pg(&pgid)) {
3622        if (!oid.match(cnode.bits, pgid.ps())) {
3623          lderr(store->cct) << __func__ << " oid " << oid << " not part of " 
3624                            << pgid << " bits " << cnode.bits << dendl;
3625          ceph_abort();
3626        }
3627      }


Related issues: 2 (0 open, 2 closed)

Related to RADOS - Bug #41900: auto-scaler breaks many standalone tests (Resolved, David Zafman, 09/17/2019)
Related to RADOS - Bug #42476: ceph-objectstore-tool crashes trying to access meta objects (Resolved, David Zafman, 10/24/2019)

#1

Updated by David Zafman over 4 years ago

  • Related to Bug #41900: auto-scaler breaks many standalone tests added

#2

Updated by Sage Weil over 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 30501

The real bug here is that the PG split, so the pgid specified to COT is wrong. The attached PR adds a check in COT so that we don't crash in the guts of BlueStore later.

#3

Updated by David Zafman over 4 years ago

  • Related to Bug #42476: ceph-objectstore-tool crashes trying to access meta objects added

#4

Updated by Neha Ojha almost 4 years ago

  • Status changed from Fix Under Review to Resolved