Bug #13668

OSD always crashes while doing scrub with newstore as the backend

Added by Wenjun Huang over 8 years ago. Updated almost 7 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

Hi, guys

I have built a simple test cluster with three OSDs that use newstore as the backend. But whenever it runs a scrub, two OSDs (the replica OSDs for the PG being scrubbed) always crash. The stack trace of the crash is below:

#0  0x00007ff1610c3ffb in raise () from /lib64/libpthread.so.0
#1  0x00007ff162f0438d in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:109
#3  <signal handler called>
#4  0x00007ff15f8a45d7 in raise () from /lib64/libc.so.6
#5  0x00007ff15f8a5cc8 in abort () from /lib64/libc.so.6
#6  0x00007ff1601a89b5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#7  0x00007ff1601a6926 in ?? () from /lib64/libstdc++.so.6
#8  0x00007ff1601a6953 in std::terminate() () from /lib64/libstdc++.so.6
#9  0x00007ff1601a6b73 in __cxa_throw () from /lib64/libstdc++.so.6
#10 0x00007ff162ff0928 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x7ff1631689e4 "k >= start_key && k < end_key", 
    file=file@entry=0x7ff1631682c8 "os/newstore/NewStore.cc", line=line@entry=1591, 
    func=func@entry=0x7ff16316ab60 <NewStore::collection_list(coll_t, ghobject_t, ghobject_t, bool, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)::__PRETTY_FUNCTION__> "virtual int NewStore::collection_list(coll_t, ghobject_t, ghobject_t, bool, int, std::vector<ghobject_t>*, ghobject_t*)")
    at common/assert.cc:77
#11 0x00007ff162d516b4 in NewStore::collection_list (this=0x7ff167b62000, cid=..., start=..., end=..., sort_bitwise=sort_bitwise@entry=true, 
    max=max@entry=2147483647, ls=ls@entry=0x7ff13b38fd60, pnext=0x7ff13b38eb20, pnext@entry=0x0) at os/newstore/NewStore.cc:1591
#12 0x00007ff162c89ac2 in PGBackend::objects_list_range (this=0x7ff167d28f00, start=..., end=..., seq=..., seq@entry=..., ls=ls@entry=0x7ff13b38feb0, 
    gen_obs=gen_obs@entry=0x7ff13b38fed0) at osd/PGBackend.cc:159
#13 0x00007ff162ba4ba2 in PG::build_scrub_map_chunk (this=this@entry=0x7ff167dd6800, map=..., start=..., end=..., deep=deep@entry=false, seed=seed@entry=4294967295, 
    handle=...) at osd/PG.cc:3569
#14 0x00007ff162ba545e in PG::replica_scrub (this=this@entry=0x7ff167dd6800, op=std::shared_ptr (count 4, weak 0) 0x7ff167b54b00, handle=...) at osd/PG.cc:3664
#15 0x00007ff162bf868b in ReplicatedPG::do_request (this=0x7ff167dd6800, op=std::shared_ptr (count 4, weak 0) 0x7ff167b54b00, handle=...) at osd/ReplicatedPG.cc:1418
#16 0x00007ff162a6d43d in OSD::dequeue_op (this=0x7ff167c98000, pg=..., op=std::shared_ptr (count 4, weak 0) 0x7ff167b54b00, handle=...) at osd/OSD.cc:8470
#17 0x00007ff162a6d65d in PGQueueable::RunVis::operator() (this=this@entry=0x7ff13b390720, op=std::shared_ptr (count 4, weak 0) 0x7ff167b54b00) at osd/OSD.cc:154
#18 0x00007ff162a92267 in internal_visit<std::shared_ptr<OpRequest> > (operand=std::shared_ptr (count 4, weak 0) 0x7ff167b54b00, this=<synthetic pointer>)
    at /usr/include/boost/variant/variant.hpp:1017
#19 visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, std::shared_ptr<OpRequest> > (storage=0x7ff13b390910, 
    visitor=<synthetic pointer>) at /usr/include/boost/variant/detail/visitation_impl.hpp:130
#20 visitation_impl_invoke<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, std::shared_ptr<OpRequest>, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub>::has_fallback_type_> (internal_which=<optimized out>, t=0x0, storage=0x7ff13b390910, visitor=<synthetic pointer>)
    at /usr/include/boost/variant/detail/visitation_impl.hpp:173
#21 visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<3l>, std::shared_ptr<OpRequest>, boost::mpl::l_item<mpl_::long_<2l>, PGSnapTrim, boost::mpl::l_item<mpl_::long_<1l>, PGScrub, boost::mpl::l_end> > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub>::has_fallback_type_> (no_backup_flag=..., 
    storage=0x7ff13b390910, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>)
    at /usr/include/boost/variant/detail/visitation_impl.hpp:256
#22 internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*> (storage=0x7ff13b390910, visitor=<synthetic pointer>, 
    logical_which=<optimized out>, internal_which=<optimized out>) at /usr/include/boost/variant/variant.hpp:2326
#23 internal_apply_visitor<boost::detail::variant::invoke_visitor<PGQueueable::RunVis> > (visitor=<synthetic pointer>, this=0x7ff13b390908)
    at /usr/include/boost/variant/variant.hpp:2337
#24 apply_visitor<PGQueueable::RunVis> (visitor=..., this=0x7ff13b390908) at /usr/include/boost/variant/variant.hpp:2360
#25 apply_visitor<PGQueueable::RunVis, boost::variant<std::shared_ptr<OpRequest>, PGSnapTrim, PGScrub> > (visitable=..., visitor=...)
    at /usr/include/boost/variant/detail/apply_visitor_unary.hpp:60
#26 run (handle=..., pg=..., osd=<optimized out>, this=0x7ff13b390908) at osd/OSD.h:379
#27 OSD::ShardedOpWQ::_process (this=0x7ff167c99140, thread_index=<optimized out>, hb=<optimized out>) at osd/OSD.cc:8358
#28 0x00007ff162fe116f in ShardedThreadPool::shardedthreadpool_worker (this=0x7ff167c98758, thread_index=<optimized out>) at common/WorkQueue.cc:340
#29 0x00007ff162fe3070 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:592
#30 0x00007ff1610bcdf5 in start_thread () from /lib64/libpthread.so.0
#31 0x00007ff15f9651ad in clone () from /lib64/libc.so.6
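
For context, frames #10 and #11 point at the assert at os/newstore/NewStore.cc:1591. A minimal sketch of the invariant it enforces (hypothetical function and names, not the actual NewStore implementation): while walking the key-value iterator for a requested [start, end) ghobject range, every raw key k it yields must fall inside the precomputed [start_key, end_key) string range.

// Hypothetical sketch of the invariant behind NewStore.cc:1591;
// not the real NewStore code.
#include <cassert>
#include <string>
#include <vector>

void list_range_sketch(const std::string& start_key,
                       const std::string& end_key,
                       const std::vector<std::string>& kv_keys) {
  for (const std::string& k : kv_keys) {
    // Every key yielded for the requested range must sit inside the
    // precomputed string key range; the crash above is this check failing.
    assert(k >= start_key && k < end_key);
    // ... decode k into a ghobject_t and append it to the result list ...
  }
}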

I have dumped some variable values related to the crash:
k:         --.8000000000000001.00000000.!!!0000000000000000.ffffffffffffffff
start_key: --.8000000000000001.70000000.
end_key:   --.8000000000000001.74000000.
Note that k sorts lexicographically before start_key, so it is the k >= start_key half of the assertion that fails. It also seems that start and end are both essentially blank ghobject objects (end differs only in hobj.max = true):
(gdb) p start
$33 = {hobj = {oid = {name = ""}, snap = {val = 0}, hash = 0, max = false, nibblewise_key_cache = 0, hash_reverse_bits = 0, static POOL_META = -1, 
    static POOL_TEMP_START = -2, pool = 1, nspace = "", key = ""}, generation = 18446744073709551615, shard_id = {id = -1 '\377', static NO_SHARD = {id = -1 '\377', 
      static NO_SHARD = <same as static member of an already seen type>}}, max = false, static NO_GEN = 18446744073709551615}
(gdb) p end
$34 = {hobj = {oid = {name = ""}, snap = {val = 0}, hash = 0, max = true, nibblewise_key_cache = 0, hash_reverse_bits = 0, static POOL_META = -1, 
    static POOL_TEMP_START = -2, pool = 1, nspace = "", key = ""}, generation = 18446744073709551615, shard_id = {id = -1 '\377', static NO_SHARD = {id = -1 '\377', 
      static NO_SHARD = <same as static member of an already seen type>}}, max = false, static NO_GEN = 18446744073709551615}
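
The failing half can be reproduced standalone by comparing the dumped keys directly; a minimal sketch, assuming the keys compare as plain strings in the KV backend (which the printed form suggests):

#include <cassert>
#include <iostream>
#include <string>

int main() {
  // Key values copied from the dump above.
  std::string k         = "--.8000000000000001.00000000.!!!0000000000000000.ffffffffffffffff";
  std::string start_key = "--.8000000000000001.70000000.";
  std::string end_key   = "--.8000000000000001.74000000.";

  // "00000000" sorts before "70000000", so k < start_key lexicographically.
  std::cout << std::boolalpha
            << "k >= start_key: " << (k >= start_key) << "\n"   // false
            << "k <  end_key:   " << (k < end_key)    << "\n";  // true
  assert(k >= start_key && k < end_key);  // aborts, matching the OSD crash
}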

The ceph version is: ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #13801: Deep-scrub will crash osd if osd backend is newstore (Resolved, Sage Weil, 11/16/2015)

#1

Updated by Nathan Cutler over 8 years ago

  • Related to Bug #13801: Deep-scrub will crash osd if osd backend is newstore added
#2

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Rejected

newstore was replaced by bluestore
