Bug #53002

crash BlueStore::Onode::put from BlueStore::TransContext::~TransContext

Added by Dan van der Ster over 1 year ago. Updated 17 days ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We've just seen this crash in the wild running 15.2.14. Maybe a dup of #50788?

   -14> 2021-10-21T09:42:31.079+0200 7f88e1b2c700  5 prioritycache tune_memory target: 3221225472 mapped: 3201368064 unmapped: 466845696 heap: 3668213760 old mem: 1932735267 new mem: 1932735267
   -13> 2021-10-21T09:42:31.924+0200 7f88dde53700 10 monclient: tick
   -12> 2021-10-21T09:42:31.924+0200 7f88dde53700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-10-21T09:42:01.925680+0200)
   -11> 2021-10-21T09:42:32.062+0200 7f88dd323700  5 bluestore(/var/lib/ceph/osd/ceph-138) _kv_sync_thread utilization: idle 9.861052482s of 10.001157459s, submitted: 477
   -10> 2021-10-21T09:42:32.080+0200 7f88e1b2c700  5 prioritycache tune_memory target: 3221225472 mapped: 3201417216 unmapped: 466796544 heap: 3668213760 old mem: 1932735267 new mem: 1932735267
    -9> 2021-10-21T09:42:32.080+0200 7f88e1b2c700  5 bluestore.MempoolThread(0x55a9f3e04a08) _resize_shards cache_size: 1932735267 kv_alloc: 889192448 kv_used: 586783984 meta_alloc: 813694976 meta_used: 511074366 data_alloc: 218103808 data_used: 0
    -8> 2021-10-21T09:42:32.115+0200 7f88cd509700  0 <cls> /builddir/build/BUILD/ceph-15.2.14/src/cls/lock/cls_lock.cc:290: Could not read list of current lockers off disk: (2) No such file or directory
    -7> 2021-10-21T09:42:32.925+0200 7f88dde53700 10 monclient: tick
    -6> 2021-10-21T09:42:32.925+0200 7f88dde53700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-10-21T09:42:02.925792+0200)
    -5> 2021-10-21T09:42:33.082+0200 7f88e1b2c700  5 prioritycache tune_memory target: 3221225472 mapped: 3201490944 unmapped: 466722816 heap: 3668213760 old mem: 1932735267 new mem: 1932735267
    -4> 2021-10-21T09:42:33.111+0200 7f88c9501700  0 <cls> /builddir/build/BUILD/ceph-15.2.14/src/cls/lock/cls_lock.cc:290: Could not read list of current lockers off disk: (2) No such file or directory
    -3> 2021-10-21T09:42:33.206+0200 7f88c8d00700  5 osd.138 360301 heartbeat osd_stat(store_statfs(0xa9ee097000/0x193950000/0xdf90000000, data 0x3408e34bb0/0x340e617000, compress 0x0/0x0/0x0, omap 0x2dc5721e, meta 0x165cf8de2), peers [1,2,3,12,16,21,23,24,27,29,34,35,41,42,45,49,52,55,63,68,70,71,72,77,79,82,83,85,105,108,113,119,124,131,133,137,139,149,150,152,156,161,167,170,175,180,206,211,212,213,217,236,240,245,247,250,252,259,265,269,272,273,274,275,277,280,287] op hist [])
    -2> 2021-10-21T09:42:33.367+0200 7f88cd509700  0 <cls> /builddir/build/BUILD/ceph-15.2.14/src/cls/lock/cls_lock.cc:290: Could not read list of current lockers off disk: (2) No such file or directory
    -1> 2021-10-21T09:42:33.440+0200 7f88cc507700  0 <cls> /builddir/build/BUILD/ceph-15.2.14/src/cls/lock/cls_lock.cc:290: Could not read list of current lockers off disk: (2) No such file or directory
     0> 2021-10-21T09:42:33.457+0200 7f88e232d700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f88e232d700 thread_name:bstore_kv_final

 ceph version 15.2.14-7 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
 1: (()+0xf630) [0x7f88f0f8f630]
 2: (BlueStore::Onode::put()+0x2eb) [0x55a9e87de1fb]
 3: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >*)+0x2d) [0x55a9e888297d]
 4: (BlueStore::TransContext::~TransContext()+0x107) [0x55a9e8882aa7]
 5: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x231) [0x55a9e8854041]
 6: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x1fc) [0x55a9e8854b7c]
 7: (BlueStore::_kv_finalize_thread()+0x552) [0x55a9e8857a52]
 8: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55a9e8887edd]
 9: (()+0x7ea5) [0x7f88f0f87ea5]
 10: (clone()+0x6d) [0x7f88efe4a9fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The fsck is clean:

# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-138/
fsck success

We have the coredump and could check anything...

(gdb) bt
#0  0x00007f88f0f8f4fb in raise () from /lib64/libpthread.so.0
#1  0x000055a9e89501b2 in reraise_fatal (signum=11)
    at /usr/src/debug/ceph-15.2.14/src/global/signal_handler.cc:326
#2  handle_fatal_signal(int) () at /usr/src/debug/ceph-15.2.14/src/global/signal_handler.cc:326
#3  <signal handler called>
#4  0x000055a9e87de1fb in lock (this=<optimized out>)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/mutex:110
#5  BlueStore::Onode::put (this=0x55aa7ea2b440)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.cc:3588
#6  0x000055a9e888297d in intrusive_ptr_release (o=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.h:3370
#7  ~intrusive_ptr (this=0x55aa49c74c20, __in_chrg=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:98
#8  destroy<boost::intrusive_ptr<BlueStore::Onode> > (this=0x55aa8e89d578, __p=0x55aa49c74c20)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/new_allocator.h:140
#9  destroy<boost::intrusive_ptr<BlueStore::Onode> > (__a=..., __p=0x55aa49c74c20)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/alloc_traits.h:487
#10 _M_destroy_node (this=0x55aa8e89d578, __p=0x55aa49c74c00)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:661
#11 _M_drop_node (this=0x55aa8e89d578, __p=0x55aa49c74c00)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:669
#12 std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase (
    this=this@entry=0x55aa8e89d578, __x=0x55aa49c74c00)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1874
#13 0x000055a9e8882aa7 in ~_Rb_tree (this=0x55aa8e89d578, __in_chrg=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.h:1595
#14 ~set (this=0x55aa8e89d578, __in_chrg=<optimized out>)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_set.h:281
#15 BlueStore::TransContext::~TransContext (this=0x55aa8e89d500, __in_chrg=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.h:1594
#16 0x000055a9e8854041 in ~TransContext (this=0x55aa8e89d500, __in_chrg=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.cc:11993
#17 BlueStore::_txc_finish(BlueStore::TransContext*) ()
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.cc:11993
#18 0x000055a9e8854b7c in BlueStore::_txc_state_proc(BlueStore::TransContext*) ()
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.cc:11709
#19 0x000055a9e8857a52 in BlueStore::_kv_finalize_thread() ()
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.cc:12556
#20 0x000055a9e8887edd in BlueStore::KVFinalizeThread::entry (this=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.h:1912
#21 0x00007f88f0f87ea5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f88efe4a9fd in clone () from /lib64/libc.so.6
(gdb) 

(gdb) up
#1  0x000055a9e89501b2 in reraise_fatal (signum=11)
    at /usr/src/debug/ceph-15.2.14/src/global/signal_handler.cc:326
326        reraise_fatal(signum);
(gdb) up
#2  handle_fatal_signal(int) ()
    at /usr/src/debug/ceph-15.2.14/src/global/signal_handler.cc:326
326        reraise_fatal(signum);
(gdb) up
#3  <signal handler called>
(gdb) up
#4  0x000055a9e87de1fb in lock (this=<optimized out>)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/mutex:110
110    /opt/rh/devtoolset-8/root/usr/include/c++/8/mutex: No such file or directory.
(gdb) up
#5  BlueStore::Onode::put (this=0x55aa7ea2b440)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.cc:3588
3588          ocs->lock.lock();
(gdb) list
3583        ocs->lock.lock();
3584        // It is possible that during waiting split_cache moved us to different OnodeCacheShard.
3585        while (ocs != c->get_onode_cache()) {
3586          ocs->lock.unlock();
3587          ocs = c->get_onode_cache();
3588          ocs->lock.lock();
3589        }
3590        bool need_unpin = pinned;
3591        pinned = pinned && nref > 2; // intentionally use > not >= as we have
3592                                     // +1 due to pinned state
(gdb) up
#6  0x000055a9e888297d in intrusive_ptr_release (o=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.h:3370
3370      o->put();
(gdb) list
3365    
3366    static inline void intrusive_ptr_add_ref(BlueStore::Onode *o) {
3367      o->get();
3368    }
3369    static inline void intrusive_ptr_release(BlueStore::Onode *o) {
3370      o->put();
3371    }
3372    
3373    static inline void intrusive_ptr_add_ref(BlueStore::OpSequencer *o) {
3374      o->get();
(gdb) up
#7  ~intrusive_ptr (this=0x55aa49c74c20, __in_chrg=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:98
98            if( px != 0 ) intrusive_ptr_release( px );
(gdb) list
93            if( px != 0 ) intrusive_ptr_add_ref( px );
94        }
95    
96        ~intrusive_ptr()
97        {
98            if( px != 0 ) intrusive_ptr_release( px );
99        }
100    
101    #if !defined(BOOST_NO_MEMBER_TEMPLATES) || defined(BOOST_MSVC6_MEMBER_TEMPLATES)
102    
(gdb) up
#8  destroy<boost::intrusive_ptr<BlueStore::Onode> > (this=0x55aa8e89d578, __p=0x55aa49c74c20)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/new_allocator.h:140
140    /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/new_allocator.h: No such file or directory.
(gdb) up
#9  destroy<boost::intrusive_ptr<BlueStore::Onode> > (__a=..., __p=0x55aa49c74c20)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/alloc_traits.h:487
487    /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/alloc_traits.h: No such file or directory.
(gdb) up
#10 _M_destroy_node (this=0x55aa8e89d578, __p=0x55aa49c74c00)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:661
661    /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h: No such file or directory.
(gdb) up
#11 _M_drop_node (this=0x55aa8e89d578, __p=0x55aa49c74c00)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:669
669    in /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h
(gdb) up
#12 std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase (
    this=this@entry=0x55aa8e89d578, __x=0x55aa49c74c00)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1874
1874    in /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h
(gdb) up
#13 0x000055a9e8882aa7 in ~_Rb_tree (this=0x55aa8e89d578, __in_chrg=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.h:1595
1595          delete deferred_txn;
(gdb) list
1590          if (on_commits) {
1591        oncommits.swap(*on_commits);
1592          }
1593        }
1594        ~TransContext() {
1595          delete deferred_txn;
1596        }
1597    
1598        void write_onode(OnodeRef &o) {
1599          onodes.insert(o);
(gdb) 
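
The backtrace above is a reference-release chain: destroying the TransContext destroys its set of Onode references, and each reference's destructor calls put(). The sketch below is a simplified, self-contained model of that chain, not the actual BlueStore code. `Node`, `Ref`, `Txc`, and `g_live_nodes` are hypothetical stand-ins (for `BlueStore::Onode`, `boost::intrusive_ptr<BlueStore::Onode>`, and `BlueStore::TransContext`), and the real `Onode::put()` does considerably more work, such as the cache-shard locking shown in the listing.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <set>
#include <utility>

int g_live_nodes = 0;  // hypothetical counter: Node objects currently alive

// Stand-in for BlueStore::Onode: an intrusively reference-counted object.
struct Node {
  std::atomic<int> nref{0};
  Node() { ++g_live_nodes; }
  ~Node() { --g_live_nodes; }
  void get() { ++nref; }
  void put() {
    // The last reference frees the object. A put() through a stale pointer
    // after this point would read freed (possibly reused) memory.
    if (--nref == 0) delete this;
  }
};

// Stand-in for boost::intrusive_ptr<Node>: get() on copy, put() on destroy.
class Ref {
  Node* px = nullptr;
public:
  Ref() = default;
  explicit Ref(Node* p) : px(p) { if (px) px->get(); }
  Ref(const Ref& o) : px(o.px) { if (px) px->get(); }
  Ref& operator=(const Ref& o) {
    Ref tmp(o);
    std::swap(px, tmp.px);
    return *this;
  }
  ~Ref() { if (px) px->put(); }
  bool operator<(const Ref& o) const { return std::less<Node*>()(px, o.px); }
  Node* raw() const { return px; }
};

// Stand-in for TransContext: it keeps the touched nodes alive in a set.
struct Txc {
  std::set<Ref> onodes;
  void write_onode(const Ref& o) { onodes.insert(o); }
  // The implicit destructor destroys `onodes`, which runs ~Ref (and thus
  // put()) for every element: the same path as frames #5 to #15 above.
};
```

In this model a `Txc` holding one reference raises `nref` from 1 to 2, and destroying the `Txc` drops it back to 1. The crash in the report corresponds to the case where the object behind one of those references is no longer valid by the time the set is destroyed.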

Related issues

Related to bluestore - Bug #50788: crash in BlueStore::Onode::put() Duplicate
Related to bluestore - Bug #47740: OSD crash when increase pg_num Duplicate
Duplicates bluestore - Bug #56174: rook-ceph-osd crash randomly Duplicate
Duplicates bluestore - Bug #54727: crash: __pthread_mutex_lock() Duplicate
Duplicates bluestore - Bug #56200: crash: ceph::buffer::ptr::release() Duplicate
Duplicates bluestore - Bug #54650: crash: BlueStore::Onode::put() Duplicate
Duplicated by bluestore - Bug #58439: octopus osd crash Duplicate
Copied to bluestore - Backport #53608: pacific: crash BlueStore::Onode::put from BlueStore::TransContext::~TransContext Resolved
Copied to bluestore - Backport #53609: octopus: crash BlueStore::Onode::put from BlueStore::TransContext::~TransContext Resolved

History

#1 Updated by Igor Fedotov over 1 year ago

  • Related to Bug #50788: crash in BlueStore::Onode::put() added

#2 Updated by Igor Fedotov over 1 year ago

Dan van der Ster wrote:

We've just seen this crash in the wild running 15.2.14. Maybe a dup of #50788?

I'm pretty sure it is...

Aren't there any indications of a recent PG split?

#3 Updated by Dan van der Ster over 1 year ago

Igor Fedotov wrote:

Dan van der Ster wrote:

We've just seen this crash in the wild running 15.2.14. Maybe a dup of #50788?

I'm pretty sure it is...

Aren't there any indications of a recent PG split?

Not recently AFAIK... we have nopgchange set on all the pools.

#4 Updated by Dan van der Ster over 1 year ago

More context: the cluster was upgraded from 14.2.20 to 15.2.14 two weeks ago. We've never seen this before today; it happened only once on only this OSD so far.

#5 Updated by Dan van der Ster over 1 year ago

In frame 7 I can print the Onode. Some of the vals look quite strange (but I don't know if that's normal):

(gdb) f
#7  ~intrusive_ptr (this=0x55aa49c74c20, __in_chrg=<optimized out>)
    at /usr/src/debug/ceph-15.2.14/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:98
98            if( px != 0 ) intrusive_ptr_release( px );
(gdb) list
93            if( px != 0 ) intrusive_ptr_add_ref( px );
94        }
95    
96        ~intrusive_ptr()
97        {
98            if( px != 0 ) intrusive_ptr_release( px );
99        }
100    
101    #if !defined(BOOST_NO_MEMBER_TEMPLATES) || defined(BOOST_MSVC6_MEMBER_TEMPLATES)
102    
(gdb) p px
$11 = (BlueStore::Onode *) 0x55aa7ea2b440
(gdb) p *px
$12 = {nref = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 1024138560}, 
    static is_always_lock_free = true}, c = 0x200, oid = {hobj = {static POOL_META = -1, 
      static POOL_TEMP_START = -2, oid = {
        name = <error reading variable: Cannot access memory at address 0x55aaffffffe7>}, 
      snap = {val = 8295752894954156584}, hash = 543712117, max = 102, 
      nibblewise_key_cache = 544370464, hash_reverse_bits = 1701996900, pool = 521610949731, 
      nspace = "cta-cristina", key = ""}, generation = 18446744073709551615, shard_id = {
      id = -1 '\377', static NO_SHARD = {id = -1 '\377', 
        static NO_SHARD = <same as static member of an already seen type>}}, max = false, 
    static NO_GEN = 18446744073709551615}, key = "", 
  lru_item = {<boost::intrusive::generic_hook<(boost::intrusive::algo_types)0, boost::intrusive::list_node_traits<void*>, boost::intrusive::member_tag, (boost::intrusive::link_mode_type)1, (boost::intrusive::base_hook_type)0>> = {<boost::intrusive::list_node<void*>> = {next_ = 0x0, 
        prev_ = 0x0}, <boost::intrusive::hook_tags_definer<boost::intrusive::generic_hook<(boost::intrusive::algo_types)0, boost::intrusive::list_node_traits<void*>, boost::intrusive::member_tag, (boost::intrusive::link_mode_type)1, (boost::intrusive::base_hook_type)0>, 0>> = {<No data fields>}, <No data fields>}, <No data fields>}, onode = {nid = 0, size = 0, 
    attrs = std::map with 0 elements, 
    extent_map_shards = std::vector of length 0, capacity 0, expected_object_size = 0, 
    expected_write_size = 0, alloc_hint_flags = 0, flags = 0 '\000'}, exists = false, 
  cached = false, pinned = {_M_base = {static _S_alignment = 1, _M_i = false}, 
    static is_always_lock_free = true}, extent_map = {onode = 0x55aa7ea2b440, 
    extent_map = {<boost::intrusive::set_impl<boost::intrusive::bhtraits<BlueStore::Extent, boost::intrusive::rbtree_node_traits<void*, true>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 3>, void, void, unsigned long, true, void>> = {<boost::intrusive::bstree_impl<boost::intrusive::bhtraits<BlueStore::Extent, boost::intrusive::rbtree_node_traits<void*, true>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 3>, void, void, unsigned long, true, (boost::intrusive::algo_types)5, void>> = {<boost::intrusive::bstbase<boost::intrusive::bhtraits<BlueStore::Extent, boost::intrusive::rbtree_node_traits<void*, true>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 3>, void, void, true, unsigned long, (boost::intrusive::algo_types)5, void>> = {<boost::intrusive::bstbase_hack<boost::intrusive::bhtraits<BlueStore::Extent, boost::intrusive::rbtree_node_traits<void*, true>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 3>, void, void, true, unsigned long, (boost::intrusive::algo_types)5, void>> = {<boost::intrusive::detail::size_holder<true, unsigned long, void>> = {
                static constant_time_size = <optimized out>, 
                size_ = 0}, <boost::intrusive::bstbase2<boost::intrusive::bhtraits<BlueStore::Extent, boost::intrusive::rbtree_node_traits<void*, true>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 3>, void, void, (boost::intrusive::algo_types)5, void>> = {<boost::intrusive::detail::ebo_functor_holder<boost::intrusive::tree_value_compare<BlueStore::Extent*, std::less<BlueStore::Extent>, boost::move_detail::identity<BlueStore::Extent>, bool, true>, void, false>> = {<boost::intrusive::tree_value_compare<BlueStore::Extent*, std::less<BlueStore::Extent>, boost::move_detail::identity<BlueStore::Extent>, bool, true>> = {<boost::intrusive::detail::ebo_functor_holder<std::less<BlueStore::Extent>, void, false>> = {<std::less<BlueStore::Extent>> = {<std::binary_function<BlueStore::Extent, BlueStore::Extent, bool>> = {<No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <No data fields>}, <boost::intrusive::bstbase3<boost::intrusive::bhtraits<BlueStore::Extent, boost::intrusive::rbtree_node_traits<void*, true>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 3>, (boost::intrusive::algo_types)5, void>> = {static safemode_or_autounlink = <optimized out>, 
                  static stateful_value_traits = <optimized out>, 
                  static has_container_from_iterator = <optimized out>, 
                  holder = {<boost::intrusive::bhtraits<BlueStore::Extent, boost::intrusive::rbtree_node_traits<void*, true>, (boost::intrusive::link_mode_type)1, boost::intrusive::dft_tag, 3>> = {<boost::intrusive::bhtraits_base<BlueStore::Extent, boost::intrusive::compact_rbtree_node<void*>*, boost::intrusive::dft_tag, 3>> = {<No data fields>}, 
                      static link_mode = boost::intrusive::safe_link}, 
                    root = {<boost::intrusive::compact_rbtree_node<void*>> = {parent_ = 0x0, 
                        left_ = 0x55aa7ea2b540, 
                        right_ = 0x55aa7ea2b540}, <No data fields>}}}, <No data fields>}, <No data fields>}, <No data fields>}, static constant_time_size = true, 
          static stateful_value_traits = <optimized out>, 
          static safemode_or_autounlink = true}, 
        static constant_time_size = true}, <No data fields>}, 
    spanning_blob_map = std::map with 0 elements, 
    shards = std::vector of length 0, capacity 0, inline_bl = {_buffers = {_root = {
          next = 0x55aa7ea2b5c0}, _tail = 0x55aa7ea2b5c0}, 
      _carriage = 0x55a9f17a8d90 <ceph::buffer::v15_2_0::list::always_empty_bptr>, _len = 0, 
      _num = 0, static always_empty_bptr = {_raw = 0x0, _off = 0, _len = 0}}, 
    needs_reshard_begin = 0, needs_reshard_end = 0}, 
  flushing_count = {<std::__atomic_base<int>> = {static _S_alignment = 4, _M_i = 0}, 
    static is_always_lock_free = true}, waiting_count = {<std::__atomic_base<int>> = {
      static _S_alignment = 4, _M_i = 0}, static is_always_lock_free = true}, 
  flush_lock = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, 
          __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {
            __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
        __align = 0}}, <No data fields>}, flush_cond = {_M_cond = {__data = {__lock = 1, 
        __futex = 0, __total_seq = 18446744073709551615, __wakeup_seq = 0, __woken_seq = 0, 
        __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, 
      __size = "\001\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377", '\000' <repeats 31 times>, __align = 1}}}
(gdb) 

E.g. down in frame 5, `c` has address 0x200 ?!!

(gdb) f
#5  BlueStore::Onode::put (this=0x55aa7ea2b440)
    at /usr/src/debug/ceph-15.2.14/src/os/bluestore/BlueStore.cc:3588
3588          ocs->lock.lock();
(gdb) list
3583        ocs->lock.lock();
3584        // It is possible that during waiting split_cache moved us to different OnodeCacheShard.
3585        while (ocs != c->get_onode_cache()) {
3586          ocs->lock.unlock();
3587          ocs = c->get_onode_cache();
3588          ocs->lock.lock();
3589        }
3590        bool need_unpin = pinned;
3591        pinned = pinned && nref > 2; // intentionally use > not >= as we have
3592                                     // +1 due to pinned state
(gdb) p c
$16 = (BlueStore::Collection *) 0x200
(gdb) p *c
Cannot access memory at address 0x200
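
One way to check whether the odd-looking integer fields in the dump are plausible values or leftover bytes from something else is to print them byte by byte in memory order. The helper below is a small sketch for that purpose; `decode_le` is a hypothetical name, and it assumes a little-endian host (as on x86-64) so that the lowest byte of the integer is the first byte in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Render the low `width` bytes of `value` in little-endian (memory) order.
// Bytes outside printable ASCII are shown as '.'.
std::string decode_le(std::uint64_t value, std::size_t width) {
  assert(width <= sizeof(value));
  std::string out;
  for (std::size_t i = 0; i < width; ++i) {
    unsigned char byte = static_cast<unsigned char>(value >> (8 * i));
    out += (byte >= 0x20 && byte < 0x7f) ? static_cast<char>(byte) : '.';
  }
  return out;
}
```

Applied to the fields printed above, `snap.val` (8295752894954156584) decodes to "(2) No s", `hash` (543712117) to "uch ", `nibblewise_key_cache` (544370464) to " or ", `hash_reverse_bits` (1701996900) to "dire", and `pool` (521610949731) to "ctory" followed by three zero bytes; `max = 102` is ASCII 'f'. Read in order, these bytes resemble the tail of the cls_lock message "(2) No such file or directory" logged shortly before the crash. That would be consistent with the Onode's memory having been freed and reused for a string, although the dump alone does not prove it.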

#6 Updated by Neha Ojha over 1 year ago

  • Assignee set to Igor Fedotov

#7 Updated by Igor Fedotov about 1 year ago

  • Status changed from New to In Progress
  • Pull request ID set to 43770

#8 Updated by Igor Fedotov about 1 year ago

  • Backport set to pacific, octopus

#9 Updated by Igor Fedotov about 1 year ago

  • Status changed from In Progress to Pending Backport

#10 Updated by Igor Fedotov about 1 year ago

  • Status changed from Pending Backport to Fix Under Review

#11 Updated by Igor Fedotov about 1 year ago

  • Status changed from Fix Under Review to Pending Backport

#12 Updated by Backport Bot about 1 year ago

  • Copied to Backport #53608: pacific: crash BlueStore::Onode::put from BlueStore::TransContext::~TransContext added

#13 Updated by Backport Bot about 1 year ago

  • Copied to Backport #53609: octopus: crash BlueStore::Onode::put from BlueStore::TransContext::~TransContext added

#14 Updated by Igor Fedotov 12 months ago

  • Status changed from Pending Backport to Resolved

#15 Updated by Igor Fedotov 7 months ago

  • Duplicates Bug #56174: rook-ceph-osd crash randomly added

#16 Updated by Igor Fedotov 6 months ago

  • Duplicates Bug #54727: crash: __pthread_mutex_lock() added

#17 Updated by Igor Fedotov 6 months ago

  • Duplicates Bug #56200: crash: ceph::buffer::ptr::release() added

#18 Updated by Igor Fedotov 6 months ago

  • Duplicates Bug #54650: crash: BlueStore::Onode::put() added

#19 Updated by Igor Fedotov 6 months ago

  • Related to Bug #47740: OSD crash when increase pg_num added

#20 Updated by Sven Kieske 6 months ago

According to https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/PPWIFPEI3EVBU3GQYYO6ABGF23WR5SGZ/ this is not resolved yet. Could this be reopened, please?

#21 Updated by Igor Fedotov 6 months ago

  • Status changed from Resolved to New

Looks like this hasn't been completely fixed yet.
We've received a number of new tickets from the Telemetry bot that show the same or similar symptoms (Onode::put is primarily involved) on Ceph releases that already include PR #43770 (and its backports).

Some of the cases from the field I observed personally:
1) 15.2.16
Aug 05 23:34:51 ceph-osd2861: *** Caught signal (Segmentation fault) **
Aug 05 23:34:51 ceph-osd2861: in thread 7f08cf3a0700 thread_name:tp_osd_tp
Aug 05 23:34:51 ceph-osd2861: ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
Aug 05 23:34:51 ceph-osd2861: 1: (()+0x12730) [0x7f08ec91e730]
Aug 05 23:34:51 ceph-osd2861: 2: (ceph::buffer::v15_2_0::ptr::release()+0x26) [0x5650f3904d26]
Aug 05 23:34:51 ceph-osd2861: 3: (BlueStore::Onode::put()+0x1a9) [0x5650f35b6a79]
Aug 05 23:34:51 ceph-osd2861: 4: (std::_Hashtable<ghobject_t, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, mempool::pool_allocator<(mempool::pool_index_t)4, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> > >, std::__detail::_Select1st, std::equal_to<ghobject_t>, std::hash<ghobject_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true>*)+0x64) [0x5650f3662ca4]
Aug 05 23:34:51 ceph-osd2861: 5: (BlueStore::OnodeSpace::_remove(ghobject_t const&)+0x290) [0x5650f35b68a0]
Aug 05 23:34:51 ceph-osd2861: 6: (LruOnodeCacheShard::_trim_to(unsigned long)+0xdb) [0x5650f36631db]
Aug 05 23:34:51 ceph-osd2861: 7: (BlueStore::OnodeSpace::add(ghobject_t const&, boost::intrusive_ptr<BlueStore::Onode>&)+0x48d) [0x5650f35b74cd]
Aug 05 23:34:51 ceph-osd2861: 8: (BlueStore::Collection::get_onode(ghobject_t const&, bool, bool)+0x453) [0x5650f35fdac3]
Aug 05 23:34:51 ceph-osd2861: 9: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1dc3) [0x5650f3633353]
Aug 05 23:34:51 ceph-osd2861: 10: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x408) [0x5650f3634778]
Aug 05 23:34:51 ceph-osd2861: 11: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0x5650f32e7c14]
Aug 05 23:34:51 ceph-osd2861: 12: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xdf4) [0x5650f347b804]
Aug 05 23:34:51 ceph-osd2861: 13: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x267) [0x5650f348ad57]
Aug 05 23:34:51 ceph-osd2861: 14: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x57) [0x5650f331d917]
Aug 05 23:34:51 ceph-osd2861: 15: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x62f) [0x5650f32c14df]
Aug 05 23:34:51 ceph-osd2861: 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x325) [0x5650f3159d35]
Aug 05 23:34:51 ceph-osd2861: 17: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x64) [0x5650f339dea4]
Aug 05 23:34:51 ceph-osd2861: 18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12fa) [0x5650f317678a]
Aug 05 23:34:51 ceph-osd2861: 19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x5650f37801f4]
Aug 05 23:34:51 ceph-osd2861: 20: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5650f3782c70]
Aug 05 23:34:51 ceph-osd2861: 21: (()+0x7fa3) [0x7f08ec913fa3]
Aug 05 23:34:51 ceph-osd2861: 22: (clone()+0x3f) [0x7f08ec4beeff]

or

Aug 05 00:33:29 ceph-osd2863: *** Caught signal (Segmentation fault) **
Aug 05 00:33:29 ceph-osd2863: in thread 7f4613a22700 thread_name:bstore_kv_final
Aug 05 00:33:29 ceph-osd2863: ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
Aug 05 00:33:29 ceph-osd2863: 1: (()+0x12730) [0x7f461ff7e730]
Aug 05 00:33:29 ceph-osd2863: 2: (BlueStore::Onode::put()+0x193) [0x564c15db8a63]
Aug 05 00:33:29 ceph-osd2863: 3: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >*)+0x2d) [0x564c15e6460d]
Aug 05 00:33:29 ceph-osd2863: 4: (BlueStore::TransContext::~TransContext()+0x117) [0x564c15e64747]
Aug 05 00:33:29 ceph-osd2863: 5: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x24b) [0x564c15e0bb8b]
Aug 05 00:33:29 ceph-osd2863: 6: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x234) [0x564c15e23744]
Aug 05 00:33:29 ceph-osd2863: 7: (BlueStore::_kv_finalize_thread()+0x552) [0x564c15e2e3e2]
Aug 05 00:33:29 ceph-osd2863: 8: (BlueStore::KVFinalizeThread::entry()+0xd) [0x564c15e69b8d]
Aug 05 00:33:29 ceph-osd2863: 9: (()+0x7fa3) [0x7f461ff73fa3]
Aug 05 00:33:29 ceph-osd2863: 10: (clone()+0x3f) [0x7f461fb1eeff]

2) different cluster at 15.2.16
backtrace:
0: (()+0x12730) [0x7fe8875d1730]
1: (gsignal()+0x10b) [0x7fe8870b07bb]
2: (abort()+0x121) [0x7fe88709b535]
3: (()+0x2240f) [0x7fe88709b40f]
4: (()+0x30102) [0x7fe8870a9102]
5: (()+0xeb47ca) [0x55e2237177ca]
6: (BlueStore::Onode::put()+0x2b1) [0x55e22372ab81]
7: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >*)+0x2d) [0x55e2237d660d]
8: (BlueStore::TransContext::~TransContext()+0x124) [0x55e2237d6754]
9: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x24b) [0x55e22377db8b]
10: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x234) [0x55e223795744]
11: (BlueStore::_kv_finalize_thread()+0x552) [0x55e2237a03e2]
12: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55e2237dbb8d]
13: (()+0x7fa3) [0x7fe8875c6fa3]
14: (clone()+0x3f) [0x7fe887171eff]

3) 16.2.9
*** Caught signal (Segmentation fault) **
2022-08-02 00:33:00 Ceph04 osd.21 in thread 7f2853f74700 thread_name:tp_osd_tp
2022-08-02 00:33:00 Ceph04 osd.21 ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)
2022-08-02 00:33:00 Ceph04 osd.21 1: /lib64/libpthread.so.0(+0x168c0) [0x7f287a1e98c0]
2022-08-02 00:33:00 Ceph04 osd.21 2: (ceph::buffer::v15_2_0::ptr::release()+0xf) [0x55670639336f]
2022-08-02 00:33:00 Ceph04 osd.21 3: (BlueStore::Onode::put()+0x1bc) [0x55670601feac]
2022-08-02 00:33:00 Ceph04 osd.21 4: (std::__detail::_Hashtable_alloc<mempool::pool_allocator >, true> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair >, true>*)+0x35) [0x5567060d2365]
2022-08-02 00:33:00 Ceph04 osd.21 5: (std::_Hashtable >, mempool::pool_allocator<(mempool::pool_index_t)4, std::pair > >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair >, true>*)+0x53) [0x5567060d27a3]
2022-08-02 00:33:00 Ceph04 osd.21 6: (BlueStore::OnodeSpace::_remove(ghobject_t const&)+0x12c) [0x55670601fb5c]
2022-08-02 00:33:00 Ceph04 osd.21 7: (LruOnodeCacheShard::_trim_to(unsigned long)+0xce) [0x5567060d350e]
2022-08-02 00:33:00 Ceph04 osd.21 8: (BlueStore::OnodeSpace::add(ghobject_t const&, boost::intrusive_ptr&)+0x152) [0x5567060206a2]
2022-08-02 00:33:00 Ceph04 osd.21 9: (BlueStore::Collection::get_onode(ghobject_t const&, bool, bool)+0x299) [0x55670607fc39]
2022-08-02 00:33:00 Ceph04 osd.21 10: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1d32) [0x55670608b722]
2022-08-02 00:33:00 Ceph04 osd.21 11: (BlueStore::queue_transactions(boost::intrusive_ptr&, std::vector >&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x2fa) [0x5567060a555a]
2022-08-02 00:33:00 Ceph04 osd.21 12: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector >&, boost::intrusive_ptr)+0x54) [0x556705ce5cf4]
2022-08-02 00:33:00 Ceph04 osd.21 13: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr, ECSubWrite&, ZTracer::Trace const&)+0xa4d) [0x556705eff87d]
2022-08-02 00:33:00 Ceph04 osd.21 14: (ECBackend::try_reads_to_commit()+0x2509) [0x556705f10759]
2022-08-02 00:33:00 Ceph04 osd.21 15: (ECBackend::check_ops()+0x1c) [0x556705f1202c]
2022-08-02 00:33:00 Ceph04 osd.21 16: (ECBackend::handle_sub_write_reply(pg_shard_t, ECSubWriteReply const&, ZTracer::Trace const&)+0xde) [0x556705f1217e]
2022-08-02 00:33:00 Ceph04 osd.21 17: (ECBackend::_handle_message(boost::intrusive_ptr)+0x1cf) [0x556705f17cef]
2022-08-02 00:33:00 Ceph04 osd.21 18: (PGBackend::handle_message(boost::intrusive_ptr)+0x87) [0x556705d34117]
2022-08-02 00:33:00 Ceph04 osd.21 19: (PrimaryLogPG::do_request(boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x684) [0x556705cd5264]
2022-08-02 00:33:00 Ceph04 osd.21 20: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, ThreadPool::TPHandle&)+0x159) [0x556705b5ee39]
2022-08-02 00:33:00 Ceph04 osd.21 21: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x67) [0x556705dbaef7]
2022-08-02 00:33:00 Ceph04 osd.21 22: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xcf5) [0x556705b7c625]
2022-08-02 00:33:00 Ceph04 osd.21 23: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x5567061e02ec]
2022-08-02 00:33:00 Ceph04 osd.21 24: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5567061e37b0]
2022-08-02 00:33:00 Ceph04 osd.21 25: /lib64/libpthread.so.0(+0xa6ea) [0x7f287a1dd6ea]
2022-08-02 00:33:00 Ceph04 osd.21 26: clone()

#22 Updated by Igor Fedotov 6 months ago

4) Quincy case from Telemetry: https://tracker.ceph.com/issues/56382

#23 Updated by Igor Fedotov 5 months ago

  • Status changed from New to In Progress

#24 Updated by Sven Kieske 3 months ago

We see almost daily crashes on our Octopus cluster, also reported via telemetry, that look like this bug. Could you confirm that they are the same? If you need more information, just ask. I'm really waiting on a patch for this:

{
    "backtrace": [
        "(()+0x12980) [0x7f269ac06980]",
        "(ceph::buffer::v15_2_0::ptr::release()+0x26) [0x55fc3e524206]",
        "(BlueStore::Onode::put()+0x1c1) [0x55fc3e192a71]",
        "(std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >*)+0x2d) [0x55fc3e248a0d]",
        "(std::_Rb_tree<boost::intrusive_ptr<BlueStore::Onode>, boost::intrusive_ptr<BlueStore::Onode>, std::_Identity<boost::intrusive_ptr<BlueStore::Onode> >, std::less<boost::intrusive_ptr<BlueStore::Onode> >, std::allocator<boost::intrusive_ptr<BlueStore::Onode> > >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::Onode> >*)+0x1b) [0x55fc3e2489fb]",
        "(BlueStore::TransContext::~TransContext()+0x124) [0x55fc3e248b54]",
        "(BlueStore::_txc_finish(BlueStore::TransContext*)+0x4b8) [0x55fc3e1d01b8]",
        "(BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x24c) [0x55fc3e1d1b7c]",
        "(BlueStore::_kv_finalize_thread()+0x48c) [0x55fc3e21b58c]",
        "(BlueStore::KVFinalizeThread::entry()+0xd) [0x55fc3e24d09d]",
        "(()+0x76db) [0x7f269abfb6db]",
        "(clone()+0x3f) [0x7f269999b61f]" 
    ],
    "ceph_version": "15.2.17",
    "crash_id": "2022-10-21T16:26:38.286992Z_ba5ffc75-58c3-45fc-9cda-950256b5efca",
    "entity_name": "osd.127",
    "os_id": "ubuntu",
    "os_name": "Ubuntu",
    "os_version": "18.04.6 LTS (Bionic Beaver)",
    "os_version_id": "18.04",
    "process_name": "ceph-osd",
    "stack_sig": "b2e4aac01a4b8acbb3878c39b0f5b1269edcccb6a90435e54b6958716a9e703e",
    "timestamp": "2022-10-21T16:26:38.286992Z",
    "utsname_hostname": "ceph-osd08",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-107-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#121~18.04.1-Ubuntu SMP Thu Mar 24 17:21:33 UTC 2022" 
}

#25 Updated by Yaarit Hatuka 3 months ago

Hi Sven,

Thanks for reporting telemetry! The issue you reported is tracked in https://tracker.ceph.com/issues/56200, which is marked as a duplicate of this tracker (https://tracker.ceph.com/issues/53002), so they are indeed the same.
Looks like the Octopus backport is already merged, but there is another PR (https://github.com/ceph/ceph/pull/47702) which is still under review and not yet merged to main.

Regards,
Yaarit

#26 Updated by 王子敬 wang about 1 month ago

(gdb) bt
#0 0x00007fc82cdb64aa in tc_newarray () from /lib64/libtcmalloc.so.4
#1 0x000055f6876050ba in ceph::buffer::v15_2_0::ptr_node::create<ceph::buffer::v15_2_0::ptr_node const&> ()
at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/include/buffer.h:411
#2 ceph::buffer::v15_2_0::list::append (this=this@entry=0x55f6b308ceb8, bl=...) at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/common/buffer.cc:1424
#3 0x000055f687150491 in ceph::encode (bl=..., s=...) at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/include/encoding.h:282
#4 ceph::os::Transaction::encode (this=this@entry=0x7fc8039c7440, bl=...) at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/os/Transaction.h:1267
#5 0x000055f687137698 in ceph::os::encode (features=0, bl=..., c=...) at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/os/Transaction.h:1293
#6 ReplicatedBackend::generate_subop (this=0x55f6956f8180, soid=..., at_version=..., tid=10598176, reqid=..., pg_trim_to=..., min_last_complete_ondisk=..., new_temp_oid=...,
discard_temp_oid=..., log_entries=..., hset_hist=std::optional<pg_hit_set_history_t> [no contained value], op_t=..., peer=..., pinfo=...)
at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/osd/ReplicatedBackend.cc:968
#7 0x000055f687138188 in ReplicatedBackend::issue_op (this=0x55f6956f8180, soid=..., at_version=..., tid=<optimized out>, reqid=..., pg_trim_to=..., min_last_complete_ondisk=...,
new_temp_oid=..., discard_temp_oid=..., log_entries=..., hset_hist=..., op=<optimized out>, op_t=...)
at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/osd/ReplicatedBackend.cc:1028
#8 0x000055f68713ad14 in ReplicatedBackend::submit_transaction (this=0x55f6956f8180, soid=..., delta_stats=..., at_version=..., _t=..., trim_to=..., min_last_complete_ondisk=...,
_log_entries=std::vector of length 1, capacity 1 = {...}, hset_history=std::optional<pg_hit_set_history_t> [no contained value], on_all_commit=0x55f6bfd47360, tid=10598176,
reqid=..., orig_op=...) at /usr/include/c++/8/ext/aligned_buffer.h:76
#9 0x000055f686f07ce0 in PrimaryLogPG::issue_repop (this=0x55f6961c4000, repop=0x55f696e73980, ctx=0x55f6dc76d200)
at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/osd/PeeringState.h:2292
#10 0x000055f686f64c5a in PrimaryLogPG::execute_ctx (this=0x55f6961c4000, ctx=<optimized out>)
at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/osd/PrimaryLogPG.cc:4166
#11 0x000055f686f69004 in PrimaryLogPG::do_op (this=0x55f6961c4000, op=...) at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/osd/PrimaryLogPG.cc:2381
#12 0x000055f686f76585 in PrimaryLogPG::do_request (this=0x55f6961c4000, op=..., handle=...) at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/osd/PrimaryLogPG.cc:1779
#13 0x000055f686df35d9 in OSD::dequeue_op (this=this@entry=0x55f692652000, pg=..., op=..., handle=...)
at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/osd/OSD.cc:9754
#14 0x000055f68705b378 in ceph::osd::scheduler::PGOpItem::run (this=<optimized out>, osd=0x55f692652000, sdata=<optimized out>, pg=..., handle=...)
at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/osd/PG.h:627
#15 0x000055f686e0ff4b in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7fc8039c83b0)
at /usr/include/c++/8/bits/unique_ptr.h:345
#16 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>)
at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/osd/OSD.cc:10788
#17 0x000055f687465644 in ShardedThreadPool::shardedthreadpool_worker (this=0x55f692652a28, thread_index=11)
at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/common/WorkQueue.cc:311
#18 0x000055f6874682a4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.2.13-branch_2212260918.el8.x86_64/src/common/WorkQueue.h:715
#19 0x00007fc82c26014a in start_thread () from /lib64/libpthread.so.0
#20 0x00007fc82b3c9dc3 in clone () from /lib64/libc.so.6

ceph_version 15.2.13
We've just seen a crash running 15.2.13.

#27 Updated by Igor Fedotov 17 days ago

  • Duplicated by Bug #58439: octopus osd crash added

#28 Updated by Igor Fedotov 17 days ago

  • Pull request ID changed from 43770 to 47702

#29 Updated by Igor Fedotov 17 days ago

  • Status changed from In Progress to Fix Under Review
