Bug #23206
ceph-osd daemon crashes - *** Caught signal (Aborted) **
Status: Closed
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Upfront: sorry for the title, I don't know a better one.
One of the OSDs on this machine is constantly flapping up/down; the daemon appears to be crashing repeatedly, which is what the log shows.
    -1> 2018-03-03 20:57:52.577447 7f13128e6700  1 -- 10.3.0.6:6807/3779664 --> 10.3.0.100:0/2025 -- osd_ping(ping_reply e18149 stamp 2018-03-03 20:57:52.575794) v4 -- 0x5577885ee200 con 0
     0> 2018-03-03 20:57:52.594993 7f12f80cc700 -1 *** Caught signal (Aborted) **
 in thread 7f12f80cc700 thread_name:tp_osd_tp

 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
 1: (()+0xa65824) [0x557766577824]
 2: (()+0x11390) [0x7f1316c67390]
 3: (gsignal()+0x38) [0x7f1315c02428]
 4: (abort()+0x16a) [0x7f1315c0402a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5577665baa1e]
 6: (BlueStore::_collection_list(BlueStore::Collection*, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x1e8f) [0x5577664131cf]
 7: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x25a) [0x55776641394a]
 8: (PGBackend::objects_list_partial(hobject_t const&, int, int, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x3a0) [0x5577661d3d00]
 9: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x9c1) [0x5577660a2c01]
 10: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x45c) [0x5577660a413c]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12d0) [0x557765fe3f70]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x5577665bf684]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5577665c26c0]
 14: (()+0x76ba) [0x7f1316c5d6ba]
 15: (clone()+0x6d) [0x7f1315cd441d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
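As the NOTE above says, the numeric frames can only be interpreted against the binary. A minimal sketch of how that might be done, assuming the stock /usr/bin/ceph-osd from the 12.2.2 packages with debug symbols installed, and assuming osd.2 with the default data directory layout (the OSD id is taken from the log_file path above):

    # full disassembly with interleaved source, as suggested by the crash dump (large output)
    objdump -rdS /usr/bin/ceph-osd > ceph-osd.dump
    # or resolve a single in-binary offset, e.g. frame 1 at +0xa65824
    addr2line -Cfe /usr/bin/ceph-osd 0xa65824
    # with the daemon stopped, a BlueStore consistency check may also be informative
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2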
Updated by Anonymous about 6 years ago
This is a repeating/ongoing issue; please tell me what I can do to help investigate it.
Updated by Rams C over 5 years ago
We can confirm we are experiencing the same issue on version 12.2.7 and currently have some random OSDs that went offline and won't come up. Several PGs are down and some are inactive.
Updated by Igor Fedotov over 5 years ago
- Project changed from RADOS to bluestore
Updated by Igor Fedotov over 5 years ago
Rams, could you please share your stack trace and the log output preceding the assertion?
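For anyone hitting this, one way to capture that level of detail before the next crash is to raise the relevant debug levels. A sketch, assuming the affected daemon is osd.2 as in the original report:

    # raise BlueStore/OSD logging at runtime (reverts on daemon restart)
    ceph tell osd.2 injectargs '--debug_bluestore 20 --debug_osd 20'

or persistently in ceph.conf on the OSD host:

    [osd.2]
        debug bluestore = 20
        debug osd = 20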
Updated by Sage Weil about 5 years ago
- Status changed from Need More Info to Rejected
not enough info