Bug #23206

ceph-osd daemon crashes - *** Caught signal (Aborted) **

Added by super xor over 3 years ago. Updated over 2 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Upfront: sorry for the title, I couldn't think of a better one.
One of the OSDs on one of our machines keeps going up and down, apparently because the daemon is crashing; that is what the log shows.


    -1> 2018-03-03 20:57:52.577447 7f13128e6700  1 -- 10.3.0.6:6807/3779664 --> 10.3.0.100:0/2025 -- osd_ping(ping_reply e18149 stamp 2018-03-03 20:57:52.575794) v4 -- 0x5577885ee200 con 0
     0> 2018-03-03 20:57:52.594993 7f12f80cc700 -1 *** Caught signal (Aborted) **
 in thread 7f12f80cc700 thread_name:tp_osd_tp

 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
 1: (()+0xa65824) [0x557766577824]
 2: (()+0x11390) [0x7f1316c67390]
 3: (gsignal()+0x38) [0x7f1315c02428]
 4: (abort()+0x16a) [0x7f1315c0402a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5577665baa1e]
 6: (BlueStore::_collection_list(BlueStore::Collection*, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x1e8f) [0x5577664131cf]
 7: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x25a) [0x55776641394a]
 8: (PGBackend::objects_list_partial(hobject_t const&, int, int, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x3a0) [0x5577661d3d00]
 9: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x9c1) [0x5577660a2c01]
 10: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x45c) [0x5577660a413c]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12d0) [0x557765fe3f70]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x5577665bf684]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5577665c26c0]
 14: (()+0x76ba) [0x7f1316c5d6ba]
 15: (clone()+0x6d) [0x7f1315cd441d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---

History

#1 Updated by super xor over 3 years ago

This is a recurring, ongoing issue. Please tell me what I can do to help investigate it.

#2 Updated by Rams C over 2 years ago

We can confirm that we are experiencing the same issue on version 12.2.7. Some random OSDs have gone offline and won't come back up. Several PGs are down and some are inactive.

#3 Updated by Igor Fedotov over 2 years ago

  • Project changed from RADOS to bluestore

#4 Updated by Igor Fedotov over 2 years ago

Rams rams, could you please share your stack trace and log output preceding the assertion?
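For anyone gathering that information, here is a minimal sketch of one way to pull the relevant part out of an OSD log. It is not an official Ceph tool. It assumes the log contains the `*** Caught signal` line and the `--- end dump of recent events ---` line in the form shown in the description. The function name `extract_crash_context` and the `before` parameter are hypothetical, chosen only for illustration.

```python
import sys
from collections import deque
from typing import Iterable, List

# Marker strings copied from the log excerpt in this report (assumed stable).
ABORT_MARKER = "*** Caught signal"
END_MARKER = "--- end dump of recent events ---"


def extract_crash_context(lines: Iterable[str], before: int = 200) -> List[str]:
    """Return up to `before` lines preceding the first abort marker,
    then every line through the end of the recent-events dump.
    Returns an empty list if no abort marker is found."""
    recent = deque(maxlen=before)  # rolling window of lines seen so far
    out: List[str] = []
    in_crash = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_crash:
            if ABORT_MARKER in line:
                in_crash = True
                out = list(recent)  # context that preceded the abort
                out.append(line)
            else:
                recent.append(line)
        else:
            out.append(line)
            if END_MARKER in line:
                break  # stop after the first crash dump
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the user, e.g. the log_file named in the dump.
    with open(sys.argv[1], errors="replace") as fh:
        print("\n".join(extract_crash_context(fh)))
```

Run as a script with the path to the OSD log as its only argument, it prints the lines leading up to the abort along with the backtrace and the logging-level dump, which can then be attached to the ticket.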

#5 Updated by Sage Weil over 2 years ago

  • Status changed from New to Need More Info

#6 Updated by Sage Weil over 2 years ago

  • Status changed from Need More Info to Rejected

not enough info
