Bug #23206
ceph-osd daemon crashes - *** Caught signal (Aborted) **
Status: Closed
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Upfront: sorry for the title, I don't know a better one.
One of the OSDs on this machine is constantly flapping up/down; the daemon appears to be crashing repeatedly, which is what the log shows.
    -1> 2018-03-03 20:57:52.577447 7f13128e6700  1 -- 10.3.0.6:6807/3779664 --> 10.3.0.100:0/2025 -- osd_ping(ping_reply e18149 stamp 2018-03-03 20:57:52.575794) v4 -- 0x5577885ee200 con 0
     0> 2018-03-03 20:57:52.594993 7f12f80cc700 -1 *** Caught signal (Aborted) **
 in thread 7f12f80cc700 thread_name:tp_osd_tp

 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
 1: (()+0xa65824) [0x557766577824]
 2: (()+0x11390) [0x7f1316c67390]
 3: (gsignal()+0x38) [0x7f1315c02428]
 4: (abort()+0x16a) [0x7f1315c0402a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5577665baa1e]
 6: (BlueStore::_collection_list(BlueStore::Collection*, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x1e8f) [0x5577664131cf]
 7: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x25a) [0x55776641394a]
 8: (PGBackend::objects_list_partial(hobject_t const&, int, int, std::vector<hobject_t, std::allocator<hobject_t> >*, hobject_t*)+0x3a0) [0x5577661d3d00]
 9: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x9c1) [0x5577660a2c01]
 10: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x45c) [0x5577660a413c]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12d0) [0x557765fe3f70]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x5577665bf684]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5577665c26c0]
 14: (()+0x76ba) [0x7f1316c5d6ba]
 15: (clone()+0x6d) [0x7f1315cd441d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
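As the NOTE above says, the numeric frames can only be interpreted against the binary. A minimal sketch of how that might be done, assuming the stock /usr/bin/ceph-osd from the 12.2.2 packages with debug symbols installed, and assuming osd.2 with the default data directory layout (the OSD id is taken from the log_file path above):

    # full disassembly with interleaved source, as suggested by the crash dump (large output)
    objdump -rdS /usr/bin/ceph-osd > ceph-osd.dump
    # or resolve a single in-binary offset, e.g. frame 1 at +0xa65824
    addr2line -Cfe /usr/bin/ceph-osd 0xa65824
    # with the daemon stopped, a BlueStore consistency check may also be informative
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2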
Updated by Anonymous about 6 years ago
This is a repeating/ongoing issue; please tell me what I can do to help investigate it.
Updated by Rams C over 5 years ago
We can confirm we are experiencing the same issue on version 12.2.7 and currently have some random OSDs that went offline and won't come up. Several PGs are down and some are inactive.
Updated by Igor Fedotov over 5 years ago
- Project changed from RADOS to bluestore
Updated by Igor Fedotov over 5 years ago
Rams, could you please share your stack trace and the log output preceding the assertion?
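For anyone hitting this, one way to capture that level of detail before the next crash is to raise the relevant debug levels. A sketch, assuming the affected daemon is osd.2 as in the original report:

    # raise BlueStore/OSD logging at runtime (reverts on daemon restart)
    ceph tell osd.2 injectargs '--debug_bluestore 20 --debug_osd 20'

or persistently in ceph.conf on the OSD host:

    [osd.2]
        debug bluestore = 20
        debug osd = 20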
Updated by Sage Weil about 5 years ago
- Status changed from Need More Info to Rejected
not enough info