Bug #22539
bluestore: New OSD - Caught signal - bstore_kv_sync
Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: OSD, bluestore, bstore_kv_sync
Backport: luminous, jewel
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
After rebuilding a demo cluster, an OSD can no longer be created on one of the nodes.
Looking through the log, I see this error:
2017-12-25 10:46:16.766578 7fc92097b700 -1 *** Caught signal (Aborted) **
 in thread 7fc92097b700 thread_name:bstore_kv_sync
 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
 1: (()+0xa16654) [0x562747796654]
 2: (()+0x110c0) [0x7fc93155d0c0]
 3: (gsignal()+0xcf) [0x7fc930524fcf]
 4: (abort()+0x16a) [0x7fc9305263fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5627477de33e]
 6: (Throttle::put(long)+0x38e) [0x5627477d506e]
 7: (BlueStore::_kv_sync_thread()+0x1a18) [0x56274766c308]
 8: (BlueStore::KVSyncThread::entry()+0xd) [0x5627476af04d]
 9: (()+0x7494) [0x7fc931553494]
 10: (clone()+0x3f) [0x7fc9305daaff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
 0> 2017-12-25 10:46:16.766578 7fc92097b700 -1 *** Caught signal (Aborted) **
 in thread 7fc92097b700 thread_name:bstore_kv_sync
 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
 1: (()+0xa16654) [0x562747796654]
 2: (()+0x110c0) [0x7fc93155d0c0]
 3: (gsignal()+0xcf) [0x7fc930524fcf]
 4: (abort()+0x16a) [0x7fc9305263fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5627477de33e]
 6: (Throttle::put(long)+0x38e) [0x5627477d506e]
 7: (BlueStore::_kv_sync_thread()+0x1a18) [0x56274766c308]
 8: (BlueStore::KVSyncThread::entry()+0xd) [0x5627476af04d]
 9: (()+0x7494) [0x7fc931553494]
 10: (clone()+0x3f) [0x7fc9305daaff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
 0/ 5 none
 0/ 1 lockdep
 0/ 1 context
 1/ 1 crush
 1/ 5 mds
 1/ 5 mds_balancer
 1/ 5 mds_locker
 1/ 5 mds_log
 1/ 5 mds_log_expire
 1/ 5 mds_migrator
 0/ 1 buffer
 0/ 1 timer
 0/ 1 filer
 0/ 1 striper
 0/ 1 objecter
 0/ 5 rados
 0/ 5 rbd
 0/ 5 rbd_mirror
 0/ 5 rbd_replay
 0/ 5 journaler
 0/ 5 objectcacher
 0/ 5 client
 1/ 5 osd
 0/ 5 optracker
 0/ 5 objclass
 1/ 3 filestore
 1/ 3 journal
 0/ 5 ms
 1/ 5 mon
 0/10 monc
 1/ 5 paxos
 0/ 5 tp
 1/ 5 auth
 1/ 5 crypto
 1/ 1 finisher
 1/ 1 reserver
 1/ 5 heartbeatmap
 1/ 5 perfcounter
 1/ 5 rgw
 1/10 civetweb
 1/ 5 javaclient
 1/ 5 asok
 1/ 1 throttle
 0/ 0 refs
 1/ 5 xio
 1/ 5 compressor
 1/ 5 bluestore
 1/ 5 bluefs
 1/ 3 bdev
 1/ 5 kstore
 4/ 5 rocksdb
 4/ 5 leveldb
 4/ 5 memdb
 1/ 5 kinetic
 1/ 5 fuse
 1/ 5 mgr
 1/ 5 mgrc
 1/ 5 dpdk
 1/ 5 eventtrace
 -2/-2 (syslog threshold)
 -1/-1 (stderr threshold)
 max_recent 10000
 max_new 1000
 log_file /var/log/ceph/ceph-osd.51004.log
--- end dump of recent events ---
The OSD never comes online.
Complete-ish log at: https://pastebin.com/10e5rjGt
Also, the host appears to lag while OSDs are being created; this may or may not be related. I have validated the health of the physical disks.
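For anyone comparing this crash against other reports, the frames of the backtrace in the description can be pulled out of the log text with a short script. This is only a sketch: the names `Frame`, `parse_frames`, `first_backtrace`, and `resolved_symbols` are made up for illustration and are not part of Ceph, and the regular expression assumes the `N: (symbol+0xoffset) [0xaddress]` frame layout seen in this report.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    symbol: Optional[str]  # None when the frame was not symbolized, i.e. "()"
    offset: int
    address: int


# Assumed frame layout, e.g. "6: (Throttle::put(long)+0x38e) [0x5627477d506e]"
FRAME_RE = re.compile(r"(\d+): \((.*?)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]")


def parse_frames(log_text: str) -> List[Frame]:
    """Return every backtrace frame found in the text, in order of appearance."""
    frames = []
    for idx, sym, off, addr in FRAME_RE.findall(log_text):
        symbol = None if sym == "()" else sym
        frames.append(Frame(int(idx), symbol, int(off, 16), int(addr, 16)))
    return frames


def first_backtrace(frames: List[Frame]) -> List[Frame]:
    """Keep frames up to the point where numbering restarts.

    The log repeats the same trace inside the "dump of recent events",
    so only the first copy is kept.
    """
    out: List[Frame] = []
    for frame in frames:
        if out and frame.index <= out[-1].index:
            break
        out.append(frame)
    return out


def resolved_symbols(frames: List[Frame]) -> List[str]:
    """List the symbol names of frames that were symbolized."""
    return [f.symbol for f in frames if f.symbol is not None]


if __name__ == "__main__":
    # Excerpt of the trace from this report.
    excerpt = (
        "5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5627477de33e] "
        "6: (Throttle::put(long)+0x38e) [0x5627477d506e] "
        "7: (BlueStore::_kv_sync_thread()+0x1a18) [0x56274766c308]"
    )
    for name in resolved_symbols(first_backtrace(parse_frames(excerpt))):
        print(name)
```

Run against the full log, this makes it easier to see that the abort comes from an assertion reached through `Throttle::put(long)` called from `BlueStore::_kv_sync_thread()`. The trace does not say which assertion failed.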