Bug #22539

closed

bluestore: New OSD - Caught signal - bstore_kv_sync

Added by Brian Woods over 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
OSD, bluestore, bstore_kv_sync
Backport:
luminous, jewel
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After rebuilding a demo cluster, OSDs can no longer be created on one node.

Looking through the log, I see this error:

2017-12-25 10:46:16.766578 7fc92097b700 -1 *** Caught signal (Aborted) **
 in thread 7fc92097b700 thread_name:bstore_kv_sync

 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
 1: (()+0xa16654) [0x562747796654]
 2: (()+0x110c0) [0x7fc93155d0c0]
 3: (gsignal()+0xcf) [0x7fc930524fcf]
 4: (abort()+0x16a) [0x7fc9305263fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5627477de33e]
 6: (Throttle::put(long)+0x38e) [0x5627477d506e]
 7: (BlueStore::_kv_sync_thread()+0x1a18) [0x56274766c308]
 8: (BlueStore::KVSyncThread::entry()+0xd) [0x5627476af04d]
 9: (()+0x7494) [0x7fc931553494]
 10: (clone()+0x3f) [0x7fc9305daaff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2017-12-25 10:46:16.766578 7fc92097b700 -1 *** Caught signal (Aborted) **
 in thread 7fc92097b700 thread_name:bstore_kv_sync

 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
 1: (()+0xa16654) [0x562747796654]
 2: (()+0x110c0) [0x7fc93155d0c0]
 3: (gsignal()+0xcf) [0x7fc930524fcf]
 4: (abort()+0x16a) [0x7fc9305263fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5627477de33e]
 6: (Throttle::put(long)+0x38e) [0x5627477d506e]
 7: (BlueStore::_kv_sync_thread()+0x1a18) [0x56274766c308]
 8: (BlueStore::KVSyncThread::entry()+0xd) [0x5627476af04d]
 9: (()+0x7494) [0x7fc931553494]
 10: (clone()+0x3f) [0x7fc9305daaff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.51004.log
--- end dump of recent events ---

The OSD never comes online.

Complete-ish log at: https://pastebin.com/10e5rjGt

Also, the host appears to lag while OSDs are being created. This may or may not be related. I have validated the health of the physical disks.
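
The backtrace places the failed assertion in Throttle::put(long), called from BlueStore::_kv_sync_thread(), and the related jewel backport is titled "throttle is only 32 bits". The following is a minimal sketch, not Ceph's code, of how a sanity check of that kind can fire when a cost counter is kept in a signed 32-bit integer. It assumes, without confirmation from this report, that a single cost can exceed 2^31 - 1. The names to_int32 and NarrowThrottle are hypothetical.

```python
def to_int32(value: int) -> int:
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


class NarrowThrottle:
    """Hypothetical throttle whose counter is stored in 32 signed bits."""

    def __init__(self) -> None:
        self.count = 0

    def take(self, cost: int) -> int:
        # Storing the sum in 32 bits silently wraps for large costs.
        self.count = to_int32(self.count + cost)
        return self.count

    def put(self, cost: int) -> int:
        # Sanity check: a caller cannot return more than is currently held.
        # With a wrapped (negative) count this fails for a legitimate put.
        assert self.count >= cost, f"put({cost}) exceeds held count {self.count}"
        self.count = to_int32(self.count - cost)
        return self.count
```

In this sketch, taking a cost of 3 GiB (3 * 2**30) leaves the stored count negative, so returning the same cost later fails the check even though the caller only gives back what it took. Small costs pass through take() and put() unchanged.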


Files

ceph-osd.51004-short.log.gz (573 KB), Brian Woods, 12/25/2017 09:27 PM
ceph-osd.51004-small2.log.gz (581 KB), Brian Woods, 12/26/2017 05:45 PM
ceph-osd.51004-working.log.gz (381 KB), Brian Woods, 12/26/2017 10:03 PM
ceph-osd.51004-lagging.log.gz (847 KB), Brian Woods, 12/26/2017 10:43 PM

Related issues 2 (0 open, 2 closed)

Copied to bluestore - Backport #22698: luminous: bluestore: New OSD - Caught signal - bstore_kv_sync (Resolved, Prashant D)
Copied to RADOS - Backport #22906: jewel: bluestore: New OSD - Caught signal - bstore_kv_sync (throttle is only 32 bits) (Rejected)