Bug #43538

BlueFS volume selector assert

Added by Adam Kupczyk about 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have been running tests that operate only on rocksdb.

During intensive omap writes, the following sometimes happens:
-3> 2020-01-09T06:33:11.782-0500 7fb3b9b19700 4 rocksdb: [db/compaction_job.cc:1332] [default] [JOB 1126] Generated table #4214: 32235 keys, 2258207 bytes
-2> 2020-01-09T06:33:11.782-0500 7fb3b9b19700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1578569591783536, "cf_name": "default", "job": 1126, "event": "table_file_creation", "file_number": 4214, "file_size": 2258207, "table_properties": {"data_size": 2155594, "index_size": 21047, "filter_size": 80709, "raw_key_size": 896982, "raw_average_key_size": 27, "raw_value_size": 1704768, "raw_average_value_size": 52, "num_data_blocks": 533, "num_entries": 32235, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}}
-1> 2020-01-09T06:33:11.793-0500 7fb3b4b0f700 -1 /work/adam/ceph-4/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7fb3b4b0f700 time 2020-01-09T06:33:11.784907-0500
/work/adam/ceph-4/src/os/bluestore/BlueStore.h: 3677: FAILED ceph_assert(cur >= p.length)

ceph version 15.0.0-8514-gf01523d (f01523d0021e786f660e1d2823c38088ea7fe49c) octopus (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x56309434c3cc]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x56309434c59a]
3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x2a4) [0x5630948f9f64]
4: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0xf25) [0x563094973be5]
5: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x563094974dab]
6: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0x8a3) [0x563094975aa3]
7: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x92) [0x563094978032]
8: (BlueRocksWritableFile::Sync()+0x63) [0x563094999e63]
9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x321) [0x563094f79421]
10: (rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x563094f7a9c8]
11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x304) [0x563094e0b614]
12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x257e) [0x563094e1404e]
13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x563094e14531]
14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x81) [0x563094dc8331]
15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x97) [0x563094dc8da7]
16: (BlueStore_DB_Hash::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x4b) [0x5630948fbdbb]
17: (BlueStore::_kv_sync_thread()+0x2279) [0x5630948d6c99]
18: (BlueStore::KVSyncThread::entry()+0xd) [0x5630948fb63d]
19: (()+0x7e25) [0x7fb3c8b3ce25]
20: (clone()+0x6d) [0x7fb3c7ba434d]
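
For context, the assertion that fails lives in the volume selector's usage accounting: `sub_usage()` walks the file's extents and subtracts each extent length from a usage counter, and `ceph_assert(cur >= p.length)` guards that counter against underflow. A minimal, simplified sketch of that shape (hypothetical; the real RocksDBBlueFSVolumeSelector in src/os/bluestore/BlueStore.h keeps a per-level, per-device matrix):

// Simplified, hypothetical sketch of the accounting behind the failed assert;
// not the verbatim BlueStore.h code.
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct Extent {
  unsigned bdev;    // BlueFS device the extent lives on
  uint64_t length;  // extent length in bytes
};

struct VolumeSelectorSketch {
  std::map<unsigned, uint64_t> used_bytes;  // per-device usage counter

  void add_usage(const std::vector<Extent>& extents) {
    for (const auto& p : extents)
      used_bytes[p.bdev] += p.length;
  }

  void sub_usage(const std::vector<Extent>& extents) {
    for (const auto& p : extents) {
      uint64_t& cur = used_bytes[p.bdev];
      assert(cur >= p.length);  // analogue of the ceph_assert in the report
      cur -= p.length;          // would underflow if usage was already subtracted
    }
  }
};

int main() {
  VolumeSelectorSketch vs;
  vs.add_usage({{0, 4096}});
  vs.sub_usage({{0, 4096}});  // balanced add/sub keeps the invariant
  return 0;
}

Subtracting the same fnode's extents twice without an add_usage() in between is exactly the situation the assert is meant to catch.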

Related issues

Copied to bluestore - Backport #47213: nautilus: BlueFS volume selector assert (Resolved)

History

#1 Updated by Adam Kupczyk about 4 years ago

This issue is caused by the interaction of `BlueFS::_flush_and_sync_log` and `BlueFS::_compact_log_async` executing at the same time. The two backtraces below show the colliding threads; see the sketch after them.

#0 lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fb3c8b3edcb in _L_lock_812 () from /lib64/libpthread.so.0
#2 0x00007fb3c8b3ec98 in __GI___pthread_mutex_lock (mutex=0x56309ff66808) at ../nptl/pthread_mutex_lock.c:79
#3 0x00005630949716e5 in __gthread_mutex_lock (__mutex=0x56309ff66808)
at /opt/rh/devtoolset-7/root/usr/include/c++/7/x86_64-redhat-linux/bits/gthr-default.h:748
#4 lock (this=0x56309ff66808) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_mutex.h:103
#5 BlueFS::_flush_bdev_safely (this=this@entry=0x56309ff66800, h=0x56319fb64c80)
at /work/adam/ceph-4/src/os/bluestore/BlueFS.cc:2995
#6 0x00005630949864c3 in BlueFS::_compact_log_async (this=this@entry=0x56309ff66800, l=...)
at /work/adam/ceph-4/src/os/bluestore/BlueFS.cc:2407
#7 0x0000563094987c65 in BlueFS::sync_metadata (this=0x56309ff66800) at /work/adam/ceph-4/src/os/bluestore/BlueFS.cc:3208
#8 0x0000563094999b9d in BlueRocksDirectory::Fsync (this=<optimized out>)
at /work/adam/ceph-4/src/os/bluestore/BlueRocksEnv.cc:302
#9 0x0000563094fc50fe in rocksdb::CompactionJob::Run (this=this@entry=0x7fb3b9b15a20)
at /work/adam/ceph-4/src/rocksdb/db/compaction_job.cc:627
#10 0x0000563094e1e5e7 in rocksdb::DBImpl::BackgroundCompaction (this=this@entry=0x5630a002c800,
made_progress=made_progress@entry=0x7fb3b9b15e86, job_context=job_context@entry=0x7fb3b9b15ea0,
log_buffer=log_buffer@entry=0x7fb3b9b16070, prepicked_compaction=prepicked_compaction@entry=0x0,
thread_pri=rocksdb::Env::LOW) at /work/adam/ceph-4/src/rocksdb/db/db_impl_compaction_flush.cc:2623
#11 0x0000563094e24ed6 in rocksdb::DBImpl::BackgroundCallCompaction (this=this@entry=0x5630a002c800,
prepicked_compaction=prepicked_compaction@entry=0x0, bg_thread_pri=bg_thread_pri@entry=rocksdb::Env::LOW)
at /work/adam/ceph-4/src/rocksdb/db/db_impl_compaction_flush.cc:2192
#12 0x0000563094e2538a in rocksdb::DBImpl::BGWorkCompaction (arg=<optimized out>)
at /work/adam/ceph-4/src/rocksdb/db/db_impl_compaction_flush.cc:1972
#13 0x0000563095002c84 in operator() (this=0x7fb3b9b169f0)
at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706
#14 rocksdb::ThreadPoolImpl::Impl::BGThread (this=this@entry=0x5630a0104d20, thread_id=thread_id@entry=0)
at /work/adam/ceph-4/src/rocksdb/util/threadpool_imp.cc:265
#15 0x0000563095002e1d in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper (arg=0x56309f2e2990)
at /work/adam/ceph-4/src/rocksdb/util/threadpool_imp.cc:306
#16 0x00005630950bb26f in execute_native_thread_routine ()
#17 0x00007fb3c8b3ce25 in start_thread (arg=0x7fb3b9b19700) at pthread_create.c:308
#18 0x00007fb3c7ba434d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

#0 0x00007fb3c8b444ab in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1 0x00005630949cd1f9 in reraise_fatal (signum=6) at /work/adam/ceph-4/src/global/signal_handler.cc:81
#2 handle_fatal_signal (signum=6) at /work/adam/ceph-4/src/global/signal_handler.cc:326
#3 <signal handler called>
#4 0x00007fb3c7ae11f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5 0x00007fb3c7ae28e8 in __GI_abort () at abort.c:90
#6 0x000056309434c41b in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>,
func=<optimized out>) at /work/adam/ceph-4/src/common/assert.cc:73
#7 0x000056309434c59a in ceph::__ceph_assert_fail (ctx=...) at /work/adam/ceph-4/src/common/assert.cc:78
#8 0x00005630948f9f64 in RocksDBBlueFSVolumeSelector::sub_usage (this=0x56309f332a80, hint=0x1, fnode=...)
at /work/adam/ceph-4/src/os/bluestore/BlueStore.h:3677
#9 0x0000563094973be5 in BlueFS::_flush_range (this=this@entry=0x56309ff66800, h=h@entry=0x56309f371b80,
offset=offset@entry=22020096, length=length@entry=4096) at /work/adam/ceph-4/src/os/bluestore/BlueFS.cc:2679
#10 0x0000563094974dab in BlueFS::_flush (this=this@entry=0x56309ff66800, h=0x56309f371b80, force=force@entry=true)
at /work/adam/ceph-4/src/os/bluestore/BlueFS.cc:2916
#11 0x0000563094975aa3 in BlueFS::_flush_and_sync_log (this=this@entry=0x56309ff66800, l=..., want_seq=want_seq@entry=1507654,
jump_to=jump_to@entry=0) at /work/adam/ceph-4/src/os/bluestore/BlueFS.cc:2579
#12 0x0000563094978032 in BlueFS::_fsync (this=this@entry=0x56309ff66800, h=h@entry=0x5630a2ba1e00, l=...)
at /work/adam/ceph-4/src/os/bluestore/BlueFS.cc:2976
#13 0x0000563094999e63 in fsync (h=0x5630a2ba1e00, this=0x56309ff66800) at /work/adam/ceph-4/src/os/bluestore/BlueFS.h:561
#14 BlueRocksWritableFile::Sync (this=<optimized out>) at /work/adam/ceph-4/src/os/bluestore/BlueRocksEnv.cc:220
#15 0x0000563094f79421 in rocksdb::WritableFileWriter::SyncInternal (this=this@entry=0x5630f7fc6f00,
use_fsync=use_fsync@entry=false) at /work/adam/ceph-4/src/rocksdb/util/file_reader_writer.cc:426
#16 0x0000563094f7a9c8 in rocksdb::WritableFileWriter::Sync (this=0x5630f7fc6f00, use_fsync=<optimized out>)
at /work/adam/ceph-4/src/rocksdb/util/file_reader_writer.cc:395
#17 0x0000563094e0b614 in rocksdb::DBImpl::WriteToWAL (this=this@entry=0x5630a002c800, write_group=...,
log_writer=log_writer@entry=0x5630d76bba40, log_used=log_used@entry=0x0, need_log_sync=<optimized out>,
need_log_dir_sync=<optimized out>, sequence=<optimized out>) at /work/adam/ceph-4/src/rocksdb/db/db_impl_write.cc:884
#18 0x0000563094e1404e in rocksdb::DBImpl::WriteImpl (this=0x5630a002c800, write_options=..., my_batch=<optimized out>,
callback=callback@entry=0x0, log_used=log_used@entry=0x0, log_ref=<optimized out>, disable_memtable=<optimized out>,
seq_used=<optimized out>, batch_cnt=<optimized out>, pre_release_callback=<optimized out>)
at /work/adam/ceph-4/src/rocksdb/db/db_impl_write.cc:282
#19 0x0000563094e14531 in rocksdb::DBImpl::Write (this=<optimized out>, write_options=..., my_batch=<optimized out>)
at /work/adam/ceph-4/src/rocksdb/db/db_impl_write.cc:55
#20 0x0000563094dc8331 in RocksDBStore::submit_common (this=this@entry=0x5630a0240000, woptions=..., t=...)
at /work/adam/ceph-4/src/kv/RocksDBStore.cc:1302
#21 0x0000563094dc8da7 in RocksDBStore::submit_transaction_sync (this=0x5630a0240000, t=...)
at /work/adam/ceph-4/src/kv/RocksDBStore.cc:1354
#22 0x00005630948fbdbb in BlueStore_DB_Hash::submit_transaction_sync (this=this@entry=0x56309f309340, t=...)
at /work/adam/ceph-4/src/os/bluestore/BlueStore_DB_Hash.cc:653
#23 0x00005630948d6c99 in BlueStore::_kv_sync_thread (this=0x5630a0022000)
at /work/adam/ceph-4/src/os/bluestore/BlueStore.cc:12140
#24 0x00005630948fb63d in BlueStore::KVSyncThread::entry (this=<optimized out>)
at /work/adam/ceph-4/src/os/bluestore/BlueStore.h:1905
#25 0x00007fb3c8b3ce25 in start_thread (arg=0x7fb3b4b0f700) at pthread_create.c:308
#26 0x00007fb3c7ba434d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
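
The two backtraces show the colliding paths: the RocksDB compaction thread is inside `BlueFS::_compact_log_async()` waiting in `_flush_bdev_safely()`, while the kv_sync thread's `_flush_and_sync_log()` path asserts inside `sub_usage()`. A hypothetical, single-threaded reconstruction of that interleaving (names and sizes are illustrative and consistent with the fix description in the next comment, not taken from the real BlueFS code):

// Hypothetical, deterministic reconstruction of the interleaving; names and
// values are illustrative only, not taken from the real BlueFS code.
#include <cassert>
#include <cstdint>
#include <mutex>

std::mutex bluefs_lock;     // stands in for the BlueFS lock
uint64_t accounted = 4096;  // bytes currently accounted to the BlueFS log fnode

// Analogue of vselector->sub_usage(): must never drive the counter negative.
void sub_usage(uint64_t len) {
  assert(accounted >= len);  // analogue of ceph_assert(cur >= p.length)
  accounted -= len;
}

// Analogue of vselector->add_usage().
void add_usage(uint64_t len) { accounted += len; }

int main() {
  // "Thread A": _compact_log_async() subtracts the log file's usage ...
  std::unique_lock<std::mutex> a(bluefs_lock);
  sub_usage(4096);
  // ... and _flush_bdev_safely() then drops the lock while waiting for the
  // device flush, before the matching add_usage() has run.
  a.unlock();

  // "Thread B": _flush_and_sync_log() -> _flush_range() sneaks into that
  // window and subtracts usage for the same fnode a second time.
  std::unique_lock<std::mutex> b(bluefs_lock);
  sub_usage(4096);  // accounted is already 0, so the assert fires here

  add_usage(4096);  // never reached in this run
  return 0;
}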

#2 Updated by Adam Kupczyk about 4 years ago

Fix proposal: https://github.com/ceph/ceph/pull/32584

The gist of the solution is moving the `vselector->sub_usage()` call.
Now `sub_usage()` and `add_usage()` no longer span over `_flush_bdev_safely(new_log_writer);`.
`_flush_bdev_safely()` drops the BlueFS lock, which allowed `_flush_and_sync_log()` to sneak in and hit the usage-accounting assert.
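
Read that way, the fix is an ordering change: the paired accounting update in `_compact_log_async()` completes while the lock is continuously held, instead of straddling the window where `_flush_bdev_safely()` drops it. A hypothetical counterpart to the sketch in the previous comment (again illustrative only, not the actual PR 32584 diff):

// Hypothetical sketch of the corrected ordering; illustrative only, not the
// actual PR 32584 change.
#include <cassert>
#include <cstdint>
#include <mutex>

std::mutex bluefs_lock;
uint64_t accounted = 4096;  // bytes accounted to the BlueFS log fnode

void sub_usage(uint64_t len) {
  assert(accounted >= len);  // analogue of ceph_assert(cur >= p.length)
  accounted -= len;
}

void add_usage(uint64_t len) { accounted += len; }

int main() {
  // "Thread A": the sub_usage()/add_usage() pair for the log rewrite is
  // completed back to back, before the lock is ever dropped.
  std::unique_lock<std::mutex> a(bluefs_lock);
  sub_usage(4096);  // old log fnode
  add_usage(4096);  // new log fnode
  a.unlock();       // only now does the flush drop the lock

  // "Thread B": _flush_and_sync_log() observes consistent counters, so its
  // own balanced update passes the assert.
  std::unique_lock<std::mutex> b(bluefs_lock);
  sub_usage(4096);
  add_usage(4096);
  return 0;
}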

#3 Updated by Kefu Chai about 4 years ago

  • Status changed from New to Resolved
  • Pull request ID set to 32584

#4 Updated by Pietari Hyvärinen over 3 years ago

I'm seeing this kill almost every OSD, one by one (during a 48h cycle), on one of our production clusters after upgrading from 14.2.9 to 14.2.11. Would it be possible to get this backported to nautilus?

#5 Updated by Igor Fedotov over 3 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to nautilus

#6 Updated by Igor Fedotov over 3 years ago

#7 Updated by Igor Fedotov over 3 years ago

  • Status changed from Pending Backport to Resolved
