Bug #53907

BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

Added by Vikhyat Umrao about 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Immediate
Assignee:
Target version: -
% Done: 100%
Source:
Tags: backport_processed
Backport: quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2022-01-18T00:11:09.058+0000 7f904d00d700  4 rocksdb: (Original Log Time 2022/01/18-00:11:09.059551) [db/memtable_list.cc:631] [L] Level-0 commit table #2027: memtable #1 done
2022-01-18T00:11:09.058+0000 7f904d00d700  4 rocksdb: (Original Log Time 2022/01/18-00:11:09.059566) EVENT_LOG_v1 {"time_micros": 1642464669059560, "job": 635, "event": "flush_finished", "output_compression": "NoCompression", "lsm_state": [2, 1, 0, 0, 0, 0, 0], "immutable_memtables": 0}
2022-01-18T00:11:09.058+0000 7f904d00d700  4 rocksdb: (Original Log Time 2022/01/18-00:11:09.059592) [db/db_impl/db_impl_compaction_flush.cc:235] [L] Level summary: files[2 1 0 0 0 0 0] max score 0.50

2022-01-18T00:11:09.058+0000 7f904d00d700  4 rocksdb: [db/db_impl/db_impl_files.cc:420] [JOB 635] Try to delete WAL files size 409400591, prev total WAL file size 414051513, number of live WAL files 3.

2022-01-18T00:11:09.145+0000 7f90485f6700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7f904d00d700 time 2022-01-18T00:11:09.085371+0000
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x561a196bab8e]
 2: /usr/bin/ceph-osd(+0x5d5daf) [0x561a196badaf]
 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x561a19d7ecca]
 4: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x561a19e1cb45]
 5: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xa9) [0x561a19e1d009]
 6: (BlueFS::fsync(BlueFS::FileWriter*)+0x18e) [0x561a19e386de]
 7: (BlueRocksWritableFile::Sync()+0x18) [0x561a19e48fb8]
 8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x561a1a36c74f]
 9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x662) [0x561a1a49cf22]
 10: (rocksdb::WritableFileWriter::Sync(bool)+0xf8) [0x561a1a49e8e8]
 11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x341) [0x561a1a383701]
 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1c04) [0x561a1a38b454]
 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x561a1a38b5a1]
 14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x561a1a325f84]
 15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x561a1a32698a]
 16: (BlueStore::_kv_sync_thread()+0x3530) [0x561a19d7d390]
 17: (BlueStore::KVSyncThread::entry()+0x11) [0x561a19dac0b1]
 18: /lib64/libpthread.so.0(+0x817a) [0x7f905f01817a]
 19: clone()

2022-01-18T00:11:09.146+0000 7f904d00d700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7f904d00d700 time 2022-01-18T00:11:09.085371+0000
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x561a196bab8e]
 2: /usr/bin/ceph-osd(+0x5d5daf) [0x561a196badaf]
 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x561a19d7ecca]
 4: (BlueFS::_drop_link_D(boost::intrusive_ptr<BlueFS::File>)+0x5fd) [0x561a19e1514d]
 5: (BlueFS::unlink(std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >)+0x706) [0x561a19e23ed6]
 6: (BlueRocksEnv::DeleteFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x47) [0x561a19e47167]
 7: (rocksdb::DeleteDBFile(rocksdb::ImmutableDBOptions const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool)+0x9d) [0x561a1a492abd]
 8: (rocksdb::DBImpl::DeleteObsoleteFileImpl(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileType, unsigned long)+0x116) [0x561a1a3a37a6]
 9: (rocksdb::DBImpl::PurgeObsoleteFiles(rocksdb::JobContext&, bool)+0x14c4) [0x561a1a3a6974]
 10: (rocksdb::DBImpl::BackgroundCallFlush(rocksdb::Env::Priority)+0x16b) [0x561a1a39dd8b]
 11: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x24a) [0x561a1a56243a]
 12: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x5d) [0x561a1a5625dd]
 13: /lib64/libstdc++.so.6(+0xc2ba3) [0x7f905e66bba3]
 14: /lib64/libpthread.so.0(+0x817a) [0x7f905f01817a]
 15: clone()

2022-01-18T00:11:29.158+0000 7f42f7cf7240  0 set uid:gid to 167:167 (ceph:ceph)
2022-01-18T00:11:29.158+0000 7f42f7cf7240  0 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev), process ceph-osd, pid 8
2022-01-18T00:11:29.158+0000 7f42f7cf7240  0 pidfile_write: ignore empty --pid-file
2022-01-18T00:11:29.160+0000 7f42f7cf7240  1 bdev(0x55e9a1703400 /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
2022-01-18T00:11:29.161+0000 7f42f7cf7240  1 bdev(0x55e9a1703400 /var/lib/ceph/osd/ceph-4/block) open size 1999839952896 (0x1d19fc00000, 1.8 TiB) block_size 4096 (4 KiB) rotational discard not supported

20231016_osd6.zip - OSD log including recent events before crash (157 KB) Zakhar Kirpichenko, 10/16/2023 08:22 AM


Related issues

Duplicated by bluestore - Bug #53906: BlueStore.h: 4158: FAILED ceph_assert(cur >= fnode.size) Duplicate
Duplicates bluestore - Bug #63161: bluestore: FAILED ceph_assert(cur2 >= p.length) Duplicate
Duplicates bluestore - Bug #63172: BlueStore: FAILED ceph_assert(cur >= fnode.size) Duplicate
Duplicated by bluestore - Bug #63110: Crash in RocksDBBlueFSVolumeSelector::sub_usage via BlueFS::fsync via WriteToWAL in KVSyncThread Duplicate
Duplicated by bluestore - Bug #63352: ceph-osd crashed with ceph_assert(cur2 >= p.length) error message Duplicate
Copied to bluestore - Backport #54209: quincy: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length) Resolved
Copied to bluestore - Backport #62928: pacific: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length) Resolved

History

#1 Updated by Vikhyat Umrao about 2 years ago

- After hitting this crash, systemd restarted the OSD container pod, and the OSD has been running fine since the restart.

#2 Updated by Vikhyat Umrao about 2 years ago

OSD.95 in the same cluster also hit the same assert, was restarted by systemd, and has been running fine since.


2022-01-17T22:56:40.077+0000 7f03e4a29700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7f03e4a29700 time 2022-01-17T22:56:40.066674+0000
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x561ec5505b8e]
 2: /usr/bin/ceph-osd(+0x5d5daf) [0x561ec5505daf]
 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x561ec5bc9cca]
 4: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x561ec5c67b45]
 5: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xa9) [0x561ec5c68009]
 6: (BlueFS::fsync(BlueFS::FileWriter*)+0x18e) [0x561ec5c836de]
 7: (BlueRocksWritableFile::Sync()+0x18) [0x561ec5c93fb8]
 8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x561ec61b774f]
 9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x662) [0x561ec62e7f22]
 10: (rocksdb::WritableFileWriter::Sync(bool)+0xf8) [0x561ec62e98e8]
 11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x341) [0x561ec61ce701]
 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1c04) [0x561ec61d6454]
 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x561ec61d65a1]
 14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x561ec6170f84]
 15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x561ec617198a]
 16: (BlueStore::_kv_sync_thread()+0x3530) [0x561ec5bc8390]
 17: (BlueStore::KVSyncThread::entry()+0x11) [0x561ec5bf70b1]
 18: /lib64/libpthread.so.0(+0x817a) [0x7f03fb44b17a]
 19: clone()

2022-01-17T22:57:03.972+0000 7f33a0448240  0 set uid:gid to 167:167 (ceph:ceph)
2022-01-17T22:57:03.972+0000 7f33a0448240  0 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev), process ceph-osd, pid 8
2022-01-17T22:57:03.972+0000 7f33a0448240  0 pidfile_write: ignore empty --pid-file
2022-01-17T22:57:03.974+0000 7f33a0448240  1 bdev(0x563d0b97d400 /var/lib/ceph/osd/ceph-95/block) open path /var/lib/ceph/osd/ceph-95/block
2022-01-17T22:57:03.975+0000 7f33a0448240  1 bdev(0x563d0b97d400 /var/lib/ceph/osd/ceph-95/block) open size 1999839952896 (0x1d19fc00000, 1.8 TiB) block_size 4096 (4 KiB) rotational discard not supported

#3 Updated by Vikhyat Umrao about 2 years ago

- OSD.0 also had a similar crash:

2022-01-17T23:40:22.136+0000 7f32423b0700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7f32423b0700 time 2022-01-17T23:40:22.040838+0000
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-10229-g7e035110/rpm/el8/BUILD/ceph-17.0.0-10229-g7e035110/src/os/bluestore/BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length)

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55ffca66cb8e]
 2: /usr/bin/ceph-osd(+0x5d5daf) [0x55ffca66cdaf]
 3: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x55ffcad30cca]
 4: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x55ffcadceb45]
 5: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xa9) [0x55ffcadcf009]
 6: (BlueFS::fsync(BlueFS::FileWriter*)+0x18e) [0x55ffcadea6de]
 7: (BlueRocksWritableFile::Sync()+0x18) [0x55ffcadfafb8]
 8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x55ffcb31e74f]
 9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x662) [0x55ffcb44ef22]
 10: (rocksdb::WritableFileWriter::Sync(bool)+0xf8) [0x55ffcb4508e8]
 11: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x341) [0x55ffcb335701]
 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1c04) [0x55ffcb33d454]
 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x55ffcb33d5a1]
 14: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x55ffcb2d7f84]
 15: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x55ffcb2d898a]
 16: (BlueStore::_kv_sync_thread()+0x3530) [0x55ffcad2f390]
 17: (BlueStore::KVSyncThread::entry()+0x11) [0x55ffcad5e0b1]
 18: /lib64/libpthread.so.0(+0x817a) [0x7f3258dd217a]
 19: clone()

2022-01-17T23:40:22.230+0000 7f32423b0700 -1 *** Caught signal (Aborted) **
 in thread 7f32423b0700 thread_name:bstore_kv_sync

 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev)
 1: /lib64/libpthread.so.0(+0x12c20) [0x7f3258ddcc20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b0) [0x55ffca66cbec]
 5: /usr/bin/ceph-osd(+0x5d5daf) [0x55ffca66cdaf]
 6: (RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x16a) [0x55ffcad30cca]
 7: (BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x735) [0x55ffcadceb45]
 8: (BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0xa9) [0x55ffcadcf009]
 9: (BlueFS::fsync(BlueFS::FileWriter*)+0x18e) [0x55ffcadea6de]
 10: (BlueRocksWritableFile::Sync()+0x18) [0x55ffcadfafb8]
 11: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x55ffcb31e74f]
 12: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x662) [0x55ffcb44ef22]
 13: (rocksdb::WritableFileWriter::Sync(bool)+0xf8) [0x55ffcb4508e8]
 14: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x341) [0x55ffcb335701]
 15: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1c04) [0x55ffcb33d454]
 16: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x55ffcb33d5a1]
 17: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x55ffcb2d7f84]
 18: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x55ffcb2d898a]
 19: (BlueStore::_kv_sync_thread()+0x3530) [0x55ffcad2f390]
 20: (BlueStore::KVSyncThread::entry()+0x11) [0x55ffcad5e0b1]
 21: /lib64/libpthread.so.0(+0x817a) [0x7f3258dd217a]
 22: clone()

2022-01-17T23:40:42.537+0000 7fa18eaa0240  0 set uid:gid to 167:167 (ceph:ceph)
2022-01-17T23:40:42.537+0000 7fa18eaa0240  0 ceph version 17.0.0-10229-g7e035110 (7e035110784fba02ba81944e444be9a36932c6a3) quincy (dev), process ceph-osd, pid 7
2022-01-17T23:40:42.537+0000 7fa18eaa0240  0 pidfile_write: ignore empty --pid-file
2022-01-17T23:40:42.555+0000 7fa18eaa0240  1 bdev(0x559239489400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2022-01-17T23:40:42.556+0000 7fa18eaa0240  1 bdev(0x559239489400 /var/lib/ceph/osd/ceph-0/block) open size 1999839952896 (0x1d19fc00000, 1.8 TiB) block_size 4096 (4 KiB) rotational discard not supported
2022-01-17T23:40:42.556+0000 7fa18eaa0240  1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2022-01-17T23:40:42.556+0000 7fa18eaa0240  1 bdev(0x559239488c00 /var/lib/ceph/osd/ceph-0/block.db) open path /var/lib/ceph/osd/ceph-0/block.db

#4 Updated by Neha Ojha about 2 years ago

  • Assignee set to Adam Kupczyk

#5 Updated by Neha Ojha about 2 years ago

  • Duplicated by Bug #53906: BlueStore.h: 4158: FAILED ceph_assert(cur >= fnode.size) added

#6 Updated by Neha Ojha about 2 years ago

  • Priority changed from Normal to Immediate

#7 Updated by Adam Kupczyk about 2 years ago

  • Pull request ID set to 44713

#8 Updated by Igor Fedotov about 2 years ago

  • Status changed from New to Fix Under Review

#9 Updated by Neha Ojha about 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to quincy

#10 Updated by Backport Bot about 2 years ago

  • Copied to Backport #54209: quincy: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length) added

#11 Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed

#12 Updated by Igor Fedotov over 1 year ago

  • Status changed from Pending Backport to Resolved

#13 Updated by Igor Fedotov 6 months ago

  • Copied to Backport #62928: pacific: BlueStore.h: 4148: FAILED ceph_assert(cur >= p.length) added

#14 Updated by Igor Fedotov 6 months ago

  • Status changed from Resolved to Pending Backport

#15 Updated by Maximilian Stinsky 5 months ago

Hello.

We just upgraded our Ceph cluster from 16.2.13 to 16.2.14 and saw an OSD crash with the same error message as the one in this bug report:

ceph-16.2.14/src/os/bluestore/BlueStore.h: 3870: FAILED ceph_assert(cur >= p.length)

The OSD comes up without any problems after that.
We are not sure whether this is related to the minor upgrade, because before the update to 16.2.14 we had OSDs failing every day due to another bug.

Should we create a new bug report for the ceph_assert error, or could it be related to the issue described here?

#16 Updated by Zakhar Kirpichenko 5 months ago

We're also affected by this bug. It seems to have been introduced in 16.2.14, as the crashes were not happening before we upgraded to this version.

#17 Updated by Igor Fedotov 5 months ago

@Maximilian @Zakhar - the issue will be fixed in the next Pacific minor release. The relevant backport PR is https://github.com/ceph/ceph/pull/53587

Please also see my post at https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/ for more details on how to work around this.

#18 Updated by Igor Fedotov 5 months ago

  • Duplicates Bug #63161: bluestore: FAILED ceph_assert(cur2 >= p.length) added

#19 Updated by Igor Fedotov 5 months ago

  • Status changed from Pending Backport to Duplicate

#20 Updated by Igor Fedotov 5 months ago

  • Duplicates Bug #63172: BlueStore: FAILED ceph_assert(cur >= fnode.size) added

#21 Updated by Igor Fedotov 5 months ago

  • Status changed from Duplicate to Pending Backport

#22 Updated by jinzhi zhang 5 months ago

We also ran into the same issue after upgrading from 16.2.13 to 16.2.14:

2023-10-15T20:08:11.719196Z_d95e3d5d-3cfe-4a53-9025-daf1745f345d  osd.6    *
2023-10-15T22:27:03.691912Z_1b30c7d8-98f7-448c-a4d1-035c03ee79e7  osd.17   *
2023-10-16T01:38:59.282064Z_0ce2a23a-fe39-4aae-aceb-6cb17864f593  osd.4    *
2023-10-16T04:49:14.343657Z_912bda6e-dc46-4dbb-bc7d-697a709f626c  osd.0    *
2023-10-16T04:56:47.076272Z_bbf9634b-c2ba-40ea-bd20-9deca41c60fd  osd.10   *
2023-10-16T10:21:53.195344Z_610d7e04-a997-4562-89e5-ac86bf78f9e5  osd.23   *
2023-10-16T13:40:57.835138Z_c16c472d-9699-49b1-87d3-d7e2177c2912  osd.9    *
2023-10-16T13:40:57.840230Z_78e52ea5-396a-46ae-bb91-1cece6f3473b  osd.9    *
2023-10-16T20:00:21.100773Z_83aba303-ddec-4ffa-aa95-c5eea6466165  osd.11   *
2023-10-16T21:58:17.103333Z_2d5f5025-95f5-44a6-bbc9-ea60fa9381ee  osd.7    *
2023-10-17T01:19:32.163215Z_8b418d76-3925-4818-8de5-3f5181b930be  osd.6    *
2023-10-17T01:19:32.178271Z_ef39c70c-cdef-4f7a-90ca-54938165b5df  osd.6    *
2023-10-17T01:40:06.571108Z_800aedb2-6092-4f01-89d9-cb214a9ec6e4  osd.21   *

The crash info is:
{
    "assert_condition": "cur2 >= p.length",
    "assert_file": "/root/rpmbuild/BUILD/ceph/src/os/bluestore/BlueStore.h",
    "assert_func": "virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)",
    "assert_line": 3875,
    "assert_msg": "/root/rpmbuild/BUILD/ceph/src/os/bluestore/BlueStore.h: In function 'virtual void RocksDBBlueFSVolumeSelector::sub_usage(void*, const bluefs_fnode_t&)' thread 7fba288fb700 time 2023-10-17T01:40:06.473145+0000\n/root/rpmbuild/BUILD/ceph/src/os/bluestore/BlueStore.h: 3875: FAILED ceph_assert(cur2 >= p.length)\n",
    "assert_thread_name": "bstore_kv_sync",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12c20) [0x7fba3d320c20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x55fc33a4d199]",
        "ceph-osd(+0x540362) [0x55fc33a4d362]",
        "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x15e) [0x55fc340fb6ee]",
        "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned long)+0x74d) [0x55fc3418e5cd]",
        "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90) [0x55fc3418ea70]",
        "(BlueFS::fsync(BlueFS::FileWriter*)+0x181) [0x55fc341aa711]",
        "(BlueRocksWritableFile::Sync()+0x18) [0x55fc341bbb98]",
        "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x55fc3468537f]",
        "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402) [0x55fc347a2e92]",
        "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x55fc347a4558]",
        "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x30b) [0x55fc34696a1b]",
        "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x2687) [0x55fc3469f747]",
        "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21) [0x55fc3469f941]",
        "(RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x84) [0x55fc3463e304]",
        "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x9a) [0x55fc3463ed0a]",
        "(BlueStore::_kv_sync_thread()+0x2f78) [0x55fc340f9c38]",
        "(BlueStore::KVSyncThread::entry()+0x11) [0x55fc34121ab1]",
        "(Thread::entry_wrapper()+0x53) [0x55fc3421f4e3]",
        "/lib64/libpthread.so.0(+0x817a) [0x7fba3d31617a]",
        "clone()" 
    ],
    "ceph_version": "16.2.14-5.0.1",
    "crash_id": "2023-10-17T01:40:06.571108Z_800aedb2-6092-4f01-89d9-cb214a9ec6e4",
    "entity_name": "osd.21",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "d4133fd81fc283e10ef4edc13fc01336f8bd22579ecab0ea9c0653bda7f3ffca",
    "timestamp": "2023-10-17T01:40:06.571108Z",
    "utsname_hostname": "storage-001",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.61-050461-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#202008260931 SMP Wed Aug 26 09:34:29 UTC 2020" 
}
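
For triage across many OSDs, the crash listing and the JSON report in this comment can be condensed with a few lines of Python. This is a rough sketch under stated assumptions: the input is taken to have the same layout as the listing and report pasted above (which look like output of `ceph crash ls` and `ceph crash info`), and the function names crashes_per_osd and summarize_crash are hypothetical helpers, not part of Ceph.

```python
import json
from collections import Counter


def crashes_per_osd(listing_text):
    """Count crash entries per daemon in lines of the form '<crash id>  <entity>  *'."""
    counts = Counter()
    for line in listing_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            counts[parts[1]] += 1  # second column is the entity, e.g. "osd.6"
    return dict(counts)


def summarize_crash(report_text):
    """Pick out the fields that identify the failed assertion in a crash report."""
    report = json.loads(report_text)
    backtrace = report.get("backtrace", [])
    return {
        "entity": report.get("entity_name"),
        "version": report.get("ceph_version"),
        "assert": report.get("assert_condition"),
        "line": report.get("assert_line"),
        "thread": report.get("assert_thread_name"),
        # True when any frame goes through the volume selector's sub_usage.
        "in_sub_usage": any(
            "RocksDBBlueFSVolumeSelector::sub_usage" in frame for frame in backtrace
        ),
    }


# Three entries copied from the listing in this comment.
SAMPLE_LISTING = """\
2023-10-15T20:08:11.719196Z_d95e3d5d-3cfe-4a53-9025-daf1745f345d  osd.6    *
2023-10-16T13:40:57.835138Z_c16c472d-9699-49b1-87d3-d7e2177c2912  osd.9    *
2023-10-17T01:19:32.163215Z_8b418d76-3925-4818-8de5-3f5181b930be  osd.6    *
"""

# A trimmed copy of the report in this comment; most fields are omitted.
SAMPLE_REPORT = """
{
    "assert_condition": "cur2 >= p.length",
    "assert_line": 3875,
    "assert_thread_name": "bstore_kv_sync",
    "backtrace": [
        "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t const&)+0x15e) [0x55fc340fb6ee]",
        "(BlueFS::fsync(BlueFS::FileWriter*)+0x181) [0x55fc341aa711]"
    ],
    "ceph_version": "16.2.14-5.0.1",
    "entity_name": "osd.21"
}
"""

print(crashes_per_osd(SAMPLE_LISTING))
print(summarize_crash(SAMPLE_REPORT))
```

Grouping reports by the "assert" and "in_sub_usage" fields makes it easier to tell this crash apart from unrelated OSD failures on the same cluster.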

#23 Updated by Igor Fedotov 5 months ago

  • Duplicated by Bug #63110: Crash in RocksDBBlueFSVolumeSelector::sub_usage via BlueFS::fsync via WriteToWAL in KVSyncThread added

#24 Updated by Igor Fedotov 5 months ago

  • Duplicated by Bug #63352: ceph-osd crashed with ceph_assert(cur2 >= p.length) error message added

#25 Updated by Konstantin Shalygin 5 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
