Bug #21932
OSD crash on boot with assert caused by Bluefs on flush write
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: bluestore, crash, osd
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Description
After a network outage, some nodes in the cluster were hard-restarted and came up with one or more crashing OSD instances. All affected OSDs show the same issue.
The version is Luminous (12.2.1), installed from Ubuntu packages.
Bluestore is configured with a separate WAL.db device (NVMe).
The OSD cannot be started: it crashes immediately after startup.
root@ceph10:/var/log/ceph# /usr/bin/ceph-osd -f --cluster ceph --id 106 --setuser ceph --setgroup ceph
starting osd.106 at - osd_data /var/lib/ceph/osd/ceph-106 /var/lib/ceph/osd/ceph-106/journal
tcmalloc: large alloc 1497374720 bytes == 0x55f2b67ee000 @ 0x7f237e9ce1e1 0x7f237d751499 0x7f237d752833 0x55f24de2f359 0x55f24de20a48 0x55f24de22394 0x55f24de2375c 0x55f24de24ff1 0x55f24da064df 0x55f24d992623 0x55f24d9c2de7 0x55f24d9c240e 0x55f24d53acff 0x55f24d44d1f8 0x7f237cd69830 0x55f24d4d8a59 (nil)
tcmalloc: large alloc 1313308672 bytes == 0x55f25d6a6000 @ 0x7f237e9ce1e1 0x7f237d751499 0x7f237d75200b 0x55f24de66e59 0x55f24de20c4a 0x55f24de22394 0x55f24de2375c 0x55f24de24ff1 0x55f24da064df 0x55f24d992623 0x55f24d9c2de7 0x55f24d9c240e 0x55f24d53acff 0x55f24d44d1f8 0x7f237cd69830 0x55f24d4d8a59 (nil)
/build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7f237f8fce00 time 2017-10-26 00:53:40.482785
/build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55f24db033f2]
 2: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x1598) [0x55f24daa8a78]
 3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x9c0) [0x55f24da832c0]
 4: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x124) [0x55f24da84cf4]
 5: (BlueRocksWritableFile::Flush()+0x3d) [0x55f24da9ae6d]
 6: (rocksdb::WritableFileWriter::Flush()+0x24c) [0x55f24ded33ac]
 7: (rocksdb::WritableFileWriter::Sync(bool)+0x3e) [0x55f24ded45ce]
 8: (rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIterator*, std::unique_ptr<rocksdb::InternalIterator, std::default_delete<rocksdb::InternalIterator> >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::CompressionType, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int)+0x190f) [0x55f24def304f]
 9: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xad9) [0x55f24de1edd9]
 10: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0x17ec) [0x55f24de20e2c]
 11: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x8c4) [0x55f24de22394]
 12: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xedc) [0x55f24de2375c]
 13: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x6b1) [0x55f24de24ff1]
 14: (RocksDBStore::do_open(std::ostream&, bool)+0x8ff) [0x55f24da064df]
 15: (BlueStore::_open_db(bool)+0xf73) [0x55f24d992623]
 16: (BlueStore::fsck(bool)+0x3e7) [0x55f24d9c2de7]
 17: (BlueStore::_mount(bool)+0x1ee) [0x55f24d9c240e]
 18: (OSD::init()+0x3df) [0x55f24d53acff]
 19: (main()+0x2eb8) [0x55f24d44d1f8]
 20: (__libc_start_main()+0xf0) [0x7f237cd69830]
 21: (_start()+0x29) [0x55f24d4d8a59]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2017-10-26 00:53:40.490798 7f237f8fce00 -1 /build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7f237f8fce00 time 2017-10-26 00:53:40.482785
/build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
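The backtrace shows the assert firing in ceph::buffer::list::prepare_iov, reached from KernelDevice::aio_write via BlueFS::_flush_range while RocksDB replays its log on open. The condition _buffers.size() <= 1024 means the buffer list handed to a single write had more than 1024 segments, which matches the usual Linux IOV_MAX. The Python sketch below illustrates one general way to stay under such a limit: submit the segments in batches. It is an illustration only, not the fix adopted upstream; split_into_batches and MAX_SEGMENTS are hypothetical names, not Ceph APIs.

```python
from typing import List, Sequence

# Segment limit taken from the failed assert. The real limit on a given
# system is IOV_MAX and may differ (assumption for this sketch).
MAX_SEGMENTS = 1024


def split_into_batches(segments: Sequence[bytes],
                       max_segments: int = MAX_SEGMENTS) -> List[List[bytes]]:
    """Split segments into consecutive batches of at most max_segments each.

    Order is preserved, so writing the batches one after another produces
    the same byte stream as writing all segments at once.
    """
    if max_segments <= 0:
        raise ValueError("max_segments must be positive")
    return [list(segments[i:i + max_segments])
            for i in range(0, len(segments), max_segments)]


if __name__ == "__main__":
    # 2500 small segments stand in for a heavily fragmented write.
    fragmented = [b"x" * 16 for _ in range(2500)]
    batches = split_into_batches(fragmented)
    print([len(batch) for batch in batches])
```

Each batch could then be handed to a separate vectored write. The other common approach is to coalesce the segments into fewer, larger contiguous buffers before submitting, at the cost of an extra copy.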
Updated by jianpeng ma over 6 years ago
Could you add debug bluestore = 20 to /etc/ceph/ceph.conf and paste the resulting log output? Thanks!
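A minimal sketch of that setting, assuming the default configuration file /etc/ceph/ceph.conf and placement in the [osd] section (the comment does not name a section). Since the crash happens at startup, the option has to be in place before the OSD is started again.

```ini
# Assumed location: /etc/ceph/ceph.conf on the node hosting the failing OSD
[osd]
debug bluestore = 20
```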
Updated by Kefu Chai over 6 years ago
- Status changed from New to Pending Backport
Updated by Pawel Stefanski over 6 years ago
Kefu Chai wrote:
Thank you so much!
Unfortunately, we can't test this patch: all nodes with the failed OSDs have been redeployed. We will run more hard-reset tests in the future.
Updated by Nathan Cutler over 6 years ago
- Copied to Backport #22193: luminous: OSD crash on boot with assert caused by Bluefs on flush write added
Updated by Sage Weil about 6 years ago
- Related to Bug #22066: bluestore osd asserts repeatedly with ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024) added
Updated by Sage Weil about 6 years ago
- Related to Bug #22115: OSD SIGABRT on bluestore_prefer_deferred_size = 104857600: assert(_buffers.size() <= 1024) added
Updated by Nathan Cutler about 6 years ago
- Status changed from Pending Backport to Resolved