Bug #21932
OSD crash on boot with assert caused by BlueFS on flush write
Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
bluestore, crash, osd
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Description
After a network outage, some nodes in the cluster were hard-restarted and came up with one or more crashing OSD instances. All affected OSDs have the same issue.
The version is Luminous (Ubuntu packages).
BlueStore is configured with a separate WAL/DB device (NVMe).
The OSD can't be started: it crashes immediately after startup.
root@ceph10:/var/log/ceph# /usr/bin/ceph-osd -f --cluster ceph --id 106 --setuser ceph --setgroup ceph
starting osd.106 at - osd_data /var/lib/ceph/osd/ceph-106 /var/lib/ceph/osd/ceph-106/journal
tcmalloc: large alloc 1497374720 bytes == 0x55f2b67ee000 @ 0x7f237e9ce1e1 0x7f237d751499 0x7f237d752833 0x55f24de2f359 0x55f24de20a48 0x55f24de22394 0x55f24de2375c 0x55f24de24ff1 0x55f24da064df 0x55f24d992623 0x55f24d9c2de7 0x55f24d9c240e 0x55f24d53acff 0x55f24d44d1f8 0x7f237cd69830 0x55f24d4d8a59 (nil)
tcmalloc: large alloc 1313308672 bytes == 0x55f25d6a6000 @ 0x7f237e9ce1e1 0x7f237d751499 0x7f237d75200b 0x55f24de66e59 0x55f24de20c4a 0x55f24de22394 0x55f24de2375c 0x55f24de24ff1 0x55f24da064df 0x55f24d992623 0x55f24d9c2de7 0x55f24d9c240e 0x55f24d53acff 0x55f24d44d1f8 0x7f237cd69830 0x55f24d4d8a59 (nil)
/build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7f237f8fce00 time 2017-10-26 00:53:40.482785
/build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55f24db033f2]
 2: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x1598) [0x55f24daa8a78]
 3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x9c0) [0x55f24da832c0]
 4: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x124) [0x55f24da84cf4]
 5: (BlueRocksWritableFile::Flush()+0x3d) [0x55f24da9ae6d]
 6: (rocksdb::WritableFileWriter::Flush()+0x24c) [0x55f24ded33ac]
 7: (rocksdb::WritableFileWriter::Sync(bool)+0x3e) [0x55f24ded45ce]
 8: (rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIterator*, std::unique_ptr<rocksdb::InternalIterator, std::default_delete<rocksdb::InternalIterator> >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::CompressionType, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int)+0x190f) [0x55f24def304f]
 9: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xad9) [0x55f24de1edd9]
 10: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0x17ec) [0x55f24de20e2c]
 11: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x8c4) [0x55f24de22394]
 12: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xedc) [0x55f24de2375c]
 13: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x6b1) [0x55f24de24ff1]
 14: (RocksDBStore::do_open(std::ostream&, bool)+0x8ff) [0x55f24da064df]
 15: (BlueStore::_open_db(bool)+0xf73) [0x55f24d992623]
 16: (BlueStore::fsck(bool)+0x3e7) [0x55f24d9c2de7]
 17: (BlueStore::_mount(bool)+0x1ee) [0x55f24d9c240e]
 18: (OSD::init()+0x3df) [0x55f24d53acff]
 19: (main()+0x2eb8) [0x55f24d44d1f8]
 20: (__libc_start_main()+0xf0) [0x7f237cd69830]
 21: (_start()+0x29) [0x55f24d4d8a59]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2017-10-26 00:53:40.490798 7f237f8fce00 -1 /build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7f237f8fce00 time 2017-10-26 00:53:40.482785
/build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
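The failed assertion in the trace, `_buffers.size() <= 1024`, fires in `prepare_iov` under `KernelDevice::aio_write`, which suggests that a single write was handed a buffer list with more than 1024 segments. As an illustration only (this is not Ceph code and not necessarily how the issue was resolved), a caller could stay under such a cap by submitting the segments in batches. The names `split_into_batches` and `MAX_SEGMENTS` are hypothetical; the value 1024 is taken from the assert message.

```python
from typing import Iterator, List, Sequence

# Cap taken from the assert in the trace; the constant name is hypothetical.
MAX_SEGMENTS = 1024


def split_into_batches(
    segments: Sequence[bytes], max_segments: int = MAX_SEGMENTS
) -> Iterator[List[bytes]]:
    """Yield consecutive batches holding at most max_segments segments each."""
    if max_segments <= 0:
        raise ValueError("max_segments must be positive")
    for start in range(0, len(segments), max_segments):
        yield list(segments[start:start + max_segments])


if __name__ == "__main__":
    # 2500 one-byte segments stand in for a heavily fragmented buffer list.
    fragmented = [b"x"] * 2500
    sizes = [len(batch) for batch in split_into_batches(fragmented)]
    print(sizes)  # [1024, 1024, 452]
```

Each batch preserves segment order, so concatenating the batches reproduces the original data; only the number of segments per submission changes.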