Ceph : Issues
https://tracker.ceph.com/
2019-07-11T19:26:08Z
bluestore - Bug #40741 (Triaged): Mass OSD failure, unable to restart
https://tracker.ceph.com/issues/40741
2019-07-11T19:26:08Z
Brett Chancellor
<p>Cluster: 14.2.1<br />OSDs: 250 spinners in default root, 63 SSDs in ssd root</p>
<p>History: Five days ago, this cluster began losing spinning drives every couple of hours. Many of them could not be restarted once they went down, so they had to be rebuilt. After reaching out to the ceph users group, we tried setting bluestore_allocator and bluefs_allocator to stupid. This allowed newly dying OSDs to be brought back online, although it didn't stop others from dying. Once the cluster finished rebalancing, performance was terrible with either allocator (disks all idle and error free, individual clients getting 1-2 IOPS with 12k ms latency). We did note that performance was fine on any root other than the default root. OSDs continued to commit suicide every 30-40 minutes. In an attempt to improve performance, we decided to move the &lt;zone&gt;.rgw.meta pool from spinning drives to SSD.</p>
<p>After we moved the pool, SSDs began to fail en masse. We are unable to bring up the SSD OSDs with either the stupid or bitmap allocator.</p>
<p>Attached is an OSD log with debug_bluestore set to 10.</p>
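<p>For reference, the allocator switch mentioned above can be expressed as two <code>ceph config set</code> calls. The following is only a sketch: the helper name <code>allocator_commands</code> and the choice to target the whole <code>osd</code> section are assumptions for illustration, and the commands are built but not executed.</p>

```python
# Option names for the BlueStore and BlueFS allocators.
ALLOCATOR_OPTIONS = ("bluestore_allocator", "bluefs_allocator")


def allocator_commands(value="stupid", who="osd"):
    """Build (but do not run) `ceph config set` commands for both allocators.

    `who` is the config section to target; "osd" (all OSDs) is an assumption,
    a single daemon such as "osd.12" would work the same way.
    """
    return [["ceph", "config", "set", who, option, value]
            for option in ALLOCATOR_OPTIONS]


# Print the commands so they can be reviewed before being run by hand.
for argv in allocator_commands():
    print(" ".join(argv))
```

<p>Calling <code>allocator_commands("bitmap")</code> yields the commands for switching to the bitmap allocator instead. As far as we can tell, an OSD has to be restarted before a new allocator setting is used.</p>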
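<p>Current config<br />$ sudo ceph config dump</p>
<table>
<tr><th>WHO</th><th>MASK</th><th>LEVEL</th><th>OPTION</th><th>VALUE</th><th>RO</th></tr>
<tr><td>global</td><td></td><td>advanced</td><td>bluestore_warn_on_bluefs_spillover</td><td>false</td><td></td></tr>
<tr><td>global</td><td></td><td>advanced</td><td>mon_warn_pg_not_deep_scrubbed_ratio</td><td>0.000000</td><td></td></tr>
<tr><td>global</td><td></td><td>advanced</td><td>mon_warn_pg_not_scrubbed_ratio</td><td>0.000000</td><td></td></tr>
<tr><td>global</td><td></td><td>advanced</td><td>osd_deep_scrub_interval</td><td>1814400.000000</td><td></td></tr>
<tr><td>global</td><td></td><td>advanced</td><td>osd_scrub_max_interval</td><td>1814400.000000</td><td></td></tr>
<tr><td>global</td><td></td><td>advanced</td><td>osd_scrub_min_interval</td><td>259200.000000</td><td></td></tr>
<tr><td>mon</td><td></td><td>advanced</td><td>mon_osd_down_out_interval</td><td>1200</td><td></td></tr>
<tr><td>mon.ceph0rdi-mon1-1-prd</td><td></td><td>advanced</td><td>ms_bind_msgr2</td><td>false</td><td></td></tr>
<tr><td>mon.ceph0rdi-mon2-1-prd</td><td></td><td>advanced</td><td>ms_bind_msgr2</td><td>false</td><td></td></tr>
<tr><td>mon.ceph0rdi-mon3-1-prd</td><td></td><td>advanced</td><td>ms_bind_msgr2</td><td>false</td><td></td></tr>
<tr><td>osd</td><td></td><td>advanced</td><td>bluestore_bluefs_gift_ratio</td><td>0.000200</td><td></td></tr>
<tr><td>osd</td><td></td><td>advanced</td><td>osd_max_backfills</td><td>3</td><td></td></tr>
<tr><td>osd</td><td></td><td>basic</td><td>osd_memory_target</td><td>4294967296</td><td></td></tr>
<tr><td>osd</td><td></td><td>advanced</td><td>osd_op_thread_timeout</td><td>90</td><td></td></tr>
<tr><td>osd</td><td></td><td>advanced</td><td>osd_recovery_sleep_hdd</td><td>0.000000</td><td></td></tr>
<tr><td>osd</td><td></td><td>advanced</td><td>osd_recovery_sleep_hybrid</td><td>0.000000</td><td></td></tr>
</table>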
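<p>To compare overrides between clusters, the raw text printed by <code>ceph config dump</code> can be turned into records. This is a rough sketch, not a full parser: the function name <code>parse_config_dump</code> is made up for illustration, and it assumes the LEVEL column only holds <code>basic</code>, <code>advanced</code> or <code>dev</code>, that values contain no spaces, and that the RO column can be ignored.</p>

```python
# Known values of the LEVEL column; used to find it when MASK is empty.
LEVELS = {"basic", "advanced", "dev"}


def parse_config_dump(text):
    """Parse `ceph config dump` text into dicts (who, mask, level, option, value)."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "WHO":
            continue  # skip blank lines and the header row
        # MASK is often empty, so locate LEVEL by value rather than by position.
        idx = next((i for i, t in enumerate(tokens[1:], start=1) if t in LEVELS), None)
        if idx is None or len(tokens) < idx + 3:
            continue  # not a row this sketch understands
        rows.append({
            "who": tokens[0],
            "mask": " ".join(tokens[1:idx]),
            "level": tokens[idx],
            "option": tokens[idx + 1],
            "value": tokens[idx + 2],
        })
    return rows


# A few rows taken from the dump above.
SAMPLE = """\
WHO    MASK LEVEL    OPTION                    VALUE         RO
global      advanced osd_scrub_min_interval    259200.000000
  mon       advanced mon_osd_down_out_interval 1200
    osd     basic    osd_memory_target         4294967296
    osd     advanced osd_op_thread_timeout     90
"""

# Collect only the OSD-level overrides.
osd_overrides = {r["option"]: r["value"]
                 for r in parse_config_dump(SAMPLE) if r["who"] == "osd"}
print(osd_overrides)
```

<p>Filtering on <code>who == "osd"</code> as shown isolates the OSD-level settings, which are the ones most likely to matter for this failure.</p>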
<p>Example stack trace:<br /> -1> 2019-07-11 18:59:24.296 7fcc1caf5d80 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7fcc1caf5d80 time 2019-07-11 18:59:24.289089<br />/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/os/bluestore/BlueFS.cc: 2044: abort()</p>
<pre><code>ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)<br /> 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xd8) [0x560dbdbc9cd0]<br /> 2: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1daf) [0x560dbe2819af]<br /> 3: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x560dbe281b4b]<br /> 4: (BlueRocksWritableFile::Flush()+0x3d) [0x560dbe29f84d]<br /> 5: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x560dbe8cdd0e]<br /> 6: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x560dbe8cdfee]<br /> 7: (rocksdb::BuildTable(std::string const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase&lt;rocksdb::Slice&gt;*, std::unique_ptr&lt;rocksdb::InternalIteratorBase&lt;rocksdb::Slice&gt;, std::default_delete&lt;rocksdb::InternalIteratorBase&lt;rocksdb::Slice&gt; > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector&lt;std::unique_ptr&lt;rocksdb::IntTblPropCollectorFactory, std::default_delete&lt;rocksdb::IntTblPropCollectorFactory&gt; >, std::allocator&lt;std::unique_ptr&lt;rocksdb::IntTblPropCollectorFactory, std::default_delete&lt;rocksdb::IntTblPropCollectorFactory&gt; > > > const*, unsigned int, std::string const&, std::vector&lt;unsigned long, std::allocator&lt;unsigned long&gt; >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint)+0x2368) [0x560dbe8fb978]<br /> 8: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xc66) [0x560dbe7716a6]<br /> 9: (rocksdb::DBImpl::RecoverLogFiles(std::vector&lt;unsigned long, std::allocator&lt;unsigned long&gt; > const&, unsigned long*, bool)+0x1672) [0x560dbe7735a2]<br /> 10: (rocksdb::DBImpl::Recover(std::vector&lt;rocksdb::ColumnFamilyDescriptor, std::allocator&lt;rocksdb::ColumnFamilyDescriptor&gt; > const&, bool, bool, bool)+0x809) [0x560dbe774b99]<br /> 11: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::string const&, std::vector&lt;rocksdb::ColumnFamilyDescriptor, std::allocator&lt;rocksdb::ColumnFamilyDescriptor&gt; > const&, std::vector&lt;rocksdb::ColumnFamilyHandle*, std::allocator&lt;rocksdb::ColumnFamilyHandle*&gt; >*, rocksdb::DB**, bool, bool)+0x658) [0x560dbe7759a8]<br /> 12: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::string const&, std::vector&lt;rocksdb::ColumnFamilyDescriptor, std::allocator&lt;rocksdb::ColumnFamilyDescriptor&gt; > const&, std::vector&lt;rocksdb::ColumnFamilyHandle*, std::allocator&lt;rocksdb::ColumnFamilyHandle*&gt; >*, rocksdb::DB**)+0x24) [0x560dbe777184]<br /> 13: (RocksDBStore::do_open(std::ostream&, bool, bool, std::vector&lt;KeyValueDB::ColumnFamily, std::allocator&lt;KeyValueDB::ColumnFamily&gt; > const*)+0x1660) [0x560dbe20bde0]<br /> 14: (BlueStore::_open_db(bool, bool, bool)+0xf8e) [0x560dbe16077e]<br /> 15: (BlueStore::_open_db_and_around(bool)+0x165) [0x560dbe17dcb5]<br /> 16: (BlueStore::_mount(bool, bool)+0x6a4) [0x560dbe1ba694]<br /> 17: (OSD::init()+0x3aa) [0x560dbdd30d7a]<br /> 18: (main()+0x14fa) [0x560dbdbcd1da]<br /> 19: (__libc_start_main()+0xf5) [0x7fcc185283d5]<br /> 20: (()+0x564555) [0x560dbdcc1555]</code></pre>
<pre><code>0> 2019-07-11 18:59:24.304 7fcc1caf5d80 -1 *** Caught signal (Aborted) **<br /> in thread 7fcc1caf5d80 thread_name:ceph-osd</code></pre>
<pre><code>ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)<br /> 1: (()+0xf5d0) [0x7fcc197455d0]<br /> 2: (gsignal()+0x37) [0x7fcc1853c207]<br /> 3: (abort()+0x148) [0x7fcc1853d8f8]<br /> 4: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x19c) [0x560dbdbc9d94]<br /> 5: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1daf) [0x560dbe2819af]<br /> 6: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x560dbe281b4b]<br /> 7: (BlueRocksWritableFile::Flush()+0x3d) [0x560dbe29f84d]<br /> 8: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x560dbe8cdd0e]<br /> 9: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x560dbe8cdfee]<br /> 10: (rocksdb::BuildTable(std::string const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase&lt;rocksdb::Slice&gt;*, std::unique_ptr&lt;rocksdb::InternalIteratorBase&lt;rocksdb::Slice&gt;, std::default_delete&lt;rocksdb::InternalIteratorBase&lt;rocksdb::Slice&gt; > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector&lt;std::unique_ptr&lt;rocksdb::IntTblPropCollectorFactory, std::default_delete&lt;rocksdb::IntTblPropCollectorFactory&gt; >, std::allocator&lt;std::unique_ptr&lt;rocksdb::IntTblPropCollectorFactory, std::default_delete&lt;rocksdb::IntTblPropCollectorFactory&gt; > > > const*, unsigned int, std::string const&, std::vector&lt;unsigned long, std::allocator&lt;unsigned long&gt; >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint)+0x2368) [0x560dbe8fb978]<br /> 11: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xc66) [0x560dbe7716a6]<br /> 12: (rocksdb::DBImpl::RecoverLogFiles(std::vector&lt;unsigned long, std::allocator&lt;unsigned long&gt; > const&, unsigned long*, bool)+0x1672) [0x560dbe7735a2]<br /> 13: (rocksdb::DBImpl::Recover(std::vector&lt;rocksdb::ColumnFamilyDescriptor, std::allocator&lt;rocksdb::ColumnFamilyDescriptor&gt; > const&, bool, bool, bool)+0x809) [0x560dbe774b99]<br /> 14: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::string const&, std::vector&lt;rocksdb::ColumnFamilyDescriptor, std::allocator&lt;rocksdb::ColumnFamilyDescriptor&gt; > const&, std::vector&lt;rocksdb::ColumnFamilyHandle*, std::allocator&lt;rocksdb::ColumnFamilyHandle*&gt; >*, rocksdb::DB**, bool, bool)+0x658) [0x560dbe7759a8]<br /> 15: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::string const&, std::vector&lt;rocksdb::ColumnFamilyDescriptor, std::allocator&lt;rocksdb::ColumnFamilyDescriptor&gt; > const&, std::vector&lt;rocksdb::ColumnFamilyHandle*, std::allocator&lt;rocksdb::ColumnFamilyHandle*&gt; >*, rocksdb::DB**)+0x24) [0x560dbe777184]<br /> 16: (RocksDBStore::do_open(std::ostream&, bool, bool, std::vector&lt;KeyValueDB::ColumnFamily, std::allocator&lt;KeyValueDB::ColumnFamily&gt; > const*)+0x1660) [0x560dbe20bde0]<br /> 17: (BlueStore::_open_db(bool, bool, bool)+0xf8e) [0x560dbe16077e]<br /> 18: (BlueStore::_open_db_and_around(bool)+0x165) [0x560dbe17dcb5]<br /> 19: (BlueStore::_mount(bool, bool)+0x6a4) [0x560dbe1ba694]<br /> 20: (OSD::init()+0x3aa) [0x560dbdd30d7a]<br /> 21: (main()+0x14fa) [0x560dbdbcd1da]<br /> 22: (__libc_start_main()+0xf5) [0x7fcc185283d5]<br /> 23: (()+0x564555) [0x560dbdcc1555]<br /> NOTE: a copy of the executable, or `objdump -rdS &lt;executable&gt;` is needed to interpret this.</code></pre>
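<p>The backtraces above are hard to scan because of the long C++ argument lists. A small helper can reduce each frame to its function name, which makes it easier to compare crashes across OSDs. This is a sketch only: <code>frame_names</code> is a hypothetical helper, and it assumes frames look like <code>N: (symbol+0xOFFSET) [0xADDRESS]</code>, separated by newlines or <code>&lt;br /&gt;</code> tags as in this report.</p>

```python
import html
import re

# One frame: " 2: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x560dbe281b4b]"
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$")


def frame_names(raw):
    """Return the function name of every backtrace frame found in `raw`."""
    # Turn <br /> tags into newlines and decode entities such as &lt; and &gt;.
    text = html.unescape(re.sub(r"<br\s*/?>", "\n", raw))
    names = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue  # version banner, NOTE line, etc.
        symbol = match.group(2)
        # Drop the argument list; unresolved frames such as "()" become "??".
        names.append(symbol.split("(", 1)[0] or "??")
    return names


# Four frames copied from the first backtrace above.
SAMPLE = (
    " 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xd8) [0x560dbdbc9cd0]<br />"
    " 2: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1daf) [0x560dbe2819af]<br />"
    " 3: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x560dbe281b4b]<br />"
    " 20: (()+0x564555) [0x560dbdcc1555]"
)

for name in frame_names(SAMPLE):
    print(name)
```

<p>Splitting at the first parenthesis is a simplification; a symbol whose name itself contains parentheses (for example inside template arguments) would be cut short.</p>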