Bug #40741

open

Mass OSD failure, unable to restart

Added by Brett Chancellor almost 5 years ago. Updated almost 4 years ago.

Status: Triaged
Priority: Normal
Assignee:
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Cluster: 14.2.1
OSDs: 250 spinners in default root, 63 SSDs in ssd root

History: 5 days ago, this cluster began losing spinning drives every couple of hours. Many of them could not be restarted once they went down, so they had to be rebuilt. After reaching out to the ceph users group, we tried setting bluestore_allocator and bluefs_allocator to stupid. This allowed newly dying OSDs to be brought back online, although it didn't stop others from dying. Once the cluster finished rebalancing, performance was terrible (disks all idle and error free, individual clients getting 1-2 IOPS with 12k ms latency) with either allocator. We did note that performance was fine on any root other than the default root. OSDs continued to commit suicide every 30-40 minutes. In an attempt to improve performance, we decided to try moving the <zone>.rgw.meta pool from spinning drives to SSD.

After doing this, SSDs began to fail en masse. We are unable to bring up the SSD OSDs with either the stupid or bitmap allocators.

Attached is an OSD log with debug_bluestore set to 10.

Current config
$ sudo ceph config dump
WHO                     MASK LEVEL    OPTION                              VALUE          RO
global                       advanced bluestore_warn_on_bluefs_spillover  false
global                       advanced mon_warn_pg_not_deep_scrubbed_ratio 0.000000
global                       advanced mon_warn_pg_not_scrubbed_ratio      0.000000
global                       advanced osd_deep_scrub_interval             1814400.000000
global                       advanced osd_scrub_max_interval              1814400.000000
global                       advanced osd_scrub_min_interval              259200.000000
mon                          advanced mon_osd_down_out_interval           1200
mon.ceph0rdi-mon1-1-prd      advanced ms_bind_msgr2                       false
mon.ceph0rdi-mon2-1-prd      advanced ms_bind_msgr2                       false
mon.ceph0rdi-mon3-1-prd      advanced ms_bind_msgr2                       false
osd                          advanced bluestore_bluefs_gift_ratio         0.000200
osd                          advanced osd_max_backfills                   3
osd                          basic    osd_memory_target                   4294967296
osd                          advanced osd_op_thread_timeout               90
osd                          advanced osd_recovery_sleep_hdd              0.000000
osd                          advanced osd_recovery_sleep_hybrid           0.000000
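
For readers skimming the dump, the raw interval and memory values may be easier to judge in everyday units. The following is only a minimal sketch: the values are copied from the dump above, while the `humanize` helper and the choice of display units are illustrative assumptions, not part of Ceph.

```python
# Convert a few raw values from the config dump into human-readable units.
# The scrub and down/out intervals are in seconds; osd_memory_target is in bytes.
SECONDS_PER_DAY = 86400
GIB = 1024 ** 3

# Values copied verbatim from the dump above.
settings = {
    "osd_deep_scrub_interval": 1814400.0,
    "osd_scrub_max_interval": 1814400.0,
    "osd_scrub_min_interval": 259200.0,
    "mon_osd_down_out_interval": 1200,
    "osd_memory_target": 4294967296,
}


def humanize(option, value):
    """Hypothetical helper: pick a display unit based on the option name."""
    if option == "osd_memory_target":
        return f"{value / GIB:g} GiB"
    if option == "mon_osd_down_out_interval":
        return f"{value / 60:g} min"
    return f"{value / SECONDS_PER_DAY:g} days"


readable = {opt: humanize(opt, val) for opt, val in settings.items()}
for opt, text in readable.items():
    print(f"{opt}: {text}")
```

Read this way, the cluster deep-scrubs at most every 21 days, scrubs no more often than every 3 days, marks down OSDs out after 20 minutes, and gives each OSD a 4 GiB memory target.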

Example stack trace:
-1> 2019-07-11 18:59:24.296 7fcc1caf5d80 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7fcc1caf5d80 time 2019-07-11 18:59:24.289089
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/os/bluestore/BlueFS.cc: 2044: abort()

ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xd8) [0x560dbdbc9cd0]
2: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1daf) [0x560dbe2819af]
3: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x560dbe281b4b]
4: (BlueRocksWritableFile::Flush()+0x3d) [0x560dbe29f84d]
5: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x560dbe8cdd0e]
6: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x560dbe8cdfee]
7: (rocksdb::BuildTable(std::string const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice>, std::default_delete<rocksdb::InternalIteratorBase<rocksdb::Slice> > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::string const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint)+0x2368) [0x560dbe8fb978]
8: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xc66) [0x560dbe7716a6]
9: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0x1672) [0x560dbe7735a2]
10: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x809) [0x560dbe774b99]
11: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::string const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >, rocksdb::DB*, bool, bool)+0x658) [0x560dbe7759a8]
12: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::string const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >, rocksdb::DB*)+0x24) [0x560dbe777184]
13: (RocksDBStore::do_open(std::ostream&, bool, bool, std::vector<KeyValueDB::ColumnFamily, std::allocator<KeyValueDB::ColumnFamily> > const*)+0x1660) [0x560dbe20bde0]
14: (BlueStore::_open_db(bool, bool, bool)+0xf8e) [0x560dbe16077e]
15: (BlueStore::_open_db_and_around(bool)+0x165) [0x560dbe17dcb5]
16: (BlueStore::_mount(bool, bool)+0x6a4) [0x560dbe1ba694]
17: (OSD::init()+0x3aa) [0x560dbdd30d7a]
18: (main()+0x14fa) [0x560dbdbcd1da]
19: (__libc_start_main()+0xf5) [0x7fcc185283d5]
20: (()+0x564555) [0x560dbdcc1555]
0> 2019-07-11 18:59:24.304 7fcc1caf5d80 -1 ** Caught signal (Aborted) *
in thread 7fcc1caf5d80 thread_name:ceph-osd
ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
1: (()+0xf5d0) [0x7fcc197455d0]
2: (gsignal()+0x37) [0x7fcc1853c207]
3: (abort()+0x148) [0x7fcc1853d8f8]
4: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x19c) [0x560dbdbc9d94]
5: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1daf) [0x560dbe2819af]
6: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x560dbe281b4b]
7: (BlueRocksWritableFile::Flush()+0x3d) [0x560dbe29f84d]
8: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x560dbe8cdd0e]
9: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x560dbe8cdfee]
10: (rocksdb::BuildTable(std::string const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice>, std::default_delete<rocksdb::InternalIteratorBase<rocksdb::Slice> > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::string const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint)+0x2368) [0x560dbe8fb978]
11: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xc66) [0x560dbe7716a6]
12: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0x1672) [0x560dbe7735a2]
13: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x809) [0x560dbe774b99]
14: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::string const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >, rocksdb::DB*, bool, bool)+0x658) [0x560dbe7759a8]
15: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::string const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >, rocksdb::DB*)+0x24) [0x560dbe777184]
16: (RocksDBStore::do_open(std::ostream&, bool, bool, std::vector<KeyValueDB::ColumnFamily, std::allocator<KeyValueDB::ColumnFamily> > const*)+0x1660) [0x560dbe20bde0]
17: (BlueStore::_open_db(bool, bool, bool)+0xf8e) [0x560dbe16077e]
18: (BlueStore::_open_db_and_around(bool)+0x165) [0x560dbe17dcb5]
19: (BlueStore::_mount(bool, bool)+0x6a4) [0x560dbe1ba694]
20: (OSD::init()+0x3aa) [0x560dbdd30d7a]
21: (main()+0x14fa) [0x560dbdbcd1da]
22: (__libc_start_main()+0xf5) [0x7fcc185283d5]
23: (()+0x564555) [0x560dbdcc1555]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
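
To make the long backtrace easier to scan, the frames can be reduced to a short call path. The script below is a rough sketch rather than an official Ceph tool: the sample frames are copied from the first backtrace above (the long RocksDB frames are left out for brevity), and the regular expression and the `parse_frames` helper are assumptions made for illustration.

```python
import re

# A few frames copied from the first backtrace above; RocksDB frames omitted.
TRACE = """\
1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xd8) [0x560dbdbc9cd0]
2: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1daf) [0x560dbe2819af]
3: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x560dbe281b4b]
4: (BlueRocksWritableFile::Flush()+0x3d) [0x560dbe29f84d]
14: (BlueStore::_open_db(bool, bool, bool)+0xf8e) [0x560dbe16077e]
16: (BlueStore::_mount(bool, bool)+0x6a4) [0x560dbe1ba694]
17: (OSD::init()+0x3aa) [0x560dbdd30d7a]
"""

# Assumed frame layout: "<n>: (<symbol>+<offset>) [<address>]"
FRAME_RE = re.compile(r"^\s*(\d+): \((.*)\+(0x[0-9a-f]+)\) \[(0x[0-9a-f]+)\]$")


def parse_frames(text):
    """Hypothetical helper: return (index, function name, offset, address) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue  # skip anything that is not a frame line
        index, symbol, offset, address = m.groups()
        name = symbol.split("(", 1)[0]  # drop the argument list
        frames.append((int(index), name, offset, address))
    return frames


frames = parse_frames(TRACE)
# Frames are listed innermost first, so reverse them to read the path top-down.
call_path = " -> ".join(name for _, name, _, _ in reversed(frames))
print(call_path)
```

Read top-down, the path shows the abort being raised in BlueFS::_flush_range while the OSD is still mounting BlueStore and opening its RocksDB database, which matches the OSDs failing at startup.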

Files

ceph-osd.123.log.truncated.gz (71.8 KB): truncated log file from ssd failure. Brett Chancellor, 07/11/2019 07:22 PM
ceph-osd.34.log.truncated.gz (515 KB). Brett Chancellor, 07/12/2019 05:54 PM
ceph-osd.110.log.truncated.gz (170 KB). Brett Chancellor, 07/12/2019 06:07 PM
ceph-osd.44.log.truncated.gz (318 KB). Brett Chancellor, 07/12/2019 06:24 PM
osd.34.bluefs.log.gz (505 KB): osd.34 log with bluefs set to 20/20. Brett Chancellor, 07/12/2019 10:34 PM
osd.110.bluefs.log.gz (486 KB): osd.110 log with bluefs set to 20/20. Brett Chancellor, 07/12/2019 10:34 PM

Related issues 2 (0 open, 2 closed)

Related to bluestore - Bug #45765: BlueStore::_collection_list causes huge latency growth pg deletion (Resolved)

Related to bluestore - Bug #45994: OSD crash - in thread tp_osd_tp (Duplicate)
