Bug #51217


BlueFS::_flush_range assert(h->file->fnode.ino != 1)

Added by Anonymous almost 3 years ago. Updated over 2 years ago.

Status: In Progress
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

[Version]
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)

[Operation]
1. set args

bluefs_alloc_size = 8192
bluefs_max_prefetch = 8192
bluefs_min_log_runway = 8192
bluefs_max_log_runway = 16384
bluefs_log_compact_min_size = 21474836480
bluestore_min_alloc_size = 8192

2. Create an OSD and a pool, then write data (a minimal reproduction sketch follows below).
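
For reference, a minimal sketch of how these options and steps could be combined on a test cluster. The [osd] section placement, pool name, PG count, and rados bench parameters below are illustrative assumptions, not taken from the report:

# ceph.conf on the test node (assumed layout)
[osd]
bluefs_alloc_size = 8192
bluefs_max_prefetch = 8192
bluefs_min_log_runway = 8192
bluefs_max_log_runway = 16384
bluefs_log_compact_min_size = 21474836480
bluestore_min_alloc_size = 8192

# after the OSD is deployed and up, create a pool and drive writes, e.g.:
ceph osd pool create testpool 64
rados bench -p testpool 600 write --no-cleanup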

[Appearance]
1. This is the first backtrace, which looks similar to https://tracker.ceph.com/issues/45519 (a simplified sketch of the asserted invariant appears at the end of this description):

2021-06-15 03:17:11.951251 7f1d6d49c700 -1 /work/Product/rpmbuild/BUILD/ceph-12.2.12/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::
FileWriter*, uint64_t, uint64_t)' thread 7f1d6d49c700 time 2021-06-15 03:17:11.944069
/work/Product/rpmbuild/BUILD/ceph-12.2.12/src/os/bluestore/BlueFS.cc: 1548: FAILED assert(h->file->fnode.ino != 1)

 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55754b705a50]
 2: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1d89) [0x55754b683d79]
 3: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x188) [0x55754b684138]
 4: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0x796) [0x55754b684d46]
 5: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x2a1) [0x55754b6867f1]
 6: (BlueRocksWritableFile::Sync()+0x63) [0x55754b69fdc3]
 7: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x149) [0x55754ba88e69]
 8: (rocksdb::WritableFileWriter::Sync(bool)+0xe8) [0x55754ba89b38]
 9: (rocksdb::DBImpl::WriteToWAL(rocksdb::autovector<rocksdb::WriteThread::Writer*, 8ul> const&, rocksdb::log::Writer*, bool, bool, unsigned long)+0x41a) [0x55754bad569a]
 10: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x94b) [0x55754bad627b]
 11: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x27) [0x55754bad7247]
 12: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0xcf) [0x55754b617d0f]
 13: (BlueStore::_kv_sync_thread()+0x1c6f) [0x55754b5ac93f]
 14: (BlueStore::KVSyncThread::entry()+0xd) [0x55754b5f540d]
 15: (()+0x7e25) [0x7f1d7e995e25]
 16: (clone()+0x6d) [0x7f1d7da8934d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

2. After restarting the OSD, the second backtrace appears; it seems related to https://github.com/ceph/ceph/pull/35776 and https://github.com/ceph/ceph/pull/35473:
2021-06-15 03:17:32.631234 7f5c09e52d00  1 bluefs mount
2021-06-15 03:17:33.381762 7f5c09e52d00 -1 *** Caught signal (Segmentation fault) **
 in thread 7f5c09e52d00 thread_name:ceph-osd

 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
 1: (()+0xa64d91) [0x557527796d91]
 2: (()+0xf5e0) [0x7f5c074995e0]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x4e2) [0x55752774b752]
 4: (BlueFS::_replay(bool)+0x48d) [0x55752775edfd]
 5: (BlueFS::mount()+0x1d4) [0x5575277629e4]
 6: (BlueStore::_open_db(bool)+0x1847) [0x557527673b67]
 7: (BlueStore::_mount(bool)+0x40e) [0x5575276a89be]
 8: (OSD::init()+0x3bd) [0x55752724f9dd]
 9: (main()+0x2d07) [0x557527152dd7]
 10: (__libc_start_main()+0xf5) [0x7f5c064aec05]
 11: (()+0x4c0ca3) [0x5575271f2ca3]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

My goal is to reproduce problem 2 and then analyze its root cause, but I ran into problem 1 along the way.
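
For context on the first crash, below is a hypothetical, self-contained C++ sketch of the invariant behind assert(h->file->fnode.ino != 1). This is a simplified model, not the actual BlueFS source: BlueFS reserves inode 1 for its own log, and the regular flush/allocation path is only meant to grow ordinary files, with log space expected to come from the preallocated runway (which the settings above shrink drastically). The struct and function names are invented for illustration.

// bluefs_ino1_sketch.cc -- hypothetical model, not the Ceph source
#include <cassert>
#include <cstdint>
#include <iostream>

struct FileNodeSketch {
  uint64_t ino;        // inode number; 1 is reserved for the BlueFS log
  uint64_t allocated;  // bytes currently allocated to the file
};

// Grows a file while flushing a range. The log file (ino 1) must never
// need an allocation here -- that is the condition the real assert checks.
void flush_range_sketch(FileNodeSketch& f, uint64_t offset, uint64_t length) {
  if (offset + length > f.allocated) {
    assert(f.ino != 1);  // the BlueFS log must not take this allocation path
    f.allocated = offset + length;
  }
}

int main() {
  FileNodeSketch data_file{2, 4096};
  flush_range_sketch(data_file, 0, 8192);   // ordinary file: allocation is fine
  std::cout << "data file allocated: " << data_file.allocated << " bytes\n";

  FileNodeSketch log_file{1, 4096};
  flush_range_sketch(log_file, 0, 8192);    // log file: trips the assert
  return 0;
}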

Actions #1

Updated by Anonymous almost 3 years ago

1. I used this PR (https://github.com/ceph/ceph/pull/41884) to fix problem 1, then repeated the same operation; problem 2 still exists.
2. I used this PR (https://github.com/ceph/ceph/pull/35776) to fix problem 2, and it returned OK. But then starting the OSD failed, with the same appearance as problem 2.
3. Setting bluefs_replay_recovery = true let the OSD start successfully (a config sketch follows below).
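
For completeness, a sketch of what the step 3 workaround could look like; the OSD id and restart command are assumptions for illustration:

# ceph.conf on the affected node
[osd]
bluefs_replay_recovery = true

# then restart the affected OSD, e.g.:
systemctl restart ceph-osd@0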

Actions #2

Updated by Anonymous almost 3 years ago

shu yu wrote:

1. I used this PR (https://github.com/ceph/ceph/pull/41884) to fix problem 1, then repeated the same operation; problem 2 still exists.
2. I used this PR (https://github.com/ceph/ceph/pull/35776) to fix problem 2, and it returned OK. But then starting the OSD failed, with the same appearance as problem 2.
3. Setting bluefs_replay_recovery = true let the OSD start successfully.

Update: the fix PR has moved from https://github.com/ceph/ceph/pull/41884 to https://github.com/ceph/ceph/pull/41888.

Actions #3

Updated by Igor Fedotov almost 3 years ago

Curious why you tuned the bluefs log runway settings. Did you really need such low numbers, or did you just set them to reproduce the crash?

Actions #4

Updated by Anonymous almost 3 years ago

Igor Fedotov wrote:

Curious why you tuned the bluefs log runway settings. Did you really need such low numbers, or did you just set them to reproduce the crash?

I just set them to reproduce problem 2.

Actions #5

Updated by Igor Fedotov almost 3 years ago

shu yu wrote:

Igor Fedotov wrote:

Curious why you tuned the bluefs log runway settings. Did you really need such low numbers, or did you just set them to reproduce the crash?

I just set them to reproduce problem 2.

Have you ever seen that problem with default settings?

Actions #6

Updated by Anonymous almost 3 years ago

Igor Fedotov wrote:

shu yu wrote:

Igor Fedotov wrote:

Curious why you tuned the bluefs log runway settings. Did you really need such low numbers, or did you just set them to reproduce the crash?

I just set them to reproduce problem 2.

Have you ever seen that problem with default settings?

Yes. Ceph ran for about 1 year without reads or writes, and the BlueFS log grew to about 400G. When the Ceph service was restarted, BlueFS replay failed at around 300G. The repair succeeded with this PR (https://github.com/ceph/ceph/pull/35776). But since I could not take the disk out of the customer's computer room for further analysis, I am trying to reproduce the problem.

Actions #7

Updated by Igor Fedotov almost 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 41888
Actions #8

Updated by Igor Fedotov almost 3 years ago

An evolution of Shu Yu's fix is available at https://github.com/ceph/ceph/pull/42370

Actions #9

Updated by Neha Ojha over 2 years ago

  • Status changed from Fix Under Review to In Progress
  • Assignee set to Adam Kupczyk
  • Pull request ID deleted (41888)
Actions #10

Updated by Adam Kupczyk over 2 years ago

  • Assignee deleted (Adam Kupczyk)
  • Pull request ID set to 42988