Bug #49861
BlueFS _flush_range coredump (closed)
% Done: 0%
Description
Hi,
A BlueFS core dump occurred during RocksDB background compaction.
FAILED ceph_assert(!h->file->deleted)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x296) [0x7f3d5bb66be6]
2: (()+0x245f220) [0x7f3d5bb67220]
3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x5eb) [0x55d441acf193]
4: (BlueFS::_flush(BlueFS::FileWriter*, bool, bool*)+0xd83) [0x55d441ad6055]
5: (BlueFS::_flush(BlueFS::FileWriter*, bool, std::unique_lock<std::mutex>&)+0x56) [0x55d441ad50c6]
6: (BlueFS::flush(BlueFS::FileWriter*, bool)+0x78) [0x55d441b36918]
7: (BlueRocksWritableFile::Close()+0x51) [0x55d441b37ad9]
8: (rocksdb::WritableFileWriter::Close()+0x2a7) [0x55d4414be0a1]
9: (rocksdb::WritableFileWriter::~WritableFileWriter()+0x1f) [0x55d4414a85c9]
10: (std::default_delete<rocksdb::WritableFileWriter>::operator()(rocksdb::WritableFileWriter*) const+0x22) [0x55d4414aa85c]
11: (std::unique_ptr<rocksdb::WritableFileWriter, std::default_delete<rocksdb::WritableFileWriter> >::~unique_ptr()+0x49) [0x55d4414a9e3f]
12: (rocksdb::log::Writer::~Writer()+0x37) [0x55d4415f84f7]
13: (rocksdb::JobContext::Clean()+0x19d) [0x55d4414f3d01]
14: (rocksdb::DBImpl::BackgroundCallFlush(rocksdb::Env::Priority)+0x509) [0x55d44157a195]
15: (rocksdb::DBImpl::BGWorkFlush(void*)+0xb8) [0x55d441578f60]
Our cluster runs Ceph version 15.2.0.
It seems that RocksDB still performs cleanup when closing a file, even after the file has been unlinked.
As liewegas mentioned in https://github.com/ceph/ceph/pull/10686#issuecomment-239447408, RocksDB probably behaves this way.
So could BlueFS::_flush_range simply do nothing when h->file->deleted == true, like this?
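int BlueFS::_flush_range(FileWriter *h, uint64_t offset, uint64_t length)
{
  // Proposed guard: skip the flush if the file has already been unlinked.
  if (h->file->deleted) {
    dout(10) << __func__ << " deleted, no-op" << dendl;
    return 0;
  }
  // ... rest of the existing function unchanged ...
}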
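To make the difference concrete, here is a minimal standalone sketch of the two behaviours. It is not the BlueFS code: File, FileWriter, do_flush, flush_range_strict and flush_range_guarded are hypothetical, simplified stand-ins that only model the deleted flag and a size counter.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a BlueFS file: only the fields needed here.
struct File {
    bool deleted = false;     // set once the file has been unlinked
    std::uint64_t size = 0;   // highest byte offset flushed so far
};

// Hypothetical stand-in for a file writer holding a reference to its file.
struct FileWriter {
    std::shared_ptr<File> file;
};

// Shared "real work" of a flush in this model: extend the recorded size.
static int do_flush(FileWriter& h, std::uint64_t offset, std::uint64_t length) {
    const std::uint64_t end = offset + length;
    if (end > h.file->size) {
        h.file->size = end;
    }
    return 0;
}

// Models the behaviour from the backtrace: flushing a writer whose file
// was already unlinked trips the assertion and aborts the process.
int flush_range_strict(FileWriter& h, std::uint64_t offset, std::uint64_t length) {
    assert(!h.file->deleted);
    return do_flush(h, offset, length);
}

// Models the proposed guard: a flush on an unlinked file is a no-op.
int flush_range_guarded(FileWriter& h, std::uint64_t offset, std::uint64_t length) {
    if (h.file->deleted) {
        return 0;  // nothing to persist for a file that no longer exists
    }
    return do_flush(h, offset, length);
}
```

In this model, a close that runs after the unlink (as in frames 7 to 13 of the backtrace) aborts with flush_range_strict, while flush_range_guarded returns 0 and leaves the file state untouched.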
Updated by Kefu Chai about 3 years ago
- Status changed from New to Fix Under Review
- Backport set to nautilus, octopus, pacific
- Pull request ID set to 40581
Updated by Kefu Chai about 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot about 3 years ago
- Copied to Backport #50210: octopus: BlueFS _flush_range coredump added
Updated by Backport Bot about 3 years ago
- Copied to Backport #50211: nautilus: BlueFS _flush_range coredump added
Updated by Backport Bot about 3 years ago
- Copied to Backport #50212: pacific: BlueFS _flush_range coredump added
Updated by Loïc Dachary almost 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".