Bug #46994

open

14.2.11 OSD crash BlueFS.cc: 1662: FAILED ceph_assert(r == 0)

Added by Manuel Rios over 3 years ago. Updated over 3 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi Dev Team,

Today one of our OSDs failed with the crash below, an assertion failure in BlueFS.cc. The OSD is fully dedicated to the RGW index and is not shared with any other workload.

Right now the OSD is unable to start.

2020-08-17 15:45:27.598 7f1e2fa82700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7f1e2fa82700 time 2020-08-17 15:45:27.589402
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/BlueFS.cc: 1662: FAILED ceph_assert(r == 0)

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x563f96b550e5]
 2: (()+0x4d72ad) [0x563f96b552ad]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::v14_2_0::list*, char*)+0xf0e) [0x563f9715aa9e]
 4: (BlueRocksRandomAccessFile::Prefetch(unsigned long, unsigned long)+0x2a) [0x563f9718453a]
 5: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::InitDataBlock()+0x29f) [0x563f9772697f]
 6: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::FindKeyForward()+0x1c0) [0x563f97726bb0]
 7: (()+0x102fd29) [0x563f976add29]
 8: (rocksdb::MergingIterator::Next()+0x42) [0x563f97738162]
 9: (rocksdb::DBIter::Next()+0x1f3) [0x563f97641e53]
 10: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d) [0x563f975b36bd]
 11: (RocksDBStore::RocksDBTransactionImpl::rm_range_keys(std::string const&, std::string const&, std::string const&)+0x567) [0x563f975beab7]
 12: (BlueStore::_do_omap_clear(BlueStore::TransContext*, std::string const&, unsigned long)+0x72) [0x563f9708f2f2]
 13: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc16) [0x563f970a6026]
 14: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x5f) [0x563f970a6cbf]
 15: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x13f5) [0x563f970acca5]
 16: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x563f970c1100]
 17: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x563f96cb6d3f]
 18: (non-virtual thunk to PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x563f96e3015f]
 19: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x4a0) [0x563f96f2a970]
 20: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x298) [0x563f96f32d38]
 21: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) [0x563f96e4486a]
 22: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5b3) [0x563f96df4c63]
 23: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x563f96c34da2]
 24: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x563f96ec37c2]
 25: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x563f96c4fd3f]
 26: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x563f97203c46]
 27: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563f97206760]
 28: (()+0x7dd5) [0x7f1e504eddd5]
 29: (clone()+0x6d) [0x7f1e4f3ad02d]

     0> 2020-08-17 15:45:27.609 7f1e2fa82700 -1 *** Caught signal (Aborted) **
 in thread 7f1e2fa82700 thread_name:tp_osd_tp

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (()+0xf5d0) [0x7f1e504f55d0]
 2: (gsignal()+0x37) [0x7f1e4f2e52c7]
 3: (abort()+0x148) [0x7f1e4f2e69b8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x563f96b55134]
 5: (()+0x4d72ad) [0x563f96b552ad]
 6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::v14_2_0::list*, char*)+0xf0e) [0x563f9715aa9e]
 7: (BlueRocksRandomAccessFile::Prefetch(unsigned long, unsigned long)+0x2a) [0x563f9718453a]
 8: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::InitDataBlock()+0x29f) [0x563f9772697f]
 9: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::FindKeyForward()+0x1c0) [0x563f97726bb0]
 10: (()+0x102fd29) [0x563f976add29]
 11: (rocksdb::MergingIterator::Next()+0x42) [0x563f97738162]
 12: (rocksdb::DBIter::Next()+0x1f3) [0x563f97641e53]
 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d) [0x563f975b36bd]
 14: (RocksDBStore::RocksDBTransactionImpl::rm_range_keys(std::string const&, std::string const&, std::string const&)+0x567) [0x563f975beab7]
 15: (BlueStore::_do_omap_clear(BlueStore::TransContext*, std::string const&, unsigned long)+0x72) [0x563f9708f2f2]
 16: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc16) [0x563f970a6026]
 17: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x5f) [0x563f970a6cbf]
 18: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x13f5) [0x563f970acca5]
 19: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x563f970c1100]
 20: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x563f96cb6d3f]
 21: (non-virtual thunk to PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x563f96e3015f]
 22: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x4a0) [0x563f96f2a970]
 23: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x298) [0x563f96f32d38]
 24: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) [0x563f96e4486a]
 25: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5b3) [0x563f96df4c63]
 26: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x563f96c34da2]
 27: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x563f96ec37c2]
 28: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x563f96c4fd3f]
 29: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x563f97203c46]
 30: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563f97206760]
 31: (()+0x7dd5) [0x7f1e504eddd5]
 32: (clone()+0x6d) [0x7f1e4f3ad02d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Any ideas?

Actions #1

Updated by Igor Fedotov over 3 years ago

Manuel,
could you please share the 10000 lines preceding the crash from your OSD log?
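
One possible way to collect those lines is sketched below in Python. This is only an illustration, not part of any Ceph tooling: the function name `lines_before_marker` and the choice of "FAILED ceph_assert" as the marker string are assumptions made for this example, and the log path (often something like /var/log/ceph/ceph-osd.<id>.log) depends on the deployment. The sketch keeps a bounded window of recent lines and returns it as soon as the first line containing the marker is seen.

```python
from collections import deque
from typing import Iterable, List


def lines_before_marker(lines: Iterable[str], marker: str,
                        count: int = 10000) -> List[str]:
    """Return up to `count` lines that precede the first line containing `marker`.

    If the marker never appears, the last `count` lines are returned instead.
    """
    # A deque with maxlen drops the oldest line automatically, so memory use
    # stays bounded even for very large logs.
    window = deque(maxlen=count)
    for line in lines:
        if marker in line:
            return list(window)
        window.append(line)
    return list(window)


def extract_from_file(path: str, marker: str = "FAILED ceph_assert",
                      count: int = 10000) -> List[str]:
    """Convenience wrapper (hypothetical helper) that reads the log at `path`."""
    # errors="replace" avoids failing on stray non-UTF-8 bytes in the log.
    with open(path, errors="replace") as log:
        return lines_before_marker(log, marker, count)
```

Calling `extract_from_file` with the OSD log path and writing the returned lines to a file would give an excerpt that can be attached to the ticket.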

Actions #2

Updated by Neha Ojha over 3 years ago

  • Status changed from New to Need More Info