Bug #48726

closed

/build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)

Added by Chris Dunlop over 3 years ago. Updated over 3 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Per the attached log:

/build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)

 ceph version 15.2.6 (cb8c61a60551b72614257d632a574d420064c17a) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x154) [0x561ef2b0b316]
 2: (()+0x9c24ee) [0x561ef2b0b4ee]
 3: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x33e3) [0x561ef305e4a3]
 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x1c1) [0x561ef305ec41]
 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2b8) [0x561ef3065e18]
 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xdc) [0x561ef3066d7c]
 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x16ec) [0x561ef306a37c]
 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x408) [0x561ef3080ae8]
 9: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0x561ef2d3c1b4]
 10: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0xca8) [0x561ef2ecb018]
 11: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xc60) [0x561ef2cb53a0]
 12: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x1055) [0x561ef2d0a455]
 13: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x36de) [0x561ef2d0e5be]
 14: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd9e) [0x561ef2d15dae]
 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x325) [0x561ef2bae665]
 16: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x64) [0x561ef2df1ce4]
 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12fa) [0x561ef2bcb06a]
 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x561ef31c92b4]
 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561ef31cbd30]
 20: (()+0x7fa3) [0x7f747fe22fa3]
 21: (clone()+0x3f) [0x7f747f9d04cf]

2020-12-28T18:27:59.474+1100 7f7463325700 -1 *** Caught signal (Aborted) **
 in thread 7f7463325700 thread_name:tp_osd_tp

Files

ceph-osd.3.log-20201229.gz (224 KB) Chris Dunlop, 12/28/2020 09:55 PM
Actions #2

Updated by Igor Fedotov over 3 years ago

There are the following lines prior to the assertion:

-3> 2020-12-28T18:27:59.444+1100 7f7477b4e700 -1 bdev(0x561eff1e4000 /var/lib/ceph/osd/ceph-3/block) _aio_thread got r=-61 ((61) No data available)
-2> 2020-12-28T18:27:59.444+1100 7f7477b4e700 -1 bdev(0x561eff1e4000 /var/lib/ceph/osd/ceph-3/block) _aio_thread translating the error to EIO for upper layer

This most likely indicates hardware errors while reading from the disk.
Could you please check dmesg output for any relevant errors?
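
For illustration, here is a minimal sketch of how those two log lines lead to the failed assertion. It assumes the result of the failed read is what reaches the `r >= 0 && r <= (int)tail_read` check. The errno numbers are the Linux values, hard-coded so the example does not depend on the platform. The helper names and the `tail_read` value are hypothetical, and the translation step is simplified to "any negative result becomes -EIO", which is not necessarily how the real code decides.

```python
# Linux errno values, hard-coded for portability of the sketch.
ENODATA = 61  # "No data available", the r=-61 reported by _aio_thread
EIO = 5       # the error the log says is passed to the upper layer


def translate_for_upper_layer(r):
    """Simplified assumption: any failed aio result is reported upward as -EIO."""
    return -EIO if r < 0 else r


def tail_read_check(r, tail_read):
    """The condition from the failed ceph_assert: r >= 0 && r <= (int)tail_read."""
    return 0 <= r <= tail_read


tail_read = 4096  # hypothetical value, not taken from the log
r = translate_for_upper_layer(-ENODATA)
print(r, tail_read_check(r, tail_read))
```

Under these assumptions any negative return value fails the check regardless of `tail_read`, so a medium error on the device is enough to abort the OSD at this point.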

Actions #3

Updated by Chris Dunlop over 3 years ago

Sorry, false alarm. The issue would have been caused by this disk error:

# journalctl -k --since '2020-12-28 18:20'
-- Logs begin at Wed 2020-08-05 00:42:23 AEST, end at Tue 2020-12-29 09:34:15 AEDT. --
Dec 28 18:27:59 b4 kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec 28 18:27:59 b4 kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec 28 18:27:59 b4 kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec 28 18:27:59 b4 kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec 28 18:27:59 b4 kernel: sd 0:0:13:0: [sdn] tag#2731 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
Dec 28 18:27:59 b4 kernel: sd 0:0:13:0: [sdn] tag#2731 Sense Key : Medium Error [current] 
Dec 28 18:27:59 b4 kernel: sd 0:0:13:0: [sdn] tag#2731 Add. Sense: Unrecovered read error
Dec 28 18:27:59 b4 kernel: sd 0:0:13:0: [sdn] tag#2731 CDB: Read(16) 88 00 00 00 00 00 48 02 5e 70 00 00 00 08 00 00
Dec 28 18:27:59 b4 kernel: blk_update_request: critical medium error, dev sdn, sector 1208114800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Dec 28 18:28:00 b4 kernel: libceph: osd3 down
Dec 28 18:28:35 b4 kernel: libceph: osd3 up
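
As a cross-check, the CDB in the kernel log can be decoded to confirm it points at the same sector as the `blk_update_request` line. This is a minimal sketch that relies on the standard READ(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte big-endian transfer length in bytes 10-13). The function name `decode_read16` is hypothetical.

```python
from typing import Tuple


def decode_read16(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, block_count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


# CDB copied from the "[sdn] tag#2731 CDB: Read(16)" line above.
lba, blocks = decode_read16("88 00 00 00 00 00 48 02 5e 70 00 00 00 08 00 00")
print(lba, blocks)  # 1208114800 8
```

The decoded LBA matches sector 1208114800 in the `blk_update_request` line, and the transfer length is 8 blocks, which would be a 4 KiB read if the drive uses 512-byte logical blocks.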

Actions #4

Updated by Chris Dunlop over 3 years ago

Just beat me to it :-)

Actions #5

Updated by Igor Fedotov over 3 years ago

  • Status changed from New to Rejected