Bug #48726
/build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)
Status:
Rejected
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Per attached log
/build/ceph-15.2.6/src/os/bluestore/BlueStore.cc: 13150: FAILED ceph_assert(r >= 0 && r <= (int)tail_read)
ceph version 15.2.6 (cb8c61a60551b72614257d632a574d420064c17a) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x154) [0x561ef2b0b316]
 2: (()+0x9c24ee) [0x561ef2b0b4ee]
 3: (BlueStore::_do_write_small(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list::iterator&, BlueStore::WriteContext*)+0x33e3) [0x561ef305e4a3]
 4: (BlueStore::_do_write_data(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, BlueStore::WriteContext*)+0x1c1) [0x561ef305ec41]
 5: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0x2b8) [0x561ef3065e18]
 6: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int)+0xdc) [0x561ef3066d7c]
 7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x16ec) [0x561ef306a37c]
 8: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x408) [0x561ef3080ae8]
 9: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0x561ef2d3c1b4]
 10: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0xca8) [0x561ef2ecb018]
 11: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xc60) [0x561ef2cb53a0]
 12: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x1055) [0x561ef2d0a455]
 13: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x36de) [0x561ef2d0e5be]
 14: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd9e) [0x561ef2d15dae]
 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x325) [0x561ef2bae665]
 16: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x64) [0x561ef2df1ce4]
 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12fa) [0x561ef2bcb06a]
 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x561ef31c92b4]
 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x561ef31cbd30]
 20: (()+0x7fa3) [0x7f747fe22fa3]
 21: (clone()+0x3f) [0x7f747f9d04cf]
2020-12-28T18:27:59.474+1100 7f7463325700 -1 *** Caught signal (Aborted) **
 in thread 7f7463325700 thread_name:tp_osd_tp
Updated by Chris Dunlop over 3 years ago
See also: https://tracker.ceph.com/issues/19984
Updated by Igor Fedotov over 3 years ago
There are the following lines prior to the assertion:
-3> 2020-12-28T18:27:59.444+1100 7f7477b4e700 -1 bdev(0x561eff1e4000 /var/lib/ceph/osd/ceph-3/block) _aio_thread got r=-61 ((61) No data available)
-2> 2020-12-28T18:27:59.444+1100 7f7477b4e700 -1 bdev(0x561eff1e4000 /var/lib/ceph/osd/ceph-3/block) _aio_thread translating the error to EIO for upper layer
This most likely indicates hardware errors while reading from the disk.
Could you please check dmesg output for any relevant errors?
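The link between the two bdev log lines and the failed assertion can be sketched as follows. This is an illustrative Python model, not BlueStore code: `translate_bdev_error` and `assertion_holds` are hypothetical names, the `tail_read` value is arbitrary, and only the -61 case seen in the log is modeled (which other errors Ceph translates is not shown here).

```python
import errno

ENODATA_FROM_LOG = 61  # the "(61) No data available" value from the bdev log line


def translate_bdev_error(r: int) -> int:
    """Hypothetical stand-in for the 'translating the error to EIO' step.

    Only the error seen in this log (-61) is modeled; everything else is
    passed through unchanged.
    """
    if r == -ENODATA_FROM_LOG:
        return -errno.EIO
    return r


def assertion_holds(r: int, tail_read: int) -> bool:
    """Mirror of the failed condition: r >= 0 && r <= (int)tail_read."""
    return 0 <= r <= tail_read


tail_read = 4096  # illustrative only; the real value is not in the log
r = translate_bdev_error(-ENODATA_FROM_LOG)

# A negative read result can never satisfy r >= 0, so the assert fires.
print(r, assertion_holds(r, tail_read))
```

Under these assumptions, any read error that reaches the caller as a negative value trips the assertion regardless of `tail_read`, which is consistent with a failing disk rather than a logic bug.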
Updated by Chris Dunlop over 3 years ago
Sorry, false alarm. The issue would have been caused by this disk error:
# journalctl -k --since '2020-12-28 18:20'
-- Logs begin at Wed 2020-08-05 00:42:23 AEST, end at Tue 2020-12-29 09:34:15 AEDT. --
Dec 28 18:27:59 b4 kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec 28 18:27:59 b4 kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec 28 18:27:59 b4 kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec 28 18:27:59 b4 kernel: mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Dec 28 18:27:59 b4 kernel: sd 0:0:13:0: [sdn] tag#2731 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
Dec 28 18:27:59 b4 kernel: sd 0:0:13:0: [sdn] tag#2731 Sense Key : Medium Error [current]
Dec 28 18:27:59 b4 kernel: sd 0:0:13:0: [sdn] tag#2731 Add. Sense: Unrecovered read error
Dec 28 18:27:59 b4 kernel: sd 0:0:13:0: [sdn] tag#2731 CDB: Read(16) 88 00 00 00 00 00 48 02 5e 70 00 00 00 08 00 00
Dec 28 18:27:59 b4 kernel: blk_update_request: critical medium error, dev sdn, sector 1208114800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Dec 28 18:28:00 b4 kernel: libceph: osd3 down
Dec 28 18:28:35 b4 kernel: libceph: osd3 up
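As a cross-check, the Read(16) CDB in the kernel log can be decoded by hand and compared with the sector reported by `blk_update_request`. The sketch below uses only the bytes and message text from the log above. The field offsets follow the standard READ(16) layout (opcode in byte 0, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13). Variable names are illustrative.

```python
import re

# CDB bytes and kernel message copied from the journalctl output.
cdb_hex = "88 00 00 00 00 00 48 02 5e 70 00 00 00 08 00 00"
blk_msg = (
    "blk_update_request: critical medium error, dev sdn, "
    "sector 1208114800 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0"
)

cdb = bytes.fromhex(cdb_hex.replace(" ", ""))

opcode = cdb[0]                             # 0x88 is READ(16)
lba = int.from_bytes(cdb[2:10], "big")      # starting logical block address
blocks = int.from_bytes(cdb[10:14], "big")  # number of blocks requested

# Pull the device name and failing sector out of the block-layer message.
match = re.search(r"dev (\w+), sector (\d+)", blk_msg)
device, sector = match.group(1), int(match.group(2))

print(hex(opcode), lba, blocks, device, sector)
```

The decoded LBA equals the sector in the `blk_update_request` line, so both messages describe the same failed 8-block read on `sdn`. That the two values are in the same units suggests 512-byte logical blocks on this device, though the log does not state the block size.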