Bug #57857

KernelDevice::read doesn't translate error codes correctly

Added by Joshua Baergen 4 months ago. Updated about 2 months ago.


Description

"(()+0xf630) [0x7f746eadc630]",
"(gsignal()+0x37) [0x7f746d8cf387]",
"(abort()+0x148) [0x7f746d8d0a78]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x556386de14ae]",
"(()+0x4d9627) [0x556386de1627]",
"(BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x3512) [0x55638732b9c2]",
"(BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x1b8) [0x55638732c008]",
"(ECBackend::be_deep_scrub(hobject_t const&, ScrubMap&, ScrubMapBuilder&, ScrubMap::object&)+0x2d0) [0x5563871e97a0]",
"(PGBackend::be_scan_list(ScrubMap&, ScrubMapBuilder&)+0x663) [0x5563870ce303]",
"(PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x8b) [0x556386f8022b]",
"(PG::chunky_scrub(ThreadPool::TPHandle&)+0x182c) [0x556386faa88c]",
"(PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x4bb) [0x556386fabc5b]",
"(PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x12) [0x556387151e82]",
"(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x556386edde6f]",
"(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x556387499f26]",
"(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55638749ca40]",
"(()+0x7ea5) [0x7f746ead4ea5]",
"(clone()+0x6d) [0x7f746d997b0d]"

2022-10-12 07:53:28.507 7fd2acda8700 -1 bdev(0x55e552c3a700 /var/lib/ceph/osd/ceph-171/block) read stalled read 0x2e923fc000~80000 (direct) since 4733634.560527s, timeout is 5.000000s
2022-10-12 07:53:28.507 7fd2acda8700 -1 bluestore(/var/lib/ceph/osd/ceph-171) _do_read bdev-read failed: (61) No data available
2022-10-12 07:53:28.511 7fd2acda8700 -1 /.../src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)' thread 7fd2acda8700 time 2022-10-12 07:53:28.510278
/.../src/os/bluestore/BlueStore.cc: 9625: FAILED ceph_assert(r == 0)

This should be impossible: this code path asks for ENODATA to be translated to EIO and passed up for repair. However, KernelDevice::read checks the wrong error code, so the translation never happens and the raw ENODATA reaches the ceph_assert(r == 0) in BlueStore::_do_read. See https://github.com/ceph/ceph/pull/48467.
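To illustrate the class of mistake, here is a minimal standalone sketch. It is not the Ceph code: the helper names and the set of "expected" errors are hypothetical, and it assumes the mistake is of the common form where the raw return value of a pread-style call (-1 on failure) is inspected instead of errno. The linked PR is the authoritative description of the actual change.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper: negative error codes that the caller wants reported
// as EIO. The set here is illustrative only.
inline bool is_expected_ioerr_sketch(int neg_err) {
  return neg_err == -EIO || neg_err == -ENODATA;
}

// raw_ret:     return value of a pread-style call (-1 on failure).
// saved_errno: errno captured immediately after the call.
// allow_eio:   caller asked for expected I/O errors to be reported as EIO.

// Assumed buggy form: the check looks at the raw return value, which is
// always -1 on failure, so it never matches -ENODATA.
inline int translate_read_error_buggy(long raw_ret, int saved_errno,
                                      bool allow_eio) {
  if (raw_ret >= 0) {
    return 0;
  }
  assert(saved_errno > 0);
  if (allow_eio && is_expected_ioerr_sketch(static_cast<int>(raw_ret))) {
    return -EIO;
  }
  return -saved_errno;  // ENODATA leaks through untranslated
}

// Corrected form: the check looks at the negated errno.
inline int translate_read_error_fixed(long raw_ret, int saved_errno,
                                      bool allow_eio) {
  if (raw_ret >= 0) {
    return 0;
  }
  assert(saved_errno > 0);
  const int err = -saved_errno;
  if (allow_eio && is_expected_ioerr_sketch(err)) {
    return -EIO;  // caller can now treat this as a repairable read error
  }
  return err;
}
```

Under these assumptions, a failed read with errno set to ENODATA comes back from the buggy variant as -ENODATA, which matches the "(61) No data available" line in the log above, while the fixed variant returns -EIO when allow_eio is set.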


Related issues

Copied to bluestore - Backport #58180: quincy: KernelDevice::read doesn't translate error codes correctly (Status: New)
Copied to bluestore - Backport #58181: pacific: KernelDevice::read doesn't translate error codes correctly (Status: In Progress)

History

#1 Updated by Igor Fedotov 3 months ago

  • Status changed from New to Fix Under Review
  • Backport set to quincy, pacific

#2 Updated by Igor Fedotov about 2 months ago

  • Status changed from Fix Under Review to Pending Backport

#3 Updated by Backport Bot about 2 months ago

  • Copied to Backport #58180: quincy: KernelDevice::read doesn't translate error codes correctly added

#4 Updated by Backport Bot about 2 months ago

  • Copied to Backport #58181: pacific: KernelDevice::read doesn't translate error codes correctly added

#5 Updated by Backport Bot about 2 months ago

  • Tags set to backport_processed
