Bug #36455: BlueStore: ENODATA not fully handled
Status: Closed
Description
We have a drive model experiencing weak writes, which later manifest as failed reads: the drive notices that its low-level checksum doesn't validate and returns ENODATA.
This is 12.2.7. The problem was supposedly(?) fixed in http://tracker.ceph.com/issues/23333 and backported for 12.2.6 via http://tracker.ceph.com/issues/23672; is_expected_ioerr() is supposed to map this to EIO.
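For context, the intended mapping works roughly as follows. This is a minimal, self-contained C++ sketch of the idea only, not the actual Ceph source; the real is_expected_ioerr() lives in the bdev layer and checks more errnos than shown here, and normalize_read_error() is a hypothetical helper name:

    #include <cassert>
    #include <cerrno>

    // Sketch of the intent: low-level media errors such as ENODATA are
    // "expected" I/O errors and get collapsed to plain EIO before upper
    // layers see them.
    static bool is_expected_ioerr(const int r)
    {
      return r == -EIO || r == -ENODATA || r == -EILSEQ || r == -EREMOTEIO;
    }

    static int normalize_read_error(int r)  // hypothetical helper
    {
      if (r < 0 && is_expected_ioerr(r)) {
        return -EIO;  // one code for all expected media errors
      }
      return r;       // unexpected errors propagate unchanged
    }

    int main()
    {
      assert(normalize_read_error(-ENODATA) == -EIO);  // (61) -> (5)
      return 0;
    }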
However, we still observe an assert:
2018-10-01T12:14:22.503855-05:00 cosd9 ceph-osd[10672]: /home/abuild/rpmbuild/BUILD/ceph-12.2.7-420-gc0ef85b854/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t)' thread 7fed7de36700 time 2018-10-01 12:14:22.503757
2018-10-01T12:14:22.504167-05:00 cosd9 ceph-osd[10672]: /home/abuild/rpmbuild/BUILD/ceph-12.2.7-420-gc0ef85b854/src/os/bluestore/BlueStore.cc: 6673: FAILED assert(r == 0)
2018-10-01T12:14:22.504305-05:00 cosd9 ceph-osd[10672]: 2018-10-01 12:14:22.503728 7fed7de36700 -1 bluestore(/var/lib/ceph/osd/ceph-117) _do_read bdev-read failed: (61) No data available
2018-10-01T12:14:22.506015-05:00 cosd9 kernel: [950406.153546] sd 0:0:3:0: [sdd] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2018-10-01T12:14:22.506024-05:00 cosd9 kernel: [950406.153552] sd 0:0:3:0: [sdd] tag#2 Sense Key : Medium Error [current] [descriptor]
2018-10-01T12:14:22.506025-05:00 cosd9 kernel: [950406.153555] sd 0:0:3:0: [sdd] tag#2 Add. Sense: Unrecovered read error
2018-10-01T12:14:22.506028-05:00 cosd9 kernel: [950406.153560] mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
2018-10-01T12:14:22.506029-05:00 cosd9 kernel: [950406.153562] sd 0:0:3:0: [sdd] tag#2 CDB: Read(16) 88 00 00 00 00 02 de 46 b7 00 00 00 04 00 00 00
2018-10-01T12:14:22.506030-05:00 cosd9 kernel: [950406.153565] blk_update_request: critical medium error, dev sdd, sector 12319111672
2018-10-01T12:14:22.506031-05:00 cosd9 kernel: [950406.153582] mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
2018-10-01T12:14:22.506032-05:00 cosd9 kernel: [950406.153588] mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
2018-10-01T12:14:22.506033-05:00 cosd9 kernel: [950406.153593] mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
2018-10-01T12:14:22.508227-05:00 cosd9 ceph-osd[10672]: ceph version 12.2.7-420-gc0ef85b854 (c0ef85b8544d85079fa7b1643ff944121eb51513) luminous (stable)
2018-10-01T12:14:22.508410-05:00 cosd9 ceph-osd[10672]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x557e216fbabe]
2018-10-01T12:14:22.508541-05:00 cosd9 ceph-osd[10672]: 2: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x1023) [0x557e215cde23]
2018-10-01T12:14:22.508655-05:00 cosd9 ceph-osd[10672]: 3: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x47a) [0x557e215cfdfa]
2018-10-01T12:14:22.508766-05:00 cosd9 ceph-osd[10672]: 4: (ECBackend::be_deep_scrub(hobject_t const&, ScrubMap&, ScrubMapBuilder&, ScrubMap::object&)+0x248) [0x557e2147d978]
2018-10-01T12:14:22.508893-05:00 cosd9 ceph-osd[10672]: 5: (PGBackend::be_scan_list(ScrubMap&, ScrubMapBuilder&)+0x87b) [0x557e2138baab]
2018-10-01T12:14:22.508997-05:00 cosd9 ceph-osd[10672]: 6: (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x88) [0x557e2123a518]
2018-10-01T12:14:22.509101-05:00 cosd9 ceph-osd[10672]: 7: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x5d3) [0x557e21266b53]
2018-10-01T12:14:22.509210-05:00 cosd9 ceph-osd[10672]: 8: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x2e4) [0x557e21269034]
2018-10-01T12:14:22.509315-05:00 cosd9 ceph-osd[10672]: 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfd7) [0x557e211b2f17]
2018-10-01T12:14:22.509420-05:00 cosd9 ceph-osd[10672]: 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x829) [0x557e21701269]
2018-10-01T12:14:22.509524-05:00 cosd9 ceph-osd[10672]: 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x557e217031b0]
2018-10-01T12:14:22.509628-05:00 cosd9 ceph-osd[10672]: 12: (()+0x8724) [0x7fed9c98b724]
2018-10-01T12:14:22.509733-05:00 cosd9 ceph-osd[10672]: 13: (clone()+0x6d) [0x7fed9ba13e8d]
(During deep scrub.)
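The crash site corresponds to a pattern along these lines. This is a simplified, self-contained model of the failure, not the exact code at BlueStore.cc:6673; bdev_read() here stands in for the real block-device read:

    #include <cassert>
    #include <cerrno>
    #include <cstdio>

    static int bdev_read() { return -ENODATA; }  // drive returns ENODATA

    int main()
    {
      int r = bdev_read();
      if (r < 0) {
        // matches the observed log line "_do_read bdev-read failed: (61)"
        std::fprintf(stderr, "_do_read bdev-read failed: (%d)\n", -r);
      }
      assert(r == 0);  // FAILED assert(r == 0) -> the whole OSD aborts

      // A fix in the spirit of #23333/#23672 would map and propagate
      // instead of asserting:
      //   if (r == -ENODATA) r = -EIO;
      //   if (r < 0) return r;  // let the caller record a read error
      return 0;
    }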
Updated by Lars Marowsky-Brée over 5 years ago
The code appears identical in master.
For this particular case, especially during scrub, we know our local copy is lost. It would make sense to flag the PG for rebuild: a hard drive remembers this fault (tracked as a "pending sector" in SMART) and relocates the affected sector on the next write.
These events should be logged properly, so appropriate action can be taken if they occur too frequently; besides firmware bugs, they really can be caused by cosmic radiation and plain bad luck. A large installation will likely experience them occasionally, and we should recover and restore redundancy.
(Rewriting the affected data is a common mitigation in traditional RAID; a rough sketch of this handling follows below.)
Is that feasible?
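For illustration, the handling suggested above could look roughly like this in the deep-scrub path. All names in this sketch (read_replica, ScrubObject, record_read_error) are made up for the example; it models the suggestion, not Ceph's actual scrub code:

    #include <cerrno>
    #include <cstdio>
    #include <cstdlib>

    struct ScrubObject { bool read_error = false; };  // hypothetical

    static int read_replica() { return -EIO; }  // expected media error

    static void record_read_error(const char* oid)
    {
      // in Ceph this would go to the cluster log, so operators can track
      // how often these errors occur across the install
      std::fprintf(stderr, "deep-scrub: read error on %s\n", oid);
    }

    int main()
    {
      ScrubObject obj;
      int r = read_replica();
      if (r == -EIO) {
        obj.read_error = true;   // replica lost; PG flagged inconsistent
        record_read_error("obj.0");
        // repair/backfill then rewrites the object on this OSD, restoring
        // redundancy; the rewrite also lets the drive remap the pending
        // sector, mirroring the classic RAID mitigation
      } else if (r < 0) {
        std::abort();            // truly unexpected errors still abort
      }
      return 0;
    }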
Updated by Sage Weil over 5 years ago
- Status changed from New to 12
- Assignee set to Radoslaw Zarzynski
Updated by Radoslaw Zarzynski over 5 years ago
- Status changed from 12 to Fix Under Review
- Backport set to luminous
Updated by Sage Weil over 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from luminous to luminous,mimic
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37824: mimic: BlueStore: ENODATA not fully handled added
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37825: luminous: BlueStore: ENODATA not fully handled added
Updated by Nathan Cutler about 5 years ago
- Status changed from Pending Backport to Resolved
Updated by Paul Emmerich about 4 years ago
I'm seeing this on 14.2.7 during deep-scrubbing. I'll investigate and maybe open a new issue.
2020-02-27 23:18:30 XXXXXXXX osd.258 03ea89e700 -1 bluestore(/var/lib/ceph/osd/ceph-258) _do_read bdev-read failed: (61) No data available
2020-02-27 23:18:30 XXXXXXXX osd.258 /tmp/release/Debian/WORKDIR/ceph-14.2.7/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)' thread 7f03ea89e700 time 2020-02-27 23:18:30.647763
2020-02-27 23:18:30 XXXXXXXX osd.258 /tmp/release/Debian/WORKDIR/ceph-14.2.7/src/os/bluestore/BlueStore.cc: 9152: FAILED ceph_assert(r == 0)
2020-02-27 23:18:30 XXXXXXXX osd.258 03ea89e700 -1 bdev(0x55d366e0a000 /var/lib/ceph/osd/ceph-258/block) read stalled read 0x474fddb0000~80000 (direct) since 300036s, timeout is 5s
2020-02-27 23:18:30 XXXXXXXX osd.258 03ea89e700 -1 bluestore(/var/lib/ceph/osd/ceph-258) _do_read bdev-read failed: (61) No data available
2020-02-27 23:18:30 XXXXXXXX osd.258 ceph version 14.2.7 (fb8e34a687d76cd3bd45c2a0fb445432ab69b4ff) nautilus (stable)
2020-02-27 23:18:30 XXXXXXXX osd.258 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55d35c06ba86]
2020-02-27 23:18:30 XXXXXXXX osd.258 2: (()+0x518c5e) [0x55d35c06bc5e]
2020-02-27 23:18:30 XXXXXXXX osd.258 3: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x39e7) [0x55d35c5d67f7]
2020-02-27 23:18:30 XXXXXXXX osd.258 4: (BlueStore::read(boost::intrusive_ptr&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x1b5) [0x55d35c5da595]
2020-02-27 23:18:30 XXXXXXXX osd.258 5: (ReplicatedBackend::be_deep_scrub(hobject_t const&, ScrubMap&, ScrubMapBuilder&, ScrubMap::object&)+0x2cb) [0x55d35c44e89b]
2020-02-27 23:18:30 XXXXXXXX osd.258 6: (PGBackend::be_scan_list(ScrubMap&, ScrubMapBuilder&)+0x6db) [0x55d35c369e3b]
2020-02-27 23:18:30 XXXXXXXX osd.258 7: (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x83) [0x55d35c208c53]
2020-02-27 23:18:30 XXXXXXXX osd.258 8: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x17d3) [0x55d35c234ed3]
2020-02-27 23:18:30 XXXXXXXX osd.258 9: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x4bb) [0x55d35c23610b]
2020-02-27 23:18:30 XXXXXXXX osd.258 10: (PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x1a) [0x55d35c3ee6ea]
2020-02-27 23:18:30 XXXXXXXX osd.258 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x7d7) [0x55d35c167d17]
2020-02-27 23:18:30 XXXXXXXX osd.258 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x55d35c726864]
2020-02-27 23:18:30 XXXXXXXX osd.258 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55d35c729270]
2020-02-27 23:18:30 XXXXXXXX osd.258 14: (()+0x7fa3) [0x7f04075e4fa3]
2020-02-27 23:18:30 XXXXXXXX osd.258 15: (clone()+0x3f) [0x7f04071944cf]
2020-02-27 23:18:30 XXXXXXXX osd.258 *** Caught signal (Aborted) **
2020-02-27 23:18:30 XXXXXXXX osd.258 in thread 7f03ea89e700 thread_name:tp_osd_tp