Project

General

Profile

Bug #36455

Updated by Joao Eduardo Luis over 1 year ago

We have a drive model experiencing weak writes, which manifest themselves as failed reads later; the drive notices the low-level checksum doesn't validate and returns ENODATA.

This is 12.2.7; this was supposedly(?) fixed in http://tracker.ceph.com/issues/23333 and backported for 12.2.6: http://tracker.ceph.com/issues/23672 - is_expected_ioerr() is supposed to map this to EIO.

However, we the customer still observe observes an assert:

<pre>
2018-10-01T12:14:22.503855-05:00 cosd9 ceph-osd[10672]: /home/abuild/rpmbuild/BUILD/ceph-12.2.7-420-gc0ef85b854/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t)' thread 7fed7de36700 time 2018-10-01 12:14:22.503757
2018-10-01T12:14:22.504167-05:00 cosd9 ceph-osd[10672]: /home/abuild/rpmbuild/BUILD/ceph-12.2.7-420-gc0ef85b854/src/os/bluestore/BlueStore.cc: 6673: FAILED assert(r == 0)
2018-10-01T12:14:22.504305-05:00 cosd9 ceph-osd[10672]: 2018-10-01 12:14:22.503728 7fed7de36700 -1 bluestore(/var/lib/ceph/osd/ceph-117) _do_read bdev-read failed: (61) No data available
2018-10-01T12:14:22.506015-05:00 cosd9 kernel: [950406.153546] sd 0:0:3:0: [sdd] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
2018-10-01T12:14:22.506024-05:00 cosd9 kernel: [950406.153552] sd 0:0:3:0: [sdd] tag#2 Sense Key : Medium Error [current] [descriptor]
2018-10-01T12:14:22.506025-05:00 cosd9 kernel: [950406.153555] sd 0:0:3:0: [sdd] tag#2 Add. Sense: Unrecovered read error
2018-10-01T12:14:22.506028-05:00 cosd9 kernel: [950406.153560] mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
2018-10-01T12:14:22.506029-05:00 cosd9 kernel: [950406.153562] sd 0:0:3:0: [sdd] tag#2 CDB: Read(16) 88 00 00 00 00 02 de 46 b7 00 00 00 04 00 00 00
2018-10-01T12:14:22.506030-05:00 cosd9 kernel: [950406.153565] blk_update_request: critical medium error, dev sdd, sector 12319111672
2018-10-01T12:14:22.506031-05:00 cosd9 kernel: [950406.153582] mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
2018-10-01T12:14:22.506032-05:00 cosd9 kernel: [950406.153588] mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
2018-10-01T12:14:22.506033-05:00 cosd9 kernel: [950406.153593] mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
2018-10-01T12:14:22.508227-05:00 cosd9 ceph-osd[10672]: ceph version 12.2.7-420-gc0ef85b854 (c0ef85b8544d85079fa7b1643ff944121eb51513) luminous (stable)
2018-10-01T12:14:22.508410-05:00 cosd9 ceph-osd[10672]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x557e216fbabe]
2018-10-01T12:14:22.508541-05:00 cosd9 ceph-osd[10672]: 2: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0
x1023) [0x557e215cde23]
2018-10-01T12:14:22.508655-05:00 cosd9 ceph-osd[10672]: 3: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int
)+0x47a) [0x557e215cfdfa]
2018-10-01T12:14:22.508766-05:00 cosd9 ceph-osd[10672]: 4: (ECBackend::be_deep_scrub(hobject_t const&, ScrubMap&, ScrubMapBuilder&, ScrubMap::object&)+0x248) [0x557e2147d978]
2018-10-01T12:14:22.508893-05:00 cosd9 ceph-osd[10672]: 5: (PGBackend::be_scan_list(ScrubMap&, ScrubMapBuilder&)+0x87b) [0x557e2138baab]
2018-10-01T12:14:22.508997-05:00 cosd9 ceph-osd[10672]: 6: (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x88) [0x557e2123a518]
2018-10-01T12:14:22.509101-05:00 cosd9 ceph-osd[10672]: 7: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x5d3) [0x557e21266b53]
2018-10-01T12:14:22.509210-05:00 cosd9 ceph-osd[10672]: 8: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x2e4) [0x557e21269034]
2018-10-01T12:14:22.509315-05:00 cosd9 ceph-osd[10672]: 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfd7) [0x557e211b2f17]
2018-10-01T12:14:22.509420-05:00 cosd9 ceph-osd[10672]: 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x829) [0x557e21701269]
2018-10-01T12:14:22.509524-05:00 cosd9 ceph-osd[10672]: 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x557e217031b0]
2018-10-01T12:14:22.509628-05:00 cosd9 ceph-osd[10672]: 12: (()+0x8724) [0x7fed9c98b724]
2018-10-01T12:14:22.509733-05:00 cosd9 ceph-osd[10672]: 13: (clone()+0x6d) [0x7fed9ba13e8d]
</pre>


(During deep scrub.)

Back