Bug #54615

virtual int KernelDevice::read(uint64_t, uint64_t, ceph::bufferlist*, IOContext*, bool): assert((uint64_t)r == len)

Added by jing zhang about 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-disk
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

debug     -4> 2022-03-16T03:34:05.163+0000 7fad822f6080  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1647401645168984, "job": 1, "event": "recovery_started", "log_files": [3611, 3624, 3627, 3632]}
debug     -3> 2022-03-16T03:34:05.163+0000 7fad822f6080  4 rocksdb: [db_impl/db_impl_open.cc:760] Recovering log #3611 mode 2
debug     -2> 2022-03-16T03:34:15.303+0000 7fad822f6080 -1 bdev(0x55b913340800 /var/lib/ceph/osd/ceph-83/block) read stalled read  0xc7bfe60000~a0000 (buffered) since 6572456.911950s, timeout is 5.000000s
debug     -1> 2022-03-16T03:34:15.311+0000 7fad822f6080 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/blk/kernel/KernelDevice.cc: In function 'virtual int KernelDevice::read(uint64_t, uint64_t, ceph::bufferlist*, IOContext*, bool)' thread 7fad822f6080 time 2022-03-16T03:34:15.309430+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/blk/kernel/KernelDevice.cc: 1066: FAILED ceph_assert((uint64_t)r == len)

 ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x55b90806254c]
 2: ceph-osd(+0x56a766) [0x55b908062766]
 3: (KernelDevice::read(unsigned long, unsigned long, ceph::buffer::v15_2_0::list*, IOContext*, bool)+0x8a1) [0x55b908bac3a1]
 4: (BlueFS::_read(BlueFS::FileReader*, unsigned long, unsigned long, ceph::buffer::v15_2_0::list*, char*)+0x79c) [0x55b90875494c]
 5: (BlueRocksSequentialFile::Read(unsigned long, rocksdb::Slice*, char*)+0x2e) [0x55b908782b7e]
 6: (rocksdb::LegacySequentialFileWrapper::Read(unsigned long, rocksdb::IOOptions const&, rocksdb::Slice*, char*, rocksdb::IODebugContext*)+0x25) [0x55b908c15705]
 7: (rocksdb::SequentialFileReader::Read(unsigned long, rocksdb::Slice*, char*)+0x77) [0x55b908d23a77]
 8: (rocksdb::log::Reader::ReadMore(unsigned long*, int*)+0xae) [0x55b908ca694e]
 9: (rocksdb::log::Reader::ReadPhysicalRecord(rocksdb::Slice*, unsigned long*)+0x66) [0x55b908ca6a36]
 10: (rocksdb::log::Reader::ReadRecord(rocksdb::Slice*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, rocksdb::WALRecoveryMode)+0xb5) [0x55b908ca6c55]
 11: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool, bool*)+0x1250) [0x55b908c5a170]
 12: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*)+0xae8) [0x55b908c5bea8]
 13: (rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool)+0x1a0) [0x55b908c66e30]
 14: (RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xb98) [0x55b908bce998]
 15: (BlueStore::_open_db(bool, bool, bool)+0x214) [0x55b908653da4]
 16: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x55b9086be5d3]
 17: (BlueStore::_mount()+0x204) [0x55b9086c1514]
 18: (OSD::init()+0x380) [0x55b908196a10]
 19: main()
 20: __libc_start_main()
 21: _start()

History

#1 Updated by jing zhang about 2 years ago

I deployed the cluster with ceph-rook. The other ~100 OSDs are running normally, but one OSD stays in CrashLoopBackOff with the stack trace shown in the description.

#2 Updated by Igor Fedotov about 2 years ago

This looks like a hardware issue:
debug -2> 2022-03-16T03:34:15.303+0000 7fad822f6080 -1 bdev(0x55b913340800 /var/lib/ceph/osd/ceph-83/block) read stalled read 0xc7bfe60000~a0000 (buffered) since 6572456.911950s, timeout is 5.000000s

Is the failing extent the same on every restart (0xc7bfe60000~a0000)?
If so, you could use dd to check whether a raw read of that range succeeds. It may also be worth inspecting the dmesg output for disk errors.
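
The raw-read check suggested above can also be scripted. The following is a minimal sketch, not part of the ticket: it assumes the extent in the bdev log line is written as hex offset~length in bytes relative to the OSD's block device, and the helper names parse_extent and read_extent are hypothetical. On a real node the path would be the OSD block device (for example /var/lib/ceph/osd/ceph-83/block from the log); the demo at the bottom reads a temporary file instead so it runs anywhere.

```python
import os
import tempfile
from typing import Tuple


def parse_extent(extent: str) -> Tuple[int, int]:
    """Parse an 'OFFSET~LENGTH' extent with hex fields, e.g. '0xc7bfe60000~a0000'."""
    offset_hex, length_hex = extent.split("~")
    return int(offset_hex, 16), int(length_hex, 16)


def read_extent(path: str, offset: int, length: int, chunk: int = 64 * 1024) -> int:
    """Read [offset, offset + length) from path and return the byte count read.

    A failing sector is expected to surface as OSError (typically EIO) from
    os.pread, which is left to propagate to the caller.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        done = 0
        while done < length:
            data = os.pread(fd, min(chunk, length - done), offset + done)
            if not data:  # reached the end of the device or file
                break
            done += len(data)
        return done
    finally:
        os.close(fd)


if __name__ == "__main__":
    offset, length = parse_extent("0xc7bfe60000~a0000")
    print(f"extent from the log: offset={offset} bytes, length={length} bytes")

    # Stand-in for the block device: a 1 MiB temporary file.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * (1 << 20))
        tmp.flush()
        print(read_extent(tmp.name, 4096, 8192), "bytes read from the stand-in file")
```

Note that this sketch reads through the page cache, so a previously cached range may read back cleanly even if the disk is bad; a short count or an OSError on the real device would point toward the medium rather than Ceph.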

#3 Updated by jing zhang almost 2 years ago

Yes, the error is the same on every restart.

I found some errors in the dmesg output (below). I will try to fix it. Thank you very much, Igor!

[Wed Mar 23 02:03:13 2022] sd 0:0:10:0: [sdk] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Mar 23 02:03:13 2022] sd 0:0:10:0: [sdk] tag#0 Sense Key : Medium Error [current] [descriptor]
[Wed Mar 23 02:03:13 2022] sd 0:0:10:0: [sdk] tag#0 Add. Sense: Unrecovered read error
[Wed Mar 23 02:03:13 2022] sd 0:0:10:0: [sdk] tag#0 CDB: Read(16) 88 00 00 00 00 00 63 df fc 00 00 00 01 00 00 00
[Wed Mar 23 02:03:13 2022] print_req_error: critical medium error, dev sdk, sector 1675623432
[Wed Mar 23 02:03:18 2022] sd 0:0:10:0: [sdk] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Mar 23 02:03:18 2022] sd 0:0:10:0: [sdk] tag#0 Sense Key : Medium Error [current] [descriptor]
[Wed Mar 23 02:03:18 2022] sd 0:0:10:0: [sdk] tag#0 Add. Sense: Unrecovered read error
[Wed Mar 23 02:03:18 2022] sd 0:0:10:0: [sdk] tag#0 CDB: Read(16) 88 00 00 00 00 00 63 df fc 08 00 00 00 08 00 00
[Wed Mar 23 02:03:18 2022] print_req_error: critical medium error, dev sdk, sector 1675623432
[Wed Mar 23 02:03:18 2022] Buffer I/O error on dev dm-13, logical block 209452673, async page read
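
The dmesg lines above can be cross-checked against the extent from the OSD log with a little arithmetic. This is a rough sketch under stated assumptions, none of which the ticket confirms: that dm-13 is the device behind /var/lib/ceph/osd/ceph-83/block, that the "logical block" in the Buffer I/O error line counts 4096-byte blocks, and that the kernel "sector" is in 512-byte units. The function names are hypothetical.

```python
from typing import Optional, Tuple

SECTOR_BYTES = 512         # unit of the kernel's "sector" in print_req_error
LOGICAL_BLOCK_BYTES = 4096  # assumption: block size behind "Buffer I/O error"


def decode_read16(cdb_hex: str) -> Tuple[int, int]:
    """Decode a SCSI READ(16) CDB dump into (starting LBA, transfer length in blocks)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length
    return lba, blocks


def offset_within_extent(byte_offset: int, ext_offset: int, ext_length: int) -> Optional[int]:
    """Return how far byte_offset lies into the extent, or None if it is outside."""
    delta = byte_offset - ext_offset
    return delta if 0 <= delta < ext_length else None


if __name__ == "__main__":
    for cdb in ("88 00 00 00 00 00 63 df fc 00 00 00 01 00 00 00",
                "88 00 00 00 00 00 63 df fc 08 00 00 00 08 00 00"):
        lba, blocks = decode_read16(cdb)
        print(f"READ(16): lba={lba} blocks={blocks}")

    # Failing sector on sdk, as a byte offset on that disk.
    print("sdk byte offset:", 1675623432 * SECTOR_BYTES)

    # Failing logical block on dm-13 versus the extent 0xc7bfe60000~a0000.
    dm_offset = 209452673 * LOGICAL_BLOCK_BYTES
    print("offset into extent:", offset_within_extent(dm_offset, 0xc7bfe60000, 0xA0000))
```

With the values from this ticket, the second command starts at LBA 1675623432, which matches the sector reported by print_req_error, and under the 4096-byte assumption block 209452673 falls 0x21000 bytes into the extent 0xc7bfe60000~a0000. That would be consistent with the OSD read failing on the same bad region the disk reports, though it does not prove that dm-13 is the OSD's device.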

#4 Updated by Igor Fedotov almost 2 years ago

@jing zhang - do you mind if I close the ticket?

#5 Updated by Igor Fedotov almost 2 years ago

  • Subject changed from crash: virtual int KernelDevice::read(uint64_t, uint64_t, ceph::bufferlist*, IOContext*, bool): assert((uint64_t)r == len) to virtual int KernelDevice::read(uint64_t, uint64_t, ceph::bufferlist*, IOContext*, bool): assert((uint64_t)r == len)

#6 Updated by Adam Kupczyk almost 2 years ago

  • Status changed from New to Closed
