Project

General

Profile

Actions

Bug #48276

closed

Bug #47751: Hybrid allocator might segfault when fallback allocator is present

OSD Crash with ceph_assert(is_valid_io(off, len))

Added by Bastian Mäuser over 3 years ago. Updated over 3 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

Last night one OSD in a 3-node Cluster crashed with the attached Crashreport. I can pretty much rule out Hardware issues, dmesg shows no errors on that NVME. Hardware is a few month old. There was only very few load on the System when it happened.

OSD came back fine. I want to know more about the root cause.

{
"os_version_id": "10",
"utsname_machine": "x86_64",
"entity_name": "osd.29",
"backtrace": [
"(()+0x12730) [0x7fb4df645730]",
"(gsignal()+0x10b) [0x7fb4df1287bb]",
"(abort()+0x121) [0x7fb4df113535]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x55def3ba0419]",
"(()+0x5115a0) [0x55def3ba05a0]",
"(KernelDevice::aio_write(unsigned long, ceph::buffer::v14_2_0::list&, IOContext*, bool, int)+0x90) [0x55def4214570]",
"(BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*)+0x2237) [0x55def40f4247]",
"(BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x318) [0x55def411cef8]",
"(BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0xda) [0x55def411ddfa]",
"(BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1671) [0x55def4121481]",
"(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3c8) [0x55def4122eb8]",
"(non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0x55def3e908b4]",
"(ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xdf8) [0x55def3f89978]",
"(ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x267) [0x55def3f97ab7]",
"(PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x57) [0x55def3ea8e17]",
"(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x61f) [0x55def3e5784f]",
"(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x392) [0x55def3c83f02]",
"(PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x55def3f27e92]",
"(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x7d7) [0x55def3c9fba7]",
"(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x55def426c0c4]",
"(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55def426ead0]",
"(()+0x7fa3) [0x7fb4df63afa3]",
"(clone()+0x3f) [0x7fb4df1ea4cf]"
],
"assert_line": 864,
"utsname_release": "5.4.65-1-pve",
"assert_file": "/build/ceph-JY24tx/ceph-14.2.11/src/os/bluestore/KernelDevice.cc",
"utsname_sysname": "Linux",
"os_version": "10 (buster)",
"os_id": "10",
"assert_thread_name": "tp_osd_tp",
"assert_msg": "/build/ceph-JY24tx/ceph-14.2.11/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::aio_write(uint64_t, ceph::bufferlist&, IOContext*, bool, int)' thread 7fb4c06c2700 time 2020-11-18 03:24:35.419736\n/build/ceph-JY24tx/ceph-14.2.11/src/os/bluestore/KernelDevice.cc: 864: FAILED ceph_assert(is_valid_io(off, len))\n",
"assert_func": "virtual int KernelDevice::aio_write(uint64_t, ceph::bufferlist&, IOContext*, bool, int)",
"ceph_version": "14.2.11",
"os_name": "Debian GNU/Linux 10 (buster)",
"timestamp": "2020-11-18 02:24:35.429967Z",
"process_name": "ceph-osd",
"archived": "2020-11-18 10:14:48.914391",
"utsname_hostname": "px3",
"crash_id": "2020-11-18_02:24:35.429967Z_800333e3-630a-406b-9a0e-c7c345336087",
"assert_condition": "is_valid_io(off, len)",
"utsname_version": "#1 SMP PVE 5.4.65-1 (Mon, 21 Sep 2020 15:40:22 +0200)"
}


Files

cephlog-px2-osd17.gz (347 KB) cephlog-px2-osd17.gz Ceph OSD Log Bastian Mäuser, 11/24/2020 12:39 PM
osd147_fsck.log (18.1 KB) osd147_fsck.log Jonas Jelten, 12/16/2020 05:37 PM

Related issues 1 (0 open1 closed)

Related to bluestore - Bug #46800: Octopus OSD died and fails to start with FAILED ceph_assert(is_valid_io(off, len))Duplicate

Actions
Actions

Also available in: Atom PDF