Bug #46443

closed

ceph_osd crash in _committed_osd_maps when failed to encode first inc map

Added by Markus Binz almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: nautilus, octopus
Regression: Yes
Severity: 1 - critical
Reviewed:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We upgraded a Mimic cluster to v14.2.10; everything was running fine.
I triggered a monmap change with the command
ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

which resulted in ceph-osd processes crashing (30 out of 50).

Later on, it seemed to happen on any monmap change (adding an OSD, ...).

Downgrading to 14.2.9 solved the problem for us.

We have 196 crash reports; I attached just one.

It is the same on Ubuntu 16.04 and 18.04.

{
  "crash_id": "2020-06-30_21:27:08.639797Z_a6cf1fdd-5cd6-4355-86d3-bbd39a4d8164",
  "timestamp": "2020-06-30 21:27:08.639797Z",
  "process_name": "ceph-osd",
  "entity_name": "osd.30",
  "ceph_version": "14.2.10",
  "utsname_hostname": "bigstore06.solnet.ch",
  "utsname_sysname": "Linux",
  "utsname_release": "4.4.0-101-generic",
  "utsname_version": "#124-Ubuntu SMP Fri Nov 10 18:29:59 UTC 2017",
  "utsname_machine": "x86_64",
  "os_name": "Ubuntu",
  "os_id": "ubuntu",
  "os_version_id": "16.04",
  "os_version": "16.04.6 LTS (Xenial Xerus)",
  "backtrace": [
    "(()+0x11390) [0x7fcb74ef4390]",
    "/usr/bin/ceph-osd() [0x87fd12]",
    "(OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]",
    "(C_OnMapCommit::finish(int)+0x17) [0x946897]",
    "(Context::complete(int)+0x9) [0x8fbfb9]",
    "(Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]",
    "(()+0x76ba) [0x7fcb74eea6ba]",
    "(clone()+0x6d) [0x7fcb744f141d]"
  ]
}
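
With 196 reports on hand, it may help to confirm that they all share the same crash site. The following is a minimal sketch, not part of the original report, assuming each report is available as JSON text in the shape shown above. The helper names `first_named_frame` and `summarize` are hypothetical; only the `ceph_version` and `backtrace` fields are taken from the report.

```python
import json
import re
from collections import Counter

# Matches frames that carry a symbol, for example
# "(OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]".
# Frames without the leading "(" (such as "/usr/bin/ceph-osd() [0x87fd12]") do not match.
_FRAME = re.compile(r"^\((?P<symbol>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def first_named_frame(backtrace):
    """Return the first frame symbol that is not anonymous, or None."""
    for frame in backtrace:
        match = _FRAME.match(frame)
        # "()" is what an unresolved frame such as "(()+0x11390)" yields.
        if match and match.group("symbol") != "()":
            return match.group("symbol")
    return None


def summarize(reports):
    """Count crash reports by (ceph_version, first named frame).

    `reports` is an iterable of JSON strings, one per crash report.
    """
    counts = Counter()
    for text in reports:
        meta = json.loads(text)
        key = (
            meta.get("ceph_version"),
            first_named_frame(meta.get("backtrace", [])),
        )
        counts[key] += 1
    return counts
```

If every report came from the same fault, `summarize` would return a single key naming `OSD::_committed_osd_maps` for version 14.2.10; any additional keys would point at a different crash site worth reporting separately.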


Files

crash.a6cf1fdd-5cd6-4355-86d3-bbd39a4d8164.tar.gz (136 KB), crash report, Markus Binz, 07/10/2020 07:40 AM
log.tar.gz (362 KB), Xiaoxi Chen, 07/27/2020 10:27 AM

Related issues: 3 (0 open, 3 closed)

Related to RADOS - Bug #43903: osd segv in ceph::buffer::v14_2_0::ptr::release (PGTempMap::decode) (Resolved, Radoslaw Zarzynski)
Copied to RADOS - Backport #46741: nautilus: ceph_osd crash in _committed_osd_maps when failed to encode first inc map (Resolved, Nathan Cutler)
Copied to RADOS - Backport #46742: octopus: ceph_osd crash in _committed_osd_maps when failed to encode first inc map (Resolved, Nathan Cutler)
