Bug #24664

osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics

Added by Patrick Donnelly almost 6 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: luminous mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Crash: "2018-06-23 05:39:37.575049 mon.a (mon.0) 198 : cluster [WRN] Health check failed: Degraded data redundancy: 423/13136 objects degraded (3.220%), 1 pg degraded (PG_DEGRADED)" in cluster log
ceph version 14.0.0-787-g8f48616 (8f4861641855b60e687113ea0c79b428042ba302) nautilus (dev)
 1: (()+0x11390) [0x7f3d8b26b390]
 2: (gsignal()+0x38) [0x7f3d8a79e428]
 3: (abort()+0x16a) [0x7f3d8a7a002a]
 4: (()+0x2dbd7) [0x7f3d8a796bd7]
 5: (()+0x2dc82) [0x7f3d8a796c82]
 6: (OpTracker::unregister_inflight_op(TrackedOp*)+0x42d) [0x557eecb2335d]
 7: (TrackedOp::put()+0x33e) [0x557eec8e46de]
 8: (OSD::get_health_metrics()+0x34b) [0x557eec8c7d9b]
 9: (OSD::tick_without_osd_lock()+0x131) [0x557eec8c8441]
 10: (Context::complete(int)+0x9) [0x557eec8d9459]
 11: (SafeTimer::timer_thread()+0x18b) [0x7f3d8cbd799b]
 12: (SafeTimerThread::entry()+0xd) [0x7f3d8cbd8f5d]
 13: (()+0x76ba) [0x7f3d8b2616ba]
 14: (clone()+0x6d) [0x7f3d8a87041d]
1 jobs: ['2692479']
suites: ['clusters/fixed-2-ucephfs.yaml', 'frag_enable.yaml', 'fs/basic_workload/{begin.yaml', 'inline/yes.yaml', 'mount/fuse.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'omap_limit/10.yaml', 'overrides/{debug.yaml', 'tasks/cfuse_workunit_suites_ffsb.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

From: http://pulpito.ceph.com/pdonnell-2018-06-23_02:11:08-fs-wip-pdonnell-testing-20180622.235254-testing-basic-smithi/2692479/


Related issues

Related to RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed. Resolved 05/07/2018
Duplicated by RADOS - Bug #25076: MON crash when upgrading luminous v12.2.7 -> mimic v13.2.0 during ceph-fuse task Duplicate 07/24/2018
Duplicated by RADOS - Bug #39336: "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic Duplicate 04/16/2019
Copied to RADOS - Backport #24888: luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics Rejected
Copied to RADOS - Backport #24889: mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics Resolved

History

#1 Updated by Josh Durgin almost 6 years ago

  • Related to Bug #23352: osd: segfaults under normal operation added

#2 Updated by Brad Hubbard over 5 years ago

  • Related to deleted (Bug #23352: osd: segfaults under normal operation)

#3 Updated by Brad Hubbard over 5 years ago

  • Status changed from New to In Progress

#4 Updated by Brad Hubbard over 5 years ago

  • Related to Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed. added

#5 Updated by Brad Hubbard over 5 years ago

  • Backport set to luminous

#6 Updated by Brad Hubbard over 5 years ago

  • Backport changed from luminous to luminous mimic

#7 Updated by Sage Weil over 5 years ago

  • Status changed from In Progress to Pending Backport

#8 Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #24888: luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics added

#9 Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #24889: mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics added

#10 Updated by Radoslaw Zarzynski over 5 years ago

  • Duplicated by Bug #25076: MON crash when upgrading luminous v12.2.7 -> mimic v13.2.0 during ceph-fuse task added

#11 Updated by Nathan Cutler over 5 years ago

Need help with the luminous backport, which is needed to fix a failure in upgrade/luminous-x.

#12 Updated by Greg Farnum almost 5 years ago

  • Duplicated by Bug #39336: "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic added

#13 Updated by Greg Farnum almost 5 years ago

  • Related to Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup added

#14 Updated by Brad Hubbard over 4 years ago

  • Related to deleted (Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup)

#15 Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
