Bug #24664

closed

osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics

Added by Patrick Donnelly almost 6 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: luminous mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Crash: "2018-06-23 05:39:37.575049 mon.a (mon.0) 198 : cluster [WRN] Health check failed: Degraded data redundancy: 423/13136 objects degraded (3.220%), 1 pg degraded (PG_DEGRADED)" in cluster log
ceph version 14.0.0-787-g8f48616 (8f4861641855b60e687113ea0c79b428042ba302) nautilus (dev)
 1: (()+0x11390) [0x7f3d8b26b390]
 2: (gsignal()+0x38) [0x7f3d8a79e428]
 3: (abort()+0x16a) [0x7f3d8a7a002a]
 4: (()+0x2dbd7) [0x7f3d8a796bd7]
 5: (()+0x2dc82) [0x7f3d8a796c82]
 6: (OpTracker::unregister_inflight_op(TrackedOp*)+0x42d) [0x557eecb2335d]
 7: (TrackedOp::put()+0x33e) [0x557eec8e46de]
 8: (OSD::get_health_metrics()+0x34b) [0x557eec8c7d9b]
 9: (OSD::tick_without_osd_lock()+0x131) [0x557eec8c8441]
 10: (Context::complete(int)+0x9) [0x557eec8d9459]
 11: (SafeTimer::timer_thread()+0x18b) [0x7f3d8cbd799b]
 12: (SafeTimerThread::entry()+0xd) [0x7f3d8cbd8f5d]
 13: (()+0x76ba) [0x7f3d8b2616ba]
 14: (clone()+0x6d) [0x7f3d8a87041d]
1 jobs: ['2692479']
suites: ['clusters/fixed-2-ucephfs.yaml', 'frag_enable.yaml', 'fs/basic_workload/{begin.yaml', 'inline/yes.yaml', 'mount/fuse.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'omap_limit/10.yaml', 'overrides/{debug.yaml', 'tasks/cfuse_workunit_suites_ffsb.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

From: http://pulpito.ceph.com/pdonnell-2018-06-23_02:11:08-fs-wip-pdonnell-testing-20180622.235254-testing-basic-smithi/2692479/


Related issues 5 (0 open, 5 closed)

Related to RADOS - Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed. (Resolved, Radoslaw Zarzynski, 05/07/2018)
Has duplicate RADOS - Bug #25076: MON crash when upgrading luminous v12.2.7 -> mimic v13.2.0 during ceph-fuse task (Duplicate, Radoslaw Zarzynski, 07/24/2018)
Has duplicate RADOS - Bug #39336: "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic (Duplicate, 04/16/2019)
Copied to RADOS - Backport #24888: luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics (Rejected, Radoslaw Zarzynski)
Copied to RADOS - Backport #24889: mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics (Resolved, Nathan Cutler)
Actions #1

Updated by Josh Durgin almost 6 years ago

  • Related to Bug #23352: osd: segfaults under normal operation added
Actions #2

Updated by Brad Hubbard almost 6 years ago

  • Related to deleted (Bug #23352: osd: segfaults under normal operation)
Actions #3

Updated by Brad Hubbard almost 6 years ago

  • Status changed from New to In Progress
Actions #4

Updated by Brad Hubbard almost 6 years ago

  • Related to Bug #24037: osd: Assertion `!node_algorithms::inited(this->priv_value_traits().to_node_ptr(value))' failed. added
Actions #5

Updated by Brad Hubbard almost 6 years ago

  • Backport set to luminous
Actions #6

Updated by Brad Hubbard almost 6 years ago

  • Backport changed from luminous to luminous mimic
Actions #7

Updated by Sage Weil almost 6 years ago

  • Status changed from In Progress to Pending Backport
Actions #8

Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #24888: luminous: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics added
Actions #9

Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #24889: mimic: osd: crash in OpTracker::unregister_inflight_op via OSD::get_health_metrics added
Actions #10

Updated by Radoslaw Zarzynski almost 6 years ago

  • Has duplicate Bug #25076: MON crash when upgrading luminous v12.2.7 -> mimic v13.2.0 during ceph-fuse task added
Actions #11

Updated by Nathan Cutler almost 6 years ago

Need help with the luminous backport, which is needed to fix a failure in upgrade/luminous-x.

Actions #12

Updated by Greg Farnum about 5 years ago

  • Has duplicate Bug #39336: "*** Caught signal (Bus error) **" in upgrade:luminous-x-mimic added
Actions #13

Updated by Greg Farnum about 5 years ago

  • Related to Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup added
Actions #14

Updated by Brad Hubbard over 4 years ago

  • Related to deleted (Bug #38345: mon: segv in MonOpRequest::~MonOpRequest OpHistory::cleanup)
Actions #15

Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
