Bug #45076

closed

rados: Sharded OpWQ drops suicide_grace after waiting for work

Added by Dan Hill about 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic, nautilus, octopus
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The Sharded OpWQ opportunistically waits for more work when it finds its queue empty. While waiting, the work queue's default heartbeat grace and suicide_grace values are modified [0]: the `threadpool_default_timeout` grace is applied and suicide_grace is disabled.

Once work is found, the original work queue defaults should be re-applied, but they are not. The modified values stay in effect, so hung operations may never trigger an OSD suicide recovery.

[0] https://github.com/ceph/ceph/blob/38ae96e1c9a4f8ad3095626c71951a122bdc8fe7/src/osd/OSD.cc#L10451
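
The behavior described above can be illustrated with a small, self-contained C++ model. This is only a sketch under stated assumptions, not the OSD code or the actual fix: `HeartbeatHandle`, `WorkQueueTimeouts`, `reset_timeout` and `wait_for_work` are hypothetical stand-ins invented for illustration, and the blocking wait is replaced by a boolean flag. It shows the intended pattern: relax the timeouts while idle, then restore the work queue's own values before processing an item.

```cpp
#include <cassert>
#include <chrono>

using Seconds = std::chrono::seconds;

// Hypothetical stand-in for a worker thread's heartbeat handle.
struct HeartbeatHandle {
  Seconds grace{0};
  Seconds suicide_grace{0};  // 0 models "suicide checking disabled"
};

// Hypothetical stand-in for the work queue's configured timeouts.
struct WorkQueueTimeouts {
  Seconds grace;
  Seconds suicide_grace;
};

// Hypothetical helper: apply a new grace / suicide_grace pair to the handle.
void reset_timeout(HeartbeatHandle& hb, Seconds grace, Seconds suicide_grace) {
  assert(grace >= Seconds{0} && suicide_grace >= Seconds{0});
  hb.grace = grace;
  hb.suicide_grace = suicide_grace;
}

// Models one pass of a shard worker that found its queue empty.
// `work_arrived` simulates whether the wait ended because work was queued.
// Returns true if the worker goes on to process an item.
bool wait_for_work(HeartbeatHandle& hb, const WorkQueueTimeouts& wq,
                   Seconds default_timeout, bool work_arrived) {
  // While idle: apply the relaxed default grace and disable suicide_grace.
  reset_timeout(hb, default_timeout, Seconds{0});

  // A real worker would block on a condition variable here.
  if (!work_arrived) {
    return false;  // still idle, relaxed timeouts remain appropriate
  }

  // Work was found: re-apply the work queue's own timeouts so that a hung
  // operation is still caught by the suicide check. Omitting this step is
  // the bug described in this issue.
  reset_timeout(hb, wq.grace, wq.suicide_grace);
  return true;
}
```

In this model, removing the final `reset_timeout` call leaves `suicide_grace` at zero after the worker wakes up, which corresponds to the dropped suicide_grace in the issue title.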


Related issues 3 (0 open, 3 closed)

Copied to RADOS - Backport #45357: octopus: rados: Sharded OpWQ drops suicide_grace after waiting for work (Resolved, Dan Hill)
Copied to RADOS - Backport #45358: mimic: rados: Sharded OpWQ drops suicide_grace after waiting for work (Rejected, Dan Hill)
Copied to RADOS - Backport #45359: nautilus: rados: Sharded OpWQ drops suicide_grace after waiting for work (Resolved, Dan Hill)

#1

Updated by Josh Durgin about 4 years ago

  • Priority changed from Normal to High
#2

Updated by Dan Hill about 4 years ago

  • Backport set to mimic, nautilus, octopus
  • Pull request ID set to 34575
#3

Updated by Dan Hill about 4 years ago

  • Status changed from In Progress to Fix Under Review
#4

Updated by Dan Hill almost 4 years ago

  • Status changed from Fix Under Review to Pending Backport
#5

Updated by Dan Hill almost 4 years ago

  • Copied to Backport #45357: octopus: rados: Sharded OpWQ drops suicide_grace after waiting for work added
#6

Updated by Dan Hill almost 4 years ago

  • Copied to Backport #45358: mimic: rados: Sharded OpWQ drops suicide_grace after waiting for work added
#7

Updated by Dan Hill almost 4 years ago

  • Copied to Backport #45359: nautilus: rados: Sharded OpWQ drops suicide_grace after waiting for work added
#8

Updated by Dan Hill almost 4 years ago

This issue is also present in Luminous, which is EOL now that Octopus has been released.

Should I open a tracker/PR for consideration?

#9

Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
