Bug #45076
closed
rados: Sharded OpWQ drops suicide_grace after waiting for work
Description
The Sharded OpWQ will opportunistically wait for more work when processing an empty queue. While waiting, the default work queue heartbeat grace and suicide_grace values are modified [0]. The `threadpool_default_timeout` grace is left applied and suicide_grace is disabled.
The original work queue defaults should be re-applied once work is found, but they are not. As a result, a hung operation may never trigger an OSD suicide recovery.
[0] https://github.com/ceph/ceph/blob/38ae96e1c9a4f8ad3095626c71951a122bdc8fe7/src/osd/OSD.cc#L10451
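The behaviour described above can be illustrated with a small, single-threaded sketch. This is not the Ceph code: `HeartbeatHandle`, `WorkQueueTimeouts`, `reset_timeout` and `process_one` are hypothetical names made up for this example, the timeout values are placeholders, and the wait on a condition variable is simulated by pushing an item onto the queue. The `reapply_after_wait` flag switches between the reported behaviour (`false`) and the expected one (`true`).

```cpp
#include <cassert>
#include <chrono>
#include <deque>

using Secs = std::chrono::seconds;

// Hypothetical stand-in for a per-thread heartbeat handle (not Ceph's type).
struct HeartbeatHandle {
  Secs grace{0};          // thread is reported unhealthy after this long
  Secs suicide_grace{0};  // 0 means the suicide timeout is disabled
};

// Hypothetical per-queue settings; the numbers are illustrative only.
struct WorkQueueTimeouts {
  Secs grace{15};          // work queue heartbeat grace
  Secs suicide_grace{150}; // work queue suicide grace
  Secs idle_grace{60};     // plays the role of threadpool_default_timeout
};

inline void reset_timeout(HeartbeatHandle& hb, Secs grace, Secs suicide_grace) {
  hb.grace = grace;
  hb.suicide_grace = suicide_grace;
}

// One pass of a worker loop. If the queue is empty, the worker relaxes its
// heartbeat, "waits" (simulated by arriving_item showing up), and then either
// re-applies the work queue defaults or leaves the idle values in place.
inline int process_one(std::deque<int>& queue, HeartbeatHandle& hb,
                       const WorkQueueTimeouts& t, int arriving_item,
                       bool reapply_after_wait) {
  if (queue.empty()) {
    // Idle: apply the default grace and disable the suicide timeout.
    reset_timeout(hb, t.idle_grace, Secs{0});
    queue.push_back(arriving_item);  // simulated wakeup: work was found
    if (reapply_after_wait) {
      // Expected behaviour: restore the work queue defaults before processing.
      reset_timeout(hb, t.grace, t.suicide_grace);
    }
  }
  assert(!queue.empty());
  int item = queue.front();
  queue.pop_front();
  return item;  // the op would now run under hb's current timeouts
}
```

With `reapply_after_wait` set to `false`, the item is dequeued while `hb.suicide_grace` is still zero, so an operation that hangs from this point on would never hit a suicide timeout. With `true`, both values are back at the work queue defaults before the item is handed out.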
Updated by Dan Hill about 4 years ago
- Backport set to mimic, nautilus, octopus
- Pull request ID set to 34575
Updated by Dan Hill about 4 years ago
- Status changed from In Progress to Fix Under Review
Updated by Dan Hill almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Dan Hill almost 4 years ago
- Copied to Backport #45357: octopus: rados: Sharded OpWQ drops suicide_grace after waiting for work added
Updated by Dan Hill almost 4 years ago
- Copied to Backport #45358: mimic: rados: Sharded OpWQ drops suicide_grace after waiting for work added
Updated by Dan Hill almost 4 years ago
- Copied to Backport #45359: nautilus: rados: Sharded OpWQ drops suicide_grace after waiting for work added
Updated by Dan Hill almost 4 years ago
This issue is also present in Luminous, which is EOL now that Octopus has been released.
Should I open a tracker/PR for consideration?
Updated by Nathan Cutler about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".