Bug #53327

closed

osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify_mon by default

Added by Sage Weil over 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Urgent
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific,quincy,octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

- It should send MOSDMarkMeDead, not MOSDMarkMeDown.
- We must confirm that we set a flag (preparing to stop?) that makes the OSD drop all incoming messages, so that it can be treated as really dead.
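
The two points above can be illustrated with a minimal sketch. It is not the actual Ceph code: `ToyOsd`, `ShutdownNotice`, `begin_fast_shutdown()` and `handle_message()` are hypothetical names made up for this example, and the real OSD sends MOSDMarkMeDead to the monitor rather than returning an enum. The sketch only shows the intended ordering: set a "stopping" flag first, report dead rather than down, and drop every message that arrives afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the two notification types named in the description.
enum class ShutdownNotice { MarkMeDown, MarkMeDead };

// Hypothetical, heavily simplified model of an OSD; not the real Ceph OSD class.
class ToyOsd {
public:
  // Start a fast shutdown: raise the flag before notifying, so no message
  // that races with the notification is processed, then report "dead".
  ShutdownNotice begin_fast_shutdown() {
    stopping_.store(true, std::memory_order_release);
    return ShutdownNotice::MarkMeDead;
  }

  // Returns true if the message was handled, false if it was dropped
  // because the OSD is already stopping.
  bool handle_message(const std::string& msg) {
    if (stopping_.load(std::memory_order_acquire)) {
      ++dropped_;
      return false;
    }
    handled_.push_back(msg);
    return true;
  }

  std::size_t handled() const { return handled_.size(); }
  std::size_t dropped() const { return dropped_; }

private:
  std::atomic<bool> stopping_{false};
  std::vector<std::string> handled_;
  std::size_t dropped_ = 0;
};

// Small self-check of the behaviour described above.
inline void toy_osd_self_check() {
  ToyOsd osd;
  assert(osd.handle_message("op-before-shutdown"));
  assert(osd.begin_fast_shutdown() == ShutdownNotice::MarkMeDead);
  assert(!osd.handle_message("op-after-shutdown"));
}
```

In this toy model a message that arrives after `begin_fast_shutdown()` is counted as dropped and never handled, which is the property the second bullet asks to be confirmed.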


Related issues 4 (0 open, 4 closed)

Has duplicate RADOS - Bug #53328: osd_fast_shutdown_notify_mon option should be true by default (Duplicate)
Copied to RADOS - Backport #55073: pacific: osd: osd_fast_shutdown_notify_mon not quite right (Resolved, Nitzan Mordechai)
Copied to RADOS - Backport #55074: octopus: osd: osd_fast_shutdown_notify_mon not quite right (Resolved, Laura Flores)
Copied to RADOS - Backport #55075: quincy: osd: osd_fast_shutdown_notify_mon not quite right (Resolved)
Actions #1

Updated by Manuel Lausch over 2 years ago

Hi Sage,

Is there any update?

Actions #2

Updated by Neha Ojha about 2 years ago

  • Assignee changed from Sage Weil to Nitzan Mordechai
Actions #3

Updated by Neha Ojha about 2 years ago

  • Status changed from In Progress to New
  • Backport changed from pacific to pacific,quincy
Actions #4

Updated by Nitzan Mordechai about 2 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Nitzan Mordechai about 2 years ago

  • Status changed from In Progress to Fix Under Review
Actions #6

Updated by Manuel Lausch about 2 years ago

Hi Nitzan,
I checked your patch on the current pacific branch.

Unfortunately, I still get slow ops (IO blocked for >= 5 seconds) after stopping all OSDs on one host (systemctl stop ceph-osd.target).
In the OSD log I see the message that the OSD sends the dead notification to the mon, but in ceph.log I only get the "marked itself dead" message for some of the OSDs. The down messages are there for all affected OSDs.

I hope you can have a further look into this.

Thanks
Manuel

Actions #7

Updated by Neha Ojha about 2 years ago

  • Pull request ID set to 44807
Actions #8

Updated by Laura Flores about 2 years ago

  • Backport changed from pacific,quincy to pacific,quincy,octopus
Actions #9

Updated by Laura Flores about 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #10

Updated by Laura Flores about 2 years ago

  • Copied to Backport #55073: pacific: osd: osd_fast_shutdown_notify_mon not quite right added
Actions #11

Updated by Laura Flores about 2 years ago

  • Copied to Backport #55074: octopus: osd: osd_fast_shutdown_notify_mon not quite right added
Actions #12

Updated by Laura Flores about 2 years ago

  • Copied to Backport #55075: quincy: osd: osd_fast_shutdown_notify_mon not quite right added
Actions #13

Updated by Neha Ojha almost 2 years ago

  • Subject changed from osd: osd_fast_shutdown_notify_mon not quite right to osd: osd_fast_shutdown_notify_mon not quite right and enable osd_fast_shutdown_notify_mon by default
Actions #14

Updated by Neha Ojha almost 2 years ago

  • Has duplicate Bug #53328: osd_fast_shutdown_notify_mon option should be true by default added
Actions #15

Updated by jianwei zhang almost 2 years ago

octopus: osd/OSD: osd_fast_shutdown_notify_mon not quite right #45655
https://github.com/ceph/ceph/pull/45655/commits
osd/OSD: osd_fast_shutdown_notify_mon not quite right
osd: make osd_fast_shutdown_notify_mon option true by default

Are there any problems with backporting these two patches to octopus?
Why have they not been merged into octopus yet?

Thanks!

Actions #17

Updated by jianwei zhang almost 2 years ago

Manuel Lausch wrote:

Hi Nitzan,
I checked your patch on the current pacific branch.

Unfortunately, I still get slow ops (IO blocked for >= 5 seconds) after stopping all OSDs on one host (systemctl stop ceph-osd.target).
In the OSD log I see the message that the OSD sends the dead notification to the mon, but in ceph.log I only get the "marked itself dead" message for some of the OSDs. The down messages are there for all affected OSDs.

I hope you can have a further look into this.

Thanks
Manuel

Please take a look at this change of mine:
https://github.com/ceph/ceph/pull/46273
https://tracker.ceph.com/issues/55665

Actions #18

Updated by Neha Ojha almost 2 years ago

  • Status changed from Pending Backport to Resolved