
Bug #37775

some pg_created messages not sent to mon

Added by Sage Weil almost 2 years ago. Updated about 1 year ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

The mon doesn't get pg_created for two PGs, so the pool's CREATING flag is never removed and the job fails with a final scrub timeout.

/a/sage-2018-12-29_16:59:10-rados-master-distro-basic-smithi/3405637

The OSD sends them to the mon, but a msgr reconnect drops them, and there is no retry.

2018-12-29 17:47:31.080 7f949c7a8700  1 -- 172.21.15.110:6812/10945 --> 172.21.15.110:6790/0 -- osd_pg_created(1.0) v1 -- 0x55ecc3c00e00 con 0

Related issues

Duplicated by RADOS - Bug #37752: pool stuck with 'creating' flag set Duplicate 12/24/2018

History

#1 Updated by Sage Weil almost 2 years ago

How about:
- if the pool's CREATING flag is set, we queue a 'created' message when the PG peers
- the OSD tracks pending created messages and resends them on mon reset
- prune PGs from the list when the pool flag is cleared

This could easily mean resending some of these if it takes a while for the pool's PGs to be created, but the messages are cheap and harmless.

#2 Updated by Sage Weil almost 2 years ago

  • Status changed from 12 to Fix Under Review

#3 Updated by Kefu Chai almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to luminous, mimic
  • Pull request ID set to 25731

#4 Updated by Greg Farnum almost 2 years ago

  • Duplicated by Bug #37752: pool stuck with 'creating' flag set added

#7 Updated by Neha Ojha over 1 year ago

/a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/

#8 Updated by Greg Farnum about 1 year ago

  • Assignee set to Neha Ojha

#9 Updated by Neha Ojha about 1 year ago

The original bug is about a pool-level flag, "FLAG_CREATING", which was introduced in 0e526b467af2699e389e7f28a6d709f216e4533e. This flag is not present in mimic and luminous. I am not sure the entire fix in https://github.com/ceph/ceph/pull/25731 is needed in those branches, and its commit message, "The OSD has to reliably deliver a pg_created message to the mon in order for the mon to clear the pool's CREATING flag.", also suggests as much.

https://tracker.ceph.com/issues/36498 and /a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/ are about PGs stuck in the creating state, which has to do with the PG_STATE_CREATING flag. We need to figure out whether any parts of https://github.com/ceph/ceph/pull/25731 fix that too.

#10 Updated by Neha Ojha about 1 year ago

This patch does not make sense for mimic and luminous.
@Nathan, can we please resolve this issue and close the corresponding backport trackers?

#11 Updated by Nathan Cutler about 1 year ago

  • Status changed from Pending Backport to Resolved
  • Backport deleted (luminous, mimic)
