Bug #37775
closed
some pg_created messages not sent to mon
0%
Description
mon doesn't get pg_created for two pgs. CREATING flag is never removed, job fails with a final scrub timeout
/a/sage-2018-12-29_16:59:10-rados-master-distro-basic-smithi/3405637
osd sends them to mon, but a msgr reconnect drops them. there is no retry.
2018-12-29 17:47:31.080 7f949c7a8700 1 -- 172.21.15.110:6812/10945 --> 172.21.15.110:6790/0 -- osd_pg_created(1.0) v1 -- 0x55ecc3c00e00 con 0
Updated by Sage Weil over 5 years ago
how about,
- if pool CREATING flag is set, we queue a 'created' message when the pg peers
- osd tracks pending created messages, resends on mon reset
- prune pgs from the list when the pool flag is cleared
this will easily mean resending some of these if it takes a while for the pool's pgs to be created, but the messages are cheap and harmless.
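A rough sketch of the three steps proposed above, in Python rather than the actual OSD code. Every name here (PendingCreatedTracker, on_pg_peered, on_mon_reset, on_osdmap, the send callback) is hypothetical and only illustrates the bookkeeping; it is not the implementation in the eventual fix.

```python
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple

# Hypothetical pg identifier: (pool id, pg seed), e.g. (1, 0) for pg 1.0.
PgId = Tuple[int, int]


@dataclass
class PendingCreatedTracker:
    """Illustrative model of the OSD-side bookkeeping, not real Ceph code."""

    # Callback standing in for "send osd_pg_created(pgid) to the mon".
    send: Callable[[PgId], None]
    pending: Set[PgId] = field(default_factory=set)

    def on_pg_peered(self, pgid: PgId, pool_creating: bool) -> None:
        # Step 1: only queue a 'created' message if the pool flag is set.
        if pool_creating:
            self.pending.add(pgid)
            self.send(pgid)

    def on_mon_reset(self) -> None:
        # Step 2: the mon session was reset, so earlier messages may have
        # been dropped; resend everything still pending.
        for pgid in sorted(self.pending):
            self.send(pgid)

    def on_osdmap(self, creating_pools: Set[int]) -> None:
        # Step 3: prune pgs whose pool no longer has the CREATING flag.
        self.pending = {p for p in self.pending if p[0] in creating_pools}
```

With this shape, a resend after a reconnect may duplicate messages the mon already received, which matches the "cheap and harmless" trade-off noted above.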
Updated by Sage Weil over 5 years ago
- Status changed from 12 to Fix Under Review
Updated by Kefu Chai over 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to luminous, mimic
- Pull request ID set to 25731
Updated by Greg Farnum over 5 years ago
- Has duplicate Bug #37752: pool stuck with 'creating' flag set added
Updated by Neha Ojha about 5 years ago
/a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/
Updated by Neha Ojha over 4 years ago
The original bug is about a pool-level flag, "FLAG_CREATING", which was introduced in 0e526b467af2699e389e7f28a6d709f216e4533e. This flag is not present in mimic and luminous, so I am not sure the entire fix in https://github.com/ceph/ceph/pull/25731 is needed in those branches. The commit message, "The OSD has to reliably deliver a pg_created message to the mon in order for the mon to clear the pool's CREATING flag.", also indicates that.
https://tracker.ceph.com/issues/36498 and /a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/ are about PGs stuck in the creating state, which relates to the PG_STATE_CREATING flag instead. We need to figure out whether any parts of https://github.com/ceph/ceph/pull/25731 fix that too.
Updated by Neha Ojha over 4 years ago
This patch does not make sense for mimic and luminous.
@Nathan Weinberg can we please resolve this issue and close the corresponding backport trackers.
Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
- Backport deleted (luminous, mimic)