some pg_created messages not sent to mon
mon doesn't get pg_created for two pgs, so the CREATING flag is never removed and the job fails with a final scrub timeout.
osd sends them to mon, but a msgr reconnect drops them. there is no retry.
2018-12-29 17:47:31.080 7f949c7a8700 1 -- 172.21.15.110:6812/10945 --> 172.21.15.110:6790/0 -- osd_pg_created(1.0) v1 -- 0x55ecc3c00e00 con 0
#1 Updated by Sage Weil almost 2 years ago
- if pool CREATING flag is set, we queue a 'created' message when the pg peers
- osd tracks pending created messages, resends on mon reset
- prune pgs from the list when the pool flag is cleared
this may well mean resending some of these messages if it takes a while for the pool's pgs to be created, but the messages are cheap and harmless.
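
The queue / resend / prune scheme described above can be sketched roughly as follows. This is an illustrative Python model only, not the OSD code: the class `PendingCreatedTracker`, its method names, and the `(pool_id, pg_seed)` tuple used for a pg id are all hypothetical, and "sending" is modelled by appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class PendingCreatedTracker:
    """Hypothetical stand-in for the OSD-side bookkeeping; not Ceph code."""

    # pg ids still awaiting acknowledgement, as (pool_id, pg_seed) tuples.
    pending: set = field(default_factory=set)
    # Every message "sent" to the mon, recorded so behaviour can be inspected.
    sent: list = field(default_factory=list)

    def on_pg_peered(self, pgid, pool_creating):
        """Queue and send a 'created' message if the pool is still CREATING."""
        if pool_creating:
            self.pending.add(pgid)
            self._send(pgid)

    def on_mon_reset(self):
        """Resend everything still pending after the mon session resets."""
        for pgid in sorted(self.pending):
            self._send(pgid)

    def on_new_map(self, creating_pools):
        """Prune pgs whose pool no longer has the CREATING flag."""
        self.pending = {p for p in self.pending if p[0] in creating_pools}

    def _send(self, pgid):
        # A real OSD would hand the message to the messenger; here we log it.
        self.sent.append(pgid)


tracker = PendingCreatedTracker()
tracker.on_pg_peered((1, 0), pool_creating=True)  # pg 1.0 peers, message queued
tracker.on_mon_reset()                            # reconnect: 1.0 is sent again
tracker.on_new_map(creating_pools=set())          # flag cleared: list is pruned
tracker.on_mon_reset()                            # nothing left to resend
```

Because entries are only dropped when the pool flag clears, a pg whose message already reached the mon can be resent on a later reset; that is the redundant but harmless traffic mentioned above.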
#9 Updated by Neha Ojha about 1 year ago
The original bug is about a pool level flag - "FLAG_CREATING", which was introduced in 0e526b467af2699e389e7f28a6d709f216e4533e. This flag is not present in mimic and luminous. I am not sure the entire fix in https://github.com/ceph/ceph/pull/25731 is needed in those branches and the commit message "The OSD has to reliably deliver a pg_created message to the mon in order for the mon to clear the pool's CREATING flag." also indicates that.
https://tracker.ceph.com/issues/36498 and /a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/ are about PGs stuck in the creating state, which has to do with the PG_STATE_CREATING flag. We need to figure out whether any parts of https://github.com/ceph/ceph/pull/25731 fix that too.