Bug #37775

closed

some pg_created messages not sent to mon

Added by Sage Weil over 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The mon doesn't receive pg_created for two PGs. The pool's CREATING flag is never cleared, and the job fails with a final scrub timeout.

/a/sage-2018-12-29_16:59:10-rados-master-distro-basic-smithi/3405637

The OSD sends them to the mon, but a msgr reconnect drops them. There is no retry.

2018-12-29 17:47:31.080 7f949c7a8700  1 -- 172.21.15.110:6812/10945 --> 172.21.15.110:6790/0 -- osd_pg_created(1.0) v1 -- 0x55ecc3c00e00 con 0

Related issues: 1 (0 open, 1 closed)

Has duplicate: RADOS - Bug #37752: pool stuck with 'creating' flag set (Duplicate, 12/24/2018)

Actions #1

Updated by Sage Weil over 5 years ago

How about:
- if the pool CREATING flag is set, we queue a 'created' message when the pg peers
- osd tracks pending created messages, resends on mon reset
- prune pgs from the list when the pool flag is cleared

This will likely mean resending some of these if it takes a while for the pool's pgs to be created, but the messages are cheap and harmless.
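
The proposal above can be sketched roughly as follows. This is a minimal Python illustration of the bookkeeping only, not Ceph's actual code (which is C++). The class name `PendingCreatedTracker`, its method names, and the `send` callback are all hypothetical; the only detail taken from Ceph is that a pgid such as `1.0` starts with the pool id.

```python
class PendingCreatedTracker:
    """Hypothetical OSD-side list of pg_created messages awaiting the mon."""

    def __init__(self, send):
        # send: callable taking a pgid string; a stand-in for the messenger.
        self._send = send
        self._pending = set()

    @staticmethod
    def _pool_of(pgid):
        # A pgid such as "1.0" is "<pool id>.<seed>".
        return int(pgid.split(".")[0])

    def on_pg_peered(self, pgid, pool_creating):
        # Queue a 'created' message only while the pool has CREATING set.
        if pool_creating:
            self._pending.add(pgid)
            self._send(pgid)

    def on_mon_reset(self):
        # Earlier sends may have been dropped by the reconnect, so resend
        # everything still pending. Duplicates are cheap and harmless.
        for pgid in sorted(self._pending):
            self._send(pgid)

    def on_pool_flags(self, creating_pools):
        # Prune PGs whose pool no longer has the CREATING flag set.
        self._pending = {
            p for p in self._pending if self._pool_of(p) in creating_pools
        }

    def pending(self):
        return sorted(self._pending)
```

In this sketch a PG stays on the list until the pool flag is observed as cleared, so a slow pool creation produces repeated sends on every mon reset, which matches the trade-off described above.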

Actions #2

Updated by Sage Weil over 5 years ago

  • Status changed from 12 to Fix Under Review
Actions #3

Updated by Kefu Chai over 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to luminous, mimic
  • Pull request ID set to 25731
Actions #4

Updated by Greg Farnum over 5 years ago

  • Has duplicate Bug #37752: pool stuck with 'creating' flag set added
Actions #7

Updated by Neha Ojha about 5 years ago

/a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/

Actions #8

Updated by Greg Farnum over 4 years ago

  • Assignee set to Neha Ojha
Actions #9

Updated by Neha Ojha over 4 years ago

The original bug is about a pool-level flag, "FLAG_CREATING", which was introduced in 0e526b467af2699e389e7f28a6d709f216e4533e. This flag is not present in mimic and luminous. I am not sure the entire fix in https://github.com/ceph/ceph/pull/25731 is needed in those branches, and the commit message ("The OSD has to reliably deliver a pg_created message to the mon in order for the mon to clear the pool's CREATING flag.") also indicates that.

https://tracker.ceph.com/issues/36498 and /a/yuriw-2019-04-04_00:00:53-rados-luminous-distro-basic-smithi/3806121/ are about PGs stuck in the creating state, which relates to the PG_STATE_CREATING flag. We need to figure out whether any parts of https://github.com/ceph/ceph/pull/25731 fix that too.

Actions #10

Updated by Neha Ojha over 4 years ago

This patch does not make sense for mimic and luminous.
@Nathan Weinberg, can we please resolve this issue and close the corresponding backport trackers?

Actions #11

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved
  • Backport deleted (luminous, mimic)