Bug #22233
prime_pg_temp breaks on uncreated pgs
% Done: 0%
Source: Development
Backport: luminous
Regression: No
Severity: 3 - minor
Description
- mon.b instructed osd.3 to create pg 92.4. The up set was [3,6].
- osd.3 created pg 92.4 and sent a "created" message to the mon.
- However, osd.3 was still waiting on an up_thru update from the mon: it wanted its up_thru in the osdmap to be 92, but it was currently 89.
- osd.3 was killed, then marked down and out by the thrashosd task.
- mon.b primed pg_temp for pg 92.4 and mapped it to [4,6], but osd.4 was never sent an osd_pg_create message.
- As a result, the wait_for_clean task timed out and failed.
/a/kchai-2017-11-23_12:43:54-rados-wip-kefu-testing-2017-11-23-1812-distro-basic-mira/1881963
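The sequence above can be replayed with a toy model. This is only a sketch under stated assumptions, not Ceph code: `ToyMon`, `create_sent_to`, `prime_pg_temp`, and `stranded_pgs` are hypothetical stand-ins, and the model simply records which osd was told to create a pg and checks whether the current primary is that osd.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMon:
    """Toy stand-in for the monitor's view; not Ceph's OSDMonitor."""
    mapping: dict = field(default_factory=dict)         # pgid -> list of osd ids
    create_sent_to: dict = field(default_factory=dict)  # pgid -> osd told to create

    def create_pg(self, pgid, up):
        # Record the mapping and "send" the create to the primary (first osd).
        self.mapping[pgid] = list(up)
        self.create_sent_to[pgid] = up[0]

    def prime_pg_temp(self, pgid, new_acting):
        # Remap the pg only. It deliberately does NOT revisit create_sent_to,
        # which is the behaviour the report describes.
        self.mapping[pgid] = list(new_acting)

    def stranded_pgs(self):
        # pgs whose current primary was never sent a create message.
        return sorted(
            pgid for pgid, osds in self.mapping.items()
            if self.create_sent_to.get(pgid) != osds[0]
        )


mon = ToyMon()
mon.create_pg("92.4", [3, 6])      # mon.b tells osd.3 to create 92.4
mon.prime_pg_temp("92.4", [4, 6])  # osd.3 is down and out; pg remapped to [4,6]
print(mon.stranded_pgs())          # ['92.4']
```

In this model pg 92.4 ends up with primary osd.4, which was never asked to create it, so nothing would ever bring the pg to clean.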
History
#1 Updated by John Spray about 3 years ago
- Project changed from Ceph to RADOS
#2 Updated by Greg Farnum about 3 years ago
- Subject changed from "cluster fails to return clean after marking down an OSD the thrashosd test" to "prime_pg_temp breaks on uncreated pgs"
- Priority changed from Normal to High
#3 Updated by Sage Weil about 3 years ago
/a/sage-2017-12-05_18:31:27-rados-wip-pg-scrub-preempt-distro-basic-smithi/1934001 ?
#4 Updated by Kefu Chai about 3 years ago
- Status changed from New to In Progress
#5 Updated by Kefu Chai about 3 years ago
- Category set to Correctness/Safety
- Status changed from In Progress to Fix Under Review
- Backport set to luminous
#6 Updated by Sage Weil over 2 years ago
I don't understand why the bug happened (or what the proposed fix is trying to do). Given the description above, the mon should see the pg mapping change from [3,6] to [4,6] and send the create to osd.4.
(Also, haven't been seeing this failure...)
#7 Updated by Greg Farnum over 1 year ago
- Status changed from Fix Under Review to In Progress
- Priority changed from High to Normal
#8 Updated by Kefu Chai over 1 year ago
"the mon should see the pg mapping change from [3,6] to [4,6] and send the create to osd.4"
Exactly. That's why I am adding inc.new_pg_temp to pending_creatings.pgs in the fix.
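The patch itself is not shown in this thread, so the following is only one possible reading of that comment, written as a toy model rather than Ceph code. The function name `requeue_remapped_creates` and the `uncreated` set are hypothetical. The dictionaries loosely mirror `inc.new_pg_temp` and `pending_creatings.pgs`, and the real data structures may differ.

```python
def requeue_remapped_creates(pending_creating, new_pg_temp, uncreated):
    """Toy model: return pending creates plus remapped pgs that still need one.

    pending_creating: dict pgid -> osd id the create should go to
    new_pg_temp:      dict pgid -> primed acting set (list of osd ids)
    uncreated:        set of pgids the mon cannot yet treat as fully created
    """
    updated = dict(pending_creating)  # leave the caller's dict untouched
    for pgid, acting in new_pg_temp.items():
        if pgid in uncreated and acting:
            updated[pgid] = acting[0]  # target the new primary
    return updated


pending = requeue_remapped_creates({}, {"92.4": [4, 6]}, {"92.4"})
print(pending)  # {'92.4': 4}
```

Under these assumptions, priming pg_temp for 92.4 also queues a create for osd.4, which matches the behaviour described in comment #6.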