Bug #18369
osd_recovery_incomplete: failed assert not manager.is_recovered()
Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: kraken,jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Updated by Sage Weil over 7 years ago
It looks like the PGs are all active+remapped, as expected, but this satisfies the ceph_manager get_num_active_recovered() check, which looks like this:
    for pg in pgs:
        if (pg['state'].count('active') and
                not pg['state'].count('recover') and
                not pg['state'].count('backfill') and
                not pg['state'].count('stale')):
            num += 1
I think we can simply drop is_recovered; is_clean is sufficient for this test.
There are tons of callers of wait_for_recovery(), though, and it relies on this check. I think they are fine, though...
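To make the difference between the two checks concrete, here is a minimal standalone sketch. The first function wraps the loop quoted above; the second is an assumed form of a stricter "clean" check (it additionally requires 'clean' in the state string) and may not match ceph_manager exactly. The function names and the sample PG list are hypothetical.

```python
def num_active_recovered(pgs):
    """Count PGs using the same substring test as the quoted loop."""
    num = 0
    for pg in pgs:
        state = pg['state']
        if (state.count('active') and
                not state.count('recover') and
                not state.count('backfill') and
                not state.count('stale')):
            num += 1
    return num


def num_active_clean(pgs):
    """Assumed stricter check: the state must also contain 'clean'."""
    num = 0
    for pg in pgs:
        state = pg['state']
        if (state.count('active') and
                state.count('clean') and
                not state.count('stale')):
            num += 1
    return num


# Hypothetical sample data, not taken from the failing run.
pgs = [
    {'state': 'active+clean'},
    {'state': 'active+remapped'},
    {'state': 'active+remapped+backfill_wait'},
]

print(num_active_recovered(pgs))  # counts active+clean and active+remapped
print(num_active_clean(pgs))      # counts only active+clean
```

With this sample, a PG reported as plain active+remapped is counted as recovered by the first check but not as clean by the second.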
Updated by Sage Weil over 7 years ago
No, they're not supposed to be active+remapped...
Updated by Sage Weil over 7 years ago
- Status changed from New to Fix Under Review
Okay, the problem seems to just be that the PG went into a backfill state but didn't tell the mon. e.g., in run
/a/sage-2016-12-29_20:50:13-rados-wip-sage-testing---basic-smithi/675453
pg 0.f did
2016-12-30 00:49:35.607752 7fe5d2c70700 15 osd.2 pg_epoch: 14 pg[0.f( v 11'1324 (11'1300,11'1324] local-les=14 n=1324 ec=1 les/c/f 14/9/0 13/13/5) [0,1]/[2,3] r=0 lpr=13 pi=8-12/2 bft=0,1 crt=11'1324 lcod 11'1323 mlcod 0'0 active+remapped] publish_stats_to_osd 14: no change since 2016-12-30 00:49:35.607509
...
2016-12-30 00:49:35.612789 7fe5d3471700 10 osd.2 pg_epoch: 14 pg[0.f( v 11'1324 (11'1300,11'1324] local-les=14 n=1324 ec=1 les/c/f 14/9/0 13/13/5) [0,1]/[2,3] r=0 lpr=13 pi=8-12/2 bft=0,1 crt=11'1324 lcod 11'1323 mlcod 0'0 active+remapped+backfill_wait] queue_recovery -- queuing
but no stat updates that reflect the backfill_wait state bit.
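The effect of the missing stat update can be sketched with a toy model. This is not Ceph code: FakePG, its attributes, and looks_recovered() are hypothetical names used only to show how a test that reads the last published state can see a PG as recovered while the PG itself is already waiting for backfill.

```python
class FakePG:
    """Hypothetical stand-in for a PG; tracks local vs. published state."""

    def __init__(self, state):
        self.state = state            # what the OSD knows locally
        self.reported_state = state   # what was last published upstream

    def set_state(self, state, publish):
        self.state = state
        if publish:
            self.reported_state = state


def looks_recovered(state):
    """Same substring test as the check quoted earlier in this report."""
    return bool(state.count('active') and
                not state.count('recover') and
                not state.count('backfill') and
                not state.count('stale'))


pg = FakePG('active+remapped')
# The state changes locally, but no stat update is published.
pg.set_state('active+remapped+backfill_wait', publish=False)

print(looks_recovered(pg.reported_state))  # the stale view passes the check
print(looks_recovered(pg.state))           # the actual state does not
```

In this model, publishing the new state at the moment of the transition (publish=True) makes both views agree.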
Updated by Sage Weil over 7 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to kraken,jewel
Updated by Alexey Sheplyakov over 7 years ago
- Copied to Backport #18485: jewel: osd_recovery_incomplete: failed assert not manager.is_recovered() added
Updated by Loïc Dachary over 7 years ago
- Copied to Backport #18497: kraken: osd_recovery_incomplete: failed assert not manager.is_recovered() added
Updated by Nathan Cutler about 7 years ago
- Status changed from Pending Backport to Resolved