Bug #2602
osd: push failed because local copy is X
Description
Hi,
The filestore update completed.
When I start the "updated" OSD, the whole cluster starts lagging.
Is an OSD built from the 'next' branch incompatible with 0.47-2 OSDs?
We see this error in the log file:
2012-06-18 10:24:18.299444 osd.14 46.19.94.2:6800/29435 3298 : [ERR] 2.110 push d09d4910/rb.0.29.0000000001ab/head v 8390'3328971 to osd.2 failed because local copy is 8402'3329186
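For anyone triaging many of these lines, here is a minimal Python sketch that pulls the fields out of the error above and compares the two versions. The regular expression is inferred from this single sample line, and reading `8390'3328971` as an (epoch, version) pair ordered by epoch first is an assumption, not something confirmed in this ticket. The names `parse_push_error` and `parse_version` are hypothetical helpers, not part of Ceph.

```python
import re

# Field layout inferred from the one sample line in this report (assumption).
PUSH_ERR = re.compile(
    r"\[ERR\] (?P<pg>\S+) push (?P<obj>\S+) v (?P<pushed>\d+'\d+) "
    r"to (?P<target>osd\.\d+) failed because local copy is (?P<local>\d+'\d+)"
)


def parse_version(text):
    """Split "epoch'version" into a tuple of ints (assumed ordering: epoch first)."""
    epoch, version = text.split("'")
    return int(epoch), int(version)


def parse_push_error(line):
    """Return the fields of a push-failure line, or None if the line does not match."""
    m = PUSH_ERR.search(line)
    if m is None:
        return None
    return {
        "pg": m["pg"],
        "object": m["obj"],
        "target": m["target"],
        "pushed": parse_version(m["pushed"]),
        "local": parse_version(m["local"]),
    }


line = (
    "2012-06-18 10:24:18.299444 osd.14 46.19.94.2:6800/29435 3298 : [ERR] 2.110 "
    "push d09d4910/rb.0.29.0000000001ab/head v 8390'3328971 to osd.2 "
    "failed because local copy is 8402'3329186"
)
info = parse_push_error(line)
# Tuples compare element by element, so this checks epoch first, then version.
print(info["pg"], info["target"], info["pushed"] < info["local"])
```

Under that assumption, the version being pushed is older than the one reported as the local copy.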
History
#1 Updated by Sage Weil over 11 years ago
Is this reproducible with 'debug osd = 20'?
#2 Updated by Simon Frerichs over 11 years ago
- File osd.log.gz added
I updated another OSD to 'next' and the same errors occurred.
I've attached the log with debug osd = 20 set.
#3 Updated by Sage Weil over 11 years ago
- Status changed from New to Need More Info
Hi Simon-
This looks like something that could be caused by the broken rolling osd upgrade support in the branch you tried. The current wip_rolling_upgrade branch is passing our testing (so far) and should avoid it. However, having run the previous version you may have corrupted some of the metadata, so it may not be as simple as upgrading to wip_rolling_upgrade. Is this a throwaway data set?
#4 Updated by Simon Frerichs over 11 years ago
Hi Sage,
I just updated to your wip_rolling_upgrade branch.
The FileStore update worked (100 GB took 30 minutes on XFS), and after that the OSD started without any errors and got back in sync.
2012-06-22 09:24:14.403839 pg v5537683: 2112 pgs: 2112 active+clean; 2609 GB data, 5195 GB used, 35660 GB / 40856 GB avail
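A status line like the one above can be checked mechanically. The following Python sketch tests whether every placement group is reported as active+clean. The pattern is inferred from this one line, and the assumption that multiple states would appear as a comma-separated list of "count state" pairs is not confirmed here. `pg_states` and `all_active_clean` are hypothetical helper names.

```python
import re

# Layout inferred from the single status line quoted in this ticket (assumption).
PG_SUMMARY = re.compile(r"pg v\d+: (?P<total>\d+) pgs: (?P<states>[^;]+);")


def pg_states(line):
    """Return (total_pgs, {state: count}) or None if the line does not match."""
    m = PG_SUMMARY.search(line)
    if m is None:
        return None
    counts = {}
    # Assumed format: "N state, M other+state, ..." before the first semicolon.
    for part in m["states"].split(","):
        count, state = part.strip().split(" ", 1)
        counts[state] = int(count)
    return int(m["total"]), counts


def all_active_clean(line):
    """True only if the active+clean count equals the total PG count."""
    parsed = pg_states(line)
    if parsed is None:
        return False
    total, counts = parsed
    return counts.get("active+clean", 0) == total


status = (
    "2012-06-22 09:24:14.403839 pg v5537683: 2112 pgs: 2112 active+clean; "
    "2609 GB data, 5195 GB used, 35660 GB / 40856 GB avail"
)
print(all_active_clean(status))
```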
#5 Updated by Sage Weil over 11 years ago
- Status changed from Need More Info to Resolved