Bug #25057

jewel->luminous: osdmap crc mismatch

Added by Sage Weil about 1 year ago. Updated 12 months ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
Start date: 07/22/2018
Due date:
% Done: 0%
Source:
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

The upgrade/jewel-x runs for 12.2.6 and 12.2.7 threw osdmap crc mismatch errors.


Related issues

Copied to RADOS - Backport #25100: luminous: jewel->luminous: osdmap crc mismatch Resolved
Copied to RADOS - Backport #25101: mimic: jewel->luminous: osdmap crc mismatch Resolved

History

#1 Updated by Sage Weil about 1 year ago

/a/teuthology-2018-07-20_04:23:01-upgrade:jewel-x-luminous-distro-basic-smithi/2799173

is an instance where the mon still has the relevant original osdmap.

The diff is here:

< 00000650  32 00 00 00 01 00 00 00  01 01 36 00 00 00 01 00  |2.........6.....|
---
> 00000650  32 00 00 00 01 00 00 00  01 01 36 00 00 00 00 00  |2.........6.....|

of epoch 535, which corresponds to chooseleaf_stable = 1.

535 is the first luminous osdmap. The incremental didn't re-encode the crushmap. The erroring OSD had a value of 0 for that field and the mon had a value of 1.
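To illustrate why a single flipped byte is enough to trip the check, here is a minimal sketch that compares the two 16-byte rows from the hexdump diff. It uses Python's zlib.crc32 purely as a stand-in checksum; Ceph's osdmap checksum is a different CRC variant, and the helper names (differing_offsets, checksum) are hypothetical.

```python
import zlib

# The two 16-byte rows at file offset 0x650, copied from the hexdump diff.
ROW_A = bytes.fromhex("32000000010000000101360000000100")  # "<" side of the diff
ROW_B = bytes.fromhex("32000000010000000101360000000000")  # ">" side of the diff
BASE_OFFSET = 0x650


def differing_offsets(a, b, base=0):
    """Return the absolute offsets at which two equal-length blobs differ."""
    return [base + i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def checksum(blob):
    """Stand-in checksum (zlib CRC-32); NOT the CRC variant Ceph uses."""
    return zlib.crc32(blob) & 0xFFFFFFFF


if __name__ == "__main__":
    for off in differing_offsets(ROW_A, ROW_B, BASE_OFFSET):
        i = off - BASE_OFFSET
        print(f"offset {off:#x}: {ROW_A[i]:#04x} vs {ROW_B[i]:#04x}")
    print("checksums match:", checksum(ROW_A) == checksum(ROW_B))
```

The only differing byte is at offset 0x65e (1 on one side, 0 on the other), and any CRC over the two encodings will disagree as a result.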

#2 Updated by Sage Weil about 1 year ago

  • Status changed from Verified to In Progress
  • Assignee set to Sage Weil

#3 Updated by Sage Weil about 1 year ago

  • Status changed from In Progress to Pending Backport
  • Backport set to mimic,luminous

#4 Updated by Sage Weil about 1 year ago

The problem was that CRUSH_TUNABLES5 was associated with kraken instead of jewel in 0ceb5c0, backported to luminous in 686b054 in 12.2.6.

The net impact of this is that luminous mons with require_osd_release jewel would not encode the "jewel tunables" in the osdmap. If a cluster has jewel tunables and then upgrades to 12.2.6 or 12.2.7, the jewel tunables will get reverted. Except "reverted" is ambiguous: the mon's in-memory crushmap will have the tunables set, but the mon will not encode them in the full map until the require_osd_release=luminous flag is set. The OSDs will be upgraded/restarted after the mons, so their in-memory copy will not have jewel tunables (their chooseleaf_stable=0). Thus there may be some problems with prime_pg_temp etc. during this upgrade period. And clients that survive this whole period may misdirect requests (their in-memory tunable may be 1). Once the final switch is flipped, things will go back to matching everywhere (in memory) and they'll get a crc error.
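A toy model of the gating described above, as a rough sketch only. It is not Ceph's encoding code: the release list, the dict-based "encoding", and the names encode_tunables and decode_tunables are all hypothetical. It only shows how associating the tunable with kraken instead of jewel makes a mon at require_osd_release=jewel drop the field, so a freshly restarted OSD decodes the default of 0.

```python
# Release order relevant to this bug (oldest first).
RELEASES = ["jewel", "kraken", "luminous"]


def encode_tunables(in_memory, require_osd_release, tunables5_release):
    """Toy encoder: emit chooseleaf_stable only if the required release
    is at least the release the tunable is associated with."""
    encoded = {}
    if RELEASES.index(require_osd_release) >= RELEASES.index(tunables5_release):
        encoded["chooseleaf_stable"] = in_memory["chooseleaf_stable"]
    return encoded


def decode_tunables(encoded):
    """Toy decoder: a field missing from the encoding falls back to 0."""
    return {"chooseleaf_stable": encoded.get("chooseleaf_stable", 0)}


# The mon's in-memory crushmap has the jewel tunable set.
mon_in_memory = {"chooseleaf_stable": 1}

if __name__ == "__main__":
    # Buggy association (kraken): the field is dropped while the flag is jewel.
    print(decode_tunables(encode_tunables(mon_in_memory, "jewel", "kraken")))
    # Intended association (jewel): the field survives.
    print(decode_tunables(encode_tunables(mon_in_memory, "jewel", "jewel")))
    # After require_osd_release=luminous, even the buggy mapping encodes it.
    print(decode_tunables(encode_tunables(mon_in_memory, "luminous", "kraken")))
```

In this model the mon (in-memory value 1) and an OSD that decoded the full map (value 0) disagree until the flag is raised, which matches the mismatch seen in the diff at epoch 535.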

#6 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #25100: luminous: jewel->luminous: osdmap crc mismatch added

#7 Updated by Nathan Cutler about 1 year ago

  • Copied to Backport #25101: mimic: jewel->luminous: osdmap crc mismatch added

#8 Updated by Nathan Cutler 12 months ago

  • Status changed from Pending Backport to Resolved
