Bug #63389

open

Failed to encode map X with expected CRC

Added by Navid Golpa 6 months ago. Updated about 1 month ago.

Status: Pending Backport
Priority: High
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: reef,squid
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During an upgrade of Ceph from Quincy to Reef, we encountered a problem as we upgraded each OSD. Every time an OSD was restarted to upgrade it to Reef, the MONs would get spammed with

failed to encode map X with expected crc

Network load on the MON would skyrocket. The problem was identical to what was described by Kefu here in 2016:
https://lore.kernel.org/all/CAJE9aONFauhy7v6n9bT11Sga+e0Qgi8hWu=gr-zoxuAq5Yv+cA@mail.gmail.com/T/

We did not follow the recommendation in that post to downgrade the MONs, upgrade the OSDs first, and then upgrade the MONs again. Instead, we powered through the upgrade by taking a one-day downtime and upgrading all remaining OSDs. Once all OSDs were upgraded, the errors went away and the cluster was back to normal operation.
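For anyone who wants to gauge how heavily a monitor log was hit by this message, a minimal Python sketch along these lines could tally the occurrences per map. It assumes the log contains the message as quoted above, with X replaced by a map identifier that has no whitespace in it. The function name `count_crc_failures` is made up for illustration and is not part of Ceph.

```python
import re
from collections import Counter

# Assumption: the message looks like the one quoted in this report, with the
# placeholder X replaced by a whitespace-free map identifier.
PATTERN = re.compile(r"failed to encode map (\S+) with expected crc", re.IGNORECASE)


def count_crc_failures(lines):
    """Return a Counter mapping map identifier -> number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It accepts any iterable of lines, so an open log file can be passed in directly. Calling `.most_common(10)` on the result lists the maps that were reported most often.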


Files

mon.csv (234 KB): mon logs. Navid Golpa, 11/02/2023 07:41 PM
ceph-w.txt.gz (214 KB): ceph -w output. Navid Golpa, 11/02/2023 07:43 PM
network.png (85.6 KB): network transmit. Navid Golpa, 11/02/2023 07:43 PM

Related issues 4 (2 open, 2 closed)

Related to Ceph - Bug #63425: tasks.cephadm: ceph.log No such file or directory (Pending Backport, Dan van der Ster)
Related to Ceph - Bug #17386: Upgrading 0.94.6 -> 0.94.9 saturating mon node networking (Resolved, Kefu Chai, 09/22/2016)
Copied to RADOS - Backport #64406: reef: Failed to encode map X with expected CRC (Resolved, Radoslaw Zarzynski)
Copied to RADOS - Backport #65198: squid: Failed to encode map X with expected CRC (In Progress, Radoslaw Zarzynski)
