Bug #19449

closed

10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors while processing?

Added by Herbert Faleiros about 7 years ago. Updated over 6 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
Administration/Usability
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
upgrade/jewel-x
Component(RADOS):
Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

while upgrading my cluster from 10.2.3 to 10.2.6 I faced a major failure, and I think it could(?) be a bug.

My OS is Ubuntu (Xenial); the Ceph packages are also from the distro. My cluster has 3 monitors and 96 OSDs.

First I stopped one mon, then upgraded the OS packages and rebooted; it came back up as expected with no failures. I did the same with another mon, OK too, but when I stopped my last mon I got HEALTH_ERR, tons of blocked requests, and several minutes (with almost zero client I/O) until the recovery process started...
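
Roughly, the per-monitor loop I followed was (a sketch from memory; the systemd unit name and package commands are the stock Xenial ones, and the mon id is assumed to match the hostname):

systemctl stop ceph-mon@mon-node1        # stop the local mon
apt-get update && apt-get dist-upgrade   # pulls ceph 10.2.6 in with the rest of the distro packages
reboot
ceph -s                                  # after the host is back, wait for quorum before the next mon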

Two days later (with an inconvenient performance degradation) the cluster became HEALTH_OK again; only then did I upgrade all my OSDs from 10.2.3 to 10.2.6 (this time, fortunately, without any surprises).

My question is: why did this happen?

In my logs (from the monitor boot process) I can see only things like:

2017-03-27 11:21:13.955155 7f7b24df3700 0 mon.mon-node1@-1(probing).osd e166803 crush map has features 288514051259236352, adjusting msgr requires

2017-03-27 11:21:14.020915 7f7b16a10700 0 -- 10.2.15.20:6789/0 >> 10.2.15.22:6789/0 pipe(0x55eeea485400 sd=12 :49238 s=2 pgs=3041041 cs=1 l=0 c=0x55eee9206c00).reader missed message? skipped from seq 0 to 821720064
2017-03-27 11:21:14.021322 7f7b1690f700 0 -- 10.2.15.20:6789/0 >> 10.2.15.21:6789/0 pipe(0x55eeea484000 sd=11 :44714 s=2 pgs=6749444 cs=1 l=0 c=0x55eee9206a80).reader missed message? skipped from seq 0 to 1708671746

And also (from all my OSDs) a lot of:

2017-03-27 11:21:46.991533 osd.62 10.2.15.37:6812/4072 21935 : cluster [WRN] failed to encode map e167847 with expected crc

When things started to go wrong (when I stopped mon-node1, the last one, to upgrade it) I can see:

2017-03-27 11:05:07.143529 mon.1 10.2.15.21:6789/0 653 : cluster [INF] HEALTH_ERR; 54 pgs are stuck inactive for more than 300 seconds; 2153 pgs backfill_wait; 21 pgs backfilling; 53 pgs degraded; 2166 pgs peering; 3 pgs recovering; 50 pgs recovery_wait; 54 pgs stuck inactive; 118 pgs stuck unclean; 1549 requests are blocked > 32 sec; recovery 28926/57075284 objects degraded (0.051%); recovery 24971455/57075284 objects misplaced (43.752%); all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set; 1 mons down, quorum 1,2 mon-node2,mon-node3

And when mon-node1 came back (already upgraded):

2017-03-27 11:21:58.987092 7f7b18c16700 0 log_channel(cluster) log [INF] : mon.mon-node1 calling new monitor election
2017-03-27 11:21:58.987186 7f7b18c16700 1 mon.mon-node1@0(electing).elector(162) init, last seen epoch 162
2017-03-27 11:21:59.064957 7f7b18c16700 0 log_channel(cluster) log [INF] : mon.mon-node1 calling new monitor election
2017-03-27 11:21:59.065029 7f7b18c16700 1 mon.mon-node1@0(electing).elector(165) init, last seen epoch 165
2017-03-27 11:21:59.096933 7f7b18c16700 0 log_channel(cluster) log [INF] : mon.mon-node1@0 won leader election with quorum 0,1,2
2017-03-27 11:21:59.114194 7f7b18c16700 0 log_channel(cluster) log [INF] : HEALTH_ERR; 2167 pgs are stuck inactive for more than 300 seconds; 2121 pgs backfill_wait; 25 pgs backfilling; 25 pgs degraded; 2147 pgs peering; 25 pgs recovery_wait; 25 pgs stuck degraded; 2167 pgs stuck inactive; 4338 pgs stuck unclean; 5082 requests are blocked > 32 sec; recovery 11846/55732755 objects degraded (0.021%); recovery 24595033/55732755 objects misplaced (44.130%); all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set

The crc errors disappeared once all monitors were upgraded and the require_jewel_osds flag was set.

It seems the entire cluster was rebuilt (rebalanced); fortunately I didn't lose any data.
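
For reference, the flag itself was set with the usual command once everything was on jewel (a sketch; the health check is just to confirm the warning clears):

ceph osd set require_jewel_osds   # only safe after *all* OSDs run jewel
ceph health detail                # the 'require_jewel_osds' warning should disappear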

Actions #1

Updated by Herbert Faleiros about 7 years ago

It seems that my tunables jumped (for some reason) from firefly (the jewel default, right?) to hammer; if that really happened, it is the cause of the entire cluster reconfiguration.

Is there any way to check (or test) it?

If the upgrade did it, it really is a bug.

Before:

# ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 22,
    "profile": "firefly",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "minimum_required_version": "firefly",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 0,
    "require_feature_tunables5": 0,
    "has_v5_rules": 0
}

Now:

# ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "hammer",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "minimum_required_version": "hammer",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 0,
    "has_v5_rules": 0
}
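
One way to double-check where the tunables change came from (a sketch; the file paths are arbitrary and the old epoch number below is only a placeholder) is to decompile the current crush map and, if the monitors still hold an older osdmap epoch, extract its crush map and diff the tunable lines:

# current crush map
ceph osd getcrushmap -o /tmp/crush.now
crushtool -d /tmp/crush.now -o /tmp/crush.now.txt
grep ^tunable /tmp/crush.now.txt

# crush map from an older osdmap epoch, if the mons still have it
ceph osd getmap 166000 -o /tmp/osdmap.old        # 166000 is a placeholder epoch
osdmaptool /tmp/osdmap.old --export-crush /tmp/crush.old
crushtool -d /tmp/crush.old -o /tmp/crush.old.txt
diff /tmp/crush.old.txt /tmp/crush.now.txt       # tunable and bucket differences show up here
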
Actions #2

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Subject changed from Problem upgrading Jewel from 10.2.3 to 10.2.6 to 10.2.3->10.2.6 upgrade switched crush tunables, generated crc errors while processing?
  • Category changed from Monitor to Administration/Usability
  • Component(RADOS) Monitor added
Actions #3

Updated by Sage Weil over 6 years ago

  • Status changed from New to Won't Fix

jewel isn't as careful about crc mismatches; going forward this won't happen but we can't change old 10.2.x versions.
