Bug #58166
mon: DAEMON_OLD_VERSION: newer version is considered older than an earlier one
Description
We have a cluster where most mon/mgr/osd daemons are running 16.2.10 and some OSDs are running 16.2.9.
The health check does not work properly: it treats 16.2.10 as older than 16.2.9.
$ ceph versions
    "mon": {
        "ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": censored,
        "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": censored
    },
$ ceph health detail
(MUTED) [WRN] DAEMON_OLD_VERSION: There are daemons running an older version of ceph
mon.x mon.y mon.z osd.0 osd.1 osd.2 osd.3 <more OSDs in this list...> mgr.x mgr.y are running an older version of ceph: 16.2.10
Updated by Tobias Urdin over 1 year ago
This was probably introduced in https://github.com/ceph/ceph/pull/36759
Updated by Neha Ojha over 1 year ago
- Status changed from New to Need More Info
If your cluster is in the same state, can you please share mon logs with debug_mon=20? The following code snippet in check_for_older_version() does the version checking and it would be helpful to see the logs from here.
  if (all_versions.size() > 1) {
    dout(20) << __func__ << " all_versions=" << all_versions << dendl;
    // The last entry has the largest version
    dout(20) << __func__ << " highest version daemon count "
             << all_versions.rbegin()->second.size() << dendl;
    // Erase last element (the highest version running)
    all_versions.erase(all_versions.rbegin()->first);
    ceph_assert(all_versions.size() > 0);
    ostringstream ss;
    unsigned daemon_count = 0;
    for (auto& g : all_versions) {
      daemon_count += g.second.size();
    }
    int ver_count = all_versions.size();
    ceph_assert(!(daemon_count == 1 && ver_count != 1));
    ss << "There " << (daemon_count == 1 ? "is a daemon" : "are daemons")
       << " running "
       << (ver_count > 1 ? "multiple old versions" : "an older version")
       << " of ceph";
    health_status_t status;
    if (ver_count > 1)
      status = HEALTH_ERR;
    else
      status = HEALTH_WARN;
    auto& d = checks->add("DAEMON_OLD_VERSION", status, ss.str(),
                          all_versions.size());
    for (auto& g : all_versions) {
      ostringstream ds;
      for (auto& i : g.second) { // Daemon list
        ds << i << " ";
      }
      ds << (g.second.size() == 1 ? "is" : "are")
         << " running an older version of ceph: " << g.first;
      d.detail.push_back(ds.str());
    }
  } else {
    old_version_first_time = ceph::coarse_mono_clock::zero();
  }
}
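One possible explanation, not confirmed anywhere in this thread: the snippet treats all_versions.rbegin() as the highest version, which only holds if the container orders its keys numerically. The sketch below assumes all_versions is a std::map keyed by the short version string (the type is not shown in the snippet) and shows that default string ordering places "16.2.9" after "16.2.10". The names parse_version, VersionLess and highest_version are hypothetical and exist only for this illustration; they are not part of Ceph.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using DaemonList = std::vector<std::string>;

// Hypothetical helper: split "16.2.10" into {16, 2, 10}.
// std::stoi throws on a non-numeric field, so this only handles
// purely numeric dotted versions.
static std::vector<int> parse_version(const std::string& v) {
  std::vector<int> parts;
  std::istringstream in(v);
  std::string field;
  while (std::getline(in, field, '.')) {
    parts.push_back(std::stoi(field));
  }
  return parts;
}

// Hypothetical comparator: compare versions field by field as integers
// instead of character by character.
struct VersionLess {
  bool operator()(const std::string& a, const std::string& b) const {
    return parse_version(a) < parse_version(b);
  }
};

// Builds a map shaped like the assumed all_versions (version -> daemons),
// mirroring the cluster in this report, and returns the key that
// rbegin() would treat as the highest version.
template <typename Compare>
std::string highest_version() {
  std::map<std::string, DaemonList, Compare> all_versions;
  all_versions["16.2.10"] = {"mon.x", "mgr.x", "osd.0"};
  all_versions["16.2.9"] = {"osd.1"};
  return all_versions.rbegin()->first;
}
```

With std::less<std::string>, highest_version() yields "16.2.9", because '1' sorts before '9' at the first differing character; every daemon on 16.2.10 would then be left in the map and reported as old, which matches the health detail output above. With VersionLess it yields "16.2.10". Whether this is the actual cause still depends on the real type and comparator of all_versions, which the debug_mon=20 log would help confirm.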