Bug #58166

mon: DAEMON_OLD_VERSION: newer version is considered older than an earlier one

Added by Tobias Urdin about 2 months ago. Updated about 2 months ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a cluster where most mon/mgr/osd daemons are running 16.2.10 and some OSDs are running 16.2.9.

The health check does not work properly: it treats 16.2.10 as older than 16.2.9.

$ ceph versions
"mon": {
"ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 3
},
"mgr": {
"ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": 2
},
"osd": {
"ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)": censored,
"ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": censored
},

$ ceph health detail
(MUTED) [WRN] DAEMON_OLD_VERSION: There are daemons running an older version of ceph
mon.x mon.y mon.z osd.0 osd.1 osd.2 osd.3 <more OSDs in this list...> mgr.x mgr.y are running an older version of ceph: 16.2.10
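
A possible explanation for this output, stated here as an assumption and not verified against the monitor source: if the version strings are ordered as plain strings, "16.2.10" sorts before "16.2.9", because the comparison runs character by character and '1' is less than '9' at the first differing position. A minimal C++ sketch of that ordering (sorts_before_as_string is a hypothetical helper, not a Ceph function):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if `a` sorts before `b` under plain
// std::string comparison (lexicographic, character by character).
bool sorts_before_as_string(const std::string& a, const std::string& b) {
  return a < b;
}

// Usage: under string ordering the newer release compares as "older".
inline void check_string_ordering() {
  assert(sorts_before_as_string("16.2.10", "16.2.9"));
}
```

If this is what happens in the monitor, every daemon on 16.2.10 would be reported as running the older version, which matches the list above.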

History

#1 Updated by Tobias Urdin about 2 months ago

This was probably introduced in https://github.com/ceph/ceph/pull/36759

#2 Updated by Neha Ojha about 2 months ago

  • Status changed from New to Need More Info

If your cluster is in the same state, can you please share mon logs with debug_mon=20? The following code snippet in check_for_older_version() does the version checking and it would be helpful to see the logs from here.

    if (all_versions.size() > 1) {
      dout(20) << __func__ << " all_versions=" << all_versions << dendl;
      // The last entry has the largest version
      dout(20) << __func__ << " highest version daemon count " 
               << all_versions.rbegin()->second.size() << dendl;
      // Erase last element (the highest version running)
      all_versions.erase(all_versions.rbegin()->first);
      ceph_assert(all_versions.size() > 0);
      ostringstream ss;
      unsigned daemon_count = 0;
      for (auto& g : all_versions) {
        daemon_count += g.second.size();
      }
      int ver_count = all_versions.size();
      ceph_assert(!(daemon_count == 1 && ver_count != 1)); 
      ss << "There " << (daemon_count == 1 ? "is a daemon" : "are daemons")
         << " running " << (ver_count > 1 ? "multiple old versions" : "an older version")  << " of ceph";
      health_status_t status;
      if (ver_count > 1)
        status = HEALTH_ERR;
      else
        status = HEALTH_WARN;
      auto& d = checks->add("DAEMON_OLD_VERSION", status, ss.str(), all_versions.size());
      for (auto& g : all_versions) {
        ostringstream ds;
        for (auto& i : g.second) { // Daemon list
          ds << i << " ";
        }
        ds << (g.second.size() == 1 ? "is" : "are") 
           << " running an older version of ceph: " << g.first;
        d.detail.push_back(ds.str());
      }
    } else {
      old_version_first_time = ceph::coarse_mono_clock::zero();
    }
  }
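
The snippet relies on the last entry of all_versions being the highest version. Assuming all_versions is a std::map keyed by the version string with the default comparator (the declaration is not shown above, so this is an assumption), the last entry is picked by string ordering. The sketch below shows one way to order such a map by numeric components instead. The names parse_version, VersionLess, VersionMap and highest_version are hypothetical, it only handles digits separated by non-digit characters, and it is not presented as the fix adopted upstream.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: split "16.2.10" into {16, 2, 10}.
// Any non-digit character ends the current numeric component.
std::vector<unsigned long> parse_version(const std::string& v) {
  std::vector<unsigned long> parts;
  unsigned long cur = 0;
  bool in_number = false;
  for (char c : v) {
    if (c >= '0' && c <= '9') {
      cur = cur * 10 + static_cast<unsigned long>(c - '0');
      in_number = true;
    } else if (in_number) {
      parts.push_back(cur);
      cur = 0;
      in_number = false;
    }
  }
  if (in_number) parts.push_back(cur);
  return parts;
}

// Comparator that orders version strings by their numeric components,
// so that "16.2.9" sorts before "16.2.10".
struct VersionLess {
  bool operator()(const std::string& a, const std::string& b) const {
    return parse_version(a) < parse_version(b);
  }
};

// Version string -> list of daemon names (shape assumed from the snippet).
using VersionMap = std::map<std::string, std::list<std::string>, VersionLess>;

// With VersionLess, the last entry really is the highest version.
std::string highest_version(const VersionMap& m) {
  return m.empty() ? std::string() : m.rbegin()->first;
}

// Usage: compare with a default-ordered map holding the same keys.
inline void check_highest_version() {
  VersionMap numeric;
  numeric["16.2.9"].push_back("osd.4");
  numeric["16.2.10"].push_back("mon.x");
  assert(highest_version(numeric) == "16.2.10");

  std::map<std::string, std::list<std::string>> plain;
  plain["16.2.9"].push_back("osd.4");
  plain["16.2.10"].push_back("mon.x");
  assert(plain.rbegin()->first == "16.2.9");  // string ordering
}
```

The comparator treats strings with equal numeric components (for example "16.2.9" and "16.02.9") as the same key, which seems acceptable for release numbers but is a design choice worth noting.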
