Bug #61662

open

mon/MDSMonitor: fs->mds_map.compat.compare(compat) == 0

Added by yite gu 11 months ago. Updated 7 months ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDSMonitor
Labels (FS):
crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

# ceph crash info 2023-06-13T12:10:26.386825Z_98af3cea-c0a4-48ba-802a-15abd54a8350
{
    "assert_condition": "fs->mds_map.compat.compare(compat) == 0",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/ceph-15.2.7/src/mds/FSMap.cc",
    "assert_func": "void FSMap::sanity() const",
    "assert_line": 846,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/ceph-15.2.7/src/mds/FSMap.cc: In function 'void FSMap::sanity() const' thread 7f3dba2dd700 time 2023-06-13T12:10:26.383068+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/ceph-15.2.7/src/mds/FSMap.cc: 846: FAILED ceph_assert(fs->mds_map.compat.compare(compat) == 0)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "(()+0x12b20) [0x7f3dc5943b20]",
        "(gsignal()+0x10f) [0x7f3dc45ab7ff]",
        "(abort()+0x127) [0x7f3dc4595c35]",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f3dc7e4cbdd]",
        "(()+0x279da6) [0x7f3dc7e4cda6]",
        "(FSMap::sanity() const+0xcd) [0x7f3dc835204d]",
        "(MDSMonitor::update_from_paxos(bool*)+0x375) [0x55ae3ff7db95]",
        "(PaxosService::refresh(bool*)+0x10e) [0x55ae3fea483e]",
        "(Monitor::refresh_from_paxos(bool*)+0x18c) [0x55ae3fd7560c]",
        "(Paxos::do_refresh()+0x57) [0x55ae3fe92317]",
        "(Paxos::handle_commit(boost::intrusive_ptr<MonOpRequest>)+0x309) [0x55ae3fe98229]",
        "(Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x457) [0x55ae3fe9e827]",
        "(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x1089) [0x55ae3fdaa469]",
        "(Monitor::_ms_dispatch(Message*)+0x69d) [0x55ae3fdaafdd]",
        "(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55ae3fdd9b8c]",
        "(DispatchQueue::entry()+0x126a) [0x7f3dc806b2aa]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f3dc810d901]",
        "(()+0x814a) [0x7f3dc593914a]",
        "(clone()+0x43) [0x7f3dc4670f23]" 
    ],
    "ceph_version": "15.2.7",
    "crash_id": "2023-06-13T12:10:26.386825Z_98af3cea-c0a4-48ba-802a-15abd54a8350",
    "entity_name": "mon.r",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mon",
    "stack_sig": "aa7259e6a9276895a70cc9ff18e5fd9c4d70e3e3e16c1e5bba67a000d7823289",
    "timestamp": "2023-06-13T12:10:26.386825Z",
    "utsname_hostname": "n133-017-070",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.56.bsk.9-amd64",
    "utsname_sysname": "Linux",
    "utsname_version": "#5.4.56.bsk.9 SMP Debian 5.4.56.bsk.9 Wed Aug 25 03:42:38 UTC 20" 
}
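For triage, the report above can be reduced to the failed assertion and the named stack frames. The following is a minimal sketch, assuming only the JSON field names visible in this report (`assert_condition`, `backtrace`, `entity_name`, `ceph_version`); the helper name `summarize_crash` is hypothetical and not part of Ceph.

```python
import json


def summarize_crash(report_json: str) -> dict:
    """Reduce a `ceph crash info` JSON report to its key facts."""
    report = json.loads(report_json)
    frames = []
    for entry in report.get("backtrace", []):
        # Each entry looks like "(symbol+0xoffset) [0xaddress]".
        cut = entry.rfind("+0x")
        if cut == -1:
            continue
        symbol = entry[1:cut]
        # Frames without symbol information appear as "()"; skip them.
        if symbol and symbol != "()":
            frames.append(symbol)
    return {
        "daemon": report.get("entity_name"),
        "version": report.get("ceph_version"),
        "assert": report.get("assert_condition"),
        "frames": frames,
    }


if __name__ == "__main__":
    # A trimmed copy of the report above, kept short for illustration.
    sample = json.dumps({
        "assert_condition": "fs->mds_map.compat.compare(compat) == 0",
        "backtrace": [
            "(()+0x279da6) [0x7f3dc7e4cda6]",
            "(FSMap::sanity() const+0xcd) [0x7f3dc835204d]",
            "(MDSMonitor::update_from_paxos(bool*)+0x375) [0x55ae3ff7db95]",
        ],
        "ceph_version": "15.2.7",
        "entity_name": "mon.r",
    })
    print(summarize_crash(sample))
```

Run against the full report, this would show that the assertion fires in `FSMap::sanity()` while the monitor refreshes its state in `MDSMonitor::update_from_paxos()`.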
Actions #1

Updated by Patrick Donnelly 11 months ago

  • Description updated (diff)
  • Target version set to v19.0.0
  • Source set to Community (user)
  • Backport set to reef,quincy,pacific
  • Component(FS) MDSMonitor added
  • Labels (FS) crash added
Actions #2

Updated by Patrick Donnelly 11 months ago

  • Status changed from New to Need More Info

This is an Octopus monitor. Were you doing an upgrade? To what version?

Actions #3

Updated by Patrick Donnelly 11 months ago

  • Subject changed from fs->mds_map.compat.compare(compat) == 0 to mon/MDSMonitor: fs->mds_map.compat.compare(compat) == 0
Actions #4

Updated by yite gu 11 months ago

Patrick Donnelly wrote:

This is an Octopus monitor. Were you doing an upgrade? To what version?

Yes, upgrading from 15.2.7 to 16.2.13.

Actions #5

Updated by yite gu 11 months ago

Can this PR fix the problem?
https://github.com/ceph/ceph/pull/44851

Actions #6

Updated by Patrick Donnelly 11 months ago

yite gu wrote:

Can this PR fix the problem?
https://github.com/ceph/ceph/pull/44851

That fix is already present in v16.2.13.

Actions #7

Updated by Patrick Donnelly 11 months ago

yite gu wrote:

Yes, upgrading from 15.2.7 to 16.2.13.

How did you do the upgrade? Were any MDS upgraded before this monitor?

Actions #8

Updated by yite gu 11 months ago

Patrick Donnelly wrote:

How did you do the upgrade? Were any MDS upgraded before this monitor?

  services:
    mon: 3 daemons, quorum h,q,r (age 18m)
    mgr: a(active, since 87m)
    mds: 1/1 daemons up, 1 hot standby
    osd: 24 osds: 24 up (since 58m), 24 in (since 3w)

1. Upgraded mon.h; the upgrade succeeded, but mon.q and mon.r crashed.
2. Restarted mon.q and mon.r; both then upgraded successfully.
3. Upgraded the MDS daemons successfully.
4. Upgraded the mgr successfully.
5. Upgraded the OSDs successfully.
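Since the question here is whether any MDS ran a newer version than a monitor, one way to check this between steps is to inspect the JSON printed by `ceph versions`. The sketch below assumes that output maps each daemon type to a dictionary of version string to daemon count; the helper name and the shortened version strings in the sample are illustrative only.

```python
import json


def mixed_daemon_types(versions_json: str) -> list:
    """Return the daemon types that currently report more than one version."""
    versions = json.loads(versions_json)
    return sorted(
        daemon_type
        for daemon_type, counts in versions.items()
        # "overall" aggregates all daemons, so it is not a daemon type.
        if daemon_type != "overall" and len(counts) > 1
    )


if __name__ == "__main__":
    # Hypothetical snapshot taken after step 1: one mon upgraded, two not.
    snapshot = json.dumps({
        "mon": {"ceph version 15.2.7": 2, "ceph version 16.2.13": 1},
        "mds": {"ceph version 15.2.7": 2},
        "overall": {"ceph version 15.2.7": 4, "ceph version 16.2.13": 1},
    })
    print(mixed_daemon_types(snapshot))
```

An empty list for `mon` before touching the MDS daemons would confirm that all monitors were on one version at that point.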
Actions #9

Updated by Patrick Donnelly 11 months ago

yite gu wrote:

[...]
1. Upgraded mon.h; the upgrade succeeded, but mon.q and mon.r crashed.
2. Restarted mon.q and mon.r; both then upgraded successfully.
3. Upgraded the MDS daemons successfully.
4. Upgraded the mgr successfully.
5. Upgraded the OSDs successfully.

Weird. I'm happy the upgrade continued for you without any further crashes. I'm not sure how this can happen without an MDS getting upgraded before all mons are upgraded.
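To make the failed check concrete, here is a toy model of the assertion `fs->mds_map.compat.compare(compat) == 0`. It is a simplified sketch, not Ceph code: it assumes `compare()` returns 0 only when the two feature sets are identical, 1 when the caller's set is a superset, and -1 otherwise. The class name, the `sanity` helper, and the feature names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCompatSet:
    """Simplified stand-in for a compat set: three groups of feature names."""
    compat: frozenset = frozenset()
    ro_compat: frozenset = frozenset()
    incompat: frozenset = frozenset()

    def compare(self, other: "ToyCompatSet") -> int:
        mine = (self.compat, self.ro_compat, self.incompat)
        theirs = (other.compat, other.ro_compat, other.incompat)
        if mine == theirs:
            return 0   # identical feature sets
        if all(m >= t for m, t in zip(mine, theirs)):
            return 1   # assumed: caller has every feature of other, plus more
        return -1      # assumed: caller is missing at least one feature


def sanity(map_wide: ToyCompatSet, per_fs: dict) -> list:
    """Return the file systems whose compat set differs from the map-wide one.

    The real check asserts instead of returning a list; any entry here
    corresponds to a monitor crash like the one reported above.
    """
    return [name for name, fs in per_fs.items() if fs.compare(map_wide) != 0]


if __name__ == "__main__":
    old = ToyCompatSet(incompat=frozenset({"feature_a"}))
    new = ToyCompatSet(incompat=frozenset({"feature_a", "feature_b"}))
    print(sanity(old, {"cephfs": old}))  # matching sets pass the check
    print(sanity(old, {"cephfs": new}))  # a differing set would trip the assert
```

Under these assumptions, the assertion can only fail when some file system's compat set has diverged from the map-wide set that the old monitor holds, which is why the order of daemon upgrades matters.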

Actions #10

Updated by yite gu 11 months ago

Patrick Donnelly wrote:

Weird. I'm happy the upgrade continued for you without any further crashes. I'm not sure how this can happen without an MDS getting upgraded before all mons are upgraded.

I have many clusters running 15.2.7 that I also want to upgrade to 16.2.13. I will report back here on how it goes.

Actions #11

Updated by Milind Changire 11 months ago

  • Assignee set to Patrick Donnelly
Actions #12

Updated by Patrick Donnelly 11 months ago

yite gu wrote:

I have many clusters running 15.2.7 that I also want to upgrade to 16.2.13. I will report back here on how it goes.

You can set `debug_mon = 20` and `debug_ms = 1` on the mons before doing the upgrade. That would be helpful if it reoccurs.
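One possible way to apply these two settings to all monitors is through the `ceph config set` command. The sketch below only builds the command lines and leaves running them to the caller; the helper names are hypothetical, and it assumes the `ceph` CLI is on the path with admin credentials.

```python
import subprocess

# The two settings suggested above, applied to the "mon" daemon class.
DEBUG_SETTINGS = {"debug_mon": "20", "debug_ms": "1"}


def debug_commands(settings=DEBUG_SETTINGS, who="mon"):
    """Build one `ceph config set <who> <option> <value>` argv per setting."""
    return [["ceph", "config", "set", who, option, value]
            for option, value in settings.items()]


def apply_commands(commands, run=subprocess.run):
    """Run each command and raise if any of them fails."""
    for argv in commands:
        run(argv, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so this is safe anywhere.
    for argv in debug_commands():
        print(" ".join(argv))
```

These levels produce verbose logs, so it is worth lowering them again once the upgrade has finished.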

Actions #13

Updated by yite gu 11 months ago

Patrick Donnelly wrote:

You can set `debug_mon = 20` and `debug_ms = 1` on the mons before doing the upgrade. That would be helpful if it reoccurs.

okay

Actions #14

Updated by Patrick Donnelly 7 months ago

  • Assignee deleted (Patrick Donnelly)
  • Target version deleted (v19.0.0)
  • Backport deleted (reef,quincy,pacific)