Bug #61662
mon/MDSMonitor: fs->mds_map.compat.compare(compat) == 0
Description
# ceph crash info 2023-06-13T12:10:26.386825Z_98af3cea-c0a4-48ba-802a-15abd54a8350
{
    "assert_condition": "fs->mds_map.compat.compare(compat) == 0",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/ceph-15.2.7/src/mds/FSMap.cc",
    "assert_func": "void FSMap::sanity() const",
    "assert_line": 846,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/ceph-15.2.7/src/mds/FSMap.cc: In function 'void FSMap::sanity() const' thread 7f3dba2dd700 time 2023-06-13T12:10:26.383068+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/ceph-15.2.7/src/mds/FSMap.cc: 846: FAILED ceph_assert(fs->mds_map.compat.compare(compat) == 0)\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "(()+0x12b20) [0x7f3dc5943b20]",
        "(gsignal()+0x10f) [0x7f3dc45ab7ff]",
        "(abort()+0x127) [0x7f3dc4595c35]",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f3dc7e4cbdd]",
        "(()+0x279da6) [0x7f3dc7e4cda6]",
        "(FSMap::sanity() const+0xcd) [0x7f3dc835204d]",
        "(MDSMonitor::update_from_paxos(bool*)+0x375) [0x55ae3ff7db95]",
        "(PaxosService::refresh(bool*)+0x10e) [0x55ae3fea483e]",
        "(Monitor::refresh_from_paxos(bool*)+0x18c) [0x55ae3fd7560c]",
        "(Paxos::do_refresh()+0x57) [0x55ae3fe92317]",
        "(Paxos::handle_commit(boost::intrusive_ptr<MonOpRequest>)+0x309) [0x55ae3fe98229]",
        "(Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x457) [0x55ae3fe9e827]",
        "(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x1089) [0x55ae3fdaa469]",
        "(Monitor::_ms_dispatch(Message*)+0x69d) [0x55ae3fdaafdd]",
        "(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55ae3fdd9b8c]",
        "(DispatchQueue::entry()+0x126a) [0x7f3dc806b2aa]",
        "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f3dc810d901]",
        "(()+0x814a) [0x7f3dc593914a]",
        "(clone()+0x43) [0x7f3dc4670f23]"
    ],
    "ceph_version": "15.2.7",
    "crash_id": "2023-06-13T12:10:26.386825Z_98af3cea-c0a4-48ba-802a-15abd54a8350",
    "entity_name": "mon.r",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mon",
    "stack_sig": "aa7259e6a9276895a70cc9ff18e5fd9c4d70e3e3e16c1e5bba67a000d7823289",
    "timestamp": "2023-06-13T12:10:26.386825Z",
    "utsname_hostname": "n133-017-070",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.56.bsk.9-amd64",
    "utsname_sysname": "Linux",
    "utsname_version": "#5.4.56.bsk.9 SMP Debian 5.4.56.bsk.9 Wed Aug 25 03:42:38 UTC 20"
}
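For context on what the failed assertion checks: roughly speaking, a CompatSet comparison of this kind returns 0 only when the two feature sets match exactly, so FSMap::sanity() is insisting that each filesystem's MDSMap carries precisely the expected compat set. A minimal illustrative sketch of that contract (the function name and set representation here are hypothetical, not Ceph's actual CompatSet implementation):

```python
def compare_compat(ours: set, theirs: set) -> int:
    """Illustrative compat-set comparison, in the spirit of a
    CompatSet::compare(): 0 = identical, 1 = `ours` is a strict
    superset, -1 = `theirs` has features `ours` lacks."""
    if ours == theirs:
        return 0
    if theirs <= ours:
        return 1
    return -1

# The assertion corresponds to requiring the == 0 case; any drift
# (e.g. a partially upgraded cluster persisting a map with newer
# feature bits) would make a sanity check like this abort.
print(compare_compat({"base", "snaprealm_v2"}, {"base", "snaprealm_v2"}))
```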
Updated by Patrick Donnelly 11 months ago
- Description updated (diff)
- Target version set to v19.0.0
- Source set to Community (user)
- Backport set to reef,quincy,pacific
- Component(FS) MDSMonitor added
- Labels (FS) crash added
Updated by Patrick Donnelly 11 months ago
- Status changed from New to Need More Info
This is an Octopus monitor. Were you doing an upgrade? To what version?
Updated by Patrick Donnelly 11 months ago
- Subject changed from fs->mds_map.compat.compare(compat) == 0 to mon/MDSMonitor: fs->mds_map.compat.compare(compat) == 0
Updated by Patrick Donnelly 11 months ago
yite gu wrote:
> This PR: https://github.com/ceph/ceph/pull/44851
> Can it fix the problem?

That fix is already present in v16.2.13.
Updated by Patrick Donnelly 11 months ago
yite gu wrote:
> Patrick Donnelly wrote:
> > This is an Octopus monitor. Were you doing an upgrade? To what version?
>
> Upgrading from 15.2.7 to 16.2.13.

How did you do the upgrade? Were any MDS upgraded before this monitor?
Updated by yite gu 11 months ago
Patrick Donnelly wrote:
> yite gu wrote:
> > Patrick Donnelly wrote:
> > > This is an Octopus monitor. Were you doing an upgrade? To what version?
> >
> > Upgrading from 15.2.7 to 16.2.13.
>
> How did you do the upgrade? Were any MDS upgraded before this monitor?

services:
  mon: 3 daemons, quorum h,q,r (age 18m)
  mgr: a(active, since 87m)
  mds: 1/1 daemons up, 1 hot standby
  osd: 24 osds: 24 up (since 58m), 24 in (since 3w)

1. Upgrade mon.h: successful, but mon.q and mon.r crashed.
2. Restart mon.q and mon.r: both then upgraded successfully.
3. Upgrade the MDS: success.
4. Upgrade the mgr: success.
5. Upgrade the OSDs: success.
Updated by Patrick Donnelly 11 months ago
yite gu wrote:
> Patrick Donnelly wrote:
> > yite gu wrote:
> > > Patrick Donnelly wrote:
> > > > This is an Octopus monitor. Were you doing an upgrade? To what version?
> > >
> > > Upgrading from 15.2.7 to 16.2.13.
> >
> > How did you do the upgrade? Were any MDS upgraded before this monitor?
>
> [...]
>
> 1. Upgrade mon.h: successful, but mon.q and mon.r crashed.
> 2. Restart mon.q and mon.r: both then upgraded successfully.
> 3. Upgrade the MDS: success.
> 4. Upgrade the mgr: success.
> 5. Upgrade the OSDs: success.

Weird. I'm happy the upgrade continued for you without any further crashes. I'm not sure how this can happen without an MDS getting upgraded before all mons are upgraded.
Updated by yite gu 11 months ago
Patrick Donnelly wrote:
> yite gu wrote:
> > Patrick Donnelly wrote:
> > > yite gu wrote:
> > > > Patrick Donnelly wrote:
> > > > > This is an Octopus monitor. Were you doing an upgrade? To what version?
> > > >
> > > > Upgrading from 15.2.7 to 16.2.13.
> > >
> > > How did you do the upgrade? Were any MDS upgraded before this monitor?
> >
> > [...]
> >
> > 1. Upgrade mon.h: successful, but mon.q and mon.r crashed.
> > 2. Restart mon.q and mon.r: both then upgraded successfully.
> > 3. Upgrade the MDS: success.
> > 4. Upgrade the mgr: success.
> > 5. Upgrade the OSDs: success.
>
> Weird. I'm happy the upgrade continued for you without any further crashes. I'm not sure how this can happen without an MDS getting upgraded before all mons are upgraded.

I have many 15.2.7 clusters that I also want to upgrade to 16.2.13; I will report back here on how it goes.
Updated by Patrick Donnelly 11 months ago
yite gu wrote:
> Patrick Donnelly wrote:
> > yite gu wrote:
> > > Patrick Donnelly wrote:
> > > > yite gu wrote:
> > > > > Patrick Donnelly wrote:
> > > > > > This is an Octopus monitor. Were you doing an upgrade? To what version?
> > > > >
> > > > > Upgrading from 15.2.7 to 16.2.13.
> > > >
> > > > How did you do the upgrade? Were any MDS upgraded before this monitor?
> > >
> > > [...]
> > >
> > > 1. Upgrade mon.h: successful, but mon.q and mon.r crashed.
> > > 2. Restart mon.q and mon.r: both then upgraded successfully.
> > > 3. Upgrade the MDS: success.
> > > 4. Upgrade the mgr: success.
> > > 5. Upgrade the OSDs: success.
> >
> > Weird. I'm happy the upgrade continued for you without any further crashes. I'm not sure how this can happen without an MDS getting upgraded before all mons are upgraded.
>
> I have many 15.2.7 clusters that I also want to upgrade to 16.2.13; I will report back here on how it goes.
You can set `debug_mon = 20` and `debug_ms = 1` on the mons before doing the upgrade. That would be helpful if it reoccurs.
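A minimal sketch of how those settings might look in ceph.conf on each monitor host before starting the upgrade (assuming file-based configuration; on clusters using the centralized config store, the equivalent would be set via `ceph config set mon ...`):

```ini
# /etc/ceph/ceph.conf -- raise monitor verbosity ahead of the upgrade
[mon]
    debug_mon = 20   ; verbose monitor logging, including MDSMonitor/FSMap updates
    debug_ms = 1     ; log messenger traffic (message send/receive)
```

The daemons pick these up on restart, so they should be in place before the upgrade restarts each mon.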
Updated by yite gu 11 months ago
Patrick Donnelly wrote:
> yite gu wrote:
> > Patrick Donnelly wrote:
> > > yite gu wrote:
> > > > Patrick Donnelly wrote:
> > > > > yite gu wrote:
> > > > > > Patrick Donnelly wrote:
> > > > > > > This is an Octopus monitor. Were you doing an upgrade? To what version?
> > > > > >
> > > > > > Upgrading from 15.2.7 to 16.2.13.
> > > > >
> > > > > How did you do the upgrade? Were any MDS upgraded before this monitor?
> > > >
> > > > [...]
> > > >
> > > > 1. Upgrade mon.h: successful, but mon.q and mon.r crashed.
> > > > 2. Restart mon.q and mon.r: both then upgraded successfully.
> > > > 3. Upgrade the MDS: success.
> > > > 4. Upgrade the mgr: success.
> > > > 5. Upgrade the OSDs: success.
> > >
> > > Weird. I'm happy the upgrade continued for you without any further crashes. I'm not sure how this can happen without an MDS getting upgraded before all mons are upgraded.
> >
> > I have many 15.2.7 clusters that I also want to upgrade to 16.2.13; I will report back here on how it goes.
>
> You can set `debug_mon = 20` and `debug_ms = 1` on the mons before doing the upgrade. That would be helpful if it reoccurs.

Okay.
Updated by Patrick Donnelly 7 months ago
- Assignee deleted (Patrick Donnelly)
- Target version deleted (v19.0.0)
- Backport deleted (reef,quincy,pacific)