Bug #42600
closed
assert(addr_mons.count(m.public_addr) == 0);
Description
We hit this (in some cases) when a luminous client (12.2.12) connects to a nautilus cluster (14.2.0/1), probably because we had a mon with a V2-only addr in the monmap. After removing the V2-only mon, the issue was mitigated.
The monmaptool in the luminous release cannot even decode the problematic monmap properly.
[V14.2.1]
root@slccephmon01-ump:~# monmaptool --print monmap
monmaptool: monmap file monmap
epoch 19
fsid b84ee31e-b074-43bf-9129-37cc38c29f70
last_changed 2019-10-31 21:35:11.758275
created 2019-03-25 02:32:12.437798
min_mon_release 14 (nautilus)
0: [v2:10.153.59.57:3300/0,v1:10.153.59.57:6789/0] mon.slccephmon05-ump
1: [v2:10.156.10.234:3300/0,v1:10.156.10.234:6789/0] mon.slccephmon01-ump
2: [v2:10.158.66.214:3300/0,v1:10.158.66.214:6789/0] mon.slccephmon04-ump
3: [v2:10.202.81.143:3300/0,v1:10.202.81.143:6789/0] mon.slccephmon03-ump
4: [v2:10.218.98.206:3300/0,v1:10.218.98.206:6789/0] mon.slccephmon02-ump
5: v2:10.73.208.15:3300/0 mon.slc-1
6: v2:10.73.210.16:3300/0 mon.slc-2
[ceph -s on 12.2.12 hits the assert with the backtrace below]
(gdb) bt
#0 0x00007f098b909c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f098b90d028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f09814a24b0 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib/ceph/libceph-common.so.0
#3 0x00007f09814f5a3a in MonMap::calc_ranks() () from /usr/lib/ceph/libceph-common.so.0
#4 0x00007f09814f8926 in MonMap::decode(ceph::buffer::list::iterator&) () from /usr/lib/ceph/libceph-common.so.0
#5 0x00007f09814e8dec in MonClient::handle_monmap(MMonMap*) () from /usr/lib/ceph/libceph-common.so.0
#6 0x00007f09814ed71b in MonClient::ms_dispatch(Message*) () from /usr/lib/ceph/libceph-common.so.0
#7 0x00007f098150dfcb in DispatchQueue::entry() () from /usr/lib/ceph/libceph-common.so.0
#8 0x00007f09815f28dd in DispatchQueue::DispatchThread::entry() () from /usr/lib/ceph/libceph-common.so.0
#9 0x00007f098bca4184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f098b9d103d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Updated by Xiaoxi Chen over 4 years ago
The core dump is too big (231MB) to upload; let me know if you need it.
Updated by Sage Weil over 4 years ago
- Status changed from New to 12
- Priority changed from Normal to Urgent
Updated by Sage Weil over 4 years ago
- Status changed from 12 to Fix Under Review
- Backport set to nautilus
- Pull request ID set to 31472
Updated by Sage Weil over 4 years ago
The problem here is that there is more than one v2-only mon. Each of those gets encoded for legacy clients as an empty entity_addr_t(), and since there is more than one of them, the legacy addresses collide and the assert in the monmap's calc_ranks() fails.
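To illustrate the explanation above, here is a minimal C++ sketch of the collision. It is not Ceph's code: MonEntry, legacy_addrs_unique, and sample_monmap are hypothetical names, and the empty string stands in for the blank legacy address that a v2-only mon is described as getting. The check returns false at the point where the real assert would abort.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a monitor entry; not Ceph's real type.
struct MonEntry {
  std::string name;
  std::string legacy_addr;  // empty when the monitor has no v1 address (assumption)
};

// Simplified model of the uniqueness check behind
// assert(addr_mons.count(m.public_addr) == 0).
// Returns false where the real code would abort.
bool legacy_addrs_unique(const std::vector<MonEntry>& mons) {
  std::map<std::string, std::string> addr_mons;  // legacy addr -> mon name
  for (const auto& m : mons) {
    if (addr_mons.count(m.legacy_addr) != 0) {
      return false;  // two monitors share one legacy address
    }
    addr_mons[m.legacy_addr] = m.name;
  }
  return true;
}

// Builds a small monmap shaped like the one in this report, with the
// requested number of v2-only monitors appended.
std::vector<MonEntry> sample_monmap(int v2_only_count) {
  std::vector<MonEntry> mons = {
    {"slccephmon05-ump", "10.153.59.57:6789"},
    {"slccephmon01-ump", "10.156.10.234:6789"},
  };
  for (int i = 1; i <= v2_only_count; ++i) {
    mons.push_back({"slc-" + std::to_string(i), ""});
  }
  return mons;
}
```

Under these assumptions, legacy_addrs_unique(sample_monmap(1)) holds while legacy_addrs_unique(sample_monmap(2)) does not, which matches the observation that removing a V2-only mon mitigated the crash.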
Updated by Sage Weil over 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42731: nautilus: assert(addr_mons.count(m.public_addr) == 0); added
Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".