Bug #42600

closed

assert(addr_mons.count(m.public_addr) == 0);

Added by Xiaoxi Chen over 4 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We hit this assert (in some cases) when a luminous client (12.2.12) connects to a nautilus cluster (14.2.0/1), probably because the monmap contained a mon with a v2-only address. After removing the v2-only mon, the issue was mitigated.

The monmaptool in the luminous release cannot even decode the problematic monmap properly.

[monmaptool output on v14.2.1]
root@slccephmon01-ump:~# monmaptool --print monmap
monmaptool: monmap file monmap
epoch 19
fsid b84ee31e-b074-43bf-9129-37cc38c29f70
last_changed 2019-10-31 21:35:11.758275
created 2019-03-25 02:32:12.437798
min_mon_release 14 (nautilus)
0: [v2:10.153.59.57:3300/0,v1:10.153.59.57:6789/0] mon.slccephmon05-ump
1: [v2:10.156.10.234:3300/0,v1:10.156.10.234:6789/0] mon.slccephmon01-ump
2: [v2:10.158.66.214:3300/0,v1:10.158.66.214:6789/0] mon.slccephmon04-ump
3: [v2:10.202.81.143:3300/0,v1:10.202.81.143:6789/0] mon.slccephmon03-ump
4: [v2:10.218.98.206:3300/0,v1:10.218.98.206:6789/0] mon.slccephmon02-ump
5: v2:10.73.208.15:3300/0 mon.slc-1
6: v2:10.73.210.16:3300/0 mon.slc-2

[ceph -s on 12.2.12 hits the assert with the backtrace below]

(gdb) bt
#0 0x00007f098b909c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f098b90d028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f09814a24b0 in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) () from /usr/lib/ceph/libceph-common.so.0
#3 0x00007f09814f5a3a in MonMap::calc_ranks() () from /usr/lib/ceph/libceph-common.so.0
#4 0x00007f09814f8926 in MonMap::decode(ceph::buffer::list::iterator&) () from /usr/lib/ceph/libceph-common.so.0
#5 0x00007f09814e8dec in MonClient::handle_monmap(MMonMap*) () from /usr/lib/ceph/libceph-common.so.0
#6 0x00007f09814ed71b in MonClient::ms_dispatch(Message*) () from /usr/lib/ceph/libceph-common.so.0
#7 0x00007f098150dfcb in DispatchQueue::entry() () from /usr/lib/ceph/libceph-common.so.0
#8 0x00007f09815f28dd in DispatchQueue::DispatchThread::entry() () from /usr/lib/ceph/libceph-common.so.0
#9 0x00007f098bca4184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f098b9d103d in clone () from /lib/x86_64-linux-gnu/libc.so.6


Files

monmap (944 Bytes): problematic monmap. Xiaoxi Chen, 11/02/2019 04:27 PM

Related issues: 1 (0 open, 1 closed)

Copied to Ceph - Backport #42731: nautilus: assert(addr_mons.count(m.public_addr) == 0); (Resolved, Nathan Cutler)
Actions #1

Updated by Xiaoxi Chen over 4 years ago

Actions #2

Updated by Xiaoxi Chen over 4 years ago

The core dump is too big (231 MB) to upload; let me know if you need it.

Actions #3

Updated by Sage Weil over 4 years ago

  • Status changed from New to 12
  • Priority changed from Normal to Urgent
Actions #4

Updated by Sage Weil over 4 years ago

  • Status changed from 12 to Fix Under Review
  • Backport set to nautilus
  • Pull request ID set to 31472
Actions #5

Updated by Sage Weil over 4 years ago

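The problem here is that there is more than one v2-only mon. Each of them gets encoded for legacy clients as a default entity_addr_t(), so they all end up with the same legacy address, and the monmap calc_ranks() fails its assert that every public_addr is unique.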
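
The failure mode described above can be illustrated with a small Python model. This is a simplified sketch, not Ceph's actual code: `BLANK_ADDR`, `legacy_addr`, and this `calc_ranks` are hypothetical stand-ins for the default entity_addr_t(), the legacy-address selection, and the uniqueness check named in the bug title.

```python
# Hypothetical stand-in for a default-constructed entity_addr_t().
BLANK_ADDR = "-"


def legacy_addr(addrs):
    """Pick the v1 address from (protocol, address) pairs, else a blank one."""
    for protocol, addr in addrs:
        if protocol == "v1":
            return addr
    return BLANK_ADDR


def calc_ranks(mons):
    """Map each legacy address to a mon name, requiring addresses to be unique.

    Models assert(addr_mons.count(m.public_addr) == 0) from the bug title.
    """
    addr_mons = {}
    for name, addrs in sorted(mons.items()):
        public_addr = legacy_addr(addrs)
        if public_addr in addr_mons:
            raise AssertionError(
                f"mon.{name} shares legacy addr {public_addr!r} "
                f"with mon.{addr_mons[public_addr]}"
            )
        addr_mons[public_addr] = name
    return addr_mons
```

In this model a single v2-only mon passes, because only one entry maps to the blank address. A second v2-only mon collides with the first, which matches the observation that the problem needs more than one of them.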

Actions #6

Updated by Sage Weil over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #42731: nautilus: assert(addr_mons.count(m.public_addr) == 0); added
Actions #8

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
