Bug #41799
client: FAILED assert(cap == in->auth_cap)
Status: Closed
Description
The log below explains the issue clearly: auth_cap was set to NULL by the preceding remove_cap, and when add_update_cap is called from handle_cap_import, seq == cap.seq holds.
2019-09-12 07:22:12.142637 7f200bee1700 1 -- 10.190.184.174:0/3460862904 <== mds.1 10.166.123.131:6800/3395465192 10 ==== client_caps(export ino 0x200000511ba 2782052306 seq 0 caps=- dirty=- wanted=- follows 0 size 0/0 mtime 0.000000) v11 ==== 252+0+0 (557563979 0 0) 0x564b9c90f600 con 0x564b9cc13000
2019-09-12 07:22:12.142721 7f200bee1700 10 client.138394001 mds.1 seq now 1
2019-09-12 07:22:12.142734 7f200bee1700 5 client.138394001 handle_cap_export ino 0x200000511ba mseq 0 EXPORT from mds.1
2019-09-12 07:22:12.142742 7f200bee1700 10 client.138394001 _open_mds_session mds.0
2019-09-12 07:22:12.143001 7f200bee1700 1 -- 10.190.184.174:0/3460862904 --> 10.212.160.177:6800/169156538 -- client_session(request_open) v2 -- 0x564b9c95aac0 con 0
2019-09-12 07:22:12.143087 7f200bee1700 10 client.138394001 add_update_cap issued - -> pAsLsXsFs from mds.0 on 0x200000511ba.head(faked_ino=0 ref=3 ll_ref=1 cap_refs={} open={} mode=40755 size=0/0 nlink=1 mtime=2019-09-12 07:22:04.840341 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXsFs,3=pAsLsXsFs) COMPLETE quota(max_bytes = 6597069766656 max_files = 0) 0x564b9cc2a000)
2019-09-12 07:22:12.143131 7f200bee1700 10 client.138394001 remove_cap mds.1 on 0x200000511ba.head(faked_ino=0 ref=3 ll_ref=1 cap_refs={} open={} mode=40755 size=0/0 nlink=1 mtime=2019-09-12 07:22:04.840341 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXsFs,3=pAsLsXsFs) COMPLETE quota(max_bytes = 6597069766656 max_files = 0) 0x564b9cc2a000)
2019-09-12 07:22:12.143159 7f200bee1700 1 -- 10.190.184.174:0/3460862904 <== mds.3 10.212.160.176:6800/3474105568 9 ==== client_caps(export ino 0x200000511ba 3772052911 seq 0 caps=- dirty=- wanted=- follows 0 size 0/0 mtime 0.000000) v11 ==== 252+0+0 (3141920983 0 0) 0x564b9c90f180 con 0x564b9cc10000
2019-09-12 07:22:12.143170 7f200bee1700 10 client.138394001 mds.3 seq now 3
2019-09-12 07:22:12.143172 7f200bee1700 5 client.138394001 handle_cap_export ino 0x200000511ba mseq 0 EXPORT from mds.3
2019-09-12 07:22:12.143174 7f200bee1700 10 client.138394001 remove_cap mds.3 on 0x200000511ba.head(faked_ino=0 ref=3 ll_ref=1 cap_refs={} open={} mode=40755 size=0/0 nlink=1 mtime=2019-09-12 07:22:04.840341 caps=pAsLsXsFs(0=pAsLsXsFs,3=pAsLsXsFs) COMPLETE quota(max_bytes = 6597069766656 max_files = 0) 0x564b9cc2a000)
2019-09-12 07:22:12.173783 7f200bee1700 10 client.138394001 ms_handle_connect on 10.212.160.177:6800/169156538
2019-09-12 07:22:12.181461 7f200bee1700 1 -- 10.190.184.174:0/3460862904 <== mds.0 10.212.160.177:6800/169156538 1 ==== client_session(open) v1 ==== 28+0+0 (2719320884 0 0) 0x564b9c958fc0 con 0x564b9cd92000
2019-09-12 07:22:12.181485 7f200bee1700 10 client.138394001 handle_client_session client_session(open) v1 from mds.0
2019-09-12 07:22:12.181489 7f200bee1700 10 client.138394001 renew_caps mds.0
2019-09-12 07:22:12.181491 7f200bee1700 1 -- 10.190.184.174:0/3460862904 --> 10.212.160.177:6800/169156538 -- client_session(request_renewcaps seq 1) v2 -- 0x564b9c95b840 con 0
2019-09-12 07:22:12.181544 7f200bee1700 10 client.138394001 connect_mds_targets for mds.0
2019-09-12 07:22:12.181553 7f200bee1700 1 -- 10.190.184.174:0/3460862904 <== mds.0 10.212.160.177:6800/169156538 2 ==== client_caps(import ino 0x200000511ba 5277779549 seq 1 caps=pAsLsXsFs dirty=- wanted=AsLsXsFs follows 0 size 0/0 ts 1/18446744073709551615 mtime 2019-09-12 07:22:04.840341 xattrs(v=1 l=4)) v11 ==== 321+4+0 (2210101217 0 0) 0x564b9c90ed00 con 0x564b9cd92000
2019-09-12 07:22:12.181571 7f200bee1700 10 client.138394001 mds.0 seq now 1
2019-09-12 07:22:12.181575 7f200bee1700 5 client.138394001 handle_cap_import ino 0x200000511ba mseq 0 IMPORT from mds.0
2019-09-12 07:22:12.181577 7f200bee1700 10 client.138394001 update_snap_trace len 48
2019-09-12 07:22:12.181580 7f200bee1700 20 client.138394001 get_snap_realm 0x1 0x564b9c934f70 14 -> 15
2019-09-12 07:22:12.181582 7f200bee1700 10 client.138394001 update_snap_trace snaprealm(0x1 nref=15 c=0 seq=1 parent=0x0 my_snaps=[] cached_snapc=1=[]) seq 1 <= 1 and same parent, SKIPPING
2019-09-12 07:22:12.182866 7f200bee1700 -1 /build/ceph-12.2.12/src/client/Client.cc: In function 'void Client::add_update_cap(Inode*, MetaSession*, uint64_t, unsigned int, unsigned int, unsigned int, unsigned int, inodeno_t, int, const UserPerm&)' thread 7f200bee1700 time 2019-09-12 07:22:12.181592
/build/ceph-12.2.12/src/client/Client.cc: 3963: FAILED assert(cap == in->auth_cap)
(gdb) bt
#0  0x00007fbe58509269 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000056372c54477e in reraise_fatal (signum=6) at /build/ceph-12.2.12/src/global/signal_handler.cc:74
#2  handle_fatal_signal (signum=6) at /build/ceph-12.2.12/src/global/signal_handler.cc:138
#3  <signal handler called>
#4  0x00007fbe57295428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007fbe5729702a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x000056372c15099e in ceph::__ceph_assert_fail (assertion=assertion@entry=0x56372c5d0e08 "cap == in->auth_cap", file=file@entry=0x56372c5cc6f8 "/build/ceph-12.2.12/src/client/Client.cc", line=line@entry=3963, func=func@entry=0x56372c5d4860 <Client::add_update_cap(Inode*, MetaSession*, unsigned long, unsigned int, unsigned int, unsigned int, unsigned int, inodeno_t, int, UserPerm const&)::__PRETTY_FUNCTION__> "void Client::add_update_cap(Inode*, MetaSession*, uint64_t, unsigned int, unsigned int, unsigned int, unsigned int, inodeno_t, int, const UserPerm&)") at /build/ceph-12.2.12/src/common/assert.cc:66
#7  0x000056372c0b570d in Client::add_update_cap (this=this@entry=0x563735d12000, in=in@entry=0x563735d54000, mds_session=mds_session@entry=0x563735b10d00, cap_id=5277779602, issued=341, wanted=340, seq=1, mseq=0, realm=..., flags=1, cap_perms=...) at /build/ceph-12.2.12/src/client/Client.cc:3963
#8  0x000056372c0b73c6 in Client::handle_cap_import (this=this@entry=0x563735d12000, session=session@entry=0x563735b10d00, in=in@entry=0x563735d54000, m=m@entry=0x563735a40d00) at /build/ceph-12.2.12/src/client/Client.cc:4866
#9  0x000056372c0df721 in Client::handle_caps (this=this@entry=0x563735d12000, m=m@entry=0x563735a40d00) at /build/ceph-12.2.12/src/client/Client.cc:4821
#10 0x000056372c0e33cb in Client::ms_dispatch (this=0x563735d12000, m=0x563735a40d00) at /build/ceph-12.2.12/src/client/Client.cc:2571
#11 0x000056372c4d81ea in Messenger::ms_deliver_dispatch (m=0x563735a40d00, this=0x563735c6a000) at /build/ceph-12.2.12/src/msg/Messenger.h:668
#12 DispatchQueue::entry (this=0x563735c6a180) at /build/ceph-12.2.12/src/msg/DispatchQueue.cc:197
#13 0x000056372c1f37dd in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /build/ceph-12.2.12/src/msg/DispatchQueue.h:101
#14 0x00007fbe584ff6ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007fbe5736741d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) f 7
#7  0x000056372c0b570d in Client::add_update_cap (this=this@entry=0x563735d12000, in=in@entry=0x563735d54000, mds_session=mds_session@entry=0x563735b10d00, cap_id=5277779602, issued=341, wanted=340, seq=1, mseq=0, realm=..., flags=1, cap_perms=...) at /build/ceph-12.2.12/src/client/Client.cc:3963
3963    assert(cap == in->auth_cap);
(gdb) p in->auth_cap
$1 = (Cap *) 0x0
(gdb) p seq
$2 = 1
(gdb) p cap.seq
$3 = 1
Updated by Xiaoxi Chen over 4 years ago
The issue affects all releases, including master.
Updated by Patrick Donnelly over 4 years ago
- Subject changed from FAILED assert(cap == in->auth_cap) to client: FAILED assert(cap == in->auth_cap)
- Target version set to v15.0.0
- Start date deleted (09/12/2019)
- Source set to Community (dev)
- Component(FS) Client added
- Component(FS) deleted (ceph-fuse)
Updated by Patrick Donnelly over 4 years ago
- Status changed from New to Fix Under Review
- Assignee changed from Zheng Yan to Xiaoxi Chen
- Pull request ID set to 30402
- Labels (FS) offline added
Updated by Zheng Yan over 4 years ago
I'd like to know why the cap import message's seq is 1 and mseq is 0. Please use gdb to print the cap import message's peer information (f 8, then p m->peer).
Updated by Xiaoxi Chen over 4 years ago
(gdb) p m->peer
$1 = {cap_id = {v = 2782052343}, seq = {v = 4}, mseq = {v = 0}, mds = {v = 1}, flags = 2 '\002'}
Updated by Xiaoxi Chen over 4 years ago
@Zheng,
Any update, or anything we can help with?
It seems we could change
if (ceph_seq_cmp(seq, cap.seq) <= 0) {
to
if (ceph_seq_cmp(seq, cap.seq) < 0) {
since ceph_seq_cmp(seq, cap.seq) == 0 indicates exactly the cap described by this import message.
Updated by Xiaoxi Chen over 4 years ago
One more victim on 13.2.5
 -4> 2019-09-28 09:01:50.156 7fe6799e3700 1 -- 10.206.140.28:0/1309002991 <== mds.0 10.218.2.224:6800/2026541741 2 ==== client_caps(import ino 0x1000598f6bb 1913515435 seq 1 caps=pAsLsXsFs dirty=- wanted=- follows 0 mseq 1 size 0/0 ts 1/18446744073709551615 mtime 2019-06-05 17:21:06.348736 xattrs(v=1 l=4)) v11 ==== 321+4+0 (3926379498 0 0) 0x564a110b9680 con 0x564a111d7e00
 -3> 2019-09-28 09:01:50.156 7fe6799e3700 5 client.197381312 handle_cap_import ino 0x1000598f6bb mseq 1 IMPORT from mds.0
 -2> 2019-09-28 09:01:50.156 7fe67c9e9700 5 -- 10.206.140.28:0/1309002991 >> 10.156.69.192:6789/0 conn(0x564a11373000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=16068008 cs=1 l=1). rx mon.3 seq 143 0x564a113da500 osd_map(4050054..4050054 src has 4050054..4072219) v3
 -1> 2019-09-28 09:01:50.156 7fe67c9e9700 5 -- 10.206.140.28:0/1309002991 >> 10.156.69.192:6789/0 conn(0x564a11373000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=16068008 cs=1 l=1). rx mon.3 seq 144 0x564a115a5b80 osd_map(4050055..4050059 src has 4050054..4072219) v3
  0> 2019-09-28 09:01:50.156 7fe6799e3700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: In function 'void Client::add_update_cap(Inode*, MetaSession*, uint64_t, unsigned int, unsigned int, unsigned int, inodeno_t, int, const UserPerm&)' thread 7fe6799e3700 time 2019-09-28 09:01:50.158860
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.5/rpm/el7/BUILD/ceph-13.2.5/src/client/Client.cc: 3936: FAILED assert(&cap == in->auth_cap)
 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7fe685611fbf]
 2: (()+0x26d187) [0x7fe685612187]
 3: (Client::add_update_cap(Inode*, MetaSession*, unsigned long, unsigned int, unsigned int, unsigned int, inodeno_t, int, UserPerm const&)+0x945) [0x564a101ccfd5]
 4: (Client::handle_cap_import(MetaSession*, Inode*, MClientCaps*)+0x26a) [0x564a101ce89a]
 5: (Client::handle_caps(MClientCaps*)+0x5d1) [0x564a102037f1]
 6: (Client::ms_dispatch(Message*)+0x41b) [0x564a1020c0cb]
 7: (DispatchQueue::entry()+0xb7a) [0x7fe6856ce39a]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fe68576c2cd]
 9: (()+0x7dd5) [0x7fe683648dd5]
 10: (clone()+0x6d) [0x7fe682521ead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Zheng Yan over 4 years ago
Sorry for the delay.
I understand what happened: two MDSs exported non-auth caps.
PR #30402 is a good workaround.
Updated by Patrick Donnelly over 4 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to nautilus,mimic
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42631: nautilus: client: FAILED assert(cap == in->auth_cap) added
Updated by Nathan Cutler over 4 years ago
- Copied to Backport #42632: mimic: client: FAILED assert(cap == in->auth_cap) added
Updated by Nathan Cutler almost 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".