Bug #56698
client: FAILED ceph_assert(_size == 0)
Description
2022-07-22T20:36:19.033 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr:/build/ceph-17.0.0-13732-g89768db3/src/include/xlist.h: In function 'xlist<T>::~xlist() [with T = Inode*]' thread 7f91367fc700 time 2022-07-22T20:36:19.037781+0000
2022-07-22T20:36:19.033 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr:/build/ceph-17.0.0-13732-g89768db3/src/include/xlist.h: 81: FAILED ceph_assert(_size == 0)
2022-07-22T20:36:19.036 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: ceph version 17.0.0-13732-g89768db3 (89768db311950607682ea2bb29f56edc324f86ac) quincy (dev)
2022-07-22T20:36:19.037 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7f9152d6fa4c]
2022-07-22T20:36:19.037 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 2: /usr/lib/ceph/libceph-common.so.2(+0x2b9c5e) [0x7f9152d6fc5e]
2022-07-22T20:36:19.037 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 3: (std::_Sp_counted_ptr<MetaSession*, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x1cd) [0x560c9acb1fad]
2022-07-22T20:36:19.037 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 4: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x48) [0x560c9acaa978]
2022-07-22T20:36:19.037 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 5: (Client::handle_client_session(boost::intrusive_ptr<MClientSession const> const&)+0xf8) [0x560c9ac92bf8]
2022-07-22T20:36:19.038 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 6: (Client::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x513) [0x560c9ac998b3]
2022-07-22T20:36:19.038 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 7: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x460) [0x7f9153010ec0]
2022-07-22T20:36:19.038 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 8: (DispatchQueue::entry()+0x58f) [0x7f915300e75f]
2022-07-22T20:36:19.038 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 9: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f91530ddac1]
2022-07-22T20:36:19.039 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 10: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f9152a9b609]
2022-07-22T20:36:19.039 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi066.stderr: 11: clone()
From: /ceph/teuthology-archive/pdonnell-2022-07-22_19:42:58-fs-wip-pdonnell-testing-20220721.235756-distro-default-smithi/6945819/teuthology.log
Updated by Venky Shankar over 1 year ago
- Status changed from New to Triaged
- Assignee set to Rishabh Dave
Updated by Xiubo Li 10 months ago
-7> 2023-06-19T21:20:50.862+0000 7f66b37fe700 10 client.4746 remove_cap mds.3 on 0x10000002ec1.head(faked_ino=0 nref=5 ll_ref=401 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2023-06-19T21:15:40.143541+0000 mtime=2023-06-19T21:20:48.215040+0000 ctime=2023-06-19T21:20:48.215040+0000 change_attr=151 caps=pAsLsXsFsx(0=pAsLsXsFsx,3=pAsLsXs) parents=0x10000002c44.head["common"] 0x7f66ac40b700)
-6> 2023-06-19T21:20:50.862+0000 7f66b37fe700 20 client.4746 put_inode on 0x10000002ec1.head(faked_ino=0 nref=5 ll_ref=401 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2023-06-19T21:15:40.143541+0000 mtime=2023-06-19T21:20:48.215040+0000 ctime=2023-06-19T21:20:48.215040+0000 change_attr=151 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x10000002c44.head["common"] 0x7f66ac40b700) n = 1
-5> 2023-06-19T21:20:50.862+0000 7f66b37fe700 10 client.4746 remove_cap mds.3 on 0x10000002c44.head(faked_ino=0 nref=143 ll_ref=4888 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2023-06-19T21:15:37.179416+0000 mtime=2023-06-19T21:20:49.829995+0000 ctime=2023-06-19T21:20:49.829995+0000 change_attr=160 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs,2=pAsLsXsFs,3=pAsLsXs) parents=0x1000000157b.head["src"] 0x7f66ac2f0900)
-4> 2023-06-19T21:20:50.862+0000 7f66b37fe700 20 client.4746 put_inode on 0x10000002c44.head(faked_ino=0 nref=143 ll_ref=4888 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2023-06-19T21:15:37.179416+0000 mtime=2023-06-19T21:20:49.829995+0000 ctime=2023-06-19T21:20:49.829995+0000 change_attr=160 caps=pAsLsXsFs(0=pAsLsXsFs,1=pAsLsXs,2=pAsLsXsFs) parents=0x1000000157b.head["src"] 0x7f66ac2f0900) n = 1
-3> 2023-06-19T21:20:50.862+0000 7f66b37fe700 10 client.4746 kick_requests_closed for mds.3
-2> 2023-06-19T21:20:50.862+0000 7f66cc79c700 1 -- 192.168.0.1:0/1579759531 reap_dead start
-1> 2023-06-19T21:20:50.863+0000 7f66b37fe700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.0.0-4483-g2a42270d/rpm/el8/BUILD/ceph-18.0.0-4483-g2a42270d/src/include/xlist.h: In function 'xlist<T>::~xlist() [with T = Inode*]' thread 7f66b37fe700 time 2023-06-19T21:20:50.862434+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.0.0-4483-g2a42270d/rpm/el8/BUILD/ceph-18.0.0-4483-g2a42270d/src/include/xlist.h: 81: FAILED ceph_assert(_size == 0)
ceph version 18.0.0-4483-g2a42270d (2a42270dc66b4270a8c395e9316834a4a74c094e) reef (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x135) [0x7f66d8ea9dbf]
2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9f85) [0x7f66d8ea9f85]
3: (MetaSession::~MetaSession()+0x1eb) [0x562df37ad37b]
4: (std::_Sp_counted_ptr<MetaSession*, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x16) [0x562df37ad3f6]
5: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x3a) [0x562df378feba]
6: (Client::handle_client_session(boost::intrusive_ptr<MClientSession const> const&)+0xf8) [0x562df3776ba8]
7: (Client::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x555) [0x562df3779a25]
8: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7f66d9122808]
9: (DispatchQueue::entry()+0x50f) [0x7f66d911f9af]
10: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f66d91e5ca1]
11: /lib64/libpthread.so.0(+0x81cf) [0x7f66d7a081cf]
12: clone()
Updated by Rishabh Dave 10 months ago
- Assignee changed from Rishabh Dave to Xiubo Li
Xiubo spent half a day working on a failure caused by this issue. I have not spent any time on this ticket recently and Xiubo wants to continue working on it, so I am reassigning it to him.
Updated by Xiubo Li 10 months ago
I have gone through all the xlist members in MetaSession:
    xlist<Cap*> caps;
    // dirty_list keeps all the dirty inodes before flushing in current session.
    xlist<Inode*> dirty_list;
    xlist<Inode*> flushing_caps;
    xlist<MetaRequest*> requests;
    xlist<MetaRequest*> unsafe_requests;
All of them appear to be cleaned up properly in Client::_closed_mds_session(), so I am not sure what I have missed. I have raised a subtask, https://tracker.ceph.com/issues/61913, to make the crash more graceful. With it we will know exactly which list was not cleaned up.
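To illustrate the idea of reporting which list is still populated, here is a minimal sketch. It is not the change made in the subtask: SessionSketch and non_empty_lists() are hypothetical names, std::list stands in for Ceph's xlist, and int stands in for the Cap*, Inode* and MetaRequest* element types.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for MetaSession. std::list replaces Ceph's xlist
// and int replaces the real pointer element types (assumption for brevity).
struct SessionSketch {
  std::list<int> caps;
  std::list<int> dirty_list;
  std::list<int> flushing_caps;
  std::list<int> requests;
  std::list<int> unsafe_requests;

  // Return the names of all lists that still hold entries, so a caller
  // can log them before the session is destroyed.
  std::vector<std::string> non_empty_lists() const {
    std::vector<std::string> names;
    if (!caps.empty()) names.push_back("caps");
    if (!dirty_list.empty()) names.push_back("dirty_list");
    if (!flushing_caps.empty()) names.push_back("flushing_caps");
    if (!requests.empty()) names.push_back("requests");
    if (!unsafe_requests.empty()) names.push_back("unsafe_requests");
    return names;
  }
};
```

Calling non_empty_lists() just before the session is released would name the offending member instead of only tripping the generic _size == 0 assertion in the list destructor.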
Updated by Venky Shankar 9 months ago
Xiubo, do we have the core for this crash? If you have the debug env, then figuring out which xlist member in MetaSession was not empty should be straightforward.
Updated by Xiubo Li 9 months ago
Venky Shankar wrote:
Xiubo, do we have the core for this crash? If you have the debug env, then figuring out which xlist member in MetaSession was not empty should be straightforward.
I think I have found the root cause: when the auth cap changes, the client fails to move the inode to the new session.
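The failure mode described above can be modelled roughly as follows. This is a simplified sketch, not the Ceph client code: SessionModel, InodeModel and both change_auth_* functions are hypothetical, std::set stands in for the intrusive xlist, and the ticket does not say which of the per-session inode lists was affected.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical model: each session tracks the inode numbers queued on it.
struct SessionModel {
  std::set<std::uint64_t> inodes;
};

struct InodeModel {
  std::uint64_t ino;
  SessionModel* auth_session;  // session currently holding the auth cap
};

// Buggy variant: only the auth pointer changes, so the old session keeps
// a stale entry and would fail an "empty on destruction" check.
void change_auth_buggy(InodeModel& in, SessionModel& new_session) {
  in.auth_session = &new_session;
}

// Fixed variant: the entry moves to the new session along with the auth cap.
void change_auth_fixed(InodeModel& in, SessionModel& new_session) {
  SessionModel* old_session = in.auth_session;
  if (old_session != nullptr && old_session != &new_session) {
    // Only re-queue the inode if it was actually queued on the old session.
    if (old_session->inodes.erase(in.ino) > 0) {
      new_session.inodes.insert(in.ino);
    }
  }
  in.auth_session = &new_session;
}
```

In the buggy variant the old session can later be closed while still listing the inode, which matches the shape of the assertion seen in the backtraces; the actual fix should be taken from the linked pull request, not from this model.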
Updated by Venky Shankar 9 months ago
- Related to Bug #56003: client: src/include/xlist.h: 81: FAILED ceph_assert(_size == 0) added
Updated by Venky Shankar 8 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from quincy,pacific to reef,quincy,pacific
Updated by Backport Bot 8 months ago
- Copied to Backport #62519: quincy: client: FAILED ceph_assert(_size == 0) added
Updated by Backport Bot 8 months ago
- Copied to Backport #62520: pacific: client: FAILED ceph_assert(_size == 0) added
Updated by Backport Bot 8 months ago
- Copied to Backport #62521: reef: client: FAILED ceph_assert(_size == 0) added
Updated by Konstantin Shalygin 4 months ago
- Status changed from Pending Backport to Resolved