https://tracker.ceph.com/
2020-08-11T02:10:25Z
Ceph
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=172556
2020-08-11T02:10:25Z
Xiubo Li
xiubli@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=172558
2020-08-11T02:15:46Z
Xiubo Li
xiubli@redhat.com
<ul></ul><p>Checked it: it seems the write ops hadn't finished and kept going, while _umount() had already stopped the tick(), so the client got blacklisted after 300s.</p>
<p>We can see the write ops are slow; for some files they are still completing after around 300s:</p>
<pre>
2020-08-10T15:22:28.081+0000 7f54882511c0 2 client.4593 unmounting
...
3358 30000000236.00000005 [write 225280~4096] v24'15524 uv15524 ondisk = 0) v8 ==== 164+0+0 (crc 0 0 0) 0x7f5464014780 con 0x7f5444008dd0
2020-08-10T15:26:33.535+0000 7f5478fe8700 1 -- 172.21.15.175:0/1466508553 <== osd.6 v2:172.21.15.116:6826/35233 30701 ==== osd_op_reply(68696 30000000219.00000012 [write 1245184~4096] v24'16219 uv16219 ondisk = 0) v8 ==== 164+0+0 (crc 0 0 0) 0x7f5464014780 con 0x7f5444008dd0
2020-08-10T15:26:33.535+0000 7f5478fe8700 1 -- 172.21.15.175:0/1466508553 <== osd.6 v2:172.21.15.116:6826/35233 30702 ==== osd_op_reply(54634 3000000020c.0000000f [write 3837952~4096] v24'14784 uv14784 ondisk = 0) v8 ==== 164+0+0 (crc 0 0 0) 0x7f5464014780 con 0x7f5444008dd0
2020-08-10T15:26:33.593+0000 7f5478fe8700 1 -- 172.21.15.175:0/1466508553 <== osd.6 v2:172.21.15.116:6826/35233 30703 ==== osd_op_reply(57125 30000000214.00000009 [write 487424~4096] v24'15290 uv15290 ondisk = 0) v8 ==== 164+0+0 (crc 0 0 0) 0x7f5464014780 con 0x7f5444008dd0
</pre>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=172559
2020-08-11T02:35:00Z
Xiubo Li
xiubli@redhat.com
<ul></ul><p>Additional info:</p>
<p>From the logs we can see that the inode ref was increased by:</p>
<pre><code class="c syntaxhl"><span class="CodeRay">_flush(in, new C_Client_FlushComplete(this, in));
</span></code></pre>
<p>But it was never decreased before the client was blacklisted:</p>
<pre>
2020-08-10T15:22:28.081+0000 7f54882511c0 15 inode.get on 0x7f545c062670 0x10000000352.head now 4
2020-08-10T15:22:28.081+0000 7f54882511c0 20 client.4593 _release 0x10000000352.head(faked_ino=0 ref=4 ll_ref=0 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0,3=0} mode=100700 size=104857600/213909504 nlink=0 btime=0.000000 mtime=2020-08-10T15:16:36.423961+0000 ctime=2020-08-10T15:22:08.870067+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000352 ts 0/0 objects 25 dirty_or_tx 1347584] 0x7f545c062670)
2020-08-10T15:22:28.081+0000 7f54882511c0 15 inode.get on 0x7f545c062670 0x10000000352.head now 5
2020-08-10T15:22:28.081+0000 7f54882511c0 10 client.4593 _flush 0x10000000352.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0,3=0} mode=100700 size=104857600/213909504 nlink=0 btime=0.000000 mtime=2020-08-10T15:16:36.423961+0000 ctime=2020-08-10T15:22:08.870067+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000352 ts 0/0 objects 25 dirty_or_tx 1347584] 0x7f545c062670)
...
</pre>
<p>The following decrease was for <code>anchor.emplace_back(in);</code>:</p>
<pre>
096=0,8192=1} open={2=0,3=0} mode=100700 size=104857600/213909504 nlink=0 btime=0.000000 mtime=2020-08-10T15:16:36.423961+0000 ctime=2020-08-10T15:22:08.870067+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000352 ts 0/0 objects 25 dirty_or_tx 1347584] 0x7f545c062670)
2020-08-10T15:22:28.087+0000 7f54882511c0 15 inode.put on 0x7f545c062670 0x10000000352.head now 4
</pre>
<pre>
...
2020-08-10T15:26:08.112+0000 7f54882511c0 1 client.4593 dump_cache
2020-08-10T15:26:08.112+0000 7f54882511c0 1 client.4593 dump_inode: DISCONNECTED inode 0x10000000352 #0x10000000352 ref 4 0x10000000352.head(faked_ino=0 ref=4 ll_ref=0 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0,3=0} mode=100700 size=104857600/213909504 nlink=0 btime=0.000000 mtime=2020-08-10T15:16:36.423961+0000 ctime=2020-08-10T15:22:08.870067+0000 caps=-(0=pAsLsXsFsxcrwb) objectset[0x10000000352 ts 0/0 objects 25 dirty_or_tx 24576] 0x7f545c062670)
</pre>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=172563
2020-08-11T06:03:25Z
Xiubo Li
xiubli@redhat.com
<ul></ul><p>The above was from /a/pdonnell-2020-08-08_02\:21\:01-multimds-wip-pdonnell-testing-20200808.001303-distro-basic-smithi/5319314 instead.</p>
<p>This tracker you are referencing is a little different.</p>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=172632
2020-08-12T08:56:16Z
Xiubo Li
xiubli@redhat.com
<ul><li><strong>Pull request ID</strong> set to <i>36580</i></li></ul>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=172633
2020-08-12T08:56:30Z
Xiubo Li
xiubli@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li></ul>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=172634
2020-08-12T09:01:40Z
Xiubo Li
xiubli@redhat.com
<ul></ul><p>Consider a case like this:</p>
<pre>
void func()
{
  InodeRef in;
  std::scoped_lock cl{client_lock};
}
</pre>
<p>Because C++ destroys locals in reverse order of declaration, the InodeRef will call put_inode() after client_lock has already been released, which is buggy: inode->_put() may then race in multiple thread contexts.</p>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=173329
2020-08-20T09:09:14Z
Ramana Raja
rraja@redhat.com
<ul></ul><p>Shyam spotted this issue in a recent mgr/volumes testing,<br /><a class="external" href="https://github.com/ceph/ceph/pull/35756#issuecomment-676670500">https://github.com/ceph/ceph/pull/35756#issuecomment-676670500</a></p>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=173445
2020-08-21T23:10:18Z
Patrick Donnelly
pdonnell@redhat.com
<ul></ul><p>Another: /ceph/teuthology-archive/pdonnell-2020-08-21_07:42:41-fs-wip-pdonnell-testing-20200821.043335-distro-basic-smithi/5364893/remote/smithi082/log/ceph-client.libcephfs_readonly_test.403460584.111789.log.gz</p>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=173759
2020-08-26T12:02:18Z
Venky Shankar
vshankar@redhat.com
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-10 priority-5 priority-high3 closed" href="/issues/47140">Bug #47140</a>: mgr/volumes: unresponsive Client::abort_conn() when cleaning stale libcephfs handle</i> added</li></ul>
CephFS - Bug #46882: client: mount abort hangs: [volumes INFO mgr_util] aborting connection from cephfs 'cephfs'
https://tracker.ceph.com/issues/46882?journal_id=174444
2020-09-04T17:43:25Z
Patrick Donnelly
pdonnell@redhat.com
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li><li><strong>Backport</strong> deleted (<del><i>octopus,nautilus</i></del>)</li></ul><p>I don't think this issue exists in Octopus or Nautilus? I think this is fallout from Xiubo's work on breaking the client_lock. So I'm removing the backports for now.</p>