<p>CephFS - Bug #22008: Processes stuck waiting for write with ceph-fuse (<a href="https://tracker.ceph.com/issues/22008">https://tracker.ceph.com/issues/22008</a>)</p>
<p>Zheng Yan (ukernel@gmail.com), 2017-11-03T08:33:56Z</p>
<p>No idea how that happened. Please use the admin socket to dump ceph-fuse's cache (ceph daemon client.xxx dump_cache).</p>
<p>Andras Pataki (apataki@simonsfoundation.org), 2017-11-03T19:38:40Z</p>
<ul><li><strong>File</strong> <a href="/attachments/download/3075/ceph-fuse-cache.gz">ceph-fuse-cache.gz</a> added</li></ul><p>Attached is the ceph-fuse cache dump. This is a different instance of the problem (with all the same symptoms); in this case the hung process is accessing the file /mnt/ceph/users/landerson/lyalphaVaried4/PART_000/1/GroupID/000029.</p>
<p>The MDS cache entries for this file look like this:</p>
<pre>[root@cephmon01 tmp]# grep users/landerson/lyalphaVaried4/PART_000/1/GroupID/000029 mds-cache-dump.txt
[inode 10007b104c7 [2,head] /users/landerson/lyalphaVaried4/PART_000/1/GroupID/000029 auth v920 s=798916 n(v0 b798916 1=1+0) (ifile xsyn) (iversion lock) cr={5545419=0-4194304@1} caps={5538376=pAsLsXsFc/-@8,5545419=pAsLsXs/pAsxXsxFsxcrwb@6},l=5538376(5545419) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x5596d5984d70]
 [dentry #1/users/landerson/lyalphaVaried4/PART_000/1/GroupID/000029 [2,head] auth (dversion lock) v=920 inode=0x5596d5984d70 | request=0 lock=0 inodepin=1 dirty=0 authpin=0 clientlease=0 0x5596c04e6120]</pre>
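<p>To read the caps={...} field of an inode line like the one above, here is a small Python sketch. It assumes each entry has the shape client=pending/wanted@seq, with an extra issued set between pending and wanted only when the two differ. That layout is our reading of the MDS output, not a documented format. It also assumes that "-" means an empty set. The function names are hypothetical helpers, not part of any Ceph tooling.</p>

```python
import re

# Ceph's compact cap notation: 'p' (pin) stands alone; an uppercase group
# letter (A=auth, L=link, X=xattr, F=file) is followed by lowercase bits
# such as s=shared, x=exclusive, c=cache, r=read, w=write, b=buffer.


def expand_caps(cap_string):
    """Expand e.g. 'pAsLsXsFc' into {'p', 'As', 'Ls', 'Xs', 'Fc'}; '-' means none."""
    caps = set()
    if cap_string == "-":
        return caps
    group = None
    for ch in cap_string:
        if ch == "p":
            caps.add("p")
        elif ch.isupper():
            group = ch  # start a new cap group
        elif group is not None:
            caps.add(group + ch)  # one bit within the current group
    return caps


def parse_inode_caps(line):
    """Parse the caps={...} field of an MDS inode line into per-client cap sets.

    Assumed entry shape (hypothetical reading of the dump):
    client=pending[/issued]/wanted@seq, where the optional middle set
    appears only when issued differs from pending.
    """
    match = re.search(r"caps=\{([^}]*)\}", line)
    if match is None:
        return {}
    result = {}
    for entry in match.group(1).split(","):
        client, rest = entry.split("=", 1)
        sets, seq = rest.rsplit("@", 1)
        parts = sets.split("/")
        pending = expand_caps(parts[0])
        issued = expand_caps(parts[1]) if len(parts) == 3 else pending
        result[int(client)] = {
            "pending": pending,
            "issued": issued,
            "wanted": expand_caps(parts[-1]),
            "seq": int(seq),
        }
    return result


if __name__ == "__main__":
    # Fragment of the inode line from the MDS cache dump in this report.
    line = ("(ifile xsyn) (iversion lock) cr={5545419=0-4194304@1} "
            "caps={5538376=pAsLsXsFc/-@8,5545419=pAsLsXs/pAsxXsxFsxcrwb@6},l=5538376(5545419)")
    for client, caps in sorted(parse_inode_caps(line).items()):
        missing = sorted(caps["wanted"] - caps["pending"])
        print(client, "holds", sorted(caps["pending"]), "still wants", missing)
```

<p>Under these assumptions, the dump shows client 5545419 holding only pAsLsXs while still wanting the file caps (Fs, Fx, Fc, Fr, Fw, Fb) plus Ax and Xx. Client 5538376 holds Fc and wants nothing more. That fits the "stuck waiting for write" symptom.</p>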
<p>Also, I've updated the client to the latest release, 12.2.1, and the problem reproduces there as well.</p>
<p>In the code that triggers this issue, multiple processes on different nodes write to the same file concurrently, each to a different part of it.</p>
<p>Zheng Yan (ukernel@gmail.com), 2017-11-06T09:45:26Z</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>The second instance is actually a different problem from the first one. The first one seems to have been caused by the MDS evicting the client session, followed by the client reconnecting. That issue should already be fixed in v12.2.1 (the client re-requests caps after reconnecting).</p>
<p>I'm working on fixing the second one.</p>
<p>Zheng Yan (ukernel@gmail.com), 2017-11-07T08:48:08Z</p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/18787">https://github.com/ceph/ceph/pull/18787</a></p>
<p>Andras Pataki (apataki@simonsfoundation.org), 2017-11-10T21:39:24Z</p>
<p>I've applied this patch to the latest luminous branch, rebuilt the MDS, and tested it in a test environment with the code that causes the hangs. I'm happy to report that the code now runs well: I could not reproduce any hangs! We'll keep testing. Thanks very much for the quick turnaround, and I hope the patch makes its way into a release soon.</p>
<p>Patrick Donnelly (pdonnell@redhat.com), 2017-11-20T14:25:15Z</p>
<ul><li><strong>Assignee</strong> set to <i>Zheng Yan</i></li></ul>
<p>Patrick Donnelly (pdonnell@redhat.com), 2017-11-22T21:44:43Z</p>
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li><li><strong>Backport</strong> set to <i>luminous,jewel</i></li></ul>
<p>Nathan Cutler (ncutler@suse.cz), 2017-11-24T21:57:01Z</p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/22240">Backport #22240</a>: luminous: Processes stuck waiting for write with ceph-fuse</i> added</li></ul>
<p>Nathan Cutler (ncutler@suse.cz), 2017-11-24T21:57:03Z</p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/22241">Backport #22241</a>: jewel: Processes stuck waiting for write with ceph-fuse</i> added</li></ul>
<p>Nathan Cutler (ncutler@suse.cz), 2018-03-28T20:44:43Z</p>
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>
<p>Ivan Guan (yunfei.guan@xtaotech.com), 2018-07-25T09:31:26Z</p>
<p>Zheng Yan wrote:</p>
<blockquote>
<p>The second one is actually different from the first one. Seems like the first one was caused by 'client session gets evicted by mds, then client reconnect". The issue should have been fixed in v12.2.1. (client re-requests caps after reconnect)</p>
<p>I'm working on fixing the second one.</p>
</blockquote>
<p>Hi Zheng,</p>
<p>I may have hit the first bug; I'm running Ceph jewel 10.2.2. My client hangs, and the log shows "waiting for caps need Fw want Fb" or "waiting for caps need Fr want Fc". I know 10.2.11 fixed the problem, but I'd like to know which PR fixed it and how it solved the bug.</p>
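<p>The cap letters in those log messages can be decoded mechanically. F is the file cap group, and the lowercase letters are generic bits: w = write, b = buffer, r = read, c = cache. As we read it, "need Fw want Fb" means the client requires file-write caps to proceed and would also like buffered-write caps. The Python sketch below turns such a message into readable names. The helper names are hypothetical. The regex is an assumption about the log layout: it tolerates extra text (for example an inode description) between "waiting for caps" and "need", since the quoted messages may be abbreviated.</p>

```python
import re

# Cap groups and generic bits in Ceph's compact notation (e.g. "Fw" = file write).
CAP_GROUPS = {"A": "auth", "L": "link", "X": "xattr", "F": "file"}
GENERIC_BITS = {"s": "shared", "x": "exclusive", "c": "cache",
                "r": "read", "w": "write", "b": "buffer"}


def describe_caps(cap_string):
    """Turn a compact cap string like 'Fw' or 'pAsFsc' into readable names."""
    names = []
    group = None
    for ch in cap_string:
        if ch == "p":
            names.append("pin")
        elif ch.isupper():
            group = CAP_GROUPS.get(ch, f"unknown({ch})")
        elif group is not None:
            names.append(f"{group} {GENERIC_BITS.get(ch, f'unknown({ch})')}")
    return names


def parse_wait_message(line):
    """Extract and decode the need/want caps from a 'waiting for caps' log line.

    Assumption: the line contains 'waiting for caps', possibly followed by
    other text, then 'need <caps> want <caps>'. Returns None if not matched.
    """
    m = re.search(r"waiting for caps.*?\bneed (\S+) want (\S+)", line)
    if m is None:
        return None
    need, want = m.groups()
    return {"need": describe_caps(need), "want": describe_caps(want)}


if __name__ == "__main__":
    for msg in ("waiting for caps need Fw want Fb",
                "waiting for caps need Fr want Fc"):
        print(msg, "->", parse_wait_message(msg))
```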