Ceph : Issues - https://tracker.ceph.com/ - 2022-02-28T09:45:34Z
CephFS - Bug #54421 (Fix Under Review): mds: assert fail in Server::_dir_is_nonempty() because xl...
https://tracker.ceph.com/issues/54421
2022-02-28T09:45:34Z
Ivan Guan (yunfei.guan@xtaotech.com)
<p>ENV: Jewel ceph-10.2.2</p>
<p>Description:<br />Server::_dir_is_nonempty() always expects the inode to have an xlocker, but this assumption does not always hold,<br />even though the inode's filelock state (in->filelock) is LOCK_XLOCK_DONE.</p>
<p>2022-02-21 15:11:40.639247 7f1d3ef95700 -1 mds/Server.cc: In function 'bool Server::_dir_is_nonempty(MDRequestRef&, CInode*)' thread 7f1d3ef95700 time 2022-02-21 15:11:40.043876<br />mds/Server.cc: 6245: FAILED assert(in->filelock.can_read(mdr->get_client()))</p>
<pre><code>ceph version 10.2.2-Summit3.0-beta2-127-g051c2f2 (051c2f2bd027a4c29bdf3f21116dbfb1c718b3db)<br /> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f1d453e9215]<br /> 2: (Server::_dir_is_nonempty(std::shared_ptr&lt;MDRequestImpl&gt;&, CInode*)+0xcb) [0x7f1d4501906b]<br /> 3: (Server::handle_client_unlink(std::shared_ptr&lt;MDRequestImpl&gt;&)+0xe84) [0x7f1d45044b34]<br /> 4: (Server::dispatch_client_request(std::shared_ptr&lt;MDRequestImpl&gt;&)+0xe9b) [0x7f1d45064d2b]<br /> 5: (MDCache::dispatch_request(std::shared_ptr&lt;MDRequestImpl&gt;&)+0x4c) [0x7f1d450e81dc]<br /> 6: (MDSInternalContextBase::complete(int)+0x1eb) [0x7f1d4523947b]<br /> 7: (void finish_contexts&lt;MDSInternalContextBase&gt;(CephContext*, std::list&lt;MDSInternalContextBase*, std::allocator&lt;MDSInternalContextBase*&gt; >&, int)+0xac) [0x7f1d44ffe4cc]<br /> 8: (Locker::eval(CInode*, int, bool)+0x128) [0x7f1d45158f38]<br /> 9: (Locker::handle_client_caps(MClientCaps*)+0xd95) [0x7f1d4516e6e5]<br /> 10: (MDSRank::handle_deferrable_message(Message*)+0xc34) [0x7f1d44fe0624]<br /> 11: (MDSRank::_dispatch(Message*, bool)+0x205) [0x7f1d44fea9f5]<br /> 12: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f1d44feb8d5]<br /> 13: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f1d44fd1313]<br /> 14: (DispatchQueue::entry()+0x7ba) [0x7f1d454eee1a]<br /> 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f1d453ce02d]<br /> 16: (()+0x7dc5) [0x7f1d441abdc5]<br /> 17: (clone()+0x6d) [0x7f1d42c7828d]<br /> NOTE: a copy of the executable, or `objdump -rdS &lt;executable&gt;` is needed to interpret this.</code></pre>
<p>e.g. the following case:<br /><img src="https://tracker.ceph.com/attachments/download/5904/rmdir.png" alt="" /></p>
<p>t1: client sends a setattr op for dir1 and the filelock is LOCK_EXCL<br />t2: xlock_start runs and the xlock is held<br />t3: mds moves the filelock state from LOCK_XLOCK to LOCK_XLOCK_DONE and does early_reply<br />t4: client sends rmdir for dir1<br />t5: mds calls rdlock_start on the filelock and gets the rdlock, but cannot acquire the xlock on the linklock, so the request waits here<br />t6: the journal write of the setattr comes back and triggers the safe_reply of the setattr. <strong>Notice: the filelock is still XLOCK_DONE but its xlocker is -1.</strong><br />t7: mds retries the rmdir of dir1 and crashes, because the filelock no longer has an xlocker.</p>
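<p>To see why the assert trips, here is a minimal, self-contained sketch (not the actual Ceph lock code; the state model, field names and the client id are simplified and hypothetical). It only encodes the assumption the assert relies on: in LOCK_XLOCK_DONE the lock is readable by the xlocking client, so once the xlocker has been cleared at t6 while the state stays LOCK_XLOCK_DONE, can_read() returns false at t7.</p>
<pre><code>// Minimal sketch -- NOT the real Ceph SimpleLock/Locker code.  The states,
// fields and the client id are simplified/hypothetical; it only models the
// invariant that Server::_dir_is_nonempty() asserts on.
#include &lt;cassert&gt;
#include &lt;cstdio&gt;

typedef long client_t;
static const client_t NO_CLIENT = -1;

enum LockState { LOCK_SYNC, LOCK_EXCL, LOCK_XLOCK, LOCK_XLOCK_DONE };

struct FileLockSketch {
  LockState state;
  client_t xlocker;      // which client holds the xlock, -1 if none

  bool can_read(client_t c) const {
    if (state == LOCK_SYNC)
      return true;                                  // everyone may read
    if (state == LOCK_XLOCK_DONE)
      return xlocker != NO_CLIENT && c == xlocker;  // only the xlocker may
    return false;
  }
};

int main() {
  const client_t client = 4242;          // hypothetical client id
  FileLockSketch filelock = { LOCK_SYNC, NO_CLIENT };

  // t2/t3: setattr takes the xlock and the MDS moves to LOCK_XLOCK_DONE,
  // so the early_reply path can still read through the lock.
  filelock.state = LOCK_XLOCK_DONE;
  filelock.xlocker = client;
  assert(filelock.can_read(client));     // holds here

  // t6: the setattr journal write completes and the xlock is dropped,
  // but the state is left at LOCK_XLOCK_DONE.
  filelock.xlocker = NO_CLIENT;

  // t7: the retried rmdir re-evaluates the same condition; in the real code
  // this is where assert(in->filelock.can_read(mdr->get_client())) fires.
  printf("can_read after xlock dropped: %d\n", (int)filelock.can_read(client));
  return 0;
}</code></pre>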
CephFS - Feature #21888 (Fix Under Review): Adding [--repair] option for cephfs-journal-tool make...
https://tracker.ceph.com/issues/21888
2017-10-22T06:15:49Z
Ivan Guan (yunfei.guan@xtaotech.com)
<p>As described in the documentation, if a journal is damaged or an MDS is for any reason incapable of replaying it, we can attempt to recover what file metadata we can like so:<br />"cephfs-journal-tool event recover_dentries summary". While researching the related code recently, I found a risk that journal entries may be lost if the journal is damaged. The worst<br />case is that if the journal length is corrupt, we may lose a large part of the journal.</p>
<p>Sample Graph<br /><img src="https://tracker.ceph.com/attachments/download/3061/journal_scan.1.png" alt="" /><br /><img src="https://tracker.ceph.com/attachments/download/3062/journal_scan.2.png" alt="" /></p>
<p>Of course, if the journal data itself is corrupt we can throw it away, but if only the length or the trailer is corrupt we should repair it as much as possible, to avoid losing events.</p>
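<p>To make the idea concrete, here is a small, self-contained sketch of that approach. It is not the Ceph journal format or the actual cephfs-journal-tool code: the sentinel value, the entry layout and the function names below are made up for illustration. The point is only that a scanner which distrusts a bogus length field and resynchronises on the next entry marker recovers everything after the corruption instead of discarding the rest of the journal.</p>
<pre><code>// Hypothetical sketch -- NOT the Ceph journal format or JournalScanner.
// The sentinel, the entry layout (marker + length + payload) and all names
// are invented to illustrate the "repair instead of give up" idea: if a
// length field is corrupt, resync on the next marker rather than stopping.
#include &lt;cstdint&gt;
#include &lt;cstdio&gt;
#include &lt;cstring&gt;
#include &lt;vector&gt;

static const uint64_t SENTINEL = 0x3141592653589793ULL;  // made-up marker

struct Entry { size_t offset; uint32_t length; };

// Scan raw journal bytes and return every entry whose header looks sane.
static std::vector&lt;Entry&gt; scan(const uint8_t* buf, size_t size) {
  std::vector&lt;Entry&gt; out;
  size_t pos = 0;
  while (pos + sizeof(uint64_t) + sizeof(uint32_t) &lt;= size) {
    uint64_t magic;
    uint32_t len;
    memcpy(&magic, buf + pos, sizeof(magic));
    memcpy(&len, buf + pos + sizeof(magic), sizeof(len));

    bool header_ok = magic == SENTINEL &&
                     pos + sizeof(magic) + sizeof(len) + len &lt;= size;
    if (header_ok) {
      out.push_back({pos, len});
      pos += sizeof(magic) + sizeof(len) + len;   // jump over the payload
    } else {
      // The marker or the length is corrupt: advance one byte and look for
      // the next marker instead of throwing the rest of the journal away.
      pos += 1;
    }
  }
  return out;
}

int main() {
  // Tiny demo: good entry, 16 bytes of corruption, then another good entry.
  std::vector&lt;uint8_t&gt; j;
  uint32_t len = 4;
  auto put = [&](const void* p, size_t n) {
    const uint8_t* b = static_cast&lt;const uint8_t*&gt;(p);
    j.insert(j.end(), b, b + n);
  };
  put(&SENTINEL, sizeof(SENTINEL)); put(&len, sizeof(len)); put("AAAA", 4);
  j.insert(j.end(), 16, 0xff);                  // damaged region / bad length
  put(&SENTINEL, sizeof(SENTINEL)); put(&len, sizeof(len)); put("BBBB", 4);

  std::vector&lt;Entry&gt; found = scan(j.data(), j.size());
  printf("recovered %zu entries\n", found.size());   // prints 2
  return 0;
}</code></pre>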