Ceph issue tracker: https://tracker.ceph.com/
CephFS - Backport #21357: luminous: mds: segfault during `rm -rf` of large directory

Updated by Patrick Donnelly <pdonnell@redhat.com> on 2017-09-11T17:45:48Z
https://tracker.ceph.com/issues/21357?journal_id=98814
- Description updated

Updated by Patrick Donnelly <pdonnell@redhat.com> on 2017-09-11T17:46:48Z
https://tracker.ceph.com/issues/21357?journal_id=98815
- Assignee set to Zheng Yan

Zheng, please take a look.

Updated by Andras Pataki <apataki@simonsfoundation.org> on 2017-09-12T19:34:57Z
https://tracker.ceph.com/issues/21357?journal_id=98952
- File cephtest-mds.xooby04.log.gz added (/attachments/download/2984/cephtest-mds.xooby04.log.gz)

I'm running into the same problem on luminous 12.2.0: while removing a directory with lots of files, the MDS crashes, and even after repeated restarts it does not recover. I did not turn directory fragmentation on. I'm attaching the MDS log of the crash (cephtest-mds.xooby04.log.gz).

Also, any procedure to recover would be helpful.

Updated by Zheng Yan <ukernel@gmail.com> on 2017-09-13T02:05:30Z
https://tracker.ceph.com/issues/21357?journal_id=98995
- Status changed from New to 12

This is a duplicate of http://tracker.ceph.com/issues/21070; the fix has already been merged into the master branch.

Updated by Zheng Yan <ukernel@gmail.com> on 2017-09-13T02:13:39Z
https://tracker.ceph.com/issues/21357?journal_id=98996
- Status changed from 12 to Fix Under Review

PR for luminous: https://github.com/ceph/ceph/pull/17686
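For reference, one hedged way to check which release tags eventually contained this backport, assuming a local clone of ceph.git, that the PR was merged with a merge commit rather than squashed, and that backport-21357 is just a placeholder branch name:

    # GitHub exposes every pull request under refs/pull/<id>/head;
    # fetch the head of the luminous backport PR into a local branch.
    git fetch https://github.com/ceph/ceph pull/17686/head:backport-21357

    # List the release tags whose history contains that commit.
    git tag --contains backport-21357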

Updated by Nathan Cutler <ncutler@suse.cz> on 2017-09-13T03:08:43Z
https://tracker.ceph.com/issues/21357?journal_id=99004
- Tracker changed from Bug to Backport
- Description updated

Description:

    -15> 2017-09-07 13:27:37.898704 7f79be11a700 4 mds.1.purge_queue push: pushing inode 0x0x1000001d350
    -14> 2017-09-07 13:27:37.898713 7f79be11a700 4 mds.1.purge_queue push: pushing inode 0x0x1000001e354
    -13> 2017-09-07 13:27:37.898730 7f79be11a700 4 mds.1.purge_queue push: pushing inode 0x0x20000003d35
    -12> 2017-09-07 13:27:37.898740 7f79be11a700 4 mds.1.purge_queue push: pushing inode 0x0x20000003cbe
    -11> 2017-09-07 13:27:37.898750 7f79be11a700 4 mds.1.purge_queue push: pushing inode 0x0x20000003d0d
    -10> 2017-09-07 13:27:37.898760 7f79be11a700 4 mds.1.purge_queue push: pushing inode 0x0x20000003cd8
    -9> 2017-09-07 13:27:37.898782 7f79be11a700 4 mds.1.purge_queue push: pushing inode 0x0x10000023fd1
    -8> 2017-09-07 13:27:37.898791 7f79be11a700 4 mds.1.purge_queue push: pushing inode 0x0x10000023fe6
    -7> 2017-09-07 13:27:37.898850 7f79be11a700 1 -- 10.8.128.105:6800/338052715 --> 10.8.128.36:6808/21995 -- osd_op(unknown.0.38:97 2.6 2:6f2c300e:::60f.00000000:head [omap-get-header,omap-get-vals,getxattr parent] snapc 0=[] ondisk+read+known_if_redirected+full_force e42) v8 -- 0x1c934d3040 con 0
    -6> 2017-09-07 13:27:37.952466 7f79c58c6700 5 -- 10.8.128.105:6800/338052715 >> 10.8.128.36:6808/21995 conn(0x1c91f75000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=189 cs=1 l=1). rx osd.8 seq 29 0x1c91f14680 osd_op_reply(83 501.00000001 [read 458397~3735907 [fadvise_dontneed]] v0'0 uv1651 ondisk = 0) v8
    -5> 2017-09-07 13:27:37.952506 7f79c58c6700 1 -- 10.8.128.105:6800/338052715 <== osd.8 10.8.128.36:6808/21995 29 ==== osd_op_reply(83 501.00000001 [read 458397~3735907 [fadvise_dontneed]] v0'0 uv1651 ondisk = 0) v8 ==== 156+0+3735907 (1828211159 0 2860789470) 0x1c91f14680 con 0x1c91f75000
    -4> 2017-09-07 13:27:37.952625 7f79c58c6700 5 -- 10.8.128.105:6800/338052715 >> 10.8.128.36:6808/21995 conn(0x1c91f75000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=189 cs=1 l=1). rx osd.8 seq 30 0x1c91f14680 osd_op_reply(84 1000000c0da.02c00000 [omap-get-header,omap-get-vals] v0'0 uv1757 ondisk = 0) v8
    -3> 2017-09-07 13:27:37.952646 7f79c58c6700 1 -- 10.8.128.105:6800/338052715 <== osd.8 10.8.128.36:6808/21995 30 ==== osd_op_reply(84 1000000c0da.02c00000 [omap-get-header,omap-get-vals] v0'0 uv1757 ondisk = 0) v8 ==== 206+0+24480 (2245149694 0 2706660154) 0x1c91f14680 con 0x1c91f75000
    -2> 2017-09-07 13:27:37.952724 7f79bd919700 1 -- 10.8.128.105:6800/338052715 --> 10.8.128.103:6800/20641 -- osd_op(unknown.0.38:98 1.2 1:55af870c:::1000000d735.00000000:head [delete] snapc 1=[] ondisk+write+known_if_redirected+full_force e42) v8 -- 0x1c934c0700 con 0
    -1> 2017-09-07 13:27:37.953525 7f79c58c6700 2 -- 10.8.128.105:6800/338052715 >> 10.8.128.103:6800/20641 conn(0x1c91fb5000 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
    0> 2017-09-07 13:27:37.955163 7f79be11a700 -1 *** Caught signal (Segmentation fault) **
    in thread 7f79be11a700 thread_name:mds_rank_progr
And the backtrace:

    ceph version 12.2.0-2redhat1xenial (b661348f156f148d764b998b65b90451f096cb27) luminous (rc)
    1: (()+0x59b344) [0x1c86d06344]
    2: (()+0x11390) [0x7f79c8420390]
    3: (Server::handle_client_readdir(boost::intrusive_ptr<MDRequestImpl>&)+0xc05) [0x1c86a65615]
    4: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0x9f1) [0x1c86a89341]
    5: (MDSInternalContextBase::complete(int)+0x18b) [0x1c86c894bb]
    6: (MDSRank::_advance_queues()+0x6a7) [0x1c86a0e957]
    7: (MDSRank::ProgressThread::entry()+0x4a) [0x1c86a0ef7a]
    8: (()+0x76ba) [0x7f79c84166ba]
    9: (clone()+0x6d) [0x7f79c74823dd]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
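As a hedged illustration of that NOTE, the raw offsets can be resolved once a binary exactly matching 12.2.0-2redhat1xenial (plus its debug symbols) is available; the path /usr/bin/ceph-mds and the choice of frame 1's offset 0x59b344 are assumptions:

    # Disassemble with interleaved source, as the NOTE suggests
    # (needs the debug-symbol package matching this exact build).
    objdump -rdS /usr/bin/ceph-mds > ceph-mds.asm

    # Resolve the module-relative offset printed in frame 1, "(()+0x59b344)",
    # to a demangled function name and source location.
    addr2line -Cfe /usr/bin/ceph-mds 0x59b344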
Steps to Reproduce:
1. Created a cluster with 3 MDS daemons.
2. Initially 1 MDS was active and 2 were standby.
3. Moved another MDS to active,
4. so effectively 2 active MDS with 1 standby.
5. Created directories and files on a FUSE mount.
6. One of the directories had 100k files of 0 size.
7. Probed "dirfrag ls" from the admin socket.
8. Initiated rm -rf on the directory.
9. Continuously probed "objecter_requests" from the admin socket of one MDS (the probes from steps 7 and 9 are sketched below).
10. The other MDS crashed meanwhile.
11. Although the standby has taken over, client I/Os are hung.
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1489461
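The admin-socket probes mentioned in steps 7 and 9 would look roughly like this (a sketch; the daemon name mds.a and the path /bigdir are placeholders):

    # Step 7: list a directory's fragments via the MDS admin socket.
    ceph daemon mds.a dirfrag ls /bigdir

    # Step 9: dump the RADOS operations the MDS currently has in flight.
    ceph daemon mds.a objecter_requests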

Updated by Nathan Cutler <ncutler@suse.cz> on 2017-09-13T03:09:00Z
https://tracker.ceph.com/issues/21357?journal_id=99006
- Copied from Bug #21070 (MDS: MDS is laggy or crashed When deleting a large number of files, https://tracker.ceph.com/issues/21070) added

Updated by Patrick Donnelly <pdonnell@redhat.com> on 2017-09-14T03:25:12Z
https://tracker.ceph.com/issues/21357?journal_id=99062
- Subject changed from "mds: segfault during `rm -rf` of large directory" to "luminous: mds: segfault during `rm -rf` of large directory"

Updated by Patrick Donnelly <pdonnell@redhat.com> on 2017-09-14T04:32:48Z
https://tracker.ceph.com/issues/21357?journal_id=99072
- Status changed from Fix Under Review to In Progress

Updated by Patrick Donnelly <pdonnell@redhat.com> on 2017-09-15T20:45:02Z
https://tracker.ceph.com/issues/21357?journal_id=99209
- Status changed from In Progress to Resolved
- Target version set to v12.2.1