Ceph CephFS - Bug #19118: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes
<a class="external" href="https://tracker.ceph.com/issues/19118">https://tracker.ceph.com/issues/19118</a>

<p><strong>Updated by Xiaoxi Chen</strong> (xiaoxchen@ebay.com) on 2017-03-01T15:36:26Z</p>
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul>

<p><strong>Updated by John Spray</strong> (jcspray@gmail.com) on 2017-03-01T15:52:59Z</p>
<ul><li><strong>Subject</strong> changed from <i>NDS flipping during failover, when working with large amount of caps/inodes</i> to <i>MDS heartbeat , when working with large amount of caps/inodes</i></li></ul><p>May be related to <a class="external" href="http://tracker.ceph.com/issues/18730">http://tracker.ceph.com/issues/18730</a>, although perhaps not since that one shouldn't be causing the heartbeat mechanism to trip.</p>
<p>The timeout the MDS is hitting here is the mds_beacon_grace setting. This is used on both the mon and the MDS. In this case, you would need to increase it on the MDS side to give the MDS a chance to make it through rejoin.</p>
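<p>For reference, this can go in ceph.conf or be injected into a running daemon; the 600s value below is only an example (it is what was used later in this thread), not a recommendation:</p>
<pre>
# ceph.conf on the MDS host (the same option is also read by the mon):
[mds]
    mds beacon grace = 600

# or, injected at runtime without a restart:
ceph tell mds.0 injectargs '--mds_beacon_grace 600'
</pre>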
<p>Before doing that, please could you attach a debugger to the MDS while it is in the stuck state (i.e. while it is giving "heartbeat map not healthy") and grab a backtrace of all threads. It's possible this is happening during an O(caps) loop somewhere.</p>
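<p>For example, assuming gdb and the relevant debuginfo packages are available on the MDS host:</p>
<pre>
# attach to the live ceph-mds and dump a backtrace of every thread
gdb -p $(pidof ceph-mds) -batch -ex 'thread apply all bt' > mds-threads.txt
</pre>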
<p><strong>Updated by John Spray</strong> (jcspray@gmail.com) on 2017-03-01T15:53:20Z</p>
<ul><li><strong>Subject</strong> changed from <i>MDS heartbeat , when working with large amount of caps/inodes</i> to <i>MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes</i></li></ul>
<p><strong>Updated by Xiaoxi Chen</strong> (xiaoxchen@ebay.com) on 2017-03-01T23:30:05Z</p>
<p>I have that already. I did set the beacon_grace to 600s to work around the bug and bring the cluster back.</p>
<p>It does indeed seem related to <a class="external" href="http://tracker.ceph.com/issues/18730">http://tracker.ceph.com/issues/18730</a>.</p>
<p>In the first case (stuck in the active phase), the MDS is busy in MDCache::export_remaining_imported_caps(); this function takes about 3 minutes and looks like O(caps*clients). [1]</p>
<p>In the second case (stuck in rejoin), it seems to be fetching backtraces [2] and replaying the journal [3].</p>
<p>[1]<br />2017-02-28 20:32:47.286796 7f658894b700 1 mds.0.263 reconnect_done<br />2017-02-28 20:32:48.260760 7f658894b700 1 mds.0.263 handle_mds_map i am now mds.0.263<br />2017-02-28 20:32:48.260882 7f658894b700 1 mds.0.263 handle_mds_map state change up:reconnect --> up:rejoin<br />2017-02-28 20:32:48.260957 7f658894b700 1 mds.0.263 rejoin_start<br />2017-02-28 20:33:13.823818 7f658894b700 1 mds.0.263 rejoin_joint_start<br />2017-02-28 20:33:34.744161 7f6584943700 1 mds.0.263 rejoin_done<br />2017-02-28 20:33:35.970117 7f658894b700 1 mds.0.263 handle_mds_map i am now mds.0.263<br />2017-02-28 20:33:35.970127 7f658894b700 1 mds.0.263 handle_mds_map state change up:rejoin --> up:active<br />2017-02-28 20:33:35.970178 7f658894b700 1 mds.0.263 recovery_done -- successful recovery!<br /><strong>2017-02-28 20:33:35.971970 7f658894b700 1 mds.0.263 active_start<br />2017-02-28 20:36:27.436389 7f658894b700 0 log_channel(cluster) log [WRN] : failed to reconnect caps for missing inodes:</strong><br />2017-02-28 20:36:27.436411 7f658894b700 0 log_channel(cluster) log [WRN] : ino 100000665cf<br />2017-02-28 20:36:27.436413 7f658894b700 0 log_channel(cluster) log [WRN] : ino 100000665d0<br />2017-02-28 20:36:27.436417 7f658894b700 0 log_channel(cluster) log [WRN] : ino 100000665d1</p>
<p>[2]<br />2017-03-01 07:54:04.231242 7f4e2188a700 10 MDSIOContextBase::complete: 32C_IO_MDC_OpenInoBacktraceFetched<br />2017-03-01 07:54:04.231244 7f4e2188a700 10 mds.0.cache _open_ino_backtrace_fetched ino 10000171d4d errno 0<br />2017-03-01 07:54:04.231248 7f4e2188a700 10 mds.0.cache old object in pool 2, retrying pool 11<br />2017-03-01 07:54:04.231389 7f4e2188a700 10 MDSIOContextBase::complete: 32C_IO_MDC_OpenInoBacktraceFetched<br />2017-03-01 07:54:04.231400 7f4e2188a700 10 mds.0.cache _open_ino_backtrace_fetched ino 10000171d50 errno 0<br />2017-03-01 07:54:04.231412 7f4e2188a700 10 mds.0.cache old object in pool 2, retrying pool 11</p>
<p>[3]<br />2017-03-01 07:56:53.179938 7f4e2188a700 12 mds.0.cache.dir(100000fe965) add_primary_dentry [dentry #1/sharefs_prod/8c285b3b59a843b6aab623314288ee36/ssl/ssl1/ca/ca/signed/slc5b01c-3712.stratus.slc.ebay.com.pem [2,head] auth (dversion lock) pv=0 v=259855 inode=0x7f4ebcb10e70 0x7f4eb60379d0]<br />2017-03-01 07:56:53.180206 7f4e2188a700 12 mds.0.cache.dir(100000fe965) _fetched got [dentry #1/sharefs_prod/8c285b3b59a843b6aab623314288ee36/ssl/ssl1/ca/ca/signed/slc5b01c-3712.stratus.slc.ebay.com.pem [2,head] auth (dversion lock) pv=0 v=259855 inode=0x7f4ebcb10e70 0x7f4eb60379d0] [inode 1000010bdfc [2,head] /sharefs_prod/8c285b3b59a843b6aab623314288ee36/ssl/ssl1/ca/ca/signed/slc5b01c-3712.stratus.slc.ebay.com.pem auth v221548 s=1992 n(v0 b1992 1=1+0) (iversion lock) 0x7f4ebcb10e70]<br />2017-03-01 07:56:53.180444 7f4e2188a700 20 mds.0.cache.dir(100000fe965) _fetched pos 54009 marker 'I' dname 'slc5b01c-3701.stratus.slc.ebay.com.pem [2,head]<br />2017-03-01 07:56:53.180447 7f4e2188a700 20 mds.0.cache.dir(100000fe965) lookup (head, 'slc5b01c-3701.stratus.slc.ebay.com.pem')<br />2017-03-01 07:56:53.180466 7f4e2188a700 20 mds.0.cache.dir(100000fe965) miss -> (slc5b02c-7hqm.stratus.slc.ebay.com.pem,head)<br />2017-03-01 07:56:53.180479 7f4e2188a700 12 mds.0.cache.dir(100000fe965) add_primary_dentry [dentry #1/sharefs_prod/8c285b3b59a843b6aab623314288ee36/ssl/ssl1/ca/ca/signed/slc5b01c-3701.stratus.slc.ebay.com.pem [2,head] auth (dversion lock) pv=0 v=259855 inode=0x7f4ebcb11440 0x7f4eb6037b60]</p>
<p><strong>Updated by John Spray</strong> (jcspray@gmail.com) on 2017-03-06T12:00:04Z</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/13807">https://github.com/ceph/ceph/pull/13807</a></p>
<p>I've added a periodic heartbeat reset in the long loop in export_remaining_imported_caps.</p>
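<p>The shape of that change is roughly as follows. This is an illustrative sketch rather than the literal diff from the PR; the loop body and the batch size of 1000 are placeholders:</p>
<pre>
// Sketch: touch the heartbeat every N iterations so a long single pass
// over all imported caps cannot trip the heartbeat map while the loop
// is still making progress.
void MDCache::export_remaining_imported_caps()
{
  uint64_t i = 0;
  for (auto &p : cap_imports) {      // the O(caps*clients) loop
    // ... per-inode / per-client cap handling elided ...
    if (++i % 1000 == 0)
      mds->heartbeat_reset();        // tell the heartbeat map we are alive
  }
}
</pre>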
<p>The 18730 issue should not cause the mds_lock to be held for too long (it's dropped in between IOs) but it's possible that such intense IO is simply causing the tick() to never get the lock, as I don't think we do any fairness logic on locking. I've added a heartbeat_reset to the main dispatch function so that we reset it even if we are so busy dispatching messages that we hold up tick().</p>
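<p>The fairness point is easy to play with in isolation. Below is a minimal standalone toy (not Ceph code; the names are only suggestive) in which a "dispatch" thread re-acquires a lock back-to-back while a once-a-second "tick" thread measures how long it waits for the same lock; std::mutex, like pthread mutexes, makes no fairness guarantee:</p>
<pre>
// g++ -O2 -pthread starve.cc -o starve
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mds_lock;  // stand-in for the MDS big lock

int main() {
  // "Dispatch" thread: holds the lock briefly, releases, and immediately
  // re-acquires it, as a sustained message storm would.
  std::thread dispatcher([] {
    for (;;) {
      std::lock_guard<std::mutex> l(mds_lock);
      std::this_thread::sleep_for(std::chrono::microseconds(100));
    }
  });
  dispatcher.detach();

  // "Timer" thread: wants the lock once per second and reports its wait.
  for (;;) {
    auto t0 = std::chrono::steady_clock::now();
    { std::lock_guard<std::mutex> l(mds_lock); }
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - t0).count();
    std::cout << "tick waited " << ms << " ms for mds_lock" << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}
</pre>
<p>How long the tick thread actually waits is scheduler-dependent, which is exactly why resetting the heartbeat from the dispatch path sidesteps the question.</p>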
<p>The replay case could in principle cause a heartbeat timeout in a similar way, but it seems kind of unlikely. MDLog::_replay_thread does unlock mds_lock in between replaying events, so tick() should be getting an opportunity to run.</p>
<p>If you could try out that patch I would be very interested in the results. (warning: I just wrote it and haven't even checked it compiles)</p>

<p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com) on 2017-03-06T12:54:24Z</p>
<p>John Spray wrote:</p>
<blockquote>
<p><a class="external" href="https://github.com/ceph/ceph/pull/13807">https://github.com/ceph/ceph/pull/13807</a></p>
<p>I've added a periodic heartbeat reset in the long loop in export_remaining_imported_caps.</p>
<p>The 18730 issue should not cause the mds_lock to be held for too long (it's dropped in between IOs) but it's possible that such intense IO is simply causing the tick() to never get the lock, as I don't think we do any fairness logic on locking. I've added a heartbeat_reset to the main dispatch function so that we reset it even if we are so busy dispatching messages that we hold up tick().</p>
</blockquote>
<p>I don't understand how intense IO can cause the timer thread to starve. There are only a few threads (probably one) that execute the IOs' finish contexts. No matter how frequently they acquire the mds_lock, it's unlikely that the timer thread gets starved.</p>
<blockquote>
<p>The replay case could in principle cause a heartbeat timeout in a similar way, but it seems kind of unlikely. MDLog::_replay_thread does unlock mds_lock in between replaying events, so tick() should be getting an opportunity to run.</p>
<p>If you could try out that patch I would be very interested in the results. (warning: I just wrote it and haven't even checked it compiles)</p>
</blockquote>

<p><strong>Updated by John Spray</strong> (jcspray@gmail.com) on 2017-03-15T21:16:27Z</p>
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li><li><strong>Backport</strong> set to <i>jewel, kraken</i></li></ul>

<p><strong>Updated by Nathan Cutler</strong> (ncutler@suse.cz) on 2017-03-21T13:01:15Z</p>
<ul><li><strong>Copied to</strong> <i><a class="issue" href="/issues/19334">Backport #19334</a>: jewel: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes</i> added</li></ul>

<p><strong>Updated by Nathan Cutler</strong> (ncutler@suse.cz) on 2017-03-21T13:01:17Z</p>
<ul><li><strong>Copied to</strong> <i><a class="issue" href="/issues/19335">Backport #19335</a>: kraken: MDS heartbeat timeout during rejoin, when working with large amount of caps/inodes</i> added</li></ul>

<p><strong>Updated by Nathan Cutler</strong> (ncutler@suse.cz) on 2017-09-05T15:38:09Z</p>
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>