<p><strong>Linux kernel client - Bug #20927: libceph does not give up reconnecting on broken network</strong><br />https://tracker.ceph.com/issues/20927</p>
<p><strong>Update by Greg Farnum (2017-08-07):</strong></p>
<ul><li><strong>Project</strong> changed from <i>RADOS</i> to <i>Linux kernel client</i></li><li><strong>Subject</strong> changed from <i>libceph socket closed (con state CONNECTING)</i> to <i>libceph does not give up reconnecting on broken network</i></li><li><strong>Category</strong> deleted (<del><i>Peering</i></del>)</li></ul><p>I assume you're referring to the kernel cephfs client?</p>
<p>There's not a lot else the client is allowed to do under the POSIX filesystem rules, but maybe somebody will come up with a way to poke at it. (And I'm not sure what happens if you force unmount it — does that not work?)</p>
<p><strong>Update by hyun ha (2017-08-08):</strong></p>
<ul></ul><p>Greg Farnum wrote:</p>
<blockquote>
<p>I assume you're referring to the kernel cephfs client?</p>
<p>There's not a lot else the client is allowed to do under the POSIX filesystem rules, but maybe somebody will come up with a way to poke at it. (And I'm not sure what happens if you force unmount it — does that not work?)</p>
</blockquote>
<p>Hi, thank you for the comment.</p>
<p>This case is about an rbd block device.<br />More details below.</p>
<p><strong>1. Check client and ceph cluster</strong></p>
<p>client></p>
<pre>
[root@test-cephclient001]/root::test # uname -r
3.10.0-514.26.2.el7.x86_64

[root@test-cephclient001]/root::test # rpm -qa | grep ceph
python-cephfs-10.2.3-0.el7.x86_64
libcephfs1-10.2.3-0.el7.x86_64
ceph-common-10.2.3-0.el7.x86_64

[root@test-cephclient001]/root: # lsmod | egrep "rbd|ceph"
rbd            83938  0
libceph       282630  1 rbd
dns_resolver   13140  1 libceph
libcrc32c      12644  2 xfs,libceph
</pre>
<p>ceph cluster></p>
<pre>
[root@ctestceph01.pasta ~]# rpm -qa | grep ceph
python-cephfs-10.2.3-0.el7.x86_64
ceph-osd-10.2.3-0.el7.x86_64
ceph-selinux-10.2.3-0.el7.x86_64
ceph-mds-10.2.3-0.el7.x86_64
libcephfs1-10.2.3-0.el7.x86_64
ceph-base-10.2.3-0.el7.x86_64
ceph-10.2.3-0.el7.x86_64
ceph-common-10.2.3-0.el7.x86_64
ceph-mon-10.2.3-0.el7.x86_64
</pre>
<p><strong>2. Reproduce steps</strong></p>
<p>client></p>
<p>1. Create an rbd image:</p>
<pre># rbd create test-cephclient001-nvol01 -p volumes --image-format 2 --size 300G</pre>
<p>2. Map the rbd image:</p>
<pre># rbd map volumes/test-cephclient001-nvol01</pre>
<p>3. Make a filesystem:</p>
<pre># mkfs.xfs /dev/rbd/volumes/test-cephclient001-nvol01</pre>
<p>4. Mount the rbd block device:</p>
<pre># mount /dev/rbd/volumes/test-cephclient001-nvol01 /mnt</pre>
<p>5. Start a write test:</p>
<pre># while true; do date >> /mnt/date.txt; sleep 1; done</pre>
<p>6. Check netstat:</p>
<pre>
# netstat -antup | egrep "6789|68"
tcp 0 0 10.113.248.133:46976 10.113.129.153:6808 ESTABLISHED -
tcp 0 0 10.113.248.133:55958 10.113.129.153:6789 ESTABLISHED -
tcp 0 9 10.113.248.133:40492 10.113.129.151:6800 ESTABLISHED -
tcp 0 0 10.113.248.133:35088 10.113.129.151:6808 ESTABLISHED -
</pre>
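The connection check above can be scripted. A minimal sketch (the port set 6789/68xx matches the netstat output here, but your OSDs may listen on other ports; `count_ceph_conns` is a hypothetical helper, not a ceph tool):

```shell
# Count ESTABLISHED connections to the Ceph monitor (6789) and OSD (68xx)
# ports seen in the netstat output above. Written as a pure filter so it
# can also be fed canned output for testing.
count_ceph_conns() {
  grep -cE ':(6789|68[0-9][0-9]) .*ESTABLISHED'
}
# Live usage:
#   netstat -ant | count_ceph_conns
```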
<p>7. Set iptables rules to drop traffic to the ceph ports, simulating network damage:</p>
<pre>
# iptables -A OUTPUT -p tcp --dport 6789 -j DROP
# iptables -A OUTPUT -p tcp --dport 6800 -j DROP
# iptables -A OUTPUT -p tcp --dport 6808 -j DROP
</pre>
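For repeated experiments it helps to generate the DROP rules, and their later removal, from one place. A sketch assuming the same three ports as above (`ceph_block_rules` is a hypothetical helper):

```shell
# Emit iptables commands that drop (or restore, with -D) outbound traffic
# to the Ceph ports used in this reproduction. The commands are echoed
# rather than executed so they can be reviewed (or tested) first.
ceph_block_rules() {
  action="$1"   # -A to add DROP rules, -D to delete them again
  for port in 6789 6800 6808; do
    echo "iptables $action OUTPUT -p tcp --dport $port -j DROP"
  done
}
# Live usage (as root):
#   ceph_block_rules -A | sh    # simulate the outage
#   ceph_block_rules -D | sh    # restore connectivity
```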
<p>8. Check netstat again after a few seconds:</p>
<pre>
# netstat -antup | egrep "6789|68"
tcp 0 2233 10.113.248.133:46976 10.113.129.153:6808 ESTABLISHED -
tcp 0 1 10.113.248.133:41646 10.113.129.152:6789 SYN_SENT -
tcp 0 75450 10.113.248.133:40492 10.113.129.151:6800 ESTABLISHED -
</pre>
<p>9. Try to umount:</p>
<pre>
# umount -f /mnt
umount: /mnt: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
</pre>
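As the umount error hints, lsof(8)/fuser(1) show what keeps /mnt busy (though here the holder is an unkillable D-state writer, so this only identifies it). A sketch with a testable PID parser (`pids_from_fuser` is a hypothetical helper):

```shell
# Extract bare PIDs from fuser-style output such as "/mnt: 54397c 54047m".
# fuser may append access-type letters (c, e, f, m, r) to each PID;
# strip them and keep only numeric tokens.
pids_from_fuser() {
  tr -s ' ' '\n' | tr -d 'cefmr' | grep -E '^[0-9]+$'
}
# Live usage (as root):
#   fuser -m /mnt 2>/dev/null | pids_from_fuser
```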
<p>10. Try to rbd unmap:</p>
<pre>
# rbd unmap /dev/rbd0
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy
</pre>
<p>11. Try to stop the write loop:</p>
<pre>
# while true; do date >> /mnt/date.txt; sleep 1; done
^C^C^C^C
^C^C^C^C^Cc
^C^C^C^C^C^C^C^C^C^C^C^C^C^C
</pre>
<p>--> the process hangs</p>
<p>12. check messages log<br />Aug 8 12:21:10 test-cephclient001.ncl kernel: [68500.735033] libceph: mon2 10.113.129.153:6789 session lost, hunting for new mon<br />Aug 8 12:21:12 test-cephclient001.ncl kernel: [68503.184434] rbd: rbd0: encountered watch error: -107<br />Aug 8 12:21:23 test-cephclient001.ncl kernel: [68513.735041] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)<br />Aug 8 12:21:35 test-cephclient001.ncl kernel: [68525.735051] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)<br />Aug 8 12:21:43 test-cephclient001.ncl kernel: [68533.719027] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)<br />Aug 8 12:21:51 test-cephclient001.ncl kernel: [68541.719041] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)<br />Aug 8 12:21:59 test-cephclient001.ncl kernel: [68549.767040] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)<br />Aug 8 12:22:07 test-cephclient001.ncl kernel: [68557.719041] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)<br />Aug 8 12:22:12 test-cephclient001.ncl kernel: [68563.184026] rbd: rbd0: failed to unwatch: -110<br />Aug 8 12:22:15 test-cephclient001.ncl kernel: [68565.719042] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)<br />Aug 8 12:22:29 test-cephclient001.ncl kernel: [68579.783087] libceph: mon0 10.113.129.151:6789 socket closed (con state CONNECTING)<br />Aug 8 12:22:37 test-cephclient001.ncl kernel: [68587.719031] libceph: mon0 10.113.129.151:6789 socket closed (con state CONNECTING)<br />Aug 8 12:22:45 test-cephclient001.ncl kernel: [68595.719042] libceph: mon0 10.113.129.151:6789 socket closed (con state CONNECTING)<br />Aug 8 12:22:59 test-cephclient001.ncl kernel: [68609.735030] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)<br />Aug 8 12:23:07 test-cephclient001.ncl kernel: [68617.719035] libceph: mon1 10.113.129.152:6789 socket closed (con state 
CONNECTING)<br />Aug 8 12:23:15 test-cephclient001.ncl kernel: [68625.719027] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68639.815029] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134045] INFO: task xfsaild/rbd0:53566 blocked for more than 120 seconds.<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134076] xfsaild/rbd0 D ffff8800781c4000 0 53566 2 0x00000000<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134080] ffff88007978bd58 0000000000000046 ffff8800794d6dd0 ffff88007978bfd8<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134083] ffff88007978bfd8 ffff88007978bfd8 ffff8800794d6dd0 ffff880079d66400<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134085] 0000000000000000 ffff8800794d6dd0 ffff880079740528 ffff8800781c4000<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134088] Call Trace:<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134098] [<ffffffff8168c7f9>] schedule+0x29/0x70<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134137] [<ffffffffa01ef3ed>] _xfs_log_force+0x1bd/0x2b0 [xfs]<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134142] [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134157] [<ffffffffa01ef506>] xfs_log_force+0x26/0x80 [xfs]<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134175] [<ffffffffa01fadd0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134199] [<ffffffffa01faf2a>] xfsaild+0x15a/0x660 [xfs]<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134213] [<ffffffffa01fadd0>] ? 
xfs_trans_ail_cursor_first+0x90/0x90 [xfs]<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134217] [<ffffffff810b0a4f>] kthread+0xcf/0xe0<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134220] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134223] [<ffffffff81697758>] ret_from_fork+0x58/0x90<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134226] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134230] INFO: task date:54397 blocked for more than 120 seconds.<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134240] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134251] date D ffffffff8168a850 0 54397 54047 0x00000000<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134259] ffff88007869ba00 0000000000000082 ffff880036bf4e70 ffff88007869bfd8<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134261] ffff88007869bfd8 ffff88007869bfd8 ffff880036bf4e70 ffff88007ca16c40<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134263] 0000000000000000 7fffffffffffffff ffff88007f7ac8e8 ffffffff8168a850<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134265] Call Trace:<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134269] [<ffffffff8168a850>] ? bit_wait+0x50/0x50<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134271] [<ffffffff8168c7f9>] schedule+0x29/0x70<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134274] [<ffffffff8168a239>] schedule_timeout+0x239/0x2c0<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134279] [<ffffffff811f1357>] ? __mem_cgroup_commit_charge+0xe7/0x370<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134287] [<ffffffff8101fd10>] ? 
xen_clocksource_get_cycles+0x20/0x30<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134293] [<ffffffff810eb0dc>] ? ktime_get_ts64+0x4c/0xf0<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134296] [<ffffffff8168a850>] ? bit_wait+0x50/0x50<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134299] [<ffffffff8168bd9e>] io_schedule_timeout+0xae/0x130<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134301] [<ffffffff8168be38>] io_schedule+0x18/0x20<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134304] [<ffffffff8168a861>] bit_wait_io+0x11/0x50<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134306] [<ffffffff8168a385>] __wait_on_bit+0x65/0x90<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134312] [<ffffffff81180221>] wait_on_page_bit+0x81/0xa0<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134315] [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134319] [<ffffffff8118c17d>] wait_for_stable_page+0x3d/0x40<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134321] [<ffffffff811819fb>] grab_cache_page_write_begin+0x9b/0xd0<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134335] [<ffffffffa01c41b4>] xfs_vm_write_begin+0x34/0xe0 [xfs]<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134338] [<ffffffff81181bbe>] generic_file_buffered_write+0x11e/0x2a0<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134352] [<ffffffffa01d1875>] ? 
xfs_file_aio_write_checks+0x185/0x1f0 [xfs]<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134364] [<ffffffffa01d1cdb>] xfs_file_buffered_aio_write+0x10b/0x260 [xfs]<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134376] [<ffffffffa01d1fbd>] xfs_file_aio_write+0x18d/0x1a0 [xfs]<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134379] [<ffffffff811fe18d>] do_sync_write+0x8d/0xd0<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134381] [<ffffffff811fe9fd>] vfs_write+0xbd/0x1e0<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134383] [<ffffffff811ff51f>] SyS_write+0x7f/0xe0<br />Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134385] [<ffffffff81697809>] system_call_fastpath+0x16/0x1b<br />Aug 8 12:23:37 test-cephclient001.ncl kernel: [68647.719027] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)<br />...</p>
<p>In this situation, we cannot umount "/mnt", unmap the rbd device, or kill the process writing the file.<br />The bigger problem is that the CPU load average gradually increases and the process goes into the D state.</p>
<pre>
# top
load average: 0.13 -> 3.00
</pre>
<pre>
# ps aux | grep date
root 54397 0.0 0.0 107928 668 pts/2 D+ 12:20 0:00 date
</pre>
<p>--> "kill -9 54397" doesn't work.</p>
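The stuck writer can be spotted generically: processes in uninterruptible sleep show state D in ps. A minimal sketch (the awk filter is separated out so it can be tested on canned ps output; `list_d_state` is a hypothetical helper):

```shell
# Print pid and command of processes whose state starts with D
# (e.g. "D" or "D+" as in the ps output above).
list_d_state() {
  awk '$1 ~ /^D/ { print $2, $3 }'
}
# Live usage:
#   ps -eo state=,pid=,comm= | list_d_state
```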
<p>When the ceph cluster is damaged or the network is disconnected, the ceph client hangs.<br />I think this is a big issue because there is nothing the client side can do; only rebooting helps.</p>
<p>Thank you.</p>
<p><strong>Update by Ilya Dryomov (2017-08-14):</strong></p>
<ul><li><strong>Category</strong> set to <i>rbd</i></li></ul><p>Yes, currently reboot is the only option. The load average increases because your test is throwing data at the file system, which has no way to write out dirty pages.<br />The plan is to complement the existing rbd unmap (soft-)force option with a full-force option -- see the discussion at <a class="external" href="https://github.com/rook/rook/issues/376">https://github.com/rook/rook/issues/376</a>.</p>
<p><strong>Update by hyun ha (2017-08-16):</strong></p>
<ul></ul><p>Ilya Dryomov wrote:</p>
<blockquote>
<p>Yes, currently reboot is the only option. The load average increases because your test is throwing data at the file system which has no way to write out dirty pages.<br />The plan is to complement an existing rbd unmap (soft-)force option with a full-force option -- see discussion at <a class="external" href="https://github.com/rook/rook/issues/376">https://github.com/rook/rook/issues/376</a>.</p>
</blockquote>
<p>Thank you very much for the response :)<br />Just to be clear: will rbd unmap with a full-force option work when the ceph cluster has gone away and the client can't talk to it?</p>
<p><strong>Update by Ilya Dryomov (2017-09-05):</strong></p>
<ul></ul><blockquote>
<p>Just to be clear: will rbd unmap with a full-force option work when the ceph cluster has gone away and the client can't talk to it?</p>
</blockquote>
<p>Yes, that's the idea.</p>
<p><strong>Update by frank moo (2017-11-20):</strong></p>
<p>Has this problem been solved?</p>
<p>In a production environment, if the network breaks,<br />do we have to manually run rbd unmap -o force /dev/rbd1 each time?</p>
<p><strong>Update by Ilya Dryomov (2017-11-23):</strong></p>
<ul></ul><p>The existing "-o force" wouldn't help. The plan is to implement "-o full-force", hopefully for kernel 4.16.</p>
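Once "-o full-force" exists, client-side recovery could be scripted as an escalation. A sketch that only prints the planned commands (the option names are taken from this thread; verify them against your rbd version before running anything; `rbd_unmap_plan` is a hypothetical helper):

```shell
# Print an escalating unmap plan for a stuck rbd device: clean unmap,
# then "-o force", then the proposed "-o full-force". Printing instead
# of executing keeps the sketch reviewable and testable.
rbd_unmap_plan() {
  dev="$1"
  printf 'rbd unmap %s\n' "$dev"
  printf 'rbd unmap -o force %s\n' "$dev"
  printf 'rbd unmap -o full-force %s\n' "$dev"
}
# Usage: run each printed command in turn until one succeeds, e.g.
#   rbd_unmap_plan /dev/rbd0
```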
<p>OTOH, what do you mean by "net break"? If it's transient, then no action is required -- libceph should be able to reconnect and continue on. If it's permanent, then you are screwed anyway because dirty state (including data pages) will be thrown out, whether with "-o full-force" or reboot.</p>
<p><strong>Update by Ilya Dryomov (2019-03-15):</strong></p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-5 priority-5 priority-high3 closed" href="/issues/13189">Bug #13189</a>: Reboot systerm can not set up.</i> added</li></ul>
<p><strong>Update by Ilya Dryomov (2019-03-15):</strong></p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-8 priority-4 priority-default closed" href="/issues/15887">Bug #15887</a>: client reboot stuck if the ceph node is not reachable or shutdown</i> added</li></ul>