Bug #20927

libceph does not give up reconnecting on broken network

Added by hyun ha over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
rbd
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

When the Ceph cluster's network or the client's network is broken, the client hangs.

Reproduce steps:
```
client side>
1. rbd map
2. mkfs.xfs
3. mount
4. iptables settings: drop ports to the cluster
- iptables -A OUTPUT -p tcp --dport 6789 -j DROP
- iptables -A OUTPUT -p tcp --dport 6800 -j DROP
- iptables -A OUTPUT -p tcp --dport 6808 -j DROP
```
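To end the simulated outage, the same rules can be deleted again (a minimal sketch mirroring the DROP rules above):
```
# Remove the DROP rules added above to restore connectivity to the cluster
iptables -D OUTPUT -p tcp --dport 6789 -j DROP
iptables -D OUTPUT -p tcp --dport 6800 -j DROP
iptables -D OUTPUT -p tcp --dport 6808 -j DROP
```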

results>
/var/log/messages
```
libceph: mon0 10.113.129.151:6789 socket closed (con state CONNECTING)
```

There is no timeout option, so libceph keeps retrying endlessly.
In my opinion, reconnecting to the Ceph cluster when the network is broken is necessary, but it should not retry forever.

In that case, we cannot unload libceph or unmount the directory.
The only thing we can do is reboot the client server.

Is there any option to force-stop libceph, or a timeout for the reconnect attempts?
Thank you in advance.


Related issues

Related to Linux kernel client - Bug #13189: Reboot systerm can not set up. Closed 09/21/2015
Related to Linux kernel client - Bug #15887: client reboot stuck if the ceph node is not reachable or shutdown Won't Fix 05/13/2016

History

#1 Updated by Greg Farnum over 6 years ago

  • Project changed from RADOS to Linux kernel client
  • Subject changed from libceph socket closed (con state CONNECTING) to libceph does not give up reconnecting on broken network
  • Category deleted (Peering)

I assume you're referring to the kernel cephfs client?

There's not a lot else the client is allowed to do under the POSIX filesystem rules, but maybe somebody will come up with a way to poke at it. (And I'm not sure what happens if you force unmount it — does that not work?)
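For reference, a forced or lazy unmount attempt would look like this (a minimal sketch, assuming the /mnt mount point used in the report):
```
# Force the unmount; can still block if I/O is stuck in the kernel
umount -f /mnt

# Lazy unmount: detach the mount point now, clean up once it is no longer busy
umount -l /mnt
```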

#2 Updated by hyun ha over 6 years ago

Greg Farnum wrote:

I assume you're referring to the kernel cephfs client?

There's not a lot else the client is allowed to do under the POSIX filesystem rules, but maybe somebody will come up with a way to poke at it. (And I'm not sure what happens if you force unmount it — does that not work?)

Hi, thank you for the comment.

This case is about using an RBD block device.
I can show you more details below.

##################################
# 1. Check client and ceph cluster
##################################

client>
[root@test-cephclient001]/root::test # uname -r
3.10.0-514.26.2.el7.x86_64

[root@test-cephclient001]/root::test # rpm -qa | grep ceph
python-cephfs-10.2.3-0.el7.x86_64
libcephfs1-10.2.3-0.el7.x86_64
ceph-common-10.2.3-0.el7.x86_64

[root@test-cephclient001]/root: # lsmod | egrep "rbd|ceph"
rbd 83938 0
libceph 282630 1 rbd
dns_resolver 13140 1 libceph
libcrc32c 12644 2 xfs,libceph

ceph cluster>
[ ~]# rpm -qa | grep ceph
python-cephfs-10.2.3-0.el7.x86_64
ceph-osd-10.2.3-0.el7.x86_64
ceph-selinux-10.2.3-0.el7.x86_64
ceph-mds-10.2.3-0.el7.x86_64
libcephfs1-10.2.3-0.el7.x86_64
ceph-base-10.2.3-0.el7.x86_64
ceph-10.2.3-0.el7.x86_64
ceph-common-10.2.3-0.el7.x86_64
ceph-mon-10.2.3-0.el7.x86_64

####################
# 2. Reproduce steps
####################
client>
1. create rbd image
  1. rbd create test-cephclient001-nvol01 -p volumes --image-format 2 --size 300G
2. map rbd image
  1. rbd map volumes/test-cephclient001-nvol01
3. make filesystem
  1. mkfs.xfs /dev/rbd/volumes/test-cephclient001-nvol01
4. mount rbd block device
  1. mount /dev/rbd/volumes/test-cephclient001-nvol01 /mnt
5. write test
  1. while true;do date >> /mnt/date.txt; sleep 1; done
6. check netstat
  1. netstat -antup | egrep "6789|68"
    tcp 0 0 10.113.248.133:46976 10.113.129.153:6808 ESTABLISHED -
    tcp 0 0 10.113.248.133:55958 10.113.129.153:6789 ESTABLISHED -
    tcp 0 9 10.113.248.133:40492 10.113.129.151:6800 ESTABLISHED -
    tcp 0 0 10.113.248.133:35088 10.113.129.151:6808 ESTABLISHED -
7. set iptables to drop port to ceph for simulating network damages
  1. iptables -A OUTPUT -p tcp --dport 6789 -j DROP
  2. iptables -A OUTPUT -p tcp --dport 6800 -j DROP
  3. iptables -A OUTPUT -p tcp --dport 6808 -j DROP
8. check netstat after few seconds
  1. netstat -antup | egrep "6789|68"
    tcp 0 2233 10.113.248.133:46976 10.113.129.153:6808 ESTABLISHED -
    tcp 0 1 10.113.248.133:41646 10.113.129.152:6789 SYN_SENT -
    tcp 0 75450 10.113.248.133:40492 10.113.129.151:6800 ESTABLISHED -
9. try to umount
  1. umount -f /mnt
    umount: /mnt: target is busy.
    (In some cases useful info about processes that use
    the device is found by lsof(8) or fuser(1))
10. try to rbd unmap
  1. rbd unmap /dev/rbd0
    rbd: sysfs write failed
    rbd: unmap failed: (16) Device or resource busy
11. try to quit write file
  1. while true;do date >> /mnt/date.txt; sleep 1; done
    ^C^C^C^C
    ^C^C^C^C^Cc
    ^C^C^C^C^C^C^C^C^C^C^C^C^C^C

--> hang

12. check messages log
Aug 8 12:21:10 test-cephclient001.ncl kernel: [68500.735033] libceph: mon2 10.113.129.153:6789 session lost, hunting for new mon
Aug 8 12:21:12 test-cephclient001.ncl kernel: [68503.184434] rbd: rbd0: encountered watch error: -107
Aug 8 12:21:23 test-cephclient001.ncl kernel: [68513.735041] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)
Aug 8 12:21:35 test-cephclient001.ncl kernel: [68525.735051] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)
Aug 8 12:21:43 test-cephclient001.ncl kernel: [68533.719027] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)
Aug 8 12:21:51 test-cephclient001.ncl kernel: [68541.719041] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)
Aug 8 12:21:59 test-cephclient001.ncl kernel: [68549.767040] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)
Aug 8 12:22:07 test-cephclient001.ncl kernel: [68557.719041] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)
Aug 8 12:22:12 test-cephclient001.ncl kernel: [68563.184026] rbd: rbd0: failed to unwatch: -110
Aug 8 12:22:15 test-cephclient001.ncl kernel: [68565.719042] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)
Aug 8 12:22:29 test-cephclient001.ncl kernel: [68579.783087] libceph: mon0 10.113.129.151:6789 socket closed (con state CONNECTING)
Aug 8 12:22:37 test-cephclient001.ncl kernel: [68587.719031] libceph: mon0 10.113.129.151:6789 socket closed (con state CONNECTING)
Aug 8 12:22:45 test-cephclient001.ncl kernel: [68595.719042] libceph: mon0 10.113.129.151:6789 socket closed (con state CONNECTING)
Aug 8 12:22:59 test-cephclient001.ncl kernel: [68609.735030] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)
Aug 8 12:23:07 test-cephclient001.ncl kernel: [68617.719035] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)
Aug 8 12:23:15 test-cephclient001.ncl kernel: [68625.719027] libceph: mon1 10.113.129.152:6789 socket closed (con state CONNECTING)
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68639.815029] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134045] INFO: task xfsaild/rbd0:53566 blocked for more than 120 seconds.
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134076] xfsaild/rbd0 D ffff8800781c4000 0 53566 2 0x00000000
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134080] ffff88007978bd58 0000000000000046 ffff8800794d6dd0 ffff88007978bfd8
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134083] ffff88007978bfd8 ffff88007978bfd8 ffff8800794d6dd0 ffff880079d66400
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134085] 0000000000000000 ffff8800794d6dd0 ffff880079740528 ffff8800781c4000
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134088] Call Trace:
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134098] [<ffffffff8168c7f9>] schedule+0x29/0x70
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134137] [<ffffffffa01ef3ed>] _xfs_log_force+0x1bd/0x2b0 [xfs]
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134142] [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134157] [<ffffffffa01ef506>] xfs_log_force+0x26/0x80 [xfs]
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134175] [<ffffffffa01fadd0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134199] [<ffffffffa01faf2a>] xfsaild+0x15a/0x660 [xfs]
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134213] [<ffffffffa01fadd0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134217] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134220] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134223] [<ffffffff81697758>] ret_from_fork+0x58/0x90
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134226] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134230] INFO: task date:54397 blocked for more than 120 seconds.
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134240] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134251] date D ffffffff8168a850 0 54397 54047 0x00000000
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134259] ffff88007869ba00 0000000000000082 ffff880036bf4e70 ffff88007869bfd8
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134261] ffff88007869bfd8 ffff88007869bfd8 ffff880036bf4e70 ffff88007ca16c40
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134263] 0000000000000000 7fffffffffffffff ffff88007f7ac8e8 ffffffff8168a850
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134265] Call Trace:
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134269] [<ffffffff8168a850>] ? bit_wait+0x50/0x50
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134271] [<ffffffff8168c7f9>] schedule+0x29/0x70
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134274] [<ffffffff8168a239>] schedule_timeout+0x239/0x2c0
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134279] [<ffffffff811f1357>] ? __mem_cgroup_commit_charge+0xe7/0x370
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134287] [<ffffffff8101fd10>] ? xen_clocksource_get_cycles+0x20/0x30
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134293] [<ffffffff810eb0dc>] ? ktime_get_ts64+0x4c/0xf0
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134296] [<ffffffff8168a850>] ? bit_wait+0x50/0x50
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134299] [<ffffffff8168bd9e>] io_schedule_timeout+0xae/0x130
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134301] [<ffffffff8168be38>] io_schedule+0x18/0x20
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134304] [<ffffffff8168a861>] bit_wait_io+0x11/0x50
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134306] [<ffffffff8168a385>] __wait_on_bit+0x65/0x90
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134312] [<ffffffff81180221>] wait_on_page_bit+0x81/0xa0
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134315] [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134319] [<ffffffff8118c17d>] wait_for_stable_page+0x3d/0x40
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134321] [<ffffffff811819fb>] grab_cache_page_write_begin+0x9b/0xd0
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134335] [<ffffffffa01c41b4>] xfs_vm_write_begin+0x34/0xe0 [xfs]
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134338] [<ffffffff81181bbe>] generic_file_buffered_write+0x11e/0x2a0
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134352] [<ffffffffa01d1875>] ? xfs_file_aio_write_checks+0x185/0x1f0 [xfs]
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134364] [<ffffffffa01d1cdb>] xfs_file_buffered_aio_write+0x10b/0x260 [xfs]
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134376] [<ffffffffa01d1fbd>] xfs_file_aio_write+0x18d/0x1a0 [xfs]
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134379] [<ffffffff811fe18d>] do_sync_write+0x8d/0xd0
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134381] [<ffffffff811fe9fd>] vfs_write+0xbd/0x1e0
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134383] [<ffffffff811ff51f>] SyS_write+0x7f/0xe0
Aug 8 12:23:29 test-cephclient001.ncl kernel: [68640.134385] [<ffffffff81697809>] system_call_fastpath+0x16/0x1b
Aug 8 12:23:37 test-cephclient001.ncl kernel: [68647.719027] libceph: mon2 10.113.129.153:6789 socket closed (con state CONNECTING)
...

In this situation, we cannot umount /mnt, unmap the RBD device, or kill the process that is writing the file.
The big problem is that the CPU load average increases gradually and the process goes into the D (uninterruptible sleep) state; see the observations below and the diagnostic sketch after them.

  1. top
    load average: 0.13 -> 3.00
  2. ps aux | grep date
    root 54397 0.0 0.0 107928 668 pts/2 D+ 12:20 0:00 date
    --> "kill -9 54397" doesn't work.

When the Ceph cluster is damaged or the network is disconnected, the Ceph client hangs.
I think this is a big issue because there is nothing that can be done on the client side; only rebooting helps.

Thank you.

#3 Updated by Ilya Dryomov over 6 years ago

  • Category set to rbd

Yes, currently reboot is the only option. The load average increases because your test is throwing data at the file system which has no way to write out dirty pages.
The plan is to complement an existing rbd unmap (soft-)force option with a full-force option -- see discussion at https://github.com/rook/rook/issues/376.

#4 Updated by hyun ha over 6 years ago

Ilya Dryomov wrote:

Yes, currently reboot is the only option. The load average increases because your test is throwing data at the file system which has no way to write out dirty pages.
The plan is to complement an existing rbd unmap (soft-)force option with a full-force option -- see discussion at https://github.com/rook/rook/issues/376.

Thank you very much for the response :)
Just to be clear, will rbd unmap with a full-force option work when the Ceph cluster has gone away and the client can't talk to it?

#5 Updated by Ilya Dryomov over 6 years ago

Just to be clear, will rbd unmap with a full-force option work when the Ceph cluster has gone away and the client can't talk to it?

Yes, that's the idea.

#6 Updated by frank moo over 6 years ago

Has this problem been solved?

In a production environment, if the network breaks,
do we have to manually execute commands (rbd unmap -o force /dev/rbd1) each time?

#7 Updated by Ilya Dryomov over 6 years ago

The existing "-o force" wouldn't help. The plan is to implement "-o full-force", hopefully for kernel 4.16.

OTOH, what do you mean by "net break"? If it's transient, then no action is required -- libceph should be able to reconnect and continue on. If it's permanent, then you are screwed anyway because dirty state (including data pages) will be thrown out, whether with "-o full-force" or reboot.
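For reference, the existing soft-force unmap mentioned above looks like the sketch below; the full-force variant is only planned at this point, so its exact syntax is an assumption.
```
# Existing soft-force unmap (see comment #6); per the comment above it does not
# help when the cluster is unreachable
rbd unmap -o force /dev/rbd0

# Planned full-force variant (not yet available at the time of writing)
# rbd unmap -o full-force /dev/rbd0
```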

#8 Updated by Ilya Dryomov about 5 years ago

  • Related to Bug #13189: Reboot systerm can not set up. added

#9 Updated by Ilya Dryomov about 5 years ago

  • Related to Bug #15887: client reboot stuck if the ceph node is not reachable or shutdown added
