<p><strong>Ceph Linux kernel client - Bug #14360: Cephfs kernel client ceph_send_cap_releases hung task</strong></p>
<hr /><p><strong>Updated by Stephan Wienczny</strong> (stephan@wienczny.de), <a href="https://tracker.ceph.com/issues/14360?journal_id=64242">2016-01-16T02:56:23Z</a></p>
<p>In 4.4.0 this looks a bit different:</p>
<pre>
[ 2281.989202] INFO: task kworker/0:0:1882 blocked for more than 120 seconds.
[ 2281.996208] Not tainted 4.4.0-coreos #2
[ 2282.000684] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2282.008744] kworker/0:0 D ffff8808566bbd28 0 1882 2 0x00000080
[ 2282.015973] Workqueue: events ceph_send_cap_releases [ceph]
[ 2282.021687] ffff8808566bbd28 ffff88085a3a9cc0 ffff88085a3a9cc0 ffff8808553ad640
[ 2282.029438] ffff8808566bc000 ffff880858075024 ffff8808553ad640 00000000ffffffff
[ 2282.037183] ffff880858075028 ffff8808566bbd40 ffffffff8e54fb95 ffff880858075020
[ 2282.044913] Call Trace:
[ 2282.047492] [<ffffffff8e54fb95>] schedule+0x35/0x80
[ 2282.052612] [<ffffffff8e54fe3e>] schedule_preempt_disabled+0xe/0x10
[ 2282.059146] [<ffffffff8e551ae9>] __mutex_lock_slowpath+0xb9/0x130
[ 2282.065489] [<ffffffff8e551b7f>] mutex_lock+0x1f/0x30
[ 2282.070793] [<ffffffffc04a81c1>] ceph_send_cap_releases+0x3f1/0x500 [ceph]
[ 2282.077919] [<ffffffff8e08b687>] process_one_work+0x157/0x3e0
[ 2282.083914] [<ffffffff8e08c0fe>] worker_thread+0x4e/0x470
[ 2282.089529] [<ffffffff8e08c0b0>] ? rescuer_thread+0x320/0x320
[ 2282.095528] [<ffffffff8e091478>] kthread+0xd8/0xf0
[ 2282.100579] [<ffffffff8e0913a0>] ? kthread_park+0x60/0x60
[ 2282.106193] [<ffffffff8e55428f>] ret_from_fork+0x3f/0x70
[ 2282.111723] [<ffffffff8e0913a0>] ? kthread_park+0x60/0x60
[ 2282.117362] INFO: task kworker/0:3:4463 blocked for more than 120 seconds.
[ 2282.124364] Not tainted 4.4.0-coreos #2
[ 2282.128867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2282.136961] kworker/0:3 D ffff8807b2d0b9e8 0 4463 2 0x00000080
[ 2282.144228] Workqueue: ceph-msgr ceph_msg_new [libceph]
[ 2282.149617] ffff8807b2d0b9e8 ffffffff8e1612b3 ffffffff8ea10540 ffff8807b677d640
[ 2282.157366] ffff8807b2d0c000 ffff880858075024 ffff8807b677d640 00000000ffffffff
[ 2282.165096] ffff880858075028 ffff8807b2d0ba00 ffffffff8e54fb95 ffff880858075020
[ 2282.172835] Call Trace:
[ 2282.175411] [<ffffffff8e1612b3>] ? free_pages+0x13/0x20
[ 2282.180871] [<ffffffff8e54fb95>] schedule+0x35/0x80
[ 2282.186009] [<ffffffff8e54fe3e>] schedule_preempt_disabled+0xe/0x10
[ 2282.192486] [<ffffffff8e551ae9>] __mutex_lock_slowpath+0xb9/0x130
[ 2282.198820] [<ffffffff8e551b7f>] mutex_lock+0x1f/0x30
[ 2282.204091] [<ffffffffc049b968>] ceph_check_caps+0x448/0x18f0 [ceph]
[ 2282.210696] [<ffffffff8e17e1fd>] ? kzfree+0x2d/0x40
[ 2282.215838] [<ffffffff8e28b6b9>] ? crypto_destroy_tfm+0x49/0x80
[ 2282.221996] [<ffffffffc0352076>] ? ceph_auth_none_init+0xd56/0xe60 [libceph]
[ 2282.229305] [<ffffffffc049df94>] ceph_put_wrbuffer_cap_refs+0x114/0x2c0 [ceph]
[ 2282.236856] [<ffffffff8e1676fa>] ? test_clear_page_writeback+0xfa/0x1e0
[ 2282.243729] [<ffffffffc049342d>] ceph_encode_locks_to_buffer+0x134d/0x3600 [ceph]
[ 2282.251589] [<ffffffffc0349d1d>] ceph_osdc_handle_map+0xc5d/0x1120 [libceph]
[ 2282.258903] [<ffffffffc0340657>] ceph_msg_new+0x1a97/0x3210 [libceph]
[ 2282.265592] [<ffffffff8e015702>] ? __switch_to+0x252/0x5a0
[ 2282.271296] [<ffffffff8e08b687>] process_one_work+0x157/0x3e0
[ 2282.277302] [<ffffffff8e08c0fe>] worker_thread+0x4e/0x470
[ 2282.282987] [<ffffffff8e08c0b0>] ? rescuer_thread+0x320/0x320
[ 2282.288994] [<ffffffff8e08c0b0>] ? rescuer_thread+0x320/0x320
[ 2282.294984] [<ffffffff8e091478>] kthread+0xd8/0xf0
[ 2282.300021] [<ffffffff8e0913a0>] ? kthread_park+0x60/0x60
[ 2282.305673] [<ffffffff8e55428f>] ret_from_fork+0x3f/0x70
[ 2282.311200] [<ffffffff8e0913a0>] ? kthread_park+0x60/0x60
</pre>
<p>There are two alternating dumps; one blocks in ceph_send_cap_releases and the other in ceph_check_caps, both waiting in mutex_lock.</p>
<pre>
filename: /lib/modules/4.4.0-coreos/kernel/fs/ceph/ceph.ko
license: GPL
description: Ceph filesystem for Linux
author: Patience Warnick <patience@newdream.net>
author: Yehuda Sadeh <yehuda@hq.newdream.net>
author: Sage Weil <sage@newdream.net>
alias: fs-ceph
depends: libceph,fscache
intree: Y
vermagic: 4.4.0-coreos SMP mod_unload
</pre>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=64257">2016-01-18T09:43:31Z</a></p>
<ul><li><strong>Assignee</strong> set to <i>Zheng Yan</i></li></ul><p>ceph_send_cap_releases() never calls mutex_lock(), and ceph_encode_locks_to_buffer() never calls ceph_put_wrbuffer_cap_refs() either. The backtraces are inaccurate. Please compile the kernel with frame pointers (the option is at "Kernel hacking" -> "Compile-time checks and compiler options" -> "Compile the kernel with frame pointers").</p>
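<p>For reference, that menu option corresponds to the following kernel build-config fragment (a sketch; CONFIG_DEBUG_KERNEL is the prerequisite that enables the "Kernel hacking" menu):</p>

```
# Kernel .config symbols needed for accurate stack backtraces
CONFIG_DEBUG_KERNEL=y
CONFIG_FRAME_POINTER=y
```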
<p>When this hang happens again, please run the command "echo 1 > /proc/sys/kernel/sysrq; echo t > /proc/sysrq-trigger", then upload the kernel messages.</p>
<p>Besides, what workload causes this hang?</p>
<hr /><p><strong>Updated by Stephan Wienczny</strong> (stephan@wienczny.de), <a href="https://tracker.ceph.com/issues/14360?journal_id=64260">2016-01-18T11:51:06Z</a></p>
<ul><li><strong>File</strong> <a href="/attachments/download/2160/kernel.state">kernel.state</a> added</li></ul><p>I checked the kernel build config and it is already compiled with "Compile the kernel with frame pointers".<br />The dumped kernel state is attached.<br />We are using cephfs to store our websites. The only processes accessing the filesystem are Apache2 and PHP. These processes are inside Docker containers which have volume mounts into cephfs.</p>
<p>The problem was introduced between CoreOS 899.1.0 and 926.0.0.</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=64264">2016-01-18T14:25:22Z</a></p>
<p>Stephan Wienczny wrote:</p>
<blockquote>
<p>The problem was introduced between Coreos 899.1.0 and 926.0.0</p>
</blockquote>
<p>Do you mean there is no problem when using the 4.2 kernel?</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=64267">2016-01-18T14:36:24Z</a></p>
<p>This problem seems related to memory allocation. Could you try enlarging the memory limit for these containers?</p>
<hr /><p><strong>Updated by Stephan Wienczny</strong> (stephan@wienczny.de), <a href="https://tracker.ceph.com/issues/14360?journal_id=64272">2016-01-18T16:50:58Z</a></p>
<p>I can't say anything about 4.2. The first version we tested was 4.3.x. Have there been changes in memory management between those two versions in Ceph?</p>
<p>Are you talking about the memory limit of the Docker containers? Does the cgroup memory limit account for memory used by IO requests?</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=64317">2016-01-19T09:25:54Z</a></p>
<p>Stephan Wienczny wrote:</p>
<blockquote>
<p>I can't say anything about 4.2. The first version we tested was 4.3.x. Have there been changes in memory management between those two versions in Ceph?</p>
</blockquote>
<p>No change.</p>
<blockquote>
<p>Are you talking about the memory limit of the Docker containers? Does the cgroup memory limit account for memory used by IO requests?</p>
</blockquote>
<p>yes, I'm not familiar with cgroup, But I think it accounts memory used by IO requests. Could you try running same application outside container.</p> Linux kernel client - Bug #14360: Cephfs kernel client ceph_send_cap_releases hung taskhttps://tracker.ceph.com/issues/14360?journal_id=643192016-01-19T09:50:11ZStephan Wiencznystephan@wienczny.de
<p>Zheng Yan wrote:</p>
<blockquote>
<p>Yes. I'm not familiar with cgroups, but I think it accounts for memory used by IO requests. Could you try running the same application outside a container?</p>
</blockquote>
<p>I can try to run them without cgroup limits first.</p> Linux kernel client - Bug #14360: Cephfs kernel client ceph_send_cap_releases hung taskhttps://tracker.ceph.com/issues/14360?journal_id=645352016-01-25T23:32:26ZStephan Wiencznystephan@wienczny.de
<p>The blocked task occurs even without cgroup limits. I did a test with a 10M limit while reading large files. There are cgroup OOMs but no lockups.<br />In the end I triggered the problem by doing an ls in the top directory.</p>
<p>Do you test cephfs against lots of concurrent requests?</p>
<p>There are some more related traces in functions that actually call mutex_lock:</p>
<pre>
Jan 25 03:40:29 app001 kernel: Workqueue: ceph-msgr ceph_msg_new [libceph]
Jan 25 03:40:29 app001 kernel: ffff880855c33b68 ffffffffa64a15e3 ffff88085bf93980 ffff880028273980
Jan 25 03:40:29 app001 kernel: ffff880855c34000 ffff88002802840c ffff880028273980 00000000ffffffff
Jan 25 03:40:29 app001 kernel: ffff880028028410 ffff880855c33b80 ffffffffa654fb95 ffff880028028408
Jan 25 03:40:29 app001 kernel: Call Trace:
Jan 25 03:40:29 app001 kernel: [<ffffffffa64a15e3>] ? tcp_recvmsg+0x373/0xb50
Jan 25 03:40:29 app001 kernel: [<ffffffffa654fb95>] schedule+0x35/0x80
Jan 25 03:40:29 app001 kernel: [<ffffffffa654fe3e>] schedule_preempt_disabled+0xe/0x10
Jan 25 03:40:29 app001 kernel: [<ffffffffa6551ae9>] __mutex_lock_slowpath+0xb9/0x130
Jan 25 03:40:29 app001 kernel: [<ffffffffa6551b7f>] mutex_lock+0x1f/0x30
Jan 25 03:40:29 app001 kernel: [<ffffffffc04cbbda>] ceph_mdsc_handle_map+0xda/0x1dd0 [ceph]
Jan 25 03:40:29 app001 kernel: [<ffffffffa6437131>] ? kernel_recvmsg+0x61/0x80
Jan 25 03:40:29 app001 kernel: [<ffffffffc04a876d>] 0xffffffffc04a876d
Jan 25 03:40:29 app001 kernel: [<ffffffffc0376570>] ceph_monmap_decode+0x5c0/0x9f0 [libceph]
Jan 25 03:40:29 app001 kernel: [<ffffffffc036fc91>] ? ceph_con_keepalive+0xe1/0x630 [libceph]
Jan 25 03:40:29 app001 kernel: [<ffffffffc0372657>] ceph_msg_new+0x1a97/0x3210 [libceph]
Jan 25 03:40:29 app001 kernel: [<ffffffffa6015702>] ? __switch_to+0x252/0x5a0
Jan 25 03:40:29 app001 kernel: [<ffffffffa608b687>] process_one_work+0x157/0x3e0
Jan 25 03:40:29 app001 kernel: [<ffffffffa608c0fe>] worker_thread+0x4e/0x470
Jan 25 03:40:29 app001 kernel: [<ffffffffa608c0b0>] ? rescuer_thread+0x320/0x320
Jan 25 03:40:29 app001 kernel: [<ffffffffa608c0b0>] ? rescuer_thread+0x320/0x320
Jan 25 03:40:29 app001 kernel: [<ffffffffa6091478>] kthread+0xd8/0xf0
Jan 25 03:40:29 app001 kernel: [<ffffffffa60913a0>] ? kthread_park+0x60/0x60
Jan 25 03:40:29 app001 kernel: [<ffffffffa655428f>] ret_from_fork+0x3f/0x70
Jan 25 03:40:29 app001 kernel: [<ffffffffa60913a0>] ? kthread_park+0x60/0x60
</pre>
<pre>
Jan 25 03:40:28 app001 kernel: rm D ffff880857f43d48 0 13524 12931 0x00000084
Jan 25 03:40:29 app001 kernel: ffff880857f43d48 ffff880857f43d58 ffffffffa6a10540 ffff88085395b980
Jan 25 03:40:29 app001 kernel: ffff880857f44000 ffff88085a338894 ffff88085395b980 00000000ffffffff
Jan 25 03:40:29 app001 kernel: ffff88085a338898 ffff880857f43d60 ffffffffa654fb95 ffff88085a338890
Jan 25 03:40:29 app001 kernel: Call Trace:
Jan 25 03:40:29 app001 kernel: [<ffffffffa654fb95>] schedule+0x35/0x80
Jan 25 03:40:29 app001 kernel: [<ffffffffa654fe3e>] schedule_preempt_disabled+0xe/0x10
Jan 25 03:40:29 app001 kernel: [<ffffffffa6551ae9>] __mutex_lock_slowpath+0xb9/0x130
Jan 25 03:40:29 app001 kernel: [<ffffffffa609d910>] ? wake_up_q+0x70/0x70
Jan 25 03:40:29 app001 kernel: [<ffffffffa6551b7f>] mutex_lock+0x1f/0x30
Jan 25 03:40:29 app001 kernel: [<ffffffffc04ca824>] ceph_mdsc_do_request+0x1d4/0x270 [ceph]
Jan 25 03:40:29 app001 kernel: [<ffffffffc04af95c>] ceph_init_dentry+0xec/0x2b0 [ceph]
Jan 25 03:40:29 app001 kernel: [<ffffffffa61d8cbd>] lookup_real+0x1d/0x60
Jan 25 03:40:29 app001 kernel: [<ffffffffa61d9042>] __lookup_hash+0x42/0x60
Jan 25 03:40:29 app001 kernel: [<ffffffffa61df64a>] do_unlinkat+0x1ea/0x2a0
Jan 25 03:40:29 app001 kernel: [<ffffffffa61e00eb>] SyS_unlinkat+0x1b/0x30
Jan 25 03:40:29 app001 kernel: [<ffffffffa6553f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
</pre>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=64547">2016-01-26T03:53:59Z</a></p>
<p>Could you upload the ceph.ko?</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=64551">2016-01-26T07:24:37Z</a></p>
<p>The backtraces from your kernel are strange. If possible, please upload the whole kernel package somewhere.</p>
<hr /><p><strong>Updated by Stephan Wienczny</strong> (stephan@wienczny.de), <a href="https://tracker.ceph.com/issues/14360?journal_id=64553">2016-01-26T09:40:09Z</a></p>
<p>I've uploaded the kernel with all modules and the /proc/config.gz to <a class="external" href="https://stephan.wienczny.de/linux.tar.xz">https://stephan.wienczny.de/linux.tar.xz</a></p>
<hr /><p><strong>Updated by Chris Armstrong</strong> (chris@opdemand.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=64919">2016-01-29T19:15:59Z</a></p>
<p>Stephan Wienczny wrote:</p>
<blockquote>
<p>I've uploaded the kernel with all modules and the /proc/config.gz to <a class="external" href="https://stephan.wienczny.de/linux.tar.xz">https://stephan.wienczny.de/linux.tar.xz</a></p>
</blockquote>
<p>Thanks for helping to debug this, Stephan!</p>
<hr /><p><strong>Updated by Stephan Wienczny</strong> (stephan@wienczny.de), <a href="https://tracker.ceph.com/issues/14360?journal_id=64930">2016-01-29T20:37:23Z</a></p>
<p>Did you find something? Is there a commit you could point me to? ;)</p>
<hr /><p><strong>Updated by Greg Farnum</strong> (gfarnum@redhat.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=64944">2016-01-29T21:30:06Z</a></p>
<p>Zheng is looking at this and said he found it "strange", but I'm not sure if he's figured anything out yet.</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=65102">2016-02-02T04:25:10Z</a></p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>12</i></li><li><strong>Priority</strong> changed from <i>Immediate</i> to <i>Normal</i></li></ul><p>It's caused by SELinux. SELinux code calls getxattr() while we are processing an MClientReply.</p>
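<p>A sketch of the workaround: pinning a single SELinux context on the whole mount avoids the getxattr lookups for per-file security labels. The monitor address and credentials below are placeholders, not values from this ticket:</p>

```
# Configuration sketch (assumptions: monitor at 127.0.0.1, admin keyring
# secret already loaded into $SECRET; run as root)
mount -t ceph 127.0.0.1:/ /mnt \
    -o name=admin,secret=$SECRET,context="system_u:object_r:tmp_t:s0"
```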
<p>Mounting cephfs with the context="system_u:object_r:tmp_t:s0" option should solve this issue.</p>
<hr /><p><strong>Updated by Greg Farnum</strong> (gfarnum@redhat.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=65137">2016-02-02T17:17:02Z</a></p>
<p>Zheng, what's the scenario that causes that? Just using SELinux on a CephFS mount?</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=65180">2016-02-03T03:38:08Z</a></p>
<p>I encountered this issue only when SELinux is enabled and the policy type is mls (Multi Level Security protection). There is no problem when the SELinux type is targeted.</p>
<hr /><p><strong>Updated by Nick Owens</strong> (mischief@coreos.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=66234">2016-02-24T01:24:37Z</a></p>
<p>I've got a simple reproducer for CoreOS 899.7.0 with the 4.3.3 kernel.</p>
<p>In this instance SELinux is in permissive mode with nothing but the default CoreOS configuration.</p>
<pre>
core@coreos_production_qemu-899-7-0 ~ $ seinfo
Statistics for policy file: /etc/selinux/mcs/policy/policy.29
Policy Version & Type: v.29 (binary, mls)
Classes: 92 Permissions: 245
Sensitivities: 1 Categories: 1024
Types: 1218 Attributes: 105
Users: 6 Roles: 6
Booleans: 55 Cond. Expr.: 47
Allow: 12687 Neverallow: 0
Auditallow: 9 Dontaudit: 2449
Type_trans: 637 Type_change: 12
Type_member: 8 Role allow: 9
Role_trans: 0 Range_trans: 13
Constraints: 161 Validatetrans: 0
Initial SIDs: 27 Fs_use: 26
Genfscon: 84 Portcon: 468
Netifcon: 0 Nodecon: 0
Permissives: 0 Polcap: 2
core@coreos_production_qemu-899-7-0 ~ $ sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: mcs
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
Policy deny_unknown status: denied
Max kernel policy version: 30
</pre>
<pre>
docker run -d --net=host -v /etc/ceph:/etc/ceph -e MON_IP=127.0.0.1 -e CEPH_NETWORK=127.0.0.1/8 ceph/demo
SECRET=$(docker run --rm -v /etc/ceph:/etc/ceph --entrypoint=/usr/bin/ceph-authtool ceph/demo -p /etc/ceph/ceph.client.admin.keyring)
sudo mount -t ceph -o rw,relatime,name=admin,secret=$SECRET,fsc 127.0.0.1:/ /mnt
sudo chown :core /mnt
sudo chmod g+w /mnt
git -C /mnt clone https://anongit.gentoo.org/git/repo/gentoo.git
</pre>
<p>Here's one instance of git hanging very early.</p>
<pre>
[ 480.774280] INFO: task kworker/15:2:2149 blocked for more than 120 seconds.
[ 480.775594] Not tainted 4.3.3-coreos-r2 #2
[ 480.776594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 480.778356] kworker/15:2 D ffff88003e1f5880 0 2149 2 0x00000080
[ 480.779783] Workqueue: events ceph_send_cap_releases [ceph]
[ 480.780954] ffff880005443d28 0000000000000046 ffff88003d76ba00 ffff8800033d5700
[ 480.782885] ffff880005444000 ffff880039cb7024 ffff8800033d5700 00000000ffffffff
[ 480.784803] ffff880039cb7028 ffff880005443d40 ffffffff9e542793 ffff880039cb7020
[ 480.786772] Call Trace:
[ 480.787511] [<ffffffff9e542793>] schedule+0x33/0x80
[ 480.788528] [<ffffffff9e542a4e>] schedule_preempt_disabled+0xe/0x10
[ 480.789722] [<ffffffff9e544699>] __mutex_lock_slowpath+0xb9/0x130
[ 480.790891] [<ffffffff9e54472f>] mutex_lock+0x1f/0x30
[ 480.791932] [<ffffffffc0626e61>] ceph_send_cap_releases+0x3f1/0x500 [ceph]
[ 480.793192] [<ffffffff9e0882e7>] process_one_work+0x157/0x3e0
[ 480.794313] [<ffffffff9e088d5e>] worker_thread+0x4e/0x470
[ 480.795425] [<ffffffff9e54215e>] ? __schedule+0x37e/0x980
[ 480.796506] [<ffffffff9e088d10>] ? rescuer_thread+0x320/0x320
[ 480.797637] [<ffffffff9e088d10>] ? rescuer_thread+0x320/0x320
[ 480.798743] [<ffffffff9e08e158>] kthread+0xd8/0xf0
[ 480.799733] [<ffffffff9e08e080>] ? kthread_park+0x60/0x60
[ 480.800835] [<ffffffff9e546e5f>] ret_from_fork+0x3f/0x70
[ 480.801922] [<ffffffff9e08e080>] ? kthread_park+0x60/0x60
[ 600.802227] INFO: task git:2148 blocked for more than 120 seconds.
[ 600.803476] Not tainted 4.3.3-coreos-r2 #2
[ 600.804466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 600.806213] git D ffff88003e235880 0 2148 1160 0x00000084
[ 600.807655] ffff880004c73b38 0000000000000086 ffff88003df9ba00 ffff880004651d00
[ 600.809571] ffff880004c74000 ffff880036b8c494 ffff880004651d00 00000000ffffffff
[ 600.811479] ffff880036b8c498 ffff880004c73b50 ffffffff9e542793 ffff880036b8c490
[ 600.813385] Call Trace:
[ 600.814102] [<ffffffff9e542793>] schedule+0x33/0x80
[ 600.815134] [<ffffffff9e542a4e>] schedule_preempt_disabled+0xe/0x10
[ 600.816337] [<ffffffff9e544699>] __mutex_lock_slowpath+0xb9/0x130
[ 600.817488] [<ffffffff9e54472f>] mutex_lock+0x1f/0x30
[ 600.818514] [<ffffffffc06274a4>] ceph_mdsc_do_request+0x1d4/0x270 [ceph]
[ 600.819736] [<ffffffffc060c90c>] ceph_init_dentry+0xec/0x2b0 [ceph]
[ 600.820912] [<ffffffff9e1d200d>] lookup_real+0x1d/0x60
[ 600.821946] [<ffffffff9e1d2392>] __lookup_hash+0x42/0x60
[ 600.823005] [<ffffffff9e1d3c6b>] walk_component+0x1eb/0x4a0
[ 600.824092] [<ffffffff9e1d4a2d>] path_lookupat+0x5d/0x110
[ 600.825163] [<ffffffff9e1d8091>] filename_lookup+0xb1/0x180
[ 600.826237] [<ffffffff9e0b9231>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[ 600.827928] [<ffffffff9e1adbd9>] ? kmem_cache_alloc+0x179/0x1f0
[ 600.829043] [<ffffffff9e1d7c96>] ? getname_flags+0x56/0x1f0
[ 600.830115] [<ffffffff9e1d8236>] user_path_at_empty+0x36/0x40
[ 600.831206] [<ffffffff9e1cd2a6>] vfs_fstatat+0x66/0xc0
[ 600.832232] [<ffffffff9e1df8f9>] ? d_delete+0x99/0x170
[ 600.833289] [<ffffffff9e1cd861>] SYSC_newlstat+0x31/0x60
[ 600.834328] [<ffffffff9e003176>] ? do_audit_syscall_entry+0x66/0x70
[ 600.835493] [<ffffffff9e0037ff>] ? syscall_trace_enter_phase1+0x15f/0x170
[ 600.836695] [<ffffffff9e003bbc>] ? syscall_return_slowpath+0x4c/0x120
[ 600.837864] [<ffffffff9e1cdafe>] SyS_newlstat+0xe/0x10
[ 600.838870] [<ffffffff9e546aee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 600.840016] INFO: task kworker/15:2:2149 blocked for more than 120 seconds.
[ 600.841223] Not tainted 4.3.3-coreos-r2 #2
[ 600.842188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 600.843830] kworker/15:2 D ffff88003e1f5880 0 2149 2 0x00000080
[ 600.845195] Workqueue: events ceph_send_cap_releases [ceph]
[ 600.846294] ffff880005443d28 0000000000000046 ffff88003d76ba00 ffff8800033d5700
[ 600.848119] ffff880005444000 ffff880039cb7024 ffff8800033d5700 00000000ffffffff
[ 600.849940] ffff880039cb7028 ffff880005443d40 ffffffff9e542793 ffff880039cb7020
[ 600.851765] Call Trace:
[ 600.852435] [<ffffffff9e542793>] schedule+0x33/0x80
[ 600.853407] [<ffffffff9e542a4e>] schedule_preempt_disabled+0xe/0x10
[ 600.854539] [<ffffffff9e544699>] __mutex_lock_slowpath+0xb9/0x130
[ 600.855660] [<ffffffff9e54472f>] mutex_lock+0x1f/0x30
[ 600.856659] [<ffffffffc0626e61>] ceph_send_cap_releases+0x3f1/0x500 [ceph]
[ 600.859031] [<ffffffff9e0882e7>] process_one_work+0x157/0x3e0
[ 600.860148] [<ffffffff9e088d5e>] worker_thread+0x4e/0x470
[ 600.861212] [<ffffffff9e54215e>] ? __schedule+0x37e/0x980
[ 600.862267] [<ffffffff9e088d10>] ? rescuer_thread+0x320/0x320
[ 600.863365] [<ffffffff9e088d10>] ? rescuer_thread+0x320/0x320
[ 600.864469] [<ffffffff9e08e158>] kthread+0xd8/0xf0
[ 600.865470] [<ffffffff9e08e080>] ? kthread_park+0x60/0x60
[ 600.866519] [<ffffffff9e546e5f>] ret_from_fork+0x3f/0x70
[ 600.867596] [<ffffffff9e08e080>] ? kthread_park+0x60/0x60
</pre>
<p>If I can provide more information or otherwise help move this along, let me know.</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=66235">2016-02-24T01:50:27Z</a></p>
<p>Thank you for your reproducer.</p>
<p>I already know the root cause, but fixing it requires refactoring the internal locking of the cephfs kernel driver; it's not simple.</p>
<hr /><p><strong>Updated by Manuel Alejandro de Brito Fontes</strong> (aledbf@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=68060">2016-03-23T14:31:14Z</a></p>
<p>Any update on this?<br />Thanks</p>
<hr /><p><strong>Updated by Greg Farnum</strong> (gfarnum@redhat.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=68182">2016-03-25T00:50:31Z</a></p>
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>15</i></li></ul><p>Userspace side of this fix just merged: <a class="external" href="https://github.com/ceph/ceph/pull/7952">https://github.com/ceph/ceph/pull/7952</a></p>
<p>I presume the kernel patches are in our testing branch and waiting to go upstream. :)</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=68490">2016-03-31T13:33:01Z</a></p>
<p>The bug should be fixed in the 4.6 kernel.</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=69319">2016-04-18T07:48:46Z</a></p>
<ul><li><strong>Status</strong> changed from <i>15</i> to <i>Resolved</i></li></ul><p>Upstream commit 315f24088048a51eed341c53be66ea477a3c7d16</p>
<hr /><p><strong>Updated by Matthew Garrett</strong> (mjg59@coreos.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=69713">2016-04-28T19:44:36Z</a></p>
<p>I've pulled the 4.6 ceph code back to 4.5, but any attempt to read or write to a file is now giving me EIO. Is there another dependency in 4.6 that's required here? Does userland also have to be updated for this to work?</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=69761">2016-04-29T12:00:58Z</a></p>
<p>Matthew Garrett wrote:</p>
<blockquote>
<p>I've pulled the 4.6 ceph code back to 4.5, but any attempt to read or write to a file is now giving me EIO. Is there another dependency in 4.6 that's required here? Does userland also have to be updated for this to work?</p>
</blockquote>
<p>Are there kernel messages like "sync getxattr xxxxx during filling trace"? If there are, you need to upgrade the MDS to Jewel.</p>
<hr /><p><strong>Updated by Matthew Garrett</strong> (mjg59@coreos.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=69795">2016-04-29T20:44:29Z</a></p>
<p>I'm not seeing those warnings. strace shows read() and write() immediately returning EIO. I'll see if I can try a 4.6-rc to verify whether this is due to a missing patch dependency.</p>
<hr /><p><strong>Updated by Matthew Garrett</strong> (mjg59@coreos.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=70865">2016-05-17T01:52:09Z</a></p>
<p>We're seeing exactly the same issue on 4.6: all writes are giving EIO, but no errors are being printed.</p>
<hr /><p><strong>Updated by Greg Farnum</strong> (gfarnum@redhat.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=71075">2016-05-19T22:13:39Z</a></p>
<p>If you've got an issue with vanilla 4.6 and a proper userspace release, please open a new bug with the full details; it's not necessarily related to this ticket. :)</p>
<hr /><p><strong>Updated by Zheng Yan</strong> (ukernel@gmail.com), <a href="https://tracker.ceph.com/issues/14360?journal_id=71126">2016-05-20T14:12:23Z</a></p>
<p>Matthew Garrett wrote:</p>
<blockquote>
<p>We're seeing exactly the same issue on 4.6 - all writes are giving EIO, but no errors are being printed.</p>
</blockquote>
<p>Which distribution are you using? The selinux policy in Fedora does not support ceph. I tried coreos, but my kernel failed to boot.</p> Linux kernel client - Bug #14360: Cephfs kernel client ceph_send_cap_releases hung taskhttps://tracker.ceph.com/issues/14360?journal_id=712622016-05-23T07:42:58ZZheng Yanukernel@gmail.com
<p>I tried CoreOS 1053.2.0; no problem found.</p>