Project

General

Profile

Bug #14360

Cephfs kernel client ceph_send_cap_releases hung task

Added by Stephan Wienczny over 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
01/13/2016
Due date:
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
kcephfs

Description

On some of my machines cephfs hanging:

[ 2646.383076] kworker/0:1     D ffff88107fc15880     0  1998      2 0x00000080
[ 2646.390428] Workqueue: events ceph_send_cap_releases [ceph]
[ 2646.396249]  ffff880fc88c3d28 0000000000000046 ffff8810388dd700 ffff881031dbd700
[ 2646.406817]  ffff880fc88c4000 ffff8810369d9024 ffff881031dbd700 00000000ffffffff
[ 2646.414760]  ffff8810369d9028 ffff880fc88c3d40 ffffffffa4542793 ffff8810369d9020
[ 2646.422701] Call Trace:
[ 2646.425322]  [<ffffffffa4542793>] schedule+0x33/0x80
[ 2646.430476]  [<ffffffffa4542a4e>] schedule_preempt_disabled+0xe/0x10
[ 2646.437035]  [<ffffffffa4544699>] __mutex_lock_slowpath+0xb9/0x130
[ 2646.443389]  [<ffffffffa454472f>] mutex_lock+0x1f/0x30
[ 2646.448701]  [<ffffffffc06a9e61>] ceph_send_cap_releases+0x3f1/0x500 [ceph]
[ 2646.455840]  [<ffffffffa40882e7>] process_one_work+0x157/0x3e0
[ 2646.461840]  [<ffffffffa4088d5e>] worker_thread+0x4e/0x470
[ 2646.467502]  [<ffffffffa454215e>] ? __schedule+0x37e/0x980
[ 2646.473169]  [<ffffffffa4088d10>] ? rescuer_thread+0x320/0x320
[ 2646.479181]  [<ffffffffa4088d10>] ? rescuer_thread+0x320/0x320
[ 2646.485187]  [<ffffffffa408e158>] kthread+0xd8/0xf0
[ 2646.490240]  [<ffffffffa408e080>] ? kthread_park+0x60/0x60
[ 2646.495902]  [<ffffffffa4546e5f>] ret_from_fork+0x3f/0x70
[ 2646.501471]  [<ffffffffa408e080>] ? kthread_park+0x60/0x60
[ 3428.404720] perf interrupt took too long (2508 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
filename:       /lib/modules/4.3.3-coreos-r1/kernel/fs/ceph/ceph.ko
license:        GPL
description:    Ceph filesystem for Linux
author:         Patience Warnick <patience@newdream.net>
author:         Yehuda Sadeh <yehuda@hq.newdream.net>
author:         Sage Weil <sage@newdream.net>
alias:          fs-ceph
depends:        libceph,fscache
intree:         Y
vermagic:       4.3.3-coreos-r1 SMP mod_unload

kernel.state (359 KB) Stephan Wienczny, 01/18/2016 11:23 AM

Associated revisions

Revision f4bd1fc9 (diff)
Added by Yan, Zheng over 3 years ago

mds: allow client to request caps when opening file

Allow client to request xattrs while opening file. MDS adds xattrs
and corresponding caps to the reply. This is important for selinux
on kcephfs.

Fixes: #14360
Signed-off-by: Yan, Zheng <>

History

#1 Updated by Stephan Wienczny over 3 years ago

In 4.4.0 this looks a like bit different:

[ 2281.989202] INFO: task kworker/0:0:1882 blocked for more than 120 seconds.
[ 2281.996208]       Not tainted 4.4.0-coreos #2
[ 2282.000684] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2282.008744] kworker/0:0     D ffff8808566bbd28     0  1882      2 0x00000080
[ 2282.015973] Workqueue: events ceph_send_cap_releases [ceph]
[ 2282.021687]  ffff8808566bbd28 ffff88085a3a9cc0 ffff88085a3a9cc0 ffff8808553ad640
[ 2282.029438]  ffff8808566bc000 ffff880858075024 ffff8808553ad640 00000000ffffffff
[ 2282.037183]  ffff880858075028 ffff8808566bbd40 ffffffff8e54fb95 ffff880858075020
[ 2282.044913] Call Trace:
[ 2282.047492]  [<ffffffff8e54fb95>] schedule+0x35/0x80
[ 2282.052612]  [<ffffffff8e54fe3e>] schedule_preempt_disabled+0xe/0x10
[ 2282.059146]  [<ffffffff8e551ae9>] __mutex_lock_slowpath+0xb9/0x130
[ 2282.065489]  [<ffffffff8e551b7f>] mutex_lock+0x1f/0x30
[ 2282.070793]  [<ffffffffc04a81c1>] ceph_send_cap_releases+0x3f1/0x500 [ceph]
[ 2282.077919]  [<ffffffff8e08b687>] process_one_work+0x157/0x3e0
[ 2282.083914]  [<ffffffff8e08c0fe>] worker_thread+0x4e/0x470
[ 2282.089529]  [<ffffffff8e08c0b0>] ? rescuer_thread+0x320/0x320
[ 2282.095528]  [<ffffffff8e091478>] kthread+0xd8/0xf0
[ 2282.100579]  [<ffffffff8e0913a0>] ? kthread_park+0x60/0x60
[ 2282.106193]  [<ffffffff8e55428f>] ret_from_fork+0x3f/0x70
[ 2282.111723]  [<ffffffff8e0913a0>] ? kthread_park+0x60/0x60
[ 2282.117362] INFO: task kworker/0:3:4463 blocked for more than 120 seconds.
[ 2282.124364]       Not tainted 4.4.0-coreos #2
[ 2282.128867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2282.136961] kworker/0:3     D ffff8807b2d0b9e8     0  4463      2 0x00000080
[ 2282.144228] Workqueue: ceph-msgr ceph_msg_new [libceph]
[ 2282.149617]  ffff8807b2d0b9e8 ffffffff8e1612b3 ffffffff8ea10540 ffff8807b677d640
[ 2282.157366]  ffff8807b2d0c000 ffff880858075024 ffff8807b677d640 00000000ffffffff
[ 2282.165096]  ffff880858075028 ffff8807b2d0ba00 ffffffff8e54fb95 ffff880858075020
[ 2282.172835] Call Trace:
[ 2282.175411]  [<ffffffff8e1612b3>] ? free_pages+0x13/0x20
[ 2282.180871]  [<ffffffff8e54fb95>] schedule+0x35/0x80
[ 2282.186009]  [<ffffffff8e54fe3e>] schedule_preempt_disabled+0xe/0x10
[ 2282.192486]  [<ffffffff8e551ae9>] __mutex_lock_slowpath+0xb9/0x130
[ 2282.198820]  [<ffffffff8e551b7f>] mutex_lock+0x1f/0x30
[ 2282.204091]  [<ffffffffc049b968>] ceph_check_caps+0x448/0x18f0 [ceph]
[ 2282.210696]  [<ffffffff8e17e1fd>] ? kzfree+0x2d/0x40
[ 2282.215838]  [<ffffffff8e28b6b9>] ? crypto_destroy_tfm+0x49/0x80
[ 2282.221996]  [<ffffffffc0352076>] ? ceph_auth_none_init+0xd56/0xe60 [libceph]
[ 2282.229305]  [<ffffffffc049df94>] ceph_put_wrbuffer_cap_refs+0x114/0x2c0 [ceph]
[ 2282.236856]  [<ffffffff8e1676fa>] ? test_clear_page_writeback+0xfa/0x1e0
[ 2282.243729]  [<ffffffffc049342d>] ceph_encode_locks_to_buffer+0x134d/0x3600 [ceph]
[ 2282.251589]  [<ffffffffc0349d1d>] ceph_osdc_handle_map+0xc5d/0x1120 [libceph]
[ 2282.258903]  [<ffffffffc0340657>] ceph_msg_new+0x1a97/0x3210 [libceph]
[ 2282.265592]  [<ffffffff8e015702>] ? __switch_to+0x252/0x5a0
[ 2282.271296]  [<ffffffff8e08b687>] process_one_work+0x157/0x3e0
[ 2282.277302]  [<ffffffff8e08c0fe>] worker_thread+0x4e/0x470
[ 2282.282987]  [<ffffffff8e08c0b0>] ? rescuer_thread+0x320/0x320
[ 2282.288994]  [<ffffffff8e08c0b0>] ? rescuer_thread+0x320/0x320
[ 2282.294984]  [<ffffffff8e091478>] kthread+0xd8/0xf0
[ 2282.300021]  [<ffffffff8e0913a0>] ? kthread_park+0x60/0x60
[ 2282.305673]  [<ffffffff8e55428f>] ret_from_fork+0x3f/0x70
[ 2282.311200]  [<ffffffff8e0913a0>] ? kthread_park+0x60/0x60

There are two alternating dumps. Both are in ceph_check_caps calling mutex_lock.

filename:       /lib/modules/4.4.0-coreos/kernel/fs/ceph/ceph.ko
license:        GPL
description:    Ceph filesystem for Linux
author:         Patience Warnick <patience@newdream.net>
author:         Yehuda Sadeh <yehuda@hq.newdream.net>
author:         Sage Weil <sage@newdream.net>
alias:          fs-ceph
depends:        libceph,fscache
intree:         Y
vermagic:       4.4.0-coreos SMP mod_unload

#2 Updated by Zheng Yan over 3 years ago

  • Assignee set to Zheng Yan

ceph_send_cap_releases() never calls mutex_lock(). ceph_encode_locks_to_buffer() never calls ceph_put_wrbuffer_cap_refs() neither. The backtraces are inaccurate. Please compile kernel with frame pointers (the option is at "Kernel hacking" -> "Compile-time checks and compiler options" -> "Compile the kernel with frame pointers")

when this hung happen again, please run command "echo 1 > /proc/sys/kernel/sysrq; echo t > /proc/sysrq-trigger", then upload kernel message

Besides, what workload cause this hung?

#3 Updated by Stephan Wienczny over 3 years ago

I checked the kernel build config and it already is compiled with "Compile the kernel with frame pointers".
The dumped kernel state is attached.
We are using cephfs to store our websites. The only processes acessing the filesystem are Apache2 and PHP. These processes are inside docker containers which have volume mounts into cephfs.

The problem was introduced between Coreos 899.1.0 and 926.0.0

#4 Updated by Zheng Yan over 3 years ago

Stephan Wienczny wrote:

The problem was introduced between Coreos 899.1.0 and 926.0.0

do you mean there is no problem when using 4.2 kernel?

#5 Updated by Zheng Yan over 3 years ago

this problem seems related to memory allocation. could you try enlarging memory for these container.

#6 Updated by Stephan Wienczny over 3 years ago

I can't say anything about 4.2. The first version we tested was 4.3.x. Have there been changes in memory mangement between those two versions in ceph?

You are talking about the memory limit to the docker containers? Is the cgroup memory limit accounted to io requests?

#7 Updated by Zheng Yan over 3 years ago

Stephan Wienczny wrote:

I can't say anything about 4.2. The first version we tested was 4.3.x. Have there been changes in memory mangement between those two versions in ceph?

no change

You are talking about the memory limit to the docker containers? Is the cgroup memory limit accounted to io requests?

yes, I'm not familiar with cgroup, But I think it accounts memory used by IO requests. Could you try running same application outside container.

#8 Updated by Stephan Wienczny over 3 years ago

Zheng Yan wrote:

yes, I'm not familiar with cgroup, But I think it accounts memory used by IO requests. Could you try running same application outside container.

I can try to run them without cgroup limits first.

#9 Updated by Stephan Wienczny over 3 years ago

The blocked task occour even without cgroup limits. I did a test with 10M limit while reading large files. There are cgroup ooms but no lockups.
In the end I triggered the problem by doing an ls in the top directory.

Do you test cephfs against lots of concurrent requests?

There are some more related traces within functions really calling mutex_lock:

Jan 25 03:40:29 app001 kernel: Workqueue: ceph-msgr ceph_msg_new [libceph]
Jan 25 03:40:29 app001 kernel:  ffff880855c33b68 ffffffffa64a15e3 ffff88085bf93980 ffff880028273980
Jan 25 03:40:29 app001 kernel:  ffff880855c34000 ffff88002802840c ffff880028273980 00000000ffffffff
Jan 25 03:40:29 app001 kernel:  ffff880028028410 ffff880855c33b80 ffffffffa654fb95 ffff880028028408
Jan 25 03:40:29 app001 kernel: Call Trace:
Jan 25 03:40:29 app001 kernel:  [<ffffffffa64a15e3>] ? tcp_recvmsg+0x373/0xb50
Jan 25 03:40:29 app001 kernel:  [<ffffffffa654fb95>] schedule+0x35/0x80
Jan 25 03:40:29 app001 kernel:  [<ffffffffa654fe3e>] schedule_preempt_disabled+0xe/0x10
Jan 25 03:40:29 app001 kernel:  [<ffffffffa6551ae9>] __mutex_lock_slowpath+0xb9/0x130
Jan 25 03:40:29 app001 kernel:  [<ffffffffa6551b7f>] mutex_lock+0x1f/0x30
Jan 25 03:40:29 app001 kernel:  [<ffffffffc04cbbda>] ceph_mdsc_handle_map+0xda/0x1dd0 [ceph]
Jan 25 03:40:29 app001 kernel:  [<ffffffffa6437131>] ? kernel_recvmsg+0x61/0x80
Jan 25 03:40:29 app001 kernel:  [<ffffffffc04a876d>] 0xffffffffc04a876d
Jan 25 03:40:29 app001 kernel:  [<ffffffffc0376570>] ceph_monmap_decode+0x5c0/0x9f0 [libceph]
Jan 25 03:40:29 app001 kernel:  [<ffffffffc036fc91>] ? ceph_con_keepalive+0xe1/0x630 [libceph]
Jan 25 03:40:29 app001 kernel:  [<ffffffffc0372657>] ceph_msg_new+0x1a97/0x3210 [libceph]
Jan 25 03:40:29 app001 kernel:  [<ffffffffa6015702>] ? __switch_to+0x252/0x5a0
Jan 25 03:40:29 app001 kernel:  [<ffffffffa608b687>] process_one_work+0x157/0x3e0
Jan 25 03:40:29 app001 kernel:  [<ffffffffa608c0fe>] worker_thread+0x4e/0x470
Jan 25 03:40:29 app001 kernel:  [<ffffffffa608c0b0>] ? rescuer_thread+0x320/0x320
Jan 25 03:40:29 app001 kernel:  [<ffffffffa608c0b0>] ? rescuer_thread+0x320/0x320
Jan 25 03:40:29 app001 kernel:  [<ffffffffa6091478>] kthread+0xd8/0xf0
Jan 25 03:40:29 app001 kernel:  [<ffffffffa60913a0>] ? kthread_park+0x60/0x60
Jan 25 03:40:29 app001 kernel:  [<ffffffffa655428f>] ret_from_fork+0x3f/0x70
Jan 25 03:40:29 app001 kernel:  [<ffffffffa60913a0>] ? kthread_park+0x60/0x60

Jan 25 03:40:28 app001 kernel: rm              D ffff880857f43d48     0 13524  12931 0x00000084
Jan 25 03:40:29 app001 kernel:  ffff880857f43d48 ffff880857f43d58 ffffffffa6a10540 ffff88085395b980
Jan 25 03:40:29 app001 kernel:  ffff880857f44000 ffff88085a338894 ffff88085395b980 00000000ffffffff
Jan 25 03:40:29 app001 kernel:  ffff88085a338898 ffff880857f43d60 ffffffffa654fb95 ffff88085a338890
Jan 25 03:40:29 app001 kernel: Call Trace:
Jan 25 03:40:29 app001 kernel:  [<ffffffffa654fb95>] schedule+0x35/0x80
Jan 25 03:40:29 app001 kernel:  [<ffffffffa654fe3e>] schedule_preempt_disabled+0xe/0x10
Jan 25 03:40:29 app001 kernel:  [<ffffffffa6551ae9>] __mutex_lock_slowpath+0xb9/0x130
Jan 25 03:40:29 app001 kernel:  [<ffffffffa609d910>] ? wake_up_q+0x70/0x70
Jan 25 03:40:29 app001 kernel:  [<ffffffffa6551b7f>] mutex_lock+0x1f/0x30
Jan 25 03:40:29 app001 kernel:  [<ffffffffc04ca824>] ceph_mdsc_do_request+0x1d4/0x270 [ceph]
Jan 25 03:40:29 app001 kernel:  [<ffffffffc04af95c>] ceph_init_dentry+0xec/0x2b0 [ceph]
Jan 25 03:40:29 app001 kernel:  [<ffffffffa61d8cbd>] lookup_real+0x1d/0x60
Jan 25 03:40:29 app001 kernel:  [<ffffffffa61d9042>] __lookup_hash+0x42/0x60
Jan 25 03:40:29 app001 kernel:  [<ffffffffa61df64a>] do_unlinkat+0x1ea/0x2a0
Jan 25 03:40:29 app001 kernel:  [<ffffffffa61e00eb>] SyS_unlinkat+0x1b/0x30
Jan 25 03:40:29 app001 kernel:  [<ffffffffa6553f2e>] entry_SYSCALL_64_fastpath+0x12/0x71

#10 Updated by Zheng Yan over 3 years ago

could you upload the ceph,ko.

#11 Updated by Zheng Yan over 3 years ago

backtrace of your kernel are strange, If possible, please upload the whole kernel package to somewhere.

#12 Updated by Stephan Wienczny over 3 years ago

I've uploaded the kernel with all modules and the /proc/config.gz to https://stephan.wienczny.de/linux.tar.xz

#13 Updated by Chris Armstrong over 3 years ago

Stephan Wienczny wrote:

I've uploaded the kernel with all modules and the /proc/config.gz to https://stephan.wienczny.de/linux.tar.xz

Thanks for helping to debug this, Stephan!

#14 Updated by Stephan Wienczny over 3 years ago

Did you find something? Is there a commit you could point me to ;)

#15 Updated by Greg Farnum over 3 years ago

Zheng is looking at this and said he found it "strange", but I'm not sure if he's figured anything out yet.

#16 Updated by Zheng Yan over 3 years ago

  • Status changed from New to Verified
  • Priority changed from Immediate to Normal

It's caused by selinux. Selinux code calls a getxattr() while we are processing MClientReply.

mount cephfs with context="system_u:object_r:tmp_t:s0" option should solve this issue.

#17 Updated by Greg Farnum over 3 years ago

Zheng, what's the scenario that causes that — just using SELinux on a CephFS mount?

#18 Updated by Zheng Yan over 3 years ago

I encountered this issue only when selinux is enabled and type is mls (Multi Level Security protection). No problem when selinux type is targeted.

#19 Updated by Nick Owens over 3 years ago

i've got a simple reproducer for CoreOS 899.7.0 with the 4.3.3 kernel.

in this instance selinux is in permissive mode with nothing but default coreos configuration.

core@coreos_production_qemu-899-7-0 ~ $ seinfo  

Statistics for policy file: /etc/selinux/mcs/policy/policy.29
Policy Version & Type: v.29 (binary, mls)

   Classes:            92    Permissions:       245
   Sensitivities:       1    Categories:       1024
   Types:            1218    Attributes:        105
   Users:               6    Roles:               6
   Booleans:           55    Cond. Expr.:        47
   Allow:           12687    Neverallow:          0
   Auditallow:          9    Dontaudit:        2449
   Type_trans:        637    Type_change:        12
   Type_member:         8    Role allow:          9
   Role_trans:          0    Range_trans:        13
   Constraints:       161    Validatetrans:       0
   Initial SIDs:       27    Fs_use:             26
   Genfscon:           84    Portcon:           468
   Netifcon:            0    Nodecon:             0
   Permissives:         0    Polcap:              2

core@coreos_production_qemu-899-7-0 ~ $ sestatus 
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             mcs
Current mode:                   permissive
Mode from config file:          permissive
Policy MLS status:              enabled
Policy deny_unknown status:     denied
Max kernel policy version:      30
docker run -d --net=host -v /etc/ceph:/etc/ceph -e MON_IP=127.0.0.1 -e CEPH_NETWORK=127.0.0.1/8 ceph/demo
SECRET=$(docker run --rm -v /etc/ceph:/etc/ceph --entrypoint=/usr/bin/ceph-authtool ceph/demo -p /etc/ceph/ceph.client.admin.keyring)
sudo mount -t ceph -o rw,relatime,name=admin,secret=$SECRET,fsc 127.0.0.1:/ /mnt
sudo chown :core /mnt
sudo chmod g+w /mnt
git -C /mnt clone https://anongit.gentoo.org/git/repo/gentoo.git

here's one instance of git hanging very early.

[  480.774280] INFO: task kworker/15:2:2149 blocked for more than 120 seconds.
[  480.775594]       Not tainted 4.3.3-coreos-r2 #2
[  480.776594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  480.778356] kworker/15:2    D ffff88003e1f5880     0  2149      2 0x00000080
[  480.779783] Workqueue: events ceph_send_cap_releases [ceph]
[  480.780954]  ffff880005443d28 0000000000000046 ffff88003d76ba00 ffff8800033d5700
[  480.782885]  ffff880005444000 ffff880039cb7024 ffff8800033d5700 00000000ffffffff
[  480.784803]  ffff880039cb7028 ffff880005443d40 ffffffff9e542793 ffff880039cb7020
[  480.786772] Call Trace:
[  480.787511]  [<ffffffff9e542793>] schedule+0x33/0x80
[  480.788528]  [<ffffffff9e542a4e>] schedule_preempt_disabled+0xe/0x10
[  480.789722]  [<ffffffff9e544699>] __mutex_lock_slowpath+0xb9/0x130
[  480.790891]  [<ffffffff9e54472f>] mutex_lock+0x1f/0x30
[  480.791932]  [<ffffffffc0626e61>] ceph_send_cap_releases+0x3f1/0x500 [ceph]
[  480.793192]  [<ffffffff9e0882e7>] process_one_work+0x157/0x3e0
[  480.794313]  [<ffffffff9e088d5e>] worker_thread+0x4e/0x470
[  480.795425]  [<ffffffff9e54215e>] ? __schedule+0x37e/0x980
[  480.796506]  [<ffffffff9e088d10>] ? rescuer_thread+0x320/0x320
[  480.797637]  [<ffffffff9e088d10>] ? rescuer_thread+0x320/0x320
[  480.798743]  [<ffffffff9e08e158>] kthread+0xd8/0xf0
[  480.799733]  [<ffffffff9e08e080>] ? kthread_park+0x60/0x60
[  480.800835]  [<ffffffff9e546e5f>] ret_from_fork+0x3f/0x70
[  480.801922]  [<ffffffff9e08e080>] ? kthread_park+0x60/0x60
[  600.802227] INFO: task git:2148 blocked for more than 120 seconds.
[  600.803476]       Not tainted 4.3.3-coreos-r2 #2
[  600.804466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  600.806213] git             D ffff88003e235880     0  2148   1160 0x00000084
[  600.807655]  ffff880004c73b38 0000000000000086 ffff88003df9ba00 ffff880004651d00
[  600.809571]  ffff880004c74000 ffff880036b8c494 ffff880004651d00 00000000ffffffff
[  600.811479]  ffff880036b8c498 ffff880004c73b50 ffffffff9e542793 ffff880036b8c490
[  600.813385] Call Trace:
[  600.814102]  [<ffffffff9e542793>] schedule+0x33/0x80
[  600.815134]  [<ffffffff9e542a4e>] schedule_preempt_disabled+0xe/0x10
[  600.816337]  [<ffffffff9e544699>] __mutex_lock_slowpath+0xb9/0x130
[  600.817488]  [<ffffffff9e54472f>] mutex_lock+0x1f/0x30
[  600.818514]  [<ffffffffc06274a4>] ceph_mdsc_do_request+0x1d4/0x270 [ceph]
[  600.819736]  [<ffffffffc060c90c>] ceph_init_dentry+0xec/0x2b0 [ceph]
[  600.820912]  [<ffffffff9e1d200d>] lookup_real+0x1d/0x60
[  600.821946]  [<ffffffff9e1d2392>] __lookup_hash+0x42/0x60
[  600.823005]  [<ffffffff9e1d3c6b>] walk_component+0x1eb/0x4a0
[  600.824092]  [<ffffffff9e1d4a2d>] path_lookupat+0x5d/0x110
[  600.825163]  [<ffffffff9e1d8091>] filename_lookup+0xb1/0x180
[  600.826237]  [<ffffffff9e0b9231>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[  600.827928]  [<ffffffff9e1adbd9>] ? kmem_cache_alloc+0x179/0x1f0
[  600.829043]  [<ffffffff9e1d7c96>] ? getname_flags+0x56/0x1f0
[  600.830115]  [<ffffffff9e1d8236>] user_path_at_empty+0x36/0x40
[  600.831206]  [<ffffffff9e1cd2a6>] vfs_fstatat+0x66/0xc0
[  600.832232]  [<ffffffff9e1df8f9>] ? d_delete+0x99/0x170
[  600.833289]  [<ffffffff9e1cd861>] SYSC_newlstat+0x31/0x60
[  600.834328]  [<ffffffff9e003176>] ? do_audit_syscall_entry+0x66/0x70
[  600.835493]  [<ffffffff9e0037ff>] ? syscall_trace_enter_phase1+0x15f/0x170
[  600.836695]  [<ffffffff9e003bbc>] ? syscall_return_slowpath+0x4c/0x120
[  600.837864]  [<ffffffff9e1cdafe>] SyS_newlstat+0xe/0x10
[  600.838870]  [<ffffffff9e546aee>] entry_SYSCALL_64_fastpath+0x12/0x71
[  600.840016] INFO: task kworker/15:2:2149 blocked for more than 120 seconds.
[  600.841223]       Not tainted 4.3.3-coreos-r2 #2
[  600.842188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  600.843830] kworker/15:2    D ffff88003e1f5880     0  2149      2 0x00000080
[  600.845195] Workqueue: events ceph_send_cap_releases [ceph]
[  600.846294]  ffff880005443d28 0000000000000046 ffff88003d76ba00 ffff8800033d5700
[  600.848119]  ffff880005444000 ffff880039cb7024 ffff8800033d5700 00000000ffffffff
[  600.849940]  ffff880039cb7028 ffff880005443d40 ffffffff9e542793 ffff880039cb7020
[  600.851765] Call Trace:
[  600.852435]  [<ffffffff9e542793>] schedule+0x33/0x80
[  600.853407]  [<ffffffff9e542a4e>] schedule_preempt_disabled+0xe/0x10
[  600.854539]  [<ffffffff9e544699>] __mutex_lock_slowpath+0xb9/0x130
[  600.855660]  [<ffffffff9e54472f>] mutex_lock+0x1f/0x30
[  600.856659]  [<ffffffffc0626e61>] ceph_send_cap_releases+0x3f1/0x500 [ceph]
[  600.859031]  [<ffffffff9e0882e7>] process_one_work+0x157/0x3e0
[  600.860148]  [<ffffffff9e088d5e>] worker_thread+0x4e/0x470
[  600.861212]  [<ffffffff9e54215e>] ? __schedule+0x37e/0x980
[  600.862267]  [<ffffffff9e088d10>] ? rescuer_thread+0x320/0x320
[  600.863365]  [<ffffffff9e088d10>] ? rescuer_thread+0x320/0x320
[  600.864469]  [<ffffffff9e08e158>] kthread+0xd8/0xf0
[  600.865470]  [<ffffffff9e08e080>] ? kthread_park+0x60/0x60
[  600.866519]  [<ffffffff9e546e5f>] ret_from_fork+0x3f/0x70
[  600.867596]  [<ffffffff9e08e080>] ? kthread_park+0x60/0x60

if i can provide more information or otherwise help move this along, let me know.

#20 Updated by Zheng Yan over 3 years ago

thank you for your reproducer.

I have already known the root cause, but fixing it requires refactor internal locking of cephfs kernel drivers, it's not simple.

#21 Updated by Manuel Alejandro de Brito Fontes over 3 years ago

Any update on this?
Thanks

#22 Updated by Greg Farnum over 3 years ago

  • Status changed from Verified to Pending Upstream

Userspace side of this fix just merged: https://github.com/ceph/ceph/pull/7952

I presume the kernel patches are in our testing branch and waiting to go upstream. :)

#23 Updated by Zheng Yan about 3 years ago

The bug should be fixed in 4.6 kernel

#24 Updated by Zheng Yan about 3 years ago

  • Status changed from Pending Upstream to Resolved

upstream commit 315f24088048a51eed341c53be66ea477a3c7d16

#25 Updated by Matthew Garrett about 3 years ago

I've pulled the 4.6 ceph code back to 4.5, but any attempt to read or write to a file is now giving me EIO. Is there another dependency in 4.6 that's required here? Does userland also have to be updated for this to work?

#26 Updated by Zheng Yan about 3 years ago

Matthew Garrett wrote:

I've pulled the 4.6 ceph code back to 4.5, but any attempt to read or write to a file is now giving me EIO. Is there another dependency in 4.6 that's required here? Does userland also have to be updated for this to work?

are there kernel message "sync getxattr xxxxx during filling trace"?, if there are, you need to upgrade mds to jewel.

#27 Updated by Matthew Garrett about 3 years ago

I'm not seeing those warnings. strace shows read() and write() immediately returning EIO. I'll see if I can try a 4.6rc to verify whether this is due to a missing patch dependency.

#28 Updated by Matthew Garrett about 3 years ago

We're seeing exactly the same issue on 4.6 - all writes are giving EIO, but no errors are being printed.

#29 Updated by Greg Farnum about 3 years ago

If you've got an issue with vanilaa 4.6 and a proper userspace release, please open a new bug with the full details; it's not necessarily related to this ticket. :)

#30 Updated by Zheng Yan about 3 years ago

Matthew Garrett wrote:

We're seeing exactly the same issue on 4.6 - all writes are giving EIO, but no errors are being printed.

Which distribution are you using? The selinux policy in Fedora does not support ceph. I tried coreos, but my kernel failed to boot.

#31 Updated by Zheng Yan about 3 years ago

I tried coreos 1053.2.0, no problem found.

Also available in: Atom PDF