Bug #18798 (Closed)
FS activity hung, MDS reports client "failing to respond to capability release"
Description
I've had two occurrences in the past 3 weeks where filesystem activity hangs, with the MDS reporting a client "failing to respond to capability release". I'm using the kernel cephfs client. Ceph is version 10.2.5 and the kernel is version 4.9.0. I've since downgraded the kernel to 4.7.5, as I was running that kernel previously and did not have any issues (which could be just luck, or the bug was introduced somewhere between 4.7.5 and 4.9.0).
More details below:
[root@kb-ceph03 ~]# ceph status
    cluster 3bd52056-9ed8-4ff6-9359-6a3ea437d1e9
     health HEALTH_WARN
            mds0: Client kb-ceph03.knc.local: failing to respond to capability release
     monmap e2: 3 mons at {kb-ceph01=10.100.212.2:6789/0,kb-ceph02=10.100.212.3:6789/0,kb-ceph03=10.100.212.4:6789/0}
            election epoch 362, quorum 0,1,2 kb-ceph01,kb-ceph02,kb-ceph03
      fsmap e215: 1/1/1 up {0=kb-ceph03=up:active}, 2 up:standby
     osdmap e1293: 24 osds: 24 up, 24 in
            flags sortbitwise,require_jewel_osds
      pgmap v2612120: 1296 pgs, 4 pools, 5782 GB data, 1446 kobjects
            15314 GB used, 115 TB / 130 TB avail
                1296 active+clean
  client io 8127 kB/s wr, 0 op/s rd, 1 op/s wr

[root@kb-ceph03 ~]# ceph mds stat
e215: 1/1/1 up {0=kb-ceph03=up:active}, 2 up:standby

[root@kb-ceph03 ~]# ceph daemon mds.kb-ceph03 dump_ops_in_flight
{
    "ops": [
        {
            "description": "client_request(client.48718333:240331824 getattr pAsLsXsFs #1000000498c 2017-02-01 14:03:38.913579)",
            "initiated_at": "2017-02-01 14:03:38.914845",
            "age": 1446.955546,
            "duration": 1446.955708,
            "type_data": [
                "failed to rdlock, waiting",
                "client.48718333:240331824",
                "client_request",
                {
                    "client": "client.48718333",
                    "tid": 240331824
                },
                [
                    {
                        "time": "2017-02-01 14:03:38.914845",
                        "event": "initiated"
                    },
                    {
                        "time": "2017-02-01 14:03:38.914958",
                        "event": "failed to rdlock, waiting"
                    }
                ]
            ]
        },
        {
            "description": "client_request(client.48718333:240322739 getattr pAsLsXsFs #1000000498c 2017-02-01 14:01:27.049810)",
            "initiated_at": "2017-02-01 14:01:27.049678",
            "age": 1578.820714,
            "duration": 1578.820904,
            "type_data": [
                "failed to rdlock, waiting",
                "client.48718333:240322739",
                "client_request",
                {
                    "client": "client.48718333",
                    "tid": 240322739
                },
                [
                    {
                        "time": "2017-02-01 14:01:27.049678",
                        "event": "initiated"
                    },
                    {
                        "time": "2017-02-01 14:01:27.049869",
                        "event": "failed to rdlock, waiting"
                    }
                ]
            ]
        }
    ],
    "num_ops": 2
}

[root@kb-ceph03 ~]# ceph daemon mds.kb-ceph03 session ls
[
    {
        "id": 68523984,
        "num_leases": 1,
        "num_caps": 820,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.68523984 10.100.212.3:0\/2863784256",
        "client_metadata": {
            "entity_id": "admin",
            "hostname": "kb-ceph02.knc.local",
            "kernel_version": "4.9.0-1.el7.elrepo.x86_64",
            "root": "\/"
        }
    },
    {
        "id": 48805081,
        "num_leases": 0,
        "num_caps": 843,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.48805081 10.100.212.4:0\/1660598714",
        "client_metadata": {
            "entity_id": "admin",
            "hostname": "kb-ceph03.knc.local",
            "kernel_version": "4.9.0-1.el7.elrepo.x86_64",
            "root": "\/"
        }
    },
    {
        "id": 48718333,
        "num_leases": 0,
        "num_caps": 840,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 2,
        "reconnecting": false,
        "inst": "client.48718333 10.100.212.2:0\/2586033108",
        "client_metadata": {
            "entity_id": "admin",
            "hostname": "kb-ceph01.knc.local",
            "kernel_version": "4.9.0-1.el7.elrepo.x86_64",
            "root": "\/"
        }
    }
]
Updated by Zheng Yan about 7 years ago
Please run 'ceph daemon mds.kb-ceph03 dump cache /tmp/cachedump.0' and upload /tmp/cachedump.0. Besides, please check if there are any hung OSD requests in /sys/kernel/debug/ceph/xxxx/osdc.
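For reference, the two checks as commands (a sketch: the xxxx component under /sys/kernel/debug/ceph/ is the cluster FSID plus client instance ID, so a glob stands in for the exact name here, and debugfs must be mounted):

    # On the MDS host: dump the MDS cache to a file for upload
    ceph daemon mds.kb-ceph03 dump cache /tmp/cachedump.0

    # On each kernel client: list in-flight OSD requests; entries that
    # never drain from the REQUESTS section indicate a hung OSD request
    for f in /sys/kernel/debug/ceph/*/osdc; do echo "== $f =="; cat "$f"; done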
Updated by Zheng Yan about 7 years ago
probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a732cbf3fc9e835187e8b914f0c61cy
Updated by Darrell Enns about 7 years ago
Does the mds cache dump need to be done while it is hung? It's a production system, so I wasn't able to leave it in a hung state. I'd have to wait and see if it happens again. If a cache dump while operating normally is useful, I can grab one tomorrow.
Is there a typo in that GitHub URL? It's returning 404 for me.
Updated by Darrell Enns about 7 years ago
Cache dump is here:
It was too large to attach to the ticket. The link is good for 14 days. The cluster was operating normally when the dump was taken (no capability release error).
Updated by Darrell Enns about 7 years ago
Zheng Yan wrote:
probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a732cbf3fc9e835187e8b914f0c61cy
Have you got a working URL for this commit? I'd like to figure out which kernel versions (if any) may contain a potential fix.
Updated by elder one about 7 years ago
Hit the same bug today with 4.4.59 kernel client.
2 MDS servers (1 standby) 10.2.6-1 on Ubuntu Trusty.
From mds log:
2017-04-04 17:32:37.705768 7f08c5342700  0 log_channel(cluster) log [WRN] : client.41746305 isn't responding to mclientcaps(revoke), ino 100008eb60c pending pAsLsXsFr issued pAsLsXsFscr, sent 60.141398 seconds ago
2017-04-04 17:33:37.706640 7f08c5342700  0 log_channel(cluster) log [WRN] : client.41746305 isn't responding to mclientcaps(revoke), ino 100008eb60c pending pAsLsXsFr issued pAsLsXsFscr, sent 120.142098 seconds ago
2017-04-04 17:34:57.706859 7f08c5342700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 32.794437 secs
2017-04-04 17:34:57.706872 7f08c5342700  0 log_channel(cluster) log [WRN] : slow request 32.794437 seconds old, received at 2017-04-04 17:34:24.912336: client_request(client.41746305:884324 getattr pAsLsXsFs #100008eb60c 2017-04-04 17:34:24.909674) currently failed to rdlock, waiting
Updated by Darrell Enns about 7 years ago
Very interesting! That means we have:
4.4.59 - bug
4.7.5 - no bug (or at least rare enough that I'm not seeing it)
4.9.0 - bug
Is the 4.4.59 kernel you are using the stock Ubuntu one? I wonder if they backported some ceph code that included the bug.
Updated by Zheng Yan about 7 years ago
Darrell Enns wrote:
Zheng Yan wrote:
probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a732cbf3fc9e835187e8b914f0c61cy
Have you got a working URL for this commit? I'd like to figure out which kernel versions (if any) may contain a potential fix.
Sorry, it's https://github.com/ceph/ceph-client/commit/d641df819db8b80198fd85d9de91137e8a823b07
Updated by Zheng Yan about 7 years ago
elder one wrote:
Hit the same bug today with 4.4.59 kernel client.
2 MDS servers (1 standby) 10.2.6-1 on Ubuntu Trusty. From the mds log:
[...]
Please check if there are any hung OSD requests in /sys/kernel/debug/ceph/xxxx/osdc.
Updated by elder one about 7 years ago
My cephfs clients (Ubuntu Xenial) are running a kernel from the Ubuntu PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.59/
The Ceph cluster (Ubuntu Trusty), including the MDS servers, runs the 4.4.39 kernel, also from the same page.
I don't have this client node any more; I killed it on the private cloud and booted the next client from the same base image.
However, I have all the application logs (Postfix/Dovecot) on a syslog server, and from these I don't see any errors accessing cephfs.
I do monitor the ceph cluster and its disks - there was no high disk activity/iops during the time this error happened.
Updated by Zheng Yan about 7 years ago
elder one wrote:
My cephfs clients (Ubuntu Xenial) are running a kernel from the Ubuntu PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.59/
The Ceph cluster (Ubuntu Trusty), including the MDS servers, runs the 4.4.39 kernel, also from the same page. I don't have this client node any more; I killed it on the private cloud and booted the next client from the same base image.
However, I have all the application logs (Postfix/Dovecot) on a syslog server, and from these I don't see any errors accessing cephfs. I do monitor the ceph cluster and its disks - there was no high disk activity/iops during the time this error happened.
Seems like a different bug. If it happens again, please dump the mds cache (ceph daemon mds.xxx dump cache /tmp/cachedump.0) and check whether there are any hung OSD requests in /sys/kernel/debug/ceph/xxxx/osdc.
Updated by Darrell Enns about 7 years ago
Thanks for the link, Zheng. Looks like that fix is in mainline as of 4.10.2. When I have a chance, I'll try upgrading and see how it goes.
Updated by elder one about 7 years ago
Error happened again:
ceph -s
    cluster a5ac4128-5b84-4c15-b760-2621a011d55d
     health HEALTH_WARN
            mds0: Client mailnode-6-eMail-38-2223:mailserver failing to respond to capability release
     monmap e1: 3 mons at {cph01=192.168.30.71:6789/0,cph02=192.168.30.72:6789/0,cph03=192.168.30.73:6789/0}
            election epoch 1128, quorum 0,1,2 cph01,cph02,cph03
      fsmap e845: 1/1/1 up {0=cph05=up:active}, 1 up:standby-replay
     osdmap e67159: 55 osds: 55 up, 55 in
            flags require_jewel_osds
      pgmap v74062434: 6512 pgs, 13 pools, 6525 GB data, 3051 kobjects
            19903 GB used, 29888 GB / 49791 GB avail
                6512 active+clean
  client io 1760 kB/s rd, 2924 kB/s wr, 33 op/s rd, 337 op/s wr
Can't dump the mds cache during the day, as it locked my fs for a minute last time while dumping/reloading 1.5M files.
Empty output from the /sys/kernel/debug/ceph/xxxx/osdc file on all 3 fs kernel clients.
Checked the other files in the same dir as well: on the node the cluster reports the error for (mailnode-6-eMail-38-2223), there is an extra line, "want next osdmap", in /sys/kernel/debug/ceph/xxxx/monc. The mdsmap and osdmap numbers in this file are the same on all nodes.
Also, /sys/kernel/debug/ceph/xxxx/mdsc on the client that ceph reports the error for is empty, but on the other 2 healthy nodes there's a list with the references reported in the mds log:
779015  mds0  getattr  #10000908ca7
780164  mds0  getattr  #10000908ca7
781191  mds0  getattr  #10000908ca7
781758  mds0  getattr  #10000908ca7
782175  mds0  getattr  #10000908ca7
783760  mds0  getattr  #10000908ca7
786552  mds0  getattr  #10000908ca7
789168  mds0  getattr  #10000908ca7
790370  mds0  getattr  #10000908ca7
Fragment from mds log:
2017-04-10 12:56:06.469300 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : 28 slow requests, 2 included below; oldest blocked for > 4013.082964 secs
2017-04-10 12:56:06.469341 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : slow request 3842.578557 seconds old, received at 2017-04-10 11:52:03.890467: client_request(client.41923726:781758 getattr pAsLsXsFs #10000908ca7 2017-04-10 11:52:03.887023) currently failed to rdlock, waiting
2017-04-10 12:56:06.469676 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : slow request 482.127906 seconds old, received at 2017-04-10 12:48:04.341118: client_request(client.41932292:734242 getattr pAsLsXsFs #10000908ca7 2017-04-10 12:48:04.337616) currently failed to rdlock, waiting
Updated by Zheng Yan about 7 years ago
Can you try updating the cephfs client to a 4.10.x kernel?
Updated by elder one about 7 years ago
Upgraded cephfs clients to 4.10.9 kernel.
Updated by elder one about 7 years ago
Some weird issues arose with my applications (unrelated to cephfs) on the 4.10.x kernel.
The same fix has been in the 4.9.x kernel tree since version 4.9.14.
Downgraded the kernel to 4.9.21.
Let's see how it works...
Updated by elder one about 7 years ago
Got another error with 4.9.21 kernel.
On the cephfs node /sys/kernel/debug/ceph/xxx/:
cat osdc
REQUESTS 0 homeless 0
LINGER REQUESTS

cat mdsc
699592  mds0  getattr  #100009133ce
In the MDS log:
2017-04-11 16:20:49.021727 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : client.42086552 isn't responding to mclientcaps(revoke), ino 100009133ce pending pAsLsXsFr issued pAsLsXsFscr, sent 123.722723 seconds ago
2017-04-11 16:20:59.010267 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 32.989459 secs
2017-04-11 16:20:59.010336 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : slow request 32.989459 seconds old, received at 2017-04-11 16:20:26.020377: client_request(client.42086552:699592 getattr pAsLsXsFs #100009133ce 2017-04-11 16:20:26.013495) currently failed to rdlock, waiting
Seems to me that the Dovecot server on the cephfs node locks things up - when I stopped the Dovecot service, it didn't stop normally:
service dovecot stop

.. syslog ..
systemd[1]: Closed Dovecot IMAP/POP3 email server activation socket.
systemd[1]: Stopping Dovecot IMAP/POP3 email server...
systemd[1]: dovecot.service: State 'stop-sigterm' timed out. Killing.
systemd[1]: Stopped Dovecot IMAP/POP3 email server.
systemd[1]: dovecot.service: Unit entered failed state.
systemd[1]: dovecot.service: Failed with result 'timeout'.
After Dovecot terminated, the Ceph cluster reported OK status.
Updated by elder one about 7 years ago
Also dumped the mds cache (healthy cluster state, 8 hours later).
Searched for the inode from the mds error log:
[inode 100009133ce [2,head] /mail/data/virtual/domain/username/cur/1491916721.M239018P30451.mailnode-19-eMail-38-2241,S=134272,W=136238:2,S auth v26371 s=134272 n(v0 b134272 1=1+0) (iversion lock) caps={42066516=pAsLsXsFscr/-@1,42086552=pAsXs/-@12} | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x55900a71f680]
Updated by Zheng Yan about 7 years ago
elder one wrote:
Got another error with 4.9.21 kernel.
On the cephfs node /sys/kernel/debug/ceph/xxx/:
[...]
[...] In the MDS log:
[...] Seems to me that the Dovecot server on the cephfs node locks things up - when I stopped the Dovecot service, it didn't stop normally:
[...]
After Dovecot terminated, Ceph cluster reported OK status.
It's caused by a different bug. Could you please try adding "mmap_disable = yes" to the dovecot config?
Updated by elder one about 7 years ago
I do have mmap disabled in Dovecot.
Relevant bits from the Dovecot conf:
mmap_disable = yes
mail_nfs_index = yes
mail_nfs_storage = yes
mail_fsync = always
lock_method = fcntl
Updated by Zheng Yan about 7 years ago
Could you try adding commit 2b1ac852 ("ceph: try getting buffer capability for readahead/fadvise") to your 4.9.x kernel? If it still does not work, we'll need kernel debugging to find out what happened (which may not be convenient in your case).
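For example, one possible sequence (a sketch only: the source path is illustrative, and it assumes a source tree and .config matching your running 4.9.x kernel; GitHub serves a commit as a patch when .patch is appended to its URL):

    # Apply the fix from the ceph-client tree to the kernel source
    cd /usr/src/linux-4.9.21
    wget https://github.com/ceph/ceph-client/commit/2b1ac852.patch
    patch -p1 < 2b1ac852.patch

    # Rebuild just the cephfs module and install it for the running kernel
    make modules_prepare
    make M=fs/ceph modules
    cp fs/ceph/ceph.ko /lib/modules/$(uname -r)/kernel/fs/ceph/
    depmod -a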
Updated by elder one about 7 years ago
Well, I managed to patch the ceph kernel module with the 2b1ac852 commit, but my Ubuntu 4.9 kernel will not load unsigned modules...
Error in syslog on cephfs mount:
libceph: loaded (mon/osd proto 15/24)
PKCS#7 signature not signed with a trusted key
ceph: loading out-of-tree module taints kernel.
ceph: module verification failed: signature and/or required key missing - tainting kernel
Updated by Zheng Yan about 7 years ago
elder one wrote:
Well, I managed to patch the ceph kernel module with the 2b1ac852 commit, but my Ubuntu 4.9 kernel will not load unsigned modules...
Error in syslog on cephfs mount:
[...]
Try installing the compiled kernel and modules?
Updated by elder one about 7 years ago
Yeah, did that.
My test server with the compiled 4.9 kernel and patched ceph module mounted cephfs just fine.
It might take a while to run it on production servers.
Any chance this 2b1ac852 patch gets backported to the 4.9 mainline kernel?
Updated by elder one about 7 years ago
One difference I noticed between the 4.4 and 4.9 kernels:
- with the 4.4 kernel, cephfs directory sizes are shown as the total bytes of all files inside the dir, but the 4.9 kernel shows the file count at the next level inside the directory.
Is it a bug or a feature?
Updated by Darrell Enns about 7 years ago
FYI, I don't think the module signature stuff is an issue. It's just notifying you that you've loaded a module that didn't come with your distribution. It should still work just fine.
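If you want to confirm, a quick check (taint letters such as O for out-of-tree and E for unsigned are standard kernel flags):

    # Non-zero just means the kernel is tainted; it keeps working
    cat /proc/sys/kernel/tainted

    # Per-module taint letters show up in parentheses, e.g. (OE)
    grep ceph /proc/modules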
Updated by elder one about 7 years ago
Yeah, you're right Darrell, operator error :) I guess the VM did not have the correct storage network attached when I tried to mount CephFS on the stock kernel with the patched ceph module.
Anyway, running now with a recompiled 4.9.22 kernel and the patched ceph module.
Updated by elder one almost 7 years ago
Just reporting that there have been no errors since using the patched (commit 2b1ac852) cephfs kernel module.
Also upgraded ceph to 10.2.7 two weeks ago.
Could this commit be backported to the 4.9 longterm kernel?
Updated by Zheng Yan almost 7 years ago
elder one wrote:
One difference I noticed between the 4.4 and 4.9 kernels:
- with the 4.4 kernel, cephfs directory sizes are shown as the total bytes of all files inside the dir, but the 4.9 kernel shows the file count at the next level inside the directory. Is it a bug or a feature?
Feature; it's controlled by the rbytes/norbytes mount option.
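For illustration, a mount sketch (the monitor address and auth options are placeholders based on the cluster shown above; the secretfile path is hypothetical):

    # rbytes: a directory's size is the recursive byte total of its contents
    mount -t ceph 192.168.30.71:6789:/ /mnt/cephfs \
        -o name=mailserver,secretfile=/etc/ceph/secret,rbytes

    # norbytes: a directory's size reflects its entry count instead
    # (the behavior seen on 4.9 above)
    mount -t ceph 192.168.30.71:6789:/ /mnt/cephfs \
        -o name=mailserver,secretfile=/etc/ceph/secret,norbytes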
Updated by Zheng Yan almost 7 years ago
elder one wrote:
Just reporting that there have been no errors since using the patched (commit 2b1ac852) cephfs kernel module.
Also upgraded ceph to 10.2.7 two weeks ago.
Could this commit be backported to the 4.9 longterm kernel?
I have sent a request to stable@vger.kernel.org.
Updated by elder one almost 7 years ago
Zheng Yan wrote:
elder one wrote:
One difference I noticed between the 4.4 and 4.9 kernels:
- with the 4.4 kernel, cephfs directory sizes are shown as the total bytes of all files inside the dir, but the 4.9 kernel shows the file count at the next level inside the directory. Is it a bug or a feature?
Feature; it's controlled by the rbytes/norbytes mount option.
Hmm, the manual states that rbytes is ON by default, and my cephfs is not mounted with the norbytes option.
Updated by Zheng Yan almost 7 years ago
- Status changed from New to Resolved
"ceph: try getting buffer capability for readahead/fadvise" has backported into 4.9.x