Bug #18798

closed

FS activity hung, MDS reports client "failing to respond to capability release"

Added by Darrell Enns about 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
cephfs, mds
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I've had two occurrences in the past 3 weeks where filesystem activity hangs, with the MDS reporting a client "failing to respond to capability release". I'm using the kernel cephfs client. Ceph is version 10.2.5 and the kernel is version 4.9.0. I've since downgraded the kernel to 4.7.5, as I was running that kernel previously and did not have any issues (which could be just luck, or the bug was introduced somewhere between 4.7.5 and 4.9.0).

More details below:

[root@kb-ceph03 ~]# ceph status
    cluster 3bd52056-9ed8-4ff6-9359-6a3ea437d1e9
     health HEALTH_WARN
            mds0: Client kb-ceph03.knc.local: failing to respond to capability release
     monmap e2: 3 mons at {kb-ceph01=10.100.212.2:6789/0,kb-ceph02=10.100.212.3:6789/0,kb-ceph03=10.100.212.4:6789/0}
            election epoch 362, quorum 0,1,2 kb-ceph01,kb-ceph02,kb-ceph03
      fsmap e215: 1/1/1 up {0=kb-ceph03=up:active}, 2 up:standby
     osdmap e1293: 24 osds: 24 up, 24 in
            flags sortbitwise,require_jewel_osds
      pgmap v2612120: 1296 pgs, 4 pools, 5782 GB data, 1446 kobjects
            15314 GB used, 115 TB / 130 TB avail
                1296 active+clean
  client io 8127 kB/s wr, 0 op/s rd, 1 op/s wr

[root@kb-ceph03 ~]# ceph mds stat
e215: 1/1/1 up {0=kb-ceph03=up:active}, 2 up:standby

[root@kb-ceph03 ~]# ceph daemon mds.kb-ceph03 dump_ops_in_flight
{
    "ops": [
        {
            "description": "client_request(client.48718333:240331824 getattr pAsLsXsFs #1000000498c 2017-02-01 14:03:38.913579)",
            "initiated_at": "2017-02-01 14:03:38.914845",
            "age": 1446.955546,
            "duration": 1446.955708,
            "type_data": [
                "failed to rdlock, waiting",
                "client.48718333:240331824",
                "client_request",
                {
                    "client": "client.48718333",
                    "tid": 240331824
                },
                [
                    {
                        "time": "2017-02-01 14:03:38.914845",
                        "event": "initiated" 
                    },
                    {
                        "time": "2017-02-01 14:03:38.914958",
                        "event": "failed to rdlock, waiting" 
                    }
                ]
            ]
        },
        {
            "description": "client_request(client.48718333:240322739 getattr pAsLsXsFs #1000000498c 2017-02-01 14:01:27.049810)",
            "initiated_at": "2017-02-01 14:01:27.049678",
            "age": 1578.820714,
            "duration": 1578.820904,
            "type_data": [
                "failed to rdlock, waiting",
                "client.48718333:240322739",
                "client_request",
                {
                    "client": "client.48718333",
                    "tid": 240322739
                },
                [
                    {
                        "time": "2017-02-01 14:01:27.049678",
                        "event": "initiated" 
                    },
                    {
                        "time": "2017-02-01 14:01:27.049869",
                        "event": "failed to rdlock, waiting" 
                    }
                ]
            ]
        }
    ],
    "num_ops": 2
}

[root@kb-ceph03 ~]# ceph daemon mds.kb-ceph03 session ls
[
    {
        "id": 68523984,
        "num_leases": 1,
        "num_caps": 820,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.68523984 10.100.212.3:0\/2863784256",
        "client_metadata": {
            "entity_id": "admin",
            "hostname": "kb-ceph02.knc.local",
            "kernel_version": "4.9.0-1.el7.elrepo.x86_64",
            "root": "\/" 
        }
    },
    {
        "id": 48805081,
        "num_leases": 0,
        "num_caps": 843,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.48805081 10.100.212.4:0\/1660598714",
        "client_metadata": {
            "entity_id": "admin",
            "hostname": "kb-ceph03.knc.local",
            "kernel_version": "4.9.0-1.el7.elrepo.x86_64",
            "root": "\/" 
        }
    },
    {
        "id": 48718333,
        "num_leases": 0,
        "num_caps": 840,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 2,
        "reconnecting": false,
        "inst": "client.48718333 10.100.212.2:0\/2586033108",
        "client_metadata": {
            "entity_id": "admin",
            "hostname": "kb-ceph01.knc.local",
            "kernel_version": "4.9.0-1.el7.elrepo.x86_64",
            "root": "\/" 
        }
    }
]
Actions #1

Updated by Zheng Yan about 7 years ago

Please run 'ceph daemon mds.kb-ceph03 dump cache /tmp/cachedump.0' and upload /tmp/cachedump.0. Also, please check whether there are any hung OSD requests in /sys/kernel/debug/ceph/xxxx/osdc.
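
For reference, a rough sketch of those two checks, reusing the MDS name from the outputs above (adjust the daemon name to your own; each cephfs mount has its own <fsid>.client<id> directory under debugfs):

# on the active MDS host: dump the MDS cache to a file
ceph daemon mds.kb-ceph03 dump cache /tmp/cachedump.0

# on each kernel client: show any in-flight OSD requests for every cephfs mount
for d in /sys/kernel/debug/ceph/*/; do
    echo "== $d"
    cat "$d/osdc"
done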

Actions #3

Updated by Darrell Enns about 7 years ago

Does the mds cache dump need to be done while it is hung? It's a production system, so I wasn't able to leave it in a hung state. I'd have to wait to see if it happens again. If a cache dump while operating normally is useful, I can grab one tomorrow.

Is there a typo in that github url? It's returning 404 for me.

Actions #4

Updated by Darrell Enns about 7 years ago

Cache dump is here:

https://knowledgenetworkbc-my.sharepoint.com/personal/darrelle_knowledge_ca/_layouts/15/guestaccess.aspx?docid=03aaaa62646f84d9b9760d9baeff0dbbc&authkey=AaMPXkH1G1KmDg4NehVtYoE&expiration=2017-02-17T17:27:49.000Z

It was too large to attach to the ticket. The link is good for 14 days. The cluster was operating normally when the dump was taken (no capability release error).

Actions #5

Updated by Darrell Enns about 7 years ago

Zheng Yan wrote:

probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a732cbf3fc9e835187e8b914f0c61cy

Have you got a working URL for this commit? I'd like to figure out which kernel versions (if any) may contain a potential fix.

Actions #6

Updated by elder one about 7 years ago

Hit the same bug today with 4.4.59 kernel client.
2 MDS servers (1 standby) 10.2.6-1 on Ubuntu Trusty.

From mds log:

2017-04-04 17:32:37.705768 7f08c5342700  0 log_channel(cluster) log [WRN] : client.41746305 isn't responding to mclientcaps(revoke), ino 100008eb60c pending pAsLsXsFr issued pAsLsXsFscr, sent 60.141398 seconds ago
2017-04-04 17:33:37.706640 7f08c5342700  0 log_channel(cluster) log [WRN] : client.41746305 isn't responding to mclientcaps(revoke), ino 100008eb60c pending pAsLsXsFr issued pAsLsXsFscr, sent 120.142098 seconds ago
2017-04-04 17:34:57.706859 7f08c5342700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 32.794437 secs
2017-04-04 17:34:57.706872 7f08c5342700  0 log_channel(cluster) log [WRN] : slow request 32.794437 seconds old, received at 2017-04-04 17:34:24.912336: client_request(client.41746305:884324 getattr pAsLsXsFs #100008eb60c 2017-04-04 17:34:24.909674) currently failed to rdlock, waiting

Actions #7

Updated by Darrell Enns about 7 years ago

Very interesting! That means we have:

4.4.59 - bug
4.7.5 - no bug (or at least rare enough that I'm not seeing it)
4.9.0 - bug

Is the 4.4.59 kernel you are using the stock Ubuntu one? I wonder if they backported some ceph code that included the bug.

Actions #8

Updated by Zheng Yan about 7 years ago

Darrell Enns wrote:

Zheng Yan wrote:

probably fixed by https://github.com/ceph/ceph-client/commit/10a2699426a732cbf3fc9e835187e8b914f0c61cy

Have you got a working URL for this commit? I'd like to figure out which kernel versions (if any) may contain a potential fix.

Sorry, it's https://github.com/ceph/ceph-client/commit/d641df819db8b80198fd85d9de91137e8a823b07

Actions #9

Updated by Zheng Yan about 7 years ago

elder one wrote:

Hit the same bug today with 4.4.59 kernel client.
2 MDS servers (1 standby) 10.2.6-1 on Ubuntu Trusty.

From mds log:
[...]

Please check whether there are any hung OSD requests in /sys/kernel/debug/ceph/xxxx/osdc.

Actions #10

Updated by elder one about 7 years ago

My cephfs clients (Ubuntu Xenial) are running a kernel from the Ubuntu PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.59/
The Ceph cluster (Ubuntu Trusty) with the MDS servers runs a 4.4.39 kernel, also from the same page.

I don't have this client node any more; I killed it on the private cloud and booted the next client from the same base image.
However, I have all the application logs (Postfix/Dovecot) on a syslog server, and from these I don't see any errors accessing cephfs.

I do monitor the ceph cluster and its disks - no high disk activity/iops during the time this error happened.

Actions #11

Updated by Zheng Yan about 7 years ago

elder one wrote:

My cephfs clients (Ubuntu Xenial) are running a kernel from the Ubuntu PPA: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.59/
The Ceph cluster (Ubuntu Trusty) with the MDS servers runs a 4.4.39 kernel, also from the same page.

I don't have this client node any more; I killed it on the private cloud and booted the next client from the same base image.
However, I have all the application logs (Postfix/Dovecot) on a syslog server, and from these I don't see any errors accessing cephfs.

I do monitor the ceph cluster and its disks - no high disk activity/iops during the time this error happened.

Seems like a different bug. If it happens again, please dump the mds cache (ceph daemon mds.xxx dump cache /tmp/cachedump.0) and check whether there are any hung OSD requests in /sys/kernel/debug/ceph/xxxx/osdc.

Actions #12

Updated by Darrell Enns about 7 years ago

Thanks for the link Zheng. Looks like that fix is in mainline as of 4.10.2. When I have a chance, I'll try upgrading and see how it goes.
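
In case it's useful, one way to check which kernel releases already contain that fix, assuming local clones of the mainline (or ceph-client) tree and of linux-stable (the tag range below is only an example):

# in a mainline/ceph-client checkout: first tag that contains the fix
git describe --contains d641df819db8

# in a linux-stable checkout: stable backports quote the upstream sha in the
# commit message, so grep a release range for it
git log --oneline v4.10..v4.10.2 --grep=d641df819db8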

Actions #13

Updated by elder one about 7 years ago

Error happened again:

ceph -s
    cluster a5ac4128-5b84-4c15-b760-2621a011d55d
     health HEALTH_WARN
            mds0: Client mailnode-6-eMail-38-2223:mailserver failing to respond to capability release
     monmap e1: 3 mons at {cph01=192.168.30.71:6789/0,cph02=192.168.30.72:6789/0,cph03=192.168.30.73:6789/0}
            election epoch 1128, quorum 0,1,2 cph01,cph02,cph03
      fsmap e845: 1/1/1 up {0=cph05=up:active}, 1 up:standby-replay
     osdmap e67159: 55 osds: 55 up, 55 in
            flags require_jewel_osds
      pgmap v74062434: 6512 pgs, 13 pools, 6525 GB data, 3051 kobjects
            19903 GB used, 29888 GB / 49791 GB avail
                6512 active+clean
  client io 1760 kB/s rd, 2924 kB/s wr, 33 op/s rd, 337 op/s wr

Can't dump the mds cache during the day, as it locked my fs for a minute last time while dumping/reloading 1.5M files.

Empty output from /sys/kernel/debug/ceph/xxxx/osdc file on all 3 fs kernel clients.

Checked other files in the same dir as well: on the node the cluster reports the error for (mailnode-6-eMail-38-2223) there is an extra line, "want next osdmap", in /sys/kernel/debug/ceph/xxxx/monc. The mdsmap and osdmap numbers in this file are the same on all nodes.

Also, /sys/kernel/debug/ceph/xxxx/mdsc is empty on the client that ceph reports the error for, but on the other 2 healthy nodes there's a list with the references reported in the mds log.

779015  mds0    getattr  #10000908ca7
780164  mds0    getattr  #10000908ca7
781191  mds0    getattr  #10000908ca7
781758  mds0    getattr  #10000908ca7
782175  mds0    getattr  #10000908ca7
783760  mds0    getattr  #10000908ca7
786552  mds0    getattr  #10000908ca7
789168  mds0    getattr  #10000908ca7
790370  mds0    getattr  #10000908ca7

Fragment from mds log:

2017-04-10 12:56:06.469300 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : 28 slow requests, 2 included below; oldest blocked for > 4013.082964 secs
2017-04-10 12:56:06.469341 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : slow request 3842.578557 seconds old, received at 2017-04-10 11:52:03.890467: client_request(client.41923726:781758 getattr pAsLsXsFs #10000908ca7 2017-04-10 11:52:03.887023) currently failed to rdlock, waiting
2017-04-10 12:56:06.469676 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : slow request 482.127906 seconds old, received at 2017-04-10 12:48:04.341118: client_request(client.41932292:734242 getattr pAsLsXsFs #10000908ca7 2017-04-10 12:48:04.337616) currently failed to rdlock, waiting

Actions #14

Updated by Zheng Yan about 7 years ago

Can you try updating the cephfs client to a 4.10.x kernel?

Actions #15

Updated by elder one about 7 years ago

Upgraded cephfs clients to 4.10.9 kernel.

Actions #16

Updated by elder one about 7 years ago

Some weird issues arose with my applications (unrelated to cephfs) on the 4.10.x kernel.

The same fix has been in the 4.9.x kernel tree since version 4.9.14.
Downgraded the kernel to 4.9.21.

Let's see how it works...

Actions #17

Updated by elder one about 7 years ago

Got another error with 4.9.21 kernel.

On the cephfs node /sys/kernel/debug/ceph/xxx/:

cat osdc
REQUESTS 0 homeless 0
LINGER REQUESTS

cat mdsc
699592  mds0    getattr  #100009133ce

In the MDS log:

2017-04-11 16:20:49.021727 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : client.42086552 isn't responding to mclientcaps(revoke), ino 100009133ce pending pAsLsXsFr issued pAsLsXsFscr, sent 123.722723 seconds ago
2017-04-11 16:20:59.010267 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 32.989459 secs
2017-04-11 16:20:59.010336 7f1d0b5f6700  0 log_channel(cluster) log [WRN] : slow request 32.989459 seconds old, received at 2017-04-11 16:20:26.020377: client_request(client.42086552:699592 getattr pAsLsXsFs #100009133ce 2017-04-11 16:20:26.013495) currently failed to rdlock, waiting

Seems to me that the Dovecot server on the cephfs node locks things up - when I stopped the Dovecot service, it didn't stop normally:

service dovecot stop
.. syslog ..
systemd[1]: Closed Dovecot IMAP/POP3 email server activation socket.
systemd[1]: Stopping Dovecot IMAP/POP3 email server...
systemd[1]: dovecot.service: State 'stop-sigterm' timed out. Killing.
systemd[1]: Stopped Dovecot IMAP/POP3 email server.
systemd[1]: dovecot.service: Unit entered failed state.
systemd[1]: dovecot.service: Failed with result 'timeout'.

After Dovecot terminated, Ceph cluster reported OK status.

Actions #18

Updated by elder one about 7 years ago

Also dumped the mds cache (healthy cluster state, 8 hours later).
Searched for the inode from the mds error log:

[inode 100009133ce [2,head] /mail/data/virtual/domain/username/cur/1491916721.M239018P30451.mailnode-19-eMail-38-2241,S=134272,W=136238:2,S 
auth v26371 s=134272 n(v0 b134272 1=1+0) (iversion lock) caps={42066516=pAsLsXsFscr/-@1,42086552=pAsXs/-@12} |
 ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x55900a71f680]
Actions #19

Updated by Zheng Yan about 7 years ago

elder one wrote:

Got another error with 4.9.21 kernel.

On the cephfs node /sys/kernel/debug/ceph/xxx/:

[...]
[...]

In the MDS log:
[...]

Seems to me that Dovecot server on cephfs node locks things up - As I stopped Dovecot service - didn't stop normally:

[...]

After Dovecot terminated, Ceph cluster reported OK status.

It's caused by a different bug. Could you please try adding "mmap_disable = yes" to the dovecot config?

Actions #20

Updated by elder one about 7 years ago

I do have mmap disabled in Dovecot conf.

Relevant bits from Dovecot conf:

mmap_disable = yes
mail_nfs_index = yes
mail_nfs_storage = yes
mail_fsync = always
lock_method = fcntl

Actions #21

Updated by Zheng Yan about 7 years ago

Could you try adding commit 2b1ac852 (ceph: try getting buffer capability for readahead/fadvise) to your 4.9.x kernel? If it still does not work, we need kernel debugging to find out what happened (which may not be convenient in your case).
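
In case it helps, a minimal sketch of one way to do that, assuming a v4.9.21 base and the standard kernel.org trees (versions and paths here are examples, not a tested recipe):

# get a 4.9.y stable tree, plus mainline so the commit is available to cherry-pick
git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
git remote add torvalds https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch torvalds
git checkout -b ceph-fix v4.9.21
git cherry-pick 2b1ac852    # "ceph: try getting buffer capability for readahead/fadvise"

# build and install the patched kernel, starting from the running config
cp /boot/config-"$(uname -r)" .config
make olddefconfig
make -j"$(nproc)"
sudo make modules_install && sudo make install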

Actions #22

Updated by elder one about 7 years ago

Well, I managed to patch the ceph kernel module with the 2b1ac852 commit, but my Ubuntu 4.9 kernel will not load unsigned modules...

Error in syslog on cephfs mount:

libceph: loaded (mon/osd proto 15/24)
PKCS#7 signature not signed with a trusted key
ceph: loading out-of-tree module taints kernel.
ceph: module verification failed: signature and/or required key missing - tainting kernel

Actions #23

Updated by Zheng Yan about 7 years ago

elder one wrote:

Well, managed to patch ceph kernel module with 2b1ac852 commit, but my Ubuntu 4.9 kernel will not load unsigned modules...

error in syslog on cephfs mount
[...]

Try installing the compiled kernel and modules?

Actions #24

Updated by elder one about 7 years ago

Yeah, did that.
My test server with compiled 4.9 kernel and patched ceph module mounted cephfs just fine.
It might take a while to run it on production servers.

Any chance this 2b1ac852 patch gets backported to the 4.9 mainline kernel?

Actions #25

Updated by elder one about 7 years ago

One difference I noticed between the 4.4 and 4.9 kernels:
- with the 4.4 kernel, cephfs directory sizes show the total bytes of all files inside the dir, but the 4.9 kernel shows the file count of the next level inside the directory.

Is it a bug or a feature?

Actions #26

Updated by Darrell Enns about 7 years ago

FYI, I don't think the module signature stuff is an issue. It's just notifying you that you've loaded a module that didn't come with your distribution. It should still work just fine.

Actions #27

Updated by elder one about 7 years ago

Yeah, you're right, Darrell - operator error :) I guess the VM did not have the correct storage network attached when I tried to mount CephFS on the stock kernel with the patched ceph module.

Anyway, running now with recompiled 4.9.22 kernel and patched ceph module.

Actions #28

Updated by Josh Durgin about 7 years ago

  • Project changed from Ceph to CephFS
Actions #29

Updated by elder one almost 7 years ago

Just reporting that there have been no errors after switching to the patched (commit 2b1ac852) cephfs kernel module.

Also upgraded ceph to 10.2.7 two weeks ago.

Could this commit be backported to 4.9 longterm kernel?

Actions #30

Updated by Zheng Yan almost 7 years ago

elder one wrote:

One difference I noticed between the 4.4 and 4.9 kernels:
- with the 4.4 kernel, cephfs directory sizes show the total bytes of all files inside the dir, but the 4.9 kernel shows the file count of the next level inside the directory.

Is it a bug or a feature?

Feature. It is controlled by the rbytes/norbytes mount option.
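
For reference, a hedged example of the two mount variants (the monitor address, mount point, and secret file are placeholders):

# rbytes: a directory's size is the recursive byte total of everything beneath it
mount -t ceph 192.168.30.71:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,rbytes

# norbytes: a directory's size reflects the number of entries instead
mount -t ceph 192.168.30.71:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,norbytes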

Actions #31

Updated by Zheng Yan almost 7 years ago

elder one wrote:

Just reporting that there have been no errors after switching to the patched (commit 2b1ac852) cephfs kernel module.

Also upgraded ceph to 10.2.7 two weeks ago.

Could this commit be backported to 4.9 longterm kernel?

I have sent a backport request.

Actions #32

Updated by elder one almost 7 years ago

Zheng Yan wrote:

elder one wrote:

One difference I noticed between the 4.4 and 4.9 kernels:
- with the 4.4 kernel, cephfs directory sizes show the total bytes of all files inside the dir, but the 4.9 kernel shows the file count of the next level inside the directory.

Is it a bug or a feature?

Feature. It is controlled by the rbytes/norbytes mount option.

Hmm, the manual states that rbytes is ON by default, and my cephfs is not mounted with the norbytes option.

Actions #33

Updated by Zheng Yan almost 7 years ago

  • Status changed from New to Resolved

"ceph: try getting buffer capability for readahead/fadvise" has backported into 4.9.x
