Bug #45663

luminous to nautilus upgrade

Added by none none almost 4 years ago. Updated almost 2 years ago.

Status: Triaged
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have been using snapshots on CephFS since Luminous (1 filesystem, 1 active MDS)
and used an rsync on it for backup (CephFS mounted on an OSD node).
Under Luminous I did not encounter any problems with this setup. I
think I was even snapshotting user dirs every 7 days, ending up with
thousands of snapshots (which I later heard is not recommended; one
should stay below 400 or so?).
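
For reference, I take these snapshots by creating directories under the hidden
.snap directory of the directory being snapshotted; the mount point and names
below are only examples:

[@ ~]# mkdir /mnt/cephfs/home/user1/.snap/weekly-2020-06-13    # take a snapshot
[@ ~]# rmdir /mnt/cephfs/home/user1/.snap/weekly-2020-04-25    # remove an old snapshot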

When upgrading to Nautilus, this snapshot feature was disabled (that
is the default in the upgrade). I did not notice nor expect this. When I
enabled snapshotting again, I had problems with the rsync backup, so I
reverted to the slower ceph-fuse mount. I also brought the number of
snapshots down to 36, but I am still stuck with "clients failing to respond
to capability release", "clients failing to respond to cache pressure"
and "MDSs report slow requests",
which is odd, since my use has not changed since Luminous.
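
To enable snapshotting again I used the usual command; note that the
filesystem name "cephfs" below is an assumption, the actual name may differ:

[@ ~]# ceph fs set cephfs allow_new_snaps true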

I know about the issue of running Ceph kernel clients on OSD nodes,
but this node has enough memory:

[@ ~]# free -m
            total        used        free      shared  buff/cache   available
Mem:       160989       42452       73562         110       44974      117745
Swap:       10239         534        9705

Since the upgrade I think I also have problems with nfs-ganesha; I don't think this is related to a bad connection.

MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
mdsa(mds.0): Client c01:cephfs.nfs failing to respond to capability release client_id: 4012632
MDS_CLIENT_RECALL 2 clients failing to respond to cache pressure
mdsa(mds.0): Client c04:cephfs.backup failing to respond to cache pressure client_id: 3905222
mdsa(mds.0): Client c01:cephfs.nfs failing to respond to cache pressure client_id: 4012617
MDS_SLOW_REQUEST 1 MDSs report slow requests
mdsa(mds.0): 1 slow requests are blocked > 30 secs
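
The sessions and blocked requests behind these warnings should be visible
via the MDS admin socket on the node running mds.mdsa (a sketch; the daemon
name is taken from the warnings above):

[@ ~]# ceph health detail
[@ ~]# ceph daemon mds.mdsa session ls            # list client sessions and caps held
[@ ~]# ceph daemon mds.mdsa dump_ops_in_flight    # show the slow/blocked requests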

As requested here, I opened this tracker ticket:
https://www.mail-archive.com/ceph-users@ceph.io/msg03592.html

Actions #1

Updated by none none almost 4 years ago

Actions #2

Updated by Patrick Donnelly almost 4 years ago

  • Status changed from New to Triaged
  • Assignee set to Zheng Yan
  • Target version set to v16.0.0
Actions #3

Updated by none none almost 4 years ago

I think this failure to take the xlock causes it:

2020-06-13 17:32:24.920 7fb5edd82700 0 log_channel(cluster) log [WRN] :
slow request 240.294505 seconds old, received at 2020-06-13
17:28:24.626608: client_request(client.4021284:2468 setattr
mtime=2020-04-22 12:58:47.000000 atime=2020-06-13 17:28:24.000000
#0x100001b9177 2020-06-13 17:28:24.626527 caller_uid=500,
caller_gid=500{500,1,2,3,4,6,10,}) currently failed to xlock, waiting

https://www.mail-archive.com/ceph-users@ceph.io/msg04527.html
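
The request stuck "currently failed to xlock" should also show up in the MDS
op tracker dumps (a sketch; daemon name as above):

[@ ~]# ceph daemon mds.mdsa dump_blocked_ops      # requests blocked longer than the warning threshold
[@ ~]# ceph daemon mds.mdsa dump_historic_ops     # recently completed slow requests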

Actions #4

Updated by Patrick Donnelly over 3 years ago

  • Assignee deleted (Zheng Yan)
Actions #5

Updated by Patrick Donnelly over 3 years ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport set to pacific,octopus,nautilus
Actions #6

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)