Bug #58376

open

CephFS Snapshots are accessible even when it's deleted from the other client

Added by Kotresh Hiremath Ravishankar over 1 year ago. Updated 5 months ago.

Status:
Triaged
Priority:
Normal
Assignee:
Venky Shankar
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
pacific,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, MDS
Labels (FS):
snapshots
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The issue was reported by an upstream user. The snapshot is still accessible from the client that was copying it, even after it has been deleted from another client.
Local reproducer:

# Create a ~1 GiB (1000 MiB) file and a snapshot. FUSE mounts are used.
$ dd if=/dev/urandom of=/mnt/file-1GB bs=4M count=250
250+0 records in
250+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 3.19042 s, 329 MB/s
[kotresh@fedora mnt]$ ls -lrth
total 1000M
-rw-r--r--. 1 kotresh kotresh 1000M Jan  4 14:54 file-1GB
$ mkdir /mnt/.snap/snapshot1

# Copy the snapshot file from client /mnt
$ cp -p /mnt/.snap/snapshot1/file-1GB ~/

# Delete from other client /mnt1
$ rmdir /mnt1/.snap/snapshot1
$ ls -l /mnt1/.snap
total 0

# Copy is still successful from client /mnt
$ cp -p /mnt/.snap/snapshot1/file-1GB ~/file3
$ ls -l /mnt/.snap/
total 0
$ cp -p /mnt/.snap/snapshot1/file-1GB ~/file4
$ ls -l /mnt/.snap/
total 0
$ cp -p /mnt/.snap/snapshot1/file-1GB ~/file5
$ cp -p /mnt/.snap/snapshot1/file-1GB ~/file6
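
A hedged sketch of follow-up checks for the reproducer above (the MDS name, the 'dump snaps' command availability, and the second mount point are assumptions, not from the original report): compare what the MDS and a freshly mounted client still know about the snapshot versus the stale client.

# Ask the MDS which snapshots it still knows about (if the command is available in your release).
$ ceph tell mds.a dump snaps

# A freshly mounted client should not see the deleted snapshot at all.
$ mkdir -p /mnt2 && ceph-fuse /mnt2
$ ls /mnt2/.snap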

Here is a copy of the mail from the ceph-users list (https://www.spinics.net/lists/ceph-users/msg75291.html).

Dear ceph community,

We have two Ceph clusters of equal size, one main and one mirror, both deployed with cephadm and on version

ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)

We are stuck with copying a large file (~ 64G) between the CephFS file systems of the two clusters.

The source path is a snapshot (i.e. something like /my/path/.snap/schedule_some-date/…).
But I don't think that should make any difference.

First, I thought I needed to adapt some rsync parameters to work better with bigger files on CephFS.
But when I checked by just copying the file with cp, the transfer also got stuck.
Without any error message, the process just keeps running (rsync or cp),
but at some point (almost 85%) the file size on the target stops increasing.

Main:
-rw------- 1 cockpit-ws printadmin 68360698297 16. Nov 13:40 LB22_2764_dragen.bam

Mirror:
-rw------- 1 root root 58099499008 22. Dez 15:54 LB22_2764_dragen.bam

Our CephFS file size limit of 10 TB is more than generous.
And as far as I know, clients do have files in the TB range on the cluster without issues.

I don't know if this is the file's fault or if this is some issue with either of the CephFS file systems or clusters.
And I don't know how to investigate and troubleshoot this.
Can anybody give me a tip on where I can start looking and how to debug these kinds of issues?

---

Trying to exclude clusters and/or clients might have gotten me on the right track. It might have been a client issue or actually a snapshot retention issue. As it turned out, when I tried other routes for the data using a different client, the data was no longer available, since the snapshot had been trimmed.

We got behind syncing our snapshots a while ago (due to other issues), and now we are somewhere in between our weekly (16 weeks) and daily (30 days) snapshots. So I assume that before we catch up with the dailies (<30 days), there is a general risk that snapshots disappear while we are syncing them.

The funny/weird thing is, though (and why I didn't catch on to this), that the particular file (and potentially others) of this trimmed snapshot was apparently still available to the client I initially used for the transfer. I'm wondering if the client somehow cached the data until the snapshot got trimmed, and then just retried copying the incompletely cached data.

Continuing with the next available snapshot, mirroring/syncing is now catching up again. I expect it might happen again once we catch up to the 30-day threshold, if the time point of snapshot trimming falls into the syncing time frame. But then I know to just cancel/skip the current snapshot and continue with the next one. Syncing time is short enough to get me over the hill before the next trimming.

Note to myself: next time something similar happens, check whether different clients AND different snapshots or the original data behave the same.


Related issues: 1 (0 open, 1 closed)

Related to CephFS - Bug #62674: cephfs snapshot remains visible in nfs export after deletion and new snaps not shown (Duplicate)

Actions #1

Updated by Kotresh Hiremath Ravishankar over 1 year ago

  • Subject changed from CephFS Snapshot accessible even when it's deleted from the other client to CephFS Snapshots are accessible even when it's deleted from the other client
Actions #2

Updated by Venky Shankar over 1 year ago

Kotresh Hiremath Ravishankar wrote:

The issue was reported by an upstream user. The snapshot is still accessible from the client that was copying it, even after it has been deleted from another client.

I cannot reproduce this with a smaller file size. Would you mind trying? If that's the case, I think it may just be the inode lingering around in the client until the OSDs trim the snapshot; with the client holding the appropriate caps, it is able to read the data from the OSDs.
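
A hedged way to check this theory from the OSD side (the pool name and inode number are placeholders; the object name follows the usual <inode-hex>.<offset> convention): if snaptrim has not yet removed the snapped objects, listsnaps will still show clone entries that the client can read.

# Data pool of the file system and hex inode number of the snapshotted file (assumptions).
$ POOL=cephfs.a.data
$ INO=10000000001

# List the clones/snaps of the file's first object; a clone entry means the
# snapshot data is still present on the OSDs.
$ rados -p "$POOL" listsnaps "${INO}.00000000"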

Actions #3

Updated by Kotresh Hiremath Ravishankar over 1 year ago

Venky Shankar wrote:

Kotresh Hiremath Ravishankar wrote:

The issue was reported by an upstream user. The snapshot is still accessible from the client that was copying it, even after it has been deleted from another client.

I cannot reproduce this with a smaller file size. Would you mind trying? If that's the case, I think it may just be the inode lingering around in the client until the OSDs trim the snapshot; with the client holding the appropriate caps, it is able to read the data from the OSDs.

I think there is some race. I am not able to reproduce it now, though I could yesterday. I will check and update.

Actions #4

Updated by Kotresh Hiremath Ravishankar over 1 year ago

Now you can still copy the snapshot file from /mnt
[kotresh@fedora build]$ cp -p /mnt/dir1/.snap/snapshot1/file1-4GB ~/file2

So, the client is able to lookup(/dir1/.snap/snapshot1/file1-4GB) post deletion - do we know why? It should be evident from the logs.
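
For reference, a hedged sketch of how such client logs can be gathered (the mount point, log path, and admin socket path are placeholders; the debug options are the standard Ceph client settings):

# Mount with verbose client logging enabled.
$ ceph-fuse /mnt --log-file=/tmp/client.log --debug-client=20 --debug-ms=1

# Or raise the level on an already running ceph-fuse via its admin socket.
$ ceph daemon /var/run/ceph/ceph-client.admin.*.asok config set debug_client 20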

Actions #5

Updated by Venky Shankar over 1 year ago

  • Status changed from New to Triaged
  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version set to v18.0.0
  • Backport set to pacific,quincy
  • Component(FS) Client, MDS added
Actions #6

Updated by Greg Farnum over 1 year ago

Kotresh Hiremath Ravishankar wrote:

Now you can still copy the snapshot file from /mnt
[kotresh@fedora build]$ cp -p /mnt/dir1/.snap/snapshot1/file1-4GB ~/file2

So, the client is able to lookup(/dir1/.snap/snapshot1/file1-4GB) post deletion - do we know why? It should be evident from the logs.

We’re a lot less careful with caps for snapshots (as they are read-only) versus normal files. We may not be able to revoke them properly to begin with?
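
A hedged sketch for checking that (the MDS name, admin socket path, command availability, and decimal inode number are assumptions): compare the caps the client thinks it holds on the snap inode with what the MDS still tracks for it.

# Client view: cached inodes and the caps the client believes it holds
# (via the ceph-fuse admin socket, if the dump_cache command is available).
$ ceph daemon /var/run/ceph/ceph-client.admin.*.asok dump_cache > /tmp/client_cache.json

# MDS view, if the 'dump inode' admin command is available in your build
# (1099511627777 is 0x10000000001 in decimal).
$ ceph tell mds.a dump inode 1099511627777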

Actions #7

Updated by Kotresh Hiremath Ravishankar over 1 year ago

Client log analysis: the lookup on '.snap/snapshot1' succeeds from the dentry cache. The caps and the lease were still valid, hence the client didn't ask the MDS. Please check the logs below.

158 2023-01-11T15:33:41.187+0530 7f1e3b7fe6c0  3 client.4134 ll_lookup 0x10000000000.snapdir snapshot1
159 2023-01-11T15:33:41.187+0530 7f1e3b7fe6c0 20 client.4134 may_lookup 0x10000000000.snapdir(faked_ino=0 nref=6 ll_ref=29 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2023-01-11T15:23:58.011371+0530 mtime=2023-01-11T15:24:05.398010+0530 ctime=2023-01-11T15:26:10.528194+0530 change_attr=1 caps=- 0x7f1e2ada7b40); UserPerm(uid: 1000, gid: 1000)
160 2023-01-11T15:33:41.187+0530 7f1e3b7fe6c0 10 client.4134 _getattr mask As issued=0
...
...
193 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0 10 client.4134 _getattr result=0
194 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0  3 client.4134 may_lookup 0x7f1e2ada7b40 = 0
195 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0 20 client.4134 _lookup have 0x10000000000.snapdir["snapshot1"] from mds.0 ttl 2023-03-02T08:27:56.510051+0530 seq 0
196 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0 10 client.4134 _lookup 0x10000000000.snapdir(faked_ino=0 nref=7 ll_ref=29 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2023-01-11T15:23:58.011371+0530 mtime=2023-01-11T15:24:05.398010+0530 ctime=2023-01-11T15:26:10.528194+0530 change_attr=1 caps=- 0x7f1e2ada7b40) snapshot1 = 0x10000000000.2(faked_ino=0 nref=6 ll_ref=20 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2023-01-11T15:23:58.011371+0530 mtime=2023-01-11T15:24:05.398010+0530 ctime=2023-01-11T15:25:03.490655+0530 change_attr=1 caps=pAsLsXsFs COMPLETE parents=0x10000000000.snapdir["snapshot1"] 0x7f1e50008100)
197 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0 10 client.4134 fill_stat on 0x10000000000 snap/dev2 mode 040755 mtime 2023-01-11T15:24:05.398010+0530 ctime 2023-01-11T15:25:03.490655+0530
198 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0 20 client.4134 _ll_get 0x7f1e50008100 0x10000000000 -> 21
199 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0  3 client.4134 ll_lookup 0x10000000000.snapdir snapshot1 -> 0 (10000000000)   

200 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0 20 client.4134 put_inode on 0x10000000000.2(faked_ino=0 nref=6 ll_ref=21 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2023-01-11T15:23:58.011371+0530 mtime=2023-01-11T15:24:05.398010+0530 ctime=2023-01-11T15:25:03.490655+0530 change_attr=1 caps=pAsLsXsFs COMPLETE parents=0x10000000000.snapdir["snapshot1"] 0x7f1e50008100) n = 1
201 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0  8 client.4134 _ll_forget 0x10000000000 1
202 2023-01-11T15:33:41.190+0530 7f1e3b7fe6c0 20 client.4134 _ll_put 0x7f1e2ada7b40 0x10000000000 1 -> 28
203 2023-01-11T15:33:41.190+0530 7f1e6b7fe6c0 20 client.4134 put_inode on 0x10000000000.head(faked_ino=0 nref=11 ll_ref=32 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2023-01-11T15:23:58.011371+0530 mtime=2023-01-11T15:24:05.398010+0530 ctime=2023-01-11T15:26:10.528194+0530 change_attr=1 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x1.head["dir1"] 0x7f1e50005660) n = 1
204 2023-01-11T15:33:41.190+0530 7f1e6b7fe6c0 20 client.4134 put_inode on 0x10000000000.snapdir(faked_ino=0 nref=7 ll_ref=28 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2023-01-11T15:23:58.011371+0530 mtime=2023-01-11T15:24:05.398010+0530 ctime=2023-01-11T15:26:10.528194+0530 change_attr=1 caps=- 0x7f1e2ada7b40) n = 1
205 2023-01-11T15:33:41.190+0530 7f1e6b7fe6c0 20 client.4134 put_inode on 0x10000000000.head(faked_ino=0 nref=11 ll_ref=32 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2023-01-11T15:23:58.011371+0530 mtime=2023-01-11T15:24:05.398010+0530 ctime=2023-01-11T15:26:10.528194+0530 change_attr=1 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x1.head["dir1"] 0x7f1e50005660) n = 1
206 2023-01-11T15:33:41.190+0530 7f1e397fa6c0 20 client.4134 _ll_get 0x7f1e50008100 0x10000000000 -> 22                                

Similarly, the lookup on 'snapshot1/file1-4GB' is also successful.

207 2023-01-11T15:33:41.191+0530 7f1e397fa6c0  3 client.4134 ll_lookup 0x10000000000.2 file1-4GB
208 2023-01-11T15:33:41.191+0530 7f1e397fa6c0 20 client.4134 may_lookup 0x10000000000.2(faked_ino=0 nref=6 ll_ref=22 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2023-01-11T15:23:58.011371+0530 mtime=2023-01-11T15:24:05.398010+0530 ctime=2023-01-11T15:25:03.490655+0530 change_attr=1 caps=pAsLsXsFs COMPLETE parents=0x10000000000.snapdir["snapshot1"] 0x7f1e50008100); UserPerm(uid: 1000, gid: 1000)
209 2023-01-11T15:33:41.191+0530 7f1e397fa6c0 10 client.4134 _getattr mask As issued=1
210 2023-01-11T15:33:41.191+0530 7f1e397fa6c0  3 client.4134 may_lookup 0x7f1e50008100 = 0
211 2023-01-11T15:33:41.191+0530 7f1e397fa6c0 20 client.4134 _lookup have 0x10000000000.2["file1-4GB"] from mds.-1 ttl 0.000000 seq 0
212 2023-01-11T15:33:41.191+0530 7f1e397fa6c0 10 client.4134 _lookup 0x10000000000.2(faked_ino=0 nref=6 ll_ref=22 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2023-01-11T15:23:58.011371+0530 mtime=2023-01-11T15:24:05.398010+0530 ctime=2023-01-11T15:25:03.490655+0530 change_attr=1 caps=pAsLsXsFs COMPLETE parents=0x10000000000.snapdir["snapshot1"] 0x7f1e50008100) file1-4GB = 0x10000000001.2(faked_ino=0 nref=5 ll_ref=9 cap_refs={1024=0,2048=0} open={1=3} mode=100644 size=4194304000/0 nlink=1 btime=2023-01-11T15:24:05.398010+0530 mtime=2023-01-11T15:24:37.324388+0530 ctime=2023-01-11T15:24:37.324388+0530 change_attr=32000 caps=pAsLsXsFscrl objectset[0x10000000001 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x10000000000.2["file1-4GB"] 0x7f1e500086d0)
213 2023-01-11T15:33:41.191+0530 7f1e397fa6c0 10 client.4134 fill_stat on 0x10000000001 snap/dev2 mode 0100644 mtime 2023-01-11T15:24:37.324388+0530 ctime 2023-01-11T15:24:37.324388+0530
214 2023-01-11T15:33:41.191+0530 7f1e397fa6c0 20 client.4134 _ll_get 0x7f1e500086d0 0x10000000001 -> 10
215 2023-01-11T15:33:41.191+0530 7f1e397fa6c0  3 client.4134 ll_lookup 0x10000000000.2 file1-4GB -> 0 (10000000001)

The open is also successful because the client still held the caps.

220 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0  3 client.4134 ll_open 0x10000000001.2 0
221 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0 20 client.4134 may_open 0x10000000001.2(faked_ino=0 nref=5 ll_ref=11 cap_refs={1024=0,2048=0} open={1=3} mode=100644 size=4194304000/0 nlink=1 btime=2023-01-11T15:24:05.398010+0530 mtime=2023-01-11T15:24:37.324388+0530 ctime=2023-01-11T15:24:37.324388+0530 change_attr=32000 caps=pAsLsXsFscrl objectset[0x10000000001 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x10000000000.2["file1-4GB"] 0x7f1e500086d0); UserPerm(uid: 1000, gid: 1000)
222 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0 10 client.4134 _getattr mask As issued=1
223 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0  3 client.4134 may_open 0x7f1e500086d0 = 0
224 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0 10 break_deleg: breaking delegs on 0x10000000001.2(faked_ino=0 nref=5 ll_ref=11 cap_refs={1024=0,2048=0} open={1=4} mode=100644 size=4194304000/0 nlink=1 btime=2023-01-11T15:24:05.398010+0530 mtime=2023-01-11T15:24:37.324388+0530 ctime=2023-01-11T15:24:37.324388+0530 change_attr=32000 caps=pAsLsXsFscrl objectset[0x10000000001 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x10000000000.2["file1-4GB"] 0x7f1e500086d0)
225 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0 10 delegations_broken: delegations empty on 0x10000000001.2(faked_ino=0 nref=5 ll_ref=11 cap_refs={1024=0,2048=0} open={1=4} mode=100644 size=4194304000/0 nlink=1 btime=2023-01-11T15:24:05.398010+0530 mtime=2023-01-11T15:24:37.324388+0530 ctime=2023-01-11T15:24:37.324388+0530 change_attr=32000 caps=pAsLsXsFscrl objectset[0x10000000001 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x10000000000.2["file1-4GB"] 0x7f1e500086d0)
226 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0 10 client.4134 check_caps on 0x10000000001.2(faked_ino=0 nref=5 ll_ref=11 cap_refs={1024=0,2048=0} open={1=4} mode=100644 size=4194304000/0 nlink=1 btime=2023-01-11T15:24:05.398010+0530 mtime=2023-01-11T15:24:37.324388+0530 ctime=2023-01-11T15:24:37.324388+0530 change_attr=32000 caps=pAsLsXsFscrl objectset[0x10000000001 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x10000000000.2["file1-4GB"] 0x7f1e500086d0) wanted pFscr used Fc issued pAsLsXsFscrl revoking - flags=1
227 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0 20 client.4134 put_inode on 0x10000000001.2(faked_ino=0 nref=7 ll_ref=11 cap_refs={1024=0,2048=0} open={1=4} mode=100644 size=4194304000/0 nlink=1 btime=2023-01-11T15:24:05.398010+0530 mtime=2023-01-11T15:24:37.324388+0530 ctime=2023-01-11T15:24:37.324388+0530 change_attr=32000 caps=pAsLsXsFscrl objectset[0x10000000001 ts 0/0 objects 1000 dirty_or_tx 0] parents=0x10000000000.2["file1-4GB"] 0x7f1e500086d0) n = 1
228 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0 10 client.4134 _create_fh 0x10000000001 mode 1
229 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0  5 client.4134 open success, fh is 0x7f1e2ffd9820 combined IMMUTABLE SNAP caps pAsLsXsFscrl
230 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0 20 client.4134 trim_cache size 4 max 16384
231 2023-01-11T15:33:41.192+0530 7f1e3bfff6c0  3 client.4134 ll_open 0x10000000001.2 0 = 0 (0x7f1e2ffd9820)
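
A hedged follow-up to the analysis above, using the paths from the local reproducer (the expected outcomes are assumptions, not verified here): if the successful copy really comes from the cached dentry plus the retained caps, dropping the client's state should make the snapshot disappear for this client as well.

# Unmount the ceph-fuse client, discarding its dentry cache and caps, then remount.
$ fusermount -u /mnt
$ ceph-fuse /mnt

# Expected: the snapshot is no longer listed and the copy fails with ENOENT.
$ ls /mnt/.snap
$ cp -p /mnt/.snap/snapshot1/file-1GB ~/file7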

Actions #8

Updated by Venky Shankar over 1 year ago

The client should receive an updated snap trace from the MDS. Using that, some of the snap inodes should be invalidated. You'd see the snap trace message being sent in the MDS logs as well as being processed in the client logs.
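
A hedged sketch of what to grep for (the log paths are placeholders and the strings are function/message names from the client code paths as I remember them; adjust to whatever your build actually logs):

# Did this client receive and process the snap update?
$ grep -n "handle_snap" /tmp/client.log
$ grep -n "update_snap_trace" /tmp/client.log

# Did the MDS send the snap message to the client sessions?
$ grep -n "client_snap" /var/log/ceph/ceph-mds.*.log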

Actions #9

Updated by Venky Shankar about 1 year ago

Did we RCA this, Kotresh?

Actions #10

Updated by Venky Shankar 8 months ago

  • Related to Bug #62674: cephfs snapshot remains visible in nfs export after deletion and new snaps not shown added
Actions #11

Updated by Patrick Donnelly 7 months ago

  • Target version deleted (v18.0.0)
Actions #12

Updated by Venky Shankar 5 months ago

  • Assignee changed from Kotresh Hiremath Ravishankar to Venky Shankar

Kotresh, I'm taking this one.
