Bug #51256

pybind/mgr/volumes: purge queue seems to block operating on cephfs connection required by dispatch thread

Added by Patrick Donnelly over 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Immediate
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
other
Tags:
Backport:
pacific
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

debug 2021-06-16 13:38:35.555 7f724dcde700  3 client.96349150 read(10, 0x55a35f3ae020, 8192, -1) = 158
...
debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150 path_walk #0x1/volumes/_deleting/path/to/dir
...
debug 2021-06-16 13:38:45.625 7f7218edd700 10 client.96349150  marking I_COMPLETE on 0x1001b524ede.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 19:09:08.080929 mtime=2021-06-16 13:38:35.557210 ctime=2021-06-16 13:38:35.557210 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x1001b524ed3.head["end"] 0x55a3607c6680)
...
debug 2021-06-16 13:38:45.626 7f724dcde700 10 client.96349150 lock_fh_pos 0x55a3604c4420
debug 2021-06-16 13:38:45.626 7f724dcde700 10 client.96349150 get_caps 0x1006a9ebb21.head(faked_ino=0 ref=2 ll_ref=0 cap_refs={4=0,1024=0,2048=0,4096=0,8192=0} open={1=1,2=0} mode=100640 size=158/4194304 nlink=1 btime=2021-06-02 21:25:26.961098 mtime=2021-06-16 13:38:32.702209 ctime=2021-06-16 13:38:32.702209 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x1006a9ebb21 ts 2/0 objects 1 dirty_or_tx 0] parents=0x1006a9ebb1f.head[".meta"] 0x55a3605cbb80) have pAsxLsXsxFsxcrwb need Fr want Fc revoking -

7f724dcde700 is the finisher thread. 7f7218edd700 appears to be one of the purge queue threads. There seems to be a 10-second gap in processing by the finisher thread while it executes a subvolume API operation. The client instance "client.96349150" is used by both threads.
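
The effect of sharing one client between the two threads can be sketched with plain Python threading. `SharedClient`, `slow_rmdir`, and `quick_read` are hypothetical stand-ins for illustration only, not mgr/volumes code:

```python
import threading
import time

class SharedClient:
    """Hypothetical stand-in for a single shared libcephfs handle.

    Every operation takes the same per-client lock, so a slow call on
    one thread stalls an unrelated call on another thread.
    """
    def __init__(self):
        self.client_lock = threading.Lock()

    def slow_rmdir(self):
        # Purge thread: holds the lock while waiting on the MDS reply.
        with self.client_lock:
            time.sleep(0.3)  # stands in for a slow MDS round trip

    def quick_read(self):
        # Finisher thread: a cheap call that still has to wait.
        with self.client_lock:
            return "ok"

client = SharedClient()
purge = threading.Thread(target=client.slow_rmdir)
purge.start()
time.sleep(0.05)  # let the purge thread grab the lock first

start = time.monotonic()
result = client.quick_read()          # blocks until slow_rmdir releases
stalled_for = time.monotonic() - start
purge.join()
```

The cheap `quick_read` call completes only after `slow_rmdir` drops the lock, which is the shape of the 10-second stall seen in the logs.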


Related issues

Related to CephFS - Bug #51271: mgr/volumes: use a dedicated libcephfs handle for subvolume API calls Pending Backport
Copied to CephFS - Backport #51411: pacific: pybind/mgr/volumes: purge queue seems to block operating on cephfs connection required by dispatch thread Resolved

History

#1 Updated by Venky Shankar over 1 year ago

  • Status changed from New to In Progress

#2 Updated by Venky Shankar over 1 year ago

Purge threads (or any async job in mgr/volumes) operate in two steps:

1. Perform a file system call to fetch an entry to process (for purge threads, the directory in the trashcan to purge). This stage does not acquire the global lock, so there is no contention for it.

2. Once an entry is fetched, recursively delete the directory tree. Again, this step does not acquire the global lock for the most part. The only place it does acquire the global lock is to handle the per-subvolume trash feature, which was added as a workaround for directories not being renameable across a subvolume boundary (ceph.dir.subvolume xattr). Moreover, this is only done when a subvolume is deleted with `--retain-snapshots`, and it should generally not keep the global lock held for long, as all that needs to be done is to clean up a directory with very few entries.
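
The two steps above can be sketched as follows; the trashcan queue, lock, and helper names are illustrative stand-ins, not the actual mgr/volumes code:

```python
import threading
from queue import Empty, Queue

# Hypothetical sketch of the two-step purge loop.
trash = Queue()
global_lock = threading.Lock()

def fetch_trash_entry():
    # Step 1: fetch the next trashcan entry. No global lock is taken,
    # so subvolume API threads are not blocked by this step.
    try:
        return trash.get_nowait()
    except Empty:
        return None

def rmtree(path, deleted, retain_snapshots=False):
    # Step 2: recursively delete the tree. Only the per-subvolume
    # trash case (--retain-snapshots) touches the global lock;
    # plain deletion proceeds without it.
    if retain_snapshots:
        with global_lock:
            deleted.append(path)
    else:
        deleted.append(path)

deleted = []
trash.put("/volumes/_deleting/some-uuid")
entry = fetch_trash_entry()
if entry is not None:
    rmtree(entry, deleted)
```

The point of the split is that the global lock is off the common path: contention can only come from the retained-snapshots branch, which touches a near-empty directory.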

Now, w.r.t. the ~10-second delay that is seen::

Purge thread (7f7218edd700) fetches trash entry and starts rmtree'ing:

debug 2021-06-16 13:32:07.986 7f7218edd700 20 mgr[volumes] purging trash entry 'uuid-yanked'' for volume '<volume-name-yanked>'
debug 2021-06-16 13:32:07.993 7f7218edd700 20 mgr[volumes] purging entry pointing to trash: b'/volumes/_deleting/uuid-yanked'
debug 2021-06-16 13:32:07.993 7f7218edd700 20 mgr[volumes] rmtree b'/volumes/_deleting/uuid-yanked/yank0'
debug 2021-06-16 13:32:08.019 7f7218edd700 20 mgr[volumes] rmtree b'/volumes/_deleting/uuid-yanked/yank0/yank1'

Things progress normally up to this point (rmtree'ing other parts of the directory tree). A path_walk is initiated for a directory tree::

debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150 path_walk #0x1/volumes/_deleting/uuid-yanked/yank0/yank1/yank2/yank3/yank4

lookup for each path component (up to the yank4 directory)::

debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150  0 0x1.head(faked_ino=0 ref=4 ll_ref=1 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-04-29 17:38:19.084517 mtime=2020-04-29 17:38:20.102557 ctime=2020-04-29 17:38:20.102557 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x55a35cc9cc00) volumes
...
debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150 _lookup 0x1.head(faked_ino=0 ref=4 ll_ref=1 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-04-29 17:38:19.084517 mtime=2020-04-29 17:38:20.102557 ctime=2020-04-29 17:38:20.102557 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x55a35cc9cc00) volumes = 0x10000000000.head(faked_ino=0 ref=3 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-04-29 17:38:20.102557 mtime=2021-06-16 13:27:28.505535 ctime=2021-06-16 13:27:28.505535 caps=pAsLsXs(0=pAsLsXs) parents=0x1.head["volumes"] 0x55a35cc9d700)
...
debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150 _lookup 0x10000000000.head(faked_ino=0 ref=3 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-04-29 17:38:20.102557 mtime=2021-06-16 13:27:28.505535 ctime=2021-06-16 13:27:28.505535 caps=pAsLsXs(0=pAsLsXs) parents=0x1.head["volumes"] 0x55a35cc9d700) _deleting = 0x10000000003.head(faked_ino=0 ref=3 ll_ref=0 cap_refs={} open={} mode=40700 size=0/0 nlink=1 btime=2020-05-04 20:07:29.868004 mtime=2021-06-16 13:24:14.515903 ctime=2021-06-16 13:24:14.515903 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x10000000000.head["_deleting"] 0x55a35e576b00)
...
debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150 _lookup 0x10000000003.head(faked_ino=0 ref=3 ll_ref=0 cap_refs={} open={} mode=40700 size=0/0 nlink=1 btime=2020-05-04 20:07:29.868004 mtime=2021-06-16 13:24:14.515903 ctime=2021-06-16 13:24:14.515903 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x10000000000.head["_deleting"] 0x55a35e576b00) a15138e4-75d2-4835-a580-719c953708bd = 0x10004339f72.head(faked_ino=0 ref=4 ll_ref=0 cap_refs={} open={} mode=40777 size=0/0 nlink=1 btime=0.000000 mtime=2021-06-15 01:04:06.025906 ctime=2021-06-15 01:04:06.025906 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x10000000003.head["uuid-yanked"] has_dir_layout quota(max_bytes = 10737418240 max_files = 0) 0x55a35e577b80)
...
debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150 _lookup 0x10004339f72.head(faked_ino=0 ref=4 ll_ref=0 cap_refs={} open={} mode=40777 size=0/0 nlink=1 btime=0.000000 mtime=2021-06-15 01:04:06.025906 ctime=2021-06-15 01:04:06.025906 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x10000000003.head["uuid-yanked"] has_dir_layout quota(max_bytes = 10737418240 max_files = 0) 0x55a35e577b80) artifacts = 0x100045e836a.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2020-07-28 10:50:45.811775 ctime=2020-07-28 10:50:45.811775 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x10004339f72.head["yank0"] 0x55a35e579180)
...
debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150 _lookup 0x100045e836a.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2020-07-28 10:50:45.811775 ctime=2020-07-28 10:50:45.811775 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x10004339f72.head["yank0"] 0x55a35e579180) pipelines = 0x100045fe6ac.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2021-06-16 08:25:19.061829 ctime=2021-06-16 08:25:19.061829 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x100045e836a.head["yank1"] 0x55a35e579700)
...
debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150 _lookup 0x100045fe6ac.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2021-06-16 08:25:19.061829 ctime=2021-06-16 08:25:19.061829 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x100045e836a.head["yank1"] 0x55a35e579700) dl-template = 0x1001a8e6222.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 18:14:05.251299 mtime=2021-06-16 13:37:01.869385 ctime=2021-06-16 13:37:01.869385 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x100045fe6ac.head["yank2"] 0x55a35e578680)
...
debug 2021-06-16 13:38:35.555 7f7218edd700 10 client.96349150 _lookup 0x1001a8e6222.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 18:14:05.251299 mtime=2021-06-16 13:37:01.869385 ctime=2021-06-16 13:37:01.869385 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x100045fe6ac.head["yank2"] 0x55a35e578680) 7 = 0x1001b524ed3.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 19:08:20.064847 mtime=2021-06-16 13:38:01.944691 ctime=2021-06-16 13:38:01.944691 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x1001a8e6222.head["yank3"] 0x55a35e937600)
...
debug 2021-06-16 13:38:35.556 7f7218edd700 10 client.96349150 _lookup 0x1001b524ed3.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 19:08:20.064847 mtime=2021-06-16 13:38:01.944691 ctime=2021-06-16 13:38:01.944691 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x1001a8e6222.head["yank3"] 0x55a35e937600) end = 0x1001b524ede.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 19:09:08.080929 mtime=2020-10-18 19:09:08.084929 ctime=2020-10-18 19:09:08.084929 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x1001b524ed3.head["yank4"] 0x55a3607c6680)

rmdir request sent to mds for directory entry under yank4 (time of request: 13:38:35.556)::

debug 2021-06-16 13:38:35.556 7f7218edd700  3 client.96349150 may_delete 0x55a3607c6680 = 0
debug 2021-06-16 13:38:35.556 7f7218edd700  8 client.96349150 _rmdir(0x1001b524ede yank5 uid 0 gid 0)
debug 2021-06-16 13:38:35.556 7f7218edd700 10 client.96349150 _lookup 0x1001b524ede.head(faked_ino=0 ref=6 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 19:09:08.080929 mtime=2020-10-18 19:09:08.084929 ctime=2020-10-18 19:09:08.084929 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x1001b524ed3.head["yank4"] 0x55a3607c6680) 1 = 0x1001b524edf.head(faked_ino=0 ref=5 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 19:09:08.084929 mtime=2021-06-16 13:38:32.762354 ctime=2021-06-16 13:38:32.762354 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x1001b524ede.head["yank5"] 0x55a3607c5b80)
debug 2021-06-16 13:38:35.556 7f7218edd700 10 client.96349150 choose_target_mds from caps on inode 0x1001b524ede.head(faked_ino=0 ref=6 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 19:09:08.080929 mtime=2020-10-18 19:09:08.084929 ctime=2020-10-18 19:09:08.084929 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x1001b524ed3.head["yank4"] 0x55a3607c6680)
debug 2021-06-16 13:38:35.556 7f7218edd700 10 client.96349150 send_request rebuilding request 675 for mds.0
debug 2021-06-16 13:38:35.556 7f7218edd700 10 client.96349150 send_request client_request(unknown.0:675 rmdir #0x1001b524ede/yank5 2021-06-16 13:38:35.557210 caller_uid=0, caller_gid=0{}) v4 to mds.0
debug 2021-06-16 13:38:35.556 7f7218edd700  1 -- 172.20.28.95:0/3217647441 --> [v2:172.20.34.28:6800/2017237128,v1:172.20.34.28:6801/2017237128] -- client_request(unknown.0:675 rmdir #0x1001b524ede/yank5 2021-06-16 13:38:35.557210 caller_uid=0, caller_gid=0{}) v4 -- 0x55a35c8c3800 con 0x55a35c9ef600

rmdir ack from mds (time of response: 13:38:45.623)::

debug 2021-06-16 13:38:45.623 7f7218edd700  8 client.96349150 rmdir(#0x1001b524ede/yank5) = 0
debug 2021-06-16 13:38:45.623 7f7218edd700 10 client.96349150 put_inode on 0x1001b524edf.head(faked_ino=0 ref=4 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=0.000000 mtime=2021-06-16 13:38:32.762354 ctime=2021-06-16 13:38:32.762354 caps=pAsXsFs(0=pAsXsFs) COMPLETE 0x55a3607c5b80)
debug 2021-06-16 13:38:45.623 7f7218edd700 10 client.96349150 put_inode on 0x1001b524ede.head(faked_ino=0 ref=6 ll_ref=0 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2020-10-18 19:09:08.080929 mtime=2021-06-16 13:38:35.557210 ctime=2021-06-16 13:38:35.557210 caps=pAsLsXsFsx(0=pAsLsXsFsx) parents=0x1001b524ed3.head["yank4"] 0x55a3607c6680)

So, it took ~10 seconds for an empty directory to be purged by the mds. This is not the global lock being held; it's the `client_lock` (in src/client) that is held for 10-odd seconds (until the client receives the rmdir reply from the mds).
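
The gap can be read directly off the request/reply timestamps quoted above:

```python
from datetime import datetime

# Timestamps from the rmdir request and its ack in the log excerpts.
sent = datetime.fromisoformat("2021-06-16 13:38:35.556")   # rmdir sent to mds.0
acked = datetime.fromisoformat("2021-06-16 13:38:45.623")  # rmdir ack received

gap = (acked - sent).total_seconds()
print(gap)  # 10.067
```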

We could start maintaining a bunch of libcephfs connections (i.e., move from a single shared libcephfs connection to a truly concurrent connection pool); however, we would still be limited by `mds_lock`. A concurrent libcephfs connection pool in mgr/volumes would help most when directory trees are handled by other active MDSs.
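
A minimal sketch of the pooling idea, assuming a hypothetical `ClientPool` wrapper (real handles would come from the libcephfs Python bindings, not `object()`):

```python
import threading

class ClientPool:
    """Hypothetical pool of libcephfs-like connections.

    With one handle per worker, a purge thread stuck in a slow MDS
    round trip no longer serializes the finisher thread -- though the
    MDS-side mds_lock can still serialize the requests themselves.
    """
    def __init__(self, factory, size):
        self._free = [factory() for _ in range(size)]
        self._cond = threading.Condition()

    def acquire(self):
        # Block until a handle is free, then hand it out.
        with self._cond:
            while not self._free:
                self._cond.wait()
            return self._free.pop()

    def release(self, conn):
        # Return a handle and wake one waiter.
        with self._cond:
            self._free.append(conn)
            self._cond.notify()

pool = ClientPool(factory=object, size=2)
a = pool.acquire()
b = pool.acquire()
pool.release(a)
c = pool.acquire()   # reuses the released handle
print(c is a)        # True
```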

#4 Updated by Patrick Donnelly over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 41917

#5 Updated by Venky Shankar over 1 year ago

  • Related to Bug #51271: mgr/volumes: use a dedicated libcephfs handle for subvolume API calls added

#6 Updated by Patrick Donnelly over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from pacific,octopus,nautilus to pacific

This is no longer urgent, so I'm changing this to just Pacific to reduce the risk associated with this change.

#7 Updated by Backport Bot over 1 year ago

  • Copied to Backport #51411: pacific: pybind/mgr/volumes: purge queue seems to block operating on cephfs connection required by dispatch thread added

#8 Updated by Loïc Dachary about 1 year ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
