Bug #48679
client: items pinned in cache preventing unmount
Status: Closed
% Done: 100%
Description
2020-12-18T02:17:36.017+0000 7f967501e200 2 client.4623 unmounting
...
2020-12-18T02:17:36.018+0000 7f967501e200 2 client.4623 cache still has 0+25 items, waiting (for caps to release?)
...
From: /ceph/teuthology-archive/pdonnell-2020-12-17_23:13:08-fs-wip-pdonnell-testing-20201217.205941-distro-basic-smithi/5715878/remote/smithi091/log/ceph-client.0.36435.log.gz
Symptom:
2020-12-18T02:32:45.191 ERROR:tasks.cephfs.fuse_mount:process failed to terminate after unmount. This probably indicates a bug within ceph-fuse.
2020-12-18T02:32:45.192 ERROR:teuthology.run_tasks:Manager failed: ceph-fuse
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 173, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201217.205941/qa/tasks/ceph_fuse.py", line 175, in task
    mount.umount_wait()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201217.205941/qa/tasks/cephfs/fuse_mount.py", line 380, in umount_wait
    run.wait([self.fuse_daemon], timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (150) after waiting for 900 seconds
From: /ceph/teuthology-archive/pdonnell-2020-12-17_23:13:08-fs-wip-pdonnell-testing-20201217.205941-distro-basic-smithi/5715878/teuthology.log
Updated by Patrick Donnelly over 3 years ago
description: fs/thrash/workloads/{begin clusters/1-mds-1-client-coloc conf/{client mds mon osd} distro/{centos_8} mount/fuse msgr-failures/none objectstore-ec/filestore-xfs overrides/{frag_enable session_timeout thrashosds-health whitelist_health whitelist_wrongly_marked_down} tasks/{1-thrash/osd 2-workunit/cfuse_workunit_snaptests}}
Updated by Xiubo Li over 3 years ago
Patrick, this one seems similar to one I have fixed before; I will take it.
Thanks.
Updated by Xiubo Li over 3 years ago
- Status changed from New to In Progress
- Assignee set to Xiubo Li
Updated by Xiubo Li over 3 years ago
For example, for the inode 0x10000000e51:
1426613:2020-12-18T01:31:54.411+0000 7f964f7fe700 10 client.4623 queue_cap_snap 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 62] parents=0x100000005fa.head["34"] 0x7f9648532110) snapc 5e=[5e,8] used Fcb
1426614:2020-12-18T01:31:54.411+0000 7f964f7fe700 15 inode.get on 0x7f9648532110 0x10000000e51.head now 7
...
1426615:2020-12-18T01:31:54.411+0000 7f964f7fe700 10 client.4623 finish_cap_snap 0x10000000e51.head(faked_ino=0 ref=7 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 62] parents=0x100000005fa.head["34"] 0x7f9648532110) capsnap 0x7f9648869a08 used Fcb
1426616:2020-12-18T01:31:54.411+0000 7f964f7fe700 10 client.4623 finish_cap_snap 0x10000000e51.head(faked_ino=0 ref=7 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 62] parents=0x100000005fa.head["34"] 0x7f9648532110) cap_snap 0x7f9648869a08 used 9216 WRBUFFER, delaying
Because the inode still held the Fb cap, the snap flush was delayed, but it seems it was never performed later. So the inode stayed pinned in cap_snaps, which holds a reference on it.
struct CapSnap {
  //snapid_t follows;  // map key
  InodeRef in;         // ===> an inode ref
  SnapContext context;
  int issued, dirty;
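The pinning described above can be modeled with a small sketch (hypothetical, simplified types, not the real Client.cc code): because each CapSnap holds a strong InodeRef, a capsnap that stays delayed in the inode's cap_snaps map keeps the inode's reference count above zero, which is exactly why unmount reports "cache still has ... items":

```cpp
#include <cstdint>
#include <map>
#include <memory>

// Simplified model of the pin: CapSnap holds a strong reference to its
// inode, so as long as the capsnap sits in Inode::cap_snaps the inode
// cannot be released and the client cache cannot drain at unmount.
struct Inode;
using InodeRef = std::shared_ptr<Inode>;

struct CapSnap {
  InodeRef in;          // the "an inode ref" field shown above
  bool writing = false; // true while the flush is delayed on Fb buffers
};

struct Inode {
  std::map<uint64_t, CapSnap> cap_snaps; // snapid -> capsnap
};
```

In this model, queueing a capsnap bumps the inode's use count, and only erasing the capsnap (after a successful finish_cap_snap) drops it again.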
Updated by Xiubo Li over 3 years ago
Xiubo Li wrote:
For example for the inode 0x10000000e51:
[...]
Because it has the Fb cap, so the flush snaps was delayed, but it seemed never do it later. So the inode still kept in the cap_snaps by increasing a ref.
[...]
After the snap cap flush was delayed, the inode 0x10000000e51 never ran finish_cap_snap():
2020-12-18T01:32:03.525+0000 7f9665d37700 1 -- 192.168.0.1:0/722175349 <== osd.6 v2:172.21.15.199:6819/33452 35 ==== osd_op_reply(3021 10000000e51.00000000 [write 0~62] v181'385 uv385 ondisk = 0) v8 ==== 164+0+0 (crc 0 0 0) 0x7f9658052040 con 0x7f962000a110
2020-12-18T01:32:03.525+0000 7f9654ff9700 10 client.4623 _flushed 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110)
2020-12-18T01:32:03.525+0000 7f9654ff9700 5 client.4623 put_cap_ref dropped last FILE_BUFFER ref on 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110)
2020-12-18T01:32:03.525+0000 7f9654ff9700 5 client.4623 put_cap_ref dropped last FILE_CACHE ref on 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110)
2020-12-18T01:32:03.525+0000 7f9654ff9700 20 client.4623 put_inode on 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110) n = 2
2020-12-18T01:32:03.525+0000 7f9654ff9700 20 client.4623 put_inode on 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110) n = 1
We should check whether the "CEPH_CAP_FILE_BUFFER" cap reference has been dropped and, if so, call "finish_cap_snap()" for the delayed capsnaps.
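A hedged sketch of that fix (simplified stand-in types, not the real Ceph client code): when put_cap_ref() drops the last CEPH_CAP_FILE_BUFFER (Fb) reference, walk the inode's capsnaps and finish any that were delayed with "used ... WRBUFFER, delaying", so the inode can be unpinned and unmount can complete:

```cpp
#include <list>

// Fb bit as it appears in the cap_refs={...,8192=1} log entries above;
// the real constant lives in the Ceph headers.
constexpr int CEPH_CAP_FILE_BUFFER = 8192;

struct CapSnap {
  bool writing = true; // delayed: still waiting on buffered writes
};

struct Inode {
  int fb_refs = 1;              // outstanding Fb cap references
  std::list<CapSnap> cap_snaps; // delayed capsnaps pin the inode
};

// Stand-in for finish_cap_snap(): mark the snap ready to flush to the MDS.
void finish_cap_snap(CapSnap &cs) { cs.writing = false; }

// On dropping the last Fb ref, finish the delayed capsnaps instead of
// leaving them (and therefore the inode) pinned forever.
void put_cap_ref(Inode &in, int cap) {
  if (cap == CEPH_CAP_FILE_BUFFER && --in.fb_refs == 0) {
    for (auto &cs : in.cap_snaps)
      if (cs.writing)
        finish_cap_snap(cs);
  }
}
```

This mirrors the _flushed/put_cap_ref sequence in the log: once the last FILE_BUFFER ref is dropped, the delayed capsnap gets its finish_cap_snap() call rather than being skipped.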
Updated by Xiubo Li over 3 years ago
- Status changed from In Progress to Fix Under Review
Updated by Patrick Donnelly over 3 years ago
- Target version changed from v16.0.0 to v17.0.0
- Backport set to pacific,octopus,nautilus
Updated by Ilya Dryomov about 3 years ago
- Related to Bug #48640: qa: snapshot mismatch during mds thrashing added
Updated by Patrick Donnelly about 3 years ago
- Related to Bug #49928: client: items pinned in cache preventing unmount x2 added
Updated by Patrick Donnelly about 3 years ago
- Status changed from Fix Under Review to Pending Backport
- Source set to Q/A
Updated by Backport Bot about 3 years ago
- Copied to Backport #50025: pacific: client: items pinned in cache preventing unmount added
Updated by Backport Bot about 3 years ago
- Copied to Backport #50026: nautilus: client: items pinned in cache preventing unmount added
Updated by Backport Bot about 3 years ago
- Copied to Backport #50027: octopus: client: items pinned in cache preventing unmount added
Updated by Loïc Dachary almost 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".