
Bug #48679

client: items pinned in cache preventing unmount

Added by Patrick Donnelly 12 months ago. Updated 7 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

100%

Source:
Q/A
Tags:
Backport:
pacific,octopus,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client
Labels (FS):
qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-12-18T02:17:36.017+0000 7f967501e200  2 client.4623 unmounting
...
2020-12-18T02:17:36.018+0000 7f967501e200  2 client.4623 cache still has 0+25 items, waiting (for caps to release?)
...

From: /ceph/teuthology-archive/pdonnell-2020-12-17_23:13:08-fs-wip-pdonnell-testing-20201217.205941-distro-basic-smithi/5715878/remote/smithi091/log/ceph-client.0.36435.log.gz

Symptom:

2020-12-18T02:32:45.191 ERROR:tasks.cephfs.fuse_mount:process failed to terminate after unmount. This probably indicates a bug within ceph-fuse.
2020-12-18T02:32:45.192 ERROR:teuthology.run_tasks:Manager failed: ceph-fuse
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 173, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201217.205941/qa/tasks/ceph_fuse.py", line 175, in task
    mount.umount_wait()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201217.205941/qa/tasks/cephfs/fuse_mount.py", line 380, in umount_wait
    run.wait([self.fuse_daemon], timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (150) after waiting for 900 seconds

From: /ceph/teuthology-archive/pdonnell-2020-12-17_23:13:08-fs-wip-pdonnell-testing-20201217.205941-distro-basic-smithi/5715878/teuthology.log


Subtasks

Bug #49928: client: items pinned in cache preventing unmount x2 Duplicate Xiubo Li


Related issues

Related to CephFS - Bug #48640: qa: snapshot mismatch during mds thrashing Fix Under Review
Copied to CephFS - Backport #50025: pacific: client: items pinned in cache preventing unmount Resolved
Copied to CephFS - Backport #50026: nautilus: client: items pinned in cache preventing unmount Resolved
Copied to CephFS - Backport #50027: octopus: client: items pinned in cache preventing unmount Resolved

History

#1 Updated by Patrick Donnelly 12 months ago

  description: fs/thrash/workloads/{begin clusters/1-mds-1-client-coloc conf/{client
    mds mon osd} distro/{centos_8} mount/fuse msgr-failures/none objectstore-ec/filestore-xfs
    overrides/{frag_enable session_timeout thrashosds-health whitelist_health whitelist_wrongly_marked_down}
    tasks/{1-thrash/osd 2-workunit/cfuse_workunit_snaptests}}

#2 Updated by Xiubo Li 12 months ago

Patrick, this one seems similar to the one I fixed before; I will take it.
Thanks.

#3 Updated by Xiubo Li 12 months ago

  • Status changed from New to In Progress
  • Assignee set to Xiubo Li

#4 Updated by Xiubo Li 12 months ago

For example, for inode 0x10000000e51:

1426613:2020-12-18T01:31:54.411+0000 7f964f7fe700 10 client.4623 queue_cap_snap 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 62] parents=0x100000005fa.head["34"] 0x7f9648532110) snapc 5e=[5e,8] used Fcb
1426614:2020-12-18T01:31:54.411+0000 7f964f7fe700 15 inode.get on 0x7f9648532110 0x10000000e51.head now 7
...
1426615:2020-12-18T01:31:54.411+0000 7f964f7fe700 10 client.4623 finish_cap_snap 0x10000000e51.head(faked_ino=0 ref=7 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 62] parents=0x100000005fa.head["34"] 0x7f9648532110) capsnap 0x7f9648869a08 used Fcb
1426616:2020-12-18T01:31:54.411+0000 7f964f7fe700 10 client.4623 finish_cap_snap 0x10000000e51.head(faked_ino=0 ref=7 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) flushing_caps=Fw objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 62] parents=0x100000005fa.head["34"] 0x7f9648532110) cap_snap 0x7f9648869a08 used 9216 WRBUFFER, delaying

Because the inode held the Fb cap, the snap flush was delayed, but it was apparently never retried later. So the inode stayed pinned in cap_snaps, which holds a reference to it.


struct CapSnap {
  //snapid_t follows;  // map key
  InodeRef in;         // ===> an inode ref
  SnapContext context;
  int issued, dirty;

#5 Updated by Xiubo Li 11 months ago

Xiubo Li wrote:

For example, for inode 0x10000000e51:

[...]

Because the inode held the Fb cap, the snap flush was delayed, but it was apparently never retried later. So the inode stayed pinned in cap_snaps, which holds a reference to it.
[...]

After the snap flush was delayed, the inode "10000000e51" never ran "finish_cap_snap()":

2020-12-18T01:32:03.525+0000 7f9665d37700  1 -- 192.168.0.1:0/722175349 <== osd.6 v2:172.21.15.199:6819/33452 35 ==== osd_op_reply(3021 10000000e51.00000000 [write 0~62] v181'385 uv385 ondisk = 0) v8 ==== 164+0+0 (crc 0 0 0) 0x7f9658052040 con 0x7f962000a110
2020-12-18T01:32:03.525+0000 7f9654ff9700 10 client.4623 _flushed 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=1,4096=0,8192=1} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110)
2020-12-18T01:32:03.525+0000 7f9654ff9700  5 client.4623 put_cap_ref dropped last FILE_BUFFER ref on 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110)
2020-12-18T01:32:03.525+0000 7f9654ff9700  5 client.4623 put_cap_ref dropped last FILE_CACHE ref on 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110)
2020-12-18T01:32:03.525+0000 7f9654ff9700 20 client.4623 put_inode on 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110) n = 2
2020-12-18T01:32:03.525+0000 7f9654ff9700 20 client.4623 put_inode on 0x10000000e51.head(faked_ino=0 ref=6 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={2=0} mode=100664 size=62/4194304 nlink=1 btime=2020-12-18T01:31:49.263043+0000 mtime=2020-12-18T01:31:49.264384+0000 ctime=2020-12-18T01:31:49.264384+0000 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000e51 ts 0/0 objects 1 dirty_or_tx 0] parents=0x100000005fa.head["34"] 0x7f9648532110) n = 1

We should check the "CEPH_CAP_FILE_BUFFER" cap and call "finish_cap_snap()" when the last buffer reference is dropped.

#6 Updated by Xiubo Li 11 months ago

  • Pull request ID set to 38732

#7 Updated by Xiubo Li 11 months ago

  • Status changed from In Progress to Fix Under Review

#8 Updated by Patrick Donnelly 11 months ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport set to pacific,octopus,nautilus

#9 Updated by Ilya Dryomov 10 months ago

  • Related to Bug #48640: qa: snapshot mismatch during mds thrashing added

#10 Updated by Patrick Donnelly 9 months ago

  • Related to Bug #49928: client: items pinned in cache preventing unmount x2 added

#11 Updated by Patrick Donnelly 8 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Source set to Q/A

#12 Updated by Backport Bot 8 months ago

  • Copied to Backport #50025: pacific: client: items pinned in cache preventing unmount added

#13 Updated by Backport Bot 8 months ago

  • Copied to Backport #50026: nautilus: client: items pinned in cache preventing unmount added

#14 Updated by Backport Bot 8 months ago

  • Copied to Backport #50027: octopus: client: items pinned in cache preventing unmount added

#15 Updated by Loïc Dachary 7 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
