Project

General

Profile

Bug #22008

Processes stuck waiting for write with ceph-fuse

Added by Andras Pataki over 1 year ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
-
Start date:
11/02/2017
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
luminous,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, ceph-fuse
Labels (FS):
Pull request ID:

Description

We've been running into a strange problem with Ceph using ceph-fuse and the filesystem. All the back end nodes are on 10.2.10, the fuse clients are on 10.2.7. The cluster is healthy:

[root@cephmon02 ~]# ceph -s
cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
health HEALTH_WARN
noout flag(s) set
monmap e31: 3 mons at {cephmon00=10.128.128.100:6789/0,cephmon01=10.128.128.101:6789/0,cephmon02=10.128.128.102:6789/0}
election epoch 47634, quorum 0,1,2 cephmon00,cephmon01,cephmon02
fsmap e46721: 1/1/1 up {0=cephmon02=up:active}, 2 up:standby
osdmap e728467: 1584 osds: 1584 up, 1584 in
flags noout,sortbitwise,require_jewel_osds
pgmap v63575585: 42496 pgs, 6 pools, 2310 TB data, 684 Mobjects
6939 TB used, 2936 TB / 9875 TB avail
42464 active+clean
28 active+clean+scrubbing+deep
4 active+clean+scrubbing
client io 64957 kB/s rd, 10163 kB/s wr, 33 op/s rd, 47 op/s wr

After some hours of runs, some processes get stuck waiting for fuse like:

[root@worker1144 ~]# cat /proc/58193/stack
[<ffffffffa08cd241>] wait_answer_interruptible+0x91/0xe0 [fuse]
[<ffffffffa08cd653>] __fuse_request_send+0x253/0x2c0 [fuse]
[<ffffffffa08cd6d2>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa08d69d6>] fuse_send_write+0xd6/0x110 [fuse]
[<ffffffffa08d84d5>] fuse_perform_write+0x2f5/0x5a0 [fuse]
[<ffffffffa08d8a21>] fuse_file_aio_write+0x2a1/0x340 [fuse]
[<ffffffff811fdfbd>] do_sync_write+0x8d/0xd0
[<ffffffff811fe82d>] vfs_write+0xbd/0x1e0
[<ffffffff811ff34f>] SyS_write+0x7f/0xe0
[<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

The ceph-fuse process claims it has no operations pending and is generally happy:

[root@worker1144 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok status {
"metadata": {
"ceph_sha1": "50e863e0f4bc8f4b9e31156de690d765af245185",
"ceph_version": "ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)",
"entity_id": "admin",
"hostname": "worker1144",
"mount_point": "\/mnt\/ceph",
"root": "\/"
},
"dentry_count": 1242,
"dentry_pinned_count": 81,
"inode_count": 1041,
"mds_epoch": 46684,
"osd_epoch": 728466,
"osd_epoch_barrier": 728466
}

[root@worker1144 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok mds_sessions {
"id": 5435495,
"sessions": [ {
"mds": 0,
"addr": "10.128.128.102:6800\/2501015856",
"seq": 584228,
"cap_gen": 0,
"cap_ttl": "2017-11-02 10:52:40.234770",
"last_cap_renew_request": "2017-11-02 10:51:40.234770",
"cap_renew_seq": 106171,
"num_caps": 16937,
"state": "open"
}
],
"mdsmap_epoch": 46684
}

[root@worker1144 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok mds_requests {}

The active MDS also claims nothing in flight:

[root@cephmon02 ~]# ceph --admin-daemon /var/run/ceph/ceph-mds.*.asok dump_ops_in_flight {
"ops": [],
"num_ops": 0
}

On the client if I attach the debugger to ceph-fuse and pick the right thread, I see that it is stuck in Client::ll_write() calling Client::_write() calling Client::get_caps().

Finding the inode number from the debugger (in this case: 1099639990180 0x10007a6a7a4), I did an MDS cache dump and see this in the MDS cache for the inode:

[root@cephmon02 tmp]# grep users/landerson/lyalphaVaried3/PIG_004/FOFGroups/BlackholeAccretionRate/00001A cache-dump.txt
[inode 10007a6a7a4 [2,head] /users/landerson/lyalphaVaried3/PIG_004/FOFGroups/BlackholeAccretionRate/00001A auth v354 s=1512 n(v0 b1512 1=1+0) (iversion lock) caps={5437689=pAsLsXsFc/-@12} | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x55\
e320cd3a80]
[dentry #1/users/landerson/lyalphaVaried3/PIG_004/FOFGroups/BlackholeAccretionRate/00001A [2,head] auth (dversion lock) v=354 inode=0x55e320cd3a80 | request=0 lock=0 inodepin=1 dirty=0 waiter=0 authpin=0 clientlease=0 0x55e2fcee2ab0]

[root@cephmon02 tmp]# grep 0x55e320cd3a80 cache-dump.txt
[inode 10007a6a7a4 [2,head] /users/landerson/lyalphaVaried3/PIG_004/FOFGroups/BlackholeAccretionRate/00001A auth v354 s=1512 n(v0 b1512 1=1+0) (iversion lock) caps={5437689=pAsLsXsFc/-@12} | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x55\
e320cd3a80]
[dentry #1/users/landerson/lyalphaVaried3/PIG_004/FOFGroups/BlackholeAccretionRate/00001A [2,head] auth (dversion lock) v=354 inode=0x55e320cd3a80 | request=0 lock=0 inodepin=1 dirty=0 waiter=0 authpin=0 clientlease=0 0x55e2fcee2ab0]

[root@cephmon02 tmp]# grep 10007a6a7a4 cache-dump.txt
[inode 10007a6a7a4 [2,head] /users/landerson/lyalphaVaried3/PIG_004/FOFGroups/BlackholeAccretionRate/00001A auth v354 s=1512 n(v0 b1512 1=1+0) (iversion lock) caps={5437689=pAsLsXsFc/-@12} | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x55\
e320cd3a80]

It seems like there is a "disconnect" between the MDS and the client about a cap request: the client is holding a thread waiting for caps to arrive, but the MDS doesn't think it needs to do anything.

If I bounce the MDS, everything gets resolved. It seems like the client loses its connection, reconnects to the active MDS and re-negotiates the caps, which fixes the hang.

Any ideas?

ceph-fuse-cache.gz - ceph-fuse cache dump - stuck file is users/landerson/lyalphaVaried4/PART_000/1/GroupID/000029 (32.6 KB) Andras Pataki, 11/03/2017 07:30 PM


Related issues

Copied to fs - Backport #22240: luminous: Processes stuck waiting for write with ceph-fuse Resolved
Copied to fs - Backport #22241: jewel: Processes stuck waiting for write with ceph-fuse Resolved

History

#1 Updated by Zheng Yan over 1 year ago

no idea how did it happen. please use admin socket to dump ceph-fuse's cache (ceph daemon client.xxx dump_cache)

#2 Updated by Andras Pataki over 1 year ago

Attached is the ceph-fuse cache dump. This is a different instance of the problem (all the same symptoms), so the process in this case is accessing the file /mnt/ceph/users/landerson/lyalphaVaried4/PART_000/1/GroupID/000029

The MDS cache looks like this for this file:
[root@cephmon01 tmp]# grep users/landerson/lyalphaVaried4/PART_000/1/GroupID/000029 mds-cache-dump.txt
[inode 10007b104c7 [2,head] /users/landerson/lyalphaVaried4/PART_000/1/GroupID/000029 auth v920 s=798916 n(v0 b798916 1=1+0) (ifile xsyn) (iversion lock) cr={5545419=0-4194304@1} caps={5538376=pAsLsXsFc/-@8,5545419=pAsLsXs/pAsxXsxFsxcrwb@6},l=5538376(5545419) | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=0 dirty=0 waiter=0 authpin=0 0x5596d5984d70]
[dentry #1/users/landerson/lyalphaVaried4/PART_000/1/GroupID/000029 [2,head] auth (dversion lock) v=920 inode=0x5596d5984d70 | request=0 lock=0 inodepin=1 dirty=0 authpin=0 clientlease=0 0x5596c04e6120]

Also, I've updated the client to the latest release 12.2.1, and the problem is reproduced there as well.

Investigating the code that produces this issue - multiple processes from different nodes write to the same file concurrently, but to different parts of it.

#3 Updated by Zheng Yan over 1 year ago

  • Status changed from New to In Progress

The second one is actually different from the first one. Seems like the first one was caused by 'client session gets evicted by mds, then client reconnect". The issue should have been fixed in v12.2.1. (client re-requests caps after reconnect)

I'm working on fixing the second one.

#4 Updated by Zheng Yan over 1 year ago

  • Status changed from In Progress to Need Review

#5 Updated by Andras Pataki over 1 year ago

I've applied this patch to the latest luminous branch, rebuild the MDS and tested it in a test environment with the code that causes the hangs. I'm happy to report that the code now runs well - could not reproduce any hangs! We'll keep testing. Thanks very much for the quick turnaround and hope that the patch can make its way into a release soon.

#6 Updated by Patrick Donnelly over 1 year ago

  • Assignee set to Zheng Yan

#7 Updated by Patrick Donnelly over 1 year ago

  • Status changed from Need Review to Pending Backport
  • Backport set to luminous,jewel

#8 Updated by Nathan Cutler over 1 year ago

  • Copied to Backport #22240: luminous: Processes stuck waiting for write with ceph-fuse added

#9 Updated by Nathan Cutler over 1 year ago

  • Copied to Backport #22241: jewel: Processes stuck waiting for write with ceph-fuse added

#10 Updated by Nathan Cutler about 1 year ago

  • Status changed from Pending Backport to Resolved

#11 Updated by Ivan Guan 9 months ago

Zheng Yan wrote:

The second one is actually different from the first one. Seems like the first one was caused by 'client session gets evicted by mds, then client reconnect". The issue should have been fixed in v12.2.1. (client re-requests caps after reconnect)

I'm working on fixing the second one.

Hi Zheng,

I may met the first bug and my ceph is in jewel 10.2.2. My client hang and log display "waiting for caps need Fw want Fb" or "waiting for caps need Fr want Fc". I know 10.2.11 have solved the problem bug i want to know which pr had solve the bug and i wander how it solved the bug?

Also available in: Atom PDF