Bug #21078 (closed): df hangs in ceph-fuse

Added by John Spray over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

See "[ceph-users] ceph-fuse hanging on df with ceph luminous >= 12.1.3".

The filesystem otherwise works normally, but "df" hangs. The regression appeared between 12.1.2 and 12.1.3.
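
For context on what is actually blocking: "df" boils down to a statvfs(3) call on the mount point, which ceph-fuse services by asking the monitors for filesystem statistics (the statfs messages visible in the log in comment #1). A minimal reproducer sketch, assuming a ceph-fuse mount at the hypothetical path /mnt/cephfs:

#include <sys/statvfs.h>
#include <cstdio>

int main() {
    struct statvfs st;
    // On an affected 12.1.3 ceph-fuse client this call never returns;
    // on a healthy mount it fills `st` and returns 0.
    if (statvfs("/mnt/cephfs", &st) != 0) {   // mount path is an assumption
        std::perror("statvfs");
        return 1;
    }
    std::printf("blocks=%llu free=%llu bsize=%lu\n",
                (unsigned long long)st.f_blocks,
                (unsigned long long)st.f_bfree,
                (unsigned long)st.f_bsize);
    return 0;
}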


Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #21099: luminous: client: df hangs in ceph-fuse (Resolved, assigned to Patrick Donnelly)
#1 - Updated by John Spray over 6 years ago

The client loops like this:

2017-08-23 14:44:28.978048 7fc00a804700  1 -- 192.168.18.100:0/2588995716 <== mon.0 192.168.18.1:6789/0 1 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (2575765019 0 0) 0x7fc01ff4b480 con 0x7fc01fcee800
2017-08-23 14:44:28.978130 7fc00a804700  1 -- 192.168.18.100:0/2588995716 --> 192.168.18.1:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fc01ff4b700 con 0
2017-08-23 14:44:28.978150 7fc00a804700  1 -- 192.168.18.100:0/2588995716 <== mon.2 192.168.18.3:6789/0 1 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (1852249480 0 0) 0x7fc01ff4a300 con 0x7fc01fd99000
2017-08-23 14:44:28.978233 7fc00a804700  1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fc01ff4b480 con 0
2017-08-23 14:44:28.979312 7fc00a804700  1 -- 192.168.18.100:0/2588995716 <== mon.0 192.168.18.1:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (3304410501 0 0) 0x7fc01ff4b700 con 0x7fc01fcee800
2017-08-23 14:44:28.979434 7fc00a804700  1 -- 192.168.18.100:0/2588995716 --> 192.168.18.1:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x7fc01ff4a300 con 0
2017-08-23 14:44:28.979452 7fc00a804700  1 -- 192.168.18.100:0/2588995716 <== mon.2 192.168.18.3:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (689196011 0 0) 0x7fc01ff4b480 con 0x7fc01fd99000
2017-08-23 14:44:28.979497 7fc00a804700  1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x7fc01ff4b700 con 0
2017-08-23 14:44:28.980856 7fc00a804700  1 -- 192.168.18.100:0/2588995716 <== mon.2 192.168.18.3:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 409+0+0 (4018737366 0 0) 0x7fc01ff4b700 con 0x7fc01fd99000
2017-08-23 14:44:28.980925 7fc00a804700  1 -- 192.168.18.100:0/2588995716 >> 192.168.18.1:6789/0 conn(0x7fc01fcee800 :-1 s=STATE_OPEN pgs=26315 cs=1 l=1).mark_down
2017-08-23 14:44:28.981038 7fc00a804700  1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- mon_subscribe({mdsmap=1098+,monmap=4+}) v2 -- 0x7fc01fe54000 con 0
2017-08-23 14:44:28.981055 7fc00a804700  1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- statfs(1 pool -1 v0) v2 -- 0x7fc01fe54240 con 0
2017-08-23 14:44:28.981065 7fc00a804700  1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- statfs(1 pool -1 v0) v2 -- 0x7fc01fe54480 con 0
2017-08-23 14:44:28.981671 7fc00f00d700  1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_OPEN pgs=26774 cs=1 l=1).read_bulk peer close file descriptor 0
2017-08-23 14:44:28.981684 7fc00f00d700  1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_OPEN pgs=26774 cs=1 l=1).read_until read failed
2017-08-23 14:44:28.981689 7fc00f00d700  1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_OPEN pgs=26774 cs=1 l=1).process read tag failed
2017-08-23 14:44:28.981693 7fc00f00d700  1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_OPEN pgs=26774 cs=1 l=1).fault on lossy channel, failing
2017-08-23 14:44:28.981733 7fc00a804700  1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_CLOSED pgs=26774 cs=1 l=1).mark_down
2017-08-23 14:44:28.982310 7fc00a804700  1 -- 192.168.18.100:0/2588995716 --> 192.168.18.1:6789/0 -- auth(proto 0 35 bytes epoch 3) v1 -- 0x7fc01ff4b700 con 0
2017-08-23 14:44:28.982327 7fc00a804700  1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- auth(proto 0 35 bytes epoch 3) v1 -- 0x7fc01ff4b480 con 0
2017-08-23 14:44:28.982341 7fc00a804700  0 client.2640316 ms_handle_reset on 192.168.18.3:6789/0
2017-08-23 14:44:28.983235 7fc00a804700 10 client.2640316.objecter ms_handle_connect 0x7fc01fcee800
2017-08-23 14:44:28.983244 7fc00a804700 10 client.2640316.objecter resend_mon_ops
2017-08-23 14:44:28.983245 7fc00a804700 10 client.2640316.objecter fs_stats_submit1
2017-08-23 14:44:28.983248 7fc00a804700 10 client.2640316 ms_handle_connect on 192.168.18.1:6789/0
2017-08-23 14:44:28.983387 7fc00a804700 10 client.2640316.objecter ms_handle_connect 0x7fc01fd96000
2017-08-23 14:44:28.983394 7fc00a804700 10 client.2640316.objecter resend_mon_ops
2017-08-23 14:44:28.983396 7fc00a804700 10 client.2640316.objecter fs_stats_submit1
2017-08-23 14:44:28.983419 7fc00a804700 10 client.2640316 ms_handle_connect on 192.168.18.3:6789/0

Note that my client was upgraded but my servers weren't yet, so this could be a compatibility issue with the MStatfs message change.
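
For readers who have not followed that change: raising a message's compat_version is what makes an older daemon refuse the message outright rather than decode what it can. A hedged sketch of the idea, not the real MStatfs class; the field name, version numbers and encoding are assumptions (the "pool -1" in the statfs lines above is the only hint from this log):

#include <cstdint>
#include <vector>

// Illustrative message, NOT Ceph's MStatfs. HEAD_VERSION says which encoding
// the sender produced; COMPAT_VERSION says the oldest decoder that can still
// make sense of it. Bumping COMPAT_VERSION to 2 is what locks out old mons.
struct StatfsRequest {
  static constexpr uint16_t HEAD_VERSION   = 2;  // new encoding with an extra field
  static constexpr uint16_t COMPAT_VERSION = 2;  // old receivers must now refuse it

  int64_t data_pool = -1;  // hypothetical new field; -1 meaning "all pools"

  void encode(std::vector<uint8_t>& out) const {
    // Sketch of a versioned encode: the header advertises both version
    // numbers, and the extra field only exists in the version-2 payload.
    out.push_back(static_cast<uint8_t>(HEAD_VERSION));
    out.push_back(static_cast<uint8_t>(COMPAT_VERSION));
    for (int i = 0; i < 8; ++i)
      out.push_back(static_cast<uint8_t>(
          (static_cast<uint64_t>(data_pool) >> (8 * i)) & 0xff));
  }
};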

#2 - Updated by John Spray over 6 years ago

Yep, it is. The mon says:

2017-08-23 13:44:31.455063 7fd5045ba700  0 will not decode message of type 13 version 2 because compat_version 2 > supported version 1
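
The shape of the check being tripped there, as a minimal illustrative sketch (not the mon's actual decode path; the struct and field names are assumptions, and the printed text mirrors the log line above). The important consequence is that the rejected message is simply dropped, so the client never gets a reply and statfs hangs instead of failing:

#include <cstdio>

// Illustrative only -- not Ceph source. The type and version numbers
// come from the log line above; everything else is an assumption.
struct MsgHeader {
  unsigned type;            // message type (13 is the one rejected above)
  unsigned version;         // encoding version the sender used
  unsigned compat_version;  // oldest version a decoder must support
};

static bool can_decode(const MsgHeader& h, unsigned supported_version) {
  if (h.compat_version > supported_version) {
    std::fprintf(stderr,
        "will not decode message of type %u version %u because "
        "compat_version %u > supported version %u\n",
        h.type, h.version, h.compat_version, supported_version);
    return false;  // message dropped: the sender gets no error and no reply
  }
  return true;
}

int main() {
  // A 12.1.3 client sends compat_version 2; a 12.1.2 mon supports only version 1.
  MsgHeader statfs_from_new_client{13, 2, 2};
  return can_decode(statfs_from_new_client, 1) ? 0 : 1;
}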
#3 - Updated by John Spray over 6 years ago

  • Status changed from New to Fix Under Review
  • Backport set to luminous
#4 - Updated by Patrick Donnelly over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
#5 - Updated by Patrick Donnelly over 6 years ago

#6 - Updated by Patrick Donnelly over 6 years ago

  • Status changed from Pending Backport to Resolved