Bug #21078
df hangs in ceph-fuse
Status:
Resolved
Priority:
Immediate
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
See "[ceph-users] ceph-fuse hanging on df with ceph luminous >= 12.1.3".
The filesystem works normally, except that "df" hangs. It broke between 12.1.2 and 12.1.3.
Updated by John Spray over 6 years ago
Loops like this:
2017-08-23 14:44:28.978048 7fc00a804700 1 -- 192.168.18.100:0/2588995716 <== mon.0 192.168.18.1:6789/0 1 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (2575765019 0 0) 0x7fc01ff4b480 con 0x7fc01fcee800
2017-08-23 14:44:28.978130 7fc00a804700 1 -- 192.168.18.100:0/2588995716 --> 192.168.18.1:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fc01ff4b700 con 0
2017-08-23 14:44:28.978150 7fc00a804700 1 -- 192.168.18.100:0/2588995716 <== mon.2 192.168.18.3:6789/0 1 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (1852249480 0 0) 0x7fc01ff4a300 con 0x7fc01fd99000
2017-08-23 14:44:28.978233 7fc00a804700 1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fc01ff4b480 con 0
2017-08-23 14:44:28.979312 7fc00a804700 1 -- 192.168.18.100:0/2588995716 <== mon.0 192.168.18.1:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (3304410501 0 0) 0x7fc01ff4b700 con 0x7fc01fcee800
2017-08-23 14:44:28.979434 7fc00a804700 1 -- 192.168.18.100:0/2588995716 --> 192.168.18.1:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x7fc01ff4a300 con 0
2017-08-23 14:44:28.979452 7fc00a804700 1 -- 192.168.18.100:0/2588995716 <== mon.2 192.168.18.3:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (689196011 0 0) 0x7fc01ff4b480 con 0x7fc01fd99000
2017-08-23 14:44:28.979497 7fc00a804700 1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x7fc01ff4b700 con 0
2017-08-23 14:44:28.980856 7fc00a804700 1 -- 192.168.18.100:0/2588995716 <== mon.2 192.168.18.3:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 409+0+0 (4018737366 0 0) 0x7fc01ff4b700 con 0x7fc01fd99000
2017-08-23 14:44:28.980925 7fc00a804700 1 -- 192.168.18.100:0/2588995716 >> 192.168.18.1:6789/0 conn(0x7fc01fcee800 :-1 s=STATE_OPEN pgs=26315 cs=1 l=1).mark_down
2017-08-23 14:44:28.981038 7fc00a804700 1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- mon_subscribe({mdsmap=1098+,monmap=4+}) v2 -- 0x7fc01fe54000 con 0
2017-08-23 14:44:28.981055 7fc00a804700 1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- statfs(1 pool -1 v0) v2 -- 0x7fc01fe54240 con 0
2017-08-23 14:44:28.981065 7fc00a804700 1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- statfs(1 pool -1 v0) v2 -- 0x7fc01fe54480 con 0
2017-08-23 14:44:28.981671 7fc00f00d700 1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_OPEN pgs=26774 cs=1 l=1).read_bulk peer close file descriptor 0
2017-08-23 14:44:28.981684 7fc00f00d700 1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_OPEN pgs=26774 cs=1 l=1).read_until read failed
2017-08-23 14:44:28.981689 7fc00f00d700 1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_OPEN pgs=26774 cs=1 l=1).process read tag failed
2017-08-23 14:44:28.981693 7fc00f00d700 1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_OPEN pgs=26774 cs=1 l=1).fault on lossy channel, failing
2017-08-23 14:44:28.981733 7fc00a804700 1 -- 192.168.18.100:0/2588995716 >> 192.168.18.3:6789/0 conn(0x7fc01fd99000 :-1 s=STATE_CLOSED pgs=26774 cs=1 l=1).mark_down
2017-08-23 14:44:28.982310 7fc00a804700 1 -- 192.168.18.100:0/2588995716 --> 192.168.18.1:6789/0 -- auth(proto 0 35 bytes epoch 3) v1 -- 0x7fc01ff4b700 con 0
2017-08-23 14:44:28.982327 7fc00a804700 1 -- 192.168.18.100:0/2588995716 --> 192.168.18.3:6789/0 -- auth(proto 0 35 bytes epoch 3) v1 -- 0x7fc01ff4b480 con 0
2017-08-23 14:44:28.982341 7fc00a804700 0 client.2640316 ms_handle_reset on 192.168.18.3:6789/0
2017-08-23 14:44:28.983235 7fc00a804700 10 client.2640316.objecter ms_handle_connect 0x7fc01fcee800
2017-08-23 14:44:28.983244 7fc00a804700 10 client.2640316.objecter resend_mon_ops
2017-08-23 14:44:28.983245 7fc00a804700 10 client.2640316.objecter fs_stats_submit1
2017-08-23 14:44:28.983248 7fc00a804700 10 client.2640316 ms_handle_connect on 192.168.18.1:6789/0
2017-08-23 14:44:28.983387 7fc00a804700 10 client.2640316.objecter ms_handle_connect 0x7fc01fd96000
2017-08-23 14:44:28.983394 7fc00a804700 10 client.2640316.objecter resend_mon_ops
2017-08-23 14:44:28.983396 7fc00a804700 10 client.2640316.objecter fs_stats_submit1
2017-08-23 14:44:28.983419 7fc00a804700 10 client.2640316 ms_handle_connect on 192.168.18.3:6789/0
Note that my client was upgraded but my servers weren't yet, so this could be a compatibility issue with the MStatfs message change.
Updated by John Spray over 6 years ago
Yep, the mon says:
2017-08-23 13:44:31.455063 7fd5045ba700 0 will not decode message of type 13 version 2 because compat_version 2 > supported version 1
Updated by John Spray over 6 years ago
- Status changed from New to Fix Under Review
- Backport set to luminous
Updated by Patrick Donnelly over 6 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Patrick Donnelly over 6 years ago
- Copied to Backport #21099: luminous: client: df hangs in ceph-fuse added
Updated by Patrick Donnelly over 6 years ago
- Status changed from Pending Backport to Resolved