Bug #288
cmds disappears under snapshot load
Description
The ceph.git/unstable cmds gets killed by my snaptest-2 (http://github.com/vinzent/ceph-testsuite/blob/master/tests/snaptest-2) with ceph-client-standalone/unstable-backport on kernel 2.6.34.1. I can reproduce the behaviour.
It happens somewhere in the "Delete the snapshots..." phase.
Kernel log:
[ 2024.315441] ceph: client4102 fsid ab2d5e45-9f53-7764-c958-c099f5be6e33
[ 2024.316111] ceph: mon0 127.0.0.1:6789 session established
[ 3753.964099] ceph: tid 11109 timed out on osd0, will reset osd
[ 4054.056374] ceph: tid 15029 timed out on osd0, will reset osd
[ 4098.646013] ceph: mds0 127.0.0.1:6802 socket closed
[ 4099.804937] ceph: mds0 127.0.0.1:6802 connection failed
[ 4100.804629] ceph: mds0 127.0.0.1:6802 connection failed
[ 4101.804638] ceph: mds0 127.0.0.1:6802 connection failed
[ 4103.804636] ceph: mds0 127.0.0.1:6802 connection failed
[ 4107.804381] ceph: mds0 127.0.0.1:6802 connection failed
[ 4115.804644] ceph: mds0 127.0.0.1:6802 connection failed
[ 4131.804387] ceph: mds0 127.0.0.1:6802 connection failed
[ 4144.804343] ceph: mds0 caps stale
[ 4159.806936] ceph: mds0 caps stale
[ 4163.804149] ceph: mds0 127.0.0.1:6802 connection failed
There is no cmds segfault message, but the cmds process is gone.
- Thomas
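As an illustration, and using only the kernel log lines quoted above, the following Python sketch extracts the timestamps of the mds0 "connection failed" entries and computes the gaps between them. The gaps roughly double each time, which is consistent with the kernel client continuing to retry with a backoff after the cmds process went away. The helper name `retry_gaps` is hypothetical and is not part of Ceph or the test suite.

```python
import re

# Kernel log lines copied from the report (from the socket close onward).
LOG = """\
[ 4098.646013] ceph: mds0 127.0.0.1:6802 socket closed
[ 4099.804937] ceph: mds0 127.0.0.1:6802 connection failed
[ 4100.804629] ceph: mds0 127.0.0.1:6802 connection failed
[ 4101.804638] ceph: mds0 127.0.0.1:6802 connection failed
[ 4103.804636] ceph: mds0 127.0.0.1:6802 connection failed
[ 4107.804381] ceph: mds0 127.0.0.1:6802 connection failed
[ 4115.804644] ceph: mds0 127.0.0.1:6802 connection failed
[ 4131.804387] ceph: mds0 127.0.0.1:6802 connection failed
[ 4144.804343] ceph: mds0 caps stale
[ 4159.806936] ceph: mds0 caps stale
[ 4163.804149] ceph: mds0 127.0.0.1:6802 connection failed
"""

# Hypothetical helper: match only "connection failed" lines and capture
# the timestamp inside the leading [ ... ] brackets.
FAILED = re.compile(r"^\[\s*([\d.]+)\] .*connection failed$", re.MULTILINE)


def retry_gaps(log: str) -> list[int]:
    """Return the gaps, rounded to whole seconds, between consecutive failures."""
    stamps = [float(m.group(1)) for m in FAILED.finditer(log)]
    return [round(later - earlier) for earlier, later in zip(stamps, stamps[1:])]


print(retry_gaps(LOG))  # → [1, 1, 2, 4, 8, 16, 32]
```

The "caps stale" lines are skipped by the regular expression, so only reconnect attempts are compared.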
Updated by Greg Farnum almost 14 years ago
- Status changed from New to In Progress
- Assignee set to Greg Farnum
Updated by Greg Farnum almost 14 years ago
- Status changed from In Progress to 4
Tried this with cfuse and got an mds crash with a core dump. Looks like there's an issue with selecting the proper snaprealm when listing.
Updated by Greg Farnum almost 14 years ago
- Status changed from 4 to 7
Sage should have fixed this in commit:1271fdd0e345d64493c386167e38e3bfea7c52e6. Will test and confirm.
Updated by Greg Farnum almost 14 years ago
- Status changed from 7 to In Progress
Looks like there's more to it than this; I got another crash farther on. Continuing to study.
Also, there might be an issue in a multi-MDS setup: I got a hang before switching to a single-MDS test.
Updated by Greg Farnum almost 14 years ago
Switched back to using get_oldest_snap; it works on a single-MDS install as of commit:e2b1a4ee119a68b403582ae3bc15b54e9458b9b5. Still having odd issues with a multi-MDS instance, though.
Updated by Greg Farnum almost 14 years ago
- Status changed from In Progress to Closed
All right, it works on one MDS. Opened #318 to track issues with the multi-mds cluster.
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.