Bug #288 (Closed): cmds disappears under snapshot load

Added by Greg Farnum almost 14 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Category: -
Target version: -
% Done: 0%


Description

The ceph.git/unstable cmds gets killed by my snaptest-2 (http://github.com/vinzent/ceph-testsuite/blob/master/tests/snaptest-2) with ceph-client-standalone/unstable-backport on kernel 2.6.34.1. I can reproduce the behaviour.

It happens somewhere in the "Delete the snapshots..." phase.
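
For reference, the test essentially does snapshot churn of the following kind; this is only a minimal sketch, and the mountpoint, directory, and snapshot names below are illustrative placeholders rather than the exact ones from snaptest-2:

#!/bin/sh
# Rough sketch of a CephFS snapshot create/delete cycle on a kernel-client mount.
# Paths and names are assumptions for illustration, not taken from snaptest-2.
set -e

MNT=/mnt/ceph            # assumed kernel-client mountpoint
DIR=$MNT/snaptest
mkdir -p $DIR

# Write some data and snapshot it a few times
# (in CephFS, mkdir under the .snap directory creates a snapshot).
for i in 1 2 3; do
    dd if=/dev/zero of=$DIR/file.$i bs=1M count=1
    mkdir $DIR/.snap/snap$i
done

# "Delete the snapshots..." phase: rmdir under .snap removes them again.
for i in 1 2 3; do
    rmdir $DIR/.snap/snap$i
done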

kernel log:
[ 2024.315441] ceph: client4102 fsid ab2d5e45-9f53-7764-c958-c099f5be6e33
[ 2024.316111] ceph: mon0 127.0.0.1:6789 session established
[ 3753.964099] ceph: tid 11109 timed out on osd0, will reset osd
[ 4054.056374] ceph: tid 15029 timed out on osd0, will reset osd
[ 4098.646013] ceph: mds0 127.0.0.1:6802 socket closed
[ 4099.804937] ceph: mds0 127.0.0.1:6802 connection failed
[ 4100.804629] ceph: mds0 127.0.0.1:6802 connection failed
[ 4101.804638] ceph: mds0 127.0.0.1:6802 connection failed
[ 4103.804636] ceph: mds0 127.0.0.1:6802 connection failed
[ 4107.804381] ceph: mds0 127.0.0.1:6802 connection failed
[ 4115.804644] ceph: mds0 127.0.0.1:6802 connection failed
[ 4131.804387] ceph: mds0 127.0.0.1:6802 connection failed
[ 4144.804343] ceph: mds0 caps stale
[ 4159.806936] ceph: mds0 caps stale
[ 4163.804149] ceph: mds0 127.0.0.1:6802 connection failed

There is no cmds segfault message, but the cmds process is gone.

- Thomas

Actions #1

Updated by Greg Farnum almost 14 years ago

  • Status changed from New to In Progress
  • Assignee set to Greg Farnum
Actions #2

Updated by Greg Farnum almost 14 years ago

  • Status changed from In Progress to 4

Tried this with cfuse and got an MDS crash with a core dump. It looks like there's an issue with selecting the proper snaprealm when listing.

Actions #3

Updated by Greg Farnum almost 14 years ago

  • Status changed from 4 to 7

Sage should have fixed this in commit:1271fdd0e345d64493c386167e38e3bfea7c52e6. Will test and confirm.

Actions #4

Updated by Greg Farnum almost 14 years ago

  • Status changed from 7 to In Progress

Looks like there's more to it than this; I got another crash farther on. Continuing to investigate.

Also, there might be an issue in a multi-MDS setup; I got a hang before switching to a single-MDS test.

Actions #5

Updated by Greg Farnum almost 14 years ago

Switched back to using get_oldest_snap; this works on a single-MDS install as of commit e2b1a4ee119a68b403582ae3bc15b54e9458b9b5. Still seeing odd issues with a multi-MDS instance, though.

Actions #6

Updated by Greg Farnum almost 14 years ago

  • Status changed from In Progress to Closed

All right, it works on one MDS. Opened #318 to track issues with the multi-MDS cluster.

Actions #7

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
