Bug #288 (Closed): cmds disappears under snapshot load

Added by Greg Farnum almost 14 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Category: -
Target version: -
% Done: 0%


Description

The ceph.git/unstable cmds gets killed by my snaptest-2 (http://github.com/vinzent/ceph-testsuite/blob/master/tests/snaptest-2) with ceph-client-standalone/unstable-backport on kernel 2.6.34.1. I can reproduce the behaviour.

It happens somewhere in the "Delete the snapshots..." phase.
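
For reference, the test essentially does snapshot churn of the following kind; this is only a minimal sketch, and the mountpoint, directory, and snapshot names below are illustrative placeholders rather than the exact ones from snaptest-2:

#!/bin/sh
# Rough sketch of a CephFS snapshot create/delete cycle on a kernel-client mount.
# Paths and names are assumptions for illustration, not taken from snaptest-2.
set -e

MNT=/mnt/ceph            # assumed kernel-client mountpoint
DIR=$MNT/snaptest
mkdir -p $DIR

# Write some data and snapshot it a few times
# (in CephFS, mkdir under the .snap directory creates a snapshot).
for i in 1 2 3; do
    dd if=/dev/zero of=$DIR/file.$i bs=1M count=1
    mkdir $DIR/.snap/snap$i
done

# "Delete the snapshots..." phase: rmdir under .snap removes them again.
for i in 1 2 3; do
    rmdir $DIR/.snap/snap$i
done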

kernel log:
[ 2024.315441] ceph: client4102 fsid ab2d5e45-9f53-7764-c958-c099f5be6e33
[ 2024.316111] ceph: mon0 127.0.0.1:6789 session established
[ 3753.964099] ceph: tid 11109 timed out on osd0, will reset osd
[ 4054.056374] ceph: tid 15029 timed out on osd0, will reset osd
[ 4098.646013] ceph: mds0 127.0.0.1:6802 socket closed
[ 4099.804937] ceph: mds0 127.0.0.1:6802 connection failed
[ 4100.804629] ceph: mds0 127.0.0.1:6802 connection failed
[ 4101.804638] ceph: mds0 127.0.0.1:6802 connection failed
[ 4103.804636] ceph: mds0 127.0.0.1:6802 connection failed
[ 4107.804381] ceph: mds0 127.0.0.1:6802 connection failed
[ 4115.804644] ceph: mds0 127.0.0.1:6802 connection failed
[ 4131.804387] ceph: mds0 127.0.0.1:6802 connection failed
[ 4144.804343] ceph: mds0 caps stale
[ 4159.806936] ceph: mds0 caps stale
[ 4163.804149] ceph: mds0 127.0.0.1:6802 connection failed

There is no cmds segfault message, but the cmds process is gone.

- Thomas

Actions #1

Updated by Greg Farnum almost 14 years ago

  • Status changed from New to In Progress
  • Assignee set to Greg Farnum
Actions #2

Updated by Greg Farnum almost 14 years ago

  • Status changed from In Progress to 4

Tried this with cfuse and got an MDS crash with a core dump. It looks like there's an issue with selecting the proper snaprealm when listing.

Actions #3

Updated by Greg Farnum almost 14 years ago

  • Status changed from 4 to 7

Sage should have fixed this in commit:1271fdd0e345d64493c386167e38e3bfea7c52e6. Will test and confirm.

Actions #4

Updated by Greg Farnum almost 14 years ago

  • Status changed from 7 to In Progress

Looks like there's more to it than this; I got another crash farther on. Continuing to investigate.

Also, there might be an issue in a multi-MDS setup; I got a hang before switching to a single-MDS test.

Actions #5

Updated by Greg Farnum almost 14 years ago

Switched back to using get_oldest_snap; this works on a single-MDS install as of commit e2b1a4ee119a68b403582ae3bc15b54e9458b9b5. Still seeing odd issues with a multi-MDS instance, though.

Actions #6

Updated by Greg Farnum almost 14 years ago

  • Status changed from In Progress to Closed

All right, it works on one MDS. Opened #318 to track issues with the multi-MDS cluster.

Actions #7

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
