Bug #7503

closed

mds start and oops after access to cephfs

Added by Yann Dupont about 10 years ago. Updated about 10 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Severity:
3 - minor

Description

This is a follow-up to http://tracker.ceph.com/issues/7367, which explains the scenario.

I am now attaching the MDS log.


Files

ceph-mds.lmbb1.log (639 KB), Yann Dupont, 02/21/2014 07:18 AM
Actions #1

Updated by John Spray about 10 years ago

MDS is getting an ENFILE (object lost) from the OSD while trying to read the OMAP from one of its stray directory objects. Given the history on the other tickets (especially going back and forth between versions that expected TMAPs vs OMAPs) this isn't hard to believe. CephFS doesn't currently have a way to recover from this: on a production system it would probably be time to recover from backups (after deleting your data and metadata pools and using 'ceph newfs' to recreate an empty filesystem).

Actions #2

Updated by Ian Colle about 10 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.77)
Actions #3

Updated by Yann Dupont about 10 years ago

OK for the explanation, and as I already said, all of that data was test data, so I can lose it without problems. I also fully understand that this probably shouldn't happen on a production system, where you won't downgrade versions...
And I also understand that CephFS isn't production-ready right now.

My concern is more this: if it happens (for any reason, such as human error) on a production system, not having any way to correct it is problematic.

It is better to be safe than sorry, so maybe:

1) Do not allow an OSD downgrade when an incompatible feature set is detected, as the MDS already does, to prevent such things.
2) Even in case of corruption or something going wrong, my feeling is that the MDS should not oops (yes, easy to say, not easy to do), but rather go into read-only mode (is this possible?). That is the way local filesystems generally treat problems, corruption, and unexpected or garbled structures.

3) In case of corruption, is any cephfsck possible in the near or distant future?
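
To make points 1 and 2 concrete, here is a minimal sketch of the kind of startup check being suggested. It is not Ceph code: `FeatureSet`, `StartMode`, `decide_start_mode`, and the feature name `omap_dirs` are hypothetical names made up for illustration. The assumption is that the on-disk data records which features were used to write it, and that a daemon compares that record with what it supports before serving anything.

```python
from dataclasses import dataclass, field
from enum import Enum


class StartMode(Enum):
    READ_WRITE = "read-write"
    READ_ONLY = "read-only"
    REFUSE = "refuse"


@dataclass(frozen=True)
class FeatureSet:
    # Hypothetical structure for illustration; not Ceph's actual data type.
    # ro_compat: features an older daemon can safely read past but not write.
    # incompat: features a daemon must understand to touch the data at all.
    ro_compat: frozenset = field(default_factory=frozenset)
    incompat: frozenset = field(default_factory=frozenset)


def decide_start_mode(on_disk: FeatureSet, supported: FeatureSet) -> StartMode:
    """Decide how a daemon should start given the features recorded on disk."""
    # An unknown incompat feature means the data cannot be interpreted safely,
    # so a too-old daemon refuses to start instead of crashing later (point 1).
    if not on_disk.incompat <= supported.incompat:
        return StartMode.REFUSE
    # An unknown ro_compat feature means reads are safe but writes are not,
    # so the daemon falls back to read-only mode (point 2).
    if not on_disk.ro_compat <= supported.ro_compat:
        return StartMode.READ_ONLY
    return StartMode.READ_WRITE


if __name__ == "__main__":
    # Data written by a newer version, checked by a daemon that predates it.
    disk = FeatureSet(incompat=frozenset({"omap_dirs"}))
    old_daemon = FeatureSet()
    print(decide_start_mode(disk, old_daemon).value)
```

With a check of this shape, the downgrade scenario from this ticket would end in a clean refusal to start rather than an oops on first access. It does not address genuine corruption, which is what point 3 asks about.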

Actions #4

Updated by Greg Farnum about 10 years ago

  • Status changed from New to Won't Fix

Ah, it sounds like this is happening because the MDS doesn't currently have a good versioning system to prevent too-old daemons from coming up. I've created ticket #7531 for that.
More sophisticated responses are not in the near future, unfortunately.

Actions #5

Updated by Yann Dupont about 10 years ago

Fine, OK for ticket #7531. This one should be closed.
