Bug #7503
Closed: mds start and oops after access to cephfs
Description
This is a follow-up to http://tracker.ceph.com/issues/7367, which explains the scenario.
I now attach the mds.log.
Updated by John Spray about 10 years ago
MDS is getting an ENFILE (object lost) from the OSD while trying to read the OMAP from one of its stray directory objects. Given the history on the other tickets (especially going back and forth between versions that expected TMAPs vs OMAPs) this isn't hard to believe. CephFS doesn't currently have a way to recover from this: on a production system it would probably be time to recover from backups (after deleting your data and metadata pools and using 'ceph newfs' to recreate an empty filesystem).
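The recovery John describes can be sketched roughly as below. This is a hedged outline, not a tested procedure: the default pool names (`data`, `metadata`), the PG count, and the exact subcommand spelling (`ceph mds newfs` in releases of that era, taking pool IDs rather than names) should all be verified against the Ceph version in use before running anything.

```shell
# DESTRUCTIVE: this wipes all CephFS data and metadata. Sketch only.
# Stop all MDS daemons before starting.

# Delete the old data and metadata pools (names assumed to be the defaults):
ceph osd pool delete data data --yes-i-really-really-mean-it
ceph osd pool delete metadata metadata --yes-i-really-really-mean-it

# Recreate empty pools (PG count of 64 is an arbitrary example):
ceph osd pool create data 64
ceph osd pool create metadata 64

# Recreate an empty filesystem on the fresh pools.
# <metadata-pool-id> and <data-pool-id> are the numeric pool IDs
# shown by `ceph osd lspools`:
ceph mds newfs <metadata-pool-id> <data-pool-id> --yes-i-really-mean-it
```

After this, restarting the MDS should give a clean, empty filesystem; anything previously stored in CephFS is gone and must come from backups.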
Updated by Ian Colle about 10 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.77)
Updated by Yann Dupont about 10 years ago
OK, the explanation makes sense, and as I already said, all that data was test data, so I can lose it without problems. I also fully understand that this probably shouldn't happen on a production system, where you wouldn't downgrade versions...
And I also understand that CephFS isn't production-ready right now.
My concern is more that if it happens (for any reason, such as human error) on a production system, having no way to correct it is problematic.
It is better to be safe than sorry, so maybe:
1) Do not allow a downgrade of an OSD when an incompatible feature set is detected, as the MDS already does, to prevent such things.
2) Even in case of corruption or something going wrong, my feeling is that the MDS should not oops (yes, easy to say, not easy to do), but rather go into read-only mode (is this possible?). That's the way local filesystems generally treat problems, corruption, and unexpected or garbled structures.
3) In case of corruption, is any cephfsck possible in the near or distant future?
Updated by Greg Farnum about 10 years ago
- Status changed from New to Won't Fix
Ah, it sounds like this is happening because the MDS doesn't currently have a good versioning system to prevent too-old daemons coming up. I've created ticket #7531 for that.
More sophisticated responses are not in the near future, unfortunately.
Updated by Yann Dupont about 10 years ago
Fine, OK for ticket #7531. This one should be closed.