Support #11073
closed
mds crashing and loop restart
Added by Jean-Sébastien Frerot about 9 years ago.
Updated about 9 years ago.
Description
I don't know how to reproduce the problem; what I know is that I was playing with the pools for CephFS.
what I had originally:
[root@compute01 ceph]# ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data live_migration ]
Then I did: ceph fs new cephfs metadata shared_web
which kind of broke my setup: name: cephfs, metadata pool: metadata, data pools: [shared_web ]
Then I issued this: ceph fs new cephfs data
which broke my setup even more: name: cephfs, metadata pool: metadata, data pools: [data ]
After that I played with setfattr to finally get this result:
[root@compute01 ceph]# ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data live_migration shared_web ]
But when I tried to copy data to my new pool (shared_web), the MDS started crashing. The log is attached.
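For reference, adding an extra data pool to an existing filesystem normally goes through "add_data_pool" plus a directory layout, not "ceph fs new". A minimal sketch, assuming a CephFS mount at /mnt/cephfs and a directory named web (both hypothetical; the pool name shared_web is from this report, and the command spelling is from current Ceph releases, which may differ from 0.87):

```shell
# Register the extra pool with the filesystem (current releases;
# older releases spelled this "ceph mds add_data_pool").
ceph fs add_data_pool cephfs shared_web

# Point a directory at the new pool; new files created under it
# are then written to shared_web instead of the default data pool.
setfattr -n ceph.dir.layout.pool -v shared_web /mnt/cephfs/web
```

This leaves the existing filesystem and its other data pools untouched, which is what "ceph fs new" does not do.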
Files
mds.log (53 KB), mds log file (latest dump), Jean-Sébastien Frerot, 03/09/2015 08:13 PM
So you had an existing CephFS filesystem and then ran "ceph fs new" without deleting the old one on the monitor, and without removing the existing metadata objects from the pools?
The monitor letting you do this is a bug that should have been fixed for v0.87.1; is the monitor running the same version as the MDS?
Yes, everything is running version 0.87.1.
I ended up restarting the whole cluster and then deleting my CephFS to create a new one. Then it worked correctly.
- Tracker changed from Bug to Support
- Status changed from New to Resolved
Okay, that sounds correct. When creating a new FS you need to really mean a new filesystem, since the operation recreates a bunch of on-disk data structures.
Unfortunately it doesn't delete everything on its own, so you get the kinds of crashes you're seeing now. I've created #11124 to cover those bits.
In your case, you should delete the FS, delete the FS pools, recreate them, and create a new FS again. That should resolve everything.
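The recovery steps above look roughly like the following, using current command names (the pool names, PG count, and "ceph fs fail" step are assumptions for illustration; 0.87-era releases used different MDS shutdown commands, and every delete here is irreversible):

```shell
# Take the filesystem offline and remove it from the monitor.
ceph fs fail cephfs
ceph fs rm cephfs --yes-i-really-mean-it

# Delete the old pools so no stale metadata objects survive.
ceph osd pool delete metadata metadata --yes-i-really-really-mean-it
ceph osd pool delete data data --yes-i-really-really-mean-it

# Recreate them (64 PGs is just an example value).
ceph osd pool create metadata 64
ceph osd pool create data 64

# Create a fresh filesystem on the now-empty pools.
ceph fs new cephfs metadata data
```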
Best guess of what's happening here: an "fs new" was run, the MDSs came up and created a new inotable in which all inodes were marked available, and then a client reconnected and replayed a request that allocated an inode for a mkdir. That asserts out in add_inode, because the newly allocated inode appears to still be free in the inode table.
There was a bug that allowed "fs new" without first running "fs rm" (i.e. it could happen even while MDSs were running); that was fixed in 07b7f101, which is not in the version you're running.
The question then is why we would see the new inotable (in which the inode is free) alongside the old sessiontable (in which the client sessions still exist, which is why this replay is happening at all). It could be that when we create a new FS we reset the journal but not the tables: if the inotable updates were journalled but not flushed, while the sessions were flushed, we would end up with the old sessiontable and the new inotable. This part doesn't actually rely on the "fs new" bug; it could happen whenever someone does an "fs rm; fs new" cycle without clearing out the metadata pool.
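If the pools themselves are to be kept, the stale objects can instead be purged between "fs rm" and "fs new". A hedged sketch using the rados CLI (note that rados purge deletes every object in the pool, so this is just as destructive to filesystem metadata as recreating the pool):

```shell
# Remove the filesystem first.
ceph fs rm cephfs --yes-i-really-mean-it

# Wipe all objects out of the metadata pool so the new filesystem
# starts from truly empty tables and an empty journal.
rados purge metadata --yes-i-really-really-mean-it

# A fresh "fs new" now sees no leftover sessiontable or inotable.
ceph fs new cephfs metadata data
```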
The monitor fix was backported for v0.87.1, but that doesn't appear to have been at play here anyway (see previous messages).