Bug #172
closed
OSD and MDS crash on rm -r
Description
I'm still using my test script which unpacks the kernel source and then removes it again with a few steps in between.
Right now the copy back from the snapshot goes fine, but afterwards the rm of the original kernel files fails.
Attached you will find various files with straces and logs in them. I'll try to point out the scenario:
The script runs fine and the copy back from the snapshot goes fine, except for the messages in "ceph_client_script_log.txt"
Why can't those files be found? That seems like a different bug to me?
Well, the bug of the stalling cp seems to have been fixed, but while removing the files afterwards the MDS crashes first, and 15 seconds later all 5 OSDs go down as well.
In the mds strace and osd strace I've added a stat of /core. There you can see that the MDS core dump is older than the OSD one, thus the MDS crashes first.
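For anyone who wants to repeat the crash-order check, here is a minimal sketch of the same idea as the stat of /core: compare the modification times of the core files and list them oldest first. The function names, labels and paths are hypothetical, and it assumes the core files were copied to one machine with their timestamps preserved and that the clocks on the hosts are in sync.

```python
import os
from datetime import datetime, timezone


def order_by_mtime(core_files):
    """Return (label, mtime) pairs sorted oldest first.

    core_files maps a label such as "mds" or "osd0" to the path of a
    core file. Labels and paths are hypothetical, not taken from the
    attached logs.
    """
    stamped = [(label, os.stat(path).st_mtime)
               for label, path in core_files.items()]
    return sorted(stamped, key=lambda item: item[1])


def report(core_files):
    """Print each core file's mtime and its delay after the oldest one."""
    ordered = order_by_mtime(core_files)
    first_time = ordered[0][1]
    for label, mtime in ordered:
        when = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
        print(f"{label}: {when} (+{mtime - first_time:.0f}s)")
```

Calling report({"mds": "/tmp/cores/mds/core", "osd0": "/tmp/cores/osd0/core"}) with such hypothetical paths would list the daemon that dumped core first at the top.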
After gathering all this information I started my OSDs and the crashed MDS (192.168.6.206) again. While doing so, my cluster started to recover, but then mds0 (192.168.6.205) crashed. (Core and log are also attached in mds0_*)
I'm doing a clean mkcephfs right now and running the same test again, expecting the same result, since this has now happened twice today.
My Environment:
- Branch: unstable (7c0df0540700fe2816470f5cc2a2fc7a130e4456)
- OS: Ubuntu 10.04 (AMD64)
- Kernel: 2.6.34