Bug #6807
closed
Debian Wheezy Teuthology Ceph-deploy run failed.
Added by Anonymous over 10 years ago.
Updated over 10 years ago.
Description
/a/teuthology-2013-11-19_01:10:01-ceph-deploy-next-testing-basic-vps/108339 needs to be investigated. The run reports coredumps, and `RuntimeError: Failed to execute command: rm -rf --one-file-system -- /var/lib/ceph`
UPDATE
Something that is not directly related to ceph-deploy is triggering this behavior of having OSDs mounted, which is why ceph-deploy can't remove the contents.
A better error message should be put in place so that when the error is triggered it is clear that there might be OSDs still present.
That error should read:
RuntimeError: Failed to execute command: rm -rf --one-file-system -- /var/lib/ceph
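As a sketch of what raising that error could look like (the helper name and signature are invented for illustration, not ceph-deploy's actual code):

```python
def check_call(returncode, cmd):
    """Hypothetical helper: turn a non-zero exit status from a remote
    command into the RuntimeError quoted above."""
    if returncode != 0:
        raise RuntimeError("Failed to execute command: %s" % " ".join(cmd))

cmd = ["rm", "-rf", "--one-file-system", "--", "/var/lib/ceph"]
# check_call(1, cmd) raises:
# RuntimeError: Failed to execute command: rm -rf --one-file-system -- /var/lib/ceph
```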
- Assignee set to Alfredo Deza
- Status changed from New to 12
Good catch, I think that this is the culprit:
[INFO  ] Running command: sudo rm -rf --one-file-system -- /var/lib/ceph
[ERROR ] rm: skipping `/var/lib/ceph/osd/ceph-3', since it's on a different device
[ERROR ] rm: skipping `/var/lib/ceph/osd/ceph-2', since it's on a different device
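The `--one-file-system` flag makes rm refuse to descend into a directory that lives on a different device than its parent, which is exactly what a mounted OSD looks like. That check can be illustrated by comparing `st_dev` values (a standalone sketch, not ceph-deploy code):

```python
import os
import tempfile

def crosses_filesystem(parent, child):
    """True when child sits on a different device than its parent --
    the condition that makes `rm --one-file-system` skip a directory."""
    return os.stat(parent).st_dev != os.stat(child).st_dev

# A plain subdirectory shares its parent's device, so rm would descend
# into it; a mounted OSD would report a different st_dev and be skipped.
base = tempfile.mkdtemp()
osd = os.path.join(base, "osd")
os.mkdir(osd)
print(crosses_filesystem(base, osd))  # False: same filesystem
```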
So for ceph-deploy, this should complete normally as long as the directories it's removing are on the same file system.
I'm not sure of the historical reasons why these commands were run with `--one-file-system`, but removing that constraint would definitely
fix this problem.
- Status changed from 12 to 4
- Priority changed from Normal to High
It looks like the reason we were enforcing the single file system is that we might still have OSDs mounted (hence the different file system) and might not
want to remove them.
The current proposal is to log a big WARNING message when we are unable to remove the contents for this reason, rather than error out.
Before implementing this I would really like some confirmation that this is indeed the correct path.
It sounds like there was an earlier problem with the test or a different failure — why is it trying to delete the ceph directories (purge, presumably) while it still has OSDs mounted on the filesystem? We don't want to "succeed" at removing the directory if there was stuff we couldn't get rid of...
- Description updated (diff)
- Status changed from 4 to In Progress
Just found out that before, we would try to remove `/var/lib/ceph`, check whether it was still there, and then attempt to unmount
anything that might still be lingering around.
When ceph-deploy changed to exit as soon as a system call returned a non-zero exit status, this logic stopped working, which
prevented the unmounting of any OSDs still mounted at that location.
The fix, then, is no longer to add a warning to that action, but to actually handle the error (if it appears) and then unmount, just like before.
- Status changed from In Progress to Fix Under Review
- Status changed from Fix Under Review to Resolved
Merged into ceph-deploy's master branch with hash: 109040e