Bug #24362
ceph-objectstore-tool incorrectly invokes crush_location_hook
Description
Ceph release being used: 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
/etc/ceph/ceph.conf contains parameter:
crush_location_hook = /var/lib/ceph-tools/osd/ceph-osd-crush-location.sh
This script takes the --id #id parameter, matches it to a host and outputs a host=... crush map value, returning 0 on success. If the ID cannot be parsed, the script raises an error and returns 1.
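For illustration, a minimal sketch of a hook with the behaviour described above. This is not our actual script: the function names and the host names in the lookup are hypothetical, and only the --id handling and the 0/1 return codes follow the description.

```shell
#!/bin/sh
# Hypothetical stand-in for ceph-osd-crush-location.sh (the real script is not shown here).

# Hypothetical lookup: map a numeric OSD ID to a host name (made-up values).
osd_host_for_id() {
    case "$1" in
        196) echo "storage-node-07" ;;
        *)   echo "storage-node-01" ;;
    esac
}

crush_location_hook() {
    osd_id=""
    # Walk the arguments and remember the value that follows --id.
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --id) osd_id="${2:-}"; shift ;;
        esac
        [ "$#" -gt 0 ] && shift
    done

    # Accept only a non-empty, purely numeric OSD ID.
    case "$osd_id" in
        ''|*[!0-9]*)
            echo "cannot parse OSD ID from --id '$osd_id'" >&2
            return 1
            ;;
    esac

    echo "host=$(osd_host_for_id "$osd_id")"
    return 0
}

# When installed as the hook, the script would end with:
#   crush_location_hook "$@"
```

With such a script, --id 196 prints a host=... value and returns 0, while --id admin prints an error to stderr and returns 1.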
During normal ceph-osd daemon startup everything works as intended.
When we had to use ceph-objectstore-tool (in circumstances similar to #19092) we encountered two issues.
The first is a strange error output by ceph-objectstore-tool (the OSD daemon in question was, obviously, stopped at the time):
~# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-196/ --journal-path /var/lib/ceph/osd/ceph-196/journal --pgid 11.9 --op list
failed to init_on_startup : (0) Success
~# echo $?
1
Having examined the source code of the global_init function at src/global/global_init.cc, we discovered that the issue arises when this call is performed:
g_ceph_context->crush_location.init_on_startup()
Which brings us to the second issue: during configuration analysis the script specified in crush_location_hook was called, but instead of the expected arguments including --id #id it receives:
--cluster ceph --id admin --type osd
Since a plain string containing "admin" does not in any way resemble an OSD ID, the script specified in crush_location_hook exits with a non-zero exit code, which causes ceph-objectstore-tool itself to fail without any meaningful explanation.
Our workaround at the moment is simply to comment the crush_location_hook parameter out of the configuration file when working with ceph-objectstore-tool.
The desired results are:
- The script specified in crush_location_hook is executed with the correct OSD ID.
- If the script fails, output an error message stating that crush location cannot be determined through the external hook script, including the path to the script and its own output.
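To make the second point concrete, here is a rough shell sketch of the kind of error reporting we have in mind. It is only an illustration of the desired behaviour, not how Ceph implements the hook call: the function name run_crush_location_hook and the wording of the message are our own assumptions; the argument order matches what the hook received above.

```shell
#!/bin/sh
# Hypothetical wrapper: run the hook and, on failure, report its path,
# exit status and output instead of failing silently.
run_crush_location_hook() {
    hook="$1"      # path to the script named in crush_location_hook
    osd_id="$2"    # numeric OSD ID the hook should be called with

    # Capture stdout and stderr together so the hook's own output can be reported.
    output=$("$hook" --cluster ceph --id "$osd_id" --type osd 2>&1) && status=0 || status=$?

    if [ "$status" -ne 0 ]; then
        echo "cannot determine crush location via hook '$hook' (exit status $status); hook output: $output" >&2
        return 1
    fi

    echo "$output"
    return 0
}
```

A call such as run_crush_location_hook /var/lib/ceph-tools/osd/ceph-osd-crush-location.sh 196 would then either print the hook's host=... value or say which script failed and what it printed.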
History
#1 Updated by Josh Durgin almost 6 years ago
- Subject changed from ceph-objectstore-tool interacts incorrectly with crush_location_hook to ceph-objectstore-tool incorrectly invokes crush_location_hook
- Category changed from Scrub/Repair to Administration/Usability
- Status changed from New to Triaged
Seems like the way to fix this is to stop ceph-objectstore-tool from trying to use the crush location hook at all.
You should be able to work around it by passing e.g. --name osd.0 --keyring /var/lib/ceph/osd/0/keyring to ceph-objectstore-tool