Bug #24362

ceph-objectstore-tool incorrectly invokes crush_location_hook

Added by Roman Chebotarev almost 6 years ago. Updated almost 6 years ago.

Status: Triaged
Priority: Normal
Assignee: -
Category: Administration/Usability
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): objectstore-tool
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph release being used: 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)

/etc/ceph/ceph.conf contains parameter:

crush_location_hook = /var/lib/ceph-tools/osd/ceph-osd-crush-location.sh

This script takes the --id #id parameter, matches it to a host and outputs a host=... crush map value, returning 0 in case of success. If ID cannot be parsed, the script raises an error and returns 1.
During normal ceph-osd daemon startup everything works as intended.
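
For readers without access to our script, here is a minimal sketch of a hook that behaves as described above. It is not the actual ceph-osd-crush-location.sh: the function names and the ID-to-host mapping (lookup_host, storage-node-01) are hypothetical placeholders. It only shows the contract: a numeric --id yields a host=... line and exit code 0, anything else yields an error and exit code 1.

```shell
#!/bin/sh
# Hypothetical sketch of a crush_location_hook script.
# The mapping below is a placeholder, not our real inventory.

# Map a numeric OSD ID to a host name (placeholder logic).
lookup_host() {
    case "$1" in
        196) echo "storage-node-01" ;;   # hypothetical mapping
        *)   return 1 ;;
    esac
}

# Parse "--id <value>" from the arguments and print the crush location.
crush_location() {
    id=""
    prev=""
    for arg in "$@"; do
        if [ "$prev" = "--id" ]; then
            id="$arg"
        fi
        prev="$arg"
    done

    # Reject anything that is not a plain non-negative integer.
    case "$id" in
        ''|*[!0-9]*)
            echo "cannot parse OSD ID from --id '$id'" >&2
            return 1
            ;;
    esac

    host=$(lookup_host "$id") || {
        echo "no host known for osd.$id" >&2
        return 1
    }
    echo "host=$host"
}

# A real hook script would finish with: crush_location "$@"
```

With this sketch, the arguments --cluster ceph --id 196 --type osd produce a host=... line, while --cluster ceph --id admin --type osd produce an error on stderr and a return code of 1.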

When we had to use ceph-objectstore-tool (in circumstances similar to #19092) we encountered two issues.

The first is a strange error output by ceph-objectstore-tool (the OSD daemon in question was, obviously, stopped at the time):

~# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-196/ --journal-path /var/lib/ceph/osd/ceph-196/journal --pgid 11.9 --op list
failed to init_on_startup : (0) Success
~# echo $?
1

Having examined the source code of the global_init function at src/global/global_init.cc, we discovered that the issue arises when this call is performed:

g_ceph_context->crush_location.init_on_startup()

This brings us to the second issue: during configuration analysis, the script specified in crush_location_hook was called, but instead of the expected arguments, including --id #id, it received:

--cluster ceph --id admin --type osd

Since the plain string "admin" does not in any way resemble an OSD ID, the script specified in crush_location_hook exits with a non-zero exit code, which causes ceph-objectstore-tool itself to fail without any meaningful explanation.

Our workaround at the moment is simply to comment the crush_location_hook parameter out of the configuration file when working with ceph-objectstore-tool.

The desired results are:
  1. The script specified in crush_location_hook is executed with the correct OSD ID.
  2. If the script fails, output an error message stating that crush location cannot be determined through the external hook script, including the path to the script and its own output.
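
To illustrate the kind of error reporting requested in item 2, here is a rough shell sketch. It is not how Ceph invokes the hook (that logic lives in the C++ code); run_crush_hook is a hypothetical wrapper name used only to show the desired message contents: the hook path, its exit code, and its own output.

```shell
#!/bin/sh
# Hypothetical wrapper showing the desired failure message.
# Usage: run_crush_hook <hook-path> [hook arguments...]
run_crush_hook() {
    hook="$1"
    shift
    # Capture stdout and stderr together so the hook's own
    # diagnostics can be included in the error message.
    if output=$("$hook" "$@" 2>&1); then
        printf '%s\n' "$output"
    else
        rc=$?
        echo "cannot determine crush location via hook '$hook' (exit code $rc): $output" >&2
        return 1
    fi
}
```

For example, run_crush_hook /var/lib/ceph-tools/osd/ceph-osd-crush-location.sh --cluster ceph --id admin --type osd would then report the script path and whatever the script printed, instead of "(0) Success".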

History

#1 Updated by Josh Durgin almost 6 years ago

  • Subject changed from ceph-objectstore-tool interacts incorrectly with crush_location_hook to ceph-objectstore-tool incorrectly invokes crush_location_hook
  • Category changed from Scrub/Repair to Administration/Usability
  • Status changed from New to Triaged

Seems like the way to fix this is to stop ceph-objectstore-tool from trying to use the crush location hook at all.

You should be able to work around it by passing e.g. --name osd.0 --keyring /var/lib/ceph/osd/0/keyring to ceph-objectstore-tool
