Bug #12466

closed

Init script bug with two clusters with the same osd ID on the same host

Added by Angapov Vasily over 8 years ago. Updated about 7 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I installed two Ceph clusters (CentOS 7, Ceph 9.0.1): one is named "ceph", the other "ceph-prod". Each cluster numbers its OSDs independently, so some machines end up with the same OSD ID in both clusters, e.g. on one host I have osd.1 from "ceph" and osd.1 from "ceph-prod".

The problem is that, by default, Ceph stores OSD PID files in a single directory (/var/run/ceph) and names them without regard to the cluster name, e.g. osd.1 has the PID file "osd.1.pid" regardless of the cluster it belongs to. So once osd.1 from "ceph" is running, I can no longer start osd.1 from "ceph-prod" with the /etc/init.d/ceph script, because it always reports:

=== osd.1 ===
Starting Ceph osd.1 on node1...already running

The error comes from the init script, which defines this function for checking OSD status:

daemon_is_running() {
    name=$1
    daemon=$2
    daemon_id=$3
    pidfile=$4
    do_cmd "[ -e $pidfile ] || exit 1 # no pid, presumably not running
        pid=\`cat $pidfile\`
        [ -e /proc/\$pid ] && grep -q $daemon /proc/\$pid/cmdline && grep -qwe -i.$daemon_id /proc/\$pid/cmdline && exit 0 # running
        exit 1 # pid is something else" "" "okfail"
}

And the PID file name is generated like this:

get_conf pid_file "$run_dir/$type.$id.pid" "pid file"

So the cluster name is not taken into account anywhere here. This means I cannot start an OSD from the second cluster using the init script, or at least I cannot find a way to do so.
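The collision can be reproduced outside the init script. The following is a minimal sketch, not code from the init script: `legacy_pid_file` is a hypothetical helper that mirrors the `$run_dir/$type.$id.pid` default quoted above, and `/var/run/ceph` is assumed as the run directory.

```shell
#!/bin/sh
# Hypothetical helper mirroring the default "$run_dir/$type.$id.pid".
# The cluster name is accepted but ignored, which is the behaviour
# being illustrated.
run_dir="/var/run/ceph"   # assumed run directory

legacy_pid_file() {
    cluster=$1   # unused: the path does not depend on it
    type=$2
    id=$3
    printf '%s/%s.%s.pid\n' "$run_dir" "$type" "$id"
}

a=$(legacy_pid_file ceph osd 1)
b=$(legacy_pid_file ceph-prod osd 1)

# Both clusters resolve to the same path, so the second daemon looks
# like it is already running.
if [ "$a" = "$b" ]; then
    echo "collision: both clusters use $a"
fi
```

Both calls yield the same path, which is why the status check finds the first cluster's PID file when asked about the second cluster's OSD.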

Actions #1

Updated by huang jun over 8 years ago

Using "$run_dir/$cluster.$type.$id.pid" instead of "$run_dir/$type.$id.pid" should be a good idea.

But why do you run two clusters with OSDs on the same host? You could use two pools instead.
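As a rough illustration of the naming suggested above, the sketch below builds cluster-qualified PID file paths and shows that a PID file for one cluster no longer hides the other. It is only a sketch under assumptions: `cluster_pid_file` is a hypothetical helper, and a temporary directory stands in for the real run directory so the example can be run without touching a live system.

```shell
#!/bin/sh
# Sketch of the suggested naming: "$run_dir/$cluster.$type.$id.pid".
run_dir=$(mktemp -d)   # temporary stand-in for the real run directory

cluster_pid_file() {
    # $1 = cluster name, $2 = daemon type, $3 = daemon id
    printf '%s/%s.%s.%s.pid\n' "$run_dir" "$1" "$2" "$3"
}

# Pretend osd.1 of cluster "ceph" is running, using this shell's PID.
echo "$$" > "$(cluster_pid_file ceph osd 1)"

for cluster in ceph ceph-prod; do
    if [ -e "$(cluster_pid_file "$cluster" osd 1)" ]; then
        echo "$cluster osd.1: pid file present"
    else
        echo "$cluster osd.1: no pid file, free to start"
    fi
done

rm -rf "$run_dir"   # clean up the temporary directory
```

With the cluster name in the path, only "ceph" reports a PID file, while "ceph-prod" is reported as free to start. This covers the PID file name alone; other places where the cluster name is fixed would still need attention.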

Actions #2

Updated by Angapov Vasily over 8 years ago

Unfortunately, this is not that easy: the cluster name is hardcoded in many other places in the init script as well as in ceph_common.sh.
I'm trying to figure that out at the moment.

The reason for two clusters is simple: I need a stable cluster for OpenStack and a playground for performance tuning, so the two-pool approach is not very convenient here.

Actions #3

Updated by Angapov Vasily over 8 years ago

It looks like I have managed to stop using the Ceph init script altogether. I wrote my own systemd-generator script, which is very handy and automated, and I wrote about it to the ceph-devel mailing list. The code can be found here: https://github.com/angapov/ceph-systemd/.

As far as I am concerned, this issue can be closed.

Actions #4

Updated by Sage Weil about 7 years ago

  • Status changed from New to Won't Fix