Bug #12466

closed

Init script bug with two clusters with the same osd ID on the same host

Added by Angapov Vasily almost 9 years ago. Updated about 7 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I installed two Ceph clusters (CentOS 7, Ceph 9.0.1), one named "ceph" and the other "ceph-prod". The two clusters number their OSDs independently, so some machines host OSDs with the same ID that belong to different clusters; for example, one host has both osd.1 from "ceph" and osd.1 from "ceph-prod".

The problem is that, by default, Ceph stores OSD PID files in a single directory (/var/run/ceph) and names them without regard to the cluster name: osd.1 gets the PID file "osd.1.pid" no matter which cluster it belongs to. So once osd.1 from "ceph" is running, I can no longer start osd.1 from "ceph-prod" with the /etc/init.d/ceph script, because it always reports:

=== osd.1 ===
Starting Ceph osd.1 on node1...already running
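To illustrate the clash, here is a minimal sketch that reproduces only the default naming rule quoted later in this report ("$run_dir/$type.$id.pid"). The helper name default_pid_path is hypothetical; it accepts a cluster name but, like the default rule, never uses it, so both clusters resolve to the same file.

```shell
#!/bin/sh
# Hypothetical helper mirroring the default rule "$run_dir/$type.$id.pid".
# The cluster argument is accepted but deliberately ignored, as in that rule.
run_dir=/var/run/ceph

default_pid_path() {
    cluster=$1   # unused: the default rule has no cluster component
    type=$2
    id=$3
    echo "$run_dir/$type.$id.pid"
}

a=$(default_pid_path ceph osd 1)
b=$(default_pid_path ceph-prod osd 1)
echo "ceph:      $a"
echo "ceph-prod: $b"
[ "$a" = "$b" ] && echo "collision: both clusters share one PID file"
```

Both lines print /var/run/ceph/osd.1.pid, so whichever osd.1 starts first owns the file the other one is checked against.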

The error comes from the init script function that checks whether a daemon is already running:

daemon_is_running() {
    name=$1
    daemon=$2
    daemon_id=$3
    pidfile=$4
    do_cmd "[ -e $pidfile ] || exit 1 # no pid, presumably not running
        pid=\`cat $pidfile\`
        [ -e /proc/\$pid ] && grep -q $daemon /proc/\$pid/cmdline && grep -qwe -i.$daemon_id /proc/\$pid/cmdline && exit 0 # running
        exit 1 # pid is something else" "" "okfail"
}
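The check only requires that the PID file exists, that the process is alive, and that its command line contains the daemon name and "-i <id>"; nothing ties it to a cluster. The sketch below replays just the two grep tests on literal command-line strings instead of /proc/$pid/cmdline (which separates arguments with NUL bytes rather than spaces). The function name cmdline_matches and the sample command line, including its --cluster option, are hypothetical.

```shell
#!/bin/sh
# Hypothetical re-implementation of the two grep tests in daemon_is_running,
# applied to a space-separated command-line string instead of /proc/$pid/cmdline.
cmdline_matches() {
    cmdline=$1
    daemon=$2
    daemon_id=$3
    # Same patterns as the init script: the daemon name anywhere, then
    # "-i<any char><id>" as a whole word.
    printf '%s\n' "$cmdline" | grep -q "$daemon" &&
        printf '%s\n' "$cmdline" | grep -qwe "-i.$daemon_id"
}

# Hypothetical command line of osd.1 belonging to the "ceph" cluster.
running='/usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid --cluster ceph'

# Checking for osd.1 of "ceph-prod" still succeeds: neither grep
# looks at the cluster name.
if cmdline_matches "$running" osd 1; then
    echo "osd.1 (ceph-prod) reported as already running"
fi
```

Only the ID distinguishes daemons here, so two clusters that both have an osd.1 are indistinguishable to this check once they share a PID file.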

The PID file name is generated like this:

get_conf pid_file "$run_dir/$type.$id.pid" "pid file"
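One possible way to honor the cluster name, offered only as a sketch and not as the project's fix, is to put it into the default PID file name. The "$cluster-" prefix and the helper name cluster_pid_path are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical cluster-aware variant of "$run_dir/$type.$id.pid":
# prefixing the cluster name gives each cluster its own PID file.
run_dir=/var/run/ceph

cluster_pid_path() {
    cluster=$1
    type=$2
    id=$3
    echo "$run_dir/$cluster-$type.$id.pid"
}

a=$(cluster_pid_path ceph osd 1)
b=$(cluster_pid_path ceph-prod osd 1)
echo "ceph:      $a"   # /var/run/ceph/ceph-osd.1.pid
echo "ceph-prod: $b"   # /var/run/ceph/ceph-prod-osd.1.pid
```

With distinct files, the "[ -e $pidfile ]" test in daemon_is_running fails for a cluster whose osd.1 has not been started, so that OSD would no longer be reported as already running. Since get_conf appears to look up a "pid file" key before falling back to the default, setting a distinct "pid file" in each cluster's configuration might serve as a workaround; this is untested here.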

So the cluster name is not taken into account anywhere. As a result, I cannot start an OSD from the second cluster with the init script, or at least I cannot find a way to do so.
