Bug #12466
Status: Closed
Init script bug with two clusters with the same osd ID on the same host
Description
I installed two Ceph clusters (CentOS 7, Ceph 9.0.1), one named "ceph" and the other "ceph-prod". Both clusters number their OSDs independently, so I ended up with the same osd IDs on some machines but belonging to different clusters; e.g. on one host I have osd.1 from "ceph" and osd.1 from "ceph-prod".
The problem is that by default Ceph stores OSD PID files in a single directory (/var/run/ceph) and names them without honoring the cluster name: osd.1 has the PID file "osd.1.pid" regardless of the cluster it belongs to. So once osd.1 from "ceph" is started, I can no longer start osd.1 from "ceph-prod" using the /etc/init.d/ceph script, because it always reports:
=== osd.1 ===
Starting Ceph osd.1 on node1...already running
The error comes from the init script, where this function checks whether an OSD is running:
daemon_is_running() {
    name=$1
    daemon=$2
    daemon_id=$3
    pidfile=$4
    do_cmd "[ -e $pidfile ] || exit 1   # no pid, presumably not running
        pid=\`cat $pidfile\`
        [ -e /proc/\$pid ] && grep -q $daemon /proc/\$pid/cmdline && grep -qwe -i.$daemon_id /proc/\$pid/cmdline && exit 0 # running
        exit 1   # pid is something else" "" "okfail"
}
And the PID file name is generated like this:
get_conf pid_file "$run_dir/$type.$id.pid" "pid file"
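To make the collision concrete, here is a minimal shell sketch (using a stand-in directory instead of /var/run/ceph so it can run anywhere): because the default pattern contains no cluster component, both clusters derive the identical pid file path.

```shell
#!/bin/sh
# Stand-in for /var/run/ceph (hypothetical path, for demonstration only).
run_dir=/tmp/demo-run-ceph
mkdir -p "$run_dir"

type=osd
id=1

# Default pid file name as the init script builds it: "$run_dir/$type.$id.pid".
# The cluster name never enters the pattern, so both clusters get the same path.
pidfile_ceph="$run_dir/$type.$id.pid"      # for cluster "ceph"
pidfile_prod="$run_dir/$type.$id.pid"      # for cluster "ceph-prod"

[ "$pidfile_ceph" = "$pidfile_prod" ] && echo "collision: $pidfile_ceph"
```

Whichever cluster starts osd.1 first writes its PID there, and daemon_is_running() then reports "already running" for the other cluster's osd.1.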
So the cluster name is not honored anywhere here. It means I simply cannot start an OSD from the second cluster using the init script, or at least I cannot find a way to do it.
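A possible workaround (my own assumption, untested on this setup) might be to override "pid file" in each cluster's configuration file using Ceph's $cluster metavariable, so the two clusters no longer share the default name that get_conf falls back to:

```
[global]
    pid file = /var/run/ceph/$cluster-$type.$id.pid
```

With this in both /etc/ceph/ceph.conf and /etc/ceph/ceph-prod.conf, osd.1 would get "ceph-osd.1.pid" and "ceph-prod-osd.1.pid" respectively; the real fix would still be for the init script's default to include the cluster name.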