Bug #12466
Init script bug with two clusters with the same osd ID on the same host
Status: Closed
Description
I installed two Ceph clusters (CentOS 7, Ceph 9.0.1), one named "ceph" and the other "ceph-prod". Each cluster numbers its OSDs independently, so I ended up with the same OSD IDs on some machines but belonging to different clusters, e.g. on one host I have osd.1 from "ceph" and osd.1 from "ceph-prod".
The problem is that by default Ceph stores OSD PID files in a single directory (/var/run/ceph) and names them without honoring the cluster name, e.g. osd.1 has the PID file "osd.1.pid" regardless of which cluster it belongs to. So once osd.1 from "ceph" is started, I can no longer start osd.1 from "ceph-prod" using the /etc/init.d/ceph script, because it always reports:
=== osd.1 ===
Starting Ceph osd.1 on node1...already running
The error comes from the init script, where this function checks whether an OSD is already running:
daemon_is_running() {
    name=$1
    daemon=$2
    daemon_id=$3
    pidfile=$4
    do_cmd "[ -e $pidfile ] || exit 1   # no pid, presumably not running
	pid=\`cat $pidfile\`
	[ -e /proc/\$pid ] && grep -q $daemon /proc/\$pid/cmdline && grep -qwe -i.$daemon_id /proc/\$pid/cmdline && exit 0  # running
	exit 1  # pid is something else" "" "okfail"
}
And the PID file name is generated like this:
get_conf pid_file "$run_dir/$type.$id.pid" "pid file"
So the cluster name is not honored here at all. It means that I simply cannot start an OSD from the second cluster using the init script, or at least cannot find a way to do so.
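The collision follows directly from the default template above; a minimal sketch (variable names mirror the init script, the demo values are assumptions for illustration):

```shell
#!/bin/sh
# Demonstrate that the default pidfile template yields the same path for
# both clusters, because $cluster never appears in it.
run_dir=/var/run/ceph
type=osd
id=1

# What get_conf falls back to for cluster "ceph" ...
pidfile_ceph="$run_dir/$type.$id.pid"
# ... and for cluster "ceph-prod" -- the cluster name is never used:
pidfile_prod="$run_dir/$type.$id.pid"

echo "$pidfile_ceph"    # -> /var/run/ceph/osd.1.pid
echo "$pidfile_prod"    # -> /var/run/ceph/osd.1.pid (same file!)
```

Both expansions yield /var/run/ceph/osd.1.pid, so daemon_is_running finds the first cluster's PID there and reports "already running" for the second.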
Updated by huang jun almost 9 years ago
Using "$run_dir/$cluster.$type.$id.pid" instead of "$run_dir/$type.$id.pid" should be a good idea,
but why do you run two clusters with OSDs on the same host? You could use two pools instead.
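The suggested naming change can be sketched as follows (a minimal illustration of the idea, not the actual patch; the demo values are assumptions):

```shell
#!/bin/sh
# With $cluster included in the template, each cluster gets its own pidfile,
# so osd.1 from "ceph" and osd.1 from "ceph-prod" no longer collide.
run_dir=/var/run/ceph
type=osd
id=1

for cluster in ceph ceph-prod; do
    pidfile="$run_dir/$cluster.$type.$id.pid"
    echo "$pidfile"
done
# -> /var/run/ceph/ceph.osd.1.pid
# -> /var/run/ceph/ceph-prod.osd.1.pid
```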
Updated by Angapov Vasily almost 9 years ago
Unfortunately it's not that easy: the cluster name is hardcoded in many other places in the init script, as well as in ceph_common.sh.
I'm trying to figure it out at the moment.
The reason for two clusters is simple: I need a stable cluster for OpenStack and a playground for performance tuning, so the two-pools approach is not very comfortable here.
Updated by Angapov Vasily over 8 years ago
Looks like I managed to abandon the ceph init script altogether. I wrote my own systemd generator script, which is very handy and automated, and wrote about it on the ceph-devel mailing list. The code can be found here: https://github.com/angapov/ceph-systemd/.
As far as I'm concerned, this issue can be closed.