Bug #13236

closed

Multiple kill commands lead to data loss when stopping ceph

Added by shun song over 8 years ago. Updated over 8 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When stopping ceph, /etc/init.d/ceph may send multiple kill commands to the same OSD or monitor, one every second. However, ceph itself can only handle one kill command in its lifecycle. If more kill commands arrive, the system takes over the remaining ones and terminates the OSD or monitor immediately. This situation arises easily when the ceph cluster is under heavy load and the OSD or monitor cannot exit immediately on the first kill command because of pending jobs such as syncing to disk. After one second, another kill command is sent to the same OSD or monitor, which destroys the normal exit procedure and causes data loss, or at least disturbs PG data.

In /etc/init.d/ceph:

stop_daemon() {
    ......
    while [ -e /proc/\$pid ] && grep -q $daemon /proc/\$pid/cmdline ; do
        cmd=\"kill $signal \$pid\"
        echo -n \$cmd...
        \$cmd
        sleep 1
        continue
    done
    ......
}
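
For comparison, here is a minimal sketch of a stop loop that sends the signal only once and then polls until the process exits or a timeout expires. This is not code from /etc/init.d/ceph and not a proposed patch: the function name stop_once, its argument order, and the 30-second default timeout are all assumptions made for illustration. It relies on /proc, so it is Linux-only.

```shell
#!/bin/sh
# Hypothetical helper (not part of /etc/init.d/ceph): signal a process once,
# then poll /proc until it exits or a timeout (in seconds) expires.
# Usage: stop_once PID [SIGNAL] [TIMEOUT]   e.g. stop_once 1234 -TERM 60
stop_once() {
    pid=$1
    signal=${2:--TERM}     # assumed default: SIGTERM
    timeout=${3:-30}       # assumed default: 30 seconds

    # Send the signal exactly once.
    if ! kill "$signal" "$pid" 2>/dev/null; then
        # kill failed: success only if the process is already gone.
        if [ -e "/proc/$pid" ]; then
            return 1
        fi
        return 0
    fi

    # Wait for the process to exit without signalling it again.
    waited=0
    while [ -e "/proc/$pid" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "pid $pid still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

The function returns 0 once the process is gone and 1 if it is still present after the timeout, leaving the decision about escalating to a stronger signal to the caller. Note that /proc/PID remains present for an exited process until its parent reaps it, so the loop can keep waiting on a zombie.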

Actions #1

Updated by Sage Weil over 8 years ago

It's true that the repeated kill signals are probably not helpful, but killing the process like this (or sending it kill -9, or removing power from the machine) does not cause data loss. The only downside is that during an orderly shutdown the OSD tells the mon it is stopping, while a hard stop means waiting a few seconds for the peer OSDs to discover it is down.

Actions #2

Updated by Sage Weil over 8 years ago

  • Status changed from New to Won't Fix