Bug #10947
Updated by Loïc Dachary about 9 years ago
If crushtool takes longer than *mon lease* and blocks the mon, an election happens and the command runs again, indefinitely. The three lease settings are related as follows:

mon lease renew interval < mon lease < mon lease ack timeout

To reproduce:

<pre>
zhuzha:~/ceph/ceph/src% PATH=$(pwd):$PATH ./vstart.sh
zhuzha:~/ceph/ceph/src% cat > crushtool
#!/bin/sh
sleep 10
exit 0 # success
^D
zhuzha:~/ceph/ceph/src% ceph osd getcrushmap -o /tmp/map
got crush map from osdmap epoch 18
zhuzha:~/ceph/ceph/src% ceph osd setcrushmap -i /tmp/map
Hangs forever.

In logs at that time:

zhuzha:~/ceph/ceph/src% fgrep 'preprocess_query mon_command({"prefix": "osd setcrushmap"}' out/mon.a.log |tail -5
2015-02-25 12:07:17.734768 7f9154604700 10 mon.a@0(leader).osd e23 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924
2015-02-25 12:07:28.158111 7f9154604700 10 mon.a@0(leader).osd e23 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924
2015-02-25 12:07:38.504739 7f9154604700 10 mon.a@0(leader).osd e23 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924
2015-02-25 12:07:49.209307 7f9154604700 10 mon.a@0(leader).osd e24 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924
2015-02-25 12:07:59.577932 7f9154604700 10 mon.a@0(leader).osd e24 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924
</pre>

The same command is re-processed roughly every 10 seconds, which matches the @sleep 10@ in the fake crushtool. Changing the following fixes the problem:

<pre>
Modified   src/vstart.sh
diff --git a/src/vstart.sh b/src/vstart.sh
index bf863dc..e1440e6 100755
--- a/src/vstart.sh
+++ b/src/vstart.sh
@@ -358,6 +358,10 @@ if [ "$start_mon" -eq 1 ]; then
         mon osd full ratio = .99
         mon data avail warn = 10
         mon data avail crit = 1
+        mon lease = 20
+        mon lease renew interval = 18
+        mon lease ack timeout = 40
         osd pool default erasure code directory = $EC_PATH
         osd pool default erasure code profile = plugin=jerasure technique=reed_sol_van k=2 m=1 ruleset-failure-domain=osd
         rgw frontends = fastcgi, civetweb port=$CEPH_RGW_PORT
</pre>
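The relationship between the three lease values and the crushtool run time can be sanity-checked with a small script. The following is only a sketch: @check_lease@ is a hypothetical helper, not part of Ceph or vstart.sh, and the rule "the command must finish before *mon lease* expires" is a simplification of the behaviour described above. All values are whole seconds.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph: sanity-check monitor lease settings.
# Usage: check_lease RENEW_INTERVAL LEASE ACK_TIMEOUT COMMAND_SECONDS
check_lease() {
    cl_renew=$1
    cl_lease=$2
    cl_ack=$3
    cl_cmd=$4

    # Ordering stated in this report:
    # mon lease renew interval < mon lease < mon lease ack timeout
    if [ "$cl_renew" -ge "$cl_lease" ] || [ "$cl_lease" -ge "$cl_ack" ]; then
        echo "bad ordering"
        return 1
    fi

    # Simplifying assumption: a command that blocks the mon for at least
    # "mon lease" seconds may trigger an election and be retried.
    if [ "$cl_cmd" -ge "$cl_lease" ]; then
        echo "command too slow"
        return 2
    fi

    echo "ok"
}

# Values from the vstart.sh change above, with the 10 second fake crushtool.
check_lease 18 20 40 10
```

With the values from the diff the check passes. With a shorter lease, for example @check_lease 3 5 10 10@, it reports that the command is too slow, which corresponds to the hang shown in the logs.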