Project

General

Profile

Bug #10947

ceph osd setcrushmap loops when crushtool lags

Added by Loïc Dachary about 9 years ago. Updated about 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If crushtool takes longer than mon lease and block the mon, an election will happen and the command will run again, indefinitely.

mon lease renew interval < mon lease < mon lease ack timeout
zhuzha:~/ceph/ceph/src% PATH=$(pwd):$PATH ./vstart.sh
zhuzha:~/ceph/ceph/src% cat > crushtool
#!/bin/sh
sleep 10
exit 0 # success
^D
zhuzha:~/ceph/ceph/src% ceph osd getcrushmap -o /tmp/map
got crush map from osdmap epoch 18
zhuzha:~/ceph/ceph/src% ceph osd setcrushmap -i /tmp/map

Hangs forever. In logs at that time:

zhuzha:~/ceph/ceph/src% fgrep 'preprocess_query mon_command({"prefix": "osd setcrushmap"}' out/mon.a.log |tail -5
2015-02-25 12:07:17.734768 7f9154604700 10 mon.a@0(leader).osd e23 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924
2015-02-25 12:07:28.158111 7f9154604700 10 mon.a@0(leader).osd e23 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924
2015-02-25 12:07:38.504739 7f9154604700 10 mon.a@0(leader).osd e23 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924
2015-02-25 12:07:49.209307 7f9154604700 10 mon.a@0(leader).osd e24 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924
2015-02-25 12:07:59.577932 7f9154604700 10 mon.a@0(leader).osd e24 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.14102 172.18.128.29:0/1020924

Changing the following fixes the problem:
    Modified   src/vstart.sh
diff --git a/src/vstart.sh b/src/vstart.sh
index bf863dc..e1440e6 100755
--- a/src/vstart.sh
+++ b/src/vstart.sh
@@ -358,6 +358,10 @@ if [ "$start_mon" -eq 1 ]; then
         mon osd full ratio = .99
         mon data avail warn = 10
         mon data avail crit = 1
+        mon lease = 20
+        mon lease renew interval = 18
+        mon lease ack timeout = 40
         osd pool default erasure code directory = $EC_PATH
         osd pool default erasure code profile = plugin=jerasure technique=reed_sol_van k=2 m=1 ruleset-failure-domain=osd
         rgw frontends = fastcgi, civetweb port=$CEPH_RGW_PORT

Associated revisions

Revision 1e3f8148 (diff)
Added by Loic Dachary about 9 years ago

mon: ignore crushtool validation if too long

The crushtool is aborted if it takes more than mon lease seconds. Since
the monitor blocks while running it, this is mandatory otherwise the
monitor will be considered down and new elections triggered.

http://tracker.ceph.com/issues/10947 Fixes: #10947

Signed-off-by: Loic Dachary <>

History

#1 Updated by Loïc Dachary about 9 years ago

I can't reproduce it on https://github.com/ceph/ceph/tree/hammer at https://github.com/ceph/ceph/commit/29861b1b64540b6b09a735adf619335a48edfad8

2015-02-25 11:38:21.657533 7f757b7ae700 10 mon.a@0(leader).paxosservice(osdmap 1..10) dispatch mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.4110 127.0.0.1:0/1017648
2015-02-25 11:38:21.657539 7f757b7ae700  5 mon.a@0(leader).paxos(paxos active c 1..43) is_readable = 1 - now=2015-02-25 11:38:21.657540 lease_expire=0.000000 has v0 lc 43
2015-02-25 11:38:21.657554 7f757b7ae700 10 mon.a@0(leader).osd e10 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.4110 127.0.0.1:0/1017648
2015-02-25 11:38:21.657630 7f757b7ae700  7 mon.a@0(leader).osd e10 prepare_update mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.4110 127.0.0.1:0/1017648
2015-02-25 11:38:21.657699 7f757b7ae700 10 mon.a@0(leader).osd e10 prepare_command setting new crush map
2015-02-25 11:38:21.657825 7f757b7ae700 10 mon.a@0(leader).osd e10  testing map
2015-02-25 11:38:31.660127 7f757b7ae700 10 mon.a@0(leader).osd e10  result 
2015-02-25 11:38:31.660321 7f757b7ae700 10 mon.a@0(leader).osd e10 should_propose
2015-02-25 11:38:31.660336 7f757b7ae700 10 mon.a@0(leader).paxosservice(osdmap 1..10)  setting proposal_timer 0x7f75680a2a50 with delay of 0.05
2015-02-25 11:38:31.660448 7f757b7ae700  1 -- 127.0.0.1:6789/0 <== mon.0 127.0.0.1:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x7f75680c37b0 con 0x4e4fc40
2015-02-25 11:38:31.660471 7f757bfff700 11 mon.a@0(leader) e1 tick
2015-02-25 11:38:31.660505 7f757bfff700 10 mon.a@0(leader).pg v15  clearing pg_map delta (28.924947 > 10 seconds old)
2015-02-25 11:38:31.660538 7f757bfff700 10 mon.a@0(leader).pg v15  clearing pg_map delta for pool 0 (28.924981 > 10 seconds old)
2015-02-25 11:38:31.660568 7f757bfff700 10 mon.a@0(leader).pg v15 v15: 8 pgs: 8 active+clean; 0 bytes data, 501 GB used, 46698 MB / 547 GB avail
2015-02-25 11:38:31.660610 7f757bfff700 10 mon.a@0(leader).osd e10 e10: 3 osds: 3 up, 3 in

#2 Updated by Loïc Dachary about 9 years ago

  • Status changed from 12 to Can't reproduce

Mykola, could you let me know the SHA I should use to reproduce the problem ?

#3 Updated by Loïc Dachary about 9 years ago

  • Status changed from Can't reproduce to In Progress

Nevermind, got it. My test was only using one monitor. Using three monitors triggers the problem.

#4 Updated by Loïc Dachary about 9 years ago

2015-02-25 11:53:43.981719 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10) dispatch mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.4114 127.0.0.1:0/1018996
2015-02-25 11:53:43.981727 7ff318914700  5 mon.a@0(leader).paxos(paxos updating c 1..61) is_readable = 1 - now=2015-02-25 11:53:43.981727 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:43.981732 7ff318914700 10 mon.a@0(leader).osd e10 preprocess_query mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.4114 127.0.0.1:0/1018996
2015-02-25 11:53:43.981805 7ff318914700  7 mon.a@0(leader).osd e10 prepare_update mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.4114 127.0.0.1:0/1018996
2015-02-25 11:53:43.981868 7ff318914700 10 mon.a@0(leader).osd e10 prepare_command setting new crush map
2015-02-25 11:53:43.981986 7ff318914700 10 mon.a@0(leader).osd e10  testing map
2015-02-25 11:53:53.984626 7ff318914700 10 mon.a@0(leader).osd e10  result 
2015-02-25 11:53:53.984801 7ff318914700 10 mon.a@0(leader).osd e10 should_propose
2015-02-25 11:53:53.984817 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10)  setting proposal_timer 0x7ff300148d50 with delay of 0.05
2015-02-25 11:53:53.984952 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 206 ==== forward(log(1 entries) v1 caps allow * tid 36 con_features 0) to leader v3 ==== 587+0+0 (2713400823 0 0) 0x7ff2fc001b70 con 0x4236480
2015-02-25 11:53:53.984979 7ff319165700 11 mon.a@0(leader) e1 tick
2015-02-25 11:53:53.985008 7ff319165700 10 mon.a@0(leader).pg v16 v16: 8 pgs: 8 active+clean; 0 bytes data, 501 GB used, 46679 MB / 547 GB avail
2015-02-25 11:53:53.985063 7ff319165700 10 mon.a@0(leader).osd e10 e10: 3 osds: 3 up, 3 in
2015-02-25 11:53:53.985081 7ff319165700  5 mon.a@0(leader).paxos(paxos updating c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.985082 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.985104 7ff319165700  5 mon.a@0(leader).paxos(paxos updating c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.985108 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.985118 7ff319165700  5 mon.a@0(leader).paxos(paxos updating c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.985119 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.985131 7ff319165700 10 mon.a@0(leader).auth v5 auth
2015-02-25 11:53:53.985140 7ff319165700 10 cephx keyserver: _check_rotating_secrets
2015-02-25 11:53:53.985156 7ff319165700 20 mon.a@0(leader) e1 sync_trim_providers
2015-02-25 11:53:53.985177 7ff319165700  1 mon.a@0(leader).paxos(paxos updating c 1..61) accept timeout, calling fresh election
2015-02-25 11:53:53.985189 7ff319165700 10 mon.a@0(leader) e1 bootstrap
2015-02-25 11:53:53.985193 7ff319165700 10 mon.a@0(leader) e1 sync_reset_requester
2015-02-25 11:53:53.985197 7ff319165700 10 mon.a@0(leader) e1 unregister_cluster_logger
2015-02-25 11:53:53.985208 7ff319165700 10 mon.a@0(leader) e1 cancel_probe_timeout (none scheduled)
2015-02-25 11:53:53.985213 7ff319165700 10 mon.a@0(probing) e1 _reset
2015-02-25 11:53:53.985217 7ff319165700 10 mon.a@0(probing) e1 cancel_probe_timeout (none scheduled)
2015-02-25 11:53:53.985220 7ff319165700 10 mon.a@0(probing) e1 timecheck_finish
2015-02-25 11:53:53.985245 7ff319165700 15 mon.a@0(probing) e1 health_tick_stop
2015-02-25 11:53:53.985260 7ff319165700 15 mon.a@0(probing) e1 health_interval_stop
2015-02-25 11:53:53.985271 7ff319165700 10 mon.a@0(probing) e1 scrub_reset
2015-02-25 11:53:53.985276 7ff319165700 10 mon.a@0(probing).paxos(paxos updating c 1..61) restart -- canceling timeouts
2015-02-25 11:53:53.985352 7ff319165700 10 mon.a@0(probing).paxosservice(pgmap 1..16) restart
2015-02-25 11:53:53.985374 7ff319165700 10 mon.a@0(probing).paxosservice(mdsmap 1..1) restart
2015-02-25 11:53:53.985378 7ff319165700 10 mon.a@0(probing).paxosservice(osdmap 1..10) restart
2015-02-25 11:53:53.985380 7ff319165700 10 mon.a@0(probing).paxosservice(osdmap 1..10)  canceling proposal_timer 0x7ff300148d50
2015-02-25 11:53:53.985392 7ff319165700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.985397 7ff319165700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: client.4114 :/0 is openallow * for client.4114 :/0
2015-02-25 11:53:53.985408 7ff319165700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.985706 7ff319165700  0 mon.a@0(probing) e1 handle_command mon_command({"prefix": "osd setcrushmap"} v 0) v1
2015-02-25 11:53:53.985775 7ff319165700 20 is_capable service=osd command=osd setcrushmap read write on cap allow *
2015-02-25 11:53:53.985782 7ff319165700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.985785 7ff319165700 20  allow all
2015-02-25 11:53:53.985787 7ff319165700 10 mon.a@0(probing) e1 _allowed_command capable
2015-02-25 11:53:53.985806 7ff319165700  0 log_channel(audit) log [INF] : from='client.4114 :/0' entity='client.admin' cmd=[{"prefix": "osd setcrushmap"}]: dispatch
2015-02-25 11:53:53.985869 7ff319165700  1 -- 127.0.0.1:6789/0 --> 127.0.0.1:6789/0 -- log(1 entries) v1 -- ?+0 0x7ff2ec0539f0 con 0x4225cc0
2015-02-25 11:53:53.985899 7ff319165700 10 mon.a@0(probing).paxosservice(osdmap 1..10) dispatch mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.4114 127.0.0.1:0/1018996
2015-02-25 11:53:53.985913 7ff319165700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.985914 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.985927 7ff319165700 10 mon.a@0(probing).paxosservice(osdmap 1..10)  waiting for paxos -> readable (v0)
2015-02-25 11:53:53.985932 7ff319165700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.985933 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.985956 7ff319165700 10 mon.a@0(probing).paxosservice(logm 1..37) restart
2015-02-25 11:53:53.985966 7ff319165700  7 mon.a@0(probing).log v37 _updated_log for osd.2 127.0.0.1:6813/18809
2015-02-25 11:53:53.985978 7ff319165700 15 mon.a@0(probing) e1 send_reply routing reply to 127.0.0.1:6813/18809 via 127.0.0.1:6791/0 for request log(1 entries) v1
2015-02-25 11:53:53.985991 7ff319165700  1 -- 127.0.0.1:6789/0 --> 127.0.0.1:6791/0 -- route(log(last 5) v1 tid 21) v2 -- ?+0 0x7ff2ec0543f0 con 0x423be50
2015-02-25 11:53:53.986035 7ff319165700 10 mon.a@0(probing).paxosservice(monmap 1..1) restart
2015-02-25 11:53:53.986040 7ff319165700 10 mon.a@0(probing).paxosservice(auth 1..5) restart
2015-02-25 11:53:53.986050 7ff319165700 10 mon.a@0(probing) e1 cancel_probe_timeout (none scheduled)
2015-02-25 11:53:53.986059 7ff319165700 10 mon.a@0(probing) e1 reset_probe_timeout 0x7ff2ec01b630 after 2 seconds
2015-02-25 11:53:53.986079 7ff319165700 10 mon.a@0(probing) e1 probing other monitors
2015-02-25 11:53:53.986086 7ff319165700  1 -- 127.0.0.1:6789/0 --> mon.1 127.0.0.1:6790/0 -- mon_probe(probe c98533db-e94a-4062-8b01-43319c52b048 name a) v6 -- ?+0 0x7ff2ec054bc0
2015-02-25 11:53:53.986125 7ff319165700  1 -- 127.0.0.1:6789/0 --> mon.2 127.0.0.1:6791/0 -- mon_probe(probe c98533db-e94a-4062-8b01-43319c52b048 name a) v6 -- ?+0 0x7ff2ec013000
2015-02-25 11:53:53.986178 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.986194 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:53.986218 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.986224 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:53.986228 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.986232 7ff318914700 20  allow all
2015-02-25 11:53:53.986236 7ff318914700 10 mon.a@0(probing) e1 received forwarded message from mon.1 127.0.0.1:6790/0 via mon.1 127.0.0.1:6790/0
2015-02-25 11:53:53.986247 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:53.986251 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.986253 7ff318914700 20  allow all
2015-02-25 11:53:53.986267 7ff318914700 10 mon.a@0(probing) e1  caps are allow *
2015-02-25 11:53:53.986271 7ff318914700 10 mon.a@0(probing) e1  entity name 'mon.' type 1
2015-02-25 11:53:53.986276 7ff318914700 10 mon.a@0(probing) e1  mesg 0x7ff2fc002330 from 127.0.0.1:6790/0
2015-02-25 11:53:53.986289 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.986292 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.1 :/0 is openallow * for mon.1 :/0
2015-02-25 11:53:53.986298 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.986302 7ff318914700 10 mon.a@0(probing).paxosservice(logm 1..37) dispatch log(1 entries) v1 from mon.1 127.0.0.1:6790/0
2015-02-25 11:53:53.986315 7ff318914700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.986316 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.986333 7ff318914700 10 mon.a@0(probing).paxosservice(logm 1..37)  waiting for paxos -> readable (v0)
2015-02-25 11:53:53.986338 7ff318914700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.986339 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.986376 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.2 127.0.0.1:6791/0 190 ==== paxos(accept lc 61 fc 0 pn 300 opn 0) v3 ==== 80+0+0 (3635305193 0 0) 0x7ff2f8001190 con 0x423be50
2015-02-25 11:53:53.986397 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.986400 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.2 127.0.0.1:6791/0 is openallow * for mon.2 127.0.0.1:6791/0
2015-02-25 11:53:53.986409 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.986413 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:53.986416 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.986418 7ff318914700 20  allow all
2015-02-25 11:53:53.986421 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:53.986424 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.986426 7ff318914700 20  allow all
2015-02-25 11:53:53.986428 7ff318914700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) election in progress, dropping paxos(accept lc 61 fc 0 pn 300 opn 0) v3
2015-02-25 11:53:53.986449 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.0 127.0.0.1:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x7ff30004b5a0 con 0x4225cc0
2015-02-25 11:53:53.986462 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.986465 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.0 127.0.0.1:6789/0 is openallow * for mon.0 127.0.0.1:6789/0
2015-02-25 11:53:53.986474 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.986477 7ff318914700 10 mon.a@0(probing).paxosservice(logm 1..37) dispatch log(1 entries) v1 from mon.0 127.0.0.1:6789/0
2015-02-25 11:53:53.986486 7ff318914700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.986490 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.986500 7ff318914700 10 mon.a@0(probing).paxosservice(logm 1..37)  waiting for paxos -> readable (v0)
2015-02-25 11:53:53.986503 7ff318914700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.986505 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.986520 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 207 ==== paxos(accept lc 61 fc 0 pn 300 opn 0) v3 ==== 80+0+0 (2503014196 0 0) 0x7ff2fc002600 con 0x4236480
2015-02-25 11:53:53.986534 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.986537 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:53.986546 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.986549 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:53.986553 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.986557 7ff318914700 20  allow all
2015-02-25 11:53:53.986561 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:53.986564 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.986566 7ff318914700 20  allow all
2015-02-25 11:53:53.986567 7ff318914700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) election in progress, dropping paxos(accept lc 61 fc 0 pn 300 opn 0) v3
2015-02-25 11:53:53.986584 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.2 127.0.0.1:6791/0 191 ==== mon_probe(probe c98533db-e94a-4062-8b01-43319c52b048 name c) v6 ==== 58+0+0 (1529458076 0 0) 0x7ff2f8001a90 con 0x423be50
2015-02-25 11:53:53.986604 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.986607 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.2 127.0.0.1:6791/0 is openallow * for mon.2 127.0.0.1:6791/0
2015-02-25 11:53:53.986615 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.986619 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:53.986622 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.986624 7ff318914700 20  allow all
2015-02-25 11:53:53.986626 7ff318914700 10 mon.a@0(probing) e1 handle_probe mon_probe(probe c98533db-e94a-4062-8b01-43319c52b048 name c) v6
2015-02-25 11:53:53.986634 7ff318914700 10 mon.a@0(probing) e1 handle_probe_probe mon.2 127.0.0.1:6791/0mon_probe(probe c98533db-e94a-4062-8b01-43319c52b048 name c) v6 features 1125899906842623
2015-02-25 11:53:53.986705 7ff318914700  1 -- 127.0.0.1:6789/0 --> 127.0.0.1:6791/0 -- mon_probe(reply c98533db-e94a-4062-8b01-43319c52b048 name a paxos( fc 1 lc 61 )) v6 -- ?+0 0x7ff3000649d0 con 0x423be50
2015-02-25 11:53:53.986777 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.0 127.0.0.1:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x7ff2ec0539f0 con 0x4225cc0
2015-02-25 11:53:53.986809 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.986816 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.0 127.0.0.1:6789/0 is openallow * for mon.0 127.0.0.1:6789/0
2015-02-25 11:53:53.986833 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.986839 7ff318914700 10 mon.a@0(probing).paxosservice(logm 1..37) dispatch log(1 entries) v1 from mon.0 127.0.0.1:6789/0
2015-02-25 11:53:53.986857 7ff318914700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.986859 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.986886 7ff318914700 10 mon.a@0(probing).paxosservice(logm 1..37)  waiting for paxos -> readable (v0)
2015-02-25 11:53:53.986896 7ff318914700  5 mon.a@0(probing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:53.986898 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:53.986935 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 208 ==== mon_probe(probe c98533db-e94a-4062-8b01-43319c52b048 name b) v6 ==== 58+0+0 (635250253 0 0) 0x7ff2fc002890 con 0x4236480
2015-02-25 11:53:53.986962 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.986965 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:53.986974 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.986979 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:53.986982 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.986984 7ff318914700 20  allow all
2015-02-25 11:53:53.986987 7ff318914700 10 mon.a@0(probing) e1 handle_probe mon_probe(probe c98533db-e94a-4062-8b01-43319c52b048 name b) v6
2015-02-25 11:53:53.986995 7ff318914700 10 mon.a@0(probing) e1 handle_probe_probe mon.1 127.0.0.1:6790/0mon_probe(probe c98533db-e94a-4062-8b01-43319c52b048 name b) v6 features 1125899906842623
2015-02-25 11:53:53.987052 7ff318914700  1 -- 127.0.0.1:6789/0 --> 127.0.0.1:6790/0 -- mon_probe(reply c98533db-e94a-4062-8b01-43319c52b048 name a paxos( fc 1 lc 61 )) v6 -- ?+0 0x7ff300042fa0 con 0x4236480
2015-02-25 11:53:53.987115 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.2 127.0.0.1:6791/0 192 ==== election(c98533db-e94a-4062-8b01-43319c52b048 propose 7) v5 ==== 529+0+0 (3021517391 0 0) 0x7ff2f8001fe0 con 0x423be50
2015-02-25 11:53:53.987154 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.987160 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.2 127.0.0.1:6791/0 is openallow * for mon.2 127.0.0.1:6791/0
2015-02-25 11:53:53.987177 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.987188 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:53.987194 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.987198 7ff318914700 20  allow all
2015-02-25 11:53:53.987203 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:53.987208 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.987211 7ff318914700 20  allow all
2015-02-25 11:53:53.987239 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 209 ==== election(c98533db-e94a-4062-8b01-43319c52b048 propose 7) v5 ==== 529+0+0 (3021517391 0 0) 0x7ff2fc002e20 con 0x4236480
2015-02-25 11:53:53.987274 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.987280 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:53.987297 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.987304 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:53.987309 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.987312 7ff318914700 20  allow all
2015-02-25 11:53:53.987317 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:53.987322 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.987325 7ff318914700 20  allow all
2015-02-25 11:53:53.987350 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 210 ==== mon_probe(reply c98533db-e94a-4062-8b01-43319c52b048 name b paxos( fc 1 lc 61 )) v6 ==== 527+0+0 (1101047061 0 0) 0x7ff2fc001170 con 0x4236480
2015-02-25 11:53:53.987387 7ff318914700 20 mon.a@0(probing) e1 have connection
2015-02-25 11:53:53.987393 7ff318914700 20 mon.a@0(probing) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:53.987410 7ff318914700 20 mon.a@0(probing) e1  caps allow *
2015-02-25 11:53:53.987417 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:53.987421 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:53.987425 7ff318914700 20  allow all
2015-02-25 11:53:53.987427 7ff318914700 10 mon.a@0(probing) e1 handle_probe mon_probe(reply c98533db-e94a-4062-8b01-43319c52b048 name b paxos( fc 1 lc 61 )) v6
2015-02-25 11:53:53.987441 7ff318914700 10 mon.a@0(probing) e1 handle_probe_reply mon.1 127.0.0.1:6790/0mon_probe(reply c98533db-e94a-4062-8b01-43319c52b048 name b paxos( fc 1 lc 61 )) v6
2015-02-25 11:53:53.987452 7ff318914700 10 mon.a@0(probing) e1  monmap is e1: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2015-02-25 11:53:53.987523 7ff318914700 10 mon.a@0(probing) e1  peer name is b
2015-02-25 11:53:53.987533 7ff318914700 10 mon.a@0(probing) e1  mon.b is outside the quorum
2015-02-25 11:53:53.987542 7ff318914700 10 mon.a@0(probing) e1  outside_quorum now a,b, need 2
2015-02-25 11:53:53.987549 7ff318914700 10 mon.a@0(probing) e1  that's enough to form a new quorum, calling election
2015-02-25 11:53:53.987552 7ff318914700 10 mon.a@0(probing) e1 start_election
2015-02-25 11:53:53.987556 7ff318914700 10 mon.a@0(probing) e1 _reset
2015-02-25 11:53:53.987560 7ff318914700 10 mon.a@0(probing) e1 cancel_probe_timeout 0x7ff2ec01b630
2015-02-25 11:53:53.987568 7ff318914700 10 mon.a@0(probing) e1 timecheck_finish
2015-02-25 11:53:53.987573 7ff318914700 15 mon.a@0(probing) e1 health_tick_stop
2015-02-25 11:53:53.987576 7ff318914700 15 mon.a@0(probing) e1 health_interval_stop
2015-02-25 11:53:53.987582 7ff318914700 10 mon.a@0(probing) e1 scrub_reset
2015-02-25 11:53:53.987586 7ff318914700 10 mon.a@0(probing).paxos(paxos recovering c 1..61) restart -- canceling timeouts
2015-02-25 11:53:53.987596 7ff318914700 10 mon.a@0(probing).paxosservice(pgmap 1..16) restart
2015-02-25 11:53:53.987601 7ff318914700 10 mon.a@0(probing).paxosservice(mdsmap 1..1) restart
2015-02-25 11:53:53.987605 7ff318914700 10 mon.a@0(probing).paxosservice(osdmap 1..10) restart
2015-02-25 11:53:53.987608 7ff318914700 10 mon.a@0(probing).paxosservice(logm 1..37) restart
2015-02-25 11:53:53.987611 7ff318914700 10 mon.a@0(probing).paxosservice(monmap 1..1) restart
2015-02-25 11:53:53.987614 7ff318914700 10 mon.a@0(probing).paxosservice(auth 1..5) restart
2015-02-25 11:53:53.987619 7ff318914700 10 mon.a@0(electing) e1 cancel_probe_timeout (none scheduled)
2015-02-25 11:53:53.987631 7ff318914700  0 log_channel(cluster) log [INF] : mon.a calling new monitor election
2015-02-25 11:53:53.987679 7ff318914700  1 -- 127.0.0.1:6789/0 --> 127.0.0.1:6789/0 -- log(1 entries) v1 -- ?+0 0x7ff300053bb0 con 0x4225cc0
2015-02-25 11:53:53.987780 7ff318914700  5 mon.a@0(electing).elector(6) start -- can i be leader?
2015-02-25 11:53:53.987848 7ff318914700  1 mon.a@0(electing).elector(6) init, last seen epoch 6
2015-02-25 11:53:53.987852 7ff318914700 10 mon.a@0(electing).elector(6) bump_epoch 6 to 7
2015-02-25 11:53:54.020000 7ff318914700 10 mon.a@0(electing) e1 join_election
2015-02-25 11:53:54.020026 7ff318914700 10 mon.a@0(electing) e1 _reset
2015-02-25 11:53:54.020029 7ff318914700 10 mon.a@0(electing) e1 cancel_probe_timeout (none scheduled)
2015-02-25 11:53:54.020031 7ff318914700 10 mon.a@0(electing) e1 timecheck_finish
2015-02-25 11:53:54.020034 7ff318914700 15 mon.a@0(electing) e1 health_tick_stop
2015-02-25 11:53:54.020036 7ff318914700 15 mon.a@0(electing) e1 health_interval_stop
2015-02-25 11:53:54.020039 7ff318914700 10 mon.a@0(electing) e1 scrub_reset
2015-02-25 11:53:54.020041 7ff318914700 10 mon.a@0(electing).paxos(paxos recovering c 1..61) restart -- canceling timeouts
2015-02-25 11:53:54.020050 7ff318914700 10 mon.a@0(electing).paxosservice(pgmap 1..16) restart
2015-02-25 11:53:54.020054 7ff318914700 10 mon.a@0(electing).paxosservice(mdsmap 1..1) restart
2015-02-25 11:53:54.020056 7ff318914700 10 mon.a@0(electing).paxosservice(osdmap 1..10) restart
2015-02-25 11:53:54.020057 7ff318914700 10 mon.a@0(electing).paxosservice(logm 1..37) restart
2015-02-25 11:53:54.020059 7ff318914700 10 mon.a@0(electing).paxosservice(monmap 1..1) restart
2015-02-25 11:53:54.020060 7ff318914700 10 mon.a@0(electing).paxosservice(auth 1..5) restart
2015-02-25 11:53:54.020109 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.1 127.0.0.1:6790/0 -- election(c98533db-e94a-4062-8b01-43319c52b048 propose 7) v5 -- ?+0 0x7ff300058f50
2015-02-25 11:53:54.020146 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.2 127.0.0.1:6791/0 -- election(c98533db-e94a-4062-8b01-43319c52b048 propose 7) v5 -- ?+0 0x7ff300059210
2015-02-25 11:53:54.020181 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.2 127.0.0.1:6791/0 193 ==== mon_probe(reply c98533db-e94a-4062-8b01-43319c52b048 name c paxos( fc 1 lc 61 )) v6 ==== 527+0+0 (3748692843 0 0) 0x7ff2f8001820 con 0x423be50
2015-02-25 11:53:54.020195 7ff318914700 20 mon.a@0(electing) e1 have connection
2015-02-25 11:53:54.020198 7ff318914700 20 mon.a@0(electing) e1 ms_dispatch existing session MonSession: mon.2 127.0.0.1:6791/0 is openallow * for mon.2 127.0.0.1:6791/0
2015-02-25 11:53:54.020203 7ff318914700 20 mon.a@0(electing) e1  caps allow *
2015-02-25 11:53:54.020207 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.020209 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.020210 7ff318914700 20  allow all
2015-02-25 11:53:54.020212 7ff318914700 10 mon.a@0(electing) e1 handle_probe mon_probe(reply c98533db-e94a-4062-8b01-43319c52b048 name c paxos( fc 1 lc 61 )) v6
2015-02-25 11:53:54.020216 7ff318914700 10 mon.a@0(electing) e1 handle_probe_reply mon.2 127.0.0.1:6791/0mon_probe(reply c98533db-e94a-4062-8b01-43319c52b048 name c paxos( fc 1 lc 61 )) v6
2015-02-25 11:53:54.020220 7ff318914700 10 mon.a@0(electing) e1  monmap is e1: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
2015-02-25 11:53:54.020239 7ff318914700 10 mon.a@0(electing) e1  peer name is c
2015-02-25 11:53:54.020255 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.0 127.0.0.1:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x7ff300053bb0 con 0x4225cc0
2015-02-25 11:53:54.020268 7ff318914700 20 mon.a@0(electing) e1 have connection
2015-02-25 11:53:54.020271 7ff318914700 20 mon.a@0(electing) e1 ms_dispatch existing session MonSession: mon.0 127.0.0.1:6789/0 is openallow * for mon.0 127.0.0.1:6789/0
2015-02-25 11:53:54.020275 7ff318914700 20 mon.a@0(electing) e1  caps allow *
2015-02-25 11:53:54.020276 7ff318914700 10 mon.a@0(electing).paxosservice(logm 1..37) dispatch log(1 entries) v1 from mon.0 127.0.0.1:6789/0
2015-02-25 11:53:54.020281 7ff318914700  5 mon.a@0(electing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.020281 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:54.020289 7ff318914700 10 mon.a@0(electing).paxosservice(logm 1..37)  waiting for paxos -> readable (v0)
2015-02-25 11:53:54.020291 7ff318914700  5 mon.a@0(electing).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.020292 lease_expire=2015-02-25 11:53:47.553936 has v0 lc 61
2015-02-25 11:53:54.020622 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.2 127.0.0.1:6791/0 194 ==== election(c98533db-e94a-4062-8b01-43319c52b048 ack 7) v5 ==== 26058+0+0 (108993628 0 0) 0x7ff2f80008c0 con 0x423be50
2015-02-25 11:53:54.020650 7ff318914700 20 mon.a@0(electing) e1 have connection
2015-02-25 11:53:54.020653 7ff318914700 20 mon.a@0(electing) e1 ms_dispatch existing session MonSession: mon.2 127.0.0.1:6791/0 is openallow * for mon.2 127.0.0.1:6791/0
2015-02-25 11:53:54.020658 7ff318914700 20 mon.a@0(electing) e1  caps allow *
2015-02-25 11:53:54.020660 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.020663 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.020664 7ff318914700 20  allow all
2015-02-25 11:53:54.020665 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.020666 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.020667 7ff318914700 20  allow all
2015-02-25 11:53:54.020687 7ff318914700  5 mon.a@0(electing).elector(7) handle_ack from mon.2
2015-02-25 11:53:54.020691 7ff318914700  5 mon.a@0(electing).elector(7)  so far i have {0=1125899906842623,2=1125899906842623}
2015-02-25 11:53:54.020705 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 211 ==== election(c98533db-e94a-4062-8b01-43319c52b048 ack 7) v5 ==== 26058+0+0 (108993628 0 0) 0x7ff2fc001b70 con 0x4236480
2015-02-25 11:53:54.020714 7ff318914700 20 mon.a@0(electing) e1 have connection
2015-02-25 11:53:54.020715 7ff318914700 20 mon.a@0(electing) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.020719 7ff318914700 20 mon.a@0(electing) e1  caps allow *
2015-02-25 11:53:54.020721 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.020722 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.020723 7ff318914700 20  allow all
2015-02-25 11:53:54.020724 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.020725 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.020726 7ff318914700 20  allow all
2015-02-25 11:53:54.020739 7ff318914700  5 mon.a@0(electing).elector(7) handle_ack from mon.1
2015-02-25 11:53:54.020742 7ff318914700  5 mon.a@0(electing).elector(7)  so far i have {0=1125899906842623,1=1125899906842623,2=1125899906842623}
2015-02-25 11:53:54.020750 7ff318914700 10 mon.a@0(electing).elector(7) bump_epoch 7 to 8
2015-02-25 11:53:54.053558 7ff318914700 10 mon.a@0(electing) e1 join_election
2015-02-25 11:53:54.053594 7ff318914700 10 mon.a@0(electing) e1 _reset
2015-02-25 11:53:54.053598 7ff318914700 10 mon.a@0(electing) e1 cancel_probe_timeout (none scheduled)
2015-02-25 11:53:54.053601 7ff318914700 10 mon.a@0(electing) e1 timecheck_finish
2015-02-25 11:53:54.053605 7ff318914700 15 mon.a@0(electing) e1 health_tick_stop
2015-02-25 11:53:54.053608 7ff318914700 15 mon.a@0(electing) e1 health_interval_stop
2015-02-25 11:53:54.053613 7ff318914700 10 mon.a@0(electing) e1 scrub_reset
2015-02-25 11:53:54.053617 7ff318914700 10 mon.a@0(electing).paxos(paxos recovering c 1..61) restart -- canceling timeouts
2015-02-25 11:53:54.053629 7ff318914700 10 mon.a@0(electing).paxosservice(pgmap 1..16) restart
2015-02-25 11:53:54.053637 7ff318914700 10 mon.a@0(electing).paxosservice(mdsmap 1..1) restart
2015-02-25 11:53:54.053640 7ff318914700 10 mon.a@0(electing).paxosservice(osdmap 1..10) restart
2015-02-25 11:53:54.053642 7ff318914700 10 mon.a@0(electing).paxosservice(logm 1..37) restart
2015-02-25 11:53:54.053644 7ff318914700 10 mon.a@0(electing).paxosservice(monmap 1..1) restart
2015-02-25 11:53:54.053647 7ff318914700 10 mon.a@0(electing).paxosservice(auth 1..5) restart
2015-02-25 11:53:54.053732 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.1 127.0.0.1:6790/0 -- election(c98533db-e94a-4062-8b01-43319c52b048 victory 8) v5 -- ?+0 0x7ff30005b4f0
2015-02-25 11:53:54.053802 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.2 127.0.0.1:6791/0 -- election(c98533db-e94a-4062-8b01-43319c52b048 victory 8) v5 -- ?+0 0x7ff30018f460
2015-02-25 11:53:54.053825 7ff318914700 10 mon.a@0(electing) e1 win_election epoch 8 quorum 0,1,2 features 1125899906842623
2015-02-25 11:53:54.053843 7ff318914700  0 log_channel(cluster) log [INF] : mon.a@0 won leader election with quorum 0,1,2
2015-02-25 11:53:54.053883 7ff318914700  1 -- 127.0.0.1:6789/0 --> 127.0.0.1:6789/0 -- log(1 entries) v1 -- ?+0 0x7ff300070690 con 0x4225cc0
2015-02-25 11:53:54.053906 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) leader_init -- starting paxos recovery
2015-02-25 11:53:54.054068 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) learned uncommitted 62 pn 300 (13530 bytes) from myself
2015-02-25 11:53:54.112165 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) get_new_proposal_number = 400
2015-02-25 11:53:54.112208 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) collect with pn 400
2015-02-25 11:53:54.112225 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.1 127.0.0.1:6790/0 -- paxos(collect lc 61 fc 1 pn 400 opn 0) v3 -- ?+0 0x7ff3000709a0
2015-02-25 11:53:54.112272 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.2 127.0.0.1:6791/0 -- paxos(collect lc 61 fc 1 pn 400 opn 0) v3 -- ?+0 0x7ff300070d00
2015-02-25 11:53:54.112306 7ff318914700 10 mon.a@0(leader).paxosservice(monmap 1..1) election_finished
2015-02-25 11:53:54.112314 7ff318914700 10 mon.a@0(leader).paxosservice(monmap 1..1) _active - not active
2015-02-25 11:53:54.112322 7ff318914700 10 mon.a@0(leader).paxosservice(pgmap 1..16) election_finished
2015-02-25 11:53:54.112325 7ff318914700 10 mon.a@0(leader).paxosservice(pgmap 1..16) _active - not active
2015-02-25 11:53:54.112330 7ff318914700 10 mon.a@0(leader).paxosservice(mdsmap 1..1) election_finished
2015-02-25 11:53:54.112335 7ff318914700 10 mon.a@0(leader).paxosservice(mdsmap 1..1) _active - not active
2015-02-25 11:53:54.112344 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10) election_finished
2015-02-25 11:53:54.112348 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10) _active - not active
2015-02-25 11:53:54.112355 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37) election_finished
2015-02-25 11:53:54.112358 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37) _active - not active
2015-02-25 11:53:54.112362 7ff318914700 10 mon.a@0(leader).paxosservice(auth 1..5) election_finished
2015-02-25 11:53:54.112366 7ff318914700 10 mon.a@0(leader).paxosservice(auth 1..5) _active - not active
2015-02-25 11:53:54.112371 7ff318914700 10 mon.a@0(leader).data_health(8) start_epoch epoch 8
2015-02-25 11:53:54.112399 7ff318914700 10 mon.a@0(leader) e1 timecheck_finish
2015-02-25 11:53:54.112406 7ff318914700 10 mon.a@0(leader) e1 resend_routed_requests
2015-02-25 11:53:54.112413 7ff318914700 10 mon.a@0(leader) e1 register_cluster_logger
2015-02-25 11:53:54.112430 7ff318914700 10 mon.a@0(leader) e1 timecheck_start
2015-02-25 11:53:54.112435 7ff318914700 10 mon.a@0(leader) e1 timecheck_start_round curr 0
2015-02-25 11:53:54.112441 7ff318914700 10 mon.a@0(leader) e1 timecheck_start_round new 1
2015-02-25 11:53:54.112445 7ff318914700 10 mon.a@0(leader) e1 timecheck
2015-02-25 11:53:54.112448 7ff318914700 10 mon.a@0(leader) e1 timecheck start timecheck epoch 8 round 1
2015-02-25 11:53:54.112469 7ff318914700 10 mon.a@0(leader) e1 timecheck send time_check( ping e 8 r 1 ) v1 to mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.112486 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.1 127.0.0.1:6790/0 -- time_check( ping e 8 r 1 ) v1 -- ?+0 0x7ff300191010
2015-02-25 11:53:54.112521 7ff318914700 10 mon.a@0(leader) e1 timecheck send time_check( ping e 8 r 1 ) v1 to mon.2 127.0.0.1:6791/0
2015-02-25 11:53:54.112533 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.2 127.0.0.1:6791/0 -- time_check( ping e 8 r 1 ) v1 -- ?+0 0x7ff30018bfa0
2015-02-25 11:53:54.112563 7ff318914700 10 mon.a@0(leader) e1 timecheck_start_round setting up next event
2015-02-25 11:53:54.112577 7ff318914700 15 mon.a@0(leader) e1 health_tick_start
2015-02-25 11:53:54.112581 7ff318914700 15 mon.a@0(leader) e1 health_tick_stop
2015-02-25 11:53:54.112589 7ff318914700 10 mon.a@0(leader) e1 do_health_to_clog_interval
2015-02-25 11:53:54.112592 7ff318914700 10 mon.a@0(leader) e1 do_health_to_clog (force)
2015-02-25 11:53:54.112661 7ff318914700 10 mon.a@0(leader).data_health(8) get_health
2015-02-25 11:53:54.112691 7ff318914700  0 log_channel(cluster) log [INF] : HEALTH_WARN; 3 near full osd(s)
2015-02-25 11:53:54.112732 7ff318914700  1 -- 127.0.0.1:6789/0 --> 127.0.0.1:6789/0 -- log(1 entries) v1 -- ?+0 0x7ff30018dfb0 con 0x4225cc0
2015-02-25 11:53:54.112758 7ff318914700 15 mon.a@0(leader) e1 health_interval_start
2015-02-25 11:53:54.112762 7ff318914700 15 mon.a@0(leader) e1 health_interval_stop
2015-02-25 11:53:54.112765 7ff318914700 20 mon.a@0(leader) e1 health_interval_calc_next_update now: 2015-02-25 11:53:54.112764, next: 2015-02-25 12:00:00.000000, interval: 3600
2015-02-25 11:53:54.112811 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.0 127.0.0.1:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x7ff300070690 con 0x4225cc0
2015-02-25 11:53:54.112827 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.112830 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.0 127.0.0.1:6789/0 is openallow * for mon.0 127.0.0.1:6789/0
2015-02-25 11:53:54.112840 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.112843 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37) dispatch log(1 entries) v1 from mon.0 127.0.0.1:6789/0
2015-02-25 11:53:54.112852 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.112853 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.112862 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37)  waiting for paxos -> readable (v0)
2015-02-25 11:53:54.112866 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.112867 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.113034 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.0 127.0.0.1:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x7ff30018dfb0 con 0x4225cc0
2015-02-25 11:53:54.113065 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.113069 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.0 127.0.0.1:6789/0 is openallow * for mon.0 127.0.0.1:6789/0
2015-02-25 11:53:54.113078 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.113081 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37) dispatch log(1 entries) v1 from mon.0 127.0.0.1:6789/0
2015-02-25 11:53:54.113090 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.113091 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.113104 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37)  waiting for paxos -> readable (v0)
2015-02-25 11:53:54.113137 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.113139 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.113335 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.2 127.0.0.1:6791/0 195 ==== forward(osd_boot(osd.2 booted 0 features 1125899906842623 v8) v6 caps allow * tid 13 con_features 1125899906842623) to leader v3 ==== 1978+0+0 (4197059272 0 0) 0x7ff2f80008c0 con 0x423be50
2015-02-25 11:53:54.113382 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.113387 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.2 127.0.0.1:6791/0 is openallow * for mon.2 127.0.0.1:6791/0
2015-02-25 11:53:54.113395 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.113402 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.113406 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.113410 7ff318914700 20  allow all
2015-02-25 11:53:54.113413 7ff318914700 10 mon.a@0(leader) e1 received forwarded message from osd.2 127.0.0.1:6813/18809 via mon.2 127.0.0.1:6791/0
2015-02-25 11:53:54.113420 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.113423 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.113424 7ff318914700 20  allow all
2015-02-25 11:53:54.113433 7ff318914700 10 mon.a@0(leader) e1  caps are allow *
2015-02-25 11:53:54.113436 7ff318914700 10 mon.a@0(leader) e1  entity name 'unknown.' type 0
2015-02-25 11:53:54.113440 7ff318914700 10 mon.a@0(leader) e1  mesg 0x7ff2f8001d50 from 127.0.0.1:6791/0
2015-02-25 11:53:54.113444 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.113447 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: osd.2 :/0 is openallow * for osd.2 :/0
2015-02-25 11:53:54.113452 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.113454 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10) dispatch osd_boot(osd.2 booted 0 features 1125899906842623 v8) v6 from osd.2 127.0.0.1:6813/18809
2015-02-25 11:53:54.113463 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.113464 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.113476 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10)  waiting for paxos -> readable (v8)
2015-02-25 11:53:54.113479 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.113480 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.115223 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 212 ==== forward(log(1 entries) v1 caps allow * tid 36 con_features 0) to leader v3 ==== 587+0+0 (3901180112 0 0) 0x7ff2fc001b70 con 0x4236480
2015-02-25 11:53:54.115262 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.115269 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.115284 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.115291 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.115295 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.115299 7ff318914700 20  allow all
2015-02-25 11:53:54.115304 7ff318914700 10 mon.a@0(leader) e1 received forwarded message from mon.1 127.0.0.1:6790/0 via mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.115318 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.115322 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.115325 7ff318914700 20  allow all
2015-02-25 11:53:54.115338 7ff318914700 10 mon.a@0(leader) e1  caps are allow *
2015-02-25 11:53:54.115344 7ff318914700 10 mon.a@0(leader) e1  entity name 'unknown.' type 0
2015-02-25 11:53:54.115351 7ff318914700 10 mon.a@0(leader) e1  mesg 0x7ff2fc002860 from 127.0.0.1:6790/0
2015-02-25 11:53:54.115360 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.115364 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.1 :/0 is openallow * for mon.1 :/0
2015-02-25 11:53:54.115374 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.115378 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37) dispatch log(1 entries) v1 from mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.115393 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.115394 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.115410 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37)  waiting for paxos -> readable (v0)
2015-02-25 11:53:54.115417 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.115418 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.115451 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 213 ==== forward(osd_boot(osd.0 booted 0 features 1125899906842623 v3) v6 caps allow * tid 5 con_features 1125899906842623) to leader v3 ==== 1978+0+0 (3061426784 0 0) 0x7ff2fc003300 con 0x4236480
2015-02-25 11:53:54.115479 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.115484 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.115497 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.115504 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.115508 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.115511 7ff318914700 20  allow all
2015-02-25 11:53:54.115514 7ff318914700 10 mon.a@0(leader) e1 received forwarded message from osd.0 127.0.0.1:6800/18322 via mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.115528 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.115532 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.115535 7ff318914700 20  allow all
2015-02-25 11:53:54.115545 7ff318914700 10 mon.a@0(leader) e1  caps are allow *
2015-02-25 11:53:54.115550 7ff318914700 10 mon.a@0(leader) e1  entity name 'unknown.' type 0
2015-02-25 11:53:54.115556 7ff318914700 10 mon.a@0(leader) e1  mesg 0x7ff2fc003ac0 from 127.0.0.1:6790/0
2015-02-25 11:53:54.115564 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.115568 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: osd.0 :/0 is openallow * for osd.0 :/0
2015-02-25 11:53:54.115577 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.115581 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10) dispatch osd_boot(osd.0 booted 0 features 1125899906842623 v3) v6 from osd.0 127.0.0.1:6800/18322
2015-02-25 11:53:54.115594 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.115595 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.115608 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10)  waiting for paxos -> readable (v3)
2015-02-25 11:53:54.115614 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.115615 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.115642 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 214 ==== forward(osd_boot(osd.1 booted 0 features 1125899906842623 v5) v6 caps allow * tid 11 con_features 1125899906842623) to leader v3 ==== 1978+0+0 (1198834337 0 0) 0x7ff2fc005720 con 0x4236480
2015-02-25 11:53:54.115668 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.115672 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.115685 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.115740 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.115745 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.115748 7ff318914700 20  allow all
2015-02-25 11:53:54.115751 7ff318914700 10 mon.a@0(leader) e1 received forwarded message from osd.1 127.0.0.1:6809/18556 via mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.115766 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.115770 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.115772 7ff318914700 20  allow all
2015-02-25 11:53:54.115787 7ff318914700 10 mon.a@0(leader) e1  caps are allow *
2015-02-25 11:53:54.115792 7ff318914700 10 mon.a@0(leader) e1  entity name 'unknown.' type 0
2015-02-25 11:53:54.115798 7ff318914700 10 mon.a@0(leader) e1  mesg 0x7ff2fc005a30 from 127.0.0.1:6790/0
2015-02-25 11:53:54.115807 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.115811 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: osd.1 :/0 is openallow * for osd.1 :/0
2015-02-25 11:53:54.115820 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.115824 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10) dispatch osd_boot(osd.1 booted 0 features 1125899906842623 v5) v6 from osd.1 127.0.0.1:6809/18556
2015-02-25 11:53:54.115837 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.115839 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.115853 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10)  waiting for paxos -> readable (v5)
2015-02-25 11:53:54.115858 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.115860 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.115890 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 215 ==== forward(osd_alive(want up_thru 6 have 6) v1 caps allow * tid 14 con_features 1125899906842623) to leader v3 ==== 286+0+0 (407096547 0 0) 0x7ff2fc007780 con 0x4236480
2015-02-25 11:53:54.115915 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.115921 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.115934 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.115939 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.115943 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.115946 7ff318914700 20  allow all
2015-02-25 11:53:54.115949 7ff318914700 10 mon.a@0(leader) e1 received forwarded message from osd.0 127.0.0.1:6800/18322 via mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.115963 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.115966 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.115969 7ff318914700 20  allow all
2015-02-25 11:53:54.115979 7ff318914700 10 mon.a@0(leader) e1  caps are allow *
2015-02-25 11:53:54.115984 7ff318914700 10 mon.a@0(leader) e1  entity name 'unknown.' type 0
2015-02-25 11:53:54.115990 7ff318914700 10 mon.a@0(leader) e1  mesg 0x7ff2fc0082b0 from 127.0.0.1:6790/0
2015-02-25 11:53:54.115998 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.116002 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: osd.0 :/0 is openallow * for osd.0 :/0
2015-02-25 11:53:54.116010 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.116014 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10) dispatch osd_alive(want up_thru 6 have 6) v1 from osd.0 127.0.0.1:6800/18322
2015-02-25 11:53:54.116026 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.116027 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.116040 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10)  waiting for paxos -> readable (v6)
2015-02-25 11:53:54.116046 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.116047 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.116081 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 216 ==== forward(mon_command({"prefix": "osd setcrushmap"} v 0) v1 caps allow * tid 35 con_features 1125899906842623) to leader v3 ==== 776+0+0 (644466655 0 0) 0x7ff2fc007f10 con 0x4236480
2015-02-25 11:53:54.116106 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.116110 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.116123 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.116128 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.116132 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.116135 7ff318914700 20  allow all
2015-02-25 11:53:54.116138 7ff318914700 10 mon.a@0(leader) e1 received forwarded message from client.4114 127.0.0.1:0/1018996 via mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.116152 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.116155 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.116158 7ff318914700 20  allow all
2015-02-25 11:53:54.116167 7ff318914700 10 mon.a@0(leader) e1  caps are allow *
2015-02-25 11:53:54.116171 7ff318914700 10 mon.a@0(leader) e1  entity name 'unknown.' type 0
2015-02-25 11:53:54.116177 7ff318914700 10 mon.a@0(leader) e1  mesg 0x7ff2fc008d40 from 127.0.0.1:6790/0
2015-02-25 11:53:54.116185 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.116189 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: client.4114 :/0 is openallow * for client.4114 :/0
2015-02-25 11:53:54.116198 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.116491 7ff318914700  0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "osd setcrushmap"} v 0) v1
2015-02-25 11:53:54.116544 7ff318914700 20 is_capable service=osd command=osd setcrushmap read write on cap allow *
2015-02-25 11:53:54.116551 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.116554 7ff318914700 20  allow all
2015-02-25 11:53:54.116556 7ff318914700 10 mon.a@0(leader) e1 _allowed_command capable
2015-02-25 11:53:54.116574 7ff318914700  0 log_channel(audit) log [INF] : from='client.4114 :/0' entity='unknown.' cmd=[{"prefix": "osd setcrushmap"}]: dispatch
2015-02-25 11:53:54.116628 7ff318914700  1 -- 127.0.0.1:6789/0 --> 127.0.0.1:6789/0 -- log(1 entries) v1 -- ?+0 0x7ff300194030 con 0x4225cc0
2015-02-25 11:53:54.116662 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10) dispatch mon_command({"prefix": "osd setcrushmap"} v 0) v1 from client.4114 127.0.0.1:0/1018996
2015-02-25 11:53:54.116678 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.116680 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.116693 7ff318914700 10 mon.a@0(leader).paxosservice(osdmap 1..10)  waiting for paxos -> readable (v0)
2015-02-25 11:53:54.116700 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.116701 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.116745 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.0 127.0.0.1:6789/0 0 ==== log(1 entries) v1 ==== 0+0+0 (0 0 0) 0x7ff300194030 con 0x4225cc0
2015-02-25 11:53:54.116768 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.116773 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.0 127.0.0.1:6789/0 is openallow * for mon.0 127.0.0.1:6789/0
2015-02-25 11:53:54.116787 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.116791 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37) dispatch log(1 entries) v1 from mon.0 127.0.0.1:6789/0
2015-02-25 11:53:54.116804 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.116805 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.116817 7ff318914700 10 mon.a@0(leader).paxosservice(logm 1..37)  waiting for paxos -> readable (v0)
2015-02-25 11:53:54.116822 7ff318914700  5 mon.a@0(leader).paxos(paxos recovering c 1..61) is_readable = 0 - now=2015-02-25 11:53:54.116824 lease_expire=0.000000 has v0 lc 61
2015-02-25 11:53:54.164582 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.2 127.0.0.1:6791/0 196 ==== paxos(last lc 61 fc 1 pn 400 opn 300) v3 ==== 13622+0+0 (1717090164 0 0) 0x7ff2f80008c0 con 0x423be50
2015-02-25 11:53:54.164603 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.164606 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.2 127.0.0.1:6791/0 is openallow * for mon.2 127.0.0.1:6791/0
2015-02-25 11:53:54.164621 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.164624 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.164625 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.164626 7ff318914700 20  allow all
2015-02-25 11:53:54.164627 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.164629 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.164629 7ff318914700 20  allow all
2015-02-25 11:53:54.164631 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) handle_last paxos(last lc 61 fc 1 pn 400 opn 300) v3
2015-02-25 11:53:54.164638 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) store_state nothing to commit
2015-02-25 11:53:54.164662 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61)  they accepted our pn, we now have 2 peons
2015-02-25 11:53:54.164667 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) we learned an uncommitted value for 62 pn 300 13530 bytes
2015-02-25 11:53:54.164677 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.2 127.0.0.1:6791/0 197 ==== time_check( pong e 8 r 1 ts 2015-02-25 11:53:54.164458 ) v1 ==== 36+0+0 (2444859969 0 0) 0x7ff2f80068b0 con 0x423be50
2015-02-25 11:53:54.164686 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.164688 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.2 127.0.0.1:6791/0 is openallow * for mon.2 127.0.0.1:6791/0
2015-02-25 11:53:54.164692 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.164694 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.164695 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.164696 7ff318914700 20  allow all
2015-02-25 11:53:54.164697 7ff318914700 10 mon.a@0(leader) e1 handle_timecheck time_check( pong e 8 r 1 ts 2015-02-25 11:53:54.164458 ) v1
2015-02-25 11:53:54.164701 7ff318914700 10 mon.a@0(leader) e1 handle_timecheck_leader time_check( pong e 8 r 1 ts 2015-02-25 11:53:54.164458 ) v1
2015-02-25 11:53:54.164713 7ff318914700 10 mon.a@0(leader) e1 handle_timecheck_leader from mon.2 127.0.0.1:6791/0 ts 2015-02-25 11:53:54.164458 delta -0.000246525 skew_bound 0 latency 0.0521936
2015-02-25 11:53:54.187725 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 217 ==== paxos(last lc 61 fc 1 pn 400 opn 300) v3 ==== 13622+0+0 (3722418234 0 0) 0x7ff2fc003300 con 0x4236480
2015-02-25 11:53:54.187754 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.187757 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.187760 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.187762 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.187763 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.187764 7ff318914700 20  allow all
2015-02-25 11:53:54.187765 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.187766 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.187766 7ff318914700 20  allow all
2015-02-25 11:53:54.187767 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) handle_last paxos(last lc 61 fc 1 pn 400 opn 300) v3
2015-02-25 11:53:54.187772 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) store_state nothing to commit
2015-02-25 11:53:54.187793 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61)  they accepted our pn, we now have 3 peons
2015-02-25 11:53:54.187797 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) we learned an uncommitted value for 62 pn 300 13530 bytes
2015-02-25 11:53:54.187803 7ff318914700 10 mon.a@0(leader).paxos(paxos recovering c 1..61) that's everyone.  begin on old learned value
2015-02-25 11:53:54.187804 7ff318914700 10 mon.a@0(leader).paxos(paxos updating-previous c 1..61) begin for 62 13530 bytes
2015-02-25 11:53:54.221061 7ff318914700 10 mon.a@0(leader).paxos(paxos updating-previous c 1..61)  sending begin to mon.1
2015-02-25 11:53:54.221100 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.1 127.0.0.1:6790/0 -- paxos(begin lc 61 fc 0 pn 400 opn 0) v3 -- ?+0 0x7ff3000770f0
2015-02-25 11:53:54.221129 7ff318914700 10 mon.a@0(leader).paxos(paxos updating-previous c 1..61)  sending begin to mon.2
2015-02-25 11:53:54.221136 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.2 127.0.0.1:6791/0 -- paxos(begin lc 61 fc 0 pn 400 opn 0) v3 -- ?+0 0x7ff300077f20
2015-02-25 11:53:54.221178 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 218 ==== time_check( pong e 8 r 1 ts 2015-02-25 11:53:54.187582 ) v1 ==== 36+0+0 (128518225 0 0) 0x7ff2fc005720 con 0x4236480
2015-02-25 11:53:54.221201 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.221205 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.221211 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.221215 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.221217 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.221219 7ff318914700 20  allow all
2015-02-25 11:53:54.221221 7ff318914700 10 mon.a@0(leader) e1 handle_timecheck time_check( pong e 8 r 1 ts 2015-02-25 11:53:54.187582 ) v1
2015-02-25 11:53:54.221225 7ff318914700 10 mon.a@0(leader) e1 handle_timecheck_leader time_check( pong e 8 r 1 ts 2015-02-25 11:53:54.187582 ) v1
2015-02-25 11:53:54.221240 7ff318914700 10 mon.a@0(leader) e1 handle_timecheck_leader from mon.1 127.0.0.1:6790/0 ts 2015-02-25 11:53:54.187582 delta -0.0336463 skew_bound 0 latency 0.108768
2015-02-25 11:53:54.221256 7ff318914700 10 mon.a@0(leader) e1 handle_timecheck_leader got pongs from everybody (3 total)
2015-02-25 11:53:54.221258 7ff318914700 10 mon.a@0(leader) e1 timecheck_finish_round curr 1
2015-02-25 11:53:54.221260 7ff318914700 10 mon.a@0(leader) e1 timecheck_report
2015-02-25 11:53:54.221271 7ff318914700 10 mon.a@0(leader) e1 timecheck_report send report to mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.221274 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.1 127.0.0.1:6790/0 -- time_check( report e 8 r 2 #skews 3 #latencies 3 ) v1 -- ?+0 0x7ff30007a0c0
2015-02-25 11:53:54.221297 7ff318914700 10 mon.a@0(leader) e1 timecheck_report send report to mon.2 127.0.0.1:6791/0
2015-02-25 11:53:54.221303 7ff318914700  1 -- 127.0.0.1:6789/0 --> mon.2 127.0.0.1:6791/0 -- time_check( report e 8 r 2 #skews 3 #latencies 3 ) v1 -- ?+0 0x7ff30007b190
2015-02-25 11:53:54.263269 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.2 127.0.0.1:6791/0 198 ==== paxos(accept lc 61 fc 0 pn 400 opn 0) v3 ==== 80+0+0 (3718585490 0 0) 0x7ff2f80008c0 con 0x423be50
2015-02-25 11:53:54.263299 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.263302 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.2 127.0.0.1:6791/0 is openallow * for mon.2 127.0.0.1:6791/0
2015-02-25 11:53:54.263308 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.263311 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.263314 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.263315 7ff318914700 20  allow all
2015-02-25 11:53:54.263317 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.263318 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.263319 7ff318914700 20  allow all
2015-02-25 11:53:54.263321 7ff318914700 10 mon.a@0(leader).paxos(paxos updating-previous c 1..61) handle_accept paxos(accept lc 61 fc 0 pn 400 opn 0) v3
2015-02-25 11:53:54.263329 7ff318914700 10 mon.a@0(leader).paxos(paxos updating-previous c 1..61)  now 0,2 have accepted
2015-02-25 11:53:54.263343 7ff318914700  1 -- 127.0.0.1:6789/0 <== mon.1 127.0.0.1:6790/0 219 ==== paxos(accept lc 61 fc 0 pn 400 opn 0) v3 ==== 80+0+0 (3889582458 0 0) 0x7ff2fc007780 con 0x4236480
2015-02-25 11:53:54.263351 7ff318914700 20 mon.a@0(leader) e1 have connection
2015-02-25 11:53:54.263352 7ff318914700 20 mon.a@0(leader) e1 ms_dispatch existing session MonSession: mon.1 127.0.0.1:6790/0 is openallow * for mon.1 127.0.0.1:6790/0
2015-02-25 11:53:54.263356 7ff318914700 20 mon.a@0(leader) e1  caps allow *
2015-02-25 11:53:54.263358 7ff318914700 20 is_capable service=mon command= read on cap allow *
2015-02-25 11:53:54.263359 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.263360 7ff318914700 20  allow all
2015-02-25 11:53:54.263362 7ff318914700 20 is_capable service=mon command= exec on cap allow *
2015-02-25 11:53:54.263363 7ff318914700 20  allow so far , doing grant allow *
2015-02-25 11:53:54.263364 7ff318914700 20  allow all
2015-02-25 11:53:54.263364 7ff318914700 10 mon.a@0(leader).paxos(paxos updating-previous c 1..61) handle_accept paxos(accept lc 61 fc 0 pn 400 opn 0) v3
2015-02-25 11:53:54.263369 7ff318914700 10 mon.a@0(leader).paxos(paxos updating-previous c 1..61)  now 0,1,2 have accepted
2015-02-25 11:53:54.263373 7ff318914700 10 mon.a@0(leader).paxos(paxos updating-previous c 1..61)  got majority, committing, done with update
2015-02-25 11:53:54.263375 7ff318914700 10 mon.a@0(leader).paxos(paxos updating-previous c 1..61) commit_start 62
2015-02-25 11:53:54.296667 7ff31a207700 20 mon.a@0(leader).paxos(paxos writing-previous c 1..61) commit_finish 62
2015-02-25 11:53:54.296764 7ff31a207700 10 mon.a@0(leader).paxos(paxos writing-previous c 1..62)  sending commit to mon.1
2015-02-25 11:53:54.296781 7ff31a207700  1 -- 127.0.0.1:6789/0 --> mon.1 127.0.0.1:6790/0 -- paxos(commit lc 62 fc 0 pn 400 opn 0) v3 -- ?+0 0x7ff2f0168080
2015-02-25 11:53:54.296809 7ff31a207700 10 mon.a@0(leader).paxos(paxos writing-previous c 1..62)  sending commit to mon.2
2015-02-25 11:53:54.296816 7ff31a207700  1 -- 127.0.0.1:6789/0 --> mon.2 127.0.0.1:6791/0 -- paxos(commit lc 62 fc 0 pn 400 opn 0) v3 -- ?+0 0x7ff2f0040960
2015-02-25 11:53:54.296841 7ff31a207700 10 mon.a@0(leader) e1 refresh_from_paxos
2015-02-25 11:53:54.296920 7ff31a207700 10 mon.a@0(leader).paxosservice(pgmap 1..16) refresh
2015-02-25 11:53:54.296968 7ff31a207700 10 mon.a@0(leader).paxosservice(mdsmap 1..1) refresh
2015-02-25 11:53:54.297010 7ff31a207700 10 mon.a@0(leader).paxosservice(osdmap 1..10) refresh
2015-02-25 11:53:54.297079 7ff31a207700 10 mon.a@0(leader).paxosservice(logm 1..38) refresh

#5 Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)

#6 Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)

#7 Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)

#8 Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)

#9 Updated by Loïc Dachary about 9 years ago

<jluis> if the tool hangs for more than one of the timeouts (I believe the lowest is 5 seconds), elections will be triggered when it snaps out of it
<jluis> by then it will retry the message once election finishes
<jluis> that doesn't happen solely with the crushtool though; anything that takes more than that time, on the monitor, will end up having the same ill effects
<jluis> one of the reasons why leveldb writes are now performed on a separate thread
<jluis> this could be improved though
<loicd> the problem is that it is quite possible that crushtool will lag for that long
<jluis> for this particular purpose, having the crushtool map stuff be performed asynchronously, having a callback that would finish processing the update once the validation is finished; or retrying the message if we lost quorum
<loicd> yeah
<jluis> we would have to make sure that the callback would check if the state once if finishes validation is the same as the state once it started validation
<jluis> otherwise would have to revalidate
<jluis> e.g., two concurrent requests, as it is now, would not break anything as they'd be serialized
<jluis> if we async the validation, a concurrent request could change the crushmap in-between

#10 Updated by Loïc Dachary about 9 years ago

#11 Updated by Loïc Dachary about 9 years ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF