Bug #11313
Monitors not forming quorum : flapping : Cluster unreachable
Status: Closed
Added by karan singh about 9 years ago. Updated about 9 years ago.
Description
Today I encountered a weird problem with my Ceph cluster, which was working fine before.
I was troubleshooting my PGs which are incomplete, so I tried marking an OSD as lost.
As soon as I marked the OSD as lost (# ceph osd lost 115 --yes-i-really-mean-it), the cluster became unreachable. All the monitors lost quorum and I was not able to connect to the cluster again.
I tried restarting services several times, but no luck.
Below are the logs, which show that the problem started exactly after # ceph osd lost 115:
2015-04-02 13:44:47.280207 7f376cce1700 0 mon.pouta-s01@0(leader) e3 handle_command mon_command({"prefix": "status"} v 0) v1
2015-04-02 13:45:09.118663 7f376d6e2700 0 mon.pouta-s01@0(leader).data_health(1920) update_stats avail 80% total 82932 MB, used 12329 MB, avail 66390 MB
2015-04-02 13:46:12.709808 7f376d6e2700 0 mon.pouta-s01@0(leader).data_health(1920) update_stats avail 80% total 82932 MB, used 12328 MB, avail 66391 MB
2015-04-02 13:47:14.233436 7f376d6e2700 0 mon.pouta-s01@0(leader).data_health(1920) update_stats avail 80% total 82932 MB, used 12327 MB, avail 66392 MB
2015-04-02 13:48:18.824272 7f376d6e2700 0 mon.pouta-s01@0(leader).data_health(1920) update_stats avail 80% total 82932 MB, used 12326 MB, avail 66392 MB
2015-04-02 13:48:19.059478 7f376cce1700 0 mon.pouta-s01@0(leader) e3 handle_command mon_command({"prefix": "osd lost", "sure": "--yes-i-really-mean-it", "id": 115} v 0) v1
2015-04-02 13:48:38.403011 7f376cce1700 0 log [INF] : mon.pouta-s01 calling new monitor election
2015-04-02 13:48:38.406063 7f376cce1700 0 log [INF] : mon.pouta-s01@0 won leader election with quorum 0,1,2
2015-04-02 13:48:49.455605 7f376cce1700 0 log [INF] : monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}
2015-04-02 13:48:49.456310 7f376cce1700 0 log [INF] : pgmap v598458: 18432 pgs: 7 down+incomplete, 18412 active+clean, 13 incomplete; 2338 GB data, 19068 GB used, 853 TB / 871 TB avail; 133/328874 unfound (0.040%)
2015-04-02 13:48:49.456409 7f376cce1700 0 log [INF] : mdsmap e1: 0/0/1 up
2015-04-02 13:48:49.457323 7f376cce1700 0 log [INF] : osdmap e262821: 240 osds: 240 up, 240 in
2015-04-02 13:48:49.665436 7f376cce1700 0 log [INF] : mon.pouta-s01 calling new monitor election
2015-04-02 13:48:49.668347 7f376cce1700 0 log [INF] : mon.pouta-s01@0 won leader election with quorum 0,1,2
2015-04-02 13:49:00.525321 7f376cce1700 0 log [INF] : monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}
2015-04-02 13:49:00.532675 7f376cce1700 0 log [INF] : pgmap v598458: 18432 pgs: 7 down+incomplete, 18412 active+clean, 13 incomplete; 2338 GB data, 19068 GB used, 853 TB / 871 TB avail; 133/328874 unfound (0.040%)
2015-04-02 13:49:00.532980 7f376cce1700 0 log [INF] : mdsmap e1: 0/0/1 up
2015-04-02 13:49:00.538582 7f376cce1700 0 log [INF] : osdmap e262822: 240 osds: 240 up, 240 in
2015-04-02 13:49:00.782187 7f376cce1700 0 log [INF] : mon.pouta-s01 calling new monitor election
2015-04-02 13:49:01.002140 7f376cce1700 0 log [INF] : mon.pouta-s01@0 won leader election with quorum 0,1,2
More logs : http://paste.ubuntu.com/10724004/
You can observe a monitor entering quorum and then losing quorum: http://paste.ubuntu.com/10723915/
Cluster unreachable : debug ms 1 : http://paste.ubuntu.com/10723862/
Monitor logs : debug mon 10 : http://paste.ubuntu.com/10723840/
I tried upgrading to 0.80.9 as well, but no luck.
This is a production cluster and needs to be fixed really soon (before the Easter holidays); any help from the development team would be really appreciated.
Updated by Joao Eduardo Luis about 9 years ago
Looks like the leader keeps on calling elections after not getting acks for paxos leases.
This may be a case of overloaded monitors, clock skews or something totally different.
Logs with 'debug paxos = 10' would help.
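For reference, the extra logging could be enabled with something like the following in the [mon] section of ceph.conf on each monitor (a sketch; option names as used in this release line, with a monitor restart assumed):

```ini
[mon]
    debug mon = 10
    debug paxos = 10
    debug ms = 1
```

If the admin socket is still responsive, the same settings can usually be applied without a restart via ceph daemon mon.&lt;id&gt; config set debug_paxos 10 (treat that exact invocation as an assumption for your version).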
Updated by Joao Eduardo Luis about 9 years ago
Also, logs from the other monitors would definitely help too.
Updated by karan singh about 9 years ago
Hi
1# The monitors are not overloaded.
2# NTP is set up on all the monitors, and to be on the safe side, mon clock drift allowed = 10 has been set just for troubleshooting.
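To double-check the NTP side, something along these lines can be run from an admin node (hostnames illustrative; assumes ssh access and ntpq installed on the monitor hosts):

```shell
# Show each monitor's NTP peers and current offsets (in ms);
# large or growing offsets would point at real clock skew
for h in pouta-s01 pouta-s02 pouta-s03; do
    echo "== $h =="
    ssh "$h" ntpq -p
done
```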
Logs with debug paxos = 10
Monitor 1 : http://paste.ubuntu.com/10726112/
Monitor 2 : http://paste.ubuntu.com/10726096/
Monitor 3 : http://paste.ubuntu.com/10726070/
I have also tried restarting all the monitor machines and checking quorum_status; the monitors still form quorum for a moment and then drop out of quorum.
[root@XXXXX-s02 ceph]# ceph daemon mon.XXXXX-s02 mon_status
{ "name": "XXXXX-s02", "rank": 1, "state": "electing", "election_epoch": 3227, "quorum": [], "outside_quorum": [], "extra_probe_peers": [], "sync_provider": [], "monmap": { "epoch": 3, "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33", "modified": "2014-03-09 05:34:22.667665", "created": "2014-03-09 05:33:48.846437", "mons": [ { "rank": 0, "name": "XXXXX-s01", "addr": "10.100.50.1:6789\/0"}, { "rank": 1, "name": "XXXXX-s02", "addr": "10.100.50.2:6789\/0"}, { "rank": 2, "name": "XXXXX-s03", "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s02 ceph]# ceph daemon mon.XXXXX-s02 mon_status
{ "name": "XXXXX-s02", "rank": 1, "state": "peon", "election_epoch": 3232, "quorum": [ 0, 1, 2], "outside_quorum": [], "extra_probe_peers": [], "sync_provider": [], "monmap": { "epoch": 3, "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33", "modified": "2014-03-09 05:34:22.667665", "created": "2014-03-09 05:33:48.846437", "mons": [ { "rank": 0, "name": "XXXXX-s01", "addr": "10.100.50.1:6789\/0"}, { "rank": 1, "name": "XXXXX-s02", "addr": "10.100.50.2:6789\/0"}, { "rank": 2, "name": "XXXXX-s03", "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s02 ceph]# ceph daemon mon.XXXXX-s02 mon_status
{ "name": "XXXXX-s02", "rank": 1, "state": "electing", "election_epoch": 3233, "quorum": [], "outside_quorum": [], "extra_probe_peers": [], "sync_provider": [], "monmap": { "epoch": 3, "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33", "modified": "2014-03-09 05:34:22.667665", "created": "2014-03-09 05:33:48.846437", "mons": [ { "rank": 0, "name": "XXXXX-s01", "addr": "10.100.50.1:6789\/0"}, { "rank": 1, "name": "XXXXX-s02", "addr": "10.100.50.2:6789\/0"}, { "rank": 2, "name": "XXXXX-s03", "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s01 ceph]# ceph daemon mon.XXXXX-s01 mon_status
{ "name": "XXXXX-s01", "rank": 0, "state": "leader", "election_epoch": 3252, "quorum": [ 0, 1, 2], "outside_quorum": [], "extra_probe_peers": [], "sync_provider": [], "monmap": { "epoch": 3, "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33", "modified": "2014-03-09 05:34:22.667665", "created": "2014-03-09 05:33:48.846437", "mons": [ { "rank": 0, "name": "XXXXX-s01", "addr": "10.100.50.1:6789\/0"}, { "rank": 1, "name": "XXXXX-s02", "addr": "10.100.50.2:6789\/0"}, { "rank": 2, "name": "XXXXX-s03", "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s01 ceph]# ceph daemon mon.XXXXX-s01 mon_status
{ "name": "XXXXX-s01", "rank": 0, "state": "electing", "election_epoch": 3255, "quorum": [], "outside_quorum": [], "extra_probe_peers": [], "sync_provider": [], "monmap": { "epoch": 3, "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33", "modified": "2014-03-09 05:34:22.667665", "created": "2014-03-09 05:33:48.846437", "mons": [ { "rank": 0, "name": "XXXXX-s01", "addr": "10.100.50.1:6789\/0"}, { "rank": 1, "name": "XXXXX-s02", "addr": "10.100.50.2:6789\/0"}, { "rank": 2, "name": "XXXXX-s03", "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s01 ceph]# ceph daemon mon.XXXXX-s01 mon_status
{ "name": "XXXXX-s01", "rank": 0, "state": "probing", "election_epoch": 3258, "quorum": [], "outside_quorum": [ "XXXXX-s01"], "extra_probe_peers": [], "sync_provider": [], "monmap": { "epoch": 3, "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33", "modified": "2014-03-09 05:34:22.667665", "created": "2014-03-09 05:33:48.846437", "mons": [ { "rank": 0, "name": "XXXXX-s01", "addr": "10.100.50.1:6789\/0"}, { "rank": 1, "name": "XXXXX-s02", "addr": "10.100.50.2:6789\/0"}, { "rank": 2, "name": "XXXXX-s03", "addr": "10.100.50.3:6789\/0"}]}}
Updated by karan singh about 9 years ago
It would be great if you could extend your help over IRC (#CEPH) [ksingh].
Updated by Joao Eduardo Luis about 9 years ago
I can't really troubleshoot this on IRC now, but you should consider that increasing the allowed clock drift value will do nothing but mask any issues you may have. That option allows the monitor to report a warning IF there is a clock drift that is considered harmful; increasing it does not make the monitor any more tolerant of clock drift.
In any case, as I suspected, the culprit seems to be the paxos lease_timeout being triggered:
2015-04-02 22:51:24.331736 7f747f4df700 1 mon.pouta-s03@2(peon).paxos(paxos active c 1474627..1475338) lease_timeout -- calling new election
You probably want to try increasing 'mon_lease' and 'mon_lease_ack_timeout' instead of 'mon_clock_drift_allowed'. :)
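In ceph.conf terms, that suggestion would look something like this in the [mon] section (the values are illustrative starting points, not recommendations; the stock defaults are lower):

```ini
[mon]
    # extend the paxos lease, and how long the leader waits for lease acks
    mon lease = 20
    mon lease ack timeout = 20
```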
Updated by karan singh about 9 years ago
Thanks Joao
After adding
mon lease = 20
mon lease ack timeout = 20
At least the monitors are now forming quorum and things seem to be working. However, the cluster seems to respond slowly to ceph commands.
Ceph commands like ceph -s are taking 40-50 seconds to complete.
Also, some of the OSDs are down, and when I try to bring them up they throw timeout errors because the cluster is responding so slowly.
[root@pouta-s04 ~]# service ceph start
=== osd.95 ===
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.95 --keyring=/var/lib/ceph/osd/ceph-95/keyring osd crush create-or-move -- 95 3.63 host=pouta-s04 root=default'
[root@pouta-s04 ~]#
[root@pouta-s04 ~]# time ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 503 pgs degraded; 7 pgs down; 19 pgs incomplete; 3013 pgs peering; 5754 pgs stale; 3032 pgs stuck inactive; 3213 pgs stuck stale; 3535 pgs stuck unclean; 8 requests are blocked > 32 sec; recovery 11577/986841 objects degraded (1.173%); 339/328947 unfound (0.103%); 120/240 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
     monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}, election epoch 3938, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
     osdmap e262854: 240 osds: 120 up, 240 in
      pgmap v598497: 18432 pgs, 13 pools, 2339 GB data, 321 kobjects
            19054 GB used, 853 TB / 871 TB avail
            11577/986841 objects degraded (1.173%); 339/328947 unfound (0.103%)
                  42 stale+peering
                5708 stale+active+clean
                 503 active+degraded
                   3 down+incomplete
                2970 peering
                   1 stale+incomplete
                9189 active+clean
                   1 down+peering
                  12 incomplete
                   3 stale+down+incomplete

real    0m50.211s
user    0m0.293s
sys     0m0.056s
[root@pouta-s04 ~]#
Is there anything I can set in ceph.conf to make the cluster respond as quickly as it did before this problem appeared, so that the down OSDs can be started?
Since you are the expert, could you please tell me what made the monitors throw the "paxos(paxos active c 1474627..1475338) lease_timeout -- calling new election"
messages? I have not changed anything on the monitors, and this problem appeared only after marking an OSD lost.
The default values of lease and lease_ack_timeout should work, since they did before this problem appeared.
Thanks a ton for your help.
Updated by karan singh about 9 years ago
Hi
Sorry, I just checked again and the quorum-loss problem is still there. Here are the logs after adding
mon lease = 30
mon lease ack timeout = 20
Monitor1 : http://paste.ubuntu.com/10727076/
Monitor2 : http://paste.ubuntu.com/10727084/
It takes 1m 35s to get responses to ceph commands.
[root@pouta-s04 ~]# time ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 2480 pgs degraded; 8 pgs down; 18 pgs incomplete; 2728 pgs peering; 6 pgs recovering; 1 pgs recovery_wait; 8909 pgs stale; 2754 pgs stuck inactive; 8909 pgs stuck stale; 5367 pgs stuck unclean; 9 requests are blocked > 32 sec; recovery 78400/963132 objects degraded (8.140%); 300/321044 unfound (0.093%); 64/152 in osds are down; nodown,noout,norecover flag(s) set; clock skew detected on mon.pouta-s02
     monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}, election epoch 4052, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
     osdmap e262887: 240 osds: 88 up, 152 in
            flags nodown,noout,norecover
      pgmap v598538: 18432 pgs, 13 pools, 2282 GB data, 313 kobjects
            12405 GB used, 539 TB / 552 TB avail
            78400/963132 objects degraded (8.140%); 300/321044 unfound (0.093%)
                   3 active+recovering+remapped
                   8 inactive
                8878 stale+active+clean
                   2 down+incomplete
                2650 peering
                   7 active
                 154 active+degraded+remapped
                  12 stale+incomplete
                4187 active+clean
                2322 active+degraded
                   1 active+recovery_wait+degraded
                  61 remapped+peering
                 123 active+remapped
                   2 down+peering
                   4 stale+down+incomplete
                  15 stale+peering
                   3 active+recovering+degraded

real    1m35.404s
user    0m0.344s
sys     0m0.083s
[root@pouta-s04 ~]#
Updated by Sage Weil about 9 years ago
- Priority changed from Immediate to Urgent
- You'll need to clear nodown to tell whether PGs aren't peering because of OSDs that are not running.
- Run 'perf top' on a mon. If you see lots of time spent in Snappy decompression, disable leveldb compression and force a compaction of the mon store (ceph-mon --compact ...).
- Otherwise, we need mon logs to see why the mons are slow to respond.
- Run ceph pg $pgid query on the down or incomplete PGs to see why. Usually ceph osd pool set $pool min_size 1 is all that is needed to let them recover (be sure to set it back to 2 when the cluster is healthy again).
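The checklist above could look roughly like this on the command line (a sketch only: $pgid and $pool are placeholders, the mon id is illustrative, and ceph tell mon.&lt;id&gt; compact is one way to trigger the store compaction; all of this assumes a reachable cluster):

```shell
# 1. Let dead OSDs actually be marked down so peering can make progress
ceph osd unset nodown

# 2. On a monitor host, watch where CPU time goes while commands are slow
perf top

# 3. If Snappy decompression dominates, compact the monitor store
#    (online variant; the mon id is illustrative)
ceph tell mon.pouta-s01 compact

# 4. Inspect a stuck PG, then temporarily relax min_size if appropriate
ceph pg $pgid query
ceph osd pool set $pool min_size 1
# ...and restore it once the cluster is healthy again
ceph osd pool set $pool min_size 2
```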
Updated by karan singh about 9 years ago
Hi Sage
The incomplete PGs are the original issue, and I was troubleshooting them. I intentionally marked an OSD as lost, and then suddenly, BOOM, all the monitors fell out of quorum.
First I need to fix the monitor quorum / slow response problem; then I can concentrate on the incomplete PG issue.
The monitors are taking a hell of a long time, 110 seconds, to respond to ceph -s.
[root@pouta-s04 ~]# time ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 3233 pgs degraded; 12 pgs down; 10 pgs incomplete; 5334 pgs peering; 73 pgs recovering; 574 pgs recovery_wait; 4518 pgs stale; 6055 pgs stuck inactive; 4518 pgs stuck stale; 10070 pgs stuck unclean; 12 requests are blocked > 32 sec; recovery 152451/930087 objects degraded (16.391%); 337/310029 unfound (0.109%); 47/177 in osds are down; nodown,noout,norecover flag(s) set; 1 mons down, quorum 0,2 pouta-s01,pouta-s03
     monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}, election epoch 4164, quorum 0,2 pouta-s01,pouta-s03
     osdmap e262984: 240 osds: 130 up, 177 in
            flags nodown,noout,norecover
      pgmap v599316: 19456 pgs, 14 pools, 2202 GB data, 302 kobjects
            14558 GB used, 628 TB / 642 TB avail
            152451/930087 objects degraded (16.391%); 337/310029 unfound (0.109%)
                 108 inactive
                 535 creating
                  20 active
                 162 active+recovery_wait+remapped
                4878 active+clean
                   3 stale+incomplete
                5134 peering
                   6 active+recovering+remapped
                 136 active+recovery_wait+degraded
                 184 active+degraded+remapped
                   2 incomplete
                  45 active+recovering+degraded
                   2 stale+down+incomplete
                   7 down+peering
                  68 remapped
                 202 active+recovery_wait
                 379 active+remapped
                  13 active+recovering
                 188 remapped+peering
                   3 down+incomplete
                2785 active+degraded
                  74 active+recovery_wait+degraded+remapped
                4508 stale+active+clean
                   5 stale+peering
                   9 active+recovering+degraded+remapped

real    1m46.120s
user    0m0.388s
sys     0m0.102s
[root@pouta-s04 ~]#

Yesterday I added the entries below to ceph.conf; the monitors started to form quorum but keep responding very slowly.
mon lease = 50
mon lease renew interval = 30
mon lease ack timeout = 100
Below are logs (debug mon / paxos 10) showing that the monitor reaches the leader state, then goes probing --> electing, and after a while repeats the cycle.
Full logs here : https://www.dropbox.com/s/fjtun72d66n46gn/ceph-mon.pouta-s01.log?dl=0
Logs extracted below.
2015-04-07 16:57:15.363270 7f339ca33700 10 mon.pouta-s01@0(leader).paxosservice(osdmap 254763..262972) dispatch osd_failure(failed osd.235 10.100.50.3:6880/1051330 for 1sec e262972 v262972) v3 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363282 7f339ca33700 5 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) is_readable now=2015-04-07 16:57:15.363283 lease_expire=2015-04-07 16:57:34.723796 has v0 lc 1477789
2015-04-07 16:57:15.363292 7f339ca33700 10 mon.pouta-s01@0(leader).osd e262972 preprocess_query osd_failure(failed osd.235 10.100.50.3:6880/1051330 for 1sec e262972 v262972) v3 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363304 7f339ca33700 5 mon.pouta-s01@0(leader).osd e262972 can_mark_down NODOWN flag set, will not mark osd.235 down
2015-04-07 16:57:15.363307 7f339ca33700 5 mon.pouta-s01@0(leader).osd e262972 preprocess_failure ignoring report of osd.235 10.100.50.3:6880/1051330 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363368 7f339ca33700 10 mon.pouta-s01@0(leader).paxosservice(osdmap 254763..262972) dispatch osd_failure(failed osd.237 10.100.50.3:6895/1053322 for 1sec e262972 v262972) v3 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363379 7f339ca33700 5 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) is_readable now=2015-04-07 16:57:15.363380 lease_expire=2015-04-07 16:57:34.723796 has v0 lc 1477789
2015-04-07 16:57:15.363389 7f339ca33700 10 mon.pouta-s01@0(leader).osd e262972 preprocess_query osd_failure(failed osd.237 10.100.50.3:6895/1053322 for 1sec e262972 v262972) v3 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363426 7f339ca33700 5 mon.pouta-s01@0(leader).osd e262972 can_mark_down NODOWN flag set, will not mark osd.237 down
2015-04-07 16:57:15.363429 7f339ca33700 5 mon.pouta-s01@0(leader).osd e262972 preprocess_failure ignoring report of osd.237 10.100.50.3:6895/1053322 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363441 7f339ca33700 10 mon.pouta-s01@0(leader).paxosservice(pgmap 598691..599303) dispatch pg_stats(0 pgs tid 2037 v 0) v1 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363449 7f339ca33700 5 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) is_readable now=2015-04-07 16:57:15.363450 lease_expire=2015-04-07 16:57:34.723796 has v0 lc 1477789
2015-04-07 16:57:15.363459 7f339ca33700 10 mon.pouta-s01@0(leader).pg v599303 preprocess_query pg_stats(0 pgs tid 2037 v 0) v1 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363468 7f339ca33700 5 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) is_readable now=2015-04-07 16:57:15.363469 lease_expire=2015-04-07 16:57:34.723796 has v0 lc 1477789
2015-04-07 16:57:15.363477 7f339ca33700 10 mon.pouta-s01@0(leader).pg v599303 prepare_update pg_stats(0 pgs tid 2037 v 0) v1 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363484 7f339ca33700 10 mon.pouta-s01@0(leader).pg v599303 prepare_pg_stats pg_stats(0 pgs tid 2037 v 0) v1 from osd.69
2015-04-07 16:57:15.363499 7f339ca33700 10 mon.pouta-s01@0(leader).pg v599303 message contains no new osd|pg stats
2015-04-07 16:57:15.363522 7f339ca33700 10 mon.pouta-s01@0(leader) e3 handle_subscribe mon_subscribe({monmap=4+,osd_pg_creates=0}) v2
2015-04-07 16:57:15.363532 7f339ca33700 10 mon.pouta-s01@0(leader) e3 check_sub monmap next 4 have 3
2015-04-07 16:57:15.380448 7f339e495700 1 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) accept timeout, calling fresh election
2015-04-07 16:57:15.380483 7f339e495700 10 mon.pouta-s01@0(leader) e3 bootstrap
2015-04-07 16:57:15.380488 7f339e495700 10 mon.pouta-s01@0(leader) e3 sync_reset_requester
2015-04-07 16:57:15.380491 7f339e495700 10 mon.pouta-s01@0(leader) e3 unregister_cluster_logger
2015-04-07 16:57:15.380499 7f339e495700 10 mon.pouta-s01@0(leader) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.380503 7f339e495700 10 mon.pouta-s01@0(probing) e3 _reset
2015-04-07 16:57:15.380506 7f339e495700 10 mon.pouta-s01@0(probing) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.380508 7f339e495700 10 mon.pouta-s01@0(probing) e3 timecheck_finish
2015-04-07 16:57:15.380518 7f339e495700 10 mon.pouta-s01@0(probing) e3 scrub_reset
2015-04-07 16:57:15.380521 7f339e495700 10 mon.pouta-s01@0(probing).paxos(paxos updating c 1477137..1477789) restart -- canceling timeouts
2015-04-07 16:57:15.380566 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(pgmap 598691..599303) restart
2015-04-07 16:57:15.380589 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(mdsmap 1..1) restart
2015-04-07 16:57:15.380593 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(osdmap 254763..262972) restart
2015-04-07 16:57:15.380595 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(logm 599085..599743) restart
2015-04-07 16:57:15.380601 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380637 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380651 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380662 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380680 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380713 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380724 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380733 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380744 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.2 10.100.50.3:6789/0
2015-04-07 16:57:15.380773 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380785 7f339e495700 7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380796 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(monmap 1..3) restart
2015-04-07 16:57:15.380800 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(auth 4501..4587) restart
2015-04-07 16:57:15.380807 7f339e495700 10 mon.pouta-s01@0(probing) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.380812 7f339e495700 10 mon.pouta-s01@0(probing) e3 reset_probe_timeout 0x367d380 after 2 seconds
2015-04-07 16:57:15.380827 7f339e495700 10 mon.pouta-s01@0(probing) e3 probing other monitors
2015-04-07 16:57:15.381949 7f339ca33700 10 mon.pouta-s01@0(probing) e3 handle_probe mon_probe(reply 2bd3283d-67ef-4316-8b7e-d8f4747eae33 name pouta-s03 quorum 0,1,2 paxos( fc 1477137 lc 1477789 )) v6
2015-04-07 16:57:15.381985 7f339ca33700 10 mon.pouta-s01@0(probing) e3 handle_probe_reply mon.2 10.100.50.3:6789/0mon_probe(reply 2bd3283d-67ef-4316-8b7e-d8f4747eae33 name pouta-s03 quorum 0,1,2 paxos( fc 1477137 lc 1477789 )) v6
2015-04-07 16:57:15.382003 7f339ca33700 10 mon.pouta-s01@0(probing) e3 monmap is e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}
2015-04-07 16:57:15.382089 7f339ca33700 10 mon.pouta-s01@0(probing) e3 peer name is pouta-s03
2015-04-07 16:57:15.382094 7f339ca33700 10 mon.pouta-s01@0(probing) e3 existing quorum 0,1,2
2015-04-07 16:57:15.382097 7f339ca33700 10 mon.pouta-s01@0(probing) e3 peer paxos version 1477789 vs my version 1477789 (ok)
2015-04-07 16:57:15.382103 7f339ca33700 10 mon.pouta-s01@0(probing) e3 start_election
2015-04-07 16:57:15.382106 7f339ca33700 10 mon.pouta-s01@0(electing) e3 _reset
2015-04-07 16:57:15.382109 7f339ca33700 10 mon.pouta-s01@0(electing) e3 cancel_probe_timeout 0x367d380
2015-04-07 16:57:15.382115 7f339ca33700 10 mon.pouta-s01@0(electing) e3 timecheck_finish
2015-04-07 16:57:15.382120 7f339ca33700 10 mon.pouta-s01@0(electing) e3 scrub_reset
2015-04-07 16:57:15.382123 7f339ca33700 10 mon.pouta-s01@0(electing).paxos(paxos recovering c 1477137..1477789) restart -- canceling timeouts
2015-04-07 16:57:15.382130 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(pgmap 598691..599303) restart
2015-04-07 16:57:15.382135 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(mdsmap 1..1) restart
2015-04-07 16:57:15.382139 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(osdmap 254763..262972) restart
2015-04-07 16:57:15.382149 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(logm 599085..599743) restart
2015-04-07 16:57:15.382152 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(monmap 1..3) restart
2015-04-07 16:57:15.382154 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(auth 4501..4587) restart
2015-04-07 16:57:15.382160 7f339ca33700 10 mon.pouta-s01@0(electing) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.382171 7f339ca33700 0 log [INF] : mon.pouta-s01 calling new monitor election
2015-04-07 16:57:15.382185 7f339ca33700 5 mon.pouta-s01@0(electing).elector(4146) start -- can i be leader?
2015-04-07 16:57:15.382363 7f339ca33700 1 mon.pouta-s01@0(electing).elector(4146) init, last seen epoch 4146
2015-04-07 16:57:15.382371 7f339ca33700 10 mon.pouta-s01@0(electing).elector(4146) bump_epoch 4146 to 4147
2015-04-07 16:57:15.384198 7f339ca33700 10 mon.pouta-s01@0(electing) e3 join_election
2015-04-07 16:57:15.384217 7f339ca33700 10 mon.pouta-s01@0(electing) e3 _reset
2015-04-07 16:57:15.384221 7f339ca33700 10 mon.pouta-s01@0(electing) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.384224 7f339ca33700 10 mon.pouta-s01@0(electing) e3 timecheck_finish
2015-04-07 16:57:15.384226 7f339ca33700 10 mon.pouta-s01@0(electing) e3 scrub_reset
2015-04-07 16:57:15.384229 7f339ca33700 10 mon.pouta-s01@0(electing).paxos(paxos recovering c 1477137..1477789) restart -- canceling timeouts
2015-04-07 16:57:15.384237 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(pgmap 598691..599303) restart
2015-04-07 16:57:15.384241 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(mdsmap 1..1) restart
2015-04-07 16:57:15.384244 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(osdmap 254763..262972) restart
2015-04-07 16:57:15.384246 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(logm 599085..599743) restart
2015-04-07 16:57:15.384248 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(monmap 1..3) restart
2015-04-07 16:57:15.384250 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(auth 4501..4587) restart
2015-04-07 16:57:15.384316 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(logm 599085..599743) dispatch log(1 entries) v1 from mon.0 10.100.50.1:6789/0
When I leave the cluster untouched for a few hours, the monitors respond almost instantly (ceph -s output in under 1 sec); then, as soon as I start OSDs, the cluster starts responding dead slow and the monitors again cycle through probing --> electing --> leader.
Updated by Sage Weil about 9 years ago
- Status changed from New to Can't reproduce
2015-04-07 16:57:15.380448 7f339e495700 1 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) accept timeout, calling fresh election
is the key line. One of the peons is taking too long to commit things. Check 'perf top' on the slow peon and see why it isn't responding quickly enough. In the past I've seen this triggered by Snappy decompression in leveldb...
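If Snappy decompression does show up in 'perf top', the leveldb-related knobs implied here would look roughly like this in ceph.conf (a sketch; option names as used in this release line, so verify them against your version before relying on them):

```ini
[mon]
    # stop leveldb from compressing the mon store with Snappy
    leveldb compression = false
    # compact the store each time the monitor starts
    mon compact on start = true
```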