Bug #11313 (closed)

Monitors not forming quorum: flapping: cluster unreachable

Added by karan singh about 9 years ago. Updated about 9 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source: Q/A
Tags: monitor down, cluster unreachable
Backport:
Regression:
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Today I encountered a weird problem with my Ceph cluster, which was working fine before.

I was troubleshooting my PGs, which are incomplete, so I tried marking an OSD as lost.

As soon as I marked the OSD as lost with # ceph osd lost 115 --yes-i-really-mean-it, the cluster became unreachable. All the monitors lost quorum and I was not able to connect to the cluster again.

I tried restarting the services several times, but no luck.

Below are the logs showing that the problem started exactly after # ceph osd lost 115:

2015-04-02 13:44:47.280207 7f376cce1700  0 mon.pouta-s01@0(leader) e3 handle_command mon_command({"prefix": "status"} v 0) v1
2015-04-02 13:45:09.118663 7f376d6e2700  0 mon.pouta-s01@0(leader).data_health(1920) update_stats avail 80% total 82932 MB, used 12329 MB, avail 66390 MB
2015-04-02 13:46:12.709808 7f376d6e2700  0 mon.pouta-s01@0(leader).data_health(1920) update_stats avail 80% total 82932 MB, used 12328 MB, avail 66391 MB
2015-04-02 13:47:14.233436 7f376d6e2700  0 mon.pouta-s01@0(leader).data_health(1920) update_stats avail 80% total 82932 MB, used 12327 MB, avail 66392 MB
2015-04-02 13:48:18.824272 7f376d6e2700  0 mon.pouta-s01@0(leader).data_health(1920) update_stats avail 80% total 82932 MB, used 12326 MB, avail 66392 MB
2015-04-02 13:48:19.059478 7f376cce1700  0 mon.pouta-s01@0(leader) e3 handle_command mon_command({"prefix": "osd lost", "sure": "--yes-i-really-mean-it", "id": 115} v 0) v1
2015-04-02 13:48:38.403011 7f376cce1700  0 log [INF] : mon.pouta-s01 calling new monitor election
2015-04-02 13:48:38.406063 7f376cce1700  0 log [INF] : mon.pouta-s01@0 won leader election with quorum 0,1,2
2015-04-02 13:48:49.455605 7f376cce1700  0 log [INF] : monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}
2015-04-02 13:48:49.456310 7f376cce1700  0 log [INF] : pgmap v598458: 18432 pgs: 7 down+incomplete, 18412 active+clean, 13 incomplete; 2338 GB data, 19068 GB used, 853 TB / 871 TB avail; 133/328874 unfound (0.040%)
2015-04-02 13:48:49.456409 7f376cce1700  0 log [INF] : mdsmap e1: 0/0/1 up
2015-04-02 13:48:49.457323 7f376cce1700  0 log [INF] : osdmap e262821: 240 osds: 240 up, 240 in
2015-04-02 13:48:49.665436 7f376cce1700  0 log [INF] : mon.pouta-s01 calling new monitor election
2015-04-02 13:48:49.668347 7f376cce1700  0 log [INF] : mon.pouta-s01@0 won leader election with quorum 0,1,2
2015-04-02 13:49:00.525321 7f376cce1700  0 log [INF] : monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}
2015-04-02 13:49:00.532675 7f376cce1700  0 log [INF] : pgmap v598458: 18432 pgs: 7 down+incomplete, 18412 active+clean, 13 incomplete; 2338 GB data, 19068 GB used, 853 TB / 871 TB avail; 133/328874 unfound (0.040%)
2015-04-02 13:49:00.532980 7f376cce1700  0 log [INF] : mdsmap e1: 0/0/1 up
2015-04-02 13:49:00.538582 7f376cce1700  0 log [INF] : osdmap e262822: 240 osds: 240 up, 240 in
2015-04-02 13:49:00.782187 7f376cce1700  0 log [INF] : mon.pouta-s01 calling new monitor election
2015-04-02 13:49:01.002140 7f376cce1700  0 log [INF] : mon.pouta-s01@0 won leader election with quorum 0,1,2

More logs: http://paste.ubuntu.com/10724004/

You can observe the monitors entering quorum and then losing quorum: http://paste.ubuntu.com/10723915/

Cluster unreachable (debug ms 1): http://paste.ubuntu.com/10723862/

Monitor logs (debug mon 10): http://paste.ubuntu.com/10723840/

I tried upgrading to 0.80.9 as well, but no luck.

This is a production cluster and needs to be fixed really soon (before the Easter holidays); any help from the development team would be really appreciated.

Actions #1

Updated by Joao Eduardo Luis about 9 years ago

Looks like the leader keeps on calling elections after not getting acks for paxos leases.

This may be a case of overloaded monitors, clock skews or something totally different.

Logs with 'debug paxos = 10' would help.
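A quick sketch of how those debug levels can be raised, either persistently in ceph.conf or at runtime; treat the exact form as illustrative:

# in ceph.conf on each monitor host, then restart the mons
[mon]
    debug mon = 10
    debug paxos = 10

# or injected at runtime while the monitors are reachable
ceph tell mon.* injectargs '--debug-mon 10 --debug-paxos 10'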

Actions #2

Updated by Joao Eduardo Luis about 9 years ago

Also, logs from the other monitors would definitely help too.

Actions #3

Updated by karan singh about 9 years ago

Hi

1# The monitors are not overloaded.

2# NTP is set up on all the monitors, and to be on the safe side mon clock drift allowed = 10 has been set, just for troubleshooting.

Logs with debug paxos = 10:

Monitor 1 : http://paste.ubuntu.com/10726112/
Monitor 2 : http://paste.ubuntu.com/10726096/
Monitor 3 : http://paste.ubuntu.com/10726070/

I have also tried restarting all the monitor machines and checked quorum_status; the monitors still form quorum for a moment and then drop out of it.

[root@XXXXX-s02 ceph]#  ceph daemon mon.XXXXX-s02 mon_status
{ "name": "XXXXX-s02",
  "rank": 1,
  "state": "electing",
  "election_epoch": 3227,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 3,
      "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33",
      "modified": "2014-03-09 05:34:22.667665",
      "created": "2014-03-09 05:33:48.846437",
      "mons": [
            { "rank": 0,
              "name": "XXXXX-s01",
              "addr": "10.100.50.1:6789\/0"},
            { "rank": 1,
              "name": "XXXXX-s02",
              "addr": "10.100.50.2:6789\/0"},
            { "rank": 2,
              "name": "XXXXX-s03",
              "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s02 ceph]#
[root@XXXXX-s02 ceph]#
[root@XXXXX-s02 ceph]#  ceph daemon mon.XXXXX-s02 mon_status
{ "name": "XXXXX-s02",
  "rank": 1,
  "state": "peon",
  "election_epoch": 3232,
  "quorum": [
        0,
        1,
        2],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 3,
      "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33",
      "modified": "2014-03-09 05:34:22.667665",
      "created": "2014-03-09 05:33:48.846437",
      "mons": [
            { "rank": 0,
              "name": "XXXXX-s01",
              "addr": "10.100.50.1:6789\/0"},
            { "rank": 1,
              "name": "XXXXX-s02",
              "addr": "10.100.50.2:6789\/0"},
            { "rank": 2,
              "name": "XXXXX-s03",
              "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s02 ceph]#
[root@XXXXX-s02 ceph]#
[root@XXXXX-s02 ceph]#  ceph daemon mon.XXXXX-s02 mon_status
{ "name": "XXXXX-s02",
  "rank": 1,
  "state": "electing",
  "election_epoch": 3233,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 3,
      "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33",
      "modified": "2014-03-09 05:34:22.667665",
      "created": "2014-03-09 05:33:48.846437",
      "mons": [
            { "rank": 0,
              "name": "XXXXX-s01",
              "addr": "10.100.50.1:6789\/0"},
            { "rank": 1,
              "name": "XXXXX-s02",
              "addr": "10.100.50.2:6789\/0"},
            { "rank": 2,
              "name": "XXXXX-s03",
              "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s02 ceph]#

[root@XXXXX-s01 ceph]# ceph daemon mon.XXXXX-s01 mon_status
{ "name": "XXXXX-s01",
  "rank": 0,
  "state": "leader",
  "election_epoch": 3252,
  "quorum": [
        0,
        1,
        2],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 3,
      "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33",
      "modified": "2014-03-09 05:34:22.667665",
      "created": "2014-03-09 05:33:48.846437",
      "mons": [
            { "rank": 0,
              "name": "XXXXX-s01",
              "addr": "10.100.50.1:6789\/0"},
            { "rank": 1,
              "name": "XXXXX-s02",
              "addr": "10.100.50.2:6789\/0"},
            { "rank": 2,
              "name": "XXXXX-s03",
              "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s01 ceph]# ceph daemon mon.XXXXX-s01 mon_status
{ "name": "XXXXX-s01",
  "rank": 0,
  "state": "electing",
  "election_epoch": 3255,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 3,
      "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33",
      "modified": "2014-03-09 05:34:22.667665",
      "created": "2014-03-09 05:33:48.846437",
      "mons": [
            { "rank": 0,
              "name": "XXXXX-s01",
              "addr": "10.100.50.1:6789\/0"},
            { "rank": 1,
              "name": "XXXXX-s02",
              "addr": "10.100.50.2:6789\/0"},
            { "rank": 2,
              "name": "XXXXX-s03",
              "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s01 ceph]# ceph daemon mon.XXXXX-s01 mon_status
{ "name": "XXXXX-s01",
  "rank": 0,
  "state": "probing",
  "election_epoch": 3258,
  "quorum": [],
  "outside_quorum": [
        "XXXXX-s01"],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 3,
      "fsid": "2bd3283d-67ef-4316-8b7e-d8f4747eae33",
      "modified": "2014-03-09 05:34:22.667665",
      "created": "2014-03-09 05:33:48.846437",
      "mons": [
            { "rank": 0,
              "name": "XXXXX-s01",
              "addr": "10.100.50.1:6789\/0"},
            { "rank": 1,
              "name": "XXXXX-s02",
              "addr": "10.100.50.2:6789\/0"},
            { "rank": 2,
              "name": "XXXXX-s03",
              "addr": "10.100.50.3:6789\/0"}]}}

[root@XXXXX-s01 ceph]#

Actions #4

Updated by karan singh about 9 years ago

It would be great if you could extend your help over IRC (#CEPH) [ksingh].

Actions #5

Updated by Joao Eduardo Luis about 9 years ago

I can't really troubleshoot this on IRC now, but you should consider that increasing the allowed clock drift value will do nothing but mask any issues you may have. That option allows the monitor to report a warning IF there is a clock drift that is considered harmful; increasing it does not make the monitor any more tolerant of clock drift.

In any case, as I suspected, the culprit seems to be the paxos lease_timeout being triggered:

2015-04-02 22:51:24.331736 7f747f4df700  1 mon.pouta-s03@2(peon).paxos(paxos active c 1474627..1475338) lease_timeout -- calling new election

You probably want to try increasing 'mon_lease' and 'mon_lease_ack_timeout' instead of 'mon_clock_drift_allowed'. :)
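A minimal sketch of where those options would go, assuming they belong in the [mon] section of ceph.conf on each monitor host followed by a monitor restart; the values shown are only illustrative:

[mon]
    mon lease = 20
    mon lease ack timeout = 20

With a quorum available they can also be injected at runtime, e.g. ceph tell mon.* injectargs '--mon-lease 20 --mon-lease-ack-timeout 20', though with flapping monitors editing ceph.conf and restarting is the more reliable route.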

Actions #6

Updated by karan singh about 9 years ago

Thanks Joao

After adding

mon lease = 20
mon lease ack timeout = 20

At least the monitors are now forming quorum and seem to be working. However, the cluster responds slowly to ceph commands.

Commands like ceph -s take 40-50 seconds to complete.

Also, some of the OSDs are down, and when I try to bring them up they throw timeout errors because the cluster is responding so slowly.

[root@pouta-s04 ~]# service ceph start
=== osd.95 ===
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.95 --keyring=/var/lib/ceph/osd/ceph-95/keyring osd crush create-or-move -- 95 3.63 host=pouta-s04 root=default'
[root@pouta-s04 ~]#
[root@pouta-s04 ~]# time ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 503 pgs degraded; 7 pgs down; 19 pgs incomplete; 3013 pgs peering; 5754 pgs stale; 3032 pgs stuck inactive; 3213 pgs stuck stale; 3535 pgs stuck unclean; 8 requests are blocked > 32 sec; recovery 11577/986841 objects degraded (1.173%); 339/328947 unfound (0.103%); 120/240 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
     monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}, election epoch 3938, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
     osdmap e262854: 240 osds: 120 up, 240 in
      pgmap v598497: 18432 pgs, 13 pools, 2339 GB data, 321 kobjects
            19054 GB used, 853 TB / 871 TB avail
            11577/986841 objects degraded (1.173%); 339/328947 unfound (0.103%)
                  42 stale+peering
                5708 stale+active+clean
                 503 active+degraded
                   3 down+incomplete
                2970 peering
                   1 stale+incomplete
                9189 active+clean
                   1 down+peering
                  12 incomplete
                   3 stale+down+incomplete

real    0m50.211s
user    0m0.293s
sys    0m0.056s
[root@pouta-s04 ~]#

Is there anything I can set in ceph.conf to make the cluster respond as quickly as it did before this problem appeared, so that the down OSDs can be started?

Since you are the expert, could you please tell me what made the monitors throw the "paxos(paxos active c 1474627..1475338) lease_timeout -- calling new election" messages? I have not changed anything on the monitors, and this problem came right after marking an OSD as lost.

The default values of lease and lease_ack_timeout should be fine, as they were working before this problem came up.

Thanks a ton for your help.

Actions #7

Updated by karan singh about 9 years ago

Hi

Sorry, I just checked now and the quorum-loss problem is still there. Here are the logs after adding:

mon lease = 30
mon lease ack timeout = 20

Monitor1 : http://paste.ubuntu.com/10727076/
Monitor2 : http://paste.ubuntu.com/10727084/

It takes 1m 35s to get a response from ceph commands.

[root@pouta-s04 ~]# time ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 2480 pgs degraded; 8 pgs down; 18 pgs incomplete; 2728 pgs peering; 6 pgs recovering; 1 pgs recovery_wait; 8909 pgs stale; 2754 pgs stuck inactive; 8909 pgs stuck stale; 5367 pgs stuck unclean; 9 requests are blocked > 32 sec; recovery 78400/963132 objects degraded (8.140%); 300/321044 unfound (0.093%); 64/152 in osds are down; nodown,noout,norecover flag(s) set; clock skew detected on mon.pouta-s02
     monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}, election epoch 4052, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
     osdmap e262887: 240 osds: 88 up, 152 in
            flags nodown,noout,norecover
      pgmap v598538: 18432 pgs, 13 pools, 2282 GB data, 313 kobjects
            12405 GB used, 539 TB / 552 TB avail
            78400/963132 objects degraded (8.140%); 300/321044 unfound (0.093%)
                   3 active+recovering+remapped
                   8 inactive
                8878 stale+active+clean
                   2 down+incomplete
                2650 peering
                   7 active
                 154 active+degraded+remapped
                  12 stale+incomplete
                4187 active+clean
                2322 active+degraded
                   1 active+recovery_wait+degraded
                  61 remapped+peering
                 123 active+remapped
                   2 down+peering
                   4 stale+down+incomplete
                  15 stale+peering
                   3 active+recovering+degraded

real    1m35.404s
user    0m0.344s
sys    0m0.083s
[root@pouta-s04 ~]#
Actions #8

Updated by Sage Weil about 9 years ago

  • Priority changed from Immediate to Urgent

- You'll need to clear nodown to tell whether PGs aren't peering because of OSDs that are not running.
- Run 'perf top' on a mon. If you see lots of time spent in Snappy decompression, disable leveldb compression and force compaction of the mon store (ceph-mon --compact ...).
- Otherwise, we need mon logs to see why the mons are slow to respond.
- Run ceph pg $pgid query on the down or incomplete PGs to see why. Usually ceph osd pool set $pool min_size 1 is all that is needed to let them recover (be sure to set it back to 2 when the cluster is healthy again). A rough command sketch follows below.
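For reference, a rough sketch of those steps as commands; $pgid, $pool and <mon-id> are placeholders, and the compaction step assumes the monitor in question is stopped first:

# let OSD failures be reflected in the osdmap again
ceph osd unset nodown

# inspect why a particular PG is down or incomplete
ceph pg $pgid query

# temporarily relax min_size on the affected pool, then restore it once healthy
ceph osd pool set $pool min_size 1
ceph osd pool set $pool min_size 2

# offline compaction of the mon store (run on that mon's host, with it stopped)
ceph-mon -i <mon-id> --compact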

Actions #9

Updated by karan singh about 9 years ago

Hi Sage

The incomplete PGs are the original issue, and I was troubleshooting them. I intentionally marked an OSD as lost, and then suddenly, BOOM, all monitors went out of quorum.
Now I first need to fix the monitor quorum / slow response problem, and then concentrate on the incomplete PG issue.

Monitors are taking a hell of a lot of time, 110 seconds, to respond to ceph -s.

[root@pouta-s04 ~]# time ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 3233 pgs degraded; 12 pgs down; 10 pgs incomplete; 5334 pgs peering; 73 pgs recovering; 574 pgs recovery_wait; 4518 pgs stale; 6055 pgs stuck inactive; 4518 pgs stuck stale; 10070 pgs stuck unclean; 12 requests are blocked > 32 sec; recovery 152451/930087 objects degraded (16.391%); 337/310029 unfound (0.109%); 47/177 in osds are down; nodown,noout,norecover flag(s) set; 1 mons down, quorum 0,2 pouta-s01,pouta-s03
     monmap e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}, election epoch 4164, quorum 0,2 pouta-s01,pouta-s03
     osdmap e262984: 240 osds: 130 up, 177 in
            flags nodown,noout,norecover
      pgmap v599316: 19456 pgs, 14 pools, 2202 GB data, 302 kobjects
            14558 GB used, 628 TB / 642 TB avail
            152451/930087 objects degraded (16.391%); 337/310029 unfound (0.109%)
                 108 inactive
                 535 creating
                  20 active
                 162 active+recovery_wait+remapped
                4878 active+clean
                   3 stale+incomplete
                5134 peering
                   6 active+recovering+remapped
                 136 active+recovery_wait+degraded
                 184 active+degraded+remapped
                   2 incomplete
                  45 active+recovering+degraded
                   2 stale+down+incomplete
                   7 down+peering
                  68 remapped
                 202 active+recovery_wait
                 379 active+remapped
                  13 active+recovering
                 188 remapped+peering
                   3 down+incomplete
                2785 active+degraded
                  74 active+recovery_wait+degraded+remapped
                4508 stale+active+clean
                   5 stale+peering
                   9 active+recovering+degraded+remapped

real    1m46.120s
user    0m0.388s
sys    0m0.102s
[root@pouta-s04 ~]#
Yesterday I added the entries below to ceph.conf, and the monitors started to form quorum, but they keep responding very slowly.

mon lease = 50
mon lease renew interval = 30
mon lease ack timeout = 100

The logs below (debug mon / paxos 10) show the monitor reaching the leader --> probing --> electing stages and then, after a while, repeating the same cycle.

Full logs here: https://www.dropbox.com/s/fjtun72d66n46gn/ceph-mon.pouta-s01.log?dl=0

An extract of the logs is below.

2015-04-07 16:57:15.363270 7f339ca33700 10 mon.pouta-s01@0(leader).paxosservice(osdmap 254763..262972) dispatch osd_failure(failed osd.235 10.100.50.3:6880/1051330 for 1sec e262972 v262972) v3 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363282 7f339ca33700  5 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) is_readable now=2015-04-07 16:57:15.363283 lease_expire=2015-04-07 16:57:34.723796 has v0 lc 1477789
2015-04-07 16:57:15.363292 7f339ca33700 10 mon.pouta-s01@0(leader).osd e262972 preprocess_query osd_failure(failed osd.235 10.100.50.3:6880/1051330 for 1sec e262972 v262972) v3 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363304 7f339ca33700  5 mon.pouta-s01@0(leader).osd e262972 can_mark_down NODOWN flag set, will not mark osd.235 down
2015-04-07 16:57:15.363307 7f339ca33700  5 mon.pouta-s01@0(leader).osd e262972 preprocess_failure ignoring report of osd.235 10.100.50.3:6880/1051330 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363368 7f339ca33700 10 mon.pouta-s01@0(leader).paxosservice(osdmap 254763..262972) dispatch osd_failure(failed osd.237 10.100.50.3:6895/1053322 for 1sec e262972 v262972) v3 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363379 7f339ca33700  5 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) is_readable now=2015-04-07 16:57:15.363380 lease_expire=2015-04-07 16:57:34.723796 has v0 lc 1477789
2015-04-07 16:57:15.363389 7f339ca33700 10 mon.pouta-s01@0(leader).osd e262972 preprocess_query osd_failure(failed osd.237 10.100.50.3:6895/1053322 for 1sec e262972 v262972) v3 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363426 7f339ca33700  5 mon.pouta-s01@0(leader).osd e262972 can_mark_down NODOWN flag set, will not mark osd.237 down
2015-04-07 16:57:15.363429 7f339ca33700  5 mon.pouta-s01@0(leader).osd e262972 preprocess_failure ignoring report of osd.237 10.100.50.3:6895/1053322 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363441 7f339ca33700 10 mon.pouta-s01@0(leader).paxosservice(pgmap 598691..599303) dispatch pg_stats(0 pgs tid 2037 v 0) v1 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363449 7f339ca33700  5 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) is_readable now=2015-04-07 16:57:15.363450 lease_expire=2015-04-07 16:57:34.723796 has v0 lc 1477789
2015-04-07 16:57:15.363459 7f339ca33700 10 mon.pouta-s01@0(leader).pg v599303 preprocess_query pg_stats(0 pgs tid 2037 v 0) v1 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363468 7f339ca33700  5 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) is_readable now=2015-04-07 16:57:15.363469 lease_expire=2015-04-07 16:57:34.723796 has v0 lc 1477789
2015-04-07 16:57:15.363477 7f339ca33700 10 mon.pouta-s01@0(leader).pg v599303 prepare_update pg_stats(0 pgs tid 2037 v 0) v1 from osd.69 10.100.50.4:6800/39385
2015-04-07 16:57:15.363484 7f339ca33700 10 mon.pouta-s01@0(leader).pg v599303 prepare_pg_stats pg_stats(0 pgs tid 2037 v 0) v1 from osd.69
2015-04-07 16:57:15.363499 7f339ca33700 10 mon.pouta-s01@0(leader).pg v599303  message contains no new osd|pg stats
2015-04-07 16:57:15.363522 7f339ca33700 10 mon.pouta-s01@0(leader) e3 handle_subscribe mon_subscribe({monmap=4+,osd_pg_creates=0}) v2
2015-04-07 16:57:15.363532 7f339ca33700 10 mon.pouta-s01@0(leader) e3 check_sub monmap next 4 have 3
2015-04-07 16:57:15.380448 7f339e495700  1 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) accept timeout, calling fresh election
2015-04-07 16:57:15.380483 7f339e495700 10 mon.pouta-s01@0(leader) e3 bootstrap
2015-04-07 16:57:15.380488 7f339e495700 10 mon.pouta-s01@0(leader) e3 sync_reset_requester
2015-04-07 16:57:15.380491 7f339e495700 10 mon.pouta-s01@0(leader) e3 unregister_cluster_logger
2015-04-07 16:57:15.380499 7f339e495700 10 mon.pouta-s01@0(leader) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.380503 7f339e495700 10 mon.pouta-s01@0(probing) e3 _reset
2015-04-07 16:57:15.380506 7f339e495700 10 mon.pouta-s01@0(probing) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.380508 7f339e495700 10 mon.pouta-s01@0(probing) e3 timecheck_finish
2015-04-07 16:57:15.380518 7f339e495700 10 mon.pouta-s01@0(probing) e3 scrub_reset
2015-04-07 16:57:15.380521 7f339e495700 10 mon.pouta-s01@0(probing).paxos(paxos updating c 1477137..1477789) restart -- canceling timeouts
2015-04-07 16:57:15.380566 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(pgmap 598691..599303) restart
2015-04-07 16:57:15.380589 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(mdsmap 1..1) restart
2015-04-07 16:57:15.380593 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(osdmap 254763..262972) restart
2015-04-07 16:57:15.380595 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(logm 599085..599743) restart
2015-04-07 16:57:15.380601 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380637 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380651 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380662 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380680 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380713 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380724 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380733 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380744 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.2 10.100.50.3:6789/0
2015-04-07 16:57:15.380773 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380785 7f339e495700  7 mon.pouta-s01@0(probing).log v599743 _updated_log for mon.0 10.100.50.1:6789/0
2015-04-07 16:57:15.380796 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(monmap 1..3) restart
2015-04-07 16:57:15.380800 7f339e495700 10 mon.pouta-s01@0(probing).paxosservice(auth 4501..4587) restart
2015-04-07 16:57:15.380807 7f339e495700 10 mon.pouta-s01@0(probing) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.380812 7f339e495700 10 mon.pouta-s01@0(probing) e3 reset_probe_timeout 0x367d380 after 2 seconds
2015-04-07 16:57:15.380827 7f339e495700 10 mon.pouta-s01@0(probing) e3 probing other monitors
2015-04-07 16:57:15.381949 7f339ca33700 10 mon.pouta-s01@0(probing) e3 handle_probe mon_probe(reply 2bd3283d-67ef-4316-8b7e-d8f4747eae33 name pouta-s03 quorum 0,1,2 paxos( fc 1477137 lc 1477789 )) v6
2015-04-07 16:57:15.381985 7f339ca33700 10 mon.pouta-s01@0(probing) e3 handle_probe_reply mon.2 10.100.50.3:6789/0mon_probe(reply 2bd3283d-67ef-4316-8b7e-d8f4747eae33 name pouta-s03 quorum 0,1,2 paxos( fc 1477137 lc 1477789 )) v6
2015-04-07 16:57:15.382003 7f339ca33700 10 mon.pouta-s01@0(probing) e3  monmap is e3: 3 mons at {pouta-s01=10.100.50.1:6789/0,pouta-s02=10.100.50.2:6789/0,pouta-s03=10.100.50.3:6789/0}
2015-04-07 16:57:15.382089 7f339ca33700 10 mon.pouta-s01@0(probing) e3  peer name is pouta-s03
2015-04-07 16:57:15.382094 7f339ca33700 10 mon.pouta-s01@0(probing) e3  existing quorum 0,1,2
2015-04-07 16:57:15.382097 7f339ca33700 10 mon.pouta-s01@0(probing) e3  peer paxos version 1477789 vs my version 1477789 (ok)
2015-04-07 16:57:15.382103 7f339ca33700 10 mon.pouta-s01@0(probing) e3 start_election
2015-04-07 16:57:15.382106 7f339ca33700 10 mon.pouta-s01@0(electing) e3 _reset
2015-04-07 16:57:15.382109 7f339ca33700 10 mon.pouta-s01@0(electing) e3 cancel_probe_timeout 0x367d380
2015-04-07 16:57:15.382115 7f339ca33700 10 mon.pouta-s01@0(electing) e3 timecheck_finish
2015-04-07 16:57:15.382120 7f339ca33700 10 mon.pouta-s01@0(electing) e3 scrub_reset
2015-04-07 16:57:15.382123 7f339ca33700 10 mon.pouta-s01@0(electing).paxos(paxos recovering c 1477137..1477789) restart -- canceling timeouts
2015-04-07 16:57:15.382130 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(pgmap 598691..599303) restart
2015-04-07 16:57:15.382135 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(mdsmap 1..1) restart
2015-04-07 16:57:15.382139 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(osdmap 254763..262972) restart
2015-04-07 16:57:15.382149 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(logm 599085..599743) restart
2015-04-07 16:57:15.382152 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(monmap 1..3) restart
2015-04-07 16:57:15.382154 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(auth 4501..4587) restart
2015-04-07 16:57:15.382160 7f339ca33700 10 mon.pouta-s01@0(electing) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.382171 7f339ca33700  0 log [INF] : mon.pouta-s01 calling new monitor election
2015-04-07 16:57:15.382185 7f339ca33700  5 mon.pouta-s01@0(electing).elector(4146) start -- can i be leader?
2015-04-07 16:57:15.382363 7f339ca33700  1 mon.pouta-s01@0(electing).elector(4146) init, last seen epoch 4146
2015-04-07 16:57:15.382371 7f339ca33700 10 mon.pouta-s01@0(electing).elector(4146) bump_epoch 4146 to 4147
2015-04-07 16:57:15.384198 7f339ca33700 10 mon.pouta-s01@0(electing) e3 join_election
2015-04-07 16:57:15.384217 7f339ca33700 10 mon.pouta-s01@0(electing) e3 _reset
2015-04-07 16:57:15.384221 7f339ca33700 10 mon.pouta-s01@0(electing) e3 cancel_probe_timeout (none scheduled)
2015-04-07 16:57:15.384224 7f339ca33700 10 mon.pouta-s01@0(electing) e3 timecheck_finish
2015-04-07 16:57:15.384226 7f339ca33700 10 mon.pouta-s01@0(electing) e3 scrub_reset
2015-04-07 16:57:15.384229 7f339ca33700 10 mon.pouta-s01@0(electing).paxos(paxos recovering c 1477137..1477789) restart -- canceling timeouts
2015-04-07 16:57:15.384237 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(pgmap 598691..599303) restart
2015-04-07 16:57:15.384241 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(mdsmap 1..1) restart
2015-04-07 16:57:15.384244 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(osdmap 254763..262972) restart
2015-04-07 16:57:15.384246 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(logm 599085..599743) restart
2015-04-07 16:57:15.384248 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(monmap 1..3) restart
2015-04-07 16:57:15.384250 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(auth 4501..4587) restart
2015-04-07 16:57:15.384316 7f339ca33700 10 mon.pouta-s01@0(electing).paxosservice(logm 599085..599743) dispatch log(1 entries) v1 from mon.0 10.100.50.1:6789/0

When I leave the cluster untouched for a few hours, the monitors respond quickly (ceph -s returns in under 1 sec), but as soon as I start OSDs the cluster starts to respond dead slow and the monitors again cycle through probing --> electing --> leader.
Actions #10

Updated by Sage Weil about 9 years ago

  • Status changed from New to Can't reproduce

2015-04-07 16:57:15.380448 7f339e495700 1 mon.pouta-s01@0(leader).paxos(paxos updating c 1477137..1477789) accept timeout, calling fresh election

is the key line. One of the peons is taking too long to commit things. Check 'perf top' on the slow peon and see why it isn't responding quickly enough. In the past I've seen this trigger because of snappy in leveldb...
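A minimal sketch of that check; the compression option name below is an assumption for this release and should be verified before relying on it:

# sample the slow peon's CPU usage; look for snappy::* symbols near the top
perf top -p $(pidof ceph-mon)

# if snappy decompression dominates, disable mon store compression in ceph.conf
# ([mon] section; option name assumed) and compact the store as suggested above
#   mon leveldb compression = false
ceph-mon -i <mon-id> --compact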
