Bug #9757
mon: loops on osd pool create (closed)
Description
mixed cluster
2014-10-13 06:30:30.367812 7f37a80dc700 10 mon.a@0(leader).paxosservice(osdmap 1..54674) dispatch mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:30.367819 7f37a80dc700 5 mon.a@0(leader).paxos(paxos active c 162398..162973) is_readable now=2014-10-13 06:30:30.367820 lease_expire=2014-10-13 06:30:35.367759 has v0 lc 162973
2014-10-13 06:30:30.367829 7f37a80dc700 10 mon.a@0(leader).osd e54674 preprocess_query mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:30.367886 7f37a80dc700 7 mon.a@0(leader).osd e54674 prepare_update mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:30.368000 7f37a80dc700 20 mon.a@0(leader).osd e54674 erasure code profile default set
2014-10-13 06:30:30.368009 7f37a80dc700 10 mon.a@0(leader).osd e54674 should_propose
2014-10-13 06:30:30.368012 7f37a80dc700 10 mon.a@0(leader).paxosservice(osdmap 1..54674) setting proposal_timer 0x2f93830 with delay of 0.924279
2014-10-13 06:30:30.368022 7f37a80dc700 5 mon.a@0(leader).paxos(paxos active c 162398..162973) is_readable now=2014-10-13 06:30:30.368023 lease_expire=2014-10-13 06:30:35.367759 has v0 lc 162973
...
2014-10-13 06:30:31.421248 7f37a80dc700 10 mon.a@0(leader).paxosservice(osdmap 1..54675) dispatch mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:31.421261 7f37a80dc700 5 mon.a@0(leader).paxos(paxos active c 162398..162976) is_readable now=2014-10-13 06:30:31.421262 lease_expire=2014-10-13 06:30:36.421167 has v0 lc 162976
2014-10-13 06:30:31.421273 7f37a80dc700 10 mon.a@0(leader).osd e54675 preprocess_query mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:31.421356 7f37a80dc700 7 mon.a@0(leader).osd e54675 prepare_update mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:31.421525 7f37a80dc700 20 mon.a@0(leader).osd e54675 erasure code profile default set
...
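In the two excerpts the same mon_command is dispatched again one osdmap epoch later (54674, then 54675), which is the signature of the loop. A minimal sketch for spotting this pattern in a longer monitor log follows. The regular expression is tailored to the dispatch lines shown here and is an assumption about the general log format, not a Ceph tool.

```python
import re

# Matches a leader "dispatch mon_command(...)" line and captures the upper
# bound of the osdmap epoch range plus the command JSON. Tailored to the
# excerpt in this issue; other Ceph versions may format lines differently.
DISPATCH_RE = re.compile(
    r"paxosservice\(osdmap \d+\.\.(?P<epoch>\d+)\) dispatch "
    r"mon_command\((?P<cmd>\{.*?\}) v \d+\)"
)

# The two dispatch lines from the excerpts above.
SAMPLE_LOG = [
    '2014-10-13 06:30:30.367812 7f37a80dc700 10 mon.a@0(leader).paxosservice(osdmap 1..54674) dispatch mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019',
    '2014-10-13 06:30:31.421248 7f37a80dc700 10 mon.a@0(leader).paxosservice(osdmap 1..54675) dispatch mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019',
]


def find_repeated_commands(log_lines, threshold=2):
    """Return {command_json: [epochs]} for every command dispatched at least
    `threshold` times, a hint that the monitor keeps retrying it."""
    seen = {}
    for line in log_lines:
        m = DISPATCH_RE.search(line)
        if m:
            seen.setdefault(m.group("cmd"), []).append(int(m.group("epoch")))
    return {cmd: epochs for cmd, epochs in seen.items() if len(epochs) >= threshold}


if __name__ == "__main__":
    for cmd, epochs in find_repeated_commands(SAMPLE_LOG).items():
        print(cmd, epochs)
```

Only the dispatch lines are counted, so the preprocess_query and prepare_update lines for the same command do not inflate the totals.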
Updated by Sage Weil over 9 years ago
also breaking teuthology-2014-10-09_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi
Updated by Loïc Dachary over 9 years ago
- Assignee changed from Sage Weil to Loïc Dachary
Updated by Loïc Dachary over 9 years ago
This was run using the following backport: https://github.com/ceph/ceph/commits/wip-9757
Updated by Loïc Dachary over 9 years ago
description: upgrade:dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-cls.yaml 6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
Updated by Loïc Dachary over 9 years ago
The patch "mon/OSDMonitor : Use user provided ruleset for replicated pool" does not fix a bug, contrary to what the comment suggests; it adds a feature. Prior to this patch, pool creation for a replicated pool is documented as:
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated]
and this patch adds a new argument to it to specify the ruleset.
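To make the argument change concrete, here is a minimal sketch that turns the positional arguments of the documented command, plus one trailing ruleset argument, into a mon_command JSON body like the one in the log. The keys "prefix", "pool" and "pg_num" appear in the log excerpt; the keys "pgp_num", "pool_type" and "ruleset", and the position of the new argument, are assumptions made for illustration only.

```python
import json


def pool_create_command(args):
    """Map the tokens following `ceph osd pool create` onto a JSON body.

    Assumed form: {pool-name} {pg-num} [{pgp-num}] [replicated] [RULESET],
    where RULESET stands for the argument added by the patch (hypothetical
    placement and key name)."""
    if len(args) < 2:
        raise ValueError("need at least {pool-name} and {pg-num}")
    # Key order matches the log: prefix, pg_num, pool.
    cmd = {"prefix": "osd pool create", "pg_num": int(args[1]), "pool": args[0]}
    rest = list(args[2:])
    if rest and rest[0].isdigit():
        cmd["pgp_num"] = int(rest.pop(0))  # assumed key name
    if rest and rest[0] == "replicated":
        cmd["pool_type"] = rest.pop(0)  # assumed key name
    if rest:
        cmd["ruleset"] = rest.pop(0)  # hypothetical: the new ruleset argument
    if rest:
        raise ValueError(f"unexpected arguments: {rest}")
    return json.dumps(cmd)


if __name__ == "__main__":
    # Same body as the command looping in the log excerpt.
    print(pool_create_command(["unique_pool_0", "16"]))
    print(pool_create_command(["mypool", "8", "8", "replicated", "myrule"]))
```

Because every argument after {pg-num} is optional, a purely numeric ruleset name given without {pgp-num} would be read as {pgp-num} by this sketch; a real parser has to resolve that ambiguity somehow.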
Updated by Loïc Dachary over 9 years ago
- https://github.com/ceph/ceph/commit/fe43202449e3caf60e796f1205ef4303e905659d does not need to be backported because there is only one plugin in firefly.
- https://github.com/ceph/ceph/commit/1d5699853cae08443d31748bce5681b3104543f0 helps with typos and is nice to have but does not seem related to this problem.
Updated by Loïc Dachary over 9 years ago
This bug is dated October 12th and was observed with https://github.com/ceph/ceph/commit/0c1eafd7ab6f7d2a5eccd10ce267bde5e90932c5, which does not contain https://github.com/ceph/ceph/commit/cf4e30095e8149d1df0f2c9b4c93c9df0779ec84, added on October 13th.
I think the following happens:
- the default erasure code profile is always created, even though the pool is replicated
- the command then loops, waiting for the erasure code profile to be proposed https://github.com/ceph/ceph/blob/0c1eafd7ab6f7d2a5eccd10ce267bde5e90932c5/src/mon/OSDMonitor.cc#L4331
- because this is a mixed dumpling / firefly cluster, the OSDMap does not encode or interpret the incremental change that adds the erasure code profile
- after the Paxos proposal, osd pool create runs again, sees that the default profile is still missing, and tries again, indefinitely
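The four steps above form a feedback loop. The sketch below is a heavily simplified, hypothetical model of it; none of these names are Ceph's. Two assumptions are built in. First, that in a mixed cluster the profile change is dropped when the map is committed. Second, that the fix in cf4e30095e8149d1df0f2c9b4c93c9df0779ec84 amounts to creating the default profile only for erasure-coded pools, modeled by the `only_if_erasure` flag. That second point is inferred from the first step and is not a description of the actual diff.

```python
from dataclasses import dataclass, field

# Hypothetical model of the leader handling "osd pool create"; it mirrors
# the four steps described in the issue, not the real OSDMonitor code.


@dataclass
class OSDMap:
    epoch: int
    ec_profiles: dict = field(default_factory=dict)


def propose(osdmap, pending_profiles, mixed_cluster):
    """Commit a new epoch. In a mixed dumpling/firefly cluster the erasure
    code profile change is assumed to be lost when the map is encoded."""
    profiles = dict(osdmap.ec_profiles)
    if not mixed_cluster:
        profiles.update(pending_profiles)
    return OSDMap(osdmap.epoch + 1, profiles)


def pool_create(osdmap, pool_type, mixed_cluster, only_if_erasure, max_rounds=10):
    """Return (rounds, final_map); rounds is None if the command was still
    being retried after max_rounds (the bound keeps the model finite)."""
    for rounds in range(1, max_rounds + 1):
        needs_profile = pool_type == "erasure" or not only_if_erasure
        if needs_profile and "default" not in osdmap.ec_profiles:
            # "erasure code profile default set": stage it, propose, retry.
            osdmap = propose(osdmap, {"default": "(contents not modeled)"}, mixed_cluster)
            continue
        return rounds, osdmap  # the pool would be created at this point
    return None, osdmap


if __name__ == "__main__":
    start = OSDMap(54674)
    print(pool_create(start, "replicated", mixed_cluster=True, only_if_erasure=False))
    print(pool_create(start, "replicated", mixed_cluster=True, only_if_erasure=True))
```

With the profile always created, the replicated pool request in a mixed cluster never completes and only burns osdmap epochs. When the profile is skipped for replicated pools, the loop is never entered. In a non-mixed cluster the profile survives the proposal, so the retry succeeds on the second round.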
I think the bug has been resolved correctly by adding https://github.com/ceph/ceph/commit/cf4e30095e8149d1df0f2c9b4c93c9df0779ec84 to the firefly branch on October 13th. I also think no other patches are needed.
http://pulpito.ceph.com/sage-2014-10-13_20:41:16-upgrade:dumpling-x-wip-sam-firefly-testing-distro-basic-multi/ completed successfully
Updated by Loïc Dachary over 9 years ago
- Status changed from In Progress to Fix Under Review
Updated by Loïc Dachary over 9 years ago
- Status changed from Fix Under Review to Resolved
Updated by Sage Weil over 9 years ago
final commit is cf4e30095e8149d1df0f2c9b4c93c9df0779ec84