Bug #9757

closed

mon: loops on osd pool create

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/sage-2014-10-12_09:13:46-upgrade:dumpling-x-wip-sam-firefly-testing-distro-basic-multi/541361/ (and others)

mixed cluster

2014-10-13 06:30:30.367812 7f37a80dc700 10 mon.a@0(leader).paxosservice(osdmap 1..54674) dispatch mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:30.367819 7f37a80dc700  5 mon.a@0(leader).paxos(paxos active c 162398..162973) is_readable now=2014-10-13 06:30:30.367820 lease_expire=2014-10-13 06:30:35.367759 has v0 lc 162973
2014-10-13 06:30:30.367829 7f37a80dc700 10 mon.a@0(leader).osd e54674 preprocess_query mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:30.367886 7f37a80dc700  7 mon.a@0(leader).osd e54674 prepare_update mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:30.368000 7f37a80dc700 20 mon.a@0(leader).osd e54674 erasure code profile default set
2014-10-13 06:30:30.368009 7f37a80dc700 10 mon.a@0(leader).osd e54674 should_propose
2014-10-13 06:30:30.368012 7f37a80dc700 10 mon.a@0(leader).paxosservice(osdmap 1..54674)  setting proposal_timer 0x2f93830 with delay of 0.924279
2014-10-13 06:30:30.368022 7f37a80dc700  5 mon.a@0(leader).paxos(paxos active c 162398..162973) is_readable now=2014-10-13 06:30:30.368023 lease_expire=2014-10-13 06:30:35.367759 has v0 lc 162973
...
2014-10-13 06:30:31.421248 7f37a80dc700 10 mon.a@0(leader).paxosservice(osdmap 1..54675) dispatch mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:31.421261 7f37a80dc700  5 mon.a@0(leader).paxos(paxos active c 162398..162976) is_readable now=2014-10-13 06:30:31.421262 lease_expire=2014-10-13 06:30:36.421167 has v0 lc 162976
2014-10-13 06:30:31.421273 7f37a80dc700 10 mon.a@0(leader).osd e54675 preprocess_query mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:31.421356 7f37a80dc700  7 mon.a@0(leader).osd e54675 prepare_update mon_command({"prefix": "osd pool create", "pg_num": 16, "pool": "unique_pool_0"} v 0) v1 from client.4206 10.214.132.7:0/1021019
2014-10-13 06:30:31.421525 7f37a80dc700 20 mon.a@0(leader).osd e54675 erasure code profile default set
...
Actions #1

Updated by Sage Weil over 9 years ago

also breaking teuthology-2014-10-09_19:30:01-upgrade:dumpling-firefly-x:stress-split-giant-distro-basic-multi

Actions #2

Updated by Loïc Dachary over 9 years ago

  • Assignee changed from Sage Weil to Loïc Dachary
Actions #3

Updated by Sage Weil over 9 years ago

  • Priority changed from Immediate to Urgent
Actions #4

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #5

Updated by Loïc Dachary over 9 years ago

This was run using the following backport https://github.com/ceph/ceph/commits/wip-9757

Actions #6

Updated by Loïc Dachary over 9 years ago

description: upgrade:dumpling-x/stress-split/{0-cluster/start.yaml 1-dumpling-install/dumpling.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/rbd-cls.yaml
  6-next-mon/monb.yaml 7-workload/radosbench.yaml 8-next-mon/monc.yaml 9-workload/{rados_api_tests.yaml
  rbd-python.yaml rgw-s3tests.yaml snaps-many-objects.yaml} distros/ubuntu_14.04.yaml}
Actions #7

Updated by Loïc Dachary over 9 years ago

The commit "mon/OSDMonitor : Use user provided ruleset for replicated pool" was never a bug fix, contrary to what the comment suggests; it was a feature. Prior to this patch, pool create for a replicated pool was documented as:

ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated]

and this patch adds a new argument to it, to specify the ruleset.

Actions #8

Updated by Loïc Dachary over 9 years ago

Actions #9

Updated by Loïc Dachary over 9 years ago

This bug is dated October 12th and was hit with https://github.com/ceph/ceph/commit/0c1eafd7ab6f7d2a5eccd10ce267bde5e90932c5, which does not contain https://github.com/ceph/ceph/commit/cf4e30095e8149d1df0f2c9b4c93c9df0779ec84 (added on October 13th).

I think the following happens:

  • the erasure code profile is always created, even though the pool is replicated
  • it loops, waiting for the erasure code profile to be proposed https://github.com/ceph/ceph/blob/0c1eafd7ab6f7d2a5eccd10ce267bde5e90932c5/src/mon/OSDMonitor.cc#L4331
  • because this is a mixed dumpling / firefly cluster, the OSDMap does not encode or interpret the erasure code profile incremental change
  • after the paxos proposal, the osd pool create starts again, sees that the default profile is still missing and tries again, indefinitely
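
To make the suspected sequence easier to follow, here is a toy model of it in Python. It is only a sketch of the hypothesis above, not the OSDMonitor code: `ToyOSDMap`, `commit`, `pool_create` and the `mixed_cluster` flag are all hypothetical names, and the assumption that the profile change is silently dropped on commit is the guess from the list, not something confirmed here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOSDMap:
    """Hypothetical stand-in for the OSDMap; not Ceph's real class."""
    epoch: int = 54674
    profiles: set = field(default_factory=set)
    pools: set = field(default_factory=set)


def commit(osdmap, pending_profiles, pending_pools, mixed_cluster):
    """Model a paxos commit of a pending change.

    Every commit bumps the epoch. Assumption: in a mixed cluster the
    erasure code profile part of the change is lost.
    """
    osdmap.epoch += 1
    if not mixed_cluster:
        osdmap.profiles |= pending_profiles
    osdmap.pools |= pending_pools


def pool_create(osdmap, pool, mixed_cluster, max_attempts=5):
    """Return the attempt number that created the pool, or None if the
    command is still being retried after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        if "default" not in osdmap.profiles:
            # Default profile missing: propose it, then the command
            # is dispatched again after the commit.
            commit(osdmap, {"default"}, set(), mixed_cluster)
            continue
        commit(osdmap, set(), {pool}, mixed_cluster)
        return attempt
    return None


healthy = ToyOSDMap()
print(pool_create(healthy, "unique_pool_0", mixed_cluster=False))  # 2

mixed = ToyOSDMap()
print(pool_create(mixed, "unique_pool_0", mixed_cluster=True))  # None
print(mixed.epoch)  # 54679: the epoch keeps advancing, the pool never appears
```

In the toy model the mixed cluster bumps the epoch on every retry without ever seeing the profile, which matches the shape of the log in the description (same command dispatched at e54674 and again at e54675).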

I think the bug has been resolved correctly by adding https://github.com/ceph/ceph/commit/cf4e30095e8149d1df0f2c9b4c93c9df0779ec84 to the firefly branch on October 13th. I also think no other patches are needed.

http://pulpito.ceph.com/sage-2014-10-13_20:41:16-upgrade:dumpling-x-wip-sam-firefly-testing-distro-basic-multi/ completed successfully

Actions #10

Updated by Loïc Dachary over 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #11

Updated by Loïc Dachary over 9 years ago

  • Status changed from Fix Under Review to Resolved
Actions #12

Updated by Sage Weil over 9 years ago

final commit is cf4e30095e8149d1df0f2c9b4c93c9df0779ec84
