Bug #9835

osd: bug in misdirected op checks (firefly)

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-10-18_19:22:02-upgrade:firefly-x-giant-distro-basic-multi/556732

PG 3.cs0 momentarily maps to [-1,3,2], and osd.3 receives an op for it. Firefly logs:

2014-10-20 04:02:08.246434 7f0d9f7f7700 7 osd.3 74 hit non-existent pg 3.cs0
2014-10-20 04:02:08.246441 7f0d9f7f7700 7 osd.3 74 we are valid target for op, waiting

because

    if (osdmap->get_pg_acting_role(pgid.pgid, whoami) >= 0) {
      dout(7) << "we are valid target for op, waiting" << dendl;
      waiting_for_pg[pgid].push_back(op);
      op->mark_delayed("waiting for pg to exist locally");
      return;
    }

But the op then never goes away, because the PG never gets created on this node. This code is
all refactored in giant, so the same problem doesn't exist there.
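
To make the failure mode concrete, here is a minimal standalone sketch of the check above, not the actual OSD code. The helper names (`acting_role`, `firefly_would_wait`, `local_shard_matches`) are hypothetical, and the sketch assumes that in an erasure-coded pool an OSD's position in the acting set is the shard it would host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the role lookup: returns the position of `osd`
// in the acting set, or -1 if it is not present. Unmapped slots hold -1.
int acting_role(const std::vector<int>& acting, int osd) {
  for (std::size_t i = 0; i < acting.size(); ++i) {
    if (acting[i] == osd) {
      return static_cast<int>(i);
    }
  }
  return -1;
}

// Mirrors the firefly condition quoted above: any role >= 0 means
// "we are valid target for op, waiting".
bool firefly_would_wait(const std::vector<int>& acting, int whoami) {
  return acting_role(acting, whoami) >= 0;
}

// Assumption for illustration: on an EC pool, the shard this OSD would
// host equals its position in the acting set, so an op for `shard` can
// only ever be served locally if the two match.
bool local_shard_matches(const std::vector<int>& acting, int whoami, int shard) {
  return acting_role(acting, whoami) == shard;
}
```

With the acting set [-1,3,2] and whoami = 3, `firefly_would_wait` is true (role 1), while `local_shard_matches` for shard 0 is false, so under these assumptions the op is queued for a PG shard that never appears on osd.3.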


Related issues

Related to Ceph - Bug #9702: "MaxWhileTries: 'wait_until_healthy'reached maximum tries" in upgrade:firefly-x-giant-distro-basic-multi run Duplicate 10/08/2014
Related to Ceph - Bug #10178: mon rejects peer during election based on OSD_SET_ALLOC_HINT feature? Resolved 11/24/2014

Associated revisions

Revision 9e05ba08 (diff)
Added by Sage Weil over 9 years ago

osd/OSD: use OSDMap helper to determine if we are correct op target

Use the new helper. This fixes our behavior for EC pools where targeting
a different shard is not correct, while for replicated pools it may be. In
the EC case, it leaves the op hanging indefinitely in the OpTracker because
the pgid exists but as a different shard.

Fixes: #9835
Signed-off-by: Sage Weil <>

Revision 588602bf (diff)
Added by Sage Weil over 9 years ago

osd: discard rank > 0 ops on erasure pools

Erasure pools do not support read from replica, so we should drop
any rank > 0 requests.

This fixes a bug where an erasure pool maps to [1,2,3], temporarily maps
to [-1,2,3], sends a request to osd.2, and then remaps back to [1,2,3].
Because the 0 shard never appears on osd.2, the request sits in the
waiting_for_pg map indefinitely and causes slow request warnings.
This problem does not come up on replicated pools because all instances of
the PG are created equal.

Fix by only considering role == 0 for erasure pools as a correct mapping.

Fixes: #9835
Signed-off-by: Sage Weil <>
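
The rule described in this commit message can be sketched as follows. This is an illustrative approximation rather than the code from the revision: `PoolType`, `role_in`, and `is_correct_target` are hypothetical names, and the role is taken to be the OSD's position in the acting set.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pool classification for this sketch.
enum class PoolType { Replicated, Erasure };

// Position of `osd` in the acting set, or -1 if absent (unmapped slots are -1).
int role_in(const std::vector<int>& acting, int osd) {
  for (std::size_t i = 0; i < acting.size(); ++i) {
    if (acting[i] == osd) {
      return static_cast<int>(i);
    }
  }
  return -1;
}

// Erasure pools: only role 0 counts as a correct mapping, since they do
// not support read from replica. Replicated pools: any role >= 0 may be
// a legitimate target, because all instances of the PG are equivalent.
bool is_correct_target(PoolType type, const std::vector<int>& acting, int whoami) {
  const int role = role_in(acting, whoami);
  if (type == PoolType::Erasure) {
    return role == 0;
  }
  return role >= 0;
}
```

For the scenario in the message, an erasure pool mapped to [-1,2,3] gives osd.2 role 1, so the request is treated as misdirected and dropped instead of being parked in waiting_for_pg.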

Revision fac16547 (diff)
Added by Samuel Just over 9 years ago

osd: use OSDMap helper to tell if ops are misdirected

calc_pg_role doesn't actually take into account primary affinity.

Fixes: #9835
Signed-off-by: Samuel Just <>

Revision ccfd2414 (diff)
Added by Sage Weil over 9 years ago

osd/OSD: use OSDMap helper to determine if we are correct op target

Use the new helper. This fixes our behavior for EC pools where targeting
a different shard is not correct, while for replicated pools it may be. In
the EC case, it leaves the op hanging indefinitely in the OpTracker because
the pgid exists but as a different shard.

Fixes: #9835
Signed-off-by: Sage Weil <>
(cherry picked from commit 9e05ba086a36ae9a04b347153b685c2b8adac2c3)

History

#1 Updated by Sage Weil over 9 years ago

  • Description updated (diff)

#2 Updated by Greg Farnum over 9 years ago

Maybe we need to adjust how we're handling waiting_for_pg, but I don't think that this particular check is a bug — this is an op that can be legitimately targeted at us (we're a replica, and we allow replica ops), but we don't have the PG. So we have to wait for it.

#3 Updated by Sage Weil over 9 years ago

  • Subject changed from osd: bug in misdirected op checks to osd: bug in misdirected op checks (firefly)
  • Status changed from New to Fix Under Review

#4 Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved
