Bug #9806

closed

Objecter: resend linger ops on split

Added by Samuel Just over 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags: conflict
Backport: firefly
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Otherwise, we can lose notifies.

cb9262abd7fd5f0a9f583bd34e4c425a049e56ce


Related issues 1 (0 open, 1 closed)

Copied to Ceph - Backport #11699: Objecter: resend linger ops on split | Resolved | Josh Durgin | 10/17/2014
Actions #1

Updated by Samuel Just over 9 years ago

  • Description updated (diff)
Actions #2

Updated by Josh Durgin over 9 years ago

  • Backport set to giant, firefly, dumpling
Actions #3

Updated by Josh Durgin over 9 years ago

  • Status changed from New to 7
  • Assignee set to Josh Durgin
Actions #4

Updated by Sage Weil over 9 years ago

  • Status changed from 7 to Resolved
Actions #5

Updated by Sage Weil over 9 years ago

  • Status changed from Resolved to Pending Backport
Actions #6

Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)

Commit cb9262abd7fd5f0a9f583bd34e4c425a049e56ce does not apply cleanly on dumpling, which suggests that more would need to be backported for it to make sense. Should this be backported for v0.67.12, or can it wait?

Actions #7

Updated by Loïc Dachary about 9 years ago

It won't be in dumpling v0.67.12 but ... it could be in v0.80.10 ;-) It looks like an important fix.

Actions #8

Updated by Loïc Dachary about 9 years ago

  • Backport changed from giant, firefly, dumpling to firefly, dumpling

already in giant

Actions #9

Updated by Loïc Dachary about 9 years ago

  • Backport changed from firefly, dumpling to firefly

dumpling is end of life

Actions #10

Updated by Loïc Dachary almost 9 years ago

  • Tags set to conflict
  • Regression set to No
Actions #11

Updated by Nathan Cutler almost 9 years ago

  • Status changed from Pending Backport to Resolved
Actions #12

Updated by Christian Theune over 8 years ago

As far as I understand this hurts snapshots. I'm on Firefly and getting bitten by this. Is there a work-around to get back to a usable snapshot state once this has kicked in?

Actions #13

Updated by Josh Durgin over 8 years ago

A workaround is to detach and reattach your images. This reopens them and reestablishes the watch.

Actions #14

Updated by Christian Theune over 8 years ago

Ah. So in that case restarting Qemu would be the specific action for that, right?

I'm currently trying this out. I'm a bit unclear on the specifics of the trigger. We have some automation code that causes pg_num and pgp_num to be automatically (slowly) adjusted for growing pools.

However, this was running for a while without the cluster exhibiting the issue in a way that we would notice. The specific point when we noticed was when we updated our tunables to the recommended settings for Firefly (and caused a large CRUSH rearrangement).

Do you think that, in normal operation, stopping our automatic pg_num and pgp_num adjustment would be sufficient to avoid this bug?

For further clarification: does this bug apply on a per-pool basis, per-image basis or cluster-wide? My guess would be this applies on a per-pool basis.

Thanks for the hint!

Actions #15

Updated by Christian Theune over 8 years ago

Ok, so I restarted one of the VMs by exiting Qemu and starting afresh. I took a snapshot immediately after that, and the mapped rbd device has given a consistent hash multiple times since.

Actions #16

Updated by Josh Durgin over 8 years ago

Yes, restarting qemu will fix it. The trigger for the issue is pg split, so it would only affect pools where you had increased pg_num and pgp_num. If you avoid splitting, you avoid this bug. Other crush changes like straw2 or new tunables should not cause this issue.
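
To illustrate why a split matters here, the following is a minimal sketch of how raising pg_num can move an object to a different placement group, so that a linger op (such as a watch) registered against the old PG would need to be resent. It is not the code from the commit above. The mapping is modeled on Ceph's `ceph_stable_mod` helper as I understand it, and the function names `pg_mask`, `pg_for_hash`, and `lingers_to_resend`, as well as the small integer "hashes", are hypothetical stand-ins for illustration only.

```python
def stable_mod(x: int, b: int, bmask: int) -> int:
    """Map hash x onto b buckets; modeled on Ceph's ceph_stable_mod."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_mask(pg_num: int) -> int:
    """Smallest 2^n - 1 mask that covers pg_num - 1 (hypothetical helper)."""
    return (1 << (pg_num - 1).bit_length()) - 1


def pg_for_hash(obj_hash: int, pg_num: int) -> int:
    """PG index for an object hash under a given pg_num (hypothetical helper)."""
    return stable_mod(obj_hash, pg_num, pg_mask(pg_num))


def lingers_to_resend(watched_hashes, old_pg_num: int, new_pg_num: int):
    """Return the watched object hashes whose PG changes across the split.

    Assumption: a linger op targets the PG its object maps to, so a changed
    mapping means the op has to be resent to reach the new PG.
    """
    return [
        h for h in watched_hashes
        if pg_for_hash(h, old_pg_num) != pg_for_hash(h, new_pg_num)
    ]


if __name__ == "__main__":
    # Toy hashes 0..15; a real cluster hashes object names instead.
    moved = lingers_to_resend(range(16), old_pg_num=8, new_pg_num=16)
    print("hashes that land in a child PG after the split:", moved)
```

In this toy model only the objects that land in a newly created child PG are affected, and a CRUSH change that leaves pg_num untouched does not alter the object-to-PG mapping at all, which is consistent with the explanation above that new tunables alone should not trigger the issue.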

Actions #17

Updated by Christian Theune over 8 years ago

That's a relief! Thanks for the explanation, I hope other people stumbling over this bug will find this helpful, too. :)
