Project

General

Profile

Bug #2286

mon: different full/near_full values on different monitors

Added by Greg Farnum almost 12 years ago. Updated almost 12 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If you run vstart, you get

gregf@kai:~/ceph/src [wip-oc-perf]$ ./ceph pg dump -m 192.168.106.225:6789
dumped all in format plain
version 47
last_osdmap_epoch 5
last_pg_scan 1
full_ratio 0.95
nearfull_ratio 0.85

out of mon.0, but out of mon.1 and mon.2 you get
gregf@kai:~/ceph/src [wip-oc-perf]$ ./ceph pg dump -m 192.168.106.225:6791
dumped all in format plain
version 47
last_osdmap_epoch 5
last_pg_scan 1
full_ratio 0.8
nearfull_ratio 0.9

That's not so good, especially when you notice that full_ratio is lower than nearfull_ratio!

History

#1 Updated by Greg Farnum almost 12 years ago

  • Status changed from New to Fix Under Review

This got (obviously) broken by b6d1c0c9b7290a237560528b6ff0d6b2b2998ee2, which put in the use of magic numbers instead of g_conf for the PGMap constructor.

To fix it properly we need to make the leader tell the followers the initial state of it, which apparently isn't happening now because it only transmits diffs rather than full maps. So initialize all the related variables to 0, and unconditionally encode the [near]full_ratio in the Incremental.

Fixed (and briefly tested) in commit:3ec6425f412a802ccd3ab3fd2fb4d5f3d35d6c94 (branch wip-misc-fixes).

#2 Updated by Greg Farnum almost 12 years ago

Sage asked on irc about just setting it up on the initial create_empty. The problem with that the only data which is sent out to the other Monitors is the PGMap::Incremental, and (before this change) it's only filled in on changes.
We could special-case the first epoch, but that just seems like it's asking for trouble, and doesn't really help any.

#3 Updated by Sage Weil almost 12 years ago

pushed a patch that confines the logic of when to update this into a single bit of code. look okay?

i think the best solution might be to make this not a function of config (except maybe at mkfs time) at all, but to create an explicit mon command to do it. similar to the osd_pg_bits stuff. but i'm not sure it makes sense to go there yet until we clean up the mon interfaces some. this works for me, for now. what do you think?

#4 Updated by Greg Farnum almost 12 years ago

I'm looking at your patch and it doesn't make a lot of sense to me.
First off, when do you think that peon monitors are going to get their ratios set upon creation?
Second, it looks like there's nothing preventing a peon from making a local change to ratios that doesn't get propagated (this may have been an issue before).
Third, what's the penalty involved with filling in the incremental's ratios every time? (Since you got rid of that.)

#5 Updated by Sage Weil almost 12 years ago

Greg Farnum wrote:

I'm looking at your patch and it doesn't make a lot of sense to me.
First off, when do you think that peon monitors are going to get their ratios set upon creation?

It'll be zero. The first master will notice it's zero and set it to something non-zero using the normal update mechanism.

Second, it looks like there's nothing preventing a peon from making a local change to ratios that doesn't get propagated (this may have been an issue before).

There is no such thing as a local update now.. the only way it changes is via the propose in tick()...?

Third, what's the penalty involved with filling in the incremental's ratios every time? (Since you got rid of that.)

When it changes you ahve to recalculate the full and nearfull sets.. definitely not something you want to every time.

We want this to behave like any other field in the map, with updates via the incremental. We want to confine any weirdness to the logic that causes a new inc proposal. Because the data isn't fed into mkfs, just have an early monitor fill it in using a normal update.

#6 Updated by Greg Farnum almost 12 years ago

Oh, I see...I wasn't following that need_*_ratio_update stuff properly. And update_full_ratios() will be called on the initial startup, right?

You'll notice that I set up the apply_incremental() to only actually change the map's ratios if they differ from what's in the incremental. ;)

Anyway, this patch is a mostly orthogonal solution compared to mine. We should probably just make one that inits everything to zeros and does that check in update_from_paxos().

#7 Updated by Sage Weil almost 12 years ago

yeah. actually, i think the check should go in tick() inside the is_leader() block, and not update_from_paxos().

#8 Updated by Greg Farnum almost 12 years ago

  • Status changed from Fix Under Review to In Progress

Hmm. I looked at redoing this and got stuck on the semantics we want. If we're interested in full_ratio == 0 being an off switch, we need to be able to propagate that — which means that whenever the incremental's ratios don't match the current map's ratios, we need to take the Incremental's (including for 0, implying we fill it in all the time). I think this is fine (see my first patch) but you didn't seem to like the idea.

There's another problem with our current setup, which is that right now I don't think we can avoid reverting a change somebody's made online if they don't put it into their config. So I think we should divorce the post-creation changes from g_conf's settings. But I think we can accomplish the first version with just 3ec6425f412a802ccd3ab3fd2fb4d5f3d35d6c94 and 983ab06776168031dd75800e4549f5fae9b2b149. (I just don't see that your middle commit gets us anything, and in fact I think the current use of need_full_ratio_update could be greatly simplified.)

#9 Updated by Sage Weil almost 12 years ago

Greg Farnum wrote:

Hmm. I looked at redoing this and got stuck on the semantics we want. If we're interested in full_ratio == 0 being an off switch, we need to be able to propagate that — which means that whenever the incremental's ratios don't match the current map's ratios, we need to take the Incremental's (including for 0, implying we fill it in all the time). I think this is fine (see my first patch) but you didn't seem to like the idea.

i think it's okay, now that i see the sets are recalculated, although:

There's another problem with our current setup, which is that right now I don't think we can avoid reverting a change somebody's made online if they don't put it into their config. So I think we should divorce the post-creation changes from g_conf's settings. But I think we can accomplish the first version with just 3ec6425f412a802ccd3ab3fd2fb4d5f3d35d6c94 and 983ab06776168031dd75800e4549f5fae9b2b149. (I just don't see that your middle commit gets us anything, and in fact I think the current use of need_full_ratio_update could be greatly simplified.)

yeah, i think that's the right thing to do, and we may as well do it now. which means we could do:

- if it's 0, then set it to g_conf.
- otherwise, only change it via a mon command (not g_conf)

then we don't have to set it every time. or maybe we should anyway... but i'm still hesitant to do that if only because it diverges from the usual pattern of optional updates for everything in the incremental.

#10 Updated by Greg Farnum almost 12 years ago

  • Status changed from In Progress to Fix Under Review

Check out wip-2286-ratio-a and see what you think. It fills in the ratios from g_conf on create_initial, only changes them based on a monitor command (via the PGMonitor), and unconditionally sets and adopts (if they're different) the values in the Incremental. It's nice since this is a net code delete due to all the g_conf monitoring and hookups we had before.

If we don't want to let people set 0 to disable the ratios, we can get rid of the unconditional sets and adopts.

#11 Updated by Sage Weil almost 12 years ago

  • Status changed from Fix Under Review to Resolved

#12 Updated by Mark Nelson almost 12 years ago

  • Status changed from Resolved to 4
  • Target version deleted (v0.46)

Hi Guys,

After upgrading the patched-kernel btrfs test cluster from 0.45-1 to 0.45-281-g0777613, the full_ratio and nearfull_ratio both got set to zero:

ubuntu@plana77:/data/archive/test-0.45-281-g0777613$ ceph pg dump | grep full
full_ratio 0
nearfull_ratio 0

On the other clusters this was fine after the upgrade:

ubuntu@plana10:~$ ceph pg dump | grep full
full_ratio 0.95
nearfull_ratio 0.85

It looks like this might be related to the above bug so I thought I would reopen and comment. Please close if this is invalid or should be put in another bug report.

#13 Updated by Greg Farnum almost 12 years ago

  • Status changed from 4 to In Progress

#14 Updated by Greg Farnum almost 12 years ago

Okay, pretty sure this is a result of a pre-upgrade PGMap::Incremental being committed by post-upgrade code. Sage pointed out we can fix this by changing the encoding version and doing a selective interpretation on it, so I'll do that and push a fix.

#15 Updated by Greg Farnum almost 12 years ago

  • Status changed from In Progress to Fix Under Review

wip-2286-map-fix has the commit to fix this.
I couldn't come up with a great test for it but I did create a cluster with old code, shut it down, upgrade to new code, and run. Nothing broke and the [near]full_ratio values were preserved properly.

#16 Updated by Greg Farnum almost 12 years ago

  • Status changed from Fix Under Review to In Progress

Sage points out we need to convert the PGMap (and Incremental) encode/decode to use the auto-versioning stuff. I'll push a new version of the branch to go into master (but the current ones should go into next).

#17 Updated by Greg Farnum almost 12 years ago

  • Status changed from In Progress to Fix Under Review

Actually I broke out the ENCODE_START bit into #2344.

So Sage, you want to merge the new wip-2286-map-fix into next? (I tested it the same as before.)

#18 Updated by Sage Weil almost 12 years ago

  • Status changed from Fix Under Review to Resolved

Also available in: Atom PDF