Bug #2307

closed

OSD & Monitor disagree on the contents of pg_temp

Added by Greg Farnum about 12 years ago. Updated almost 12 years ago.

Status: Resolved
Priority: High
Category: OSD
% Done: 0%
Source: Community (user)

Description

See: http://marc.info/?t=133352732900001&r=1&w=2

It seems that (for example) pg 0.138 is in pg_temp, but the OSD can't find it when it goes looking. I obtained the maps from both, and their contents agree when you print them out, but when mapping the PG via --test-map-pg it doesn't contain the pg temp mapping. After a lot of looking, it turns out that the map has a pg_num of 8 and so the placement seed is getting inappropriately truncated (at least in the osdmaptool, and presumably on the OSD).

I suspect this is an encode/decode issue, but don't know for sure.
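To make the truncation concrete, here is a minimal sketch of how a placement seed is folded into the range allowed by pg_num. It assumes the folding follows the ceph_stable_mod scheme with a mask of the next power of two minus one; the function names, the dictionary standing in for pg_temp, and the OSD ids are hypothetical and only serve the illustration.

```python
def pg_num_mask(pg_num: int) -> int:
    """Smallest (2**n - 1) mask that covers the values 0 .. pg_num - 1."""
    return (1 << (pg_num - 1).bit_length()) - 1


def stable_mod(x: int, b: int, bmask: int) -> int:
    """Fold x into [0, b) so that most values keep their slot when b grows."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def fold_seed(seed: int, pg_num: int) -> int:
    """Placement seed after folding against a pool's pg_num."""
    return stable_mod(seed, pg_num, pg_num_mask(pg_num))


# pg 0.138 means pool 0, seed 0x138. The OSD ids here are made up.
pg_temp = {(0, 0x138): [1, 3]}


def lookup_pg_temp(pool: int, seed: int, pg_num: int):
    """Look up the pg_temp entry using the folded seed as the key."""
    return pg_temp.get((pool, fold_seed(seed, pg_num)))


print(hex(fold_seed(0x138, 320)))      # seed survives with pg_num 320
print(hex(fold_seed(0x138, 8)))        # seed collapses with pg_num 8
print(lookup_pg_temp(0, 0x138, 320))   # entry found
print(lookup_pg_temp(0, 0x138, 8))     # entry missed
```

With pg_num 320 the seed 0x138 is unchanged and the lookup succeeds; with pg_num 8 it folds to 0, so a pg_temp entry keyed on 0.138 is never matched.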

Actions #1

Updated by Sage Weil about 12 years ago

  • Priority changed from Normal to High
Actions #2

Updated by Sage Weil about 12 years ago

It looks to me like the 'data' pool (0) was deleted, and then a new one (vmimages) was created, but somehow that was assigned an old pool id (0) instead of a new one. Or, a bug made us replace data with vmimages... that's probably more likely!

Actions #3

Updated by Sage Weil about 12 years ago

nine:2307 03:56 PM $ osdmaptool osdmap_full/5754 -p | grep ^pool
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 60
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0
nine:2307 03:56 PM $ osdmaptool osdmap_full/5755 -p | grep ^pool
pool 0 'vmimages' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 lpg_num 0 lpgp_num 0 last_change 5755 owner 18446744073709551615
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0

Actions #4

Updated by Sage Weil about 12 years ago

at some point the osdmap pool_max got set to -1.

nine:2307 04:15 PM $ ~/src/ceph/src/ceph-dencoder type OSDMap import osdmap_full/50 decode dump_json | grep max
"pool_max": 2,
"max_osd": 5,
nine:2307 04:15 PM $ ~/src/ceph/src/ceph-dencoder type OSDMap import osdmap_full/5528 decode dump_json | grep max
"pool_max": -1,
"max_osd": 5,
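One way to read these two dumps together with the pool listings above: if new pool ids are handed out by incrementing pool_max (an assumption here, not something this thread confirms), then a pool_max of -1 makes the next id 0, which lands on top of 'data'. The sketch below only illustrates that arithmetic; create_pool is a hypothetical helper, not a Ceph function.

```python
def create_pool(pools: dict, pool_max: int, name: str):
    """Allocate the next pool id as pool_max + 1 (assumed allocation rule).

    Returns the updated pool table, the id that was assigned, and the name
    of any existing pool that the new one replaced (None if the id was free).
    """
    new_id = pool_max + 1
    replaced = pools.get(new_id)
    updated = dict(pools)
    updated[new_id] = name
    return updated, new_id, replaced


# Pool table as printed for osdmap 5754.
pools = {0: "data", 1: "metadata", 2: "rbd"}

# Healthy map: pool_max is 2, so the new pool gets a fresh id.
print(create_pool(pools, 2, "vmimages"))

# Corrupted map: pool_max is -1, so the new pool takes id 0.
print(create_pool(pools, -1, "vmimages"))
```

In the second call the resulting table matches what osdmap 5755 shows: pool 0 is 'vmimages', while 'metadata' and 'rbd' are untouched.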

Actions #5

Updated by Greg Farnum about 12 years ago

I'm confused how you're getting that pool_max printout — I don't see it at all when I run that command with a ceph-dencoder from latest master?
But given what I see from how the Incrementals change, that is indeed the problem...if only we can track down how it happened.
(I'm currently circling around the decoders and the fact that Incremental::new_pool_max is an int64_t, whereas pool_max is an int32_t, and the decoders are treating old versions of those as __u32...but I can't actually find a way in which it's broken, even if it is horrible.)

Actions #6

Updated by Sage Weil about 12 years ago

Greg Farnum wrote:

I'm confused how you're getting that pool_max printout — I don't see it at all when I run that command with a ceph-dencoder from latest master?
But given what I see from how the Incrementals change, that is indeed the problem...if only we can track down how it happened.
(I'm currently circling around the decoders and the fact that Incremental::new_pool_max is an int64_t, whereas pool_max is an int32_t, and the decoders are treating old versions of those as __u32...but I can't actually find a way in which it's broken, even if it is horrible.)

i had to fix the OSDMap::dump() method; i'll push that shortly.

and yeah, it looks like i goofed the 32->64 bit pool conversion and didn't actually change pool_max to an int64_t. i want to go back and check the original commits to make sure that's the case before fixing it. (not that 64-bit pool ids are all that useful!)
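For reference, the width mismatch discussed here can be sketched as follows. The sketch shows what the same four little-endian bytes become when read as an unsigned 32-bit value and then stored into a signed 32-bit or a signed 64-bit field. It does not identify the faulty code path, which the thread leaves open; the helper names are hypothetical.

```python
import struct


def decode_u32(buf: bytes) -> int:
    """Read the first four bytes as a little-endian unsigned 32-bit value."""
    return struct.unpack_from("<I", buf)[0]


def store_in_int32(u: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed 32-bit (same bits)."""
    return struct.unpack("<i", struct.pack("<I", u))[0]


def store_in_int64(u: int) -> int:
    """Widen an unsigned 32-bit value to signed 64-bit (zero-extended)."""
    return struct.unpack("<q", struct.pack("<Q", u))[0]


# A 32-bit field holding -1, as an old-format encoding might carry it.
wire = struct.pack("<i", -1)

raw = decode_u32(wire)
print(raw)                  # the unsigned reading of the bytes
print(store_in_int32(raw))  # what an int32_t field ends up holding
print(store_in_int64(raw))  # what an int64_t field ends up holding
```

The 32-bit field sees -1 while the 64-bit field sees 4294967295, so a "-1 means unset" check can pass on one side and fail on the other.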

i'm hoping it won't be hard to scour OSDMap.cc for places where pool_max is assigned a new value and infer what went wrong... there can't be too many places.

Actions #7

Updated by Sage Weil almost 12 years ago

pushed workaround that will repair osdmaps that saw your corruption, eea982e56739a7a91ca907ccc5c5ec1f78d9460d.

Actions #8

Updated by Greg Farnum almost 12 years ago

  • Status changed from In Progress to 7

And I gave him a patched monitor so he could set pg_num, which should fix it. Waiting to hear back, and will apply that patch as well assuming it works.

Actions #9

Updated by Greg Farnum almost 12 years ago

  • Status changed from 7 to Resolved

Just changing the pg_num and pgp_num did fix it up, so with the osdmap workaround we should be all good now.
