Bug #2307: OSD & Monitor disagree on the contents of pg_temp - Ceph - Ceph

Added by Greg Farnum about 12 years ago. Updated almost 12 years ago.

Status:

Resolved

Priority:

High

Assignee:

Greg Farnum

Category:

OSD

Target version:

v0.47

Source:

Community (user)

Description

See: http://marc.info/?t=133352732900001&r=1&w=2

It seems that (for example) pg 0.138 is in pg_temp, but the OSD can't find it when it goes looking. I obtained the maps from both the OSD and the monitor, and their contents agree when you print them out, but mapping the PG via --test-map-pg does not pick up the pg_temp mapping. After a lot of looking, it turns out that the map has a pg_num of 8 for pool 0, so the placement seed is getting inappropriately truncated (at least in the osdmaptool, and presumably on the OSD).
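For reference, the folding that turns seed 0x138 into a different PG can be reproduced in a few lines. The sketch below mirrors the logic of Ceph's ceph_stable_mod() helper and the power-of-two mask derived from pg_num. The Python names are made up for illustration, and it assumes the pg_temp lookup happens after the seed has been folded.

```python
def stable_mod(x: int, b: int, bmask: int) -> int:
    """Fold x into the range [0, b), following the logic of ceph_stable_mod()."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_mask(pg_num: int) -> int:
    """Smallest mask of the form 2^n - 1 that covers pg_num - 1."""
    return (1 << (pg_num - 1).bit_length()) - 1


seed = 0x138  # the "138" in pg 0.138 is hexadecimal

for pg_num in (320, 8):
    folded = stable_mod(seed, pg_num, pg_mask(pg_num))
    print(f"pg_num={pg_num}: seed {seed:#x} -> {folded:#x}")
```

With pg_num 320 the seed survives unchanged, while with pg_num 8 it collapses to 0x0, so a pg_temp entry keyed on 0.138 would no longer be found under the folded id.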

I suspect this is an encode/decode issue, but don't know for sure.

Updated by Sage Weil about 12 years ago

Priority changed from Normal to High

Updated by Sage Weil about 12 years ago

It looks to me like the 'data' pool (0) was deleted and then a new one (vmimages) was created, but somehow that was assigned an old pool id (0) instead of a new one. Or a bug made us replace data with vmimages; that's probably more likely!

Updated by Sage Weil about 12 years ago

nine:2307 03:56 PM $ osdmaptool osdmap_full/5754 -p | grep ^pool
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 60
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0
nine:2307 03:56 PM $ osdmaptool osdmap_full/5755 -p | grep ^pool
pool 0 'vmimages' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 lpg_num 0 lpgp_num 0 last_change 5755 owner 18446744073709551615
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0
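The comparison between the two epochs can be automated. The sketch below parses `osdmaptool -p` pool lines in the format shown here and reports pool ids whose name changed from one epoch to the next. The regular expression and function names are illustrative assumptions, not part of any Ceph tool.

```python
import re

# Matches the id, name, pg_num and last_change fields of a "pool ..." line.
POOL_RE = re.compile(
    r"^pool (\d+) '([^']*)' .*?\bpg_num (\d+)\b.*?\blast_change (\d+)\b"
)


def parse_pools(dump):
    """Map pool id -> {name, pg_num, last_change} for each pool line in dump."""
    pools = {}
    for line in dump.splitlines():
        m = POOL_RE.match(line)
        if m:
            pools[int(m.group(1))] = {
                "name": m.group(2),
                "pg_num": int(m.group(3)),
                "last_change": int(m.group(4)),
            }
    return pools


def reused_ids(old, new):
    """Return (id, old_name, new_name) for ids present in both maps under different names."""
    return [
        (pid, old[pid]["name"], new[pid]["name"])
        for pid in sorted(old.keys() & new.keys())
        if old[pid]["name"] != new[pid]["name"]
    ]


EPOCH_5754 = """\
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 60
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0
"""
EPOCH_5755 = """\
pool 0 'vmimages' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 lpg_num 0 lpgp_num 0 last_change 5755 owner 18446744073709551615
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 320 pgp_num 320 lpg_num 2 lpgp_num 2 last_change 1 owner 0
"""

print(reused_ids(parse_pools(EPOCH_5754), parse_pools(EPOCH_5755)))
```

On the two dumps above this flags pool 0, which went from 'data' to 'vmimages' between epochs 5754 and 5755.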

Updated by Sage Weil about 12 years ago

at some point the osdmap pool_max got set to -1.

nine:2307 04:15 PM $ ~/src/ceph/src/ceph-dencoder type OSDMap import osdmap_full/50 decode dump_json | grep max
"pool_max": 2,
"max_osd": 5,
nine:2307 04:15 PM $ ~/src/ceph/src/ceph-dencoder type OSDMap import osdmap_full/5528 decode dump_json | grep max
"pool_max": -1,
"max_osd": 5,
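A pool_max of -1 would also account for 'vmimages' landing on id 0. The toy model below assumes that a new pool id is handed out as pool_max + 1; that is an assumption about the monitor's allocation logic, not something shown in this ticket, and create_pool is a hypothetical helper.

```python
def create_pool(pools, pool_max, name):
    """Toy model: allocate the next pool id as pool_max + 1 and store the pool there.

    Returns the updated pool table and the new pool_max.
    """
    new_id = pool_max + 1
    updated = dict(pools)
    updated[new_id] = name  # silently overwrites any pool already using new_id
    return updated, new_id


pools = {0: "data", 1: "metadata", 2: "rbd"}

# Healthy map: pool_max matches the highest pool id, so the new pool gets id 3.
print(create_pool(pools, 2, "vmimages"))

# Corrupted map: pool_max is -1, so the new pool gets id 0 and replaces 'data'.
print(create_pool(pools, -1, "vmimages"))
```

Under that assumption, the corrupted case reproduces the epoch 5755 listing: pool 0 becomes 'vmimages' while pools 1 and 2 are untouched.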

Updated by Greg Farnum about 12 years ago

I'm confused how you're getting that pool_max printout — I don't see it at all when I run that command with a ceph-dencoder from latest master?
But given what I see from how the Incrementals change, that is indeed the problem...if only we can track down how it happened.
(I'm currently circling around the decoders and the fact that Incremental::new_pool_max is an int64_t, whereas pool_max is an int32_t, and the decoders are treating old versions of those as __u32...but I can't actually find a way in which it's broken, even if it is horrible.)
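To make the width and signedness concern concrete, here is a small sketch of what happens when the same bits are read as int32_t, __u32 or a 64-bit unsigned value. It does not identify the actual bug (which this comment says has not been found); it only illustrates the class of mismatch. The helper names are made up.

```python
import struct


def as_int32(value: int) -> int:
    """Keep the low 32 bits of value and reinterpret them as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def as_uint32(value: int) -> int:
    """Keep the low 32 bits of value and read them as unsigned."""
    return value & 0xFFFFFFFF


def as_uint64(value: int) -> int:
    """Keep the low 64 bits of value and read them as unsigned."""
    return value & 0xFFFFFFFFFFFFFFFF


print(as_int32(2))           # small values pass through unchanged
print(as_uint32(-1))         # -1 stored in an unsigned 32-bit field
print(as_int32(0xFFFFFFFF))  # that field read back as a signed 32-bit value
print(as_uint64(-1))         # -1 stored in an unsigned 64-bit field
```

Small positive ids round-trip cleanly through every width, which may be why such a mix can go unnoticed; only -1 and very large values change their meaning depending on how the field is declared.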

Updated by Sage Weil about 12 years ago

Greg Farnum wrote:

I'm confused how you're getting that pool_max printout — I don't see it at all when I run that command with a ceph-dencoder from latest master?
But given what I see from how the Incrementals change, that is indeed the problem...if only we can track down how it happened.
(I'm currently circling around the decoders and the fact that Incremental::new_pool_max is an int64_t, whereas pool_max is an int32_t, and the decoders are treating old versions of those as __u32...but I can't actually find a way in which it's broken, even if it is horrible.)

i had to fix the OSDMap::dump() method; i'll push that shortly.

and yeah, it looks like i goofed the 32->64 bit pool conversion and didn't actually change pool_max to an int64_t. i want to go back and check the original commits to make sure that's the case before fixing it. (not that 64-bit pool ids are all that useful!)

i'm hoping it won't be hard to scour OSDMap.cc for places where pool_max is assigned a new value and infer what went wrong... there can't be too many places.
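Alongside reading the code, the saved full maps can narrow down when the value went bad, since epoch 50 is known good and epoch 5528 is known bad. The sketch below bisects that range. pool_max_at is a hypothetical callback (for example a wrapper around the ceph-dencoder command used earlier in this thread), and the search assumes that once pool_max turns negative it stays negative.

```python
from typing import Callable


def first_bad_epoch(good: int, bad: int, pool_max_at: Callable[[int], int]) -> int:
    """Return the first epoch in (good, bad] whose pool_max is negative.

    Assumes pool_max_at(good) >= 0, pool_max_at(bad) < 0, and that the
    corruption persists in every epoch after it first appears.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if pool_max_at(mid) < 0:
            bad = mid
        else:
            good = mid
    return bad


# Stand-in history for demonstration only: the epoch 1234 is invented.
def fake_history(epoch: int) -> int:
    return -1 if epoch >= 1234 else 2


print(first_bad_epoch(50, 5528, fake_history))
```

With roughly 5500 epochs this needs about 13 map decodes, and the incremental for the reported epoch is then the one to inspect.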

Updated by Sage Weil almost 12 years ago

pushed workaround that will repair osdmaps that saw your corruption, eea982e56739a7a91ca907ccc5c5ec1f78d9460d.
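For readers without the tree at hand, one plausible shape for such a repair is to bump pool_max up to the highest pool id present whenever it is found to be smaller. This is a sketch of the idea only, not the contents of the commit; repair_pool_max is a hypothetical name.

```python
def repair_pool_max(pools: dict, pool_max: int) -> int:
    """Return a pool_max that is at least as large as the highest existing pool id."""
    if pools and pool_max < max(pools):
        return max(pools)
    return pool_max


# The corrupted state from this ticket: three pools, pool_max of -1.
print(repair_pool_max({0: "vmimages", 1: "metadata", 2: "rbd"}, -1))

# A consistent map is left alone.
print(repair_pool_max({0: "data", 1: "metadata", 2: "rbd"}, 2))
```

A repair of this kind stops further id reuse, but it cannot bring back a pool that was already overwritten.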

Updated by Greg Farnum almost 12 years ago

Status changed from In Progress to 7

And I gave him a patched monitor so he could set pg_num, which should fix it. Waiting to hear back, and will apply that patch as well assuming it works.

Updated by Greg Farnum almost 12 years ago

Status changed from 7 to Resolved

Just changing the pg_num and pgp_num did fix it up, so with the osdmap workaround we should be all good now.
