Backport #13060
Updated by Loïc Dachary over 8 years ago
https://github.com/ceph/ceph/pull/5892
On Fri, 11 Sep 2015, Haomai Wang wrote:
> On Fri, Sep 11, 2015 at 8:56 PM, Sage Weil <sage@newdream.net> wrote:
> On Fri, 11 Sep 2015, ?? wrote:
> > Thanks, Sage Weil:
> >
> > 1. I deleted some testing pools in the past, but it was a long
> time ago (maybe 2 months ago); during the recent upgrade I did not
> delete any pools.
> > 2. ceph osd dump: please see the attachment file
> (ceph.osd.dump.log)
> > 3. 'debug osd = 20' and 'debug filestore = 20' (attachment file
> ceph.osd.5.log.tar.gz)
>
> This one is failing on pool 54, which has been deleted. In this case you
> can work around it by renaming current/54.* out of the way.
>
> > 4. I installed ceph-test, but the command outputs an error:
> > ceph-kvstore-tool /ceph/data5/current/db list
> > Invalid argument: /ceph/data5/current/db: does not exist
> (create_if_missing is false)
>
> Sorry, I should have said current/omap, not current/db. I'm still curious
> to see the key dump. I'm not sure why the leveldb key for these pgs is
> missing...
>
>
> Yesterday I had a chat with wangrui, and the reason is that the "infos"
> object (legacy oid) is missing. I'm not sure why it's missing.
Probably
https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908
Oh, I think I see what happened:
- the pg removal was aborted pre-hammer. On pre-hammer, this means that
load_pgs skips it here:
https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L2121
- we upgrade to hammer. We skip this pg (same reason), don't upgrade it,
but delete the legacy infos object
https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908
- now we see this crash...
I think the fix is, in hammer, to bail out of peek_map_epoch if the infos
object isn't present, here
https://github.com/ceph/ceph/blob/hammer/src/osd/PG.cc#L2867
Probably we should restructure so we can return a 'fail' value
instead of a magic epoch_t meaning the same...
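
A rough sketch of that restructuring, for illustration only. This is not the
actual Ceph code: MockStore, the "infos" lookup by name, and this
peek_map_epoch signature are all simplified assumptions. The idea is just that
the function returns an error code and hands the epoch back through an out
parameter, so a missing infos object is reported as a failure rather than
encoded as a special epoch_t value.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <set>
#include <string>

using epoch_t = uint32_t;  // stand-in for Ceph's epoch type (assumption)

// Hypothetical, heavily simplified object store: the names of the objects
// that exist, plus one stored map epoch per PG id.
struct MockStore {
  std::set<std::string> objects;
  std::map<std::string, epoch_t> pg_epochs;
};

// Returns 0 and fills *epoch on success. Returns -ENOENT if the legacy
// "infos" object is gone or the PG has no stored epoch, so the caller can
// skip the PG instead of crashing on it.
int peek_map_epoch(const MockStore& store, const std::string& pgid,
                   epoch_t* epoch) {
  assert(epoch != nullptr);
  if (store.objects.count("infos") == 0) {
    return -ENOENT;  // bail out early: nothing to read the epoch from
  }
  auto it = store.pg_epochs.find(pgid);
  if (it == store.pg_epochs.end()) {
    return -ENOENT;
  }
  *epoch = it->second;
  return 0;
}
```

A caller such as load_pgs would then check the return value and skip the pg
when it is negative, without needing to know any reserved epoch value.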