Bug #14073

closed

osd: hammer: fail to start due to stray pglog object after firefly->hammer upgrade

Added by Haomai Wang over 8 years ago. Updated about 8 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We upgraded a large object storage cluster from firefly (0.80.10) to hammer (0.94.5). In some cases an OSD appears to fail to upgrade info_struct_v (https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L3041). The attached log shows one such case:
2015-12-14 10:08:37.867333 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) stat 15.ce1_head/ce1//head//15 = 0 (size 0)
2015-12-14 10:08:37.915473 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) stat 15.ce5_head/ce5//head//15 = 0 (size 0)
2015-12-14 10:08:37.961296 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) stat 15.d22_head/d22//head//15 = 0 (size 0)
2015-12-14 10:08:38.005547 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) stat 15.df3_head/df3//head//15 = 0 (size 0)
2015-12-14 10:08:38.049795 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) stat 15.df7_head/df7//head//15 = 0 (size 0)
2015-12-14 10:08:38.093949 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) stat 15.e6f_head/e6f//head//15 = 0 (size 0)
2015-12-14 10:08:38.139775 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) stat 15.ed8_head/ed8//head//15 = 0 (size 0)
2015-12-14 10:08:38.186538 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) stat meta/bd6a2c2d/pglog_15.ee3/0//-1 = -2

The last line shows that the OSD failed to get the v8 structure, so it fell back to reading the legacy meta oid, which does not exist (the stat returned -2, i.e. ENOENT). The log filtered for omap_get_values shows:

2015-12-14 10:08:38.005426 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values 15.df3_head/df3//head//15
2015-12-14 10:08:38.049560 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values 15.df7_head/df7//head//15
2015-12-14 10:08:38.049682 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values 15.df7_head/df7//head//15
2015-12-14 10:08:38.093687 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values 15.e6f_head/e6f//head//15
2015-12-14 10:08:38.093813 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values 15.e6f_head/e6f//head//15
2015-12-14 10:08:38.139493 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values 15.ed8_head/ed8//head//15
2015-12-14 10:08:38.139641 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values 15.ed8_head/ed8//head//15
2015-12-14 10:08:38.186081 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values 15.ee3_head/ee3//head//15
2015-12-14 10:08:38.186159 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values meta/16ef7597/infos/head//-1
2015-12-14 10:08:38.186384 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values 15.ee3_head/ee3//head//15
2015-12-14 10:08:38.186422 7f4381ddb800 15 filestore(/var/lib/ceph/osd/ceph-45) omap_get_values meta/16ef7597/infos/head//-1

The last line supports this guess.
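
For anyone checking their own OSD logs for the same symptom, here is a minimal sketch of a filter that picks out FileStore stat calls that returned an error. It assumes the "debug filestore = 10" line format shown in the excerpts above; the regular expression and the helper name failed_stats are illustrative only and not part of Ceph.

```python
import re

# Matches FileStore "stat" result lines in the format seen in this report, e.g.
#   ... filestore(/var/lib/ceph/osd/ceph-45) stat meta/bd6a2c2d/pglog_15.ee3/0//-1 = -2
# The pattern is an assumption based on those excerpts, not a Ceph-defined format.
STAT_RE = re.compile(
    r"filestore\((?P<path>[^)]+)\) stat (?P<oid>\S+) = (?P<rc>-?\d+)"
)


def failed_stats(lines):
    """Return (osd_path, object, return_code) for each stat with a negative return code."""
    failures = []
    for line in lines:
        match = STAT_RE.search(line)
        if match and int(match.group("rc")) < 0:
            failures.append(
                (match.group("path"), match.group("oid"), int(match.group("rc")))
            )
    return failures


# Two lines copied from the log excerpt in this report.
SAMPLE = [
    "2015-12-14 10:08:38.139775 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) "
    "stat 15.ed8_head/ed8//head//15 = 0 (size 0)",
    "2015-12-14 10:08:38.186538 7f4381ddb800 10 filestore(/var/lib/ceph/osd/ceph-45) "
    "stat meta/bd6a2c2d/pglog_15.ee3/0//-1 = -2",
]

if __name__ == "__main__":
    for path, oid, rc in failed_stats(SAMPLE):
        print(path, oid, rc)
```

To scan a real log, pass an open file object instead of SAMPLE; only the stat on the legacy pglog object should be reported for the excerpt above.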

So is it possible that, when an OSD is upgraded from firefly to hammer, the legacy oid is deleted but 'info_struct_v' is not successfully updated? Or is something else going on? I haven't dug into this yet; I may try later.

Actions #1

Updated by Samuel Just about 8 years ago

  • Priority changed from Normal to Urgent

Those should be updated atomically. Please reproduce with filestore, osd, and ms debugging enabled:

debug filestore = 20
debug ms = 1
debug osd = 20

Actions #2

Updated by Sage Weil about 8 years ago

  • Status changed from New to Can't reproduce