Feature #18409

scrub: tolerate and repair case where an object has an object_info with a different name

Added by Samuel Just 3 months ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 01/03/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
User Impact:
Affected Versions:
Release:
Needs Doc: No

Description

This came up on the sepia cluster, presumably due to an XFS bug in the murky past. It happened to be harmless until we changed filestore to crash on ENOENT for setattrs. I'm adding a very simple workaround for kraken, but since this is apparently possible, we should add support (and testing) for detecting and repairing this case.

The symptom on the sepia cluster was an object with a valid object_info attr for a completely different object in a completely different PG.

2016-12-25 04:18:27.740503 7f9beb44c700 20 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] deep-scrub 0:e3fd9586:::1000007a100.00000000:head 0:432c46d8:::10002161725.00000000:head(349432'435650 mds.0.32:4372567 dirty s 1822 uv 435650 alloc_hint [0 0 0])
2016-12-25 04:18:27.740519 7f9beb44c700 20 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] scrub_snapshot_metadata deep-scrub new head 0:e3fd9586:::1000007a100.00000000:head
...
2016-12-25 04:18:27.740707 7f9beb44c700 20 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] scrub_snapshot_metadata deep-scrub new head 0:e3fd9489:::1001f2cbe83.00000000:head
2016-12-25 04:18:27.740724 7f9beb44c700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] scrub_snapshot_metadata recording digests for 0:e3fd9586:::1000007a100.00000000:head
2016-12-25 04:18:27.740740 7f9beb44c700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] get_object_context: obc NOT found in cache: 0:e3fd9586:::1000007a100.00000000:head
2016-12-25 04:18:27.740756 7f9beb44c700 15 filestore(/var/lib/ceph/osd/ceph-17) getattr 0.fc7_head/#0:e3fd9586:::1000007a100.00000000:head# '_'
2016-12-25 04:18:27.740819 7f9beb44c700 10 filestore(/var/lib/ceph/osd/ceph-17) getattr 0.fc7_head/#0:e3fd9586:::1000007a100.00000000:head# '_' = 239
2016-12-25 04:18:27.740838 7f9beb44c700 15 filestore(/var/lib/ceph/osd/ceph-17) getattr 0.fc7_head/#0:e3fd9586:::1000007a100.00000000:head# 'snapset'
2016-12-25 04:18:27.740851 7f9beb44c700 10 filestore(/var/lib/ceph/osd/ceph-17) getattr 0.fc7_head/#0:e3fd9586:::1000007a100.00000000:head# 'snapset' = 31
2016-12-25 04:18:27.740863 7f9beb44c700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] populate_obc_watchers 0:432c46d8:::10002161725.00000000:head
2016-12-25 04:18:27.740879 7f9beb44c700 20 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] PrimaryLogPG::check_blacklisted_obc_watchers for obc 0:432c46d8:::10002161725.00000000:head
2016-12-25 04:18:27.740892 7f9beb44c700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] get_object_context: creating obc from disk: 0x7f9c3decc580
2016-12-25 04:18:27.740907 7f9beb44c700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] get_object_context: 0x7f9c3decc580 0:e3fd9586:::1000007a100.00000000:head rwstate(none n=0 w=0) oi: 0:432c46d8:::10002161725.00000000:head(349432'435650 mds.0.32:4372567 dirty s 1822 uv 435650 alloc_hint [0 0 0]) ssc: 0x7f9c3d9c4bc0 snapset: 1=[]:[]+head
2016-12-25 04:18:27.740926 7f9beb44c700 20 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] simple_opc_create 0:432c46d8:::10002161725.00000000:head
2016-12-25 04:18:27.740945 7f9beb44c700 20 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] finish_ctx 0:432c46d8:::10002161725.00000000:head 0x7f9c3ea4c400 op modify
2016-12-25 04:18:27.740963 7f9beb44c700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] mtime unchanged at 2014-08-14 02:59:04.237496
2016-12-25 04:18:27.740984 7f9beb44c700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] final snapset 1=[]:[]+head in 0:432c46d8:::10002161725.00000000:head
2016-12-25 04:18:27.741013 7f9beb44c700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] new_repop rep_tid 238 (no op)
2016-12-25 04:18:27.741026 7f9beb44c700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] new_repop: repgather(0x7f9c3cd7b9e0 0'0 rep_tid=238 committed?=0 applied?=0 r=0)
2016-12-25 04:18:27.741040 7f9beb44c700 20 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] simple_opc_submit 0x7f9c3cd7b9e0
2016-12-25 04:18:27.741054 7f9beb44c700 7 osd.17 pg_epoch: 762465 pg[0.fc7( v 762449'1053074 (760775'1050064,762449'1053074] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 crt=762449'1053074 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] issue_repop rep_tid 238 o 0:432c46d8:::10002161725.00000000:head
...
2016-12-25 04:18:27.741860 7f9bfe396700 10 osd.17 pg_epoch: 762465 pg[0.fc7( v 762465'1053075 (760775'1050064,762465'1053075] local-les=762464 n=12172 ec=1 les/c/f 762464/762465/735559 762462/762463/762455) [17,7,67] r=0 lpr=762463 luod=762449'1053074 lua=762449'1053074 crt=762465'1053075 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep] op_commit: 238
2016-12-25 04:18:27.741965 7f9bff398700 10 filestore(/var/lib/ceph/osd/ceph-17) error opening file /var/lib/ceph/osd/ceph-17/current/0.fc7_head/10002161725.00000000__head_1B6234C2__0 with flags=2: (2) No such file or directory
2016-12-25 04:18:27.741985 7f9bff398700 10 filestore(/var/lib/ceph/osd/ceph-17) setattrs 0.fc7_head/#0:432c46d8:::10002161725.00000000:head# = -2
2016-12-25 04:18:27.741993 7f9bff398700 -1 filestore(/var/lib/ceph/osd/ceph-17) error (2) No such file or directory not handled on operation 0x7f9c3d658d00 (126727522.0.0, or op 0, counting from 0)
2016-12-25 04:18:27.742001 7f9bff398700 0 filestore(/var/lib/ceph/osd/ceph-17) unexpected error code

0:e3fd9586:::1000007a100.00000000:head was listed using collection_list*, but its object_info is 0:432c46d8:::10002161725.00000000:head(349432'435650 mds.0.32:4372567 dirty s 1822 uv 435650 alloc_hint [0 0 0]). A repop is then issued on 0:432c46d8:::10002161725.00000000:head to update the digest, but that object doesn't actually exist in that collection, so the setattrs fails with ENOENT and the OSD crashes.
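For triage on other clusters, the mismatch can be spotted directly in the debug log: the get_object_context line prints both the name the object was looked up under and the name stored in its object_info. The following is a minimal sketch of that check, not the proposed scrub change itself. The function name find_oi_mismatch and the regular expression are illustrative assumptions, written against the log format shown above.

```python
import re

# Matches lines such as:
#   get_object_context: 0x... <listed-name> rwstate(...) oi: <oi-name>(...
# The pattern is an assumption derived from the log excerpt in this ticket.
OBC_LINE = re.compile(
    r"get_object_context: 0x[0-9a-f]+ (\S+) rwstate\(.*?\) oi: (\S+?)\("
)

# Trimmed copy of the get_object_context line from the log above.
SAMPLE = (
    "get_object_context: 0x7f9c3decc580 "
    "0:e3fd9586:::1000007a100.00000000:head rwstate(none n=0 w=0) "
    "oi: 0:432c46d8:::10002161725.00000000:head(349432'435650 "
    "mds.0.32:4372567 dirty s 1822 uv 435650 alloc_hint [0 0 0]) "
    "ssc: 0x7f9c3d9c4bc0 snapset: 1=[]:[]+head"
)


def find_oi_mismatch(line):
    """Return (listed_name, oi_name) if the two differ, else None."""
    m = OBC_LINE.search(line)
    if m is None:
        return None  # not a get_object_context line with an oi field
    listed_name, oi_name = m.groups()
    if listed_name != oi_name:
        return (listed_name, oi_name)
    return None


if __name__ == "__main__":
    print(find_oi_mismatch(SAMPLE))
```

Running it over an OSD log at debug level 10 or higher would flag any object whose object_info names a different object; a real fix would make the same comparison inside scrub, before any repop is issued.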
