Bug #19058

open

osd: backfill failed to remove racing evict

Added by Sage Weil about 7 years ago. Updated almost 7 years ago.

Status: New
Priority: Normal
Assignee:
Category: Tiering
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

we are backfilling...

2017-02-22 02:37:35.460572 7f751e226700 10 osd.4 pg_epoch: 108 pg[2.0( v 108'7689 (49'4406,108'7689] local-les=97 n=1007 ec=12 les/c/f 97/93/0 96/96/92) [3,4]/[4,1] r=0 lpr=96 pi=92-95/1 luod=108'7688 bft=3 crt=108'7689 lcod 108'7687 mlcod 108'7687 active+remapped+backfilling] recover_backfill (1) bft=3 last_backfill_started 2:10e92004:::smithi16927945-536:head
2017-02-22 02:37:35.460590 7f751e226700 10 osd.4 pg_epoch: 108 pg[2.0( v 108'7689 (49'4406,108'7689] local-les=97 n=1007 ec=12 les/c/f 97/93/0 96/96/92) [3,4]/[4,1] r=0 lpr=96 pi=92-95/1 luod=108'7688 bft=3 crt=108'7689 lcod 108'7687 mlcod 108'7687 active+remapped+backfilling] peer osd.3 info 2.0( v 108'7689 (49'4479,108'7689] lb 2:10ddf223:::smithi16927945-1391 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head (bitwise) local-les=97 n=266 ec=12 les/c/f 97/93/0 96/96/92) interval 2:11364e36:::smithi16927945-846:head-2:30816d65:::smithi16927945-2311 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head 309 objects

and then we evict an object that sorts past both last_backfill and last_backfill_started

2017-02-22 02:37:35.855653 7f7515a15700 10 osd.4 pg_epoch: 108 pg[2.0( v 108'7723 (49'4406,108'7723] local-les=97 n=987 ec=12 les/c/f 97/93/0 96/96/92) [3,4]/[4,1] r=0 lpr=96 pi=92-95/1 bft=3 crt=108'7723 lcod 108'7722 mlcod 108'7722 active+remapped+backfilling] issue_repop shipping empty opt to osd.3, object 2:15a22f96:::smithi16927945-2044:head beyond MAX(last_backfill_started , pinfo.last_backfill 2:10e92004:::smithi16927945-536:head)
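
The check reported in that log line can be sketched roughly as follows. This is a toy model, not Ceph's code: `hash_key` and `ships_empty_op` are hypothetical names, and objects are compared only by the hex hash field printed in the log (the part after `2:`), ignoring pool, name, and snapshot tie-breakers.

```python
def hash_key(hobject: str) -> int:
    """Extract the hex hash field from a log-style name such as
    '2:15a22f96:::smithi16927945-2044:head' (assumed layout: pool:hash:...)."""
    return int(hobject.split(":")[1], 16)


def ships_empty_op(obj: str, last_backfill_started: str, peer_last_backfill: str) -> bool:
    """True when obj sorts beyond MAX(last_backfill_started, peer last_backfill),
    i.e. the case where the log reports an empty op being shipped."""
    boundary = max(hash_key(last_backfill_started), hash_key(peer_last_backfill))
    return hash_key(obj) > boundary


# Values copied from the log line above.
evicted = "2:15a22f96:::smithi16927945-2044:head"
boundary_obj = "2:10e92004:::smithi16927945-536:head"
print(ships_empty_op(evicted, boundary_obj, boundary_obj))
```

With the values from the log, the evicted object falls beyond the boundary, so the backfill target (osd.3) receives only an empty op for it.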

but then later, on the next recover_backfill,

2017-02-22 02:37:36.618019 7f751ba21700 10 osd.4 pg_epoch: 108 pg[2.0( v 108'7739 (49'4406,108'7739] local-les=97 n=988 ec=12 les/c/f 97/93/0 96/96/92) [3,4]/[4,1] r=0 lpr=96 pi=92-95/1 bft=3 crt=108'7739 lcod 108'7738 mlcod 108'7738 active+remapped+backfilling] recover_backfill (1) bft=3 last_backfill_started 2:10f28630:::smithi16927945-5597 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head
2017-02-22 02:37:36.618030 7f751aa1f700 10 osd.4 pg_epoch: 108 pg[1.2s0( v 108'818 (0'0,108'818] local-les=88 n=811 ec=9 les/c/f 88/25/0 87/87/9) [4,5,2] r=0 lpr=87 pi=24-86/1 luod=108'816 rops=2 crt=108'816 lcod 108'815 mlcod 42'163 active+recovering+degraded] continue_recovery_op: on_peer_recover on 5(1), obj 1:5bfb7535:::smithi16927945-2050:head
2017-02-22 02:37:36.618031 7f751ba21700 10 osd.4 pg_epoch: 108 pg[2.0( v 108'7739 (49'4406,108'7739] local-les=97 n=988 ec=12 les/c/f 97/93/0 96/96/92) [3,4]/[4,1] r=0 lpr=96 pi=92-95/1 bft=3 crt=108'7739 lcod 108'7738 mlcod 108'7738 active+remapped+backfilling] peer osd.3 info 2.0( v 108'7739 (49'4479,108'7739] lb 2:10e92004:::smithi16927945-536:head (bitwise) local-les=97 n=263 ec=12 les/c/f 97/93/0 96/96/92) interval 2:11364e36:::smithi16927945-846:head-2:30816d65:::smithi16927945-2311 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head 309 objects
2017-02-22 02:37:36.618047 7f751ba21700 10 osd.4 pg_epoch: 108 pg[2.0( v 108'7739 (49'4406,108'7739] local-les=97 n=988 ec=12 les/c/f 97/93/0 96/96/92) [3,4]/[4,1] r=0 lpr=96 pi=92-95/1 bft=3 crt=108'7739 lcod 108'7738 mlcod 108'7738 active+remapped+backfilling] update_range: bi is old, (108'7689) can be updated with log to projected_last_update 108'7739
...
2017-02-22 02:37:36.618575 7f751ba21700 10 osd.4 pg_epoch: 108 pg[2.0( v 108'7739 (49'4406,108'7739] local-les=97 n=988 ec=12 les/c/f 97/93/0 96/96/92) [3,4]/[4,1] r=0 lpr=96 pi=92-95/1 bft=3 crt=108'7739 lcod 108'7738 mlcod 108'7738 active+remapped+backfilling] operator(): 2:15a22f96:::smithi16927945-2044:head removed

...which prevents us from removing it on the target.
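
To make the last step concrete, here is a minimal sketch of the `update_range` bookkeeping, assuming the primary's backfill interval (`bi`) behaves like a map from object to version and that replaying a delete entry from the log drops the object from that map. The function name `replay_log`, the entry tuples, and the version string are hypothetical placeholders; this illustrates the sequence seen in the log, not the actual OSD implementation.

```python
def replay_log(bi: dict, entries: list) -> list:
    """Apply (op, obj, version) log entries to bi in place.

    Returns the objects that were dropped from bi by delete entries.
    """
    removed = []
    for op, obj, version in entries:
        if op == "delete":
            # A delete drops the object from the interval entirely.
            if bi.pop(obj, None) is not None:
                removed.append(obj)
        else:
            # Any other entry records the object's newest version.
            bi[obj] = version
    return removed


evicted = "2:15a22f96:::smithi16927945-2044:head"
bi = {evicted: "placeholder-version"}  # state before the interval is brought up to date
removed = replay_log(bi, [("delete", evicted, None)])
print(removed, evicted in bi)
```

After the replay the evicted object is no longer in the primary's interval, which is the state the report identifies as preventing its removal on the target.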

/a/sage-2017-02-21_20:58:58-rados-wip-sage-testing---basic-smithi/844731


Related issues 1 (1 open, 0 closed)

Related to RADOS - Bug #19092: cluster [ERR] scrub 2.1 ... is an unexpected clone" in cluster log (New, 02/27/2017)

Actions #1

Updated by Sage Weil about 7 years ago

/a/sage-2017-02-21_20:58:58-rados-wip-sage-testing---basic-smithi/844754

Actions #2

Updated by Sage Weil about 7 years ago

  • Assignee set to Sage Weil
Actions #3

Updated by Kefu Chai about 7 years ago

  • Related to Bug #19092: cluster [ERR] scrub 2.1 ... is an unexpected clone" in cluster log added
Actions #4

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category set to Tiering
  • Component(RADOS) OSD added
Actions #5

Updated by Sage Weil almost 7 years ago

  • Priority changed from Urgent to Normal