Bug #7592

closed

hit_set_trim() removal races with backfill

Added by David Zafman about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor

Description

/a/dzafman-2014-02-28_12:09:58-rados:thrash-wip-7458-testing-basic-plana/111513

osd.4, the primary, was backfilling osd.1. On the primary, hit_set_trim() appears to simply add a log entry for the removal.

2014-02-28 22:15:30.621000 7fa756a48700 20 osd.4 pg_epoch: 62 pg[4.2( v 62'5997 (37'2845,62'5997] local-les=62 n=1251 ec=8 les/c 62/55 60/61/61) [1,5]/[4,5] r=-1 lpr=61 pi=54-60/2 rops=1 bft=(1,255) crt=23'1466 lcod 62'5996 mlcod 62'5996 active+remapped+backfilling] hit_set_trim removing 2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4
2014-02-28 22:15:30.621058 7fa756a48700 10 filestore(/var/lib/ceph/osd/ceph-4) stat 4.2_head/2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4 = 0 (size 857)
2014-02-28 22:15:30.621064 7fa756a48700 20 osd.4 pg_epoch: 62 pg[4.2( v 62'5997 (37'2845,62'5997] local-les=62 n=1251 ec=8 les/c 62/55 60/61/61) [1,5]/[4,5] r=-1 lpr=61 pi=54-60/2 rops=1 bft=(1,255) crt=23'1466 lcod 62'5996 mlcod 62'5996 active+remapped+backfilling] simple_repop_submit 0x3a9a160

Later, when osd.1 becomes primary, it also appends the removal to its log but WITHOUT deleting the archive object. check_local() then finds the object still present and asserts.

2014-02-28 22:15:30.624287 7faf67dae700 10 osd.1 pg_epoch: 62 pg[4.2( v 62'5997 (37'2846,62'5997] lb 2/hit_set_4.2_archive_2014-02-28 22:08:52.898029_2014-02-28 22:09:49.099130/head/.ceph-internal/4 local-les=0 n=6 ec=8 les/c 62/55 60/61/61) [1,5]/[4,5] r=-1 lpr=61 pi=8-60/7 luod=0'0 crt=62'5997 active+remapped] append_log log((37'2846,62'5997], crt=62'5997) [62'5998 (62'5998) modify   2/hit_set_4.2_archive_2014-02-28 22:14:20.058350_2014-02-28 22:15:30.620290/head/.ceph-internal/4 by unknown.0.0:0 2014-02-28 22:15:30.620622,62'5999 (25'1467) delete   2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4 by unknown.0.0:0 2014-02-28 22:15:30.620622]
2014-02-28 22:15:30.624354 7faf67dae700 10 osd.1 pg_epoch: 62 pg[4.2( v 62'5999 (37'2846,62'5999] lb 2/hit_set_4.2_archive_2014-02-28 22:08:52.898029_2014-02-28 22:09:49.099130/head/.ceph-internal/4 local-les=0 n=6 ec=8 les/c 62/55 60/61/61) [1,5]/[4,5] r=-1 lpr=61 pi=8-60/7 luod=0'0 crt=62'5997 lcod 62'5997 active+remapped] add_log_entry 62'5999 (25'1467) delete   2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4 by unknown.0.0:0 2014-02-28 22:15:30.620622 
2014-02-28 22:22:12.231378 7faf67dae700 -1 osd.1 pg_epoch: 64 pg[4.2( v 62'9435 (37'2846,62'9435] local-les=64 n=1253 ec=8 les/c 62/62 60/63/63) [1,5] r=-1 lpr=63 pi=8-62/8 crt=62'9349 mlcod 0'0 active] check_local 2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4 exists, but should have been deleted
2014-02-28 22:22:12.259685 7faf67dae700 -1 osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::check_local()' thread 7faf67dae700 time 2014-02-28 22:22:12.231393
osd/ReplicatedPG.cc: 10120: FAILED assert(0 == "erroneously present object")
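
The failure mode described above can be sketched as a toy model. This is not Ceph code: ToyOSD, LogEntry, the apply_to_store flag and the shortened object name are hypothetical, and the sketch assumes osd.1 already held a copy of the archive object when the delete entry arrived, which the report does not state. It only shows why a log that records a delete, paired with a store that still holds the object, trips a check_local()-style scan. The real check_local() asserts; the toy version returns the offending objects instead.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogEntry:
    version: str  # e.g. "62'5999"
    op: str       # "modify" or "delete"
    oid: str      # object name


@dataclass
class ToyOSD:
    name: str
    store: set = field(default_factory=set)  # objects present on disk
    log: list = field(default_factory=list)  # PG log entries, oldest first

    def append_log(self, entry, apply_to_store):
        """Record the entry; touch the store only if apply_to_store is set."""
        self.log.append(entry)
        if not apply_to_store:
            return
        if entry.op == "delete":
            self.store.discard(entry.oid)
        else:
            self.store.add(entry.oid)

    def check_local(self):
        """Return objects whose latest log entry is a delete but which still exist."""
        latest = {}
        for entry in self.log:
            latest[entry.oid] = entry.op
        return sorted(
            oid for oid, op in latest.items()
            if op == "delete" and oid in self.store
        )


# Hypothetical short name standing in for the trimmed hit_set archive object.
ARCHIVE = "hit_set_4.2_archive_old"
trim = LogEntry("62'5999", "delete", ARCHIVE)

primary = ToyOSD("osd.4", store={ARCHIVE})
target = ToyOSD("osd.1", store={ARCHIVE})  # assumption: copy already present

primary.append_log(trim, apply_to_store=True)   # log entry plus removal
target.append_log(trim, apply_to_store=False)   # log entry only

print(primary.check_local())  # []
print(target.check_local())   # ['hit_set_4.2_archive_old']
```

In this model the primary's log and store agree, while the target's log says the archive is gone and its store disagrees, which matches the "exists, but should have been deleted" message in the osd.1 log.
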
Actions #1

Updated by David Zafman about 10 years ago

  • Description updated (diff)
Actions #2

Updated by Ian Colle about 10 years ago

  • Assignee set to David Zafman
Actions #3

Updated by David Zafman about 10 years ago

  • Assignee changed from David Zafman to Sage Weil
Actions #4

Updated by Sage Weil about 10 years ago

  • Assignee deleted (Sage Weil)
Actions #5

Updated by Sage Weil about 10 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-03-09_03:00:01-rados-firefly-testing-basic-plana/123754

Actions #6

Updated by David Zafman about 10 years ago

  • Status changed from New to In Progress
  • Assignee set to David Zafman
Actions #7

Updated by David Zafman about 10 years ago

  • Status changed from In Progress to Fix Under Review
Actions #8

Updated by Sage Weil about 10 years ago

  • Status changed from Fix Under Review to Resolved