Bug #7592
hit_set_trim() removal races with backfill
Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor
Description
/a/dzafman-2014-02-28_12:09:58-rados:thrash-wip-7458-testing-basic-plana/111513
osd.4, the primary, was backfilling osd.1. On osd.4, hit_set_trim() appears to simply add a log entry for the removal and submit it as a replicated op (simple_repop_submit).
2014-02-28 22:15:30.621000 7fa756a48700 20 osd.4 pg_epoch: 62 pg[4.2( v 62'5997 (37'2845,62'5997] local-les=62 n=1251 ec=8 les/c 62/55 60/61/61) [1,5]/[4,5] r=-1 lpr=61 pi=54-60/2 rops=1 bft=(1,255) crt=23'1466 lcod 62'5996 mlcod 62'5996 active+remapped+backfilling] hit_set_trim removing 2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4
2014-02-28 22:15:30.621058 7fa756a48700 10 filestore(/var/lib/ceph/osd/ceph-4) stat 4.2_head/2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4 = 0 (size 857)
2014-02-28 22:15:30.621064 7fa756a48700 20 osd.4 pg_epoch: 62 pg[4.2( v 62'5997 (37'2845,62'5997] local-les=62 n=1251 ec=8 les/c 62/55 60/61/61) [1,5]/[4,5] r=-1 lpr=61 pi=54-60/2 rops=1 bft=(1,255) crt=23'1466 lcod 62'5996 mlcod 62'5996 active+remapped+backfilling] simple_repop_submit 0x3a9a160
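The report does not spell out the rule the primary uses when it replicates a delete to a backfill target, so the following is only a minimal Python sketch under stated assumptions: the primary sends the real delete when the object is at or below the target's last_backfill (as the primary knows it) and a log-only entry otherwise. `ObjId`, `ops_for_backfill_target()` and `apply_on_target()` are hypothetical names, and ordering ids by (hash, name) is a simplification of real Ceph object ordering. The sketch shows how a stale view of last_backfill would produce the symptom reported for osd.1: a delete entry in the target's log while the object stays on disk.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ObjId:
    """Simplified stand-in for an object id.

    ASSUMPTION: ids are ordered by hash first, then by name. Real Ceph
    object ordering compares more fields; this is only a model.
    """
    hash: int
    name: str


def ops_for_backfill_target(oid: ObjId, known_last_backfill: ObjId) -> list[str]:
    """Hypothetical primary-side choice of ops for a delete on a backfill target.

    Objects at or below the target's last_backfill (as the primary knows it)
    are assumed to exist on the target, so it gets the real delete plus the
    log entry. Objects beyond that point get only the log entry, on the
    theory that backfill has not copied them yet.
    """
    if oid <= known_last_backfill:
        return ["append_log", "delete_object"]
    return ["append_log"]


def apply_on_target(ops: list[str], oid: ObjId, store: set[ObjId], log: list[ObjId]) -> None:
    """Apply ops on the target: the log grows on 'append_log', but only an
    explicit 'delete_object' op removes the object from the local store."""
    if "append_log" in ops:
        log.append(oid)
    if "delete_object" in ops:
        store.discard(oid)


if __name__ == "__main__":
    old = ObjId(2, "hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983")
    lb = ObjId(2, "hit_set_4.2_archive_2014-02-28 22:08:52.898029_2014-02-28 22:09:49.099130")

    # With an up-to-date boundary the target is told to delete the object.
    print(ops_for_backfill_target(old, lb))

    # With a stale boundary (hypothetical: the primary still believes nothing
    # has been backfilled), the target only logs the delete and keeps the object.
    store, log = {old}, []
    apply_on_target(ops_for_backfill_target(old, ObjId(2, "")), old, store, log)
    print(old in store, old in log)
```

In this model a log-only entry is safe only if backfill will later copy the object's final state; if the object has in fact already been copied to the target, the delete is lost.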
osd.1, the backfill target, also appends the removal to its log, but WITHOUT deleting the archive object. Later, when osd.1 becomes primary, check_local() finds the object still present and asserts.
2014-02-28 22:15:30.624287 7faf67dae700 10 osd.1 pg_epoch: 62 pg[4.2( v 62'5997 (37'2846,62'5997] lb 2/hit_set_4.2_archive_2014-02-28 22:08:52.898029_2014-02-28 22:09:49.099130/head/.ceph-internal/4 local-les=0 n=6 ec=8 les/c 62/55 60/61/61) [1,5]/[4,5] r=-1 lpr=61 pi=8-60/7 luod=0'0 crt=62'5997 active+remapped] append_log log((37'2846,62'5997], crt=62'5997) [62'5998 (62'5998) modify 2/hit_set_4.2_archive_2014-02-28 22:14:20.058350_2014-02-28 22:15:30.620290/head/.ceph-internal/4 by unknown.0.0:0 2014-02-28 22:15:30.620622,62'5999 (25'1467) delete 2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4 by unknown.0.0:0 2014-02-28 22:15:30.620622]
2014-02-28 22:15:30.624354 7faf67dae700 10 osd.1 pg_epoch: 62 pg[4.2( v 62'5999 (37'2846,62'5999] lb 2/hit_set_4.2_archive_2014-02-28 22:08:52.898029_2014-02-28 22:09:49.099130/head/.ceph-internal/4 local-les=0 n=6 ec=8 les/c 62/55 60/61/61) [1,5]/[4,5] r=-1 lpr=61 pi=8-60/7 luod=0'0 crt=62'5997 lcod 62'5997 active+remapped] add_log_entry 62'5999 (25'1467) delete 2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4 by unknown.0.0:0 2014-02-28 22:15:30.620622
2014-02-28 22:22:12.231378 7faf67dae700 -1 osd.1 pg_epoch: 64 pg[4.2( v 62'9435 (37'2846,62'9435] local-les=64 n=1253 ec=8 les/c 62/62 60/63/63) [1,5] r=-1 lpr=63 pi=8-62/8 crt=62'9349 mlcod 0'0 active] check_local 2/hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983/head/.ceph-internal/4 exists, but should have been deleted
2014-02-28 22:22:12.259685 7faf67dae700 -1 osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::check_local()' thread 7faf67dae700 time 2014-02-28 22:22:12.231393
osd/ReplicatedPG.cc: 10120: FAILED assert(0 == "erroneously present object")
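The assert fires because osd.1's log records a delete for an object that is still on disk. A minimal Python sketch of that invariant, assuming a simplified log of `(version, op, name)` tuples: walk the log newest-first, consider only each object's most recent entry, and flag any object whose latest entry is a delete but which still exists locally. `find_erroneously_present()` is a hypothetical name; this illustrates the check implied by the error message, not the code in `osd/ReplicatedPG.cc`.

```python
def find_erroneously_present(log: list[tuple[str, str, str]], store: set[str]) -> list[str]:
    """Return a message for each object whose newest log entry is a delete
    but which is still present in the local store.

    log   -- (version, op, name) tuples, oldest first; op is "modify" or "delete"
    store -- names of objects currently present locally
    """
    seen: set[str] = set()
    problems: list[str] = []
    for _version, op, name in reversed(log):  # newest entry first
        if name in seen:
            continue  # an older entry for this object is superseded
        seen.add(name)
        if op == "delete" and name in store:
            problems.append(f"{name} exists, but should have been deleted")
    return problems


if __name__ == "__main__":
    old = "hit_set_4.2_archive_2014-02-28 22:06:57.571864_2014-02-28 22:06:57.661983"
    new = "hit_set_4.2_archive_2014-02-28 22:14:20.058350_2014-02-28 22:15:30.620290"
    # The two entries osd.1 appended at 62'5998 and 62'5999; the delete was
    # logged but never applied, so the old archive is still in the store.
    log = [("62'5998", "modify", new), ("62'5999", "delete", old)]
    store = {old, new}
    for msg in find_erroneously_present(log, store):
        print(msg)
```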
Updated by David Zafman about 10 years ago
- Assignee changed from David Zafman to Sage Weil
Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-03-09_03:00:01-rados-firefly-testing-basic-plana/123754
Updated by David Zafman about 10 years ago
- Status changed from New to In Progress
- Assignee set to David Zafman
Updated by David Zafman about 10 years ago
- Status changed from In Progress to Fix Under Review
Updated by Sage Weil about 10 years ago
- Status changed from Fix Under Review to Resolved