Bug #12687

open

osd thrashing + pg import/export can cause maybe_went_rw intervals to be missed

Added by Sage Weil over 8 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Tests
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The symptom is:

2015-08-12T16:29:07.660 INFO:tasks.rados.rados.0.burnupi49.stderr:1396: oid 174 version is 266 and expected 1350
2015-08-12T16:29:07.660 INFO:tasks.rados.rados.0.burnupi49.stderr:./test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f09737fe700 time 2015-08-12 16:29:07.681113
2015-08-12T16:29:07.661 INFO:tasks.rados.rados.0.burnupi49.stderr:./test/osd/RadosModel.h: 1146: FAILED assert(version == old_value.version)

pg 1.1 maps to [0,5]
updates are applied up to version 100
pg maps to [0]
updates are applied up to version 150
osd.0 is stopped
pg maps to [4,5] (or whatever)
1.1 is exported from osd.0
1.1 is imported to osd.1
the pg primary queries osd.0, gets an empty info, and assumes osd.5's version 100 is the latest
pg goes active
osd.1 comes up with version 150, which is treated as divergent and ignored
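
The sequence above can be replayed with a toy model. This is only a sketch and not Ceph's peering code: `PGInfo`, `pick_authoritative`, and `replay_scenario` are hypothetical names, and the selection rule (trust the newest non-empty info among the OSDs that answered) is a deliberate simplification. It shows how an empty reply from the exporting OSD hides the interval in which versions 101 to 150 were written.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PGInfo:
    last_update: int  # newest version recorded in this copy of the pg


def pick_authoritative(responses: Dict[int, Optional[PGInfo]]) -> int:
    """Toy rule: trust the newest non-empty info among the OSDs that answered."""
    known = {osd: info for osd, info in responses.items() if info is not None}
    return max(known, key=lambda osd: known[osd].last_update)


def replay_scenario():
    copies = {0: PGInfo(100), 5: PGInfo(100)}  # pg maps to [0,5], updates to 100
    copies[0] = PGInfo(150)                    # pg maps to [0], updates to 150
    exported = copies.pop(0)                   # osd.0 stopped, pg exported from it
    imported_on_1 = exported                   # pg imported to osd.1 (not up yet)

    # The primary probes osd.0, which now has no copy (empty info), and osd.5.
    responses = {0: copies.get(0), 5: copies.get(5)}
    winner = pick_authoritative(responses)
    active_version = responses[winner].last_update

    # osd.1 comes up later holding a newer version than the active pg.
    divergent = imported_on_1.last_update > active_version
    return winner, active_version, imported_on_1.last_update, divergent


if __name__ == "__main__":
    print(replay_scenario())
```

In this model the pg goes active at version 100 from osd.5, and the copy on osd.1 at version 150 only appears afterwards, which matches the "version is 266 and expected 1350" style of failure where a read returns an older version than the test expects.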

see pg 1.1 in this run: /a/sage-2015-08-12_12:21:53-rados:thrash-wip-newstore-sort-distro-basic-multi/1012192


Related issues 2 (0 open, 2 closed)

Has duplicate Ceph - Bug #12179: osd: PG.cc: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started) (Duplicate, Samuel Just, 06/26/2015)

Has duplicate Ceph - Bug #13216: import/export bug: ceph_test_rados read incorrect buffer (Duplicate, 09/23/2015)

Actions #1

Updated by Samuel Just over 8 years ago

ubuntu@teuthology:/a/samuelj-2015-08-19_00:45:42-rados-wip-sam-working-distro-basic-multi/1022696/remote

Similar pattern, results in osd/PG.cc: 290: FAILED assert(info.last_epoch_started >= info.history.last_epoch_started)

pg 0.14 is moved from osd.0 to osd.3. osd.0 is queried and has a useless history.les (last_epoch_started), so we go with a master log from a prior interval.

Actions #2

Updated by Sage Weil over 8 years ago

  • Status changed from New to 12
Actions #3

Updated by Sage Weil over 8 years ago

Maybe/probably also this run, on 1/d6c871c1/plana798404-412/head in 1.1: /a/sage-2015-08-26_09:07:57-rados-wip-sage-testing---basic-multi/1033682. It reverted to a prior version.

Actions #4

Updated by Sage Weil over 8 years ago

Here's one possible workaround: leave the exporting osd down until the importing osd is up and the pg has completely peered. This ensures that we never draw the wrong conclusion by probing the exporting osd and getting a 'dne' response.
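
The ordering this workaround asks for could look roughly like the sketch below. Everything here is hypothetical: `FakeCluster` and its methods (`stop_osd`, `export_pg`, `import_pg`, `start_osd`, `wait_until_peered`) are stand-ins that only record events and are not the teuthology thrasher API. The point is only the sequence: the exporting osd is restarted last, after the pg has peered with the imported copy available.

```python
class FakeCluster:
    """Hypothetical stand-in that records the order of operations."""

    def __init__(self):
        self.events = []

    def stop_osd(self, osd):
        self.events.append(("stop", osd))

    def export_pg(self, pgid, osd):
        self.events.append(("export", pgid, osd))
        return f"export-of-{pgid}"  # placeholder for the exported data

    def import_pg(self, pgid, osd, blob):
        self.events.append(("import", pgid, osd))

    def start_osd(self, osd):
        self.events.append(("start", osd))

    def wait_until_peered(self, pgid):
        self.events.append(("peered", pgid))


def export_import_with_workaround(cluster, pgid, src, dst):
    """Move a pg from src to dst while keeping src down until peering is done."""
    cluster.stop_osd(src)
    cluster.stop_osd(dst)
    blob = cluster.export_pg(pgid, src)
    cluster.import_pg(pgid, dst, blob)
    cluster.start_osd(dst)           # importing osd comes up first
    cluster.wait_until_peered(pgid)  # pg peers with the imported copy present
    cluster.start_osd(src)           # only now can src answer probes with 'dne'
    return cluster.events


if __name__ == "__main__":
    for event in export_import_with_workaround(FakeCluster(), "1.1", 0, 1):
        print(event)
```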

Actions #5

Updated by Samuel Just over 7 years ago

  • Priority changed from Urgent to Normal
Actions #6

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category set to Tests
Actions #7

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New