Bug #11296

"FAILED assert(!old_value.deleted())" in upgrade:giant-x-hammer-distro-basic-multi run

Added by Yuri Weinstein about 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 03/31/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/giant-x
Pull request ID:

Description

Run: http://pulpito.ceph.com/teuthology-2015-03-30_17:05:01-upgrade:giant-x-hammer-distro-basic-multi/
Job: ['829317']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-30_17:05:01-upgrade:giant-x-hammer-distro-basic-multi/829317/teuthology.log

2015-03-30T18:22:22.038 INFO:tasks.rados.rados.0.burnupi26.stdout: waiting on 2
2015-03-30T18:22:22.039 INFO:tasks.rados.rados.0.burnupi26.stdout:3799: done (1 left)
2015-03-30T18:22:22.039 INFO:tasks.rados.rados.0.burnupi26.stdout: waiting on 1
2015-03-30T18:22:22.129 INFO:tasks.rados.rados.0.burnupi26.stdout:3801:  expect (ObjNum 51074224 snap 0 seq_num 1744847504)
2015-03-30T18:22:22.131 INFO:tasks.rados.rados.0.burnupi26.stderr:./test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f2684ff9700 time 2015-03-30 18:22:22.127651
2015-03-30T18:22:22.131 INFO:tasks.rados.rados.0.burnupi26.stderr:./test/osd/RadosModel.h: 1063: FAILED assert(!old_value.deleted())
2015-03-30T18:22:22.132 INFO:tasks.rados.rados.0.burnupi26.stderr: ceph version 0.87.1-101-g90b37d9 (90b37d9bdcc044e26f978632cd68f19ece82d19a)
2015-03-30T18:22:22.132 INFO:tasks.rados.rados.0.burnupi26.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f2692f1b80b]
2015-03-30T18:22:22.132 INFO:tasks.rados.rados.0.burnupi26.stderr: 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0xe08) [0x421088]
2015-03-30T18:22:22.133 INFO:tasks.rados.rados.0.burnupi26.stderr: 3: (librados::C_AioComplete::finish(int)+0x1d) [0x7f2692e7d59d]
2015-03-30T18:22:22.133 INFO:tasks.rados.rados.0.burnupi26.stderr: 4: (Context::complete(int)+0x9) [0x7f2692e591e9]
2015-03-30T18:22:22.133 INFO:tasks.rados.rados.0.burnupi26.stderr: 5: (Finisher::finisher_thread_entry()+0x158) [0x7f2692f1a8f8]
2015-03-30T18:22:22.133 INFO:tasks.rados.rados.0.burnupi26.stderr: 6: (()+0x8182) [0x7f2692a57182]
2015-03-30T18:22:22.134 INFO:tasks.rados.rados.0.burnupi26.stderr: 7: (clone()+0x6d) [0x7f269226a38d]
2015-03-30T18:22:22.134 INFO:tasks.rados.rados.0.burnupi26.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues

Related to Ceph - Bug #9324: FAILED assert(!old_value.deleted()), expect (ObjNum 34580816 snap 0 seq_num 3355477840) Rejected 09/02/2014
Copied to Ceph - Backport #11983: "FAILED assert(!old_value.deleted())" in upgrade:giant-x-hammer-distro-basic-multi run Resolved 03/31/2015

History

#1 Updated by Samuel Just about 4 years ago

  • Status changed from New to Verified
  • Assignee set to Samuel Just

Huh, this is an old bug with cache tiers and snaps. I'll add more detail tomorrow, but the problem is in finish_promote. If we are promoting a snap which maps to the head object on the backend, we might find that all of the snaps it maps to have been trimmed. The same problem can also happen if we promote a clone whose last snap was trimmed between the time the backing tier responded to the promote and now. Either way, we have to filter out the removed clones and adjust the snapset as necessary.

Second, there is an off-by-one error in the handling of the snap->head case, which resulted in 1b0 being included in the snaps vector. The vector should have been empty, and that would then have crashed because the case above is not handled correctly.
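
As a rough illustration of the first point (filtering out trimmed snaps when a promote finishes), here is a minimal C++ sketch. It is not the actual finish_promote code: `snapid_t` is a plain integer stand-in for Ceph's type, and `filter_removed_snaps` and `clone_still_needed` are hypothetical helper names invented for this example. It assumes the set of removed snaps is already known.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical stand-in for Ceph's snapshot id type; not the real definition.
using snapid_t = std::uint64_t;

// Return the snaps from clone_snaps that are not in removed_snaps,
// preserving their original order. (Hypothetical helper.)
std::vector<snapid_t> filter_removed_snaps(
    const std::vector<snapid_t>& clone_snaps,
    const std::set<snapid_t>& removed_snaps) {
  std::vector<snapid_t> live;
  std::copy_if(clone_snaps.begin(), clone_snaps.end(),
               std::back_inserter(live),
               [&removed_snaps](snapid_t s) {
                 return removed_snaps.count(s) == 0;
               });
  return live;
}

// True if at least one snap of the promoted clone survives the filter.
// If none survive, the clone should not be created. (Hypothetical helper.)
bool clone_still_needed(const std::vector<snapid_t>& clone_snaps,
                        const std::set<snapid_t>& removed_snaps) {
  return !filter_removed_snaps(clone_snaps, removed_snaps).empty();
}
```

With snap 0x1b0 in the removed set, a clone whose snaps vector is just {0x1b0} filters down to an empty vector, so the promote should treat that clone as gone instead of creating it with a stale snap.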

The information for this is in ubuntu@teuthology:/a/teuthology-2015-03-30_17:05:01-upgrade:giant-x-hammer-distro-basic-multi/829317/remote/ceph-osd.burnupi2612242-86. Search backwards from the end for finish_promote.

See also grep 'make_writeable.*burnupi2612242-86\|\(<==\|-->\).*osd_op.*burnupi2612242-86\( \|@\)' ceph-osd.burnupi2612242-86 |less.

I'll look into a fix tomorrow. I don't think this is a blocker per se; it's been around since firefly.

#2 Updated by Samuel Just about 4 years ago

  • Status changed from Verified to Testing
  • Regression set to No

#3 Updated by Sage Weil almost 4 years ago

  • Target version set to v9.0.3
  • Backport set to hammer

#4 Updated by Samuel Just almost 4 years ago

  • Status changed from Testing to Pending Backport

#7 Updated by Loic Dachary over 3 years ago

  • Status changed from Pending Backport to Resolved
