Project

General

Profile

Actions

Bug #9203

closed

ceph_test_rados: ObjectDesc::iterator::advance(bool): Assertion `pos < limit' failed.

Added by John Spray over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
firefly
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.front.sepia.ceph.com/john-2014-08-22_02:21:21-rados-wip-objecter-testing-basic-multi/440722/
http://pulpito.front.sepia.ceph.com/john-2014-08-22_02:21:21-rados-wip-objecter-testing-basic-multi/440739/

2014-08-21T19:46:25.790 INFO:tasks.rados.rados.0.plana30.stdout:1532:  finishing copy_from to plana3025171-487
2014-08-21T19:46:25.790 INFO:tasks.rados.rados.0.plana30.stdout:update_object_version oid 487 v 29 (ObjNum 24 snap 0 seq_num 24) dirty exists
2014-08-21T19:46:25.790 INFO:tasks.rados.rados.0.plana30.stdout:1528: done (5 left)
2014-08-21T19:46:25.791 INFO:tasks.rados.rados.0.plana30.stdout:1529: done (4 left)
2014-08-21T19:46:25.791 INFO:tasks.rados.rados.0.plana30.stdout:1531: done (3 left)
2014-08-21T19:46:25.791 INFO:tasks.rados.rados.0.plana30.stdout: waiting on 3
2014-08-21T19:46:25.801 INFO:tasks.rados.rados.0.plana30.stdout:1536: done (2 left)
2014-08-21T19:46:25.801 INFO:tasks.rados.rados.0.plana30.stdout: waiting on 2
2014-08-21T19:46:25.823 INFO:tasks.rados.rados.0.plana30.stdout:1534:  expect (ObjNum 38031968 snap 0 seq_num 2149557312)
2014-08-21T19:46:25.827 INFO:tasks.rados.rados.0.plana30.stderr:./test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f36c6ffd700 time 2014-08-21 19:46:25.822360
2014-08-21T19:46:25.827 INFO:tasks.rados.rados.0.plana30.stderr:./test/osd/RadosModel.h: 1063: FAILED assert(!old_value.deleted())
2014-08-21T19:46:25.828 INFO:tasks.rados.rados.0.plana30.stderr: ceph version 0.84-630-gc10b489 (c10b48963dc879ec97a238447905abc34e06caa6)
2014-08-21T19:46:25.828 INFO:tasks.rados.rados.0.plana30.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x7f36d315ffcf]
2014-08-21T19:46:25.829 INFO:tasks.rados.rados.0.plana30.stderr: 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0x159d) [0x41c34d]
2014-08-21T19:46:25.829 INFO:tasks.rados.rados.0.plana30.stderr: 3: (librados::C_AioComplete::finish(int)+0x1d) [0x7f36d30c308d]
2014-08-21T19:46:25.829 INFO:tasks.rados.rados.0.plana30.stderr: 4: (Context::complete(int)+0x9) [0x7f36d309f449]
2014-08-21T19:46:25.830 INFO:tasks.rados.rados.0.plana30.stderr: 5: (Finisher::finisher_thread_entry()+0x160) [0x7f36d315f1a0]
2014-08-21T19:46:25.830 INFO:tasks.rados.rados.0.plana30.stderr: 6: (()+0x7e9a) [0x7f36d2d02e9a]
2014-08-21T19:46:25.830 INFO:tasks.rados.rados.0.plana30.stderr: 7: (clone()+0x6d) [0x7f36d25193fd]

Files

test.log (1.05 MB) test.log John Spray, 08/22/2014 12:02 PM
ceph-client.0.25123.log (10.6 MB) ceph-client.0.25123.log John Spray, 08/22/2014 12:02 PM
crash2_minimal_a.yaml (553 Bytes) crash2_minimal_a.yaml John Spray, 08/23/2014 03:13 PM

Updated by John Spray over 9 years ago

Does not reproduce very often, but eventually caught in the act with debug turned up.

The oid in the asserting operation is "205",

#0  0x00007f5d0f341425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f5d0f344b8b in __GI_abort () at abort.c:91
#2  0x00007f5d0f99869d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f5d0f996846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f5d0f996873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f5d0f99696e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f5d10046849 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=1063, func=0x4292a0 "virtual void ReadOp::_finish(TestOp::CallbackInfo*)")
    at common/assert.cc:77
#7  0x000000000041c34d in ReadOp::_finish (this=0x8d97b0, info=<optimized out>) at ./test/osd/RadosModel.h:1063
#8  0x00007f5d0ffa914d in librados::C_AioComplete::finish (this=<optimized out>, r=<optimized out>) at ./librados/AioCompletionImpl.h:180
---Type <return> to continue, or q <return> to quit---
#9  0x00007f5d0ff854d9 in Context::complete (this=0x7f5cb40020c0, r=<optimized out>) at ./include/Context.h:64
#10 0x00007f5d10045840 in Finisher::finisher_thread_entry (this=0x7cce90) at common/Finisher.cc:56
#11 0x00007f5d0fbe8e9a in start_thread (arg=0x7f5cf3fff700) at pthread_create.c:308
#12 0x00007f5d0f3ff3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#13 0x0000000000000000 in ?? ()
Actions #2

Updated by John Spray over 9 years ago

So it turns out that ceph_test_rados is also crashy on master, as I found when I took my reproducer for this issue and ran it there. Can't say 100% that it's the same issue we're seeing here, but in both cases the badness is in structures belonging to the test itself. Crash I get on master is this:

2014-08-23 19:19:19,493.493 INFO:tasks.rados.rados.0.mira097.stdout:261: done (0 left)
2014-08-23 19:19:19,726.726 INFO:tasks.rados.rados.0.mira097.stdout:266: done (0 left)
2014-08-23 19:19:19,726.726 INFO:tasks.rados.rados.0.mira097.stdout:267: read oid 144 snap -1
2014-08-23 19:19:19,727.727 INFO:tasks.rados.rados.0.mira097.stdout:268: copy_from oid 18 from oid 155 current snap is 1
2014-08-23 19:19:19,727.727 INFO:tasks.rados.rados.0.mira097.stdout:269: delete oid 108 current snap is 1
2014-08-23 19:19:19,729.729 INFO:tasks.rados.rados.0.mira097.stdout:267:  expect (ObjNum 143 snap 0 seq_num 143)
2014-08-23 19:19:19,729.729 INFO:tasks.rados.rados.0.mira097.stderr:ceph_test_rados: test/osd/Object.cc:90: ObjectDesc::iterator& ObjectDesc::iterator::advance(bool): Assertion `pos < limit' failed.

Minimal teuthology reproducer attached.

Actions #3

Updated by John Spray over 9 years ago

  • Subject changed from (wip-objecter) RadosModel.h: 1063: FAILED assert(!old_value.deleted()) to RadosModel.h: 1063: FAILED assert(!old_value.deleted())
Actions #4

Updated by John Spray over 9 years ago

fwiw the reproducer hits a crash on firefly, but not emperor or dumpling. A fair bit changed in ceph_test_rados for firefly though, so that's not too much extra info.

Actions #6

Updated by John Spray over 9 years ago

  • Subject changed from RadosModel.h: 1063: FAILED assert(!old_value.deleted()) to ceph_test_rados: ObjectDesc::iterator::advance(bool): Assertion `pos < limit' failed.

#9054 already existed for the !old_value.deleted issue specifically, renaming this ticket for the issues found by the reproducer yaml here.

Actions #7

Updated by Sage Weil over 9 years ago

  • Priority changed from Normal to Urgent
  • Source changed from other to Q/A
Actions #8

Updated by Sage Weil over 9 years ago

can we retest now that #9054 fix is merged?

Actions #9

Updated by John Spray over 9 years ago

Still fails on master e11f061

Actions #10

Updated by Sage Weil over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
Actions #11

Updated by Samuel Just over 9 years ago

  • Status changed from In Progress to 7
  • Assignee changed from Sage Weil to Samuel Just
Actions #12

Updated by Samuel Just over 9 years ago

  • Status changed from 7 to Fix Under Review
Actions #13

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved
  • Priority changed from Urgent to High
  • Backport set to firefly
Actions

Also available in: Atom PDF