Bug #9203
closedceph_test_rados: ObjectDesc::iterator::advance(bool): Assertion `pos < limit' failed.
0%
Description
http://pulpito.front.sepia.ceph.com/john-2014-08-22_02:21:21-rados-wip-objecter-testing-basic-multi/440722/
http://pulpito.front.sepia.ceph.com/john-2014-08-22_02:21:21-rados-wip-objecter-testing-basic-multi/440739/
2014-08-21T19:46:25.790 INFO:tasks.rados.rados.0.plana30.stdout:1532: finishing copy_from to plana3025171-487 2014-08-21T19:46:25.790 INFO:tasks.rados.rados.0.plana30.stdout:update_object_version oid 487 v 29 (ObjNum 24 snap 0 seq_num 24) dirty exists 2014-08-21T19:46:25.790 INFO:tasks.rados.rados.0.plana30.stdout:1528: done (5 left) 2014-08-21T19:46:25.791 INFO:tasks.rados.rados.0.plana30.stdout:1529: done (4 left) 2014-08-21T19:46:25.791 INFO:tasks.rados.rados.0.plana30.stdout:1531: done (3 left) 2014-08-21T19:46:25.791 INFO:tasks.rados.rados.0.plana30.stdout: waiting on 3 2014-08-21T19:46:25.801 INFO:tasks.rados.rados.0.plana30.stdout:1536: done (2 left) 2014-08-21T19:46:25.801 INFO:tasks.rados.rados.0.plana30.stdout: waiting on 2 2014-08-21T19:46:25.823 INFO:tasks.rados.rados.0.plana30.stdout:1534: expect (ObjNum 38031968 snap 0 seq_num 2149557312) 2014-08-21T19:46:25.827 INFO:tasks.rados.rados.0.plana30.stderr:./test/osd/RadosModel.h: In function 'virtual void ReadOp::_finish(TestOp::CallbackInfo*)' thread 7f36c6ffd700 time 2014-08-21 19:46:25.822360 2014-08-21T19:46:25.827 INFO:tasks.rados.rados.0.plana30.stderr:./test/osd/RadosModel.h: 1063: FAILED assert(!old_value.deleted()) 2014-08-21T19:46:25.828 INFO:tasks.rados.rados.0.plana30.stderr: ceph version 0.84-630-gc10b489 (c10b48963dc879ec97a238447905abc34e06caa6) 2014-08-21T19:46:25.828 INFO:tasks.rados.rados.0.plana30.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x7f36d315ffcf] 2014-08-21T19:46:25.829 INFO:tasks.rados.rados.0.plana30.stderr: 2: (ReadOp::_finish(TestOp::CallbackInfo*)+0x159d) [0x41c34d] 2014-08-21T19:46:25.829 INFO:tasks.rados.rados.0.plana30.stderr: 3: (librados::C_AioComplete::finish(int)+0x1d) [0x7f36d30c308d] 2014-08-21T19:46:25.829 INFO:tasks.rados.rados.0.plana30.stderr: 4: (Context::complete(int)+0x9) [0x7f36d309f449] 2014-08-21T19:46:25.830 INFO:tasks.rados.rados.0.plana30.stderr: 5: (Finisher::finisher_thread_entry()+0x160) [0x7f36d315f1a0] 2014-08-21T19:46:25.830 INFO:tasks.rados.rados.0.plana30.stderr: 6: (()+0x7e9a) [0x7f36d2d02e9a] 2014-08-21T19:46:25.830 INFO:tasks.rados.rados.0.plana30.stderr: 7: (clone()+0x6d) [0x7f36d25193fd]
Files
Updated by John Spray over 9 years ago
- File test.log test.log added
- File ceph-client.0.25123.log ceph-client.0.25123.log added
Does not reproduce very often, but eventually caught in the act with debug turned up.
The oid in the asserting operation is "205",
#0 0x00007f5d0f341425 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007f5d0f344b8b in __GI_abort () at abort.c:91 #2 0x00007f5d0f99869d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #3 0x00007f5d0f996846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #4 0x00007f5d0f996873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #5 0x00007f5d0f99696e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #6 0x00007f5d10046849 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=1063, func=0x4292a0 "virtual void ReadOp::_finish(TestOp::CallbackInfo*)") at common/assert.cc:77 #7 0x000000000041c34d in ReadOp::_finish (this=0x8d97b0, info=<optimized out>) at ./test/osd/RadosModel.h:1063 #8 0x00007f5d0ffa914d in librados::C_AioComplete::finish (this=<optimized out>, r=<optimized out>) at ./librados/AioCompletionImpl.h:180 ---Type <return> to continue, or q <return> to quit--- #9 0x00007f5d0ff854d9 in Context::complete (this=0x7f5cb40020c0, r=<optimized out>) at ./include/Context.h:64 #10 0x00007f5d10045840 in Finisher::finisher_thread_entry (this=0x7cce90) at common/Finisher.cc:56 #11 0x00007f5d0fbe8e9a in start_thread (arg=0x7f5cf3fff700) at pthread_create.c:308 #12 0x00007f5d0f3ff3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #13 0x0000000000000000 in ?? ()
Updated by John Spray over 9 years ago
- File crash2_minimal_a.yaml crash2_minimal_a.yaml added
So it turns out that ceph_test_rados is also crashy on master, as I found when I took my reproducer for this issue and ran it there. Can't say 100% that it's the same issue we're seeing here, but in both cases the badness is in structures belonging to the test itself. Crash I get on master is this:
2014-08-23 19:19:19,493.493 INFO:tasks.rados.rados.0.mira097.stdout:261: done (0 left) 2014-08-23 19:19:19,726.726 INFO:tasks.rados.rados.0.mira097.stdout:266: done (0 left) 2014-08-23 19:19:19,726.726 INFO:tasks.rados.rados.0.mira097.stdout:267: read oid 144 snap -1 2014-08-23 19:19:19,727.727 INFO:tasks.rados.rados.0.mira097.stdout:268: copy_from oid 18 from oid 155 current snap is 1 2014-08-23 19:19:19,727.727 INFO:tasks.rados.rados.0.mira097.stdout:269: delete oid 108 current snap is 1 2014-08-23 19:19:19,729.729 INFO:tasks.rados.rados.0.mira097.stdout:267: expect (ObjNum 143 snap 0 seq_num 143) 2014-08-23 19:19:19,729.729 INFO:tasks.rados.rados.0.mira097.stderr:ceph_test_rados: test/osd/Object.cc:90: ObjectDesc::iterator& ObjectDesc::iterator::advance(bool): Assertion `pos < limit' failed.
Minimal teuthology reproducer attached.
Updated by John Spray over 9 years ago
- Subject changed from (wip-objecter) RadosModel.h: 1063: FAILED assert(!old_value.deleted()) to RadosModel.h: 1063: FAILED assert(!old_value.deleted())
Updated by John Spray over 9 years ago
fwiw the reproducer hits a crash on firefly, but not emperor or dumpling. A fair bit changed in ceph_test_rados for firefly though, so that's not too much extra info.
Updated by John Spray over 9 years ago
Updated by John Spray over 9 years ago
- Subject changed from RadosModel.h: 1063: FAILED assert(!old_value.deleted()) to ceph_test_rados: ObjectDesc::iterator::advance(bool): Assertion `pos < limit' failed.
#9054 already existed for the !old_value.deleted issue specifically, renaming this ticket for the issues found by the reproducer yaml here.
Updated by Sage Weil over 9 years ago
- Priority changed from Normal to Urgent
- Source changed from other to Q/A
Updated by Sage Weil over 9 years ago
can we retest now that #9054 fix is merged?
Updated by Sage Weil over 9 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
Updated by Samuel Just over 9 years ago
- Status changed from In Progress to 7
- Assignee changed from Sage Weil to Samuel Just
Updated by Samuel Just over 9 years ago
- Status changed from 7 to Fix Under Review
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Resolved
- Priority changed from Urgent to High
- Backport set to firefly