Bug #22743
open
"RadosModel.h: 854: FAILED assert(0)" in upgrade:hammer-x-jewel-distro-basic-smithi
Added by Yuri Weinstein over 6 years ago.
Updated about 6 years ago.
Category:
Correctness/Safety
ceph-qa-suite:
upgrade/hammer-x
Description
Run: http://pulpito.ceph.com/teuthology-2018-01-19_01:15:02-upgrade:hammer-x-jewel-distro-basic-smithi/
Job: 2088826
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2018-01-19_01:15:02-upgrade:hammer-x-jewel-distro-basic-smithi/2088826/teuthology.log
2018-01-21T16:58:24.901 INFO:tasks.rados.rados.0.smithi076.stderr:Error: finished tid 1 when last_acked_tid was 6
2018-01-21T16:58:24.906 INFO:tasks.rados.rados.0.smithi076.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f35dbfff700 time 2018-01-21 16:58:24.910783
2018-01-21T16:58:24.906 INFO:tasks.rados.rados.0.smithi076.stderr:./test/osd/RadosModel.h: 854: FAILED assert(0)
2018-01-21T16:58:24.906 INFO:tasks.rados.rados.0.smithi076.stderr: ceph version 0.94.10-85-ga8e54ce (a8e54cee69fc2fdc8df27f35ebe1b56444f43317)
2018-01-21T16:58:24.906 INFO:tasks.rados.rados.0.smithi076.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x4e147b]
2018-01-21T16:58:24.907 INFO:tasks.rados.rados.0.smithi076.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x46e) [0x4c8bfe]
2018-01-21T16:58:24.907 INFO:tasks.rados.rados.0.smithi076.stderr: 3: (write_callback(void*, void*)+0x19) [0x4d8a79]
2018-01-21T16:58:24.907 INFO:tasks.rados.rados.0.smithi076.stderr: 4: (()+0x9526d) [0x7f35f4f3526d]
2018-01-21T16:58:24.907 INFO:tasks.rados.rados.0.smithi076.stderr: 5: (()+0x704c9) [0x7f35f4f104c9]
2018-01-21T16:58:24.907 INFO:tasks.rados.rados.0.smithi076.stderr: 6: (()+0x1351e8) [0x7f35f4fd51e8]
2018-01-21T16:58:24.907 INFO:tasks.rados.rados.0.smithi076.stderr: 7: (()+0x8184) [0x7f35f4a74184]
2018-01-21T16:58:24.907 INFO:tasks.rados.rados.0.smithi076.stderr: 8: (clone()+0x6d) [0x7f35f33f4ffd]
2018-01-21T16:58:24.907 INFO:tasks.rados.rados.0.smithi076.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
- Project changed from Ceph to RADOS
- Category set to Correctness/Safety
- Priority changed from Urgent to Normal
- Release set to hammer
I don't think a bug in a hammer binary during an upgrade test to jewel is an urgent problem at this point?
- Related to Bug #22064: "RadosModel.h: 865: FAILED assert(0)" in rados-jewel-distro-basic-smithi added
- Related to Bug #22123: osd: objecter sends out of sync with pg epochs for proxied ops added
@David I guess this is a duplicate, too?
@Nathan Weinberg This doesn't have cache tier, so it would be a different issue. Maybe related to upgrade?
I looked at it briefly and the output doesn't make any sense to me, but I don't have a lot of context around what these tests are doing. Namely, I don't see a tid write 6 being sent out anywhere nearby.
2018-02-16T02:28:25.685 INFO:tasks.rados.rados.0.smithi141.stdout:3573: writing smithi14116405-7 0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299 from 533518 to 1261370 tid 1
2018-02-16T02:28:25.713 INFO:tasks.rados.rados.0.smithi141.stdout:3573: writing smithi14116405-7 0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299 from 2029614 to 2524516 tid 2
2018-02-16T02:28:25.739 INFO:tasks.rados.rados.0.smithi141.stdout:3573: writing smithi14116405-7 0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299 from 3193506 to 3733518 tid 3
2018-02-16T02:28:25.740 INFO:tasks.rados.rados.0.smithi141.stdout:3574: read oid 42 snap -1
2018-02-16T02:28:25.740 INFO:tasks.rados.rados.0.smithi141.stdout:3574: expect deleted
2018-02-16T02:28:25.740 INFO:tasks.rados.rados.0.smithi141.stdout:3575: snap_create
2018-02-16T02:28:25.742 INFO:tasks.rados.rados.0.smithi141.stdout:3573: finishing write tid 6 to smithi14116405-7 0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299
2018-02-16T02:28:26.651 INFO:tasks.ceph.mon.c.smithi184.stderr:2018-02-16 02:28:26.649819 7fb7d3e73700 -1 mon.c@2(peon).osd e1600 update_from_paxos full map CRC mismatch, resetting to canonical
2018-02-16T02:28:26.652 INFO:tasks.ceph.mon.b.smithi184.stderr:2018-02-16 02:28:26.650992 7fb08a9bb700 -1 mon.b@1(peon).osd e1600 update_from_paxos full map CRC mismatch, resetting to canonical
2018-02-16T02:28:26.654 INFO:tasks.rados.rados.0.smithi141.stdout:3574: done (5 left)
2018-02-16T02:28:26.654 INFO:tasks.rados.rados.0.smithi141.stdout:3575: done (4 left)
2018-02-16T02:28:26.655 INFO:tasks.rados.rados.0.smithi141.stdout:3576: snap_create
2018-02-16T02:28:27.431 INFO:tasks.ceph.mon.b.smithi184.stderr:2018-02-16 02:28:27.429488 7fb08a9bb700 -1 mon.b@1(peon).osd e1601 update_from_paxos full map CRC mismatch, resetting to canonical
2018-02-16T02:28:27.432 INFO:tasks.ceph.mon.c.smithi184.stderr:2018-02-16 02:28:27.429488 7fb7d3e73700 -1 mon.c@2(peon).osd e1601 update_from_paxos full map CRC mismatch, resetting to canonical
2018-02-16T02:28:27.434 INFO:tasks.rados.rados.0.smithi141.stdout:3576: done (4 left)
2018-02-16T02:28:27.435 INFO:tasks.rados.rados.0.smithi141.stdout:3577: rollback oid 36 current snap is 456
2018-02-16T02:28:27.435 INFO:tasks.rados.rados.0.smithi141.stdout:rollback oid 36 to 446
2018-02-16T02:28:27.435 INFO:tasks.rados.rados.0.smithi141.stdout:3578: snap_create
2018-02-16T02:28:27.457 INFO:tasks.rados.rados.0.smithi141.stdout:3573: finishing write tid 1 to smithi14116405-7 0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299
2018-02-16T02:28:27.457 INFO:tasks.rados.rados.0.smithi141.stderr:Error: finished tid 1 when last_acked_tid was 6
So I guess it's not outputting all the tids it's using, but at a quick skim of the source I'm not sure how. Nonetheless, it's obviously happening, and if you scroll back searching for object smithi14116405-7, you'll find that same basic pattern of writing tids 1-3 or 1-4, and then getting "finishing write tid" statements for 1-6. All in all, I'm confuzzled.
Also available in: Atom
PDF