Bug #16827
closed"FAILED assert(0 == "racing read got wrong version")" in rados-jewel-distro-basic-smithi
0%
Description
Run: http://pulpito.ceph.com/teuthology-2016-07-20_22:00:01-rados-jewel-distro-basic-smithi/
Jobs: 326655, 326535
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-07-20_22:00:01-rados-jewel-distro-basic-smithi/326535/teuthology.log
2016-07-25T21:17:32.196 INFO:tasks.rados.rados.0.smithi048.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f6488ff9700 time 2016-07-26 04:17:32.189157 2016-07-25T21:17:32.197 INFO:tasks.rados.rados.0.smithi048.stderr:./test/osd/RadosModel.h: 892: FAILED assert(0 == "racing read got wrong version") 2016-07-25T21:17:32.197 INFO:tasks.rados.rados.0.smithi048.stderr: ceph version 10.2.2-101-gb15cf42 (b15cf42a4be7bb290e095cd5027d7f9ac604a97d) 2016-07-25T21:17:32.198 INFO:tasks.rados.rados.0.smithi048.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f649f76a67b] 2016-07-25T21:17:32.198 INFO:tasks.rados.rados.0.smithi048.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x406) [0x7f649f754966] 2016-07-25T21:17:32.198 INFO:tasks.rados.rados.0.smithi048.stderr: 3: (write_callback(void*, void*)+0x19) [0x7f649f761989] 2016-07-25T21:17:32.199 INFO:tasks.rados.rados.0.smithi048.stderr: 4: (()+0x998ad) [0x7f6495d768ad] 2016-07-25T21:17:32.199 INFO:tasks.rados.rados.0.smithi048.stderr: 5: (()+0x83049) [0x7f6495d60049] 2016-07-25T21:17:32.199 INFO:tasks.rados.rados.0.smithi048.stderr: 6: (()+0x164036) [0x7f6495e41036] 2016-07-25T21:17:32.200 INFO:tasks.rados.rados.0.smithi048.stderr: 7: (()+0x8182) [0x7f64956ad182] 2016-07-25T21:17:32.200 INFO:tasks.rados.rados.0.smithi048.stderr: 8: (clone()+0x6d) [0x7f649401e47d] 2016-07-25T21:17:32.200 INFO:tasks.rados.rados.0.smithi048.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by David Zafman over 7 years ago
My analysis is that a proxy write interleaved with a promotion. The promotion expected the user version to be 3513 and I assume that the "commit on proxywrite" with RETRY=1 caused the user version to bump to 3514 at the base object.
2016-07-26 04:17:14.822591 7f7036891700 10 osd.1 pg_epoch: 499 pg[2.2( v 490'371 (333'211,490'371] local-les=499 n=20 ec=8 les/c/f 499/432/0 498/498/498) [1,5] r=0 lpr=498 pi=380-497/3 crt=477'367 lcod 490'370 mlcod 0'0 active NIBBLEWISE] do_op osd_op(client.4121.0:7215 2.5cebb3d2 smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo [read 0~1] snapc 0=[] RETRY=1 ack+retry+read+rwordered+known_if_redirected e498) v7 may_read -> write-ordered flags ack+retry+read+rwordered+known_if_redirected 2016-07-26 04:17:14.822610 7f7036891700 10 osd.1 pg_epoch: 499 pg[2.2( v 490'371 (333'211,490'371] local-les=499 n=20 ec=8 les/c/f 499/432/0 498/498/498) [1,5] r=0 lpr=498 pi=380-497/3 crt=477'367 lcod 490'370 mlcod 0'0 active NIBBLEWISE] wait_for_blocked_object 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head 0x7f7057f92600 2016-07-26 04:17:32.029229 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 490'371 (333'211,490'371] local-les=499 n=20 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=477'367 lcod 490'370 mlcod 0'0 active+clean NIBBLEWISE] finish_proxy_write 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head tid 287 (0) Success 2016-07-26 04:17:32.029250 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 490'371 (333'211,490'371] local-les=499 n=20 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=477'367 lcod 490'370 mlcod 0'0 active+clean NIBBLEWISE] sending commit on proxywrite(0x7f7030082ad0 3507 pwop_tid=287 op=osd_op(client.4121.0:7211 2.5cebb3d2 smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo [write 645699~748749,stat] snapc 1a6=[1a6,1a3,1a1,19d,19c,19b,199,197,196,195,191,184,182,177,176,173,172] RETRY=1 ack+ondisk+retry+write+known_if_redirected e498) v7) 0x7f7057c10580 2016-07-26 04:17:32.059618 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 503'372 (333'211,503'372] local-les=499 n=21 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=477'367 lcod 490'371 mlcod 490'371 active+clean NIBBLEWISE] process_copy_chunk 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head tid 289 (0) Success 2016-07-26 04:17:32.066920 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 503'372 (333'211,503'372] local-les=499 n=21 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=490'371 lcod 490'371 mlcod 490'371 active+clean NIBBLEWISE] process_copy_chunk 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head tid 294 (34) Numerical result out of range 2016-07-26 04:17:32.067012 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 503'372 (333'211,503'372] local-les=499 n=21 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=490'371 lcod 490'371 mlcod 490'371 active+clean NIBBLEWISE] finish_promote 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head r=-34 uv3513
Updated by Samuel Just over 7 years ago
- Is duplicate of Bug #16236: cache/proxied ops from different primaries (cache interval change) don't order propertly, -ERANGE on write in ceph_test_rados added
Updated by Yuri Weinstein over 7 years ago
2016-08-08T13:08:36.839 INFO:tasks.rados.rados.0.smithi046.stderr:Error: oid 14 write returned error code -34 2016-08-08T13:08:36.841 INFO:tasks.rados.rados.0.smithi046.stderr:Error: oid 14 write returned error code -34 2016-08-08T13:08:36.842 INFO:tasks.rados.rados.0.smithi046.stderr:Error: racing read on 14 returned version 0 rather than version 2 2016-08-08T13:08:36.843 INFO:tasks.rados.rados.0.smithi046.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f70097fa700 time 2016-08-08 20:08:36.189949 2016-08-08T13:08:36.844 INFO:tasks.rados.rados.0.smithi046.stderr:./test/osd/RadosModel.h: 892: FAILED assert(0 == "racing read got wrong version") 2016-08-08T13:08:36.845 INFO:tasks.rados.rados.0.smithi046.stderr: ceph version 10.2.2-195-g5c98730 (5c98730854f11b0efb3b3e03be426ce2b7a999af) 2016-08-08T13:08:36.847 INFO:tasks.rados.rados.0.smithi046.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f70204aa6cb] 2016-08-08T13:08:36.847 INFO:tasks.rados.rados.0.smithi046.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x406) [0x7f70204949b6] 2016-08-08T13:08:36.849 INFO:tasks.rados.rados.0.smithi046.stderr: 3: (write_callback(void*, void*)+0x19) [0x7f70204a19d9] 2016-08-08T13:08:36.850 INFO:tasks.rados.rados.0.smithi046.stderr: 4: (()+0x998ed) [0x7f7016ab68ed] 2016-08-08T13:08:36.851 INFO:tasks.rados.rados.0.smithi046.stderr: 5: (()+0x83089) [0x7f7016aa0089] 2016-08-08T13:08:36.853 INFO:tasks.rados.rados.0.smithi046.stderr: 6: (()+0x164086) [0x7f7016b81086] 2016-08-08T13:08:36.854 INFO:tasks.rados.rados.0.smithi046.stderr: 7: (()+0x8182) [0x7f70163ed182] 2016-08-08T13:08:36.854 INFO:tasks.rados.rados.0.smithi046.stderr: 8: (clone()+0x6d) [0x7f7014d5e47d] 2016-08-08T13:08:36.855 INFO:tasks.rados.rados.0.smithi046.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Yuri Weinstein over 7 years ago
Run: http://pulpito.ceph.com/teuthology-2016-10-29_11:00:03-rados-jewel-distro-basic-smithi/
Job: 500945
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-10-29_11:00:03-rados-jewel-distro-basic-smithi/500945/teuthology.log
2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stdout:update_object_version oid 10 v 1946 (ObjNum 922 snap 149 seq_num 922) dirty exists 2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:Error: oid 10 write returned error code -34 2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:Error: oid 10 write returned error code -34 2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:Error: racing read on 10 returned version 0 rather than version 1946 2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7efecb7fe700 time 2016-10-29 11:20:12.624522 2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:./test/osd/RadosModel.h: 892: FAILED assert(0 == "racing read got wrong version") 2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: ceph version 10.2.3-337-g5efb6b1 (5efb6b1c2c9eb68f479446e7b42cd8945a18dd53) 2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7efee672f82b] 2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x406) [0x7efee6719b16] 2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 3: (write_callback(void*, void*)+0x19) [0x7efee6726b39] 2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 4: (()+0x9abfd) [0x7efedcd39bfd] 2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 5: (()+0x84389) [0x7efedcd23389] 2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 6: (()+0x168466) [0x7efedce07466] 2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 7: (()+0x8182) [0x7efedc66f182] 2016-10-29T11:20:12.649 INFO:tasks.rados.rados.0.smithi046.stderr: 8: (clone()+0x6d) [0x7efedafe047d] 2016-10-29T11:20:12.649 INFO:tasks.rados.rados.0.smithi046.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Yuri Weinstein over 7 years ago
Run: http://pulpito.ceph.com/teuthology-2017-01-09_11:00:04-rados-jewel-distro-basic-smithi/
Jobs: ['701405', '701492']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-09_11:00:04-rados-jewel-distro-basic-smithi/701405/teuthology.log
2017-01-10T00:15:44.233 INFO:tasks.ceph.osd.1.smithi004.stderr:2017-01-10 00:15:44.233560 7f393a739700 -1 osd.1 pg_epoch: 726 pg[2.3( v 726'918 (0'0,726'918] local-les=724 n=52 ec=7 les/c/f 724/724/0 723/723/723) [1,5] r=0 lpr=723 crt=720'917 lcod 720'917 mlcod 720'917 active+clean NIBBLEWISE] finish_promote unexpected promote error (34) Numerical result out of range 2017-01-10T00:15:44.237 INFO:tasks.rados.rados.0.smithi004.stdout:3509: finishing write tid 2 to smithi0048322-473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo 2017-01-10T00:15:44.240 INFO:tasks.rados.rados.0.smithi004.stdout:3509: finishing write tid 4 to smithi0048322-473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo 2017-01-10T00:15:44.242 INFO:tasks.rados.rados.0.smithi004.stdout:3509: finishing write tid 5 to smithi0048322-473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo 2017-01-10T00:15:44.245 INFO:tasks.rados.rados.0.smithi004.stdout:update_object_version oid 473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo v 4420 (ObjNum 1462 snap 324 seq_num 1462) dirty exists 2017-01-10T00:15:44.248 INFO:tasks.rados.rados.0.smithi004.stderr:Error: oid 473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo write returned error code -34 2017-01-10T00:15:44.250 INFO:tasks.rados.rados.0.smithi004.stderr:Error: oid 473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo write returned error code -34 2017-01-10T00:15:44.253 INFO:tasks.rados.rados.0.smithi004.stderr:Error: racing read on 473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo returned version 0 rather than version 4420 2017-01-10T00:15:44.255 INFO:tasks.rados.rados.0.smithi004.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7fdddeffd700 time 2017-01-10 00:15:44.234278 2017-01-10T00:15:44.258 INFO:tasks.rados.rados.0.smithi004.stderr:./test/osd/RadosModel.h: 892: FAILED assert(0 == "racing read got wrong version") 2017-01-10T00:15:44.261 INFO:tasks.rados.rados.0.smithi004.stderr: ceph version 10.2.5-5648-g5b402f8 (5b402f8a7b5a763852e93cd0a5decd34572f4518) 2017-01-10T00:15:44.264 INFO:tasks.rados.rados.0.smithi004.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fddfa478105] 2017-01-10T00:15:44.273 INFO:tasks.rados.rados.0.smithi004.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x46f) [0x7fddfa4617df] 2017-01-10T00:15:44.276 INFO:tasks.rados.rados.0.smithi004.stderr: 3: (write_callback(void*, void*)+0x19) [0x7fddfa46ef29] 2017-01-10T00:15:44.278 INFO:tasks.rados.rados.0.smithi004.stderr: 4: (()+0x9ea8d) [0x7fddf0a6aa8d] 2017-01-10T00:15:44.281 INFO:tasks.rados.rados.0.smithi004.stderr: 5: (()+0x86f79) [0x7fddf0a52f79] 2017-01-10T00:15:44.283 INFO:tasks.rados.rados.0.smithi004.stderr: 6: (()+0x173a56) [0x7fddf0b3fa56] 2017-01-10T00:15:44.290 INFO:tasks.rados.rados.0.smithi004.stderr: 7: (()+0x7dc5) [0x7fddefe38dc5] 2017-01-10T00:15:44.293 INFO:tasks.rados.rados.0.smithi004.stderr: 8: (clone()+0x6d) [0x7fddeed2073d] 2017-01-10T00:15:44.296 INFO:tasks.rados.rados.0.smithi004.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.