Project

General

Profile

Actions

Bug #16827

closed

"FAILED assert(0 == "racing read got wrong version")" in rados-jewel-distro-basic-smithi

Added by Yuri Weinstein almost 8 years ago. Updated over 7 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ceph.com/teuthology-2016-07-20_22:00:01-rados-jewel-distro-basic-smithi/
Jobs: 326655, 326535
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-07-20_22:00:01-rados-jewel-distro-basic-smithi/326535/teuthology.log

2016-07-25T21:17:32.196 INFO:tasks.rados.rados.0.smithi048.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f6488ff9700 time 2016-07-26 04:17:32.189157
2016-07-25T21:17:32.197 INFO:tasks.rados.rados.0.smithi048.stderr:./test/osd/RadosModel.h: 892: FAILED assert(0 == "racing read got wrong version")
2016-07-25T21:17:32.197 INFO:tasks.rados.rados.0.smithi048.stderr: ceph version 10.2.2-101-gb15cf42 (b15cf42a4be7bb290e095cd5027d7f9ac604a97d)
2016-07-25T21:17:32.198 INFO:tasks.rados.rados.0.smithi048.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f649f76a67b]
2016-07-25T21:17:32.198 INFO:tasks.rados.rados.0.smithi048.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x406) [0x7f649f754966]
2016-07-25T21:17:32.198 INFO:tasks.rados.rados.0.smithi048.stderr: 3: (write_callback(void*, void*)+0x19) [0x7f649f761989]
2016-07-25T21:17:32.199 INFO:tasks.rados.rados.0.smithi048.stderr: 4: (()+0x998ad) [0x7f6495d768ad]
2016-07-25T21:17:32.199 INFO:tasks.rados.rados.0.smithi048.stderr: 5: (()+0x83049) [0x7f6495d60049]
2016-07-25T21:17:32.199 INFO:tasks.rados.rados.0.smithi048.stderr: 6: (()+0x164036) [0x7f6495e41036]
2016-07-25T21:17:32.200 INFO:tasks.rados.rados.0.smithi048.stderr: 7: (()+0x8182) [0x7f64956ad182]
2016-07-25T21:17:32.200 INFO:tasks.rados.rados.0.smithi048.stderr: 8: (clone()+0x6d) [0x7f649401e47d]
2016-07-25T21:17:32.200 INFO:tasks.rados.rados.0.smithi048.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues 1 (0 open1 closed)

Is duplicate of RADOS - Bug #16236: cache/proxied ops from different primaries (cache interval change) don't order propertly, -ERANGE on write in ceph_test_radosWon't Fix06/10/2016

Actions
Actions #1

Updated by David Zafman over 7 years ago

My analysis is that a proxy write interleaved with a promotion. The promotion expected the user version to be 3513 and I assume that the "commit on proxywrite" with RETRY=1 caused the user version to bump to 3514 at the base object.

2016-07-26 04:17:14.822591 7f7036891700 10 osd.1 pg_epoch: 499 pg[2.2( v 490'371 (333'211,490'371] local-les=499 n=20 ec=8 les/c/f 499/432/0 498/498/498) [1,5] r=0 lpr=498 pi=380-497/3 crt=477'367 lcod 490'370 mlcod 0'0 active NIBBLEWISE] do_op osd_op(client.4121.0:7215 2.5cebb3d2 smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo [read 0~1] snapc 0=[] RETRY=1 ack+retry+read+rwordered+known_if_redirected e498) v7 may_read -> write-ordered flags ack+retry+read+rwordered+known_if_redirected
2016-07-26 04:17:14.822610 7f7036891700 10 osd.1 pg_epoch: 499 pg[2.2( v 490'371 (333'211,490'371] local-les=499 n=20 ec=8 les/c/f 499/432/0 498/498/498) [1,5] r=0 lpr=498 pi=380-497/3 crt=477'367 lcod 490'370 mlcod 0'0 active NIBBLEWISE] wait_for_blocked_object 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head 0x7f7057f92600
2016-07-26 04:17:32.029229 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 490'371 (333'211,490'371] local-les=499 n=20 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=477'367 lcod 490'370 mlcod 0'0 active+clean NIBBLEWISE] finish_proxy_write 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head tid 287 (0) Success
2016-07-26 04:17:32.029250 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 490'371 (333'211,490'371] local-les=499 n=20 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=477'367 lcod 490'370 mlcod 0'0 active+clean NIBBLEWISE]  sending commit on proxywrite(0x7f7030082ad0 3507 pwop_tid=287 op=osd_op(client.4121.0:7211 2.5cebb3d2 smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo [write 645699~748749,stat] snapc 1a6=[1a6,1a3,1a1,19d,19c,19b,199,197,196,195,191,184,182,177,176,173,172] RETRY=1 ack+ondisk+retry+write+known_if_redirected e498) v7) 0x7f7057c10580
2016-07-26 04:17:32.059618 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 503'372 (333'211,503'372] local-les=499 n=21 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=477'367 lcod 490'371 mlcod 490'371 active+clean NIBBLEWISE] process_copy_chunk 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head tid 289 (0) Success
2016-07-26 04:17:32.066920 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 503'372 (333'211,503'372] local-les=499 n=21 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=490'371 lcod 490'371 mlcod 490'371 active+clean NIBBLEWISE] process_copy_chunk 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head tid 294 (34) Numerical result out of range
2016-07-26 04:17:32.067012 7f7030084700 10 osd.1 pg_epoch: 503 pg[2.2( v 503'372 (333'211,503'372] local-les=499 n=21 ec=8 les/c/f 499/499/0 498/498/498) [1,5] r=0 lpr=498 crt=490'371 lcod 490'371 mlcod 490'371 active+clean NIBBLEWISE] finish_promote 2:4bcdd73a:::smithi04822435-479 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head r=-34 uv3513
Actions #2

Updated by David Zafman over 7 years ago

  • Assignee set to David Zafman
Actions #3

Updated by Samuel Just over 7 years ago

  • Is duplicate of Bug #16236: cache/proxied ops from different primaries (cache interval change) don't order propertly, -ERANGE on write in ceph_test_rados added
Actions #4

Updated by Samuel Just over 7 years ago

  • Status changed from New to Duplicate
Actions #5

Updated by Yuri Weinstein over 7 years ago

also in http://qa-proxy.ceph.com/teuthology/teuthology-2016-08-06_22:00:03-rados-jewel-distro-basic-smithi/352700/teuthology.log

2016-08-08T13:08:36.839 INFO:tasks.rados.rados.0.smithi046.stderr:Error: oid 14 write returned error code -34
2016-08-08T13:08:36.841 INFO:tasks.rados.rados.0.smithi046.stderr:Error: oid 14 write returned error code -34
2016-08-08T13:08:36.842 INFO:tasks.rados.rados.0.smithi046.stderr:Error: racing read on 14 returned version 0 rather than version 2
2016-08-08T13:08:36.843 INFO:tasks.rados.rados.0.smithi046.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f70097fa700 time 2016-08-08 20:08:36.189949
2016-08-08T13:08:36.844 INFO:tasks.rados.rados.0.smithi046.stderr:./test/osd/RadosModel.h: 892: FAILED assert(0 == "racing read got wrong version")
2016-08-08T13:08:36.845 INFO:tasks.rados.rados.0.smithi046.stderr: ceph version 10.2.2-195-g5c98730 (5c98730854f11b0efb3b3e03be426ce2b7a999af)
2016-08-08T13:08:36.847 INFO:tasks.rados.rados.0.smithi046.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f70204aa6cb]
2016-08-08T13:08:36.847 INFO:tasks.rados.rados.0.smithi046.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x406) [0x7f70204949b6]
2016-08-08T13:08:36.849 INFO:tasks.rados.rados.0.smithi046.stderr: 3: (write_callback(void*, void*)+0x19) [0x7f70204a19d9]
2016-08-08T13:08:36.850 INFO:tasks.rados.rados.0.smithi046.stderr: 4: (()+0x998ed) [0x7f7016ab68ed]
2016-08-08T13:08:36.851 INFO:tasks.rados.rados.0.smithi046.stderr: 5: (()+0x83089) [0x7f7016aa0089]
2016-08-08T13:08:36.853 INFO:tasks.rados.rados.0.smithi046.stderr: 6: (()+0x164086) [0x7f7016b81086]
2016-08-08T13:08:36.854 INFO:tasks.rados.rados.0.smithi046.stderr: 7: (()+0x8182) [0x7f70163ed182]
2016-08-08T13:08:36.854 INFO:tasks.rados.rados.0.smithi046.stderr: 8: (clone()+0x6d) [0x7f7014d5e47d]
2016-08-08T13:08:36.855 INFO:tasks.rados.rados.0.smithi046.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #6

Updated by Yuri Weinstein over 7 years ago

Run: http://pulpito.ceph.com/teuthology-2016-10-29_11:00:03-rados-jewel-distro-basic-smithi/
Job: 500945
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-10-29_11:00:03-rados-jewel-distro-basic-smithi/500945/teuthology.log

2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stdout:update_object_version oid 10 v 1946 (ObjNum 922 snap 149 seq_num 922) dirty exists
2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:Error: oid 10 write returned error code -34
2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:Error: oid 10 write returned error code -34
2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:Error: racing read on 10 returned version 0 rather than version 1946
2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7efecb7fe700 time 2016-10-29 11:20:12.624522
2016-10-29T11:20:12.626 INFO:tasks.rados.rados.0.smithi046.stderr:./test/osd/RadosModel.h: 892: FAILED assert(0 == "racing read got wrong version")
2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: ceph version 10.2.3-337-g5efb6b1 (5efb6b1c2c9eb68f479446e7b42cd8945a18dd53)
2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7efee672f82b]
2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x406) [0x7efee6719b16]
2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 3: (write_callback(void*, void*)+0x19) [0x7efee6726b39]
2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 4: (()+0x9abfd) [0x7efedcd39bfd]
2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 5: (()+0x84389) [0x7efedcd23389]
2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 6: (()+0x168466) [0x7efedce07466]
2016-10-29T11:20:12.648 INFO:tasks.rados.rados.0.smithi046.stderr: 7: (()+0x8182) [0x7efedc66f182]
2016-10-29T11:20:12.649 INFO:tasks.rados.rados.0.smithi046.stderr: 8: (clone()+0x6d) [0x7efedafe047d]
2016-10-29T11:20:12.649 INFO:tasks.rados.rados.0.smithi046.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #7

Updated by Yuri Weinstein over 7 years ago

Run: http://pulpito.ceph.com/teuthology-2017-01-09_11:00:04-rados-jewel-distro-basic-smithi/
Jobs: ['701405', '701492']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2017-01-09_11:00:04-rados-jewel-distro-basic-smithi/701405/teuthology.log

2017-01-10T00:15:44.233 INFO:tasks.ceph.osd.1.smithi004.stderr:2017-01-10 00:15:44.233560 7f393a739700 -1 osd.1 pg_epoch: 726 pg[2.3( v 726'918 (0'0,726'918] local-les=724 n=52 ec=7 les/c/f 724/724/0 723/723/723) [1,5] r=0 lpr=723 crt=720'917 lcod 720'917 mlcod 720'917 active+clean NIBBLEWISE] finish_promote unexpected promote error (34) Numerical result out of range
2017-01-10T00:15:44.237 INFO:tasks.rados.rados.0.smithi004.stdout:3509:  finishing write tid 2 to smithi0048322-473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
2017-01-10T00:15:44.240 INFO:tasks.rados.rados.0.smithi004.stdout:3509:  finishing write tid 4 to smithi0048322-473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
2017-01-10T00:15:44.242 INFO:tasks.rados.rados.0.smithi004.stdout:3509:  finishing write tid 5 to smithi0048322-473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
2017-01-10T00:15:44.245 INFO:tasks.rados.rados.0.smithi004.stdout:update_object_version oid 473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo v 4420 (ObjNum 1462 snap 324 seq_num 1462) dirty exists
2017-01-10T00:15:44.248 INFO:tasks.rados.rados.0.smithi004.stderr:Error: oid 473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo write returned error code -34
2017-01-10T00:15:44.250 INFO:tasks.rados.rados.0.smithi004.stderr:Error: oid 473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo write returned error code -34
2017-01-10T00:15:44.253 INFO:tasks.rados.rados.0.smithi004.stderr:Error: racing read on 473 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo returned version 0 rather than version 4420
2017-01-10T00:15:44.255 INFO:tasks.rados.rados.0.smithi004.stderr:./test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7fdddeffd700 time 2017-01-10 00:15:44.234278
2017-01-10T00:15:44.258 INFO:tasks.rados.rados.0.smithi004.stderr:./test/osd/RadosModel.h: 892: FAILED assert(0 == "racing read got wrong version")
2017-01-10T00:15:44.261 INFO:tasks.rados.rados.0.smithi004.stderr: ceph version 10.2.5-5648-g5b402f8 (5b402f8a7b5a763852e93cd0a5decd34572f4518)
2017-01-10T00:15:44.264 INFO:tasks.rados.rados.0.smithi004.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fddfa478105]
2017-01-10T00:15:44.273 INFO:tasks.rados.rados.0.smithi004.stderr: 2: (WriteOp::_finish(TestOp::CallbackInfo*)+0x46f) [0x7fddfa4617df]
2017-01-10T00:15:44.276 INFO:tasks.rados.rados.0.smithi004.stderr: 3: (write_callback(void*, void*)+0x19) [0x7fddfa46ef29]
2017-01-10T00:15:44.278 INFO:tasks.rados.rados.0.smithi004.stderr: 4: (()+0x9ea8d) [0x7fddf0a6aa8d]
2017-01-10T00:15:44.281 INFO:tasks.rados.rados.0.smithi004.stderr: 5: (()+0x86f79) [0x7fddf0a52f79]
2017-01-10T00:15:44.283 INFO:tasks.rados.rados.0.smithi004.stderr: 6: (()+0x173a56) [0x7fddf0b3fa56]
2017-01-10T00:15:44.290 INFO:tasks.rados.rados.0.smithi004.stderr: 7: (()+0x7dc5) [0x7fddefe38dc5]
2017-01-10T00:15:44.293 INFO:tasks.rados.rados.0.smithi004.stderr: 8: (clone()+0x6d) [0x7fddeed2073d]
2017-01-10T00:15:44.296 INFO:tasks.rados.rados.0.smithi004.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions

Also available in: Atom PDF