Bug #18366

Updated by Sage Weil over 7 years ago

/a/sage-2016-12-28_23:29:38-rados-wip-sage-testing---basic-smithi/673129 

 <pre> 
 remote/smithi053/log/ceph-osd.1.log:2016-12-29 01:11:09.646003 7fa73be07700    1 -- 172.21.15.53:6804/25434 --> 172.21.15.21:0/1021185627 -- osd_op_reply(34303 smithi02126561-7602 [write 550007~604296,stat] v61'5658 uv6288 ondisk = 0) v7 -- 0x7fa7700023c0 con 0 
 remote/smithi021/log/ceph-osd.5.log:2016-12-29 01:11:09.650247 7f24b0c81700    1 -- 172.21.15.21:6808/25814 --> 172.21.15.21:0/1021185627 -- osd_op_reply(34303 smithi02126561-7602 [write 550007~604296,stat] v61'2643 uv6236 ondisk = 0) v7 -- 0x7f24e4bb6b00 con 0 
 remote/smithi021/log/ceph-osd.5.log:2016-12-29 01:11:09.663276 7f24b0c81700    1 -- 172.21.15.21:6808/25814 --> 172.21.15.21:0/1021185627 -- osd_op_reply(34304 smithi02126561-7602 [write 1774455~751016] v61'2644 uv6237 ondisk = 0) v7 -- 0x7f24e94d7700 con 0 
 remote/smithi021/log/ceph-osd.5.log:2016-12-29 01:11:09.679605 7f24b0c81700    1 -- 172.21.15.21:6808/25814 --> 172.21.15.21:0/1021185627 -- osd_op_reply(34305 smithi02126561-7602 [write 3215698~534309] v61'2645 uv6238 ondisk = 0) v7 -- 0x7f24e57bd9c0 con 0 
 remote/smithi021/log/ceph-osd.5.log:2016-12-29 01:11:09.680446 7f24b3486700    1 -- 172.21.15.21:6808/25814 --> 172.21.15.21:0/1021185627 -- osd_op_reply(34306 smithi02126561-7602 [setxattr _header (58),truncate 3750007] v61'2646 uv6239 ondisk = 0) v7 -- 0x7f24e47b9e40 con 0 
 remote/smithi021/log/ceph-osd.5.log:2016-12-29 01:11:09.681370 7f24b0c81700    1 -- 172.21.15.21:6808/25814 --> 172.21.15.21:0/1021185627 -- osd_op_reply(34307 smithi02126561-7602 [read 0~1] v0'0 uv6239 ondisk = 0) v7 -- 0x7f24eb646b00 con 0 
 </pre> 
 note the 2 replies for 34303 with differing uv, and the test got 

 <pre> 
 2016-12-29T01:11:58.646 INFO:tasks.rados.rados.0.smithi021.stdout:7602:    finishing write tid 1 to smithi02126561-7602 
 2016-12-29T01:11:58.648 INFO:tasks.rados.rados.0.smithi021.stdout:7602:    finishing write tid 2 to smithi02126561-7602 
 2016-12-29T01:11:58.650 INFO:tasks.rados.rados.0.smithi021.stdout:7602:    finishing write tid 3 to smithi02126561-7602 
 2016-12-29T01:11:58.652 INFO:tasks.rados.rados.0.smithi021.stdout:7602:    finishing write tid 5 to smithi02126561-7602 
 2016-12-29T01:11:58.654 INFO:tasks.rados.rados.0.smithi021.stdout:7602:    finishing write tid 6 to smithi02126561-7602 
 2016-12-29T01:11:58.656 INFO:tasks.rados.rados.0.smithi021.stdout:update_object_version oid 7602 v 6288 (ObjNum 7601 snap 0 seq_num 7601) dirty exists 
 2016-12-29T01:11:58.657 INFO:tasks.rados.rados.0.smithi021.stderr:Error: racing read on 7602 returned version 6239 rather than version 6288 
 2016-12-29T01:11:58.659 INFO:tasks.rados.rados.0.smithi021.stderr:/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6328-gc0967ae/rpm/el7/BUILD/ceph-11.1.0-6328-gc0967ae/src/test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7f94a6ffd700 time 2016-12-29 01:11:10.057739 
 2016-12-29T01:11:58.661 INFO:tasks.rados.rados.0.smithi021.stderr:/mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.1.0-6328-gc0967ae/rpm/el7/BUILD/ceph-11.1.0-6328-gc0967ae/src/test/osd/RadosModel.h: 896: FAILED assert(0 == "racing read got wrong version") 
 </pre> 
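
 the failed assertion corresponds to a consistency check of roughly this shape (a simplified sketch, not the actual RadosModel.h code; the struct and names are illustrative):

 <pre>
#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;cstdio&gt;

// Sketch of the racing-read check: once a write has committed at a known
// user version, a read raced against it must observe that same version;
// seeing an older uv means a stale/duplicate op completed out of order.
struct RacingReadCheck {
  uint64_t expected_version;  // version recorded by update_object_version

  bool check(uint64_t read_version) const {
    if (read_version != expected_version) {
      std::fprintf(stderr,
                   "Error: racing read returned version %llu rather than %llu\n",
                   (unsigned long long)read_version,
                   (unsigned long long)expected_version);
      return false;  // the real test asserts out here
    }
    return true;
  }
};

int main() {
  RacingReadCheck c{6288};
  // The read (op 34307) was answered from osd.5's stale history with uv 6239,
  // while the write had committed at uv 6288 on osd.1.
  assert(!c.check(6239));
  assert(c.check(6288));
  return 0;
}
 </pre>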

 on osd.1, 

 <pre>2016-12-29 01:11:09.637017 7fa73e60c700 20 osd.1 pg_epoch: 61 pg[1.3( v 42'5657 (26'2600,42'5657] local-les=11 n=1873 ec=10 les/c/f 11/11/0 10/10/10) [1,2] r=0 lpr=10 crt=42'5657 lcod 42'5656 mlcod 42'5656 active+clean] do_op: op osd_op(client.4123.0:34303 1.d12f0d43 smithi02126561-7602 [write 550007~604296,stat] snapc 0=[] RETRY=1 ondisk+retry+write+known_if_redirected e61) v7 
 </pre> 
 and on osd.5 
 <pre>2016-12-29 01:11:09.635315 7f24b0c81700 20 osd.5 pg_epoch: 61 pg[2.3( v 42'2641 (0'0,42'2641] local-les=12 n=3 ec=11 les/c/f 12/12/0 11/11/11) [5,2] r=0 lpr=11 crt=42'2641 lcod 42'2640 mlcod 42'2640 active+clean] do_op: op osd_op(client.4123.0:34303 2.d12f0d43 smithi02126561-7602 [write 550007~604296,stat] snapc 0=[] ondisk+write+known_if_redirected e40) v7 
 </pre> 
 note that the osd.5 request was sent in epoch 40, whereas osd.1's was sent in epoch 61. 

 for the base pool we have 
 "last_force_op_resend": "60", 
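
 the client-side decision should look roughly like this (a sketch modeled on the Objecter's force-resend behavior, not the exact code):

 <pre>
#include &lt;cassert&gt;
#include &lt;cstdint&gt;

// Sketch: an in-flight op must be resent if the pool's last_force_op_resend
// epoch has advanced past the epoch the op was originally sent in.
struct PoolInfo {
  uint64_t last_force_op_resend;
};

bool needs_forced_resend(const PoolInfo&amp; pool, uint64_t sent_epoch) {
  return sent_epoch &lt; pool.last_force_op_resend;
}

int main() {
  const PoolInfo base{60};
  // Op 34303 reached osd.5 as sent in epoch 40, before
  // last_force_op_resend=60, so the client should have resent it...
  assert(needs_forced_resend(base, 40));
  // ...whereas the retry seen by osd.1 was sent in epoch 61, after.
  assert(!needs_forced_resend(base, 61));
  return 0;
}
 </pre>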

 why didn't that force the client resend? 

 and/or, the cache pool. we should set it on both pools so we can handle unsetting *and* setting the overlay.
