Bug #64192
Status: Open
Incorrect behavior on combined cmpext+write ops in the face of session resets
% Done: 0%
Description
There seems to be an expectation mismatch between the OSD and the Objecter (or at least the neorados bit of the Objecter). Based on a quick look, I'm inclined to say that this is an OSD bug.
1. A combined cmpext+write op arrives on the OSD:
2024-01-26T10:52:24.783-0500 7f6a861a1700 1 -- [v2:172.21.9.34:6802/39207300,v1:172.21.9.34:6803/39207300] <== client.4499 172.21.9.34:0/3763700879 4 ==== osd_op(client.4499.0:76 59.0 59.95312129 (undecoded) ondisk+write+known_if_redirected+supports_pool_eio e304) v8 ==== 284+0+28 (crc 0 0 0) 0x5623c974e000 con 0x5623ca360480
2. cmpext fails the compare (as expected -- the client is sending data that doesn't match on purpose):
2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] _handle_message: osd_op(client.4499.0:76 59.0 59.95312129 (undecoded) ondisk+write+known_if_redirected+supports_pool_eio e304) v8 2024-01-26T10:52:24.783-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] do_op: op osd_op(client.4499.0:76 59.0 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head [cmpext 512~14 in=14b,write 512~14 in=14b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e304) v8 2024-01-26T10:52:24.783-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] op_has_sufficient_caps session=0x5623c8f2b400 pool=59 (test-librbd-senta04-2299691-58 ) pool_app_metadata={rados={}} need_read_cap=1 need_write_cap=1 classes=[] -> yes 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] do_op osd_op(client.4499.0:76 59.0 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head [cmpext 512~14 in=14b,write 512~14 in=14b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e304) v8 may_write may_read -> write-ordered flags ondisk+write+known_if_redirected+supports_pool_eio 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 
les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] get_object_context: found obc in cache: obc(59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head rwstate(none n=0 w=0)) 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] get_object_context: obc(59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head rwstate(none n=0 w=0)) oi: 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head(304'22 client.4499.0:70 dirty s 526 uv 22 alloc_hint [4194304 4194304 0]) exists: 1 ssc(59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:snapdir snapset: 0=[]:{} ref: 1 registered: 1 exists: 1) 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] find_object_context 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head @head oi=59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head(304'22 client.4499.0:70 dirty s 526 uv 22 alloc_hint [4194304 4194304 0]) 2024-01-26T10:52:24.783-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] maybe_handle_manifest_detail: 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head is not manifest object 2024-01-26T10:52:24.783-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] do_op obc 
obc(59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head rwstate(excl n=1 w=0)) 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] execute_ctx 0x5623c967c400 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] execute_ctx 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head [cmpext 512~14 in=14b,write 512~14 in=14b] ov 304'22 av 304'28 snapc 0=[] snapset 0=[]:{} 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] do_osd_op 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head [cmpext 512~14 in=14b,write 512~14 in=14b] 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] do_osd_op cmpext 512~14 in=14b 2024-01-26T10:52:24.783-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] do_extent_cmp 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] 
do_osd_op 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head [sync_read 512~14] 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] do_osd_op sync_read 512~14 2024-01-26T10:52:24.783-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] do_read 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] read got 14 / 14 bytes from obj 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head 2024-01-26T10:52:24.783-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] do_osd_ops error: (4100) Unknown error 4100 2024-01-26T10:52:24.783-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] execute_ctx alloc reply 0x5623c80b2d80 result -4100 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] PeeringState::calc_trim_to_aggressive limit = 304'25 2024-01-26T10:52:24.787-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( 
v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] op order client.4499 tid 76 last was 70 2024-01-26T10:52:24.787-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] execute_ctx update_log_only -- result=-4100 2024-01-26T10:52:24.787-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] record_write_error r=-4100 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] submit_log_entries 304'28 (0'0) error 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head by client.4499.0:76 0.000000 -4100 ObjectCleanRegions clean_offsets: [(0, 18446744073709551615)], clean_omap: true, new_object: false 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] new_repop: repgather(0x5623c982d800 304'28 rep_tid=1813 committed?=0 r=-4100) 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'27 (0'0,304'27] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] PeeringState::merge_new_log_entries 304'28 (0'0) error 
59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head by client.4499.0:76 0.000000 -4100 ObjectCleanRegions clean_offsets: [(0, 18446744073709551615)], clean_omap: true, new_object: false 2024-01-26T10:52:24.787-0500 7f6a645e6700 20 update missing, append 304'28 (0'0) error 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head by client.4499.0:76 0.000000 -4100 ObjectCleanRegions clean_offsets: [(0, 18446744073709551615)], clean_omap: true, new_object: false 2024-01-26T10:52:24.787-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 lua=304'27 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] PeeringState::append_log_entries_update_missing trim_to bool = 1 trim_to = 0'0 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 trim proposed trim_to = 0'0 2024-01-26T10:52:24.787-0500 7f6a645e6700 6 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 304'28, trimmed: , trimmed_dups: , clear_divergent_priors: 0 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 _write_log_and_missing clearing up to 0'0 dirty_to_dups=0'0 dirty_from_dups=4294967295'18446744073709551615 write_from_dups=4294967295'18446744073709551615 trimmed_dups.size()=0 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 _write_log_and_missing going to encode log.dups.size()=0 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 _write_log_and_missing 1st round encoded log.dups.size()=0 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 _write_log_and_missing 2st round encoded log.dups.size()=0 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 end of _write_log_and_missing 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 lua=304'27 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] op_applied version 
304'28 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'25 crt=304'25 lcod 304'24 mlcod 304'24 active+clean ps=[2~1]] PeeringState::calc_trim_to_aggressive limit = 304'25 2024-01-26T10:52:24.787-0500 7f6a645e6700 10 osd.0 304 dequeue_op osd_op(client.4499.0:76 59.0 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head [cmpext 512~14 in=14b,write 512~14 in=14b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e304) v8 finish
3. Before the op is ready, OSD session gets reset (due to ms_inject_socket_failures injection on the client):
2024-01-26T10:52:24.787-0500 7f6a861a1700 1 --2- [v2:172.21.9.34:6802/39207300,v1:172.21.9.34:6803/39207300] >> 172.21.9.34:0/3763700879 conn(0x5623ca360480 0x5623c97b8580 crc :-1 s=READY pgs=20 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 2024-01-26T10:52:24.787-0500 7f6a861a1700 1 --2- [v2:172.21.9.34:6802/39207300,v1:172.21.9.34:6803/39207300] >> 172.21.9.34:0/3763700879 conn(0x5623ca360480 0x5623c97b8580 crc :-1 s=READY pgs=20 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).stop 2024-01-26T10:52:24.787-0500 7f6a75b6c700 2 osd.0 304 ms_handle_reset con 0x5623ca360480 session 0x5623c8f2b400
4. The client resends the op and dup detection on the OSD kicks in:
2024-01-26T10:52:24.791-0500 7f6a861a1700 1 -- [v2:172.21.9.34:6802/39207300,v1:172.21.9.34:6803/39207300] <== client.4499 172.21.9.34:0/3763700879 1 ==== osd_op(client.4499.0:76 59.0 59.95312129 (undecoded) ondisk+retry+write+known_if_redirected+supports_pool_eio e304) v8 ==== 284+0+28 (crc 0 0 0) 0x5623c974e780 con 0x5623ca3e6480 2024-01-26T10:52:24.791-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'25 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] _handle_message: osd_op(client.4499.0:76 59.0 59.95312129 (undecoded) ondisk+retry+write+known_if_redirected+supports_pool_eio e304) v8 2024-01-26T10:52:24.791-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'25 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] do_op: op osd_op(client.4499.0:76 59.0 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head [cmpext 512~14 in=14b,write 512~14 in=14b] snapc 0=[] RETRY=1 ondisk+retry+write+known_if_redirected+supports_pool_eio e304) v8 2024-01-26T10:52:24.791-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'25 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] op_has_sufficient_caps session=0x5623c974e280 pool=59 (test-librbd-senta04-2299691-58 ) pool_app_metadata={rados={}} need_read_cap=1 need_write_cap=1 classes=[] -> yes 2024-01-26T10:52:24.791-0500 7f6a645e6700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'25 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] do_op osd_op(client.4499.0:76 59.0 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head 
[cmpext 512~14 in=14b,write 512~14 in=14b] snapc 0=[] RETRY=1 ondisk+retry+write+known_if_redirected+supports_pool_eio e304) v8 may_write may_read -> write-ordered flags ondisk+retry+write+known_if_redirected+supports_pool_eio 2024-01-26T10:52:24.791-0500 7f6a645e6700 3 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'25 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] do_op dup client.4499.0:76 version 304'28
^^^ It's not logged, but I suspect that PG::check_in_progress_op() returns with user_version = 0 and an empty op_returns vector here.
2024-01-26T10:52:24.791-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'25 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] already_complete: 304'28 2024-01-26T10:52:24.791-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'25 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] already_complete: repgather(0x5623c982d800 304'28 rep_tid=1813 committed?=0 r=-4100) 2024-01-26T10:52:24.791-0500 7f6a685ee700 20 osd.0 op_wq(0) _process 59.0 to_process <> waiting <> waiting_peering {} 2024-01-26T10:52:24.791-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'25 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] already_complete: repgather(0x5623c982d800 304'28 rep_tid=1813 committed?=0 r=-4100) not committed, returning false
5. The op becomes ready:
2024-01-26T10:52:24.791-0500 7f6a685ee700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'26 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] repop_all_committed: repop tid 1813 all committed 2024-01-26T10:52:24.791-0500 7f6a685ee700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'26 active+clean ps=[2~1]] PeeringState::calc_min_last_complete_ondisk last_complete_ondisk is updated to: 304'27 from: 304'26 2024-01-26T10:52:24.791-0500 7f6a685ee700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'27 active+clean ps=[2~1]] eval_repop repgather(0x5623c982d800 304'28 rep_tid=1813 committed?=1 r=-4100) 2024-01-26T10:52:24.791-0500 7f6a685ee700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'27 active+clean ps=[2~1]] commit: repgather(0x5623c982d800 304'28 rep_tid=1813 committed?=1 r=-4100)
6. Reply with user_version = 0 is sent:
2024-01-26T10:52:24.791-0500 7f6a685ee700 1 -- [v2:172.21.9.34:6802/39207300,v1:172.21.9.34:6803/39207300] --> 172.21.9.34:0/3763700879 -- osd_op_reply(76 rbd_data.1193ad1baaa.0000000000000000 [cmpext 512~14,write 512~14] v304'28 uv0 ondisk = -4100 ((4100) Unknown error 4100)) v8 -- 0x5623c92786c0 con 0x5623ca3e6480 2024-01-26T10:52:24.791-0500 7f6a685ee700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'27 active+clean ps=[2~1]] PeeringState::prepare_stats_for_publish reporting purged_snaps [2~1] 2024-01-26T10:52:24.791-0500 7f6a685ee700 15 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'27 active+clean ps=[2~1]] PeeringState::prepare_stats_for_publish publish_stats_to_osd 304:75 2024-01-26T10:52:24.791-0500 7f6a685ee700 10 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'27 active+clean ps=[2~1]] removing repgather(0x5623c982d800 304'28 rep_tid=1813 committed?=1 r=-4100) 2024-01-26T10:52:24.791-0500 7f6a685ee700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'27 active+clean ps=[2~1]] q front is repgather(0x5623c982d800 304'28 rep_tid=1813 committed?=1 r=-4100) 2024-01-26T10:52:24.791-0500 7f6a685ee700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'27 active+clean ps=[2~1]] finished operator() r=-4100 2024-01-26T10:52:24.791-0500 7f6a685ee700 10 osd.0 
pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'27 active+clean ps=[2~1]] sending commit on osd_op(client.4499.0:76 59.0 59:94848ca9:::rbd_data.1193ad1baaa.0000000000000000:head [cmpext 512~14 in=14b,write 512~14 in=14b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e304) v8 0x5623c80b2d80
7. Reply with user_version != 0 is sent:
2024-01-26T10:52:24.791-0500 7f6a685ee700 1 -- [v2:172.21.9.34:6802/39207300,v1:172.21.9.34:6803/39207300] --> 172.21.9.34:0/3763700879 -- osd_op_reply(76 rbd_data.1193ad1baaa.0000000000000000 [cmpext 512~14,write 512~14] v304'28 uv22 ondisk = -4100 ((4100) Unknown error 4100)) v8 -- 0x5623c80b2d80 con 0x5623ca360480 2024-01-26T10:52:24.791-0500 7f6a685ee700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'30 (0'0,304'30] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'28 crt=304'26 lcod 304'27 mlcod 304'27 active+clean ps=[2~1]] remove_repop repgather(0x5623c982d800 304'28 rep_tid=1813 committed?=1 r=-4100)
8. On the client side, Objecter processes the reply with user_version = 0. This reply also happens to have rval = 0 for all ops, including cmpext even though it failed the compare:
2024-01-26T10:52:24.791-0500 7ff755ffb700 1 -- 172.21.9.34:0/3763700879 <== osd.0 v2:172.21.9.34:6802/39207300 1 ==== osd_op_reply(76 rbd_data.1193ad1baaa.0000000000000000 [cmpext 512~14,write 512~14] v304'28 uv0 ondisk = -4100 ((4100) Unknown error 4100)) v8 ==== 223+0+0 (crc 0 0 0) 0x7ff704078040 con 0x7ff72c025d50 2024-01-26T10:52:24.791-0500 7ff755ffb700 10 client.4499.objecter ms_dispatch 0x55bffcc34780 osd_op_reply(76 rbd_data.1193ad1baaa.0000000000000000 [cmpext 512~14,write 512~14] v304'28 uv0 ondisk = -4100 ((4100) Unknown error 4100)) v8 2024-01-26T10:52:24.791-0500 7ff755ffb700 10 client.4499.objecter in handle_osd_op_reply 2024-01-26T10:52:24.791-0500 7ff755ffb700 7 client.4499.objecter handle_osd_op_reply 76 ondisk uv 0 in 59.0 attempt 1 2024-01-26T10:52:24.791-0500 7ff755ffb700 10 client.4499.objecter op 0 rval 0 len 0 2024-01-26T10:52:24.791-0500 7ff755ffb700 10 client.4499.objecter op 1 rval 0 len 0 2024-01-26T10:52:24.791-0500 7ff755ffb700 15 client.4499.objecter handle_osd_op_reply completed tid 76 2024-01-26T10:52:24.791-0500 7ff755ffb700 15 client.4499.objecter _finish_op 76
Because of rval = 0 there, Adam's cmpext "handler" from https://github.com/ceph/ceph/pull/52495 doesn't set mismatch_offset and doesn't adjust the overall return code. -4100 is returned to the user (librbd in this case), which chokes on it: we expect -MAX_ERRNO (-4095) overall return code and mismatch_offset = 5.
For comparison, here is Objecter processing the same thing when retries aren't involved:
2024-01-26T10:47:07.259-0500 7ff743fff700 10 client.4142.objecter ms_dispatch 0x55bffca54cd0 osd_op_reply(67 rbd_data.102eba81544b.0000000000000000 [cmpext 512~14,write 512~14] v12'23 uv21 ondisk = -4100 ((4100) Unknown error 4100)) v8 2024-01-26T10:47:07.259-0500 7ff743fff700 10 client.4142.objecter in handle_osd_op_reply 2024-01-26T10:47:07.259-0500 7ff743fff700 7 client.4142.objecter handle_osd_op_reply 67 ondisk uv 21 in 2.0 attempt 0 2024-01-26T10:47:07.259-0500 7ff743fff700 10 client.4142.objecter op 0 rval -4100 len 0 2024-01-26T10:47:07.259-0500 7ff743fff700 10 client.4142.objecter ERROR: tid 67: handler function threw CmpExt mismatch [osd:4095] 2024-01-26T10:47:07.259-0500 7ff743fff700 10 client.4142.objecter op 1 rval 0 len 0 2024-01-26T10:47:07.259-0500 7ff743fff700 15 client.4142.objecter handle_osd_op_reply completed tid 67 2024-01-26T10:47:07.259-0500 7ff743fff700 15 client.4142.objecter _finish_op 67
Radoslaw, is PG::check_in_progress_op() returning with user_version = 0 and an empty op_returns vector, causing a dummy reply with user_version = 0 and rval = 0 for all ops being sent first, expected? Should such a dummy reply be sent at all?
Updated by Radoslaw Zarzynski 3 months ago
- Status changed from New to In Progress
Updated by Radoslaw Zarzynski 3 months ago
Hi Ilya!
I looked over the log's reader (PG::check_in_progress_op() + PGLog::get_request()) and nothing struck me at first glance -- it seems that for every single return true the op_returns should be set.
However, the writer part looks suspicious:
void PrimaryLogPG::record_write_error(OpRequestRef op, const hobject_t &soid,
                                      MOSDOpReply *orig_reply, int r,
                                      OpContext *ctx_for_op_returns)
{
  dout(20) << __func__ << " r=" << r << dendl;
  ceph_assert(op->may_write());
  const osd_reqid_t &reqid = op->get_req<MOSDOp>()->get_reqid();
  mempool::osd_pglog::list<pg_log_entry_t> entries;
  entries.push_back(pg_log_entry_t(pg_log_entry_t::ERROR, soid,
                                   get_next_version(), eversion_t(), 0,
                                   reqid, utime_t(), r));
  if (ctx_for_op_returns) {
    entries.back().set_op_returns(*ctx_for_op_returns->ops);
    dout(20) << __func__ << " op_returns=" << entries.back().op_returns << dendl;
  }
Here is the level-20 log line I can't see in the snippets:
  if (ctx_for_op_returns) {
    entries.back().set_op_returns(*ctx_for_op_returns->ops);
    dout(20) << __func__ << " op_returns=" << entries.back().op_returns << dendl;
  }
If that's the case, ctx_for_op_returns hasn't been passed by the caller.
Updated by Radoslaw Zarzynski 3 months ago
The caller of the recorder is:
void PrimaryLogPG::execute_ctx(OpContext *ctx)
{
  // ...
  if (ctx->update_log_only) {
    if (result >= 0)
      do_osd_op_effects(ctx, m->get_connection());
    dout(20) << __func__ << " update_log_only -- result=" << result << dendl;
    // save just what we need from ctx
    MOSDOpReply *reply = ctx->reply;
    ctx->reply = nullptr;
    reply->get_header().data_off = (ctx->data_off ? *ctx->data_off : 0);
    if (result == -ENOENT) {
      reply->set_enoent_reply_versions(info.last_update,
                                       info.last_user_version);
    }
    reply->add_flags(CEPH_OSD_FLAG_ACK | CEPH_OSD_FLAG_ONDISK);
    // append to pg log for dup detection - don't save buffers for now
    record_write_error(op, soid, reply, result,
                       ctx->op->allows_returnvec() ? ctx : nullptr);
    close_op_ctx(ctx);
    return;
  }
The question is: what was the actual value of allows_returnvec()? Let's track the allows_returnvec paths:
bool OpInfo::allows_returnvec() const {
  return check_rmw(CEPH_OSD_RMW_FLAG_RETURNVEC);
}

void OpInfo::set_returnvec() { set_rmw_flags(CEPH_OSD_RMW_FLAG_RETURNVEC); }

int OpInfo::set_from_op(
  const MOSDOp *m,
  const OSDMap &osdmap)
{
  // client flags have no bearing on whether an op is a read, write, etc.
  clear();
  if (m->has_flag(CEPH_OSD_FLAG_RWORDERED)) {
    set_force_rwordered();
  }
  if (m->has_flag(CEPH_OSD_FLAG_RETURNVEC)) {
    set_returnvec();
  }
  return set_from_op(m->ops, m->get_pg(), osdmap);
}
So, the request message had to be flagged with CEPH_OSD_FLAG_RETURNVEC.
What can help disprove the hypothesis is:
const char *ceph_osd_flag_name(unsigned flag)
{
  switch (flag) {
  case CEPH_OSD_FLAG_ACK: return "ack";
  case CEPH_OSD_FLAG_ONNVRAM: return "onnvram";
  case CEPH_OSD_FLAG_ONDISK: return "ondisk";
  case CEPH_OSD_FLAG_RETRY: return "retry";
  case CEPH_OSD_FLAG_READ: return "read";
  case CEPH_OSD_FLAG_WRITE: return "write";
  case CEPH_OSD_FLAG_ORDERSNAP: return "ordersnap";
  case CEPH_OSD_FLAG_PEERSTAT_OLD: return "peerstat_old";
  case CEPH_OSD_FLAG_BALANCE_READS: return "balance_reads";
  case CEPH_OSD_FLAG_PARALLELEXEC: return "parallelexec";
  case CEPH_OSD_FLAG_PGOP: return "pgop";
  case CEPH_OSD_FLAG_EXEC: return "exec";
  case CEPH_OSD_FLAG_EXEC_PUBLIC: return "exec_public";
  case CEPH_OSD_FLAG_LOCALIZE_READS: return "localize_reads";
  case CEPH_OSD_FLAG_RWORDERED: return "rwordered";
  case CEPH_OSD_FLAG_IGNORE_CACHE: return "ignore_cache";
  case CEPH_OSD_FLAG_SKIPRWLOCKS: return "skiprwlocks";
  case CEPH_OSD_FLAG_IGNORE_OVERLAY: return "ignore_overlay";
  case CEPH_OSD_FLAG_FLUSH: return "flush";
  case CEPH_OSD_FLAG_MAP_SNAP_CLONE: return "map_snap_clone";
  case CEPH_OSD_FLAG_ENFORCE_SNAPC: return "enforce_snapc";
  case CEPH_OSD_FLAG_REDIRECTED: return "redirected";
  case CEPH_OSD_FLAG_KNOWN_REDIR: return "known_if_redirected";
  case CEPH_OSD_FLAG_FULL_TRY: return "full_try";
  case CEPH_OSD_FLAG_FULL_FORCE: return "full_force";
  case CEPH_OSD_FLAG_IGNORE_REDIRECT: return "ignore_redirect";
  case CEPH_OSD_FLAG_RETURNVEC: return "returnvec";
  case CEPH_OSD_FLAG_SUPPORTSPOOLEIO: return "supports_pool_eio";
  default: return "???";
  }
}
From the provided snippets we can see that this flag has NOT been set:
2024-01-26T10:52:24.791-0500 7f6a861a1700 1 -- [v2:172.21.9.34:6802/39207300,v1:172.21.9.34:6803/39207300] <== client.4499 172.21.9.34:0/3763700879 1 ==== osd_op(client.4499.0:76 59.0 59.95312129 (undecoded) ondisk+retry+write+known_if_redirected+supports_pool_eio e304) v8 ==== 284+0+28 (crc 0 0 0) 0x5623c974e780 con 0x5623ca3e6480
2024-01-26T10:52:24.783-0500 7f6a861a1700 1 -- [v2:172.21.9.34:6802/39207300,v1:172.21.9.34:6803/39207300] <== client.4499 172.21.9.34:0/3763700879 4 ==== osd_op(client.4499.0:76 59.0 59.95312129 (undecoded) ondisk+write+known_if_redirected+supports_pool_eio e304) v8 ==== 284+0+28 (crc 0 0 0) 0x5623c974e000 con 0x5623ca360480
Updated by Radoslaw Zarzynski 3 months ago
Now the Q is why the other reply carried the op_returns vector.
Updated by Ilya Dryomov 3 months ago
- Category deleted (Correctness/Safety)
- Assignee changed from Radoslaw Zarzynski to Greg Farnum
- Severity deleted (3 - minor)
Radoslaw Zarzynski wrote:
Now the Q is why the other reply carried the op_returns vector.
By the other reply, do you mean the reply from step 7 (with user_version != 0)? I don't think we know whether it actually carried the op_returns vector -- only that it was "more populated" in the sense that user_version was set.
Updated by Ilya Dryomov 3 months ago
- Category set to Correctness/Safety
- Assignee changed from Greg Farnum to Radoslaw Zarzynski
I'm puzzled by how this ended up being reassigned to Greg or how the Category got cleared. I haven't touched any of those fields...
Updated by Ilya Dryomov 3 months ago
- Priority changed from High to Urgent
Bumping priority -- this would be a blocker from the RBD side for Squid.
Updated by Radoslaw Zarzynski 3 months ago
Let's take a look at how a retry (a dup request) is handled:
void PrimaryLogPG::do_op(OpRequestRef& op)
{
  // ...
  // dup/resent?
  if (op->may_write() || op->may_cache()) {
    // warning: we will get back *a* request for this reqid, but not
    // necessarily the most recent. this happens with flush and
    // promote ops, but we can't possible have both in our log where
    // the original request is still not stable on disk, so for our
    // purposes here it doesn't matter which one we get.
    eversion_t version;
    version_t user_version;
    int return_code = 0;
    vector<pg_log_op_return_item_t> op_returns;
    bool got = check_in_progress_op(
      m->get_reqid(), &version, &user_version, &return_code, &op_returns);
    if (got) {
      dout(3) << __func__ << " dup " << m->get_reqid()
              << " version " << version << dendl;
      if (already_complete(version)) {
        osd->reply_op_error(op, return_code, version, user_version, op_returns);
      } else {
        dout(10) << " waiting for " << version << " to commit" << dendl;
        // always queue ondisk waiters, so that we can requeue if needed
        waiting_for_ondisk[version].emplace_back(op, user_version, return_code,
                                                 op_returns);
        op->mark_delayed("waiting for ondisk");
      }
      return;
    }
  }
There are two paths:
1. If the underlying operation is already complete (the already_complete() == true case), a response is prepared based solely on the PGLog (regardless of whether it has been filled with proper values in record_write_error()):
  if (already_complete(version)) {
    osd->reply_op_error(op, return_code, version, user_version, op_returns);
It's a no-brainer. However, this is NOT what happened in the case described by Ilya:
2024-01-26T10:52:24.791-0500 7f6a645e6700 20 osd.0 pg_epoch: 304 pg[59.0( v 304'28 (0'0,304'28] local-lis/les=301/302 n=9 ec=301/301 lis/c=301/301 les/c/f=302/302/0 sis=301) [0] r=0 lpr=301 luod=304'27 crt=304'25 lcod 304'26 mlcod 304'26 active+clean ps=[2~1]] already_complete: repgather(0x5623c982d800 304'28 rep_tid=1813 committed?=0 r=-4100) not committed, returning false
2. If the operation is still in progress, a more complex path through waiting_for_ondisk is selected:
  } else {
    dout(10) << " waiting for " << version << " to commit" << dendl;
    // always queue ondisk waiters, so that we can requeue if needed
    waiting_for_ondisk[version].emplace_back(op, user_version, return_code,
                                             op_returns);
    op->mark_delayed("waiting for ondisk");
  }
The entry is translated into a response in eval_repop(). In the case at hand, this method first dealt with the response to the retry (dup):
void PrimaryLogPG::eval_repop(RepGather *repop)
{
  // ...
  // send dup commits, in order
  auto it = waiting_for_ondisk.find(repop->v);
  if (it != waiting_for_ondisk.end()) {
    ceph_assert(waiting_for_ondisk.begin()->first == repop->v);
    for (auto& i : it->second) {
      int return_code = repop->r;
      if (return_code >= 0) {
        return_code = std::get<2>(i);
      }
      osd->reply_op_error(std::get<0>(i), return_code, repop->v,
                          std::get<1>(i), std::get<3>(i));
    }
    waiting_for_ondisk.erase(it);
  }
  publish_stats_to_osd();
  // ...
Although return_code can be altered in some cases, the user version (the fourth parameter to reply_op_error()) isn't.
void reply_op_error(OpRequestRef op, int err, eversion_t v, version_t uv,
                    std::vector<pg_log_op_return_item_t> op_returns);
I think this is the place where the reply with uv == 0 was sent -- it's the response to the retry. The 0 came from the PGLog, as that's exactly what the recorder put there. Let's take another look:
void PrimaryLogPG::record_write_error(OpRequestRef op, const hobject_t &soid,
                                      MOSDOpReply *orig_reply, int r,
                                      OpContext *ctx_for_op_returns)
{
  // ...
  entries.push_back(pg_log_entry_t(pg_log_entry_t::ERROR, soid,
                                   get_next_version(), eversion_t(), 0,
                                   reqid, utime_t(), r));
  if (ctx_for_op_returns) {
    entries.back().set_op_returns(*ctx_for_op_returns->ops);
    dout(20) << __func__ << " op_returns=" << entries.back().op_returns << dendl;
  }
pg_log_entry_t(int _op, const hobject_t& _soid,
               const eversion_t& v, const eversion_t& pv,
               version_t uv,
               const osd_reqid_t& rid, const utime_t& mt,
               int return_code)
  : soid(_soid), reqid(rid), version(v), prior_version(pv), user_version(uv),
    mtime(mt), return_code(return_code), op(_op),
    invalid_hash(false), invalid_pool(false) {
  snaps.reassign_to_mempool(mempool::mempool_osd_pglog);
}
uv is always zero in these two paths, and I'm afraid they haven't changed in the last 4 years.
To summarize: a response to a retry is sent with uv == 0 and contains the op_returnvec only if the client sets the returnvec flag (as shown in the previous comments). I guess this is the differentiation point with the original message. Let's take a look.
The response to the original request came from eval_repop() as well; it was sent after the response to the retry:
void PrimaryLogPG::eval_repop(RepGather *repop)
{
  // ...
  // send dup commits, in order
  // ...
  publish_stats_to_osd();
  dout(10) << " removing " << *repop << dendl;
  ceph_assert(!repop_queue.empty());
  dout(20) << " q front is " << *repop_queue.front() << dendl;
  if (repop_queue.front() == repop) {
    RepGather *to_remove = nullptr;
    while (!repop_queue.empty() &&
           (to_remove = repop_queue.front())->all_committed) {
      repop_queue.pop_front();
      for (auto p = to_remove->on_success.begin();
           p != to_remove->on_success.end();
           to_remove->on_success.erase(p++)) {
        (*p)();
      }
      remove_repop(to_remove);
    }
  }
The callback here came from the same record_write_error() that puts uv = 0 into the PGLog (for the sake of retries):
void PrimaryLogPG::record_write_error(OpRequestRef op, const hobject_t &soid,
                                      MOSDOpReply *orig_reply, int r,
                                      OpContext *ctx_for_op_returns)
{
  // ...
  struct OnComplete {
    PrimaryLogPG *pg;
    OpRequestRef op;
    boost::intrusive_ptr<MOSDOpReply> orig_reply;
    int r;
    OnComplete(
      PrimaryLogPG *pg,
      OpRequestRef op,
      MOSDOpReply *orig_reply,
      int r)
      : pg(pg), op(op),
        orig_reply(orig_reply, false /* take over ref */), r(r)
    {}
    void operator()() {
      ldpp_dout(pg, 20) << "finished " << __func__ << " r=" << r << dendl;
      auto m = op->get_req<MOSDOp>();
      MOSDOpReply *reply = orig_reply.detach();
      ldpp_dout(pg, 10) << " sending commit on " << *m << " " << reply << dendl;
      pg->osd->send_message_osd_client(reply, m->get_connection());
    }
  };
  ObcLockManager lock_manager;
  submit_log_entries(
    entries,
    std::move(lock_manager),
    std::optional<std::function<void(void)> >(
      OnComplete(this, op, orig_reply, r)),
    op,
    r);
}
The final look goes to the crafting of the MOSDOpReply in the error scenario:
bool ignore_out_data = false;
if (!ctx->op_t->empty() &&
    op->may_write() &&
    result >= 0) {
  // successful update
  // ...
}
// prepare the reply
ctx->reply = new MOSDOpReply(m, result, get_osdmap_epoch(), 0,
                             ignore_out_data);
// ...
ctx->reply->set_reply_versions(ctx->at_version, ctx->user_at_version);
// ...
if (ctx->update_log_only) {
  // ...
  // append to pg log for dup detection - don't save buffers for now
  record_write_error(op, soid, reply, result,
                     ctx->op->allows_returnvec() ? ctx : nullptr);
To summarize: for the reply to the original request, the presence of the returnvec in the error scenario (result < 0) is unconditional.
Updated by Radoslaw Zarzynski 3 months ago
The intuitive fix (I'm not yet saying a proper or good one) would be to replicate the behavior of the main path and record the retvec on result < 0 regardless of the client's wishes.
Updated by Radoslaw Zarzynski 3 months ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 55570
Updated by Radoslaw Zarzynski 2 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to reef,quincy
Updated by Backport Bot 2 months ago
- Copied to Backport #64575: reef: Incorrect behavior on combined cmpext+write ops in the face of session resets added
Updated by Backport Bot 2 months ago
- Copied to Backport #64576: quincy: Incorrect behavior on combined cmpext+write ops in the face of session resets added