Bug #62081
Closed
tasks/fscrypt-common does not finish, times out
100%
Description
/a/yuriw-2023-07-14_23:36:48-fs-wip-yuri7-testing-2023-07-14-0803-reef-distro-default-smithi/7338148
Starts out fine
2023-07-15T00:28:16.193 DEBUG:teuthology.orchestra.run.smithi141:running tests for dummy_encryption from xfstests-dev> (cd /tmp/tmp.vV456TNbPzxfstests-dev && exec sudo env DIFF_LENGTH=0 ./check -g quick -E ./ceph.exclude)
runs for a while, and then
2023-07-15T00:37:35.968 DEBUG:teuthology.orchestra.run.smithi120:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-07-15T00:37:35.971 DEBUG:teuthology.orchestra.run.smithi141:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2023-07-15T00:37:36.234 INFO:teuthology.orchestra.run.smithi141.stdout:generic/111 [not run] Reflink not supported by test filesystem type: ceph
2023-07-15T00:38:00.665 INFO:teuthology.orchestra.run.smithi141.stdout:generic/112 24s
2023-07-15T00:38:06.238 DEBUG:teuthology.orchestra.run.smithi003:> sudo logrotate /etc/logrotate.d/ceph-test.conf
...
Updated by Venky Shankar 10 months ago
Another instance: /a/yuriw-2023-07-26_14:34:38-fs-reef-release-distro-default-smithi/7353184
Updated by Xiubo Li 10 months ago
Venky Shankar wrote:
/a/yuriw-2023-07-14_23:36:48-fs-wip-yuri7-testing-2023-07-14-0803-reef-distro-default-smithi/7338148
Starts out fine
[...]
runs for a while, and then
[...]
The osd6 suddenly became unreachable. I haven't dug into it further yet.
2023-07-15T00:37:34.412125+00:00 smithi141 kernel: libceph: mon0 (1)172.21.15.3:6789 session established
2023-07-15T00:37:34.412209+00:00 smithi141 kernel: libceph: client5376 fsid c59db712-ccb5-4eab-9bab-c4004ffac57d
2023-07-15T00:37:34.412240+00:00 smithi141 kernel: ceph: test_dummy_encryption mode enabled
2023-07-15T00:39:43.333754+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
2023-07-15T00:39:43.342744+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
2023-07-15T00:39:43.351744+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
2023-07-15T00:39:43.359742+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
2023-07-15T00:39:43.369746+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
2023-07-15T00:39:43.378743+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
2023-07-15T00:39:43.387747+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
2023-07-15T00:39:43.396746+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
2023-07-15T00:39:43.404739+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
2023-07-15T00:39:43.404788+00:00 smithi141 kernel: libceph: osd6 (1)172.21.15.120:6817 socket error on read
...
Updated by Xiubo Li 8 months ago
- Status changed from New to In Progress
Xiubo Li wrote:
Venky Shankar wrote:
/a/yuriw-2023-07-14_23:36:48-fs-wip-yuri7-testing-2023-07-14-0803-reef-distro-default-smithi/7338148
Starts out fine
[...]
runs for a while, and then
[...]
The osd6 suddenly became unreachable. I haven't dug into it further yet.
[...]
From remote/smithi120/log/ceph-osd.6.log.30.gz:
The sparse-read succeeded on the OSD, but the kclient kept retrying it in an infinite loop:
2023-07-15T01:17:27.466+0000 7f0bd13b2700 1 -- [v2:172.21.15.120:6816/146557,v1:172.21.15.120:6817/146557] <== client.5376 v1:172.21.15.141:0/3368586781 5 ==== osd_op(client.5376.0:161010 6.2e 6.552cb06e (undecoded) ondisk+retry+read+known_if_redirected e36) v8 ==== 221+0+0 (unknown 3349828285 0 0) 0x55ec90b20a00 con 0x55ec92027800
2023-07-15T01:17:27.466+0000 7f0bad755700 20 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] do_op obc obc(6:a5c3f7ca:::10000009a27.00000002:head rwstate(read n=1 w=0))
2023-07-15T01:17:27.466+0000 7f0bb1f5e700 20 osd.6 op_wq(4) _process OpSchedulerItem(6.2c PGOpItem(op=osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) queued
2023-07-15T01:17:27.466+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] execute_ctx 0x55ec9b5ee900
2023-07-15T01:17:27.466+0000 7f0bd13b2700 15 osd.6 36 enqueue_op osd_op(client.5376.0:161010 6.2e 6.552cb06e (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 type 42 cost 0 latency 0.000010 epoch 36 osd_op(client.5376.0:161010 6.2e 6.552cb06e (undecoded) ondisk+retry+read+known_if_redirected e36) v8
2023-07-15T01:17:27.466+0000 7f0bb0f5c700 20 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] do_op: op osd_op(client.5376.0:160973 6.2e 6:75d6f72a:::10000009a25.00000000:head [sparse-read 2387968~65536 [1@-1]] snapc 0=[] RETRY=316378 ondisk+retry+read+known_if_redirected e36) v8
2023-07-15T01:17:27.466+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] execute_ctx 6:a5c3f7ca:::10000009a27.00000002:head [sparse-read 1511424~65536 [1@-1]] ov 36'5662
...
2023-07-15T01:17:27.481+0000 7f0bb0f5c700 1 -- [v2:172.21.15.120:6816/146557,v1:172.21.15.120:6817/146557] --> v1:172.21.15.141:0/3368586781 -- osd_op_reply(161010 10000009a24.00000000 [sparse-read 4050944~65536 out=65560b] v0'0 uv12925 ondisk = 0) v8 -- 0x55ec92841680 con 0x55ec98fcb000
Up to this point it had retried 316381 times (RETRY=316381).
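As a quick sanity check (the sample line below is abridged from the ticket's OSD log; the real log is huge), the RETRY counter can be pulled straight out of a captured osd_op entry:

```shell
# Extract the RETRY counter from an osd_op log entry.
# 'line' is an abridged sample copied from this ticket, not a live log read.
line='do_op: op osd_op(client.5376.0:160927 6.25 [sparse-read 1511424~65536] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8'
retries=$(printf '%s\n' "$line" | grep -oE 'RETRY=[0-9]+' | cut -d= -f2)
echo "$retries"   # prints 316361
```

Pointing the same grep at the decompressed remote/smithi120/log/ceph-osd.6.log lets you watch the counter climb across entries.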
Need to go through the libceph kernel module to find out the root cause.
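Pending that investigation, the failure mode visible in the logs can be sketched generically (illustrative pseudologic, not the actual libceph code): the OSD completes every sparse-read with result 0, yet something on the client side keeps rejecting the reply and resubmitting, so without a terminating condition the op spins forever.

```python
# Illustrative sketch only -- not libceph code. It shows why a resubmit
# loop whose acceptance check never passes retries without bound.
def submit_with_retry(do_read, reply_ok, max_retries=None):
    """Resubmit do_read() until reply_ok() accepts the reply."""
    retry = 0
    while True:
        reply = do_read()
        if reply_ok(reply):
            return reply, retry
        retry += 1  # mirrors the RETRY=N counter in the osd_op header
        if max_retries is not None and retry >= max_retries:
            raise RuntimeError(f"gave up after {retry} retries")

# A reply that is never accepted loops until the cap trips:
try:
    submit_with_retry(lambda: 0, lambda r: False, max_retries=5)
except RuntimeError as e:
    print(e)  # gave up after 5 retries
```

The kernel client has no such cap for these ops, which matches the RETRY counter climbing past 316k in the OSD log.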
Updated by Xiubo Li 7 months ago
Saving a full log trace for client.5376.0:160927 here:
2023-07-15T01:17:27.346+0000 7f0bd13b2700 1 -- [v2:172.21.15.120:6816/146557,v1:172.21.15.120:6817/146557] <== client.5376 v1:172.21.15.141:0/3368586781 1 ==== osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 ==== 221+0+0 (unknown 2276465548 0 0) 0x55ec9428d400 con 0x55ec993e3000 2023-07-15T01:17:27.346+0000 7f0bd13b2700 15 osd.6 36 enqueue_op osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 type 42 cost 0 latency 0.000017 epoch 36 osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bd13b2700 20 osd.6 op_wq(5) _enqueue OpSchedulerItem(6.25 PGOpItem(op=osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 cost 0 e36) 2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 op_wq(5) _process 6.25 to_process <> waiting <> waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bd13b2700 1 -- [v2:172.21.15.120:6816/146557,v1:172.21.15.120:6817/146557] <== client.5376 v1:172.21.15.141:0/3368586781 2 ==== osd_op(client.5376.0:160950 6.25 6.4f2dc065 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 ==== 221+0+0 (unknown 3143864825 0 0) 0x55ec938cf180 con 0x55ec993e3000 2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 op_wq(5) _process OpSchedulerItem(6.25 PGOpItem(op=osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) queued 2023-07-15T01:17:27.346+0000 7f0bd13b2700 15 osd.6 36 enqueue_op osd_op(client.5376.0:160950 6.25 6.4f2dc065 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 type 42 cost 0 latency 0.000014 epoch 36 osd_op(client.5376.0:160950 6.25 6.4f2dc065 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 op_wq(5) _process 6.25 
to_process <OpSchedulerItem(6.25 PGOpItem(op=osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36)> waiting <> waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bd13b2700 20 osd.6 op_wq(5) _enqueue OpSchedulerItem(6.25 PGOpItem(op=osd_op(client.5376.0:160950 6.25 6.4f2dc065 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 cost 0 e36) 2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 op_wq(5) _process OpSchedulerItem(6.25 PGOpItem(op=osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) pg 0x55ec92173000 2023-07-15T01:17:27.346+0000 7f0bb175d700 20 osd.6 op_wq(5) _process empty q, waiting 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 36 dequeue_op osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 cost 0 latency 0.000055 osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 pg pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 36 maybe_share_map con 0x55ec993e3000 v1:172.21.15.141:0/3368586781 map epoch 0 -> 36 (as per caller) 2023-07-15T01:17:27.346+0000 7f0bb175d700 20 osd.6 op_wq(5) _process 6.25 to_process <> waiting <> waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bb175d700 20 osd.6 op_wq(5) _process OpSchedulerItem(6.25 PGOpItem(op=osd_op(client.5376.0:160950 6.25 6.4f2dc065 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) queued 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) 
[6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] _handle_message: osd_op(client.5376.0:160927 6.25 6.53efc3a5 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bd13b2700 1 -- [v2:172.21.15.120:6816/146557,v1:172.21.15.120:6817/146557] <== client.5376 v1:172.21.15.141:0/3368586781 3 ==== osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8 ==== 221+0+0 (unknown 3082543621 0 0) 0x55ec95b7e000 con 0x55ec993e3000 2023-07-15T01:17:27.346+0000 7f0bd13b2700 15 osd.6 36 enqueue_op osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 type 42 cost 0 latency 0.000012 epoch 36 osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] do_op: op osd_op(client.5376.0:160927 6.25 6:a5c3f7ca:::10000009a27.00000002:head [sparse-read 1511424~65536 [1@-1]] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bd13b2700 20 osd.6 op_wq(6) _enqueue OpSchedulerItem(6.2e PGOpItem(op=osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 cost 0 e36) 2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] op_has_sufficient_caps session=0x55ec95437a40 pool=6 (cephfs_data ) pool_app_metadata={cephfs={data=cephfs}} need_read_cap=1 need_write_cap=0 classes=[] -> yes 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 op_wq(6) _process 6.2e to_process <> waiting <> 
waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 op_wq(6) _process OpSchedulerItem(6.2e PGOpItem(op=osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) queued 2023-07-15T01:17:27.346+0000 7f0bd13b2700 1 -- [v2:172.21.15.120:6816/146557,v1:172.21.15.120:6817/146557] <== client.5376 v1:172.21.15.141:0/3368586781 4 ==== osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8 ==== 221+0+0 (unknown 38890711 0 0) 0x55ec933e1400 con 0x55ec993e3000 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 op_wq(6) _process 6.2e to_process <OpSchedulerItem(6.2e PGOpItem(op=osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36)> waiting <> waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] do_op osd_op(client.5376.0:160927 6.25 6:a5c3f7ca:::10000009a27.00000002:head [sparse-read 1511424~65536 [1@-1]] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8 may_read -> read-ordered flags ondisk+retry+read+known_if_redirected 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 op_wq(6) _process OpSchedulerItem(6.2e PGOpItem(op=osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) pg 0x55ec92198000 2023-07-15T01:17:27.346+0000 7f0bd13b2700 15 osd.6 36 enqueue_op osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 type 42 cost 0 latency 0.000010 epoch 36 osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8 
2023-07-15T01:17:27.346+0000 7f0bd13b2700 20 osd.6 op_wq(4) _enqueue OpSchedulerItem(6.2c PGOpItem(op=osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 cost 0 e36) 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 36 dequeue_op osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 cost 0 latency 0.000034 osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8 pg pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] get_object_context: found obc in cache: obc(6:a5c3f7ca:::10000009a27.00000002:head rwstate(none n=0 w=0)) 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] _handle_message: osd_op(client.5376.0:160973 6.2e 6.54ef6bae (undecoded) ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bacf54700 20 osd.6 op_wq(6) _process empty q, waiting 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] get_object_context: obc(6:a5c3f7ca:::10000009a27.00000002:head rwstate(none n=0 w=0)) oi: 6:a5c3f7ca:::10000009a27.00000002:head(36'5662 client.5376.0:159026 dirty s 2084864 uv 5662 alloc_hint [0 0 0]) exists: 1 
ssc(6:a5c3f7ca:::10000009a27.00000002:snapdir snapset: 1=[]:{} ref: 1 registered: 1 exists: 1) 2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 op_wq(4) _process 6.2c to_process <> waiting <> waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bd13b2700 1 -- [v2:172.21.15.120:6816/146557,v1:172.21.15.120:6817/146557] <== client.5376 v1:172.21.15.141:0/3368586781 5 ==== osd_op(client.5376.0:161010 6.2e 6.552cb06e (undecoded) ondisk+retry+read+known_if_redirected e36) v8 ==== 221+0+0 (unknown 2366764798 0 0) 0x55ec92d2b400 con 0x55ec993e3000 2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 op_wq(4) _process OpSchedulerItem(6.2c PGOpItem(op=osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) queued 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] do_op: op osd_op(client.5376.0:160973 6.2e 6:75d6f72a:::10000009a25.00000000:head [sparse-read 2387968~65536 [1@-1]] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] find_object_context 6:a5c3f7ca:::10000009a27.00000002:head @head oi=6:a5c3f7ca:::10000009a27.00000002:head(36'5662 client.5376.0:159026 dirty s 2084864 uv 5662 alloc_hint [0 0 0]) 2023-07-15T01:17:27.346+0000 7f0bd13b2700 15 osd.6 36 enqueue_op osd_op(client.5376.0:161010 6.2e 6.552cb06e (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 type 42 cost 0 latency 0.000010 epoch 36 osd_op(client.5376.0:161010 6.2e 6.552cb06e (undecoded) ondisk+retry+read+known_if_redirected e36) v8 
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 op_wq(4) _process 6.2c to_process <OpSchedulerItem(6.2c PGOpItem(op=osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36)> waiting <> waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bd13b2700 20 osd.6 op_wq(6) _enqueue OpSchedulerItem(6.2e PGOpItem(op=osd_op(client.5376.0:161010 6.2e 6.552cb06e (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 cost 0 e36) 2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 op_wq(4) _process OpSchedulerItem(6.2c PGOpItem(op=osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) pg 0x55ec912a2000 2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] maybe_handle_manifest_detail: 6:a5c3f7ca:::10000009a27.00000002:head is not manifest object 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] op_has_sufficient_caps session=0x55ec95437a40 pool=6 (cephfs_data ) pool_app_metadata={cephfs={data=cephfs}} need_read_cap=1 need_write_cap=0 classes=[] -> yes 2023-07-15T01:17:27.346+0000 7f0badf56700 20 osd.6 op_wq(4) _process empty q, waiting 2023-07-15T01:17:27.346+0000 7f0bacf54700 20 osd.6 op_wq(6) _process 6.2e to_process <> waiting <> waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] do_op obc 
obc(6:a5c3f7ca:::10000009a27.00000002:head rwstate(read n=1 w=0)) 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] do_op osd_op(client.5376.0:160973 6.2e 6:75d6f72a:::10000009a25.00000000:head [sparse-read 2387968~65536 [1@-1]] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8 may_read -> read-ordered flags ondisk+retry+read+known_if_redirected 2023-07-15T01:17:27.346+0000 7f0bd13b2700 1 -- [v2:172.21.15.120:6816/146557,v1:172.21.15.120:6817/146557] <== client.5376 v1:172.21.15.141:0/3368586781 6 ==== osd_op(client.5376.0:161011 6.23 6.57953a3 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 ==== 221+0+0 (unknown 1419772785 0 0) 0x55ec9566fb80 con 0x55ec993e3000 2023-07-15T01:17:27.346+0000 7f0bacf54700 20 osd.6 op_wq(6) _process OpSchedulerItem(6.2e PGOpItem(op=osd_op(client.5376.0:161010 6.2e 6.552cb06e (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) queued 2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 36 dequeue_op osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 cost 0 latency 0.000042 osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8 pg pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] execute_ctx 0x55ec9b5ee900 2023-07-15T01:17:27.346+0000 7f0bd13b2700 15 osd.6 36 enqueue_op osd_op(client.5376.0:161011 6.23 
6.57953a3 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 prio 127 type 42 cost 0 latency 0.000009 epoch 36 osd_op(client.5376.0:161011 6.23 6.57953a3 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] get_object_context: found obc in cache: obc(6:75d6f72a:::10000009a25.00000000:head rwstate(none n=0 w=0)) 2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] _handle_message: osd_op(client.5376.0:161009 6.2c 6.afabdc6c (undecoded) ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bd13b2700 20 osd.6 op_wq(3) _enqueue OpSchedulerItem(6.23 PGOpItem(op=osd_op(client.5376.0:161011 6.23 6.57953a3 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 cost 0 e36) 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] execute_ctx 6:a5c3f7ca:::10000009a27.00000002:head [sparse-read 1511424~65536 [1@-1]] ov 36'5662 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] get_object_context: obc(6:75d6f72a:::10000009a25.00000000:head rwstate(none n=0 w=0)) oi: 6:75d6f72a:::10000009a25.00000000:head(36'12926 client.5376.0:159052 dirty s 4194304 uv 12926 alloc_hint [0 0 0]) exists: 1 ssc(6:75d6f72a:::10000009a25.00000000:snapdir 
snapset: 1=[]:{} ref: 1 registered: 1 exists: 1) 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] do_osd_op 6:a5c3f7ca:::10000009a27.00000002:head [sparse-read 1511424~65536 [1@-1]] 2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] do_op: op osd_op(client.5376.0:161009 6.2c 6:363bd5f5:::10000009a1c.00000002:head [sparse-read 1855488~65536 [1@-1]] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] find_object_context 6:75d6f72a:::10000009a25.00000000:head @head oi=6:75d6f72a:::10000009a25.00000000:head(36'12926 client.5376.0:159052 dirty s 4194304 uv 12926 alloc_hint [0 0 0]) 2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] do_osd_op sparse-read 1511424~65536 [1@-1] 2023-07-15T01:17:27.346+0000 7f0bb275f700 20 osd.6 op_wq(3) _process 6.23 to_process <> waiting <> waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] op_has_sufficient_caps session=0x55ec95437a40 pool=6 (cephfs_data ) pool_app_metadata={cephfs={data=cephfs}} need_read_cap=1 need_write_cap=0 
classes=[] -> yes 2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] do_sparse_read 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] maybe_handle_manifest_detail: 6:75d6f72a:::10000009a25.00000000:head is not manifest object 2023-07-15T01:17:27.346+0000 7f0bb275f700 20 osd.6 op_wq(3) _process OpSchedulerItem(6.23 PGOpItem(op=osd_op(client.5376.0:161011 6.23 6.57953a3 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) queued 2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] do_op osd_op(client.5376.0:161009 6.2c 6:363bd5f5:::10000009a1c.00000002:head [sparse-read 1855488~65536 [1@-1]] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8 may_read -> read-ordered flags ondisk+retry+read+known_if_redirected 2023-07-15T01:17:27.346+0000 7f0bb275f700 20 osd.6 op_wq(3) _process 6.23 to_process <OpSchedulerItem(6.23 PGOpItem(op=osd_op(client.5376.0:161011 6.23 6.57953a3 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36)> waiting <> waiting_peering {} 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] do_op obc obc(6:75d6f72a:::10000009a25.00000000:head rwstate(read n=1 w=0)) 
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] get_object_context: found obc in cache: obc(6:363bd5f5:::10000009a1c.00000002:head rwstate(none n=0 w=0)) 2023-07-15T01:17:27.346+0000 7f0bb275f700 20 osd.6 op_wq(3) _process OpSchedulerItem(6.23 PGOpItem(op=osd_op(client.5376.0:161011 6.23 6.57953a3 (undecoded) ondisk+retry+read+known_if_redirected e36) v8) class_id 3 prio 127 qos_cost 214110 cost 0 e36) pg 0x55ec921b6000 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] execute_ctx 0x55ec93eab200 2023-07-15T01:17:27.346+0000 7f0bae757700 20 osd.6 op_wq(3) _process empty q, waiting 2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] get_object_context: obc(6:363bd5f5:::10000009a1c.00000002:head rwstate(none n=0 w=0)) oi: 6:363bd5f5:::10000009a1c.00000002:head(36'7322 client.5376.0:159005 dirty s 2093056 uv 7322 alloc_hint [0 0 0]) exists: 1 ssc(6:363bd5f5:::10000009a1c.00000002:snapdir snapset: 1=[]:{} ref: 1 registered: 1 exists: 1) 2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] execute_ctx 6:75d6f72a:::10000009a25.00000000:head [sparse-read 2387968~65536 [1@-1]] ov 36'12926 2023-07-15T01:17:27.346+0000 7f0bb275f700 10 osd.6 36 dequeue_op osd_op(client.5376.0:161011 6.23 6.57953a3 (undecoded) 
ondisk+retry+read+known_if_redirected e36) v8 prio 127 cost 0 latency 0.000044 osd_op(client.5376.0:161011 6.23 6.57953a3 (undecoded) ondisk+retry+read+known_if_redirected e36) v8 pg pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean]
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] find_object_context 6:363bd5f5:::10000009a1c.00000002:head @head oi=6:363bd5f5:::10000009a1c.00000002:head(36'7322 client.5376.0:159005 dirty s 2093056 uv 7322 alloc_hint [0 0 0])
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] maybe_handle_manifest_detail: 6:363bd5f5:::10000009a1c.00000002:head is not manifest object
2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] do_osd_op 6:75d6f72a:::10000009a25.00000000:head [sparse-read 2387968~65536 [1@-1]]
2023-07-15T01:17:27.346+0000 7f0bb275f700 10 osd.6 pg_epoch: 36 pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean] _handle_message: osd_op(client.5376.0:161011 6.23 6.57953a3 (undecoded) ondisk+retry+read+known_if_redirected e36) v8
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] do_op obc obc(6:363bd5f5:::10000009a1c.00000002:head rwstate(read n=1 w=0))
2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] do_osd_op sparse-read 2387968~65536 [1@-1]
2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] sparse_read got 4096 bytes from object 6:a5c3f7ca:::10000009a27.00000002:head
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] execute_ctx 0x55ec92a09600
2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] do_sparse_read
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] execute_ctx 6:363bd5f5:::10000009a1c.00000002:head [sparse-read 1855488~65536 [1@-1]] ov 36'7322
2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] execute_ctx alloc reply 0x55ec9b625d40 result 0
2023-07-15T01:17:27.346+0000 7f0bb275f700 20 osd.6 pg_epoch: 36 pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean] do_op: op osd_op(client.5376.0:161011 6.23 6:c5ca9ea0:::10000009a29.00000001:head [sparse-read 610304~65536 [1@-1]] snapc 0=[] RETRY=316360 ondisk+retry+read+known_if_redirected e36) v8
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] do_osd_op 6:363bd5f5:::10000009a1c.00000002:head [sparse-read 1855488~65536 [1@-1]]
2023-07-15T01:17:27.346+0000 7f0bad755700 15 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] do_osd_op_effects client.5376 con 0x55ec993e3000
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] do_osd_op sparse-read 1855488~65536 [1@-1]
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] do_sparse_read
2023-07-15T01:17:27.346+0000 7f0bb275f700 20 osd.6 pg_epoch: 36 pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean] op_has_sufficient_caps session=0x55ec95437a40 pool=6 (cephfs_data ) pool_app_metadata={cephfs={data=cephfs}} need_read_cap=1 need_write_cap=0 classes=[] -> yes
2023-07-15T01:17:27.346+0000 7f0bad755700 15 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] log_op_stats osd_op(client.5376.0:160927 6.25 6:a5c3f7ca:::10000009a27.00000002:head [sparse-read 1511424~4096 out=4120b] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8 inb 0 outb 4120 lat 0.000208
2023-07-15T01:17:27.346+0000 7f0bb275f700 10 osd.6 pg_epoch: 36 pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean] do_op osd_op(client.5376.0:161011 6.23 6:c5ca9ea0:::10000009a29.00000001:head [sparse-read 610304~65536 [1@-1]] snapc 0=[] RETRY=316360 ondisk+retry+read+known_if_redirected e36) v8 may_read -> read-ordered flags ondisk+retry+read+known_if_redirected
2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] prepare_stats_for_publish reporting purged_snaps []
2023-07-15T01:17:27.346+0000 7f0bb275f700 10 osd.6 pg_epoch: 36 pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean] get_object_context: found obc in cache: obc(6:c5ca9ea0:::10000009a29.00000001:head rwstate(none n=0 w=0))
2023-07-15T01:17:27.346+0000 7f0bad755700 15 osd.6 pg_epoch: 36 pg[6.25( v 36'5662 (0'0,36'5662] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,2] r=0 lpr=32 crt=36'5662 lcod 36'5661 mlcod 36'5661 active+clean] publish_stats_to_osd 36:638854
2023-07-15T01:17:27.346+0000 7f0bad755700 1 -- [v2:172.21.15.120:6816/146557,v1:172.21.15.120:6817/146557] --> v1:172.21.15.141:0/3368586781 -- osd_op_reply(160927 10000009a27.00000002 [sparse-read 1511424~4096 out=4120b] v0'0 uv5662 ondisk = 0) v8 -- 0x55ec9b625d40 con 0x55ec993e3000
2023-07-15T01:17:27.346+0000 7f0bb275f700 10 osd.6 pg_epoch: 36 pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean] get_object_context: obc(6:c5ca9ea0:::10000009a29.00000001:head rwstate(none n=0 w=0)) oi: 6:c5ca9ea0:::10000009a29.00000001:head(36'5900 client.5376.0:158910 dirty s 4194304 uv 5900 alloc_hint [0 0 0]) exists: 1 ssc(6:c5ca9ea0:::10000009a29.00000001:snapdir snapset: 1=[]:{} ref: 1 registered: 1 exists: 1)
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 10 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] sparse_read got 65536 bytes from object 6:363bd5f5:::10000009a1c.00000002:head
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] execute_ctx alloc reply 0x55ec915d5440 result 0
2023-07-15T01:17:27.346+0000 7f0bb0f5c700 10 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] sparse_read got 65536 bytes from object 6:75d6f72a:::10000009a25.00000000:head
2023-07-15T01:17:27.346+0000 7f0bb275f700 10 osd.6 pg_epoch: 36 pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean] find_object_context 6:c5ca9ea0:::10000009a29.00000001:head @head oi=6:c5ca9ea0:::10000009a29.00000001:head(36'5900 client.5376.0:158910 dirty s 4194304 uv 5900 alloc_hint [0 0 0])
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 15 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] do_osd_op_effects client.5376 con 0x55ec993e3000
2023-07-15T01:17:27.346+0000 7f0bad755700 10 osd.6 36 dequeue_op osd_op(client.5376.0:160927 6.25 6:a5c3f7ca:::10000009a27.00000002:head [sparse-read 1511424~4096 out=4120b] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8 finish
2023-07-15T01:17:27.346+0000 7f0bb275f700 20 osd.6 pg_epoch: 36 pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean] maybe_handle_manifest_detail: 6:c5ca9ea0:::10000009a29.00000001:head is not manifest object
2023-07-15T01:17:27.346+0000 7f0bb0f5c700 20 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] execute_ctx alloc reply 0x55ecaa1786c0 result 0
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 15 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] log_op_stats osd_op(client.5376.0:161009 6.2c 6:363bd5f5:::10000009a1c.00000002:head [sparse-read 1855488~65536 out=65560b] snapc 0=[] RETRY=316361 ondisk+retry+read+known_if_redirected e36) v8 inb 0 outb 65560 lat 0.000150
2023-07-15T01:17:27.346+0000 7f0bb0f5c700 15 osd.6 pg_epoch: 36 pg[6.2e( v 36'12926 (36'2900,36'12926] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'12926 lcod 36'12925 mlcod 36'12925 active+clean] do_osd_op_effects client.5376 con 0x55ec993e3000
2023-07-15T01:17:27.346+0000 7f0bad755700 20 osd.6 op_wq(5) _process empty q, waiting
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 20 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] prepare_stats_for_publish reporting purged_snaps []
2023-07-15T01:17:27.346+0000 7f0bb275f700 20 osd.6 pg_epoch: 36 pg[6.23( v 36'5900 (0'0,36'5900] local-lis/les=32/33 n=1 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,1] r=0 lpr=32 crt=36'5900 lcod 36'5899 mlcod 36'5899 active+clean] do_op obc obc(6:c5ca9ea0:::10000009a29.00000001:head rwstate(read n=1 w=0))
2023-07-15T01:17:27.346+0000 7f0bb1f5e700 15 osd.6 pg_epoch: 36 pg[6.2c( v 36'7322 (0'0,36'7322] local-lis/les=32/33 n=2 ec=32/32 lis/c=32/32 les/c/f=33/33/0 sis=32) [6,4] r=0 lpr=32 crt=36'7322 lcod 36'7321 mlcod 36'7321 active+clean] publish_stats_to_osd 36:325007
Updated by Xiubo Li 7 months ago
Currently there are only two cases that can cause requests to be retried:
static void handle_reply(struct ceph_osd *osd, struct ceph_msg *msg)
......
	if (!ceph_oloc_empty(&m.redirect.oloc)) {
		dout("req %p tid %llu redirect pool %lld\n", req, req->r_tid,
		     m.redirect.oloc.pool);
		unlink_request(osd, req);
		mutex_unlock(&osd->lock);

		/*
		 * Not ceph_oloc_copy() - changing pool_ns is not
		 * supported.
		 */
		req->r_t.target_oloc.pool = m.redirect.oloc.pool;
		req->r_flags |= CEPH_OSD_FLAG_REDIRECTED |
				CEPH_OSD_FLAG_IGNORE_OVERLAY |
				CEPH_OSD_FLAG_IGNORE_CACHE;
		req->r_tid = 0;
		__submit_request(req, false);
		goto out_unlock_osdc;
	}

	if (m.result == -EAGAIN) {
		dout("req %p tid %llu EAGAIN\n", req, req->r_tid);
		unlink_request(osd, req);
		mutex_unlock(&osd->lock);

		/*
		 * The object is missing on the replica or not (yet)
		 * readable. Clear pgid to force a resend to the primary
		 * via legacy_change.
		 */
		req->r_t.pgid.pool = 0;
		req->r_t.pgid.seed = 0;
		WARN_ON(!req->r_t.used_replica);
		req->r_flags &= ~(CEPH_OSD_FLAG_BALANCE_READS |
				  CEPH_OSD_FLAG_LOCALIZE_READS);
		req->r_tid = 0;
		__submit_request(req, false);
		goto out_unlock_osdc;
	}
......
I didn't see any failure in the OSD logs, so it is most likely not the -EAGAIN case.
But I also didn't see, in the OSD logs, any of the messages that would be emitted when a redirect is set, either from:
void PrimaryLogPG::do_cache_redirect(OpRequestRef op)
{
  auto m = op->get_req<MOSDOp>();
  int flags = m->get_flags() & (CEPH_OSD_FLAG_ACK|CEPH_OSD_FLAG_ONDISK);
  MOSDOpReply *reply = new MOSDOpReply(m, -ENOENT, get_osdmap_epoch(),
                                       flags, false);
  request_redirect_t redir(m->get_object_locator(), pool.info.tier_of);
  reply->set_redirect(redir);
  dout(10) << "sending redirect to pool " << pool.info.tier_of << " for op "
           << op << dendl;
  m->get_connection()->send_message(reply);
  return;
}
Or MOSDOpReply:
public:
  MOSDOpReply()
    : Message{CEPH_MSG_OSD_OPREPLY, HEAD_VERSION, COMPAT_VERSION},
      bdata_encode(false) {
    do_redirect = false;
  }
......
  void print(std::ostream& out) const override {
    out << "osd_op_reply(" << get_tid()
        << " " << oid << " " << ops
        << " v" << get_replay_version()
        << " uv" << get_user_version();
    if (is_ondisk())
      out << " ondisk";
    else if (is_onnvram())
      out << " onnvram";
    else
      out << " ack";
    out << " = " << get_result();
    if (get_result() < 0) {
      out << " (" << cpp_strerror(get_result()) << ")";
    }
    if (is_redirect_reply()) {
      out << " redirect: { " << redirect << " }";
    }
    out << ")";
  }
......
};
Updated by Ilya Dryomov 7 months ago
I would suggest taking a close look at the new sparse read code that Jeff added to the messenger. The state machine there might be misbehaving and returning an unexpected error, causing the messenger to emit the "libceph: osd6 (1)172.21.15.120:6817 socket error on read" message, drop the session, reconnect, and resubmit in-flight ops in a tight loop.
Updated by Xiubo Li 7 months ago
Ilya Dryomov wrote:
I would suggest taking a close look at the new sparse read code that Jeff added to the messenger. The state machine there might be misbehaving and returning an unexpected error, causing the messenger to emit the "libceph: osd6 (1)172.21.15.120:6817 socket error on read" message, drop the session, reconnect, and resubmit in-flight ops in a tight loop.
Yeah, I also couldn't figure out how the above two cases could cause it. I will spend more time going through Jeff's sparse read code later.
Thanks
Updated by Xiubo Li 7 months ago
libceph: osd6 (1)172.21.15.120:6817 socket error on read
When the above message is seen, the failure should come from line 3175 below:
3156 int ceph_con_v2_try_read(struct ceph_connection *con)
3157 {
3158         int ret;
3159
3160         dout("%s con %p state %d need %zu\n", __func__, con, con->state,
3161              iov_iter_count(&con->v2.in_iter));
3162
3163         if (con->state == CEPH_CON_S_PREOPEN)
3164                 return 0;
3165
3166         /*
3167          * We should always have something pending here. If not,
3168          * avoid calling populate_in_iter() as if we read something
3169          * (ceph_tcp_recv() would immediately return 1).
3170          */
3171         if (WARN_ON(!iov_iter_count(&con->v2.in_iter)))
3172                 return -ENODATA;
3173
3174         for (;;) {
3175                 ret = ceph_tcp_recv(con);
3176                 if (ret <= 0)
3177                         return ret;
3178
3179                 ret = populate_in_iter(con);
3180                 if (ret <= 0) {
3181                         if (ret && ret != -EAGAIN && !con->error_msg)
3182                                 con->error_msg = "read processing error";
3183                         return ret;
3184                 }
3185         }
3186 }
3187
Because if populate_in_iter() --> ... --> sparse_read() failed, it would set error_msg to "read processing error" or something else instead of "socket error on read". That suggests sparse_read() set up the read parameters incorrectly, and the subsequent attempt to read the data from the socket buffer then failed.
The sparse-read state machine seems correct to me, so we must be missing a corner case. I need to go through the sparse-read protocol to dig out the root cause.
Updated by Patrick Donnelly 7 months ago
- Project changed from Linux kernel client to RADOS
- Status changed from In Progress to Fix Under Review
- Source set to Q/A
- Backport set to reef,quincy,pacific
- Regression changed from No to Yes
- Severity changed from 3 - minor to 2 - major
- Pull request ID set to 54047
- Component(RADOS) OSD added
Updated by Patrick Donnelly 7 months ago
- Has duplicate Bug #63318: OSD: use-after-move in PrimaryLogPG::do_sparse_read() added
Updated by Ilya Dryomov 6 months ago
- Has duplicate deleted (Bug #63318: OSD: use-after-move in PrimaryLogPG::do_sparse_read())
Updated by Xiubo Li 6 months ago
From my test, and from going through the Ceph-side code, I didn't find any limit on the maximum extent array size like the one libceph.ko imposes:
My test result:
[ RUN ] LibRadosIoPP.SparseReadExtentArrayOpPP
extents array size : 4297
The kclient code:
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 7af35106acaf..177a1d92c517 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5850,8 +5850,6 @@ static inline void convert_extent_map(struct ceph_sparse_read *sr)
 }
 #endif
 
-#define MAX_EXTENTS 4096
-
 static int osd_sparse_read(struct ceph_connection *con,
 			   struct ceph_msg_data_cursor *cursor,
 			   char **pbuf)
@@ -5882,16 +5880,6 @@ static int osd_sparse_read(struct ceph_connection *con,
 
 	if (count > 0) {
 		if (!sr->sr_extent || count > sr->sr_ext_len) {
-			/*
-			 * Apply a hard cap to the number of extents.
-			 * If we have more, assume something is wrong.
-			 */
-			if (count > MAX_EXTENTS) {
-				dout("%s: OSD returned 0x%x extents in a single reply!\n",
-				     __func__, count);
-				return -EREMOTEIO;
-			}
-
 			/* no extent array provided, or too short */
 			kfree(sr->sr_extent);
 			sr->sr_extent = kmalloc_array(count,
This can cause the sparse read to fail, which makes the messenger fail and keep retrying the in-flight IOs infinitely.
Updated by Ilya Dryomov 6 months ago
There are many "sanity" limits like that in the kernel client. For example, we also limit the size of the front, middle and data portions of the message even though theoretically it's possible to get larger messages:
#define CEPH_MSG_MAX_CONTROL_LEN	(16*1024*1024)
#define CEPH_MSG_MAX_FRONT_LEN	(16*1024*1024)
#define CEPH_MSG_MAX_MIDDLE_LEN	(16*1024*1024)

/*
 * The largest possible rbd data object is 32M.
 * The largest possible rbd object map object is 64M.
 *
 * There is no limit on the size of cephfs objects, but it has to obey
 * rsize and wsize mount options anyway.
 */
#define CEPH_MSG_MAX_DATA_LEN	(64*1024*1024)
Updated by Xiubo Li 5 months ago
- Status changed from In Progress to Duplicate
Same issue as https://tracker.ceph.com/issues/63586.