Bug #61892

closed

CephFS - Bug #61781: mds: couldn't successfully calculate the locker caps

[testing] qa: test_snapshot_remove (tasks.cephfs.test_strays.TestStrays)

Added by Venky Shankar 10 months ago. Updated 4 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2023-07-03_15:34:02-fs-quincy_release-distro-default-smithi/7325367

The test creates a file with data in a directory, snapshots the directory, modifies the head version of the file so that it diverges from the snapshot, and then removes the head version of the file and the directory. Later, it validates that the snapshotted version of the file is present and correct. The test fails in this validation:

2023-07-04T00:31:43.736 INFO:tasks.cephfs.mount:Validating 8388608 bytes from snapdir/.snap/snap1/subdir/file_a
2023-07-04T00:31:43.737 DEBUG:teuthology.orchestra.run.smithi082:> sudo adjust-ulimits daemon-helper kill python3 -c '
2023-07-04T00:31:43.737 DEBUG:teuthology.orchestra.run.smithi082:> import zlib
2023-07-04T00:31:43.737 DEBUG:teuthology.orchestra.run.smithi082:> path = "/home/ubuntu/cephtest/mnt.0/snapdir/.snap/snap1/subdir/file_a" 
2023-07-04T00:31:43.738 DEBUG:teuthology.orchestra.run.smithi082:> with open(path, '"'"'r'"'"') as f:
2023-07-04T00:31:43.738 DEBUG:teuthology.orchestra.run.smithi082:>     bytes = f.read()
2023-07-04T00:31:43.738 DEBUG:teuthology.orchestra.run.smithi082:> if len(bytes) != 8388608:
2023-07-04T00:31:43.738 DEBUG:teuthology.orchestra.run.smithi082:>     raise RuntimeError("Bad length {0} vs. expected {1}".format(
2023-07-04T00:31:43.739 DEBUG:teuthology.orchestra.run.smithi082:>         len(bytes), 8388608
2023-07-04T00:31:43.739 DEBUG:teuthology.orchestra.run.smithi082:>     ))
2023-07-04T00:31:43.739 DEBUG:teuthology.orchestra.run.smithi082:> for i, b in enumerate(bytes):
2023-07-04T00:31:43.739 DEBUG:teuthology.orchestra.run.smithi082:>     val = zlib.crc32(str(i).encode('"'"'utf-8'"'"')) & 7
2023-07-04T00:31:43.740 DEBUG:teuthology.orchestra.run.smithi082:>     if b != chr(val):
2023-07-04T00:31:43.740 DEBUG:teuthology.orchestra.run.smithi082:>         raise RuntimeError("Bad data at offset {0}".format(i))
2023-07-04T00:31:43.740 DEBUG:teuthology.orchestra.run.smithi082:> '
2023-07-04T00:31:43.919 INFO:teuthology.orchestra.run.smithi082.stderr:Traceback (most recent call last):
2023-07-04T00:31:43.919 INFO:teuthology.orchestra.run.smithi082.stderr:  File "<string>", line 13, in <module>
2023-07-04T00:31:43.919 INFO:teuthology.orchestra.run.smithi082.stderr:RuntimeError: Bad data at offset 0
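
For context, here is a rough Python sketch of the test flow described above (the mount path and the exact way the head is modified are assumptions for illustration; the real logic lives in tasks.cephfs.test_strays.TestStrays.test_snapshot_remove and uses the teuthology mount helpers):

import os
import zlib

MOUNT = "/mnt/cephfs"                      # hypothetical kernel-client mount point
SUBDIR = os.path.join(MOUNT, "snapdir", "subdir")
FILE_SIZE = 8388608                        # 8MB, as in the failure log

def pattern(n):
    # Same byte pattern the validation script in the log expects.
    return bytes(zlib.crc32(str(i).encode("utf-8")) & 7 for i in range(n))

# 1. create a file with data in a directory
os.makedirs(SUBDIR, exist_ok=True)
path = os.path.join(SUBDIR, "file_a")
with open(path, "wb") as f:
    f.write(pattern(FILE_SIZE))

# 2. snapshot the directory
os.mkdir(os.path.join(MOUNT, "snapdir", ".snap", "snap1"))

# 3. change the head version (the MDS log in comment #9 shows a "setattr size=0", i.e. a truncate)
os.truncate(path, 0)

# 4. remove the head version of the file and the directory
os.unlink(path)
os.rmdir(SUBDIR)

# 5. validate the snapshotted version of the file
snap_path = os.path.join(MOUNT, "snapdir", ".snap", "snap1", "subdir", "file_a")
with open(snap_path, "rb") as f:
    data = f.read()
assert len(data) == FILE_SIZE, "Bad length"
assert data == pattern(FILE_SIZE), "Bad data"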
Actions #1

Updated by Venky Shankar 10 months ago

  • Assignee set to Patrick Donnelly
  • Component(FS) MDS, kceph added
Actions #3

Updated by Patrick Donnelly 10 months ago

  • Project changed from CephFS to Linux kernel client
  • Subject changed from Test failure: test_snapshot_remove (tasks.cephfs.test_strays.TestStrays) to [testing] qa: test_snapshot_remove (tasks.cephfs.test_strays.TestStrays)
  • Category deleted (Correctness/Safety)
  • Assignee changed from Patrick Donnelly to Xiubo Li
  • Target version deleted (v19.0.0)
  • Backport deleted (reef,quincy,pacific)

This is a bug in the testing branch:

https://pulpito.ceph.com/pdonnell-2023-07-11_20:39:21-fs:functional-main-distro-default-smithi/

The osd is returning a trimmed read:

./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf2ae47700  1 -- [v2:172.21.15.101:6821/2336127221,v1:172.21.15.101:6823/2336127221] <== client.14634 192.168.0.1:0/3195104767 1 ==== osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8 ==== 221+0+0 (crc 0 0 0) 0x562d585a8000 con 0x562d5ab03400
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf2ae47700 15 osd.1 281 enqueue_op osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8 prio 127 type 42 cost 0 latency 0.000018 epoch 281 osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf2ae47700 20 osd.1 op_wq(2) _enqueue OpSchedulerItem(43.12 PGOpItem(op=osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8) class_id 3 prio 127 cost 0 e281)
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 op_wq(2) _process 43.12 to_process <> waiting <> waiting_peering {}
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 op_wq(2) _process OpSchedulerItem(43.12 PGOpItem(op=osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8) class_id 3 prio 127 qos_cost 446780 cost 0 e281) queued
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 op_wq(2) _process 43.12 to_process <OpSchedulerItem(43.12 PGOpItem(op=osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8) class_id 3 prio 127 qos_cost 446780 cost 0 e281)> waiting <> waiting_peering {}
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 op_wq(2) _process OpSchedulerItem(43.12 PGOpItem(op=osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8) class_id 3 prio 127 qos_cost 446780 cost 0 e281) pg 0x562d59aee000
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 281 dequeue_op osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8 prio 127 cost 0 latency 0.000131 osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8 pg pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean]
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf0adc7700 20 osd.1 op_wq(2) _process empty q, waiting
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 281 maybe_share_map: con 192.168.0.1:0/3195104767 updating session's projected_epoch from 0 to ping map epoch of 281
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 281 maybe_share_map: con 192.168.0.1:0/3195104767 our osdmap epoch of 281 is not newer than session's projected_epoch of 281
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] _handle_message: osd_op(client.14634.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e281) v8
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] do_op: op osd_op(client.14634.0:1 43.12 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]] snapc 0=[] ondisk+read+known_if_redirected e281) v8
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] op_has_sufficient_caps session=0x562d5abf6f00 pool=43 (cephfs_data ) pool_app_metadata={cephfs={data=cephfs}} need_read_cap=1 need_write_cap=0 classes=[] -> yes
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] do_op osd_op(client.14634.0:1 43.12 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]] snapc 0=[] ondisk+read+known_if_redirected e281) v8 may_read -> read-ordered flags ondisk+read+known_if_redirected
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] find_object_context 43:4bec3886:::10000000002.00000000:2 @2 snapset 2=[]:{2=[2]}
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] get_object_context: found obc in cache: obc(43:4bec3886:::10000000002.00000000:2 rwstate(none n=0 w=0))
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] get_object_context: obc(43:4bec3886:::10000000002.00000000:2 rwstate(none n=0 w=0)) oi: 43:4bec3886:::10000000002.00000000:2(281'2 client.13519.0:2 dirty s 0 uv 1 alloc_hint [0 0 0]) exists: 1 ssc(43:4bec3886:::10000000002.00000000:snapdir snapset: 2=[]:{2=[2]} ref: 3 registered: 1 exists: 1)
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] find_object_context 43:4bec3886:::10000000002.00000000:2 snapset 2=[]:{2=[2]}
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] find_object_context 43:4bec3886:::10000000002.00000000:2 clone_snaps [2] contains 2 -- HIT 43:4bec3886:::10000000002.00000000:2
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] maybe_handle_manifest_detail: 43:4bec3886:::10000000002.00000000:2 is not manifest object 
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] do_op obc obc(43:4bec3886:::10000000002.00000000:2 rwstate(read n=1 w=0))
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] execute_ctx 0x562d5b1a1600
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] execute_ctx 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]] ov 281'2
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] do_osd_op 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]]
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] do_osd_op  read 0~4194304 [1@-1]
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] do_read
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] execute_ctx alloc reply 0x562d5918eb40 result 0
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 15 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] do_osd_op_effects client.14634 con 0x562d5ab03400
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 15 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] log_op_stats osd_op(client.14634.0:1 43.12 43:4bec3886:::10000000002.00000000:2 [read 0~0] snapc 0=[] ondisk+read+known_if_redirected e281) v8 inb 0 outb 0 lat 0.000315
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 20 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] PeeringState::prepare_stats_for_publish reporting purged_snaps []
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 15 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] PeeringState::prepare_stats_for_publish publish_stats_to_osd 281:27
./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700  1 -- [v2:172.21.15.101:6821/2336127221,v1:172.21.15.101:6823/2336127221] --> 192.168.0.1:0/3195104767 -- osd_op_reply(1 10000000002.00000000 [read 0~0] v0'0 uv1 ondisk = 0) v8 -- 0x562d5918eb40 con 0x562d5ab03400

from: /teuthology/pdonnell-2023-07-11_20:39:21-fs:functional-main-distro-default-smithi/7334220

vs.

./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.691+0000 7f66b46db700  1 -- [v2:172.21.15.55:6824/2443937958,v1:172.21.15.55:6825/2443937958] <== client.10556 v1:172.21.15.55:0/3911741624 1 ==== osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8 ==== 221+0+0 (unknown 890737275 0 0) 0x55dff2e69400 con 0x55dff4427800
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.691+0000 7f66b46db700 15 osd.1 287 enqueue_op osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8 prio 127 type 42 cost 0 latency 0.000022 epoch 287 osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.691+0000 7f66b46db700 20 osd.1 op_wq(2) _enqueue OpSchedulerItem(43.12 PGOpItem(op=osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8) class_id 3 prio 127 cost 0 e287)
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.691+0000 7f669728b700 20 osd.1 op_wq(2) _process 43.12 to_process <> waiting <> waiting_peering {}
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.691+0000 7f669728b700 20 osd.1 op_wq(2) _process OpSchedulerItem(43.12 PGOpItem(op=osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8) class_id 3 prio 127 qos_cost 224477 cost 0 e287) queued
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 osd.1 op_wq(2) _process 43.12 to_process <OpSchedulerItem(43.12 PGOpItem(op=osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8) class_id 3 prio 127 qos_cost 224477 cost 0 e287)> waiting <> waiting_peering {}
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 osd.1 op_wq(2) _process OpSchedulerItem(43.12 PGOpItem(op=osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8) class_id 3 prio 127 qos_cost 224477 cost 0 e287) pg 0x55dff3349000
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f6693283700 20 osd.1 op_wq(2) _process empty q, waiting
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 287 dequeue_op osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8 prio 127 cost 0 latency 0.000076 osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8 pg pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean]
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 287 maybe_share_map: con v1:172.21.15.55:0/3911741624 updating session's projected_epoch from 0 to ping map epoch of 287
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 287 maybe_share_map: con v1:172.21.15.55:0/3911741624 our osdmap epoch of 287 is not newer than session's projected_epoch of 287
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] _handle_message: osd_op(client.10556.0:1 43.12 43.611c37d2 (undecoded) ondisk+read+known_if_redirected e287) v8
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] do_op: op osd_op(client.10556.0:1 43.12 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]] snapc 0=[] ondisk+read+known_if_redirected e287) v8
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] op_has_sufficient_caps session=0x55e0018b8d20 pool=43 (cephfs_data ) pool_app_metadata={cephfs={data=cephfs}} need_read_cap=1 need_write_cap=0 classes=[] -> yes
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] do_op osd_op(client.10556.0:1 43.12 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]] snapc 0=[] ondisk+read+known_if_redirected e287) v8 may_read -> read-ordered flags ondisk+read+known_if_redirected
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] find_object_context 43:4bec3886:::10000000002.00000000:2 @2 snapset 2=[]:{2=[2]}
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] get_object_context: found obc in cache: obc(43:4bec3886:::10000000002.00000000:2 rwstate(none n=0 w=0))
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] get_object_context: obc(43:4bec3886:::10000000002.00000000:2 rwstate(none n=0 w=0)) oi: 43:4bec3886:::10000000002.00000000:2(287'3 client.12199.0:3 dirty|data_digest s 4194304 uv 2 dd 199342ba alloc_hint [0 0 0]) exists: 1 ssc(43:4bec3886:::10000000002.00000000:snapdir snapset: 2=[]:{2=[2]} ref: 3 registered: 1 exists: 1)
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] find_object_context 43:4bec3886:::10000000002.00000000:2 snapset 2=[]:{2=[2]}
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] find_object_context 43:4bec3886:::10000000002.00000000:2 clone_snaps [2] contains 2 -- HIT 43:4bec3886:::10000000002.00000000:2
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] maybe_handle_manifest_detail: 43:4bec3886:::10000000002.00000000:2 is not manifest object
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] do_op obc obc(43:4bec3886:::10000000002.00000000:2 rwstate(read n=1 w=0))
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] execute_ctx 0x55dff3a3d600
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] execute_ctx 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]] ov 287'3
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] do_osd_op 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]]
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] do_osd_op  read 0~4194304 [1@-1]
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] do_read
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 15 bluestore(/var/lib/ceph/osd/ceph-1) read 43.12_head #43:4bec3886:::10000000002.00000000:2# 0x0~400000
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 bluestore(/var/lib/ceph/osd/ceph-1) _do_read 0x0~400000 size 0x400000 (4194304)
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 20 bluestore(/var/lib/ceph/osd/ceph-1) _do_read defaulting to buffered read
...
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.695+0000 7f669728b700  1 -- [v2:172.21.15.55:6824/2443937958,v1:172.21.15.55:6825/2443937958] --> v1:172.21.15.55:0/3911741624 -- osd_op_reply(1 10000000002.00000000 [read 0~4194304 out=4194304b] v0'0 uv2 ondisk = 0) v8 -- 0x55dff1032b40 con 0x55dff4427800

from: /teuthology/pdonnell-2023-07-11_20:39:21-fs:functional-main-distro-default-smithi/7334229

It looks like this is caused by op.extent.truncate_seq, but I'm skipping further investigation since Xiubo will be familiar with which patch causes this.

Actions #4

Updated by Xiubo Li 10 months ago

From the failure call trace:

2023-07-04T00:31:43.736 INFO:tasks.cephfs.mount:Validating 8388608 bytes from snapdir/.snap/snap1/subdir/file_a
2023-07-04T00:31:43.737 DEBUG:teuthology.orchestra.run.smithi082:> sudo adjust-ulimits daemon-helper kill python3 -c '
2023-07-04T00:31:43.737 DEBUG:teuthology.orchestra.run.smithi082:> import zlib
2023-07-04T00:31:43.737 DEBUG:teuthology.orchestra.run.smithi082:> path = "/home/ubuntu/cephtest/mnt.0/snapdir/.snap/snap1/subdir/file_a" 
2023-07-04T00:31:43.738 DEBUG:teuthology.orchestra.run.smithi082:> with open(path, '"'"'r'"'"') as f:
2023-07-04T00:31:43.738 DEBUG:teuthology.orchestra.run.smithi082:>     bytes = f.read()
2023-07-04T00:31:43.738 DEBUG:teuthology.orchestra.run.smithi082:> if len(bytes) != 8388608:
2023-07-04T00:31:43.738 DEBUG:teuthology.orchestra.run.smithi082:>     raise RuntimeError("Bad length {0} vs. expected {1}".format(
2023-07-04T00:31:43.739 DEBUG:teuthology.orchestra.run.smithi082:>         len(bytes), 8388608
2023-07-04T00:31:43.739 DEBUG:teuthology.orchestra.run.smithi082:>     ))
2023-07-04T00:31:43.739 DEBUG:teuthology.orchestra.run.smithi082:> for i, b in enumerate(bytes):
2023-07-04T00:31:43.739 DEBUG:teuthology.orchestra.run.smithi082:>     val = zlib.crc32(str(i).encode('"'"'utf-8'"'"')) & 7
2023-07-04T00:31:43.740 DEBUG:teuthology.orchestra.run.smithi082:>     if b != chr(val):
2023-07-04T00:31:43.740 DEBUG:teuthology.orchestra.run.smithi082:>         raise RuntimeError("Bad data at offset {0}".format(i))
2023-07-04T00:31:43.740 DEBUG:teuthology.orchestra.run.smithi082:> '
2023-07-04T00:31:43.919 INFO:teuthology.orchestra.run.smithi082.stderr:Traceback (most recent call last):
2023-07-04T00:31:43.919 INFO:teuthology.orchestra.run.smithi082.stderr:  File "<string>", line 13, in <module>
2023-07-04T00:31:43.919 INFO:teuthology.orchestra.run.smithi082.stderr:RuntimeError: Bad data at offset 0

It seems the read itself succeeded but validating the contents failed with "Bad data at offset 0".

And from the osd logs in comment#3, it seems the client request passed an incorrect snapc, which is snapc 0=[], but to my understanding it should be snapc 0=[2] (?)

do_op osd_op(client.10556.0:1 43.12 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]] snapc 0=[] ondisk+read+known_if_redirected e287) v8 may_read -> read-ordered flags ondisk+read+known_if_redirected
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] find_object_context 43:4bec3886:::10000000002.00000000:2 @2 snapset 2=[]:{2=[2]}

Actions #5

Updated by Xiubo Li 10 months ago

Xiubo Li wrote:

From the failure call trace:

[...]

It seems the read itself succeeded but validating the contents failed with "Bad data at offset 0".

And from the osd logs in comment#3, it seems the client request passed an incorrect snapc, which is snapc 0=[], but to my understanding it should be snapc 0=[2] (?)

do_op osd_op(client.10556.0:1 43.12 43:4bec3886:::10000000002.00000000:2 [read 0~4194304 [1@-1]] snapc 0=[] ondisk+read+known_if_redirected e287) v8 may_read -> read-ordered flags ondisk+read+known_if_redirected
./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] find_object_context 43:4bec3886:::10000000002.00000000:2 @2 snapset 2=[]:{2=[2]}

While the truncate_seq and truncate_size from the kclient were correct:

For the one we have the issue:

./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] do_osd_op  read 0~4194304 [1@-1]

For the one worked with stock kernel:

./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] do_osd_op  read 0~4194304 [1@-1]

We can see that both were the same: [1@-1].

Then from the osd code, there is only one possible case that could cause trimmed_read to be set, which is oi.size being 0:

  // are we beyond truncate_size?
  if ( (seq < op.extent.truncate_seq) &&
       (op.extent.offset + op.extent.length > op.extent.truncate_size) &&
       (size > op.extent.truncate_size) )
    size = op.extent.truncate_size;

  if (op.extent.length == 0) //length is zero mean read the whole object
    op.extent.length = size;

  if (op.extent.offset >= size) {
    op.extent.length = 0;
    trimmed_read = true;
  } else if (op.extent.offset + op.extent.length > size) {
    op.extent.length = size - op.extent.offset;
    trimmed_read = true;
  }

  dout(30) << __func__ << "op.extent.length is now " << op.extent.length << dendl;

  // read into a buffer
  int result = 0;
  if (trimmed_read && op.extent.length == 0) {
    // read size was trimmed to zero and it is expected to do nothing
    // a read operation of 0 bytes does *not* do nothing, this is why
    // the trimmed_read boolean is needed
...
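
For illustration, here is a minimal Python translation of the trimming arithmetic above (a sketch, not OSD code), plugging in the values from the two runs: with the clone's object_info showing "s 0" the read is trimmed to 0 bytes, matching the osd_op_reply(... [read 0~0] ...) in comment #3, while with "s 4194304" the full 4MB is returned:

def trim_read(offset, length, size, seq, truncate_seq, truncate_size):
    # Mirrors the do_read() trimming logic quoted above.
    if seq < truncate_seq and offset + length > truncate_size and size > truncate_size:
        size = truncate_size
    if length == 0:            # length of zero means read the whole object
        length = size
    trimmed_read = False
    if offset >= size:
        length = 0
        trimmed_read = True
    elif offset + length > size:
        length = size - offset
        trimmed_read = True
    return length, trimmed_read

# Failing run: oi.size is 0, so only the "offset >= size" branch matters
# (the seq/truncate_seq/truncate_size arguments are placeholders here).
print(trim_read(0, 4194304, 0, 0, 0, 0))         # (0, True)        -> "read 0~0"
print(trim_read(0, 4194304, 4194304, 0, 0, 0))   # (4194304, False) -> full 4MB returned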

I need to read the osd logs to understand why the object size was 0.

Actions #6

Updated by Xiubo Li 10 months ago

From /teuthology/pdonnell-2023-07-11_20:39:21-fs:functional-main-distro-default-smithi/7334220: the mds just trimmed it to 0, and when cloning the object 10000000002.00000000:2 the size was 0:

2023-07-11T21:26:23.423+0000 7fcf29e45700  1 -- [v2:172.21.15.101:6821/2336127221,v1:172.21.15.101:6823/2336127221] <== mds.0 v2:172.21.15.101:6835/539709401 3 ==== osd_op(mds.0.257:54 43.12 43.611c37d2 (undecoded) ondisk+write+known_if_redirected+full_force+supports_pool_eio e281) v8 ==== 237+0+0 (crc 0 0 0) 0x562d5aa18a00 con 0x562d587a7400
...
2023-07-11T21:26:23.423+0000 7fcf06dbf700 20 trim 281'2 (281'1) clone    43:4bec3886:::10000000002.00000000:2 by unknown.0.0:0 2023-07-11T21:26:18.475483+0000 0 snaps [2] ObjectCleanRegions clean_offsets: [0~18446744073709551615], clean_omap: 1, new_object: 0
2023-07-11T21:26:23.423+0000 7fcf06dbf700 20 trim 281'3 (281'1) modify   43:4bec3886:::10000000002.00000000:head by mds.0.257:54 0.000000 0 ObjectCleanRegions clean_offsets: [0~18446744073709551615], clean_omap: 1, new_object: 0
2023-07-11T21:26:23.423+0000 7fcf06dbf700 20 IndexedLog::trim after trim dups.size()=3 tail=281'3 s=281'3

...

2023-07-11T21:26:23.423+0000 7fcf06dbf700 20 bluestore(/var/lib/ceph/osd/ceph-1).collection(43.12_head 0x562d5b566000)  r -2 v.len 0
2023-07-11T21:26:23.423+0000 7fcf06dbf700 20 bluestore.OnodeSpace(0x562d5b566140 in 0x562d57bd4000) add_onode #43:4bec3886:::10000000002.00000000:2# 0x562d5ad838c0
2023-07-11T21:26:23.423+0000 7fcf06dbf700 20 _add 0x562d57bd4000 #43:4bec3886:::10000000002.00000000:2# added, num=4
2023-07-11T21:26:23.423+0000 7fcf06dbf700 15 bluestore(/var/lib/ceph/osd/ceph-1) _clone 43.12_head #43:4bec3886:::10000000002.00000000:head# -> #43:4bec3886:::10000000002.00000000:2#
2023-07-11T21:26:23.423+0000 7fcf06dbf700 20 bluestore(/var/lib/ceph/osd/ceph-1) _assign_nid 4002
2023-07-11T21:26:23.423+0000 7fcf06dbf700 20 bluestore.onode(0x562d5abba000).flush flush done
2023-07-11T21:26:23.423+0000 7fcf06dbf700 15 bluestore(/var/lib/ceph/osd/ceph-1) _do_truncate 43.12_head #43:4bec3886:::10000000002.00000000:2# 0x0
2023-07-11T21:26:23.423+0000 7fcf06dbf700 15 bluestore(/var/lib/ceph/osd/ceph-1) _do_clone_range 43.12_head #43:4bec3886:::10000000002.00000000:head# -> #43:4bec3886:::10000000002.00000000:2# 0x0~0 ->  0x0~0
2023-07-11T21:26:23.423+0000 7fcf06dbf700 20 bluestore.extentmap(0x562d5ad83a40) dirty_range mark inline shard dirty
2023-07-11T21:26:23.423+0000 7fcf06dbf700 10 bluestore(/var/lib/ceph/osd/ceph-1) _clone 43.12_head #43:4bec3886:::10000000002.00000000:head# -> #43:4bec3886:::10000000002.00000000:2# = 0
2023-07-11T21:26:23.423+0000 7fcf06dbf700 15 bluestore(/var/lib/ceph/osd/ceph-1) _rmattr 43.12_head #43:4bec3886:::10000000002.00000000:2# snapset
2023-07-11T21:26:23.423+0000 7fcf06dbf700 10 bluestore(/var/lib/ceph/osd/ceph-1) _rmattr 43.12_head #43:4bec3886:::10000000002.00000000:2# snapset = 0
2023-07-11T21:26:23.423+0000 7fcf06dbf700 15 bluestore(/var/lib/ceph/osd/ceph-1) _setattrs 43.12_head #43:4bec3886:::10000000002.00000000:2# 1 keys
2023-07-11T21:26:23.423+0000 7fcf06dbf700 10 bluestore(/var/lib/ceph/osd/ceph-1) _setattrs 43.12_head #43:4bec3886:::10000000002.00000000:2# 1 keys = 0

While from /teuthology/pdonnell-2023-07-11_20:39:21-fs:functional-main-distro-default-smithi/7334229, the size is always 4MB:

2023-07-11T21:20:25.704+0000 7f66b3eda700  1 -- [v2:172.21.15.55:6824/2443937958,v1:172.21.15.55:6825/2443937958] <== mds.0 v2:172.21.15.190:6835/508922477 3 ==== osd_op(mds.0.251:55 43.12 43.611c37d2 (undecoded) ondisk+write+known_if_redirected+full_force+supports_pool_eio e287) v8 ==== 237+0+0 (crc 0 0 0) 0x55dff2724780 con 0x55dff4426c00
...
2023-07-11T21:20:25.704+0000 7f669728b700 20 earliest_dup_version = 0
2023-07-11T21:20:25.704+0000 7f669728b700 20 trim 287'3 (287'2) clone    43:4bec3886:::10000000002.00000000:2 by unknown.0.0:0 2023-07-11T21:20:25.670379+0000 0 snaps [2] ObjectCleanRegions clean_offsets: [0~18446744073709551615], clean_omap: 1, new_object: 0
2023-07-11T21:20:25.704+0000 7f669728b700 20 trim 287'4 (287'2) modify   43:4bec3886:::10000000002.00000000:head by mds.0.251:55 0.000000 0 ObjectCleanRegions clean_offsets: [4194304~18446744073705357311], clean_omap: 1, new_object: 0

...

2023-07-11T21:20:25.704+0000 7f669728b700 20 bluestore(/var/lib/ceph/osd/ceph-1).collection(43.12_head 0x55dff1d1a960)  r -2 v.len 0
2023-07-11T21:20:25.704+0000 7f669728b700 20 bluestore.OnodeSpace(0x55dff1d1aaa0 in 0x55dff073e000) add_onode #43:4bec3886:::10000000002.00000000:2# 0x55dff39d82c0
2023-07-11T21:20:25.704+0000 7f669728b700 20 _add 0x55dff073e000 #43:4bec3886:::10000000002.00000000:2# added, num=4
2023-07-11T21:20:25.704+0000 7f669728b700 15 bluestore(/var/lib/ceph/osd/ceph-1) _clone 43.12_head #43:4bec3886:::10000000002.00000000:head# -> #43:4bec3886:::10000000002.00000000:2#
2023-07-11T21:20:25.704+0000 7f669728b700 20 bluestore(/var/lib/ceph/osd/ceph-1) _assign_nid 4075
2023-07-11T21:20:25.704+0000 7f669728b700 20 bluestore.onode(0x55dff3f582c0).flush flush done
2023-07-11T21:20:25.704+0000 7f669728b700 15 bluestore(/var/lib/ceph/osd/ceph-1) _do_truncate 43.12_head #43:4bec3886:::10000000002.00000000:2# 0x0
2023-07-11T21:20:25.704+0000 7f669728b700 15 bluestore(/var/lib/ceph/osd/ceph-1) _do_clone_range 43.12_head #43:4bec3886:::10000000002.00000000:head# -> #43:4bec3886:::10000000002.00000000:2# 0x0~400000 ->  0x0~400000
2023-07-11T21:20:25.704+0000 7f669728b700 20 bluestore.extentmap(0x55dff3f58440) dup  src 0x0~10000: 0x0~10000 Blob(0x55dff7263a40 blob([0xa59a0000~10000] csum crc32c/0x1000) use_tracker(0x10*0x1000 0x[1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000]) SharedBlob(0x55dff787af50 sbid 0x0))
2023-07-11T21:20:25.704+0000 7f669728b700 20 bluestore(/var/lib/ceph/osd/ceph-1) _assign_blobid 10241
2023-07-11T21:20:25.704+0000 7f669728b700 10 bluestore(/var/lib/ceph/osd/ceph-1).collection(43.12_head 0x55dff1d1a960) make_blob_shared Blob(0x55dff7263a40 blob([0xa59a0000~10000] csum crc32c/0x1000) use_tracker(0x10*0x1000 0x[1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000,1000]) SharedBlob(0x55dff787af50 sbid 0x0))

Next, I will check why the MDS set the incorrect ObjectCleanRegions.

Actions #7

Updated by Patrick Donnelly 10 months ago

Looks like you are on the right track, Xiubo. I think I was looking at the wrong log earlier when I noted the size was correctly 4MB. See from comment 3:

./remote/smithi101/log/ceph-osd.1.log.gz:2023-07-11T21:26:29.063+0000 7fcf06dbf700 10 osd.1 pg_epoch: 281 pg[43.12( v 281'16 (0'0,281'16] local-lis/les=279/280 n=2 ec=279/279 lis/c=279/279 les/c/f=280/280/0 sis=279) [1,2] r=0 lpr=279 crt=281'16 lcod 281'15 mlcod 281'15 active+clean] get_object_context: obc(43:4bec3886:::10000000002.00000000:2 rwstate(none n=0 w=0)) oi: 43:4bec3886:::10000000002.00000000:2(281'2 client.13519.0:2 dirty s 0 uv 1 alloc_hint [0 0 0]) exists: 1 ssc(43:4bec3886:::10000000002.00000000:snapdir snapset: 2=[]:{2=[2]} ref: 3 registered: 1 exists: 1)

"s 0" size is 0.

vs.

./remote/smithi055/log/ceph-osd.1.log.gz:2023-07-11T21:20:30.692+0000 7f669728b700 10 osd.1 pg_epoch: 287 pg[43.12( v 287'16 (0'0,287'16] local-lis/les=283/284 n=2 ec=283/283 lis/c=283/283 les/c/f=284/284/0 sis=283) [1,2] r=0 lpr=283 crt=287'16 lcod 287'15 mlcod 287'15 active+clean] get_object_context: obc(43:4bec3886:::10000000002.00000000:2 rwstate(none n=0 w=0)) oi: 43:4bec3886:::10000000002.00000000:2(287'3 client.12199.0:3 dirty|data_digest s 4194304 uv 2 dd 199342ba alloc_hint [0 0 0]) exists: 1 ssc(43:4bec3886:::10000000002.00000000:snapdir snapset: 2=[]:{2=[2]} ref: 3 registered: 1 exists: 1)

"s 4194304" is 4MB.

Actions #8

Updated by Xiubo Li 10 months ago

Patrick Donnelly wrote:

Looks like you are on the right track Xiubo. I think I was looking at the wrong log earlier when I noted the size was correctly 4MB. See from comment 3:

[...]

"s 0" size is 0.

vs.

[...]

"s 4194304" is 4MB.

Yeah, I think the kclient reported incorrect info to the MDS and then the MDS truncated the first object.

Even though the first object was cleared and its size set to 0, the kclient could still read 8MB in total, treating the zeroed first 4MB as a file hole (see the short illustration after the quoted lines below). I think this explains why the test did not fail at:

2023-07-04T00:31:43.738 DEBUG:teuthology.orchestra.run.smithi082:>     raise RuntimeError("Bad length {0} vs. expected {1}".format(
2023-07-04T00:31:43.739 DEBUG:teuthology.orchestra.run.smithi082:>         len(bytes), 8388608
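
As a purely local illustration of the hole behaviour (plain sparse-file semantics, nothing CephFS-specific): if the first 4MB object is zeroed while the file size stays 8MB, the length check still passes and only the per-byte validation catches it:

import zlib

OBJECT = 4194304   # one 4MB RADOS object
expected = bytes(zlib.crc32(str(i).encode("utf-8")) & 7 for i in range(2 * OBJECT))
with_hole = b"\x00" * OBJECT + expected[OBJECT:]   # first object lost, read back as zeros

assert len(with_hole) == len(expected)   # the "Bad length" check passes
assert with_hole != expected             # but the data check fails ("Bad data at offset 0" in the run)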
Actions #9

Updated by Xiubo Li 10 months ago

The mds.0 received a request from the kclient and truncated the size to 0:

2023-07-11T21:26:23.403+0000 7f720df00700  1 -- [v2:172.21.15.101:6835/539709401,v1:172.21.15.101:6837/539709401] <== client.13519 172.21.15.101:0/443134592 56 ==== client_request(client.13519:36 setattr size=0 #0x10000000002 2023-07-11T21:26:23.403365+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 ==== 232+0+0 (crc 0 0 0) 0x555a1ccaec00 con 0x555a1ccacc00
...
2023-07-11T21:26:23.423+0000 7f720df00700 10 MDSContext::complete: 19C_MDC_RetryTruncate
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.cache _truncate_inode 8388608 -> 0 fscrypt last block length is 0 on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFc/pAsxXsxFxwb@8},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x555a1ccc7700]
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.cache  realm snaprealm(0x10000000000 seq 2 lc 2 cr 2 snaps={2=snap(2 0x10000000000 'snap1' 2023-07-11T21:26:23.267368+0000)} last_modified 2023-07-11T21:26:23.267368+0000 change_attr 1 0x555a1cd76000)
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.cache _truncate_inode  snapc 0x555a1cd761b0 on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFc/pAsxXsxFxwb@8},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x555a1ccc7700] fscrypt_last_block length is 0
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.cache _truncate_inode truncate on inode [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFc/pAsxXsxFxwb@8},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x555a1ccc7700]
2023-07-11T21:26:23.423+0000 7f720df00700  1 -- [v2:172.21.15.101:6835/539709401,v1:172.21.15.101:6837/539709401] --> [v2:172.21.15.101:6821/2336127221,v1:172.21.15.101:6823/2336127221] -- osd_op(unknown.0.257:54 43.12 43:4bec3886:::10000000002.00000000:head [trimtrunc 2@0] snapc 2=[2] ondisk+write+known_if_redirected+full_force+supports_pool_eio e281) v8 -- 0x555a1cd00000 con 0x555a1cba9800
2023-07-11T21:26:23.423+0000 7f720df00700  1 -- [v2:172.21.15.101:6835/539709401,v1:172.21.15.101:6837/539709401] --> [v2:172.21.15.186:6824/1204885061,v1:172.21.15.186:6825/1204885061] -- osd_op(unknown.0.257:55 43.1d 43:bbf7764d:::10000000002.00000001:head [trimtrunc 2@0] snapc 2=[2] ondisk+write+known_if_redirected+full_force+supports_pool_eio e281) v8 -- 0x555a1ccf6380 con 0x555a1cba9c00

We can see that inode 10000000002 has needsnapflush=1, which means the dirty capsnap had not yet been flushed when trying to truncate file_a. So we need the kclient to flush the dirty capsnap in time.

Actions #10

Updated by Xiubo Li 10 months ago

  • Status changed from New to Duplicate
  • Parent task set to #61781

I found the root cause; this is a duplicate of https://tracker.ceph.com/issues/61781.

When truncating file_a in the MDS for the setattr request, just before replying to the kclient, it will call:

void MDCache::truncate_inode(CInode *in, LogSegment *ls)
{
  const auto& pi = in->get_projected_inode();
  dout(10) << "truncate_inode "
           << pi->truncate_from << " -> " << pi->truncate_size
           << " on " << *in
           << dendl;

  ls->truncating_inodes.insert(in);
  in->get(CInode::PIN_TRUNCATING);
  in->auth_pin(this);

  if (!in->client_need_snapflush.empty() &&
      (in->get_caps_issued() & CEPH_CAP_FILE_BUFFER)) {
    ceph_assert(in->filelock.is_xlocked());
    in->filelock.set_xlock_snap_sync(new C_MDC_RetryTruncate(this, in, ls));
    mds->locker->issue_caps(in);
    return;
  }

  _truncate_inode(in, ls);
}

This will revoke the Fb caps and then wait; the logs:

2023-07-11T21:26:23.403+0000 7f720df00700  1 -- [v2:172.21.15.101:6835/539709401,v1:172.21.15.101:6837/539709401] <== client.13519 172.21.15.101:0/443134592 56 ==== client_request(client.13519:36 setattr size=0 #0x10000000002 2023-07-11T21:26:23.403365+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6 ==== 232+0+0 (crc 0 0 0) 0x555a1ccaec00 con 0x555a1ccacc00
2023-07-11T21:26:23.403+0000 7f720df00700  4 mds.0.server handle_client_request client_request(client.13519:36 setattr size=0 #0x10000000002 2023-07-11T21:26:23.403365+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6
2023-07-11T21:26:23.403+0000 7f720df00700 20 mds.0.257 get_session have 0x555a1cc68d00 client.13519 172.21.15.101:0/443134592 state open
...
2023-07-11T21:26:23.419+0000 7f7207ef4700 10 mds.0.cache truncate_inode 8388608 -> 0 on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=3 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=13519) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFscb/pAsxXsxFxwb@7},l=13519 | ptrwaiter=0 request=1 lock=3 caps=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x555a1ccc7700]
2023-07-11T21:26:23.419+0000 7f7207ef4700 10 mds.0.cache.ino(0x10000000002) auth_pin by 0x555a1cbe7600 on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=4 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile xlockdone x=1) (ixattr excl) (iversion lock w=1 last_client=13519) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFscb/pAsxXsxFxwb@7},l=13519 | ptrwaiter=0 request=1 lock=3 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x555a1ccc7700] now 4
2023-07-11T21:26:23.419+0000 7f7207ef4700 15 mds.0.cache.dir(0x10000000001) adjust_nested_auth_pins 1 on [dir 0x10000000001 /snapdir/subdir/ [2,head] auth v=13 cv=0/0 ap=1+9 state=1610874881|complete f(v0 m2023-07-11T21:26:18.475483+0000 1=1+0) n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) hs=1+0,ss=1+0 dirty=2 | child=1 dirty=1 authpin=1 0x555a1cc14880] by 0x555a1ccc7700 count now 1/9
2023-07-11T21:26:23.419+0000 7f7207ef4700 10 mds.0.cache.ino(0x10000000002) add_waiter tag 2000000040000000 0x555a1cd706e0 !ambig 1 !frozen 1 !freezing 1
2023-07-11T21:26:23.419+0000 7f7207ef4700 15 mds.0.cache.ino(0x10000000002) taking waiter here
2023-07-11T21:26:23.419+0000 7f7207ef4700  7 mds.0.locker get_allowed_caps loner client.13519 allowed=pAsxLsXsxFc, xlocker allowed=pAsxLsXsxFc, others allowed=pLsFc on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=4 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile xlocksnap x=1) (ixattr excl) (iversion lock w=1 last_client=13519) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFscb/pAsxXsxFxwb@7},l=13519 | ptrwaiter=0 request=1 lock=3 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x555a1ccc7700]
2023-07-11T21:26:23.419+0000 7f7207ef4700 20 mds.0.locker  client.13519 pending pAsxLsXsxFscb allowed pAsxLsXsxFc wanted pAsxXsxFxwb
2023-07-11T21:26:23.423+0000 7f7207ef4700  7 mds.0.locker    sending MClientCaps to client.13519 seq 8 new pending pAsxLsXsxFc was pAsxLsXsxFscb
2023-07-11T21:26:23.423+0000 7f7207ef4700 20 mds.0.cache.ino(0x10000000002) encode_cap_message pfile 1 pauth 0 plink 0 pxattr 0 mtime 2023-07-11T21:26:23.403365+0000 ctime 2023-07-11T21:26:23.403365+0000 change_attr 4
2023-07-11T21:26:23.423+0000 7f7207ef4700 10 mds.0.257 send_message_client_counted client.13519 seq 14 client_caps(revoke ino 0x10000000002 1 seq 8 caps=pAsxLsXsxFc dirty=- wanted=pAsxXsxFxwb follows 0 size 0/20971520 ts 2/0 mtime 2023-07-11T21:26:23.403365+0000 ctime 2023-07-11T21:26:23.403365+0000 change_attr 4 tws 1) v12
2023-07-11T21:26:23.423+0000 7f7207ef4700  1 -- [v2:172.21.15.101:6835/539709401,v1:172.21.15.101:6837/539709401] --> 172.21.15.101:0/443134592 -- client_caps(revoke ino 0x10000000002 1 seq 8 caps=pAsxLsXsxFc dirty=- wanted=pAsxXsxFxwb follows 0 size 0/20971520 ts 2/0 mtime 2023-07-11T21:26:23.403365+0000 ctime 2023-07-11T21:26:23.403365+0000 change_attr 4 tws 1) v12 -- 0x555a1cd00380 con 0x555a1ccacc00
2023-07-11T21:26:23.423+0000 7f7207ef4700 20 mds.0.bal hit_dir 1 pop is 5.02369, frag * size 1 [pop IRD:[C 0.00e+00] IWR:[C 5.02e+00] RDR:[C 0.00e+00] FET:[C 0.00e+00] STR:[C 0.00e+00] *LOAD:10.0]
2023-07-11T21:26:23.423+0000 7f7207ef4700  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.13519:36 setattr size=0 #0x10000000002 2023-07-11T21:26:23.403365+0000 caller_uid=1000, caller_gid=1267{6,27,108,1267,10102,}) v6

And then later, since the kclient needs to flush the dirty buffers before releasing the Fb caps, it will defer flushing the capsnap, but it is still possible that the kclient sends a cap update request in the meantime, such as:

2023-07-11T21:26:23.423+0000 7f720df00700  1 -- [v2:172.21.15.101:6835/539709401,v1:172.21.15.101:6837/539709401] <== client.13519 172.21.15.101:0/443134592 58 ==== client_caps(update ino 0x10000000002 1 seq 8 caps=pAsxLsXsxFcb dirty=- wanted=pAsxXsxFxwb follows 0 size 0/0 mtime 2023-07-11T21:26:23.403365+0000 ctime 2023-07-11T21:26:23.403365+0000 change_attr 4 tws 1 xattrs(v=3474562154742264576 l=0)) v12 ==== 268+0+0 (crc 0 0 0) 0x555a1cd01180 con 0x555a1ccacc00
2023-07-11T21:26:23.423+0000 7f720df00700  7 mds.0.locker handle_client_caps  on 0x10000000002 tid 0 follows 0 op update flags 0x4
2023-07-11T21:26:23.423+0000 7f720df00700 20 mds.0.257 get_session have 0x555a1cc68d00 client.13519 172.21.15.101:0/443134592 state open
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.locker  head inode [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=3 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile xlocksnap) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFc/pAsxLsXsxFscb/pAsxXsxFxwb@8},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x555a1ccc7700]
2023-07-11T21:26:23.423+0000 7f720df00700 10 Capability revocation is not totally finished yet on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=3 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile xlocksnap) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFc/pAsxLsXsxFcb/pAsxXsxFxwb@8},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x555a1ccc7700], the session smithi101:0 (13519)
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.locker  follows 0 retains pAsxLsXsxFcb dirty - on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=3 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile xlocksnap) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFc/pAsxXsxFxwb@8},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=1 authpin=1 0x555a1ccc7700]

Then, since there is a bug in the MDS, which is https://tracker.ceph.com/issues/61781, the MDS will incorrectly update the cap and wake up the waiters by mistake even though the Fb caps have still not been released by the kclient.

The MDS then wakes up the waiters and truncates the file with the capsnap still not flushed, please see needsnapflush=1:

2023-07-11T21:26:23.423+0000 7f720df00700 10 MDSContext::complete: 19C_MDC_RetryTruncate
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.cache _truncate_inode 8388608 -> 0 fscrypt last block length is 0 on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFc/pAsxXsxFxwb@8},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x555a1ccc7700]
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.cache  realm snaprealm(0x10000000000 seq 2 lc 2 cr 2 snaps={2=snap(2 0x10000000000 'snap1' 2023-07-11T21:26:23.267368+0000)} last_modified 2023-07-11T21:26:23.267368+0000 change_attr 1 0x555a1cd76000)
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.cache _truncate_inode  snapc 0x555a1cd761b0 on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFc/pAsxXsxFxwb@8},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x555a1ccc7700] fscrypt_last_block length is 0
2023-07-11T21:26:23.423+0000 7f720df00700 10 mds.0.cache _truncate_inode truncate on inode [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v12 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFc/pAsxXsxFxwb@8},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x555a1ccc7700]
2023-07-11T21:26:23.423+0000 7f720df00700  1 -- [v2:172.21.15.101:6835/539709401,v1:172.21.15.101:6837/539709401] --> [v2:172.21.15.101:6821/2336127221,v1:172.21.15.101:6823/2336127221] -- osd_op(unknown.0.257:54 43.12 43:4bec3886:::10000000002.00000000:head [trimtrunc 2@0] snapc 2=[2] ondisk+write+known_if_redirected+full_force+supports_pool_eio e281) v8 -- 0x555a1cd00000 con 0x555a1cba9800
2023-07-11T21:26:23.423+0000 7f720df00700  1 -- [v2:172.21.15.101:6835/539709401,v1:172.21.15.101:6837/539709401] --> [v2:172.21.15.186:6824/1204885061,v1:172.21.15.186:6825/1204885061] -- osd_op(unknown.0.257:55 43.1d 43:bbf7764d:::10000000002.00000001:head [trimtrunc 2@0] snapc 2=[2] ondisk+write+known_if_redirected+full_force+supports_pool_eio e281) v8 -- 0x555a1ccf6380 con 0x555a1cba9c00

And then later, after the dirty buffers were flushed, the kclient flushed the capsnap, but it was too late:

2023-07-11T21:26:23.467+0000 7f720df00700  1 -- [v2:172.21.15.101:6835/539709401,v1:172.21.15.101:6837/539709401] <== client.13519 172.21.15.101:0/443134592 59 ==== client_caps(flushsnap ino 0x10000000002 0 seq 0 tid 4 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 1 size 8388608/0 mtime 2023-07-11T21:26:22.779380+0000 ctime 2023-07-11T21:26:22.779380+0000 change_attr 2) v12 ==== 268+0+0 (crc 0 0 0) 0x555a1cd53880 con 0x555a1ccacc00
2023-07-11T21:26:23.467+0000 7f720df00700  7 mds.0.locker handle_client_caps  on 0x10000000002 tid 4 follows 1 op flushsnap flags 0x0
2023-07-11T21:26:23.467+0000 7f720df00700 20 mds.0.257 get_session have 0x555a1cc68d00 client.13519 172.21.15.101:0/443134592 state open
2023-07-11T21:26:23.467+0000 7f720df00700 10 mds.0.locker  head inode [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v14 ap=1 DIRTYPARENT s=0 n(v0 rc2023-07-11T21:26:23.403365+0000 1=1+0) need_snapflush={2=13519} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={13519=0-20971520@1} caps={13519=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@9},l=13519 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=0 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x555a1ccc7700]

Why could only the upstream testing branch reproduce it? That's because we recently added a patch to fix another snapshot issue, which does the correct thing but defers flushing the capsnap:

commit 2d12ad950b0c2a89d82f5d258309ad23aa70fc38
Author: Xiubo Li <xiubli@redhat.com>
Date:   Thu May 11 13:19:45 2023 +0800

    ceph: trigger to flush the buffer when making snapshot

    The 'i_wr_ref' is used to track the 'Fb' caps, while whenever the 'Fb'
    caps is took the kclient will always take the 'Fw' caps at the same
    time. That means it will always be a false check in __ceph_finish_cap_snap().

    When writing to buffer the kclient will take both 'Fb|Fw' caps and then
    write the contents to the buffer pages by increasing the 'i_wrbuffer_ref'
    and then just release both 'Fb|Fw'. This is different with the user
    space libcephfs, which will keep the 'Fb' being took and use 'i_wr_ref'
    instead of 'i_wrbuffer_ref' to track this until the buffer is flushed
    to Rados.

    We need to defer flushing the capsnap until the corresponding buffer
    pages are all flushed to Rados, and at the same time just trigger to
    flush the buffer pages immediately.

    Link: https://tracker.ceph.com/issues/48640
    Link: https://tracker.ceph.com/issues/59343
    Signed-off-by: Xiubo Li <xiubli@redhat.com>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Thanks.

Actions #11

Updated by Venky Shankar 7 months ago

Xiubo, I'm seeing this in a quincy run: https://pulpito.ceph.com/yuriw-2023-10-06_22:26:38-fs-wip-yuri3-testing-2023-10-06-0948-quincy-distro-default-smithi/7415910/

The tracker this is duplicated against, https://tracker.ceph.com/issues/61781, has its changes merged in quincy, so I'm not sure why this shows up again - perhaps we need additional fixes? The backport run itself is fine, since this failure isn't related to any of the PRs being tested.

Actions #12

Updated by Xiubo Li 7 months ago

Venky Shankar wrote:

Xiubo, I'm seeing this in a quincy run: https://pulpito.ceph.com/yuriw-2023-10-06_22:26:38-fs-wip-yuri3-testing-2023-10-06-0948-quincy-distro-default-smithi/7415910/

The tracker this is duplicated against, https://tracker.ceph.com/issues/61781, has its changes merged in quincy, so I'm not sure why this shows up again - perhaps we need additional fixes? The backport run itself is fine, since this failure isn't related to any of the PRs being tested.

Let me take a look.

Actions #13

Updated by Xiubo Li 7 months ago

Venky Shankar wrote:

Xiubo, I'm seeing this in a quincy run: https://pulpito.ceph.com/yuriw-2023-10-06_22:26:38-fs-wip-yuri3-testing-2023-10-06-0948-quincy-distro-default-smithi/7415910/

The tracker this is duplicated against, https://tracker.ceph.com/issues/61781, has its changes merged in quincy, so I'm not sure why this shows up again - perhaps we need additional fixes? The backport run itself is fine, since this failure isn't related to any of the PRs being tested.

Venky, this is another known issue, exactly the same as https://tracker.ceph.com/issues/62580; for more detail please see the following logs. While the Fb caps were still being revoked, the MDS went ahead and issued the truncate to Rados without waiting for the client to release Fb (a rough sketch of this ordering check follows the logs below):

2023-10-07T07:07:06.830+0000 7f5fbf9f6700 10 mds.0.locker simple_eval (ilink sync) on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v10 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-10-07T07:07:06.821107+0000 1=1+0) need_snapflush={2=11395} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={11395=0-20971520@1} caps={11395=pAsxLsXsxFc/pAsxLsXsxFcb/pAsxXsxFxwb@8},l=11395 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x562c69d7fb80]
2023-10-07T07:07:06.830+0000 7f5fbf9f6700 10 mds.0.locker simple_eval (ixattr excl) on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v10 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-10-07T07:07:06.821107+0000 1=1+0) need_snapflush={2=11395} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={11395=0-20971520@1} caps={11395=pAsxLsXsxFc/pAsxLsXsxFcb/pAsxXsxFxwb@8},l=11395 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x562c69d7fb80]
2023-10-07T07:07:06.830+0000 7f5fbf9f6700 10 MDSContext::complete: 19C_MDC_RetryTruncate
2023-10-07T07:07:06.830+0000 7f5fbf9f6700 10 mds.0.cache _truncate_inode 8388608 -> 0 on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v10 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-10-07T07:07:06.821107+0000 1=1+0) need_snapflush={2=11395} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={11395=0-20971520@1} caps={11395=pAsxLsXsxFc/pAsxLsXsxFcb/pAsxXsxFxwb@8},l=11395 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x562c69d7fb80]
2023-10-07T07:07:06.830+0000 7f5fbf9f6700 10 mds.0.cache  realm snaprealm(0x10000000000 seq 2 lc 2 cr 2 snaps={2=snap(2 0x10000000000 'snap1' 2023-10-07T07:07:06.737109+0000)} 0x562c68f34c00)
2023-10-07T07:07:06.830+0000 7f5fbf9f6700 10 mds.0.cache _truncate_inode  snapc 0x562c68f34da0 on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v10 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-10-07T07:07:06.821107+0000 1=1+0) need_snapflush={2=11395} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={11395=0-20971520@1} caps={11395=pAsxLsXsxFc/pAsxLsXsxFcb/pAsxXsxFxwb@8},l=11395 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x562c69d7fb80]
2023-10-07T07:07:06.830+0000 7f5fbf9f6700  1 --2- [v2:172.21.15.86:6836/3755720784,v1:172.21.15.86:6837/3755720784] >> [v2:172.21.15.135:6821/86037,v1:172.21.15.135:6823/86037] conn(0x562c69d6f800 0x562c69d81700 unknown :-1 s=NONE pgs=0 cs=0 l=1 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0).connect
2023-10-07T07:07:06.830+0000 7f5fbf9f6700  1 -- [v2:172.21.15.86:6836/3755720784,v1:172.21.15.86:6837/3755720784] --> [v2:172.21.15.135:6821/86037,v1:172.21.15.135:6823/86037] -- osd_op(unknown.0.262:55 42.12 42:4bec3886:::10000000002.00000000:head [trimtrunc 2@0] snapc 2=[2] ondisk+write+known_if_redirected+full_force+supports_pool_eio e281) v8 -- 0x562c69d96800 con 0x562c69d6f800
2023-10-07T07:07:06.830+0000 7f5fbf9f6700  1 -- [v2:172.21.15.86:6836/3755720784,v1:172.21.15.86:6837/3755720784] --> [v2:172.21.15.86:6815/85936,v1:172.21.15.86:6817/85936] -- osd_op(unknown.0.262:56 42.1d 42:bbf7764d:::10000000002.00000001:head [trimtrunc 2@0] snapc 2=[2] ondisk+write+known_if_redirected+full_force+supports_pool_eio e281) v8 -- 0x562c69d96000 con 0x562c69d83800
2023-10-07T07:07:06.830+0000 7f5fbf9f6700  7 mds.0.locker get_allowed_caps loner client.11395 allowed=pAsxLsXsxFsxcrwb, xlocker allowed=pAsxLsXsxFsxcrwb, others allowed=pLs on [inode 0x10000000002 [3,head] /snapdir/subdir/file_a auth v10 ap=2 DIRTYPARENT truncating(8388608 to 0) s=0 n(v0 rc2023-10-07T07:07:06.821107+0000 1=1+0) need_snapflush={2=11395} (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={11395=0-20971520@1} caps={11395=pAsxLsXsxFc/pAsxLsXsxFcb/pAsxXsxFxwb@8},l=11395 | ptrwaiter=0 request=0 lock=0 caps=1 truncating=1 needsnapflush=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x562c69d7fb80]
2023-10-07T07:07:06.830+0000 7f5fbf9f6700 20 mds.0.locker  client.11395 pending pAsxLsXsxFc allowed pAsxLsXsxFsxcrwb wanted pAsxXsxFxwb
2023-10-07T07:07:06.830+0000 7f5fbf9f6700  7 mds.0.locker    sending MClientCaps to client.11395 seq 9 new pending pAsxLsXsxFsxcrwb was pAsxLsXsxFc
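
As referenced above, here is a small user-space sketch of the missing ordering check. It is not MDS source code; the names (inode_state, fb_revocation_pending, try_truncate, client_released_fb) are hypothetical. The idea is simply that the truncate must not be sent to Rados while the client may still be writing back snapshot data under an Fb cap that has not yet been released, and should be retried once the revocation completes.

#include <stdbool.h>
#include <stdio.h>

struct inode_state {
	bool fb_revocation_pending;  /* client still holds the Fb cap we asked back */
	bool truncate_queued;        /* a truncate is waiting to be issued */
};

/* Called when a truncate is requested or the inode's cap state changes. */
static void try_truncate(struct inode_state *in)
{
	if (!in->truncate_queued)
		return;
	if (in->fb_revocation_pending) {
		/* Equivalent of queuing a C_MDC_RetryTruncate-style waiter. */
		printf("Fb still being revoked: defer trimtrunc, retry later\n");
		return;
	}
	printf("Fb released: issue trimtrunc to Rados with the snap context\n");
	in->truncate_queued = false;
}

/* The client acknowledges the cap revocation and releases Fb. */
static void client_released_fb(struct inode_state *in)
{
	in->fb_revocation_pending = false;
	try_truncate(in);  /* the deferred truncate can now safely go out */
}

int main(void)
{
	struct inode_state in = { .fb_revocation_pending = true,
				  .truncate_queued = true };

	try_truncate(&in);        /* deferred: client still holds Fb */
	client_released_fb(&in);  /* truncate issued only after Fb is released */
	return 0;
}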