Bug #56011
fs/thrash: snaptest-snap-rm-cmp.sh fails in md5sum comparison
Description
2022-06-10T03:25:25.986 INFO:tasks.workunit.client.0.smithi070.stderr:++ md5sum /tmp/a
2022-06-10T03:25:25.987 INFO:tasks.workunit.client.0.smithi070.stderr:++ awk '{print $1}'
2022-06-10T03:25:26.013 INFO:tasks.ceph.osd.6.smithi120.stderr:2022-06-10T03:25:26.001+0000 7f5be3df8700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 6 (PID: 86251) UID: 0
2022-06-10T03:25:26.104 INFO:tasks.ceph.osd.2.smithi070.stderr:2022-06-10T03:25:26.101+0000 7f75327f8700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 2 (PID: 39886) UID: 0
2022-06-10T03:25:26.137 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-06-10T03:25:26.138 INFO:tasks.workunit.client.0.smithi070.stdout:FAIL: bad match, /tmp/a 4637e766853d1ad16a7b17079e2c6f03 != real c3883760b18d50e8d78819c54d579b00
2022-06-10T03:25:26.139 INFO:tasks.workunit.client.0.smithi070.stderr:+ cur=4637e766853d1ad16a7b17079e2c6f03
2022-06-10T03:25:26.139 INFO:tasks.workunit.client.0.smithi070.stderr:+ '[' 4637e766853d1ad16a7b17079e2c6f03 '!=' c3883760b18d50e8d78819c54d579b00 ']'
2022-06-10T03:25:26.139 INFO:tasks.workunit.client.0.smithi070.stderr:+ echo 'FAIL: bad match, /tmp/a 4637e766853d1ad16a7b17079e2c6f03 != real c3883760b18d50e8d78819c54d579b00'
2022-06-10T03:25:26.140 INFO:tasks.workunit.client.0.smithi070.stderr:+ false
2022-06-10T03:25:26.140 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
2022-06-10T03:25:26.140 DEBUG:teuthology.orchestra.run.smithi070:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
Also reproducible on the main branch: https://pulpito.ceph.com/vshankar-2022-06-10_05:38:08-fs-wip-vshankar-testing1-20220607-104134-testing-default-smithi/6872838/ (wip-vshankar-testing-20220607-104134 is nearly identical to main).
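The failing check can be reconstructed from the stderr trace above. This is a minimal sketch, not the actual workunit source (the helper name `compare_md5` and the way the expected checksum is supplied are assumptions here): the script computes the current md5sum of the file, compares it against the known-good checksum, and fails on mismatch.

```shell
# Sketch of the comparison that fails in snaptest-snap-rm-cmp.sh,
# reconstructed from the stderr trace (helper name is hypothetical).
compare_md5() {
    file=$1
    real=$2
    cur=$(md5sum "$file" | awk '{print $1}')
    if [ "$cur" != "$real" ]; then
        echo "FAIL: bad match, $file $cur != real $real"
        return 1
    fi
    echo "OK: $file matches $real"
}
```

In the failing run, `cur` (4637e766...) is the checksum read back through the snapshot path, while `real` (c3883760...) was taken beforehand; the mismatch indicates the snapshot captured a stale view of the file.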
Updated by Venky Shankar almost 2 years ago
- Status changed from New to Triaged
- Assignee set to Xiubo Li
Updated by Xiubo Li almost 2 years ago
- Status changed from Triaged to In Progress
Updated by Xiubo Li almost 2 years ago
For the normal test case, which succeeded, we can see that when looking up the snapshot file #0x100000013fe//s/a, the size is s=66266488:
-8420> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 7 mds.1.server dispatch_client_request client_request(client.4785:113102 lookup #0x100000013fe//s/a 2022-06-10T06:20:49.731528+0000 caller_uid=1000, caller_gid=1264{6,36,1000,1264,}) v6
-8419> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 10 mds.1.server rdlock_path_pin_ref request(client.4785:113102 nref=2 cr=0x55b94fb58000) #0x100000013fe//s/a
-8418> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 7 mds.1.cache traverse: opening base ino 0x100000013fe snap head
-8417> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 12 mds.1.cache traverse: path seg depth 0 '' snapid head
-8416> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 10 mds.1.cache traverse: snapdir
-8415> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 12 mds.1.cache traverse: path seg depth 1 's' snapid snapdir
-8414> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 10 mds.1.cache.snaprealm(0x100000013fe seq 596 0x55b94de61c00) resolve_snapname 's' in [0,head]
-8413> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 15 mds.1.cache.snaprealm(0x100000013fe seq 596 0x55b94de61c00) ? snap(254 0x100000013fe 's' 2022-06-10T06:20:49.665051+0000)
-8412> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 10 mds.1.cache traverse: snap s -> 254
-8411> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 12 mds.1.cache traverse: path seg depth 2 'a' snapid 254
-8410> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 20 mds.1.cache.dir(0x100000013fe) lookup (a, '254')
-8409> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 20 mds.1.cache.dir(0x100000013fe) hit -> (a,254)
-8408> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 10 mds.1.cache path_traverse finish on snapid 254
-8407> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 10 mds.1.server ref is [inode 0x200000007a2 [254,254] /client.0/tmp/a auth v210 s=66266488 n(v0 rc2022-06-10T06:20:49.653709+0000 b66266488 1=1+0) (iversion lock) | request=0 dirty=1 0x55b94e99fb80]
-8406> 2022-06-10T06:20:49.731+0000 7fa1e25f6700 10 mds.1.cache.ino(0x200000007a2) auth_pin by 0x55b94e82f800 on [inode 0x200000007a2 [254,254] /client.0/tmp/a auth v210 ap=1 s=66266488 n(v0 rc2022-06-10T06:20:49.653709+0000 b66266488 1=1+0) (iversion lock) | request=0 dirty=1 authpin=1 0x55b94e99fb80] now 1
While for the failed one, the size is s=65798144:
-4299> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 7 mds.1.server dispatch_client_request client_request(client.4785:113134 lookup #0x100000013fe//s/a 2022-06-10T06:20:52.398271+0000 caller_uid=1000, caller_gid=1264{6,36,1000,1264,}) v6
-4298> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 10 mds.1.server rdlock_path_pin_ref request(client.4785:113134 nref=2 cr=0x55b94f6fb600) #0x100000013fe//s/a
-4297> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 7 mds.1.cache traverse: opening base ino 0x100000013fe snap head
-4296> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 12 mds.1.cache traverse: path seg depth 0 '' snapid head
-4295> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 10 mds.1.cache traverse: snapdir
-4294> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 12 mds.1.cache traverse: path seg depth 1 's' snapid snapdir
-4293> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 10 mds.1.cache.snaprealm(0x100000013fe seq 600 0x55b94de61c00) resolve_snapname 's' in [0,head]
-4292> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 15 mds.1.cache.snaprealm(0x100000013fe seq 600 0x55b94de61c00) ? snap(258 0x100000013fe 's' 2022-06-10T06:20:52.220723+0000)
-4291> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 10 mds.1.cache traverse: snap s -> 258
-4290> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 12 mds.1.cache traverse: path seg depth 2 'a' snapid 258
-4289> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 20 mds.1.cache.dir(0x100000013fe) lookup (a, '258')
-4288> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 20 mds.1.cache.dir(0x100000013fe) hit -> (a,258)
-4287> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 10 mds.1.cache path_traverse finish on snapid 258
-4286> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 10 mds.1.server ref is [inode 0x200000007a4 [258,258] /client.0/tmp/a auth v256 s=65798144 n(v0 rc2022-06-10T06:20:52.181418+0000 b65798144 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=0 dirty=1 authpin=0 0x55b94f402680]
-4285> 2022-06-10T06:20:52.397+0000 7fa1e25f6700 10 mds.1.cache.ino(0x200000007a4) auth_pin by 0x55b94e850c00 on [inode 0x200000007a4 [258,258] /client.0/tmp/a auth v256 ap=1 s=65798144 n(v0 rc2022-06-10T06:20:52.181418+0000 b65798144 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=0 dirty=1 authpin=1 0x55b94f402680] now 1
Downloading the file and listing it, we can see that the real size is 66266488:
# wget -q http://download.ceph.com/qa/linux-2.6.33.tar.bz2
# ll linux-2.6.33.tar.bz2
-rw-r--r--. 1 root root 66266488 Feb 25 2010 linux-2.6.33.tar.bz2
Updated by Xiubo Li almost 2 years ago
Xiubo Li wrote:
For the normal test case, which succeeded, we can see that when looking up the snapshot file #0x100000013fe//s/a, the size is s=66266488:
[...]
While for the failed one, the size is s=65798144:
[...]
Downloading the file and listing it, we can see that the real size is 66266488:
[...]
Checking the logs further: when flushing the snap with follows 257, the MDS didn't find any following snap (flushsnap follows 257 -> snap head) and skipped updating it:
-5156> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 1 -- [v2:172.21.15.120:6837/3555534286,v1:172.21.15.120:6839/3555534286] <== client.4785 192.168.0.1:0/3057055075 18173 ==== client_caps(flushsnap ino 0x200000007a4 0 seq 0 tid 26159 caps=pAsxLsXsxFsxcrwb dirty=Fw wanted=- follows 257 size 66266488/0 mtime 2022-06-10T06:20:52.183123+0000 xattrs(v=1 l=4)) v12 ==== 260+4+0 (crc 0 0 0) 0x55b94ee7d500 con 0x55b94db27c00
-5155> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 7 mds.1.locker handle_client_caps on 0x200000007a4 tid 26159 follows 257 op flushsnap flags 0x0
-5154> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 20 mds.1.73 get_session have 0x55b94db1c300 client.4785 192.168.0.1:0/3057055075 state open
-5153> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 10 mds.1.locker head inode [inode 0x200000007a4 [258,head] /client.0/tmp/a auth v252 DIRTYPARENT s=65798144 n(v0 rc2022-06-10T06:20:52.181418+0000 b65798144 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4785=0-134217728@257} caps={4785=pAsxLsXsxFsxcrwb/pAsxXsxFxcwb@10},l=4785 | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 dirty=1 authpin=0 0x55b94f8f0680]
-5152> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 10 mds.1.cache.snaprealm(0x100000013fe seq 599 0x55b94de61c00) get_snaps (seq 257 cached_seq 257)
-5151> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 10 mds.1.locker flushsnap follows 257 -> snap head
-5150> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 7 mds.1.locker flushsnap snap head client.4785 on [inode 0x200000007a4 [258,head] /client.0/tmp/a auth v252 DIRTYPARENT s=65798144 n(v0 rc2022-06-10T06:20:52.181418+0000 b65798144 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4785=0-134217728@257} caps={4785=pAsxLsXsxFsxcrwb/pAsxXsxFxcwb@10},l=4785 | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 dirty=1 authpin=0 0x55b94f8f0680]
-5149> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 10 mds.1.locker _do_snap_update dirty Fw follows 257 snap head on [inode 0x200000007a4 [258,head] /client.0/tmp/a auth v252 DIRTYPARENT s=65798144 n(v0 rc2022-06-10T06:20:52.181418+0000 b65798144 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4785=0-134217728@257} caps={4785=pAsxLsXsxFsxcrwb/pAsxXsxFxcwb@10},l=4785 | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 dirty=1 authpin=0 0x55b94f8f0680]
-5148> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 10 mds.1.locker wow, the snap following 257 was already deleted. nothing to record, just ack.
-5147> 2022-06-10T06:20:52.317+0000 7fa1e25f6700 10 mds.1.73 send_message_client_counted client.4785 seq 15737 client_caps(flushsnap_ack ino 0x200000007a4 0 seq 0 tid 26159 caps=- dirty=Fw wanted=- follows 257 size 0/0 mtime 0.000000) v12
And then, just after that, the snapid 258 update request from mds.0 arrived:
-5143> 2022-06-10T06:20:52.319+0000 7fa1e25f6700 1 -- [v2:172.21.15.120:6837/3555534286,v1:172.21.15.120:6839/3555534286] <== mds.0 v2:172.21.15.131:6832/3629949482 2699 ==== snap_update(0x100000013fe table_tid 1212) v1 ==== 141+0+0 (crc 0 0 0) 0x55b94dbb3500 con 0x55b94dffe800
-5142> 2022-06-10T06:20:52.319+0000 7fa1e25f6700 10 mds.1.cache handle_snap_update snap_update(0x100000013fe table_tid 1212) v1 from mds.0
-5141> 2022-06-10T06:20:52.319+0000 7fa1e25f6700 10 mds.1.snapclient notify_commit tid 1212
-5140> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache.snaprealm(0x3 seq 1 0x55b94db18800) build_snap_set on snaprealm(0x3 seq 1 lc 0 cr 0 cps 1 snaps={} past_parent_snaps= 0x55b94db18800)
-5139> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache.snaprealm(0x3 seq 1 0x55b94db18800) build_snap_trace my_snaps [258]
-5138> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache.snaprealm(0x3 seq 1 0x55b94db18800) check_cache rebuilt 258 seq 258 cached_seq 258 cached_last_created 258 cached_last_destroyed 257)
-5137> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.73 send_message_client_counted client.4785 seq 15738 client_snap(update split=0x3 tracelen=56) v1
-5136> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 1 -- [v2:172.21.15.120:6837/3555534286,v1:172.21.15.120:6839/3555534286] --> 192.168.0.1:0/3057055075 -- client_snap(update split=0x3 tracelen=56) v1 -- 0x55b94e69cb60 con 0x55b94db27c00
-5135> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 20 mds.1.cache.ino(0x100000013fe) decode_snap_blob snaprealm(0x100000013fe seq 258 lc 258 cr 258 cps 244 snaps={258=snap(258 0x100000013fe 's' 2022-06-10T06:20:52.220723+0000)} past_parent_snaps= 0x55b94de61c00)
-5134> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache do_realm_invalidate_and_update_notify snaprealm(0x100000013fe seq 258 lc 258 cr 258 cps 244 snaps={258=snap(258 0x100000013fe 's' 2022-06-10T06:20:52.220723+0000)} past_parent_snaps= 0x55b94de61c00) [inode 0x100000013fe [...257,head] /client.0/tmp/ rep@0.1 v1756 snaprealm=0x55b94de61c00 f(v11 m2022-06-10T06:20:50.961607+0000 1=1+0) n(v1 rc2022-06-10T06:20:46.636121+0000 b66266488 rs1 2=1+1)/n(v0 1=0+1) (isnap lock) (inest mix dirty) (ifile mix dirty) (iversion lock) caps={4785=pAsLsXs/-@22} | dirtyscattered=2 request=0 lock=0 importing=0 dirfrag=1 caps=1 waiter=0 0x55b94f80ac00]
-5133> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache realm snaprealm(0x100000013fe seq 258 lc 258 cr 258 cps 244 snaps={258=snap(258 0x100000013fe 's' 2022-06-10T06:20:52.220723+0000)} past_parent_snaps= 0x55b94de61c00) on [inode 0x100000013fe [...257,head] /client.0/tmp/ rep@0.1 v1756 snaprealm=0x55b94de61c00 f(v11 m2022-06-10T06:20:50.961607+0000 1=1+0) n(v1 rc2022-06-10T06:20:46.636121+0000 b66266488 rs1 2=1+1)/n(v0 1=0+1) (isnap lock) (inest mix dirty) (ifile mix dirty) (iversion lock) caps={4785=pAsLsXs/-@22} | dirtyscattered=2 request=0 lock=0 importing=0 dirfrag=1 caps=1 waiter=0 0x55b94f80ac00]
-5132> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache.snaprealm(0x100000013fe seq 600 0x55b94de61c00) build_snap_set on snaprealm(0x100000013fe seq 258 lc 258 cr 258 cps 244 snaps={258=snap(258 0x100000013fe 's' 2022-06-10T06:20:52.220723+0000)} past_parent_snaps= 0x55b94de61c00)
-5131> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache.snaprealm(0x100000013fe seq 600 0x55b94de61c00) build_snap_trace my_snaps [258]
-5130> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache.snaprealm(0x100000013fe seq 600 0x55b94de61c00) check_cache rebuilt 258 seq 258 cached_seq 258 cached_last_created 258 cached_last_destroyed 257)
-5129> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache 0x55b94de61c00 open_children are
-5128> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache send_snaps
-5127> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache client.4785 split 1099511632894 inos []
-5126> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.73 send_message_client_counted client.4785 seq 15739 client_snap(split split=0x100000013fe tracelen=104) v1
-5125> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 1 -- [v2:172.21.15.120:6837/3555534286,v1:172.21.15.120:6839/3555534286] --> 192.168.0.1:0/3057055075 -- client_snap(split split=0x100000013fe tracelen=104) v1 -- 0x55b94d9c9860 con 0x55b94db27c00
-5124> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 1 -- [v2:172.21.15.120:6837/3555534286,v1:172.21.15.120:6839/3555534286] <== mds.0 v2:172.21.15.131:6832/3629949482 2700 ==== lock(a=sync isnap 0x100000013fe.head) v1 ==== 240+0+0 (crc 0 0 0) 0x55b94e692540 con 0x55b94dffe800
-5123> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.locker handle_simple_lock lock(a=sync isnap 0x100000013fe.head) v1 on (isnap lock) [inode 0x100000013fe [...257,head] /client.0/tmp/ rep@0.1 v1756 snaprealm=0x55b94de61c00 f(v11 m2022-06-10T06:20:50.961607+0000 1=1+0) n(v1 rc2022-06-10T06:20:46.636121+0000 b66266488 rs1 2=1+1)/n(v0 1=0+1) (isnap lock) (inest mix dirty) (ifile mix dirty) (iversion lock) caps={4785=pAsLsXs/-@22} | dirtyscattered=2 request=0 lock=0 importing=0 dirfrag=1 caps=1 waiter=0 0x55b94f80ac00]
-5122> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 10 mds.1.cache.ino(0x100000013fe) decode_lock_state first 257 -> 258
-5121> 2022-06-10T06:20:52.320+0000 7fa1e25f6700 20 mds.1.cache.ino(0x100000013fe) decode_snap_blob snaprealm(0x100000013fe seq 258 lc 258 cr 258 cps 244 snaps={258=snap(258 0x100000013fe 's' 2022-06-10T06:20:52.220723+0000)} past_parent_snaps= 0x55b94de61c00)
There is a sequencing issue on the MDS side. I will figure it out and generate a patch tomorrow.
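The ordering problem above can be illustrated with a toy model (plain shell, not Ceph code; the function names and single-list bookkeeping are simplifications): a flushsnap with follows=257 can only be recorded against a snapid the rank already knows about, so when the snap_update announcing snapid 258 arrives after the flushsnap, the flushed size is dropped and the snapshot inode keeps its stale size.

```shell
# Toy model (illustrative only, not Ceph code) of the message ordering
# seen in the logs above.
KNOWN_SNAPS=""

# mds.0 tells this rank about a newly created snapid.
handle_snap_update() {
    KNOWN_SNAPS="$KNOWN_SNAPS $1"
}

# A client flushes dirty snap data that "follows" a given snapid. The data
# is only recorded if a later snapid is already known; otherwise the MDS
# acks and drops it ("nothing to record, just ack").
handle_flushsnap() {
    follows=$1
    size=$2
    for s in $KNOWN_SNAPS; do
        if [ "$s" -gt "$follows" ]; then
            echo "recorded size $size for snapid $s"
            return 0
        fi
    done
    echo "snap following $follows already deleted; nothing to record, just ack"
    return 1
}

# Failure ordering: the flushsnap (follows 257) is processed before the
# snap_update announcing snapid 258, so the flush is dropped.
handle_flushsnap 257 66266488 || true
handle_snap_update 258   # arrives too late; snap 258 keeps the stale size
```

With the opposite ordering (snap_update first), the flush would be recorded against snapid 258 and the snapshot would report the full 66266488-byte size.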
Updated by Xiubo Li almost 2 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 46747
Updated by Patrick Donnelly 7 months ago
- Target version changed from v18.0.0 to v19.0.0
- Backport changed from quincy, pacific to reef,quincy