Bug #53240


full-object read crc mismatch: truncate modifies oi.size but forgets to clear data_digest

Added by wendong jia over 2 years ago. Updated 2 days ago.

Status: Fix Under Review
Priority: High
Category: -
Target version:
% Done: 100%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I use 100 threads to run dd on multiple files under the same directory, so the same file can be truncated at any time.
When dd completes, md5sum is called to calculate the checksum, and then the whole operation is repeated in a loop.

I encountered the following situation:
osd.log:
2021-10-14 00:13:24.700 7f592ef32700 1 -- [v2:10.61.152.7:6808/412130,v1:10.61.152.7:6809/412130] --> 10.61.152.108:0/1245157660 -- osd_op_reply(15860425 10000178110.00000006 [write 3932160~0 [25@0]] v114962'126357 uv126357 ondisk = 0) v8 -- 0x7f598eebe780 con 0x7f5984f9ec00
2021-10-14 00:13:24.700 7f592ef32700 1 -- [v2:10.61.152.7:6808/412130,v1:10.61.152.7:6809/412130] --> 10.61.152.108:0/1245157660 -- osd_op_reply(15860680 10000178110.00000006 [write 4063232~0 [25@0]] v114962'126358 uv126358 ondisk = 0) v8 -- 0x7f596efcec80 con 0x7f5984f9ec00
2021-10-14 00:13:43.302 7f592ef32700 1 -- [v2:192.168.75.3:6808/412130,v1:192.168.75.3:6809/412130] --> [v2:192.168.75.2:6808/24047,v1:192.168.75.2:6809/24047] -- MOSDECSubOpRead(16.2ads1 114962/114951 ECSubRead(tid=6751702, to_read={16:b555bed4:::10000178110.00000006:head=0,2031616,0}, subchunks={16:b555bed4:::10000178110.00000006:head=[0,1]}, attrs_to_read=)) v3 -- 0x7f596af31b80 con 0x7f596ba5b000
2021-10-14 00:13:43.302 7f592ef32700 1 -- [v2:192.168.75.3:6808/412130,v1:192.168.75.3:6809/412130] --> [v2:192.168.75.3:6808/412130,v1:192.168.75.3:6809/412130] -- MOSDECSubOpRead(16.2ads0 114962/114951 ECSubRead(tid=6751702, to_read={16:b555bed4:::10000178110.00000006:head=0,2031616,0}, subchunks={16:b555bed4:::10000178110.00000006:head=[0,1]}, attrs_to_read=)) v3 -- 0x7f596cbcd180 con 0x7f5964bb9800
2021-10-14 00:13:43.302 7f5936f42700 1 -- [v2:192.168.75.3:6808/412130,v1:192.168.75.3:6809/412130] <== osd.13 v2:192.168.75.3:6808/412130 0 ==== MOSDECSubOpRead(16.2ads0 114962/114951 ECSubRead(tid=6751702, to_read={16:b555bed4:::10000178110.00000006:head=0,2031616,0}, subchunks={16:b555bed4:::10000178110.00000006:head=[0,1]}, attrs_to_read=)) v3 ==== 0+0+0 (unknown 0 0 0) 0x7f596cbcd180 con 0x7f5964bb9800
2021-10-14 00:13:43.597 7f592ef32700 1 log_channel(cluster) log [ERR] : full-object read crc 0x29d98db8 != expected 0xffffffff on 16:b555bed4:::10000178110.00000006:head
2021-10-14 00:13:43.597 7f592ef32700 1 -- [v2:10.61.152.7:6808/412130,v1:10.61.152.7:6809/412130] --> 10.61.152.108:0/1245157660 -- osd_op_reply(15913661 10000178110.00000006 [read 0~0 [26@0]] v0'0 uv0 ondisk = -5 ((5) Input/output error)) v8 -- 0x7f596c20a000 con 0x7f5984f9ec00
2021-10-14 00:14:11.948 7f592ef32700 1 -- [v2:192.168.75.3:6808/412130,v1:192.168.75.3:6809/412130] --> [v2:192.168.75.2:6808/24047,v1:192.168.75.2:6809/24047] -- MOSDECSubOpRead(16.2ads1 114962/114951 ECSubRead(tid=6753957, to_read={16:b555bed4:::10000178110.00000006:head=0,2031616,0}, subchunks={16:b555bed4:::10000178110.00000006:head=[0,1]}, attrs_to_read=)) v3 -- 0x7f597358bb80 con 0x7f596ba5b000
2021-10-14 00:14:11.948 7f592ef32700 1 -- [v2:192.168.75.3:6808/412130,v1:192.168.75.3:6809/412130] --> [v2:192.168.75.3:6808/412130,v1:192.168.75.3:6809/412130] -- MOSDECSubOpRead(16.2ads0 114962/114951 ECSubRead(tid=6753957, to_read={16:b555bed4:::10000178110.00000006:head=0,2031616,0}, subchunks={16:b555bed4:::10000178110.00000006:head=[0,1]}, attrs_to_read=)) v3 -- 0x7f5973adc780 con 0x7f5964bb9800
2021-10-14 00:14:11.948 7f5936f42700 1 -- [v2:192.168.75.3:6808/412130,v1:192.168.75.3:6809/412130] <== osd.13 v2:192.168.75.3:6808/412130 0 ==== MOSDECSubOpRead(16.2ads0 114962/114951 ECSubRead(tid=6753957, to_read={16:b555bed4:::10000178110.00000006:head=0,2031616,0}, subchunks={16:b555bed4:::10000178110.00000006:head=[0,1]}, attrs_to_read=)) v3 ==== 0+0+0 (unknown 0 0 0) 0x7f5973adc780 con 0x7f5964bb9800
2021-10-14 00:14:12.131 7f592ef32700 1 log_channel(cluster) log [ERR] : full-object read crc 0x29d98db8 != expected 0xffffffff on 16:b555bed4:::10000178110.00000006:head
2021-10-14 00:14:12.131 7f592ef32700 1 -- [v2:10.61.152.7:6808/412130,v1:10.61.152.7:6809/412130] --> 10.61.152.108:0/1245157660 -- osd_op_reply(16002738 10000178110.00000006 [read 0~0 [26@0]] v0'0 uv0 ondisk = -5 ((5) Input/output error)) v8 -- 0x7f59833e1900 con 0x7f5984f9ec00
2021-10-14 00:14:21.565 7f592ef32700 1 -- [v2:10.61.152.7:6808/412130,v1:10.61.152.7:6809/412130] --> 10.61.152.108:0/1245157660 -- osd_op_reply(16026297 10000178110.00000006 [write 0~131072 [26@0]] v114962'126480 uv126480 ondisk = 0) v8 -- 0x7f598435d900 con 0x7f5984f9ec00

object info: {
"id": {
"oid": "10000178110.00000006",
"key": "",
"snapid": -2,
"hash": 729655981,
"max": 0,
"pool": 16,
"namespace": "",
"shard_id": 0,
"max": 0
},
"info": {
"oid": {
"oid": "10000178110.00000006",
"key": "",
"snapid": -2,
"hash": 729655981,
"max": 0,
"pool": 16,
"namespace": ""
},
"version": "114962'126498",
"prior_version": "114962'126497",
"last_reqid": "client.60778083.0:16028694",
"user_version": 126498,
"size": 4063232,
"mtime": "2021-10-14 00:13:32.645230",
"local_mtime": "2021-10-14 00:14:21.914049",
"lost": 0,
"flags": [
"dirty"
],
"truncate_seq": 26,
"truncate_size": 0,
*"data_digest": "0xffffffff",
"omap_digest": "0xffffffff",
"expected_object_size": 0,
"expected_write_size": 0,
"alloc_hint_flags": 0,
"manifest": {
"type": 0
},
"watchers": {}
},
"stat": {
"size": 2031616,
"blksize": 4096,
"blocks": 496,
"nlink": 1
},
"SnapSet": {
"snap_context": {
"seq": 1,
"snaps": []
},
"clones": []
},
"hinfo": {
"total_chunk_size": 2031616,
"cumulative_shard_hashes": []
}
}

Actions #1

Updated by wendong jia over 2 years ago

The object's oi.size should be 4194304, but it is actually 4063232.
The object's data_digest is 0xffffffff, but the read crc is 0x29d98db8.

-----------------------------------------------------------------
In this case:
the client writes 0~4M; the OSD receives multiple ops: [0~128k], [131072~128k], [262144~128k], ..., [4063232~128k].
When the writes complete, oi.size should be 4194304,
but a concurrent truncate will modify the ops' lengths and truncate oi.size.

Consider two abnormal cases (a sketch follows this list):
1. The write arrives after the truncate: the OSD needs to apply the truncate first and then perform the write.
2. The write arrives before the truncate: the OSD needs to trim the write length first and then perform the write.
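
A rough sketch of those two orderings, assuming simplified stand-in types. ObjectInfo, WriteOp and order_write_vs_truncate are hypothetical names made up for this illustration; this is not the actual PrimaryLogPG code:
-----------------------------------------------------------------
#include <algorithm>
#include <cstdint>

// Simplified stand-ins for the object info and the incoming write op.
struct ObjectInfo {
  uint64_t size = 0;          // oi.size
  uint64_t truncate_seq = 0;  // oi.truncate_seq
};

struct WriteOp {
  uint64_t offset;
  uint64_t length;
  uint64_t truncate_seq;   // truncate_seq the client attached to this write
  uint64_t truncate_size;  // target size of that truncate
};

void order_write_vs_truncate(ObjectInfo& oi, WriteOp& op) {
  if (op.truncate_seq > oi.truncate_seq) {
    // case 1: write issued after the truncate -- apply the truncate first,
    // then perform the write
    oi.truncate_seq = op.truncate_seq;
    oi.size = std::min(oi.size, op.truncate_size);
  } else if (op.truncate_seq < oi.truncate_seq &&
             op.offset + op.length > oi.size) {
    // case 2: write issued before a truncate that is already applied --
    // trim the write so it cannot resurrect truncated bytes
    op.length = (op.offset > oi.size) ? 0 : oi.size - op.offset;
  }
  // the (possibly trimmed) write and the data_digest update would follow
  // here; the walkthrough below shows how that digest update goes wrong
}
-----------------------------------------------------------------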

-----------------------------------------------------------------
In case 2, if the truncate is to 0:
write [0~128k]: trim the length from 128k to 0; the write becomes [0~0]; nothing is written; oi.size is 0; x1 = set_data_digest(crc32(-1)).
write [128k~128k]: trim the length from 128k to 0; the write becomes [128k~0]; oi.size is truncated to the offset 128k; x2 = set_data_digest(crc32(x1)).
write [256k~128k]: trim the length from 128k to 0; the write becomes [256k~0]; oi.size is truncated to the offset 256k; x3 = set_data_digest(crc32(x2)).
...
write [4063232~128k]: trim the length from 128k to 0; the write becomes [4063232~0]; oi.size is truncated to the offset 4063232; xn = set_data_digest(crc32(xn-1)).

Now we can see that oi.size ends up as 4063232 and data_digest stays 0xffffffff, because the length of the data fed into the crc is 0 every time (see the sketch below).
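
To make the "zero-length update leaves the digest at its seed" point concrete, here is a small self-contained sketch. crc32c_update is a local stand-in written only for this example (not Ceph's ceph_crc32c or bufferlist::crc32c), and the 32 iterations correspond to the 128k ops of a 4M object:
-----------------------------------------------------------------
#include <cstddef>
#include <cstdint>
#include <iostream>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78); a local
// stand-in for the digest update, written only for this illustration.
uint32_t crc32c_update(uint32_t seed, const uint8_t* data, size_t len) {
  uint32_t crc = seed;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;  // with len == 0 this is simply the seed, unchanged
}

int main() {
  uint32_t data_digest = 0xffffffffu;  // initial seed, i.e. crc32(-1)
  // every 128k write was trimmed to length 0, so each per-op digest update
  // runs over zero bytes and the digest never moves off 0xffffffff
  for (int op = 0; op < 32; ++op)
    data_digest = crc32c_update(data_digest, nullptr, 0);
  std::cout << std::hex << data_digest << std::endl;  // prints ffffffff
  return 0;
}
-----------------------------------------------------------------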

---------------------------------------------------------------------
In conclusion, there are two logical problems that need to be fixed (a sketch of both fixes follows this list):
1. When a truncate is performed inside a write, the data_digest and the DIGEST flag need to be cleared.
2. In the write-before-truncate case, the length needs to be trimmed, and when the offset is greater than oi.size, oi.size must not be truncated to the offset.
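
A minimal sketch of both fixes, reusing simplified structures like the ones above; the helper names are hypothetical, and the actual change lives in the pull requests referenced on this ticket (43902, later 55008), not in this snippet:
-----------------------------------------------------------------
#include <algorithm>
#include <cstdint>

// Simplified object info carrying only what the two fixes touch.
struct ObjectInfoDigest {
  uint64_t size = 0;
  uint32_t data_digest = 0xffffffffu;
  bool data_digest_valid = false;  // stands in for the DIGEST flag

  void clear_data_digest() {
    data_digest_valid = false;
    data_digest = 0xffffffffu;
  }
};

// Fix 1: a truncate performed inside the write path must also invalidate the
// whole-object digest, since the stored digest no longer matches the data.
void truncate_in_write(ObjectInfoDigest& oi, uint64_t new_size) {
  oi.size = new_size;
  oi.clear_data_digest();
}

// Fix 2: a write whose length was trimmed to zero by an earlier truncate must
// not push oi.size out to its (now stale) offset.
void apply_trimmed_write(ObjectInfoDigest& oi, uint64_t offset, uint64_t length) {
  if (length == 0 && offset > oi.size)
    return;  // do not grow oi.size to the offset of a fully trimmed write
  oi.size = std::max(oi.size, offset + length);
}
-----------------------------------------------------------------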

Actions #3

Updated by wendong jia over 2 years ago

My Ceph version is Nautilus 14.2.5.

Actions #4

Updated by Neha Ojha over 2 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)
  • Status changed from New to Fix Under Review
  • Pull request ID set to 43902
Actions #5

Updated by Konstantin Shalygin about 2 years ago

  • Assignee set to wendong jia
Actions #6

Updated by Yuxuan Hu 5 months ago

Our cluster seems to have encountered a similar issue to yours. We haven't yet identified the root cause. I'm wondering if it might be due to the same reason you mentioned. Could you please help me check if it's the same issue as yours?
The issue with our cluster occurred like this: osd.32 committed suicide due to a timeout caused by the excessive pressure of data deletion. Subsequently, an object in pg 2.ac7 became unfound.
------------------------------
118078 2023-11-20 12:07:50.149172 osd.44 (osd.44) 88770 : cluster [ERR] 2.ac7 missing primary copy of 2:e357d41e:::200516ce993.0000002d:head, will try copies on 20
118079 2023-11-20 12:07:51.101958 osd.44 (osd.44) 88771 : cluster [ERR] 2.ac7 has 1 objects unfound and apparently lost
-------------------------------

After comparing the logs of the three OSDs at the time of the failure, I found that osd.32 had not yet recovered from its suicide at that point. The primary OSD, osd.44, first carried out a full-object read CRC check and discovered that the locally calculated checksum (0xebe9ec95) did not match the expected checksum (0xffffffff) stored in the object info. It then planned to retrieve this object's data from osd.20 for recovery.
------------------
1697 2023-11-20 12:07:50.147 7f64322e1700 0 osd.44 pg_epoch: 47175 pg[2.ac7( v 47163'155742025 (47035'155738947,47163'155742025] local-lis/les=47041/47042 n=6326 ec=261/261 lis/c 47041/46703 les/c/f 47042/46704/ 0 47040/47041/36944) [44,20,32]/[44,20] async=[32] r=0 lpr=47041 pi=[46703,47041)/1 rops=1 crt=47163'155742025 lcod 47163'155742024 mlcod 47039'155739869 active+recovering+undersized+degraded+remapped mbc={2 55={(2+0)=1}}] build_push_op 2.ac7_head full-object read crc 0xebe9ec95 != expected 0xffffffff on 2:e357d41e:::200516ce993.0000002d:head
1698 2023-11-20 12:07:50.148 7f64322e1700 0 osd.44 pg_epoch: 47175 pg[2.ac7( v 47163'155742025 (47035'155738947,47163'155742025] local-lis/les=47041/47042 n=6326 ec=261/261 lis/c 47041/46703 les/c/f 47042/46704/ 0 47040/47041/36944) [44,20,32]/[44,20] async=[32] r=0 lpr=47041 pi=[46703,47041)/1 rops=1 crt=47163'155742025 lcod 47163'155742024 mlcod 47039'155739869 active+recovering+undersized+degraded+remapped mbc={2 55={(2+0)=1}}] prep_object_replica_pushes Error 5 on oid 2:e357d41e:::200516ce993.0000002d:head
1699 2023-11-20 12:07:50.148 7f64322e1700 0 osd.44 pg_epoch: 47175 pg[2.ac7( v 47163'155742025 (47035'155738947,47163'155742025] local-lis/les=47041/47042 n=6326 ec=261/261 lis/c 47041/46703 les/c/f 47042/46704/ 0 47040/47041/36944) [44,20,32]/[44,20] async=[32] r=0 lpr=47041 pi=[46703,47041)/1 rops=1 crt=47163'155742025 lcod 47163'155742024 mlcod 47039'155739869 active+recovering+undersized+degraded+remapped mbc={2 55={(1+0)=1}}] failed_push 2:e357d41e:::200516ce993.0000002d:head from shard 44, reps on 20 unfound? 0
1700 2023-11-20 12:07:50.148 7f64322e1700 -1 log_channel(cluster) log [ERR] : 2.ac7 missing primary copy of 2:e357d41e:::200516ce993.0000002d:head, will try copies on 20
1701 2023-11-20 12:07:50.157 7f64322e1700 0 osd.44 pg_epoch: 47175 pg[2.ac7( v 47163'155742025 (47035'155738947,47163'155742025] local-lis/les=47041/47042 n=6326 ec=261/261 lis/c 47041/46703 les/c/f 47042/46704/ 0 47040/47041/36944) [44,20,32]/[44,20] async=[32] r=0 lpr=47041 pi=[46703,47041)/1 rops=1 crt=47163'155742025 lcod 47163'155742024 mlcod 47039'155739869 active+recovering+undersized+degraded+remapped m=1 u= 1 mbc={255={}}] failed_push adding 2:e357d41e:::200516ce993.0000002d:head to shard 20's missing set too
1702 2023-11-20 12:07:50.157 7f64322e1700 0 osd.44 pg_epoch: 47175 pg[2.ac7( v 47163'155742025 (47035'155738947,47163'155742025] local-lis/les=47041/47042 n=6326 ec=261/261 lis/c 47041/46703 les/c/f 47042/46704/ 0 47040/47041/36944) [44,20,32]/[44,20] async=[32] r=0 lpr=47041 pi=[46703,47041)/1 rops=1 crt=47163'155742025 lcod 47163'155742024 mlcod 47039'155739869 active+recovering+undersized+degraded+remapped m=1 u= 1 mbc={255={}}] failed_push 2:e357d41e:::200516ce993.0000002d:head from shard 20, reps on unfound? 1
1703 2023-11-20 12:07:51.101 7f642e2d9700 -1 log_channel(cluster) log [ERR] : 2.ac7 has 1 objects unfound and apparently lost
---------------------------

"After receiving the request, osd.20 also performed a full-object read. Upon reading, it conducted a CRC check and found that the locally calculated checksum value (0xebe9ec95) did not match the expected checksum value (0xffffffff), resulting in an 'object unfound' situation.
------------------------
2389 2023-11-20 12:07:50.148 7fe95d162700 0 osd.20 pg_epoch: 47175 pg[2.ac7( v 47163'155742025 (47035'155738947,47163'155742025] local-lis/les=47041/47042 n=6326 ec=261/261 lis/c 47041/46703 les/c/f 47042/46704/ 0 47040/47041/36944) [44,20,32]/[44,20] r=1 lpr=47041 pi=[46703,47041)/1 luod=0'0 lua=47039'155740414 crt=47163'155742025 lcod 47163'155742024 active+remapped mbc={}] build_push_op 2.ac7_head full-object rea d crc 0xebe9ec95 != expected 0xffffffff on 2:e357d41e:::200516ce993.0000002d:head
2390 2023-11-20 12:15:58.584 7fe95d162700 0 log_channel(cluster) log [DBG] : 2.9a7 deep-scrub starts
2391 2023-11-20 12:17:12.192 7fe95d162700 0 log_channel(cluster) log [DBG] : 2.9a7 deep-scrub ok
2392 2023-11-20 12:17:44.167 7fe95d162700 0 osd.20 pg_epoch: 47312 pg[2.ac7( v 47310'155747427 (47310'155744347,47310'155747427] local-lis/les=47311/47312 n=6330 ec=261/261 lis/c 47311/46703 les/c/f 47312/46704/ 0 47311/47311/36944) [44,20] r=1 lpr=47311 pi=[46703,47311)/1 luod=0'0 crt=47310'155747427 lcod 47310'155747426 active mbc={}] build_push_op 2.ac7_head full-object read crc 0xebe9ec95 != expected 0xffffffff on 2:e357d41e:::200516ce993.0000002d:head
------------------------

I found that the CRC values calculated from the reads by both OSDs are identical. However, the expected checksum value, oi.data_digest, seems to have ended up at the anomalous value of 0xffffffff. This situation appears to be very similar to the issue you described. Do you have any idea why oi.data_digest would show this strange value of 0xffffffff?

Actions #7

Updated by Radoslaw Zarzynski 4 months ago

Bump up.

Actions #8

Updated by Radoslaw Zarzynski 4 months ago

  • Assignee changed from wendong jia to Nitzan Mordechai

Hi Nitzan! Would you mind taking a look at the expired PR? I think we could resurrect it, possibly bringing a reproducer into a unit test or teuthology.
BTW, a similar issue might be happening in crimson, so you might want to ask Matan.

Actions #9

Updated by Nitzan Mordechai 4 months ago

  • Pull request ID changed from 43902 to 55008

I created https://github.com/ceph/ceph/pull/55008.
I also added some unit tests for it and was able to recreate the CRC failure without the fix.

Actions #10

Updated by Radoslaw Zarzynski 3 months ago

  • Priority changed from Normal to High
Actions #11

Updated by Radoslaw Zarzynski 2 months ago

TODO: final review.

Actions #12

Updated by Radoslaw Zarzynski about 2 months ago

The fix goes into QA.

Actions #13

Updated by Konstantin Shalygin about 2 months ago

  • Target version set to v19.0.0
  • Source set to Community (user)
Actions #14

Updated by Radoslaw Zarzynski about 1 month ago

In QA.

Actions #15

Updated by Radoslaw Zarzynski about 1 month ago

Bump up.

Actions #16

Updated by Radoslaw Zarzynski 16 days ago

Bump up.

Actions #17

Updated by Radoslaw Zarzynski 2 days ago

New changes in the PR (a unit test fix). Need to reQA.
