Bug #41601
Updated by Kefu Chai over 4 years ago
In our test environment(ceph version 14.2.1(nautilus) + replicated pool), we found scrub error like bug23701. We use rados_write interface in librados.h to write (offset = 1024, length = 0), and after scrub the corresponding pg, we found a scrub error. The log in osd belows: 2019-09-01 20:41:21.393 7f767dc21700 0 log_channel(cluster) log [DBG] : 1.63 scrub starts 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 1 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 0 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 3 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 soid 1:c68704ec:::test_0001:head : failed to pick suitable object info 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : scrub 1.63 1:c68704ec:::test_0001:head : on disk size (1024) does not match object info size (0) adjusted for ondisk to (0) 2019-09-01 20:41:21.434 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 scrub 4 errors After analysis the code, we found the operation CEPH_OSD_OP_WRITE will truncate object size to offset when length is 0. <pre><code class="cpp"> if (op.extent.length == 0) { if (op.extent.offset > oi.size) { t->truncate( soid, op.extent.offset); } else { t->nop(soid); } } else { t->write( soid, op.extent.offset, op.extent.length, osd_op.indata, op.flags); } </code></pre> but it doesn't update the oi.size later because the length is 0 <pre><code class="cpp"> if (write_full || (offset + length > oi.size && length)) { uint64_t new_size = offset + length; delta_stats.num_bytes -= oi.size; delta_stats.num_bytes += new_size; oi.size = new_size; } </code></pre> Moreover, we think it has the same bug when old write(e.g offset=1024, length=4096) arrived after trimtrunc(e.g truncate_seq = 2, truncate_size=0) in cephfs.