Bug #41601

Updated by Kefu Chai over 4 years ago

In our test environment (Ceph 14.2.1 Nautilus, replicated pool), we hit a scrub error similar to bug #23701. We used the rados_write interface from librados.h to write with offset = 1024 and length = 0, and after scrubbing the corresponding PG we found a scrub error. The OSD log is below:

 2019-09-01 20:41:21.393 7f767dc21700    0 log_channel(cluster) log [DBG] : 1.63 scrub starts 
 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 1 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch 
 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 0 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch 
 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 shard 3 soid 1:c68704ec:::test_0001:head : candidate size 1024 info size 0 mismatch 
 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 soid 1:c68704ec:::test_0001:head : failed to pick suitable object info 
 2019-09-01 20:41:21.402 7f767dc21700 -1 log_channel(cluster) log [ERR] : scrub 1.63 1:c68704ec:::test_0001:head : on disk size (1024) does not match object info size (0) adjusted for ondisk to (0) 
 2019-09-01 20:41:21.434 7f767dc21700 -1 log_channel(cluster) log [ERR] : 1.63 scrub 4 errors 

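After analyzing the code, we found that CEPH_OSD_OP_WRITE with a length of 0 truncates the object to the offset when the offset is beyond the object's current size, which extends the object on disk:
<pre><code class="cpp">
if (op.extent.length == 0) {
  // Zero-length write: no data is written, but if the offset lies beyond
  // the recorded size, the object is extended on disk to the offset.
  if (op.extent.offset > oi.size) {
    t->truncate(
      soid, op.extent.offset);
  } else {
    t->nop(soid);
  }
} else {
  t->write(
    soid, op.extent.offset, op.extent.length, osd_op.indata, op.flags);
}
</code></pre>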
However, the later size update never changes oi.size in this case, because its condition requires a non-zero length:


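<pre><code class="cpp">
// "&& length" excludes zero-length writes, so oi.size stays unchanged
// even when the object has just been extended on disk by a truncate.
if (write_full ||
    (offset + length > oi.size && length)) {
  uint64_t new_size = offset + length;
  delta_stats.num_bytes -= oi.size;
  delta_stats.num_bytes += new_size;
  oi.size = new_size;
}
</code></pre>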
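To make the interaction between the two excerpts concrete, here is a minimal, self-contained C++ model of a single replica. It is a simplification, not Ceph code: ObjectModel, apply_write and scrub_size_mismatch are hypothetical names, on_disk_size stands for the object's size in the object store, oi_size stands for oi.size, and write_full is assumed to be false. Under these assumptions, a zero-length write at offset 1024 to an empty object leaves on_disk_size = 1024 and oi_size = 0, which matches the "candidate size 1024 info size 0" errors in the log.

<pre><code class="cpp">
#include <cassert>  // for the checks in the usage example
#include <cstdint>

// Hypothetical, simplified model of one replica (not the real OSD code).
struct ObjectModel {
  std::uint64_t on_disk_size = 0;  // size of the object in the object store
  std::uint64_t oi_size = 0;       // size recorded in object info (oi.size)
};

// Mirrors the two excerpts above for a plain write (write_full == false).
void apply_write(ObjectModel& obj, std::uint64_t offset, std::uint64_t length) {
  // Data path (first excerpt).
  if (length == 0) {
    if (offset > obj.oi_size) {
      obj.on_disk_size = offset;  // t->truncate(soid, offset) extends the object
    }                             // otherwise t->nop(soid): nothing changes
  } else if (offset + length > obj.on_disk_size) {
    obj.on_disk_size = offset + length;  // t->write(...) extends the object
  }
  // Metadata path (second excerpt): "&& length" skips zero-length writes.
  if (offset + length > obj.oi_size && length) {
    obj.oi_size = offset + length;
  }
}

// What scrub effectively compares: on-disk size versus object info size.
bool scrub_size_mismatch(const ObjectModel& obj) {
  return obj.on_disk_size != obj.oi_size;
}
</code></pre>

For example, calling apply_write(obj, 1024, 0) on a default-constructed ObjectModel makes scrub_size_mismatch(obj) return true, while apply_write(obj, 1024, 4096) on a fresh object sets both sizes to 5120 and scrub finds no mismatch.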

Moreover, we suspect the same bug occurs in CephFS when an old write (e.g. offset=1024, length=4096) arrives after a trimtrunc (e.g. truncate_seq=2, truncate_size=0).
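The CephFS case can be sketched with the same kind of model. We assume, without quoting the OSD source, that a write carrying an older truncate_seq than the object's is clipped so that it does not extend past oi.size. With truncate_size=0 already applied, that clipping turns offset=1024, length=4096 into a zero-length write at offset 1024, which then hits the same truncate-without-size-update path. ObjectState, clip_stale_write, apply_write and trimtrunc are hypothetical names, and trimtrunc here simply records the new sequence number and sets both sizes.

<pre><code class="cpp">
#include <cassert>  // for the checks in the usage example
#include <cstdint>

// Hypothetical, simplified model of one replica (not the real OSD code).
struct ObjectState {
  std::uint64_t on_disk_size = 0;
  std::uint64_t oi_size = 0;
  std::uint32_t truncate_seq = 0;  // truncate_seq currently recorded for the object
};

// ASSUMPTION: a write with an older truncate_seq is clipped so it cannot
// extend the object past the current oi_size.
std::uint64_t clip_stale_write(const ObjectState& obj, std::uint32_t op_seq,
                               std::uint64_t offset, std::uint64_t length) {
  if (obj.truncate_seq > op_seq && offset + length > obj.oi_size) {
    return offset > obj.oi_size ? 0 : obj.oi_size - offset;
  }
  return length;
}

// Same data/metadata split as the two excerpts in this report.
void apply_write(ObjectState& obj, std::uint32_t op_seq,
                 std::uint64_t offset, std::uint64_t length) {
  length = clip_stale_write(obj, op_seq, offset, length);
  if (length == 0) {
    if (offset > obj.oi_size) obj.on_disk_size = offset;  // truncate extends
  } else if (offset + length > obj.on_disk_size) {
    obj.on_disk_size = offset + length;
  }
  if (offset + length > obj.oi_size && length) obj.oi_size = offset + length;
}

// Simplified trimtrunc: record the new truncate_seq and set both sizes.
void trimtrunc(ObjectState& obj, std::uint32_t seq, std::uint64_t size) {
  obj.truncate_seq = seq;
  obj.on_disk_size = size;
  obj.oi_size = size;
}
</code></pre>

In this model, trimtrunc(obj, 2, 0) followed by apply_write(obj, 1, 1024, 4096) leaves on_disk_size = 1024 and oi_size = 0, the same mismatch as in the rados_write case, whereas a write tagged with the current truncate_seq (2) is not clipped and updates both sizes.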