Backport #19508

Updated by Nathan Cutler about 7 years ago

https://github.com/ceph/ceph/pull/14392

Steps to reproduce:

 
 # Deploy a test cluster running Ceph 0.94.6 with 3 OSDs and 3 monitors
 # Generate a test load (create an RBD image, run fio with -ioengine=rbd)
 # Perform the upgrade:
    a. ceph osd set noout
    b. Pick an OSD node, shut down one OSD daemon, upgrade the ceph packages, and restart the OSD
    c. Wait until all placement groups are active+clean

 Result: after the upgraded OSD starts, it requests the OSD map, fails to decode the incremental map,
 and requests the complete map:

 <pre> 
 2017-04-06 07:19:15.489229 7f0e7a4e0800    0 set uid:gid to 64045:64045 (ceph:ceph) 
 2017-04-06 07:19:15.489261 7f0e7a4e0800    0 ceph version 10.2.6-1~u14.04+1 (8a5b25e3b370b6abf610579a315471958813e33e), process ceph-osd, pid 9126 
 </pre> 

 [skipped] 

 <pre> 
 2017-04-06 07:19:32.642988 7f0e7a4e0800    0 osd.1 22 using 0 op queue with priority op cut off at 64. 
 2017-04-06 07:19:32.643627 7f0e7a4e0800 -1 osd.1 22 log_to_monitors {default=true} 
 2017-04-06 07:19:32.770925 7f0e7a4e0800    0 osd.1 22 done with init, starting boot process 
 2017-04-06 07:19:33.749922 7f0e547ff700    0 log_channel(cluster) log [WRN] : failed to encode map e23 with expected crc 
 2017-04-06 07:19:33.750052 7f0e547ff700    0 log_channel(cluster) log [WRN] : failed to encode map e23 with expected crc 
 2017-04-06 07:19:34.756327 7f0e52ffc700    0 log_channel(cluster) log [WRN] : failed to encode map e26 with expected crc 
 2017-04-06 07:19:34.759619 7f0e52ffc700    0 log_channel(cluster) log [WRN] : failed to encode map e26 with expected crc 
 2017-04-06 07:19:34.761147 7f0e547ff700    0 log_channel(cluster) log [WRN] : failed to encode map e26 with expected crc 
 2017-04-06 07:19:34.761200 7f0e547ff700    0 log_channel(cluster) log [WRN] : failed to encode map e26 with expected crc 
 </pre> 
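
 To gauge how many map epochs are affected on a given OSD, the warnings in the log above can be tallied per epoch. The following is only a minimal sketch: the function name @count_crc_failures@ is hypothetical, and the regular expression assumes the warning text is exactly as it appears in the excerpt above.

```python
import re
from collections import Counter

# Assumption: the warning text matches the log excerpt above verbatim.
CRC_WARNING = re.compile(r"failed to encode map e(\d+) with expected crc")


def count_crc_failures(log_lines):
    """Return a Counter mapping OSD map epoch -> number of CRC warnings.

    `log_lines` is any iterable of strings, e.g. an open OSD log file.
    """
    failures = Counter()
    for line in log_lines:
        match = CRC_WARNING.search(line)
        if match:
            # The captured group is the epoch number that follows "e".
            failures[int(match.group(1))] += 1
    return failures
```

 Passing an open OSD log file to the function gives a per-epoch count. Each distinct epoch in the result corresponds to a map for which this OSD logged the warning and, per the description above, fell back to requesting the complete map.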


 In a cluster with many (>~ 100) OSDs, sending that many complete maps can easily overload the monitors.
