Bug #13828
Updated by Kefu Chai over 8 years ago
Our cluster is made up of 4 nodes, each with 10 OSDs. The topology of the cluster is shown below:

<pre>
[root@ceph242 ceph]# ceph osd tree
# id    weight  type name       up/down reweight
-1      66.43   root default
-4      10.92           host ceph0
6       0.91                    osd.6   up      1
7       0.91                    osd.7   up      1
8       0.91                    osd.8   up      1
9       0.91                    osd.9   up      1
10      0.91                    osd.10  up      1
19      0.91                    osd.19  up      1
20      0.91                    osd.20  up      1
21      0.91                    osd.21  up      1
22      0.91                    osd.22  up      1
23      0.91                    osd.23  up      1
24      1.82                    osd.24  up      1
-5      19.11           host ceph4
46      1.82                    osd.46  up      1
48      1.82                    osd.48  up      1
44      1.82                    osd.44  up      1
43      1.82                    osd.43  up      1
45      1.82                    osd.45  up      1
3       0.91                    osd.3   up      1
49      1.82                    osd.49  up      1
4       1.82                    osd.4   up      1
0       1.82                    osd.0   up      1
2       1.82                    osd.2   up      1
1       1.82                    osd.1   up      1
-6      18.2            host ceph243
28      1.82                    osd.28  up      1
15      1.82                    osd.15  up      1
17      1.82                    osd.17  up      1
11      1.82                    osd.11  up      1
12      1.82                    osd.12  up      1
16      1.82                    osd.16  up      1
14      1.82                    osd.14  up      1
13      1.82                    osd.13  up      1
18      1.82                    osd.18  up      1
32      1.82                    osd.32  up      1
-2      18.2            host ceph242
66      1.82                    osd.66  up      1
60      1.82                    osd.60  up      1
31      1.82                    osd.31  up      1
30      1.82                    osd.30  up      1
47      1.82                    osd.47  up      1
29      1.82                    osd.29  up      1
25      1.82                    osd.25  up      1
26      1.82                    osd.26  up      1
5       1.82                    osd.5   up      1
27      1.82                    osd.27  up      1
</pre>

When I occasionally isolate one of the nodes (namely ceph242) from the rest of the cluster by cutting off its backend network connection, the cluster eventually settles into a stable state after some transient jitter. However, the result is somewhat surprising and probably problematic, as you can see below:

<pre>
[root@ceph0 minion]# ceph osd tree
# id    weight  type name       up/down reweight
-1      66.43   root default
-4      10.92           host ceph0
6       0.91                    osd.6   down    0
7       0.91                    osd.7   down    0
8       0.91                    osd.8   down    0
9       0.91                    osd.9   down    0
10      0.91                    osd.10  down    0
19      0.91                    osd.19  down    0
20      0.91                    osd.20  down    0
21      0.91                    osd.21  down    0
22      0.91                    osd.22  down    0
23      0.91                    osd.23  down    0
24      1.82                    osd.24  down    0
-5      19.11           host ceph4
46      1.82                    osd.46  down    0
48      1.82                    osd.48  down    0
44      1.82                    osd.44  down    0
43      1.82                    osd.43  down    0
45      1.82                    osd.45  down    0
3       0.91                    osd.3   down    0
49      1.82                    osd.49  down    1
4       1.82                    osd.4   down    0
0       1.82                    osd.0   down    0
2       1.82                    osd.2   down    0
1       1.82                    osd.1   down    0
-6      18.2            host ceph243
28      1.82                    osd.28  down    1
15      1.82                    osd.15  down    0
17      1.82                    osd.17  down    0
11      1.82                    osd.11  down    0
12      1.82                    osd.12  down    0
16      1.82                    osd.16  down    0
14      1.82                    osd.14  down    0
13      1.82                    osd.13  down    0
18      1.82                    osd.18  down    0
32      1.82                    osd.32  down    0
-2      18.2            host ceph242
66      1.82                    osd.66  up      1
60      1.82                    osd.60  up      1
31      1.82                    osd.31  up      1
30      1.82                    osd.30  up      1
47      1.82                    osd.47  up      1
29      1.82                    osd.29  up      1
25      1.82                    osd.25  up      1
26      1.82                    osd.26  up      1
5       1.82                    osd.5   up      1
27      1.82                    osd.27  up      1
</pre>

All the OSDs located on the isolated node (ceph242) survived, while the rest unexpectedly went down instead, which left the whole cluster completely inaccessible. I suspect the problem may be caused by flawed logic in OSD::_is_healthy(), but I am still not certain of this, since the problem is hard to reproduce: a handful of tries in our environment have all failed to trigger it again.
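To make that suspicion more concrete, below is a minimal sketch of the kind of self-health check I have in mind: an OSD judges itself healthy when enough of its heartbeat peers have replied within a grace period. This is only an illustration of the suspected logic, not the actual Ceph code; the function name, the 1/3 ratio, and the grace parameter are assumptions for the sake of the example. The point is that if the peer set an isolated OSD evaluates is dominated by peers it can still reach (e.g. OSDs on the same host), the check keeps passing while every unreachable peer gets reported as failed.

<pre>
// Simplified, hypothetical sketch of a heartbeat-based self-health check.
// It is NOT the actual OSD::_is_healthy() implementation.
#include <chrono>
#include <cstddef>
#include <iostream>
#include <map>

using Clock = std::chrono::steady_clock;

// Hypothetical per-peer state: when the last heartbeat reply was received.
struct PeerInfo {
  Clock::time_point last_reply;
};

// Healthy when at least `min_ratio` of heartbeat peers replied within
// `grace`.  Both values stand in for cluster-configured settings.
bool is_healthy(const std::map<int, PeerInfo>& peers,
                Clock::time_point now,
                std::chrono::seconds grace,
                double min_ratio = 1.0 / 3.0) {
  if (peers.empty())
    return true;  // nothing to compare against
  std::size_t recent = 0;
  for (const auto& p : peers) {
    if (now - p.second.last_reply <= grace)
      ++recent;
  }
  // Pitfall: the threshold is computed over whichever peers this OSD
  // happens to track, so a skewed peer set can satisfy it while the rest
  // of the cluster is unreachable.
  return recent >= static_cast<std::size_t>(min_ratio * peers.size());
}

int main() {
  const auto now = Clock::now();
  const auto grace = std::chrono::seconds(20);

  // Case 1: the only tracked peers are on the same host and still answer.
  std::map<int, PeerInfo> local_only;
  for (int id = 0; id < 2; ++id)
    local_only[id] = {now};

  // Case 2: same OSD, but remote peers are included and have been silent
  // for 60 seconds since the backend network was cut.
  std::map<int, PeerInfo> mixed = local_only;
  for (int id = 2; id < 10; ++id)
    mixed[id] = {now - std::chrono::seconds(60)};

  std::cout << std::boolalpha
            << "local-only peers: " << is_healthy(local_only, now, grace) << '\n'
            << "mixed peers:      " << is_healthy(mixed, now, grace) << '\n';
}
</pre>

With a peer set like case 1, an isolated host would consider its own OSDs healthy and keep sending failure reports about everyone else, which would match what we observed; again, this is only a guess until the behaviour can be reproduced and the real code path confirmed.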