Ilya Dryomov wrote in #note-6:
Nir Soffer wrote:
Restarting the ceph-mgr pod does not help, rbd-mirroring is broken and
we don't have any workaround.
Hi Nir,
I don't think this is related to RBD mirroring. From the manager log, it's clear that MirrorSnapshotScheduleHandler in rbd_support module continued running after that exception was thrown in devicehealth module:
[...]
Have you tried restarting rbd-mirror pod like you did in https://tracker.ceph.com/issues/65487? Could the RBD mirroring part of this ticket be a duplicate of that?
I probably tried to rbd-mirror, but this is not the same issue. In this case the
cephrbdmirror never becomes Ready - it never reports any status.
In https://tracker.ceph.com/issues/65487 the cephrbdmirror becomes Ready and everyhing
looks normal, but the cephblockpool .status.mirroringStatus.summary.deamon_health never
becomes OK.
In our test log you can see:
cephcluster is created successfully:
2024-04-09 16:02:40,569 DEBUG [dr2/0] Deploying rook ceph cluster
2024-04-09 16:02:41,236 DEBUG [dr2/0] cephblockpool.ceph.rook.io/builtin-mgr created
2024-04-09 16:02:41,250 DEBUG [dr2/0] cephcluster.ceph.rook.io/my-cluster created
2024-04-09 16:02:41,253 DEBUG [dr2/0] Waiting until rook ceph cluster is ready
2024-04-09 16:02:41,318 DEBUG [dr2/0] 'cephcluster/my-cluster' output='jsonpath={.status.phase}' found in 0.06 seconds
2024-04-09 16:05:07,151 DEBUG [dr2/0] cephcluster.ceph.rook.io/my-cluster condition met
2024-04-09 16:05:07,234 DEBUG [dr2/0] ceph cluster status:
2024-04-09 16:05:07,234 DEBUG [dr2/0] conditions:
2024-04-09 16:05:07,234 DEBUG [dr2/0] - lastHeartbeatTime: '2024-04-09T13:05:07Z'
2024-04-09 16:05:07,234 DEBUG [dr2/0] lastTransitionTime: '2024-04-09T13:05:07Z'
2024-04-09 16:05:07,234 DEBUG [dr2/0] message: Cluster created successfully
2024-04-09 16:05:07,234 DEBUG [dr2/0] reason: ClusterCreated
2024-04-09 16:05:07,234 DEBUG [dr2/0] status: 'True'
2024-04-09 16:05:07,234 DEBUG [dr2/0] type: Ready
2024-04-09 16:05:07,235 DEBUG [dr2/0] message: Cluster created successfully
2024-04-09 16:05:07,235 DEBUG [dr2/0] observedGeneration: 2
2024-04-09 16:05:07,235 DEBUG [dr2/0] phase: Ready
2024-04-09 16:05:07,235 DEBUG [dr2/0] state: Created
2024-04-09 16:05:07,235 DEBUG [dr2/0] storage:
2024-04-09 16:05:07,235 DEBUG [dr2/0] osd:
2024-04-09 16:05:07,235 DEBUG [dr2/0] storeType:
2024-04-09 16:05:07,235 DEBUG [dr2/0] bluestore: 1
2024-04-09 16:05:07,235 DEBUG [dr2/0] version:
2024-04-09 16:05:07,235 DEBUG [dr2/0] image: quay.io/ceph/ceph:v18
2024-04-09 16:05:07,235 DEBUG [dr2/0] version: 18.2.2-0
ceph status after cluster is created:
2024-04-09 16:05:10,870 DEBUG [dr2/0] deployment "rook-ceph-tools" successfully rolled out
2024-04-09 16:05:10,872 DEBUG [dr2/0] ceph status:
2024-04-09 16:05:11,683 DEBUG [dr2/0] cluster:
2024-04-09 16:05:11,684 DEBUG [dr2/0] id: 2aeb5e7d-352f-4227-9c99-6d2d422751f9
2024-04-09 16:05:11,684 DEBUG [dr2/0] health: HEALTH_OK
2024-04-09 16:05:11,684 DEBUG [dr2/0]
2024-04-09 16:05:11,684 DEBUG [dr2/0] services:
2024-04-09 16:05:11,684 DEBUG [dr2/0] mon: 1 daemons, quorum a (age 42s)
2024-04-09 16:05:11,684 DEBUG [dr2/0] mgr: a(active, since 3s)
2024-04-09 16:05:11,684 DEBUG [dr2/0] osd: 1 osds: 0 up, 1 in (since 10s)
2024-04-09 16:05:11,684 DEBUG [dr2/0]
2024-04-09 16:05:11,684 DEBUG [dr2/0] data:
2024-04-09 16:05:11,684 DEBUG [dr2/0] pools: 0 pools, 0 pgs
2024-04-09 16:05:11,684 DEBUG [dr2/0] objects: 0 objects, 0 B
2024-04-09 16:05:11,684 DEBUG [dr2/0] usage: 0 B used, 0 B / 0 B avail
2024-04-09 16:05:11,684 DEBUG [dr2/0] pgs:
cephblockpool created successfully:
2024-04-09 16:05:11,827 DEBUG [dr2/0] Creating rbd pool and storage class
2024-04-09 16:05:12,020 DEBUG [dr2/0] cephblockpool.ceph.rook.io/replicapool created
2024-04-09 16:05:12,033 DEBUG [dr2/0] storageclass.storage.k8s.io/rook-ceph-block created
2024-04-09 16:05:12,036 DEBUG [dr2/0] Waiting until ceph block pool is ready
2024-04-09 16:05:16,353 DEBUG [dr2/0] 'cephblockpool/replicapool' output='jsonpath={.status.phase}' found in 4.32 seconds
2024-04-09 16:05:41,886 DEBUG [dr2/0] cephblockpool.ceph.rook.io/replicapool condition met
2024-04-09 16:05:41,888 DEBUG [dr2/0] Waiting for replica pool peer token
2024-04-09 16:05:42,048 DEBUG [dr2/0] cephblockpool.ceph.rook.io/replicapool condition met
2024-04-09 16:05:42,127 DEBUG [dr2/0] ceph pool status:
2024-04-09 16:05:42,127 DEBUG [dr2/0] info:
2024-04-09 16:05:42,127 DEBUG [dr2/0] rbdMirrorBootstrapPeerSecretName: pool-peer-token-replicapool
2024-04-09 16:05:42,127 DEBUG [dr2/0] mirroringInfo:
2024-04-09 16:05:42,127 DEBUG [dr2/0] lastChecked: '2024-04-09T13:05:42Z'
2024-04-09 16:05:42,127 DEBUG [dr2/0] mode: image
2024-04-09 16:05:42,127 DEBUG [dr2/0] site_name: 2aeb5e7d-352f-4227-9c99-6d2d422751f9
2024-04-09 16:05:42,127 DEBUG [dr2/0] mirroringStatus:
2024-04-09 16:05:42,127 DEBUG [dr2/0] lastChecked: '2024-04-09T13:05:42Z'
2024-04-09 16:05:42,127 DEBUG [dr2/0] summary:
2024-04-09 16:05:42,127 DEBUG [dr2/0] daemon_health: UNKNOWN
2024-04-09 16:05:42,127 DEBUG [dr2/0] health: UNKNOWN
2024-04-09 16:05:42,127 DEBUG [dr2/0] image_health: OK
2024-04-09 16:05:42,127 DEBUG [dr2/0] states: {}
2024-04-09 16:05:42,127 DEBUG [dr2/0] observedGeneration: 1
2024-04-09 16:05:42,127 DEBUG [dr2/0] phase: Ready
2024-04-09 16:05:42,127 DEBUG [dr2/0] snapshotScheduleStatus:
2024-04-09 16:05:42,127 DEBUG [dr2/0] lastChecked: '2024-04-09T13:05:42Z'
2024-04-09 16:05:42,127 DEBUG [dr2/0] snapshotSchedules:
2024-04-09 16:05:42,127 DEBUG [dr2/0] - image: '-'
2024-04-09 16:05:42,127 DEBUG [dr2/0] items:
2024-04-09 16:05:42,127 DEBUG [dr2/0] - interval: 2m
2024-04-09 16:05:42,127 DEBUG [dr2/0] start_time: 14:00:00-05:00
2024-04-09 16:05:42,127 DEBUG [dr2/0] namespace: '-'
2024-04-09 16:05:42,127 DEBUG [dr2/0] pool: replicapool
Configuring rbd mirroring on both clusters:
2024-04-09 16:06:50,002 DEBUG [rdr/0] Getting mirroring info site name for cluster 'dr1'
2024-04-09 16:06:50,084 DEBUG [rdr/0] 'cephblockpools.ceph.rook.io/replicapool' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.08 seconds
2024-04-09 16:06:50,084 DEBUG [rdr/0] Getting rbd mirror boostrap peer secret name for cluster 'dr1'
2024-04-09 16:06:50,183 DEBUG [rdr/0] Getting secret pool-peer-token-replicapool token for cluster 'dr1'
2024-04-09 16:06:50,253 DEBUG [rdr/0] Getting mirroring info site name for cluster 'dr2'
2024-04-09 16:06:50,369 DEBUG [rdr/0] 'cephblockpools.ceph.rook.io/replicapool' output='jsonpath={.status.mirroringInfo.site_name}' found in 0.12 seconds
2024-04-09 16:06:50,369 DEBUG [rdr/0] Getting rbd mirror boostrap peer secret name for cluster 'dr2'
2024-04-09 16:06:50,454 DEBUG [rdr/0] Getting secret pool-peer-token-replicapool token for cluster 'dr2'
2024-04-09 16:06:50,547 DEBUG [rdr/0] Setting up mirroring from 'dr2' to 'dr1'
2024-04-09 16:06:50,547 DEBUG [rdr/0] Applying rbd mirror secret in cluster 'dr1'
2024-04-09 16:06:50,708 DEBUG [rdr/0] secret/2aeb5e7d-352f-4227-9c99-6d2d422751f9 created
2024-04-09 16:06:50,711 DEBUG [rdr/0] Configure peers for cluster 'dr1'
2024-04-09 16:06:50,842 DEBUG [rdr/0] cephblockpool.ceph.rook.io/replicapool patched
2024-04-09 16:06:50,847 DEBUG [rdr/0] Apply rbd mirror to cluster 'dr1'
2024-04-09 16:06:50,984 DEBUG [rdr/0] cephrbdmirror.ceph.rook.io/my-rbd-mirror created
2024-04-09 16:06:50,990 DEBUG [rdr/0] Setting up mirroring from 'dr1' to 'dr2'
2024-04-09 16:06:50,990 DEBUG [rdr/0] Applying rbd mirror secret in cluster 'dr2'
2024-04-09 16:06:51,146 DEBUG [rdr/0] secret/c5123696-8638-4c9e-aaf4-2a59333bf389 created
2024-04-09 16:06:51,152 DEBUG [rdr/0] Configure peers for cluster 'dr2'
2024-04-09 16:06:51,241 DEBUG [rdr/0] cephblockpool.ceph.rook.io/replicapool patched
2024-04-09 16:06:51,244 DEBUG [rdr/0] Apply rbd mirror to cluster 'dr2'
2024-04-09 16:06:51,399 DEBUG [rdr/0] cephrbdmirror.ceph.rook.io/my-rbd-mirror created
Waiting for cephrbdmirror for cluster dr1 succeeded in 5 seconds:
2024-04-09 16:06:51,406 DEBUG [rdr/0] Waiting until rbd mirror is ready in cluster 'dr1'
2024-04-09 16:06:55,809 DEBUG [rdr/0] 'cephrbdmirror/my-rbd-mirror' output='jsonpath={.status.phase}' found in 4.40 seconds
2024-04-09 16:06:55,986 DEBUG [rdr/0] cephrbdmirror.ceph.rook.io/my-rbd-mirror condition met
2024-04-09 16:06:56,057 DEBUG [rdr/0] Cluster '{cluster}' rbd mirror status:
2024-04-09 16:06:56,057 DEBUG [rdr/0] observedGeneration: 1
2024-04-09 16:06:56,057 DEBUG [rdr/0] phase: Ready
2024-04-09 16:06:56,057 DEBUG [rdr/0]
Waiting for cephrbdmirror .status.phase for cluster dr2 times out after 300 seconds
this resources is never reconciled:
2024-04-09 16:06:56,057 DEBUG [rdr/0] Waiting until rbd mirror is ready in cluster 'dr2'
2024-04-09 16:11:56,172 ERROR Command failed
Traceback (most recent call last):
...
drenv.commands.Error: Command failed:
command: ('addons/rbd-mirror/start', 'dr1', 'dr2')
exitcode: 1
error:
Traceback (most recent call last):
...
RuntimeError: Timeout waiting for 'cephrbdmirror/my-rbd-mirror' output='jsonpath={.status.phase}' namespace='rook-ceph' profile='dr2' timeout=300