Bug #47275
daemon may be missing in mgr service map
Status:
closed
% Done:
0%
Source:
Tags:
Backport:
octopus, nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Description
It has been sporadically observed when running the rbd-mirror functional tests that some of the running rbd-mirror daemons may be missing from the mgr service map.
Here is an example [1].
There are only two rbd-mirror daemons registered in the service map:
2020-09-01T15:23:45.481 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stderr:+ ceph --cluster cluster2 -s
2020-09-01T15:23:45.481 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.482 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:cluster2 status
2020-09-01T15:23:45.948 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: cluster:
2020-09-01T15:23:45.949 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: id: 50c2ce51-a5bd-4cf9-a22e-ee524f36a3ff
2020-09-01T15:23:45.949 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: health: HEALTH_OK
2020-09-01T15:23:45.949 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.950 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: services:
2020-09-01T15:23:45.950 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: mon: 1 daemons, quorum a (age 5m)
2020-09-01T15:23:45.950 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: mgr: x(active, since 5m)
2020-09-01T15:23:45.950 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: osd: 3 osds: 3 up (since 5m), 3 in (since 5m)
2020-09-01T15:23:45.951 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: rbd-mirror: 2 daemons active (4380, 4384)
2020-09-01T15:23:45.951 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.951 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: data:
2020-09-01T15:23:45.951 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: pools: 4 pools, 137 pgs
2020-09-01T15:23:45.952 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: objects: 152 objects, 298 MiB
2020-09-01T15:23:45.952 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: usage: 62 MiB used, 270 GiB / 270 GiB avail
2020-09-01T15:23:45.952 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: pgs: 137 active+clean
2020-09-01T15:23:45.953 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.953 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: io:
2020-09-01T15:23:45.953 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: client: 12 KiB/s rd, 597 B/s wr, 12 op/s rd, 0 op/s wr
2020-09-01T15:23:45.953 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.958 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stderr:+ CEPH_ARGS=
2020-09-01T15:23:45.959 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stderr:+ ceph --cluster cluster2 service dump
2020-09-01T15:23:46.316 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:{
2020-09-01T15:23:46.316 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "epoch": 5,
2020-09-01T15:23:46.316 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "modified": "2020-09-01T15:20:39.851107+0000",
2020-09-01T15:23:46.317 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "services": {
2020-09-01T15:23:46.317 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "rbd-mirror": {
2020-09-01T15:23:46.317 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "daemons": {
2020-09-01T15:23:46.317 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "summary": "",
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "4380": {
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "start_epoch": 5,
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "start_stamp": "2020-09-01T15:20:39.808598+0000",
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "gid": 4380,
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "addr": "172.21.15.146:0/1490269339",
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "metadata": {
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "arch": "x86_64",
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "ceph_release": "pacific",
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "ceph_version": "ceph version 16.0.0-4970-g3eec59f6962 (3eec59f6962e804c64d36520f4882f41f9ce481b) pacific (dev)",
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "ceph_version_short": "16.0.0-4970-g3eec59f6962",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "cpu": "Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "distro": "centos",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "distro_description": "CentOS Linux 8 (Core)",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "distro_version": "8",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "hostname": "smithi146",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "id": "mirror.0",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "instance_id": "4380",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "kernel_description": "#1 SMP Sun Jul 26 03:54:29 UTC 2020",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "kernel_version": "4.18.0-193.14.2.el8_2.x86_64",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "mem_swap_kb": "0",
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "mem_total_kb": "32653172",
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "os": "Linux"
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: },
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "task_status": {}
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: },
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "4384": {
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "start_epoch": 5,
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "start_stamp": "2020-09-01T15:20:39.820979+0000",
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "gid": 4384,
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "addr": "172.21.15.146:0/342955495",
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "metadata": {
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "arch": "x86_64",
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "ceph_release": "pacific",
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "ceph_version": "ceph version 16.0.0-4970-g3eec59f6962 (3eec59f6962e804c64d36520f4882f41f9ce481b) pacific (dev)",
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "ceph_version_short": "16.0.0-4970-g3eec59f6962",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "cpu": "Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "distro": "centos",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "distro_description": "CentOS Linux 8 (Core)",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "distro_version": "8",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "hostname": "smithi146",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "id": "mirror.1",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "instance_id": "4384",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "kernel_description": "#1 SMP Sun Jul 26 03:54:29 UTC 2020",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "kernel_version": "4.18.0-193.14.2.el8_2.x86_64",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "mem_swap_kb": "0",
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "mem_total_kb": "32653172",
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "os": "Linux"
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: },
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "task_status": {}
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.328 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.328 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.328 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.328 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:}
However, there should be four, and all four do appear in the `ceph service status` output:
2020-09-01T15:23:46.329 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stderr:+ ceph --cluster cluster2 service status
2020-09-01T15:23:46.681 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:{
2020-09-01T15:23:46.681 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "rbd-mirror": {
2020-09-01T15:23:46.681 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "4380": {
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "status_stamp": "2020-09-01T15:23:44.815638+0000",
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "last_beacon": "2020-09-01T15:23:44.815638+0000",
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "status": {
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "json": "{\"3\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":3,\"image_remote_count\":3,\"image_warning_count\":0,\"instance_id\":\"4387\",\"leader\":true,\"namespaces\":{\"ns1\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0},\"ns2\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0}}},\"4\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4406\",\"leader\":true}}"
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: },
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "4384": {
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "status_stamp": "2020-09-01T15:23:44.828198+0000",
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "last_beacon": "2020-09-01T15:23:44.828198+0000",
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "status": {
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "json": "{\"3\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4394\",\"namespaces\":{\"ns1\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0},\"ns2\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0}}},\"4\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4410\"}}"
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: },
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "4393": {
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "status_stamp": "2020-09-01T15:23:44.863397+0000",
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "last_beacon": "2020-09-01T15:23:44.863397+0000",
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "status": {
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "json": "{\"3\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4400\",\"namespaces\":{\"ns1\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0},\"ns2\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0}}},\"4\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4413\"}}"
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: },
2020-09-01T15:23:46.686 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "4401": {
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "status_stamp": "2020-09-01T15:23:44.886579+0000",
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "last_beacon": "2020-09-01T15:23:44.886579+0000",
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "status": {
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: "json": "{\"3\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4411\",\"namespaces\":{\"ns1\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0},\"ns2\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0}}},\"4\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4416\"}}"
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.688 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout: }
2020-09-01T15:23:46.688 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:}
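The mismatch between the two outputs quoted above can be checked mechanically. The following is only a minimal sketch, not part of the test suite: it assumes the JSON layouts shown in the logs (`services.<service>.daemons` for `ceph service dump`, with `summary` as a non-daemon sibling key, and `<service>.<daemon>` for `ceph service status`). The helper name `missing_from_service_map` is hypothetical, and the inline JSON is a trimmed-down stand-in for the real command output.

```python
import json


def missing_from_service_map(service_dump, service_status, service="rbd-mirror"):
    """Return daemons that report status but are absent from the service map.

    Both arguments are already-parsed JSON documents with the layouts seen in
    the logs above (an assumption of this sketch).
    """
    daemons = service_dump.get("services", {}).get(service, {}).get("daemons", {})
    # "summary" sits next to the daemon entries but is not a daemon itself.
    registered = {name for name in daemons if name != "summary"}
    reporting = set(service_status.get(service, {}))
    return sorted(reporting - registered)


# Trimmed-down stand-ins for the `service dump` and `service status` outputs.
dump = json.loads(
    '{"epoch": 5, "services": {"rbd-mirror": {"daemons": '
    '{"summary": "", "4380": {}, "4384": {}}}}}'
)
status = json.loads(
    '{"rbd-mirror": {"4380": {}, "4384": {}, "4393": {}, "4401": {}}}'
)

missing = missing_from_service_map(dump, status)
print(missing)  # ['4393', '4401']
```

With the data from this report, the two daemons that send beacons but never made it into the service map are 4393 and 4401.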