Bug #47275

closed

daemon may be missing in mgr service map

Added by Mykola Golub over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: octopus, nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The following has been sporadically observed when running the rbd-mirror functional tests: some of the running rbd-mirror daemons may be missing from the mgr service map.

Here is an example [1].

There are only two rbd-mirror daemons registered in the service map:

2020-09-01T15:23:45.481 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stderr:+ ceph --cluster cluster2 -s
2020-09-01T15:23:45.481 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.482 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:cluster2 status
2020-09-01T15:23:45.948 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:  cluster:
2020-09-01T15:23:45.949 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    id:     50c2ce51-a5bd-4cf9-a22e-ee524f36a3ff
2020-09-01T15:23:45.949 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    health: HEALTH_OK
2020-09-01T15:23:45.949 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.950 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:  services:
2020-09-01T15:23:45.950 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    mon:        1 daemons, quorum a (age 5m)
2020-09-01T15:23:45.950 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    mgr:        x(active, since 5m)
2020-09-01T15:23:45.950 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    osd:        3 osds: 3 up (since 5m), 3 in (since 5m)
2020-09-01T15:23:45.951 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    rbd-mirror: 2 daemons active (4380, 4384)
2020-09-01T15:23:45.951 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.951 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:  data:
2020-09-01T15:23:45.951 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    pools:   4 pools, 137 pgs
2020-09-01T15:23:45.952 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    objects: 152 objects, 298 MiB
2020-09-01T15:23:45.952 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    usage:   62 MiB used, 270 GiB / 270 GiB avail
2020-09-01T15:23:45.952 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    pgs:     137 active+clean
2020-09-01T15:23:45.953 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.953 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:  io:
2020-09-01T15:23:45.953 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    client:   12 KiB/s rd, 597 B/s wr, 12 op/s rd, 0 op/s wr
2020-09-01T15:23:45.953 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:
2020-09-01T15:23:45.958 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stderr:+ CEPH_ARGS=
2020-09-01T15:23:45.959 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stderr:+ ceph --cluster cluster2 service dump
2020-09-01T15:23:46.316 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:{
2020-09-01T15:23:46.316 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    "epoch": 5,
2020-09-01T15:23:46.316 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    "modified": "2020-09-01T15:20:39.851107+0000",
2020-09-01T15:23:46.317 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    "services": {
2020-09-01T15:23:46.317 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        "rbd-mirror": {
2020-09-01T15:23:46.317 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "daemons": {
2020-09-01T15:23:46.317 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                "summary": "",
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                "4380": {
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "start_epoch": 5,
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "start_stamp": "2020-09-01T15:20:39.808598+0000",
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "gid": 4380,
2020-09-01T15:23:46.318 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "addr": "172.21.15.146:0/1490269339",
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "metadata": {
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "arch": "x86_64",
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "ceph_release": "pacific",
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "ceph_version": "ceph version 16.0.0-4970-g3eec59f6962 (3eec59f6962e804c64d36520f4882f41f9ce481b) pacific (dev)",
2020-09-01T15:23:46.319 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "ceph_version_short": "16.0.0-4970-g3eec59f6962",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "cpu": "Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "distro": "centos",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "distro_description": "CentOS Linux 8 (Core)",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "distro_version": "8",
2020-09-01T15:23:46.320 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "hostname": "smithi146",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "id": "mirror.0",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "instance_id": "4380",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "kernel_description": "#1 SMP Sun Jul 26 03:54:29 UTC 2020",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "kernel_version": "4.18.0-193.14.2.el8_2.x86_64",
2020-09-01T15:23:46.321 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "mem_swap_kb": "0",
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "mem_total_kb": "32653172",
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "os": "Linux" 
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    },
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "task_status": {}
2020-09-01T15:23:46.322 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                },
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                "4384": {
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "start_epoch": 5,
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "start_stamp": "2020-09-01T15:20:39.820979+0000",
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "gid": 4384,
2020-09-01T15:23:46.323 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "addr": "172.21.15.146:0/342955495",
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "metadata": {
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "arch": "x86_64",
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "ceph_release": "pacific",
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "ceph_version": "ceph version 16.0.0-4970-g3eec59f6962 (3eec59f6962e804c64d36520f4882f41f9ce481b) pacific (dev)",
2020-09-01T15:23:46.324 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "ceph_version_short": "16.0.0-4970-g3eec59f6962",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "cpu": "Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "distro": "centos",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "distro_description": "CentOS Linux 8 (Core)",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "distro_version": "8",
2020-09-01T15:23:46.325 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "hostname": "smithi146",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "id": "mirror.1",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "instance_id": "4384",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "kernel_description": "#1 SMP Sun Jul 26 03:54:29 UTC 2020",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "kernel_version": "4.18.0-193.14.2.el8_2.x86_64",
2020-09-01T15:23:46.326 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "mem_swap_kb": "0",
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "mem_total_kb": "32653172",
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                        "os": "Linux" 
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    },
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                    "task_status": {}
2020-09-01T15:23:46.327 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                }
2020-09-01T15:23:46.328 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            }
2020-09-01T15:23:46.328 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        }
2020-09-01T15:23:46.328 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    }
2020-09-01T15:23:46.328 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:}

However, there should be four, and all four do appear in the `ceph service status` output:
2020-09-01T15:23:46.329 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stderr:+ ceph --cluster cluster2 service status
2020-09-01T15:23:46.681 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:{
2020-09-01T15:23:46.681 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    "rbd-mirror": {
2020-09-01T15:23:46.681 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        "4380": {
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "status_stamp": "2020-09-01T15:23:44.815638+0000",
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "last_beacon": "2020-09-01T15:23:44.815638+0000",
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "status": {
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                "json": "{\"3\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":3,\"image_remote_count\":3,\"image_warning_count\":0,\"instance_id\":\"4387\",\"leader\":true,\"namespaces\":{\"ns1\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0},\"ns2\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0}}},\"4\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4406\",\"leader\":true}}" 
2020-09-01T15:23:46.682 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            }
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        },
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        "4384": {
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "status_stamp": "2020-09-01T15:23:44.828198+0000",
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "last_beacon": "2020-09-01T15:23:44.828198+0000",
2020-09-01T15:23:46.683 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "status": {
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                "json": "{\"3\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4394\",\"namespaces\":{\"ns1\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0},\"ns2\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0}}},\"4\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4410\"}}" 
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            }
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        },
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        "4393": {
2020-09-01T15:23:46.684 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "status_stamp": "2020-09-01T15:23:44.863397+0000",
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "last_beacon": "2020-09-01T15:23:44.863397+0000",
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "status": {
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                "json": "{\"3\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4400\",\"namespaces\":{\"ns1\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0},\"ns2\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0}}},\"4\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4413\"}}" 
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            }
2020-09-01T15:23:46.685 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        },
2020-09-01T15:23:46.686 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        "4401": {
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "status_stamp": "2020-09-01T15:23:44.886579+0000",
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "last_beacon": "2020-09-01T15:23:44.886579+0000",
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            "status": {
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:                "json": "{\"3\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4411\",\"namespaces\":{\"ns1\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0},\"ns2\":{\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0}}},\"4\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_warning_count\":0,\"instance_id\":\"4416\"}}" 
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:            }
2020-09-01T15:23:46.687 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:        }
2020-09-01T15:23:46.688 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:    }
2020-09-01T15:23:46.688 INFO:tasks.workunit.cluster1.client.mirror.smithi146.stdout:}
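
The mismatch between the two outputs above can be checked mechanically. The following is a minimal sketch, not part of the test suite: the function name `missing_from_service_map` is hypothetical, and the sample inputs are the `service dump` and `service status` JSON from the log, trimmed to the keys the comparison needs.

```python
import json


def missing_from_service_map(dump, status, service="rbd-mirror"):
    """Return ids of daemons that report status but are absent from the service map.

    `dump` is the parsed `ceph service dump` output and `status` is the parsed
    `ceph service status` output, in the shapes shown in the log above.
    """
    daemons = dump.get("services", {}).get(service, {}).get("daemons", {})
    # "summary" sits next to the daemon entries but is not a daemon itself.
    registered = {key for key in daemons if key != "summary"}
    reporting = set(status.get(service, {}))
    return sorted(reporting - registered)


# Sample data trimmed from the log excerpt (only the keys used here are kept).
dump = json.loads("""
{"epoch": 5,
 "services": {"rbd-mirror": {"daemons": {
     "summary": "",
     "4380": {"gid": 4380},
     "4384": {"gid": 4384}}}}}
""")
status = json.loads("""
{"rbd-mirror": {"4380": {}, "4384": {}, "4393": {}, "4401": {}}}
""")

print(missing_from_service_map(dump, status))  # ['4393', '4401']
```

For the run shown here, the two daemons that send beacons but never made it into the service map are 4393 and 4401.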

[1] http://qa-proxy.ceph.com/teuthology/trociny-2020-09-01_14:55:56-rbd-wip-mgolub-testing-distro-basic-smithi/5397684/teuthology.log


Related issues 2 (0 open, 2 closed)

Copied to mgr - Backport #47410: octopus: daemon may be missing in mgr service map (Resolved, Laura Paduano)
Copied to mgr - Backport #47411: nautilus: daemon may be missing in mgr service map (Resolved, Laura Paduano)