Bug #49591: no active mgr (MGR_DOWN)" in cluster log

Added by Deepika Upadhyay about 3 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: pacific, octopus, nautilus
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

Seen in nautilus:

description: rados/verify/{ceph clusters/{fixed-2 openstack} d-thrash/none msgr-failures/few
  msgr/random objectstore/bluestore-comp-zlib rados tasks/rados_api_tests validater/valgrind}
duration: 2455.64199924469
failure_reason: '"2021-03-02 14:24:52.608800 mon.a (mon.0) 331 : cluster [WRN] Health
  check failed: no active mgr (MGR_DOWN)" in cluster log'
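
For context, the failure is the teuthology log scraper finding the MGR_DOWN health warning in the cluster log. Below is a minimal sketch of that kind of check, my own re-creation rather than the actual teuthology scraper, with IGNORELIST as a hypothetical stand-in for the suite's ignorelist of expected warnings:

import re

# Scan a cluster log for health-check failures, skipping ignorelisted codes.
# Sketch only; the real ceph-qa log scraping lives in the teuthology suite.
IGNORELIST = set()

HEALTH_RE = re.compile(r'\[WRN\] Health check failed: (?P<msg>.*) \((?P<code>[A-Z_]+)\)')

def failed_health_checks(cluster_log_path):
    failures = []
    with open(cluster_log_path) as f:
        for line in f:
            m = HEALTH_RE.search(line)
            if m and m.group('code') not in IGNORELIST:
                failures.append((m.group('code'), m.group('msg')))
    return failures

# The log line quoted above would be reported as ('MGR_DOWN', 'no active mgr').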

Examining the mgr logs:

{
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Degraded data redundancy: 1/2 objects degraded (50.000%), 1 pg degraded" 
        },
        "detail": [
            {
                "message": "pg 23.1 is active+undersized+degraded, acting [7]" 
            }
        ]
    },

2021-03-02 14:43:58.900 7fba2ce69700  0 log_channel(cluster) log [DBG] : pgmap v1257: 40 pgs: 1 active+undersized+degraded, 17 active+undersized, 20 stale+active+undersized, 2 active+clean; 5 B data, 35 MiB used, 712 GiB / 720 GiB avail; 1/2 objects degraded (50.000%)
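
The JSON excerpt above matches the shape of the checks map in ceph's JSON health report (check name mapping to severity, summary, and detail). A short sketch of walking exactly that shape:

import json

# Walk a health-checks map of the shape excerpted above and print each
# check's severity, name, summary, and detail messages. Assumes exactly
# that shape; a full `ceph health detail --format json` report wraps the
# checks map in more surrounding structure.
def summarize_health(raw):
    for name, check in json.loads(raw).items():
        print(f"{check['severity']} {name}: {check['summary']['message']}")
        for d in check.get('detail', []):
            print(f"  {d['message']}")

summarize_health('''{
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {"message": "Degraded data redundancy: 1/2 objects degraded (50.000%), 1 pg degraded"},
        "detail": [{"message": "pg 23.1 is active+undersized+degraded, acting [7]"}]
    }
}''')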

A slow op was observed as well:

/ceph/teuthology-archive/yuriw-2021-03-01_23:47:21-rados-wip-yuri2-testing-2021-03-01-1417-nautilus-distro-basic-smithi/5925885/teuthology.log


Related issues: 1 (0 open, 1 closed)

Has duplicate Ceph - Bug #49985: "[ERR] : Health check failed: no active mgr (MGR_DOWN)" in rados (Duplicate)

#1

Updated by Neha Ojha about 3 years ago

  • Subject changed from no active mgr (MGR_DOWN)" in cluster log to nautilus: no active mgr (MGR_DOWN)" in cluster log
#2

Updated by Neha Ojha about 3 years ago

  • Subject changed from nautilus: no active mgr (MGR_DOWN)" in cluster log to no active mgr (MGR_DOWN)" in cluster log

/a/yuriw-2021-04-21_15:39:30-rados-wip-yuri5-testing-2021-04-20-0819-pacific-distro-basic-smithi/6061973/

#3

Updated by Neha Ojha about 3 years ago

  • Has duplicate Bug #49985: "[ERR] : Health check failed: no active mgr (MGR_DOWN)" in rados added
#4

Updated by Nitzan Mordechai almost 2 years ago

/home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/6813824

#5

Updated by Radoslaw Zarzynski almost 2 years ago

I can't find "Degraded data redundancy" in the mgr's log, but I can find messages about expired cephx keys:

rzarzynski@teuthology:~$ less /home/teuthworker/archive/yuriw-2022-04-29_15:44:49-rados-wip-yuri5-testing-2022-04-28-1007-distro-default-smithi/6813824/remote/smithi160/log/ceph-mgr.x.log.gz
...
2022-04-29T17:14:42.748+0000 7f25edb3c700 10 cephx client: build_rotating_request
2022-04-29T17:14:42.748+0000 7f25edb3c700 10 monclient: _send_mon_message to mon.noname-b at v2:172.21.15.160:3300/0
2022-04-29T17:14:42.748+0000 7f25edb3c700  1 -- 172.21.15.160:0/26190 --> [v2:172.21.15.160:3300/0,v1:172.21.15.160:6789/0] -- auth(proto 2 2 bytes epoch 0) v1 -- 0x5621e4565980 con 0x5621e3858800
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 monclient: tick
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 cephx: validate_tickets want 55 have 32 need 23
2022-04-29T17:14:43.748+0000 7f25edb3c700 20 cephx client: need_tickets: want=55 have=32 need=23
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 monclient: _check_auth_tickets getting new tickets!
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 cephx client: validate_tickets: want=55 need=23 have=32
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 cephx: validate_tickets want 55 have 32 need 23
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 cephx client: want=55 need=23 have=32
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 cephx client: build_request
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 cephx client: get service keys: want=55 need=23 have=32
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 monclient: _send_mon_message to mon.noname-b at v2:172.21.15.160:3300/0
2022-04-29T17:14:43.748+0000 7f25edb3c700  1 -- 172.21.15.160:0/26190 --> [v2:172.21.15.160:3300/0,v1:172.21.15.160:6789/0] -- auth(proto 2 165 bytes epoch 0) v1 -- 0x5621e4565b00 con 0x5621e3858800
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2022-04-29T17:14:13.751398+0000)
2022-04-29T17:14:43.748+0000 7f25edb3c700 10 cephx client: build_rotating_request

which is a better fit for https://tracker.ceph.com/issues/44229.
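
For reference, the want/have/need numbers in those cephx lines are bitmasks of CEPH_ENTITY_TYPE_* service bits (bit values from Ceph's src/include/msgr.h). Decoding them, as in the sketch below, shows the mgr holding only the rotating AUTH key while still needing every service ticket, consistent with the expired-keys reading:

# Decode cephx want/have/need bitmasks using the CEPH_ENTITY_TYPE_* bit
# values defined in Ceph's src/include/msgr.h.
ENTITY_BITS = {
    0x01: 'MON',
    0x02: 'MDS',
    0x04: 'OSD',
    0x08: 'CLIENT',
    0x10: 'MGR',
    0x20: 'AUTH',
}

def decode(mask):
    return [name for bit, name in ENTITY_BITS.items() if mask & bit] or ['(none)']

for label, mask in [('want', 55), ('have', 32), ('need', 23)]:
    print(label, mask, decode(mask))

# want 55 = MON|MDS|OSD|MGR|AUTH, have 32 = AUTH only,
# need 23 = want & ~have = MON|MDS|OSD|MGR.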
