Bug #52846

open

octopus: mgr fails and freezes while doing pg dump

Added by Deepika Upadhyay over 2 years ago. Updated 17 days ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport: reef quincy squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-10-06T02:40:07.181 DEBUG:teuthology.orchestra.run.smithi029:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph pg dump --format=json
2021-10-06T02:40:08.669 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~101s
2021-10-06T02:40:14.276 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~106s
2021-10-06T02:40:19.885 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~112s
2021-10-06T02:40:25.490 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~117s
2021-10-06T02:40:31.097 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~123s
2021-10-06T02:40:36.704 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~129s
2021-10-06T02:40:42.310 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~134s
2021-10-06T02:40:47.915 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~140s
2021-10-06T02:40:53.519 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~145s
2021-10-06T02:40:59.124 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~151s
2021-10-06T02:41:04.730 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~157s
2021-10-06T02:41:10.336 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~162s
2021-10-06T02:41:15.940 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~168s
2021-10-06T02:41:21.544 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~173s
2021-10-06T02:41:27.148 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~179s
2021-10-06T02:41:32.752 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~185s
2021-10-06T02:41:38.358 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~190s
2021-10-06T02:41:43.963 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~196s
2021-10-06T02:41:49.567 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~202s
2021-10-06T02:41:55.171 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~207s
2021-10-06T02:42:00.776 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~213s
2021-10-06T02:42:06.380 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~218s
2021-10-06T02:42:07.198 DEBUG:teuthology.orchestra.run:got remote process result: 124
2021-10-06T02:42:07.200 ERROR:teuthology.contextutil:Saw exception from nested tasks

Looking at the mgr log after the mgr was reported as failed, no further activity is seen; the last entries are:

2021-10-06T02:38:22.973+0000 7fd774dc1700 20 mgr send_beacon active
2021-10-06T02:38:22.973+0000 7fd774dc1700 15 mgr send_beacon noting RADOS client for blacklist: v2:172.21.15.29:0/3604913844
2021-10-06T02:38:22.973+0000 7fd774dc1700 15 mgr send_beacon noting RADOS client for blacklist: v2:172.21.15.29:0/1143307673
2021-10-06T02:38:22.973+0000 7fd774dc1700 10 mgr send_beacon sending beacon as gid 4111
~

/ceph/teuthology-archive/yuriw-2021-10-05_16:53:07-rados-wip-yuri3-testing-2021-10-05-0806-octopus-distro-basic-smithi/6423870/teuthology.log
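The watchdog's "failed for ~Ns" values can be subtracted from their log timestamps to estimate when the mgr was first considered failed. A minimal sketch, assuming the watchdog line format shown in the description; the regex and function name are illustrative and not part of teuthology:

```python
import re
from datetime import datetime, timedelta

# Matches lines like:
# 2021-10-06T02:40:08.669 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.mgr.x is failed for ~101s
WATCHDOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"INFO:tasks\.daemonwatchdog\.daemon_watchdog:"
    r"daemon (?P<daemon>\S+) is failed for ~(?P<secs>\d+)s$"
)


def estimate_failure_start(line):
    """Return (daemon, estimated_failure_time), or None if the line does not match."""
    m = WATCHDOG_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
    return m.group("daemon"), ts - timedelta(seconds=int(m.group("secs")))


# First watchdog line from the description.
line = (
    "2021-10-06T02:40:08.669 INFO:tasks.daemonwatchdog.daemon_watchdog:"
    "daemon ceph.mgr.x is failed for ~101s"
)
daemon, start = estimate_failure_start(line)
print(daemon, start.isoformat(timespec="seconds"))
```

For the first watchdog line this gives roughly 02:38:27, a few seconds after the last `send_beacon` entry at 02:38:22.973 in the mgr log excerpt. The watchdog durations are approximate, so this is only a rough correlation.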

Actions #1

Updated by Deepika Upadhyay over 2 years ago

  • Description updated (diff)
  • Backport set to pacific, octopus

Actions #2

Updated by Neha Ojha over 2 years ago

  • Status changed from New to Need More Info

Deepika, can you check if https://tracker.ceph.com/issues/51815 is the same issue?

Actions #3

Updated by Deepika Upadhyay over 2 years ago

@Neha, I hit another instance; I don't think it's related to https://tracker.ceph.com/issues/51815

2021-10-06T02:38:14.041+0000 7fd77cdd1700 10 cephx client: get auth session key: client_challenge 576f7aba520849b1
2021-10-06T02:38:14.041+0000 7fd77cdd1700  1 --2- 172.21.15.29:0/13976 >> [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] conn(0x556718679800 0x55671457de00 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).handle_auth_bad_method method=2 result (13) Permission denied, allowed methods=[2], allowed modes=[2,1]
2021-10-06T02:38:14.041+0000 7fd77cdd1700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2021-10-06T02:38:14.041+0000 7fd77cdd1700  1 -- 172.21.15.29:0/13976 >> [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] conn(0x556718679800 msgr2=0x55671457de00 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).mark_down
2021-10-06T02:38:14.041+0000 7fd77cdd1700  1 --2- 172.21.15.29:0/13976 >> [v2:172.21.15.29:3300/0,v1:172.21.15.29:6789/0] conn(0x556718679800 0x55671457de00 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).stop
2021-10-06T02:38:14.041+0000 7fd77cdd1700  1 --2- 172.21.15.29:0/13976 >> [v2:172.21.15.29:3302/0,v1:172.21.15.29:6791/0] conn(0x556718515800 0x55671457d400 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).handle_auth_bad_method method=2 result (13) Permission denied, allowed methods=[2], allowed modes=[2,1]

/ceph/teuthology-archive/ideepika-2021-10-25_09:33:09-rados-wip-yuri5-testing-2021-10-20-0922-octopus-distro-basic-smithi/6460335/remote/smithi071/log/ceph-mgr.x.log.gz

Actions #4

Updated by Konstantin Shalygin 17 days ago

  • Target version set to v20.0.0
  • Backport changed from pacific, octopus to reef quincy squid