Bug #54406

open

cephadm/mgr-nfs-upgrade: cluster [WRN] overall HEALTH_WARN no active mgr

Added by Laura Flores about 2 years ago. Updated about 2 years ago.

Status:
Triaged
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-02-21_15:48:20-rados-wip-yuri7-testing-2022-02-17-0852-pacific-distro-default-smithi/6698628

2022-02-21T21:30:00.283 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: cluster
2022-02-21T21:30:00.283 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: 2022-02-21T21:30:00.000151+0000
2022-02-21T21:30:00.283 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: mon.smithi006 (mon.0) 831 : cluster [WRN]
2022-02-21T21:30:00.283 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]:  Health detail: HEALTH_WARN 1 MDSs report slow requests
2022-02-21T21:30:00.284 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: cluster
2022-02-21T21:30:00.284 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: 2022-02-21T
2022-02-21T21:30:00.284 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: 21:30:00.000204+0000 mon.smithi006 (
2022-02-21T21:30:00.284 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: mon.0) 832 : cluster [WRN]
2022-02-21T21:30:00.285 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
2022-02-21T21:30:00.285 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: cluster
2022-02-21T21:30:00.285 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: 2022-02-21T21:
2022-02-21T21:30:00.285 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: 30:00.000224+0000 mon.smithi006 (
2022-02-21T21:30:00.285 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 21 21:30:00 smithi006 conmon[29154]: mon.0) 833 : cluster [WRN]

...

2022-02-21T21:30:01.343 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 21 21:30:01 smithi084 conmon[32205]: mon.smithi006 (mon.0) 834 : cluster [WRN] Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE) ---> this repeats several times

...

2022-02-21T21:30:01.345 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 21 21:30:01 smithi084 conmon[32205]: cluster 2022-02-21T21:30:00.506984+0000 mds.foofs.smithi006.hyukcc (mds.0) 11 : cluster [WRN] 1 slow requests, 0 included below; oldest blocked for > 66.896486 secs --> this part also repeats

...

2022-02-21T21:30:46.843 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 21 21:30:46 smithi084 conmon[32205]: cluster 2022-02-21T21:30:45.675935+0000 mon.smithi006 (mon.0) 17 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)

... actions leading up to dead job ...

2022-02-22T03:00:00.355 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:00:00 smithi084 conmon[32205]: mon.smithi006 (mon.0) 67 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:00:00.356 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:00:00 smithi006 conmon[105699]: cluster 2022-02-22T03:00:00.000080
2022-02-22T03:00:00.356 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:00:00 smithi006 conmon[105699]: +0000 mon.smithi006 (mon.0) 67 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:10:00.335 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:10:00 smithi084 conmon[32205]: cluster 2022-02-22T03:10:00.000075+0000 mon.smithi006 (mon.0) 68 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:10:00.353 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:10:00 smithi006 conmon[105699]: cluster 2022-02-22T03:10:00.000075+0000
2022-02-22T03:10:00.354 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:10:00 smithi006 conmon[105699]: mon.smithi006 (mon.0) 68 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:11:02.139 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:11:01 smithi084 conmon[32205]: debug 2022-02-22T03:11:01.990+0000 7f1160fc3700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:11:02.140 INFO:journalctl@ceph.osd.5.smithi084.stdout:Feb 22 03:11:01 smithi084 conmon[42449]: debug 2022-02-22T03:11:01.990+0000 7f02a3d94700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:11:02.141 INFO:journalctl@ceph.osd.6.smithi084.stdout:Feb 22 03:11:01 smithi084 conmon[46647]: debug 2022-02-22T03:11:01.990+0000 7fbf14871700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:11:02.141 INFO:journalctl@ceph.osd.7.smithi084.stdout:Feb 22 03:11:01 smithi084 conmon[50852]: debug 2022-02-22T03:11:01.990+0000 7fd23dae4700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:11:02.142 INFO:journalctl@ceph.osd.4.smithi084.stdout:Feb 22 03:11:01 smithi084 conmon[38345]: debug 2022-02-22T03:11:01.990+0000 7f3c84052700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:20:00.330 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:20:00 smithi084 conmon[32205]: cluster 2022-02-22T03:20:00.000099
2022-02-22T03:20:00.331 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:20:00 smithi084 conmon[32205]: +0000 mon.smithi006 (mon.0) 69 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:20:00.352 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:20:00 smithi006 conmon[105699]: cluster 2022-02-22T
2022-02-22T03:20:00.352 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:20:00 smithi006 conmon[105699]: 03:20:00.000099+0000 mon.smithi006 (mon.0) 69 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:30:00.334 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:30:00 smithi084 conmon[32205]: cluster 2022-02-22T03:30:00.000079+0000 mon.smithi006 (mon.
2022-02-22T03:30:00.351 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:30:00 smithi084 conmon[32205]: 0) 70 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:30:00.363 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:30:00 smithi006 conmon[105699]: cluster 2022-02-22T03:30:00.000079+0000
2022-02-22T03:30:00.363 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:30:00 smithi006 conmon[105699]:  mon.smithi006 (mon.0) 70 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:35:02.119 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:35:01 smithi006 conmon[105699]: debug 2022-02-22T03:35:01.816+0000 7f58911f5700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:35:02.121 INFO:journalctl@ceph.osd.0.smithi006.stdout:Feb 22 03:35:01 smithi006 conmon[45979]: debug 2022-02-22T03:35:01.816+0000 7f379f986700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:35:02.121 INFO:journalctl@ceph.osd.1.smithi006.stdout:Feb 22 03:35:01 smithi006 conmon[50175]: debug 2022-02-22T03:35:01.816+0000 7fd9c8e27700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:35:02.122 INFO:journalctl@ceph.osd.2.smithi006.stdout:Feb 22 03:35:01 smithi006 conmon[55725]: debug 2022-02-22T03:35:01.816+0000 7f76cccc9700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:35:02.122 INFO:journalctl@ceph.osd.3.smithi006.stdout:Feb 22 03:35:01 smithi006 conmon[61261]: debug 2022-02-22T03:35:01.816+0000 7fa0c5fad700 -1 received  signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-02-22T03:40:00.347 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:40:00 smithi084 conmon[32205]: cluster 2022-02-22T03:40:00.000162+0000 mon.smithi006 (mon.0
2022-02-22T03:40:00.348 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:40:00 smithi084 conmon[32205]: ) 71 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:40:00.371 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:40:00 smithi006 conmon[105699]: cluster 2022-02-22T03:40:00.000162
2022-02-22T03:40:00.371 INFO:journalctl@ceph.mon.smithi006.smithi006.stdout:Feb 22 03:40:00 smithi006 conmon[105699]: +0000 mon.smithi006 (mon.0) 71 : cluster [WRN] overall HEALTH_WARN no active mgr
2022-02-22T03:42:39.393 DEBUG:teuthology.exit:Got signal 15; running 1 handler...
2022-02-22T03:42:39.396 DEBUG:teuthology.task.console_log:Killing console logger for smithi006
2022-02-22T03:42:39.397 DEBUG:teuthology.task.console_log:Killing console logger for smithi084
2022-02-22T03:42:39.398 DEBUG:teuthology.exit:Finished running handlers
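
For anyone triaging similar runs, the sequence above (MDS warnings first, then MGR_DOWN, then hours of "no active mgr" until the job is killed) can be pulled out of a teuthology log with a small script. This is only a sketch: the function name and the regular expressions are assumptions made for illustration and are not part of teuthology or Ceph. It only catches health-check codes that appear in parentheses on the same line as "cluster [WRN]", so messages that journalctl split across several lines, like the fragments at the top of the excerpt, are missed.

```python
import re

# Health-check codes as they appear in the excerpt, e.g. "(MGR_DOWN)".
# Lowercase tokens such as "(mon.0)" deliberately do not match.
HEALTH_CODE = re.compile(r"\(([A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+)\)")
# Teuthology timestamp at the start of each log line.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)")


def first_seen_health_codes(lines):
    """Map each health-check code to the teuthology timestamp of its first appearance."""
    first_seen = {}
    for line in lines:
        if "cluster [WRN]" not in line:
            continue
        ts = TIMESTAMP.match(line)
        for code in HEALTH_CODE.findall(line):
            # setdefault keeps the earliest timestamp for each code.
            first_seen.setdefault(code, ts.group(1) if ts else None)
    return first_seen


# Lines copied from the description above (reporter annotations removed).
sample = [
    "2022-02-21T21:30:01.343 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 21 21:30:01 smithi084 conmon[32205]: mon.smithi006 (mon.0) 834 : cluster [WRN] Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE)",
    "2022-02-21T21:30:46.843 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 21 21:30:46 smithi084 conmon[32205]: cluster 2022-02-21T21:30:45.675935+0000 mon.smithi006 (mon.0) 17 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)",
    "2022-02-22T03:00:00.355 INFO:journalctl@ceph.mon.smithi084.smithi084.stdout:Feb 22 03:00:00 smithi084 conmon[32205]: mon.smithi006 (mon.0) 67 : cluster [WRN] overall HEALTH_WARN no active mgr",
]

for code, when in first_seen_health_codes(sample).items():
    print(when, code)
```

To run it against a full log, pass an open file object instead of the sample list; iterating over the file yields one line at a time.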

Actions #1

Updated by Laura Flores about 2 years ago

  • Backport set to pacific

Actions #2

Updated by Venky Shankar about 2 years ago

  • Status changed from New to Triaged
  • Assignee set to Ramana Raja

Actions #3

Updated by Aishwarya Mathuria about 2 years ago

/a/yuriw-2022-03-23_14:51:02-rados-wip-yuri4-testing-2022-03-21-1648-pacific-distro-default-smithi/6756019
