Bug #23928

open

qa: spurious cluster "[WRN] Manager daemon y is unresponsive. No standby daemons available." in cluster log

Added by Patrick Donnelly about 6 years ago. Updated about 5 years ago.

Status: New
Priority: Normal
Assignee: -
Category: testing
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During shutdown we sometimes see this warning in the cluster log. The teuthology log for the shutdown sequence looks like this:

2018-04-28T18:27:35.688 INFO:teuthology.misc:Shutting down mgr daemons...
2018-04-28T18:27:35.689 DEBUG:tasks.ceph.mgr.y:waiting for process to exit
2018-04-28T18:27:35.689 INFO:teuthology.orchestra.run:waiting for 300
2018-04-28T18:27:35.690 INFO:tasks.ceph.mgr.y.smithi080.stderr:2018-04-28 18:27:35.690 7f0ab5ffb700 -1 received  signal: Terminated from /usr/bin/python /bin/daemon-helper term ceph-mgr -f --cluster ceph -i y  (PID: 20319) UID: 0
2018-04-28T18:27:35.690 INFO:tasks.ceph.mgr.y.smithi080.stderr:2018-04-28 18:27:35.690 7f0ab5ffb700 -1 mgr handle_signal *** Got signal Terminated ***
2018-04-28T18:27:35.787 INFO:tasks.ceph.mgr.y:Stopped
2018-04-28T18:27:35.788 DEBUG:tasks.ceph.mgr.x:waiting for process to exit
2018-04-28T18:27:35.788 INFO:teuthology.orchestra.run:waiting for 300
2018-04-28T18:27:35.789 INFO:tasks.ceph.mgr.x.smithi047.stderr:2018-04-28 18:27:35.793 7f6151ffb700 -1 received  signal: Terminated from /usr/bin/python /bin/daemon-helper term ceph-mgr -f --cluster ceph -i x  (PID: 20150) UID: 0
2018-04-28T18:27:35.790 INFO:tasks.ceph.mgr.x.smithi047.stderr:2018-04-28 18:27:35.793 7f6151ffb700 -1 mgr handle_signal *** Got signal Terminated ***
2018-04-28T18:27:35.837 INFO:tasks.ceph.mgr.x:Stopped
2018-04-28T18:27:35.838 INFO:teuthology.misc:Shutting down mon daemons...
2018-04-28T18:27:35.838 DEBUG:tasks.ceph.mon.a:waiting for process to exit
2018-04-28T18:27:35.838 INFO:teuthology.orchestra.run:waiting for 300
2018-04-28T18:27:35.874 INFO:tasks.ceph.mon.a.smithi080.stderr:2018-04-28 18:27:35.847 1545a700 -1 received  signal: Terminated from /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/mon.a.log --time-stamp=yes --tool=memcheck --leak-check=full --show-reachable=yes ceph-mon -f --cluster ceph -i a  (PID: 20265) UID: 0
2018-04-28T18:27:35.878 INFO:tasks.ceph.mon.a.smithi080.stderr:2018-04-28 18:27:35.850 1545a700 -1 mon.a@1(peon) e1 *** Got Signal Terminated ***
2018-04-28T18:29:13.470 INFO:tasks.ceph.mon.a:Stopped
2018-04-28T18:29:13.471 DEBUG:tasks.ceph.mon.c:waiting for process to exit
2018-04-28T18:29:13.471 INFO:teuthology.orchestra.run:waiting for 300
2018-04-28T18:29:13.508 INFO:tasks.ceph.mon.c.smithi080.stderr:2018-04-28 18:29:13.480 1545a700 -1 received  signal: Terminated from /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/mon.c.log --time-stamp=yes --tool=memcheck --leak-check=full --show-reachable=yes ceph-mon -f --cluster ceph -i c  (PID: 20266) UID: 0
2018-04-28T18:29:13.512 INFO:tasks.ceph.mon.c.smithi080.stderr:2018-04-28 18:29:13.483 1545a700 -1 mon.c@2(peon) e1 *** Got Signal Terminated ***
2018-04-28T18:29:19.573 INFO:tasks.ceph.mon.c:Stopped
2018-04-28T18:29:19.574 DEBUG:tasks.ceph.mon.b:waiting for process to exit
2018-04-28T18:29:19.574 INFO:teuthology.orchestra.run:waiting for 300
2018-04-28T18:29:19.598 INFO:tasks.ceph.mon.b.smithi047.stderr:2018-04-28 18:29:19.589 1545a700 -1 received  signal: Terminated from /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/mon.b.log --time-stamp=yes --tool=memcheck --leak-check=full --show-reachable=yes ceph-mon -f --cluster ceph -i b  (PID: 20142) UID: 0
2018-04-28T18:29:19.604 INFO:tasks.ceph.mon.b.smithi047.stderr:2018-04-28 18:29:19.592 1545a700 -1 mon.b@0(leader) e1 *** Got Signal Terminated ***
2018-04-28T18:29:25.676 INFO:tasks.ceph.mon.b:Stopped

(Note the odd delay in stopping mon.a: about 98 seconds, from 18:27:35 to 18:29:13, versus about 6 seconds each for mon.c and mon.b.)
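The per-daemon stop times can be checked directly from the timestamps in the log above. A small Python sketch, where `stop_durations` and `EVENTS` are hypothetical names introduced here for illustration and the timestamps are copied from the "waiting for process to exit" and "Stopped" lines:

```python
from datetime import datetime

# Timestamp layout used by the teuthology log lines above.
FMT = "%Y-%m-%dT%H:%M:%S.%f"

# (daemon, "waiting for process to exit" time, "Stopped" time),
# copied by hand from the log excerpt in this report.
EVENTS = [
    ("mgr.y", "2018-04-28T18:27:35.689", "2018-04-28T18:27:35.787"),
    ("mgr.x", "2018-04-28T18:27:35.788", "2018-04-28T18:27:35.837"),
    ("mon.a", "2018-04-28T18:27:35.838", "2018-04-28T18:29:13.470"),
    ("mon.c", "2018-04-28T18:29:13.471", "2018-04-28T18:29:19.573"),
    ("mon.b", "2018-04-28T18:29:19.574", "2018-04-28T18:29:25.676"),
]


def stop_durations(events=EVENTS):
    """Return {daemon: seconds between the wait starting and 'Stopped'}."""
    durations = {}
    for name, start, end in events:
        delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        durations[name] = delta.total_seconds()
    return durations


for name, seconds in stop_durations().items():
    print(f"{name}: {seconds:.3f} s")
```

This puts mon.a at roughly 97.6 seconds, while the other two monitors take about 6 seconds each and both managers stop in under 0.1 seconds.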

From: http://pulpito.ceph.com/pdonnell-2018-04-28_06:27:06-multimds-wip-pdonnell-testing-20180428.041811-testing-basic-smithi/2450419/

This is not a huge deal, and we could silence it with a log whitelist, but is there a better way to ignore this during shutdown?
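For reference, the whitelist approach boils down to filtering flagged cluster-log lines against a list of regular expressions. Here is a minimal Python sketch of that idea. It is not teuthology's actual implementation: the helper `unexpected_lines`, the pattern, the severity markers, and the sample lines are all assumptions made up for illustration, with the pattern modelled on the warning text in the title.

```python
import re

# Hypothetical whitelist entry, modelled on the warning in this issue's title.
WHITELIST = [
    r"Manager daemon \w+ is unresponsive\. No standby daemons available",
]

# Severity markers assumed (for this sketch) to make a run fail.
SEVERITY = re.compile(r"\[(ERR|WRN|SEC)\]")


def unexpected_lines(lines, whitelist=WHITELIST):
    """Return lines with a severity marker that match no whitelist pattern."""
    patterns = [re.compile(p) for p in whitelist]
    return [
        line
        for line in lines
        if SEVERITY.search(line) and not any(p.search(line) for p in patterns)
    ]


# Illustrative lines only, not real cluster log output.
sample = [
    "cluster [WRN] Manager daemon y is unresponsive. No standby daemons available.",
    "cluster [WRN] some other warning",
    "cluster [INF] all good",
]
print(unexpected_lines(sample))
```

A filter like this applies to the whole run, so it would also hide the warning if it fired while the test was still active, which is why a shutdown-specific way of ignoring it would be preferable.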
