Bug #21941 (closed): mgr systemd startup issue

Added by Vasu Kulkarni over 6 years ago. Updated over 6 years ago.

Status: Rejected
Priority: Low
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

1) I am running a test that uses systemd to start the daemons, instead of teuthology's old daemon-helper approach.

Cluster config:

2017-10-26T22:40:10.438 INFO:teuthology.task.internal:roles: ubuntu@ovh031.front.sepia.ceph.com - ['mon.a', 'mds.a', 'mgr.x', 'osd.0', 'osd.1']
2017-10-26T22:40:10.438 INFO:teuthology.task.internal:roles: ubuntu@ovh001.front.sepia.ceph.com - ['mon.b', 'mgr.y', 'osd.2', 'osd.3']
2017-10-26T22:40:10.439 INFO:teuthology.task.internal:roles: ubuntu@ovh092.front.sepia.ceph.com - ['mds.b', 'osd.4', 'osd.5']
2017-10-26T22:40:10.439 INFO:teuthology.task.internal:roles: ubuntu@ovh013.front.sepia.ceph.com - ['mon.c', 'mds.s']
2017-10-26T22:40:10.439 INFO:teuthology.task.internal:roles: ubuntu@ovh077.front.sepia.ceph.com - ['client.0']

2) The ceph task code starts all the daemons, and each unit's status is reported as loaded/running:

2017-10-26T23:04:29.661 INFO:tasks.ceph.mon.a:Restarting daemon using systemd
2017-10-26T23:04:29.661 INFO:teuthology.orchestra.run.ovh031:Running: 'ps -ef | grep "ceph-mon.*--id a" | grep -v grep | awk {\'print $2\'}'
2017-10-26T23:04:29.823 INFO:tasks.ceph.mon.a:starting a non-running daemon
2017-10-26T23:04:29.823 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo systemctl start ceph-mon@a'
2017-10-26T23:04:30.072 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo systemctl show ceph-mon@a | grep -i state'
2017-10-26T23:04:30.201 INFO:teuthology.orchestra.run.ovh031.stdout:LoadState=loaded
2017-10-26T23:04:30.201 INFO:teuthology.orchestra.run.ovh031.stdout:ActiveState=active
2017-10-26T23:04:30.201 INFO:teuthology.orchestra.run.ovh031.stdout:SubState=running
2017-10-26T23:04:30.201 INFO:teuthology.orchestra.run.ovh031.stdout:UnitFileState=disabled
2017-10-26T23:04:30.202 INFO:teuthology.orchestra.run.ovh031.stdout:StateChangeTimestamp=Thu 2017-10-26 23:04:29 UTC
2017-10-26T23:04:30.202 INFO:teuthology.orchestra.run.ovh031.stdout:StateChangeTimestampMonotonic=1285209053
2017-10-26T23:04:30.202 INFO:teuthology.orchestra.run.ovh001:Running: 'which systemctl'
2017-10-26T23:04:30.310 INFO:teuthology.orchestra.run.ovh001.stdout:/bin/systemctl
2017-10-26T23:04:30.311 INFO:tasks.ceph.mon.b:Restarting daemon using systemd
2017-10-26T23:04:30.311 INFO:teuthology.orchestra.run.ovh001:Running: 'ps -ef | grep "ceph-mon.*--id b" | grep -v grep | awk {\'print $2\'}'
2017-10-26T23:04:30.502 INFO:tasks.ceph.mon.b:starting a non-running daemon
2017-10-26T23:04:30.503 INFO:teuthology.orchestra.run.ovh001:Running: 'sudo systemctl start ceph-mon@b'
2017-10-26T23:04:30.802 INFO:teuthology.orchestra.run.ovh001:Running: 'sudo systemctl show ceph-mon@b | grep -i state'
2017-10-26T23:04:30.974 INFO:teuthology.orchestra.run.ovh001.stdout:LoadState=loaded
2017-10-26T23:04:30.975 INFO:teuthology.orchestra.run.ovh001.stdout:ActiveState=active
2017-10-26T23:04:30.975 INFO:teuthology.orchestra.run.ovh001.stdout:SubState=running
2017-10-26T23:04:30.975 INFO:teuthology.orchestra.run.ovh001.stdout:UnitFileState=disabled
2017-10-26T23:04:30.975 INFO:teuthology.orchestra.run.ovh001.stdout:StateChangeTimestamp=Thu 2017-10-26 23:04:30 UTC
2017-10-26T23:04:30.975 INFO:teuthology.orchestra.run.ovh001.stdout:StateChangeTimestampMonotonic=1245133931
2017-10-26T23:04:30.976 INFO:teuthology.orchestra.run.ovh013:Running: 'which systemctl'
2017-10-26T23:04:31.082 INFO:teuthology.orchestra.run.ovh013.stdout:/bin/systemctl
2017-10-26T23:04:31.083 INFO:tasks.ceph.mon.c:Restarting daemon using systemd
2017-10-26T23:04:31.083 INFO:teuthology.orchestra.run.ovh013:Running: 'ps -ef | grep "ceph-mon.*--id c" | grep -v grep | awk {\'print $2\'}'
2017-10-26T23:04:31.261 INFO:tasks.ceph.mon.c:starting a non-running daemon
2017-10-26T23:04:31.262 INFO:teuthology.orchestra.run.ovh013:Running: 'sudo systemctl start ceph-mon@c'
2017-10-26T23:04:31.360 INFO:teuthology.orchestra.run.ovh001:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2017-10-26T23:04:31.462 INFO:teuthology.orchestra.run.ovh013:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2017-10-26T23:04:31.508 INFO:teuthology.orchestra.run.ovh013:Running: 'sudo systemctl show ceph-mon@c | grep -i state'
2017-10-26T23:04:31.584 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2017-10-26T23:04:31.649 INFO:teuthology.orchestra.run.ovh013.stdout:LoadState=loaded
2017-10-26T23:04:31.650 INFO:teuthology.orchestra.run.ovh013.stdout:ActiveState=active
2017-10-26T23:04:31.650 INFO:teuthology.orchestra.run.ovh013.stdout:SubState=running
2017-10-26T23:04:31.650 INFO:teuthology.orchestra.run.ovh013.stdout:UnitFileState=disabled
2017-10-26T23:04:31.650 INFO:teuthology.orchestra.run.ovh013.stdout:StateChangeTimestamp=Thu 2017-10-26 23:04:31 UTC
2017-10-26T23:04:31.650 INFO:teuthology.orchestra.run.ovh013.stdout:StateChangeTimestampMonotonic=1265005766
2017-10-26T23:04:31.650 INFO:tasks.ceph:Starting mgr daemons in cluster ceph...
2017-10-26T23:04:31.651 INFO:tasks.ceph.mgr.x:Restarting daemon using systemd
2017-10-26T23:04:31.651 INFO:teuthology.orchestra.run.ovh031:Running: 'ps -ef | grep "ceph-mgr.*--id x" | grep -v grep | awk {\'print $2\'}'
2017-10-26T23:04:31.685 INFO:teuthology.orchestra.run.ovh077:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2017-10-26T23:04:31.760 INFO:teuthology.orchestra.run.ovh092:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2017-10-26T23:04:31.851 INFO:tasks.ceph.mgr.x:starting a non-running daemon
2017-10-26T23:04:31.852 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo systemctl start ceph-mgr@x'
2017-10-26T23:04:32.018 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo systemctl show ceph-mgr@x | grep -i state'
2017-10-26T23:04:32.190 INFO:teuthology.orchestra.run.ovh031.stdout:LoadState=loaded
2017-10-26T23:04:32.190 INFO:teuthology.orchestra.run.ovh031.stdout:ActiveState=active
2017-10-26T23:04:32.190 INFO:teuthology.orchestra.run.ovh031.stdout:SubState=running
2017-10-26T23:04:32.190 INFO:teuthology.orchestra.run.ovh031.stdout:UnitFileState=disabled
2017-10-26T23:04:32.190 INFO:teuthology.orchestra.run.ovh031.stdout:StateChangeTimestamp=Thu 2017-10-26 23:04:32 UTC
2017-10-26T23:04:32.190 INFO:teuthology.orchestra.run.ovh031.stdout:StateChangeTimestampMonotonic=1287403608
2017-10-26T23:04:32.191 INFO:tasks.ceph.mgr.y:Restarting daemon using systemd
2017-10-26T23:04:32.191 INFO:teuthology.orchestra.run.ovh001:Running: 'ps -ef | grep "ceph-mgr.*--id y" | grep -v grep | awk {\'print $2\'}'
2017-10-26T23:04:32.296 INFO:tasks.ceph.mgr.y:starting a non-running daemon
2017-10-26T23:04:32.296 INFO:teuthology.orchestra.run.ovh001:Running: 'sudo systemctl start ceph-mgr@y'
2017-10-26T23:04:32.457 INFO:teuthology.orchestra.run.ovh001:Running: 'sudo systemctl show ceph-mgr@y | grep -i state'
2017-10-26T23:04:32.621 INFO:teuthology.orchestra.run.ovh001.stdout:LoadState=loaded
2017-10-26T23:04:32.622 INFO:teuthology.orchestra.run.ovh001.stdout:ActiveState=active
2017-10-26T23:04:32.622 INFO:teuthology.orchestra.run.ovh001.stdout:SubState=running
2017-10-26T23:04:32.622 INFO:teuthology.orchestra.run.ovh001.stdout:UnitFileState=disabled
2017-10-26T23:04:32.622 INFO:teuthology.orchestra.run.ovh001.stdout:StateChangeTimestamp=Thu 2017-10-26 23:04:32 UTC
2017-10-26T23:04:32.622 INFO:teuthology.orchestra.run.ovh001.stdout:StateChangeTimestampMonotonic=1247047407
2017-10-26T23:04:32.622 INFO:tasks.ceph:Setting crush tunables to default
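
For reference, here is a minimal standalone sketch of the per-daemon start-and-check flow visible in the log above: look for a running process, start the unit if none is found, then read the unit's state. Teuthology actually runs these commands on the remote hosts via orchestra.run; the function name and the local subprocess calls below are illustrative assumptions, not the ceph.py implementation.

import subprocess

def systemd_start_and_check(daemon_type, daemon_id):
    """Hypothetical local equivalent of the per-daemon steps in the log."""
    unit = f'ceph-{daemon_type}@{daemon_id}'
    # Equivalent of: ps -ef | grep "ceph-mon.*--id a" | grep -v grep
    ps = subprocess.run(['pgrep', '-f', f'ceph-{daemon_type}.*--id {daemon_id}'],
                        capture_output=True, text=True)
    if not ps.stdout.strip():
        # Equivalent of: sudo systemctl start ceph-mon@a
        subprocess.run(['sudo', 'systemctl', 'start', unit], check=True)
    # Equivalent of: sudo systemctl show ceph-mon@a | grep -i state
    show = subprocess.run(['sudo', 'systemctl', 'show', unit],
                          capture_output=True, text=True, check=True)
    return {k: v
            for k, v in (line.split('=', 1) for line in show.stdout.splitlines() if '=' in line)
            if 'state' in k.lower()}

Note that systemctl reporting ActiveState=active/SubState=running only means the process was launched; as the rest of this report shows, the mgr can still fail internally right afterwards.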

3) Although the systemd status above shows the mgr as running, the 'ceph mgr dump' query reports no active mgr:

2017-10-26T23:05:11.332 INFO:tasks.ceph.ceph_manager.ceph:waiting for mgr available
2017-10-26T23:05:11.333 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr dump --format=json'
2017-10-26T23:05:11.781 INFO:teuthology.orchestra.run.ovh031.stdout:
2017-10-26T23:05:11.781 INFO:teuthology.orchestra.run.ovh031.stdout:{"epoch":8,"active_gid":0,"active_name":"","active_addr":"-","available":false,"standbys":[],"modules":["restful","status"],"available_modules":[]}
2017-10-26T23:05:14.782 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr dump --format=json'
2017-10-26T23:05:15.220 INFO:teuthology.orchestra.run.ovh031.stdout:
2017-10-26T23:05:15.220 INFO:teuthology.orchestra.run.ovh031.stdout:{"epoch":8,"active_gid":0,"active_name":"","active_addr":"-","available":false,"standbys":[],"modules":["restful","status"],"available_modules":[]}
2017-10-26T23:05:18.222 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr dump --format=json'
2017-10-26T23:05:18.586 INFO:teuthology.orchestra.run.ovh031.stdout:
2017-10-26T23:05:18.586 INFO:teuthology.orchestra.run.ovh031.stdout:{"epoch":9,"active_gid":0,"active_name":"","active_addr":"-","available":false,"standbys":[],"modules":["restful","status"],"available_modules":[]}
2017-10-26T23:05:21.587 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr dump --format=json'
2017-10-26T23:05:21.983 INFO:teuthology.orchestra.run.ovh031.stdout:
2017-10-26T23:05:21.983 INFO:teuthology.orchestra.run.ovh031.stdout:{"epoch":10,"active_gid":0,"active_name":"","active_addr":"-","available":false,"standbys":[],"modules":["restful","status"],"available_modules":[]}
2017-10-26T23:05:24.984 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr dump --format=json'
2017-10-26T23:05:25.459 INFO:teuthology.orchestra.run.ovh031.stdout:
2017-10-26T23:05:25.459 INFO:teuthology.orchestra.run.ovh031.stdout:{"epoch":10,"active_gid":0,"active_name":"","active_addr":"-","available":false,"standbys":[],"modules":["restful","status"],"available_modules":[]}
2017-10-26T23:05:28.468 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr dump --format=json'
2017-10-26T23:05:28.869 INFO:teuthology.orchestra.run.ovh031.stdout:
2017-10-26T23:05:28.869 INFO:teuthology.orchestra.run.ovh031.stdout:{"epoch":11,"active_gid":0,"active_name":"","active_addr":"-","available":false,"standbys":[],"modules":["restful","status"],"available_modules":[]}
2017-10-26T23:05:31.870 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr dump --format=json'
2017-10-26T23:05:32.297 INFO:teuthology.orchestra.run.ovh031.stdout:
2017-10-26T23:05:32.298 INFO:teuthology.orchestra.run.ovh031.stdout:{"epoch":12,"active_gid":0,"active_name":"","active_addr":"-","available":false,"standbys":[],"modules":["restful","status"],"available_modules":[]}
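
The poll loop in that log corresponds to something like the following hedged sketch: run 'ceph mgr dump --format=json', parse the JSON, and consider the cluster ready once "available" is true. Here "available" stays false and "active_gid" stays 0, so the wait never completes. The function below is an illustrative assumption, not the actual ceph_manager code:

import json
import subprocess
import time

def wait_for_active_mgr(timeout=120, interval=3):
    """Poll 'ceph mgr dump' until an active mgr appears or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.run(['sudo', 'ceph', '--cluster', 'ceph',
                              'mgr', 'dump', '--format=json'],
                             capture_output=True, text=True, check=True).stdout
        dump = json.loads(out)
        if dump.get('available'):  # becomes true once a mgr has gone active
            return dump
        time.sleep(interval)
    raise TimeoutError('no active mgr: active_gid stayed 0')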

All the other daemons behave correctly; only the mgr does not. Please let me know if this is a mgr issue.

Other daemons:

2017-10-26T23:04:55.735 DEBUG:teuthology.misc:6 of 6 OSDs are up
2017-10-26T23:04:55.735 INFO:tasks.ceph:Creating RBD pool
2017-10-26T23:04:55.735 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo ceph --cluster ceph osd pool create rbd 8'
2017-10-26T23:04:56.846 INFO:teuthology.orchestra.run.ovh031.stderr:pool 'rbd' created
2017-10-26T23:04:56.876 INFO:teuthology.orchestra.run.ovh031:Running: 'sudo ceph --cluster ceph osd pool application enable rbd rbd --yes-i-really-mean-it'
2017-10-26T23:04:58.072 INFO:teuthology.orchestra.run.ovh031.stderr:enabled application 'rbd' on pool 'rbd'
2017-10-26T23:04:58.093 INFO:tasks.ceph:Setting up CephFS filesystem...

Full logs: http://qa-proxy.ceph.com/teuthology/vasu-2017-10-26_22:31:46-smoke:basic-master-testing-basic-ovh/1778000/teuthology.log

Update #1 by Vasu Kulkarni over 6 years ago

Checking the logs, this looks like something that needs to be fixed in ceph.py; investigating further:

2017-10-26 23:04:32.466 7fd7f44cc8c0 -1 auth: unable to find a keyring on /var/lib/ceph/mgr/ceph-y/keyring: (13) Permission denied
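
The "(13) Permission denied" above means ceph-mgr, which the packages run as the unprivileged "ceph" user, cannot read its keyring. A hedged sketch of the kind of ownership check that would surface this (the path comes from the error message; the expected owner and the helper itself are assumptions):

import os
import pwd

def check_mgr_keyring(mgr_id='y'):
    """Verify the mgr keyring is owned by the 'ceph' user the daemon runs as."""
    path = f'/var/lib/ceph/mgr/ceph-{mgr_id}/keyring'  # path from the error above
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    if owner != 'ceph':
        # Typical fix: sudo chown -R ceph:ceph /var/lib/ceph/mgr/ceph-<id>
        raise PermissionError(f'{path} owned by {owner}, expected ceph')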

Update #2 by Vasu Kulkarni over 6 years ago

  • Priority changed from High to Low

Update #3 by Vasu Kulkarni over 6 years ago

  • Status changed from New to Rejected

This was an mgr keyring directory permission issue.
