
Bug #19585

Monitors don't come up after reboot on Xenial

Added by Vasu Kulkarni 5 months ago. Updated 2 months ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version: -
Start date: 04/11/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

1) Set up a Ceph cluster with ceph-deploy.
2) Reboot: the mon doesn't come up, but the other daemons have no issue.

http://qa-proxy.ceph.com/teuthology/vasu-2017-04-11_21:01:29-smoke-master---basic-vps/1013452/teuthology.log

2017-04-11T21:26:42.311 DEBUG:teuthology.misc:waited 2.28795719147
2017-04-11T21:26:43.312 INFO:teuthology.misc:trying to connect to ubuntu@vpm175.front.sepia.ceph.com
2017-04-11T21:26:43.313 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'vpm175.front.sepia.ceph.com', 'key_filename': ['/home/teuthworker/.ssh/id_rsa'], 'timeout': 60}
2017-04-11T21:26:43.464 INFO:teuthology.orchestra.run.vpm175:Running: 'true'
2017-04-11T21:26:43.688 DEBUG:teuthology.misc:waited 3.66490721703
2017-04-11T21:26:44.690 INFO:teuthology.orchestra.run.vpm017:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ceph      4139     1  0 21:25 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ceph      4883     1  0 21:25 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ubuntu    7094  7092  0 21:26 ?        00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ubuntu    7096  7094  0 21:26 ?        00:00:00 grep ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm057:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.821 INFO:teuthology.orchestra.run.vpm057.stdout:ubuntu    6408  6406  0 21:26 ?        00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.822 INFO:teuthology.orchestra.run.vpm057.stdout:ubuntu    6410  6408  0 21:26 ?        00:00:00 grep ceph
2017-04-11T21:26:44.822 INFO:teuthology.orchestra.run.vpm175:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm175.stdout:ceph      3925     1  0 21:25 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm175.stdout:ubuntu    7921  7919  0 21:26 ?        00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm175.stdout:ubuntu    7923  7921  0 21:26 ?        00:00:00 grep ceph
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm197:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.918 INFO:teuthology.orchestra.run.vpm197.stdout:ceph      2522     1  0 21:25 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.918 INFO:teuthology.orchestra.run.vpm197.stdout:ubuntu    6713  6711  0 21:26 ?        00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.918 INFO:teuthology.orchestra.run.vpm197.stdout:ubuntu    6715  6713  0 21:26 ?        00:00:00 grep ceph
2017-04-11T21:26:44.919 INFO:teuthology.orchestra.run.vpm057:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage sudo ceph --cluster ceph health'
2017-04-11T21:31:45.479 INFO:teuthology.misc.health.vpm057.stderr:2017-04-11 21:31:45.494424 7fe1756c5700  0 monclient(hunting): authenticate timed out after 300
2017-04-11T21:31:45.479 INFO:teuthology.misc.health.vpm057.stderr:2017-04-11 21:31:45.494466 7fe1756c5700  0 librados: client.admin authentication error (110) Connection timed out
2017-04-11T21:31:45.488 INFO:teuthology.misc.health.vpm057.stderr:error connecting to the cluster

History

#1 Updated by Марк Коренберг 5 months ago

Exactly the same with Kraken and Debian 8.

#2 Updated by Sage Weil 5 months ago

  • Project changed from devops to Ceph
  • Priority changed from High to Urgent

#3 Updated by Sage Weil 3 months ago

  • Status changed from New to Verified
  • Priority changed from Urgent to Immediate

Confirmed this is a problem:

- ceph-deploy runs systemctl enable ceph-mon@$hostname
- after reboot I see

root@smithi203:~# systemctl | grep ceph
  ceph.target                                                                              loaded active active    ceph target allowing to start/stop all ceph*@.service instances at once
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl enable ceph-mon@smithi203
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl start ceph-mon@smithi203
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2017-06-28 19:27:46 UTC; 1s ago
 Main PID: 6093 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@smithi203.service
           `-6093 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph

Jun 28 19:27:46 smithi203 systemd[1]: Started Ceph cluster monitor daemon.

- if I reboot, it is the same (enabled but not started)

??
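
The transcript above shows the unit file state (`enabled`) and the runtime state (`inactive (dead)`) disagreeing. On a live host, `systemctl is-enabled ceph-mon@smithi203` and `systemctl is-active ceph-mon@smithi203` report those two states directly; for output pasted into a ticket like this one, a minimal parser could look like the following. It is a sketch that assumes the `Loaded:`/`Active:` layout shown above, and `unit_state` and `enabled_but_dead` are hypothetical names.

```python
import re

# "Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)"
LOADED_RE = re.compile(r"^\s*Loaded:\s+\S+\s+\([^;]+;\s*(?P<file_state>[\w-]+)")
# "Active: inactive (dead)" or "Active: active (running) since ..."
ACTIVE_RE = re.compile(r"^\s*Active:\s+(?P<active_state>\S+)")


def unit_state(status_text):
    """Return (unit file state, active state) from pasted `systemctl status` output."""
    file_state = active_state = None
    for line in status_text.splitlines():
        loaded = LOADED_RE.match(line)
        if loaded and file_state is None:
            file_state = loaded.group("file_state")
        active = ACTIVE_RE.match(line)
        if active and active_state is None:
            active_state = active.group("active_state")
    return file_state, active_state


def enabled_but_dead(status_text):
    """True for the symptom in this ticket: enabled for boot, yet not running."""
    file_state, active_state = unit_state(status_text)
    return file_state == "enabled" and active_state == "inactive"


after_reboot = """\
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
"""
print(unit_state(after_reboot))        # ('enabled', 'inactive')
print(enabled_but_dead(after_reboot))  # True
```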

#4 Updated by Kefu Chai 2 months ago

# /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph
Corruption: VersionEdit: unknown tag
2017-07-06 15:20:55.802529 7f5df0490700 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-smithi203': (22) Invalid argument

It seems the mon store of mon.smithi203 is corrupted.
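
The `(22) Invalid argument` above, like `(110) Connection timed out` in the original health-check log, appears to be a Linux errno value followed by its message. A small sketch for decoding such codes into their symbolic names; `decode_errors` is a hypothetical helper, and the numbering assumes Linux (errno values differ on other platforms).

```python
import errno
import os
import re

# Ceph appears to report failures as "(<errno>) <message>".
ERRNO_RE = re.compile(r"\((\d+)\)\s")


def decode_errors(line):
    """Return (code, symbolic name, strerror text) for each "(N) ..." in a log line."""
    return [
        (code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code))
        for code in (int(m.group(1)) for m in ERRNO_RE.finditer(line))
    ]


mon_open = ("2017-07-06 15:20:55.802529 7f5df0490700 -1 error opening mon data directory "
            "at '/var/lib/ceph/mon/ceph-smithi203': (22) Invalid argument")
health = ("2017-04-11 21:31:45.494466 7fe1756c5700  0 librados: client.admin "
          "authentication error (110) Connection timed out")
print(decode_errors(mon_open))  # [(22, 'EINVAL', 'Invalid argument')]
print(decode_errors(health))    # [(110, 'ETIMEDOUT', 'Connection timed out')] on Linux with glibc
```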

#5 Updated by Sage Weil 2 months ago

On the ceph-deploy node, I just ran:

  rm ceph.*
  ./ceph-deploy new smithi203
  ./ceph-deploy purge smithi203
  ./ceph-deploy purgedata smithi203
  ./ceph-deploy purge smithi203
  ./ceph-deploy install --dev master smithi203
  ./ceph-deploy mon create-initial
  ./ceph-deploy admin smithi203

After that, on smithi203 I see:
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2017-07-07 03:04:03 UTC; 51s ago
 Main PID: 16140 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@smithi203.service
           `-16140 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph

Jul 07 03:04:03 smithi203 systemd[1]: Started Ceph cluster monitor daemon.
root@smithi203:~# reboot

After reboot:
ubuntu@smithi203:~$ systemctl | grep ceph
  ceph.target                                                                              loaded active active    ceph target allowing to start/stop all ceph*@.service instances at once
ubuntu@smithi203:~$ sudo bash
root@smithi203:~# systemctl | grep ceph
  ceph.target                                                                              loaded active active    ceph target allowing to start/stop all ceph*@.service instances at once
root@smithi203:~# systemctl status ceph.target
* ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph.target; enabled; vendor preset: enabled)
   Active: active since Fri 2017-07-07 03:06:35 UTC; 21s ago

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl start ceph.target
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# ps ax|grep ceph-
 5592 pts/0    S+     0:00 grep --color=auto ceph-
root@smithi203:~# systemctl restart ceph.target
root@smithi203:~# ps ax|grep ceph-
root@smithi203:~# ps ax|grep ceph-
 5598 pts/0    S+     0:00 grep --color=auto ceph-
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl status ceph.target
* ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph.target; enabled; vendor preset: enabled)
   Active: active since Fri 2017-07-07 03:07:39 UTC; 8s ago

Jul 07 03:07:39 smithi203 systemd[1]: Stopped target ceph target allowing to start/stop all ceph*@.service instances at once.
Jul 07 03:07:39 smithi203 systemd[1]: Stopping ceph target allowing to start/stop all ceph*@.service instances at once.
Jul 07 03:07:39 smithi203 systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl restart ceph-mon.target
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
   Active: active since Fri 2017-07-07 03:07:56 UTC; 1s ago

Jul 07 03:07:56 smithi203 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2017-07-07 03:07:56 UTC; 7s ago
 Main PID: 5608 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@smithi203.service
           `-5608 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph

Jul 07 03:07:56 smithi203 systemd[1]: Started Ceph cluster monitor daemon.
root@smithi203:~# ps ax|grep cpeh-
 5650 pts/0    S+     0:00 grep --color=auto cpeh-
root@smithi203:~# ps ax|grep ceph-
 5608 ?        Ssl    0:00 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph
 5652 pts/0    S+     0:00 grep --color=auto ceph-
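
Read together, the transcript suggests why the instance stays down: `ceph-mon@smithi203` is enabled but `ceph-mon.target` is disabled, starting `ceph.target` does not start the mon, and restarting `ceph-mon.target` does. A simplified model of systemd's `Wants=` propagation reproduces that pattern. It assumes, rather than quotes, that `ceph-mon@.service` has `WantedBy=ceph-mon.target` and that `ceph-mon.target` has `WantedBy=multi-user.target ceph.target`; it ignores ordering, `Requires=` and `PartOf=`, and the `WANTS` snapshot, `pulled_in` and `enable` are hypothetical.

```python
from collections import deque

MON = "ceph-mon@smithi203.service"

# Hypothetical snapshot of the *.wants/ links on smithi203 after the reboot.
# Assumes ceph.target has WantedBy=multi-user.target (it shows as enabled) and
# ceph-mon@.service has WantedBy=ceph-mon.target; ceph-mon.target is disabled,
# so no target wants it.
WANTS = {
    "multi-user.target": {"ceph.target"},
    "ceph.target": set(),
    "ceph-mon.target": {MON},  # link created by `systemctl enable ceph-mon@smithi203`
}


def pulled_in(wants, start):
    """Units reachable from `start` by following Wants= edges (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        for dep in wants.get(unit, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def enable(wants, unit, wanted_by):
    """Model `systemctl enable`: add `unit` to the .wants set of each target."""
    updated = {target: set(units) for target, units in wants.items()}
    for target in wanted_by:
        updated.setdefault(target, set()).add(unit)
    return updated


print(MON in pulled_in(WANTS, "multi-user.target"))  # False, like the reboot
print(MON in pulled_in(WANTS, "ceph.target"))        # False, like `systemctl start ceph.target`
print(MON in pulled_in(WANTS, "ceph-mon.target"))    # True, like `systemctl restart ceph-mon.target`

# Assumed [Install] section of ceph-mon.target: WantedBy=multi-user.target ceph.target
fixed = enable(WANTS, "ceph-mon.target", ["multi-user.target", "ceph.target"])
print(MON in pulled_in(fixed, "multi-user.target"))  # True
```

Under those assumptions, `systemctl enable ceph-mon.target` would be the matching manual workaround on an affected host; the fix for this ticket itself went into d/rules, as note #8 describes.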

#6 Updated by Sage Weil 2 months ago

  • Status changed from Verified to Testing

#7 Updated by Sage Weil 2 months ago

  • Status changed from Testing to Resolved

#8 Updated by Kefu Chai 2 months ago

See https://wiki.debian.org/Teams/pkg-systemd/Packaging#Using_debhelper_with_dh_systemd

New in debhelper compat 10 is that dh-systemd is now automatically enabled if you're using dh sequencer!
3. If you are using plain debhelper, make sure to run dh_systemd_enable before dh_installinit and dh_systemd_start after dh_installinit

Trusty ships debhelper 9.20131227ubuntu1; once we can drop support for Trusty, we will be able to remove this fix from d/rules (debian/rules).
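
The ordering rule quoted from the Debian wiki, and the compat-10 cutoff that keeps Trusty on the manual path, can be expressed as two small checks. This is an illustrative sketch only: `order_ok`, `debhelper_major` and `can_use_compat_10` are hypothetical helpers, and the version parsing reads only the leading numeric component of a Debian version string.

```python
import re


def order_ok(commands):
    """Check the plain-debhelper ordering quoted above:
    dh_systemd_enable before dh_installinit, dh_systemd_start after it."""
    try:
        enable = commands.index("dh_systemd_enable")
        init = commands.index("dh_installinit")
        start = commands.index("dh_systemd_start")
    except ValueError:
        return False  # one of the three steps is missing
    return enable < init < start


def debhelper_major(version):
    """Leading numeric component of a Debian version, e.g. '9.20131227ubuntu1' -> 9."""
    match = re.match(r"(?:\d+:)?(\d+)", version)  # skip an optional epoch such as "1:"
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return int(match.group(1))


def can_use_compat_10(version):
    """True if this debhelper is new enough for compat level 10."""
    return debhelper_major(version) >= 10


print(order_ok(["dh_systemd_enable", "dh_installinit", "dh_systemd_start"]))  # True
print(order_ok(["dh_installinit", "dh_systemd_enable", "dh_systemd_start"]))  # False
print(can_use_compat_10("9.20131227ubuntu1"))  # False: Trusty's debhelper
```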
