Bug #19585
Closed
monitors don't come up after reboot on xenial
Added by Vasu Kulkarni about 7 years ago.
Updated almost 7 years ago.
Description
1) set up a ceph cluster with ceph-deploy
2) reboot; the mon doesn't come up, but the other daemons have no issue
http://qa-proxy.ceph.com/teuthology/vasu-2017-04-11_21:01:29-smoke-master---basic-vps/1013452/teuthology.log
2017-04-11T21:26:42.311 DEBUG:teuthology.misc:waited 2.28795719147
2017-04-11T21:26:43.312 INFO:teuthology.misc:trying to connect to ubuntu@vpm175.front.sepia.ceph.com
2017-04-11T21:26:43.313 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'vpm175.front.sepia.ceph.com', 'key_filename': ['/home/teuthworker/.ssh/id_rsa'], 'timeout': 60}
2017-04-11T21:26:43.464 INFO:teuthology.orchestra.run.vpm175:Running: 'true'
2017-04-11T21:26:43.688 DEBUG:teuthology.misc:waited 3.66490721703
2017-04-11T21:26:44.690 INFO:teuthology.orchestra.run.vpm017:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ceph 4139 1 0 21:25 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ceph 4883 1 0 21:25 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ubuntu 7094 7092 0 21:26 ? 00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ubuntu 7096 7094 0 21:26 ? 00:00:00 grep ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm057:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.821 INFO:teuthology.orchestra.run.vpm057.stdout:ubuntu 6408 6406 0 21:26 ? 00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.822 INFO:teuthology.orchestra.run.vpm057.stdout:ubuntu 6410 6408 0 21:26 ? 00:00:00 grep ceph
2017-04-11T21:26:44.822 INFO:teuthology.orchestra.run.vpm175:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm175.stdout:ceph 3925 1 0 21:25 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm175.stdout:ubuntu 7921 7919 0 21:26 ? 00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm175.stdout:ubuntu 7923 7921 0 21:26 ? 00:00:00 grep ceph
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm197:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.918 INFO:teuthology.orchestra.run.vpm197.stdout:ceph 2522 1 0 21:25 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.918 INFO:teuthology.orchestra.run.vpm197.stdout:ubuntu 6713 6711 0 21:26 ? 00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.918 INFO:teuthology.orchestra.run.vpm197.stdout:ubuntu 6715 6713 0 21:26 ? 00:00:00 grep ceph
2017-04-11T21:26:44.919 INFO:teuthology.orchestra.run.vpm057:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage sudo ceph --cluster ceph health'
2017-04-11T21:31:45.479 INFO:teuthology.misc.health.vpm057.stderr:2017-04-11 21:31:45.494424 7fe1756c5700 0 monclient(hunting): authenticate timed out after 300
2017-04-11T21:31:45.479 INFO:teuthology.misc.health.vpm057.stderr:2017-04-11 21:31:45.494466 7fe1756c5700 0 librados: client.admin authentication error (110) Connection timed out
2017-04-11T21:31:45.488 INFO:teuthology.misc.health.vpm057.stderr:error connecting to the cluster
Exactly the same with Kraken and Debian 8.
- Project changed from devops to Ceph
- Priority changed from High to Urgent
- Status changed from New to 12
- Priority changed from Urgent to Immediate
Confirmed this is a problem:
- ceph-deploy does systemctl enable ceph-mon@$hostname
- after reboot I see
root@smithi203:~# systemctl | grep ceph
ceph.target loaded active active ceph target allowing to start/stop all ceph*@.service instances at once
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Active: inactive (dead)
root@smithi203:~# systemctl enable ceph-mon@smithi203
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Active: inactive (dead)
root@smithi203:~# systemctl start ceph-mon@smithi203
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2017-06-28 19:27:46 UTC; 1s ago
Main PID: 6093 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@smithi203.service
`-6093 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph
Jun 28 19:27:46 smithi203 systemd[1]: Started Ceph cluster monitor daemon.
- if I reboot, it is the same again (enabled but not started)
??
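The pattern above (unit enabled but inactive after reboot) can be narrowed down by checking the whole enablement chain, not just the instance unit. A minimal sketch, assuming the standard upstream Ceph unit names, where ceph-mon@.service is wanted by ceph-mon.target, which in turn is wanted by ceph.target:

```shell
# Check enablement of the mon instance and of both targets that are
# supposed to pull it in at boot. All three must report "enabled" for
# the mon to start automatically after a reboot.
systemctl is-enabled "ceph-mon@$(hostname)" ceph-mon.target ceph.target
```

If ceph-mon.target reports "disabled", the enabled ceph-mon@ instance has nothing pulling it in at boot, which would match the behavior seen here.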
# /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph
Corruption: VersionEdit: unknown tag
2017-07-06 15:20:55.802529 7f5df0490700 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-smithi203': (22) Invalid argument
Seems the mon store of mon.smithi203 is corrupted.
On the ceph-deploy node, I just did
502 rm ceph.*
503 ./ceph-deploy new smithi203
504 ./ceph-deploy purge smithi203
505 ./ceph-deploy purgedata smithi203
506 ./ceph-deploy purge smithi203
507 ./ceph-deploy install --dev master smithi203
508 ./ceph-deploy mon create-initial
509 ./ceph-deploy admin smithi203
and after that, on smithi203 I see
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2017-07-07 03:04:03 UTC; 51s ago
Main PID: 16140 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@smithi203.service
`-16140 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph
Jul 07 03:04:03 smithi203 systemd[1]: Started Ceph cluster monitor daemon.
root@smithi203:~# reboot
After reboot,
ubuntu@smithi203:~$ systemctl | grep ceph
ceph.target loaded active active ceph target allowing to start/stop all ceph*@.service instances at once
ubuntu@smithi203:~$ sudo bash
root@smithi203:~# systemctl | grep ceph
ceph.target loaded active active ceph target allowing to start/stop all ceph*@.service instances at once
root@smithi203:~# systemctl status ceph.target
* ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph.target; enabled; vendor preset: enabled)
Active: active since Fri 2017-07-07 03:06:35 UTC; 21s ago
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Active: inactive (dead)
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
Active: inactive (dead)
root@smithi203:~# systemctl start ceph.target
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
Active: inactive (dead)
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Active: inactive (dead)
root@smithi203:~# ps ax|grep ceph-
5592 pts/0 S+ 0:00 grep --color=auto ceph-
root@smithi203:~# systemctl restart ceph.target
root@smithi203:~# ps ax|grep ceph-
root@smithi203:~# ps ax|grep ceph-
5598 pts/0 S+ 0:00 grep --color=auto ceph-
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Active: inactive (dead)
root@smithi203:~# systemctl status ceph.target
* ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph.target; enabled; vendor preset: enabled)
Active: active since Fri 2017-07-07 03:07:39 UTC; 8s ago
Jul 07 03:07:39 smithi203 systemd[1]: Stopped target ceph target allowing to start/stop all ceph*@.service instances at once.
Jul 07 03:07:39 smithi203 systemd[1]: Stopping ceph target allowing to start/stop all ceph*@.service instances at once.
Jul 07 03:07:39 smithi203 systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
Active: inactive (dead)
root@smithi203:~# systemctl restart ceph-mon.target
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
Active: active since Fri 2017-07-07 03:07:56 UTC; 1s ago
Jul 07 03:07:56 smithi203 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2017-07-07 03:07:56 UTC; 7s ago
Main PID: 5608 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@smithi203.service
`-5608 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph
Jul 07 03:07:56 smithi203 systemd[1]: Started Ceph cluster monitor daemon.
root@smithi203:~# ps ax|grep cpeh-
5650 pts/0 S+ 0:00 grep --color=auto cpeh-
root@smithi203:~# ps ax|grep ceph-
5608 ? Ssl 0:00 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph
5652 pts/0 S+ 0:00 grep --color=auto ceph-
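The transcript above points at the root cause: ceph-mon@smithi203 is enabled, but ceph-mon.target itself is disabled, so nothing pulls the mon in at boot, and restarting ceph.target alone does not start it. A minimal sketch of the workaround on an affected node, assuming the standard unit names:

```shell
# Enable the intermediate target so the boot-time dependency chain is
# complete: ceph.target -> ceph-mon.target -> ceph-mon@<id>.service
sudo systemctl enable ceph-mon.target
# The instance itself is already enabled by ceph-deploy; starting the
# target now brings the mon up without waiting for a reboot.
sudo systemctl start ceph-mon.target
```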
- Status changed from 12 to 7
- Status changed from 7 to Resolved
See https://wiki.debian.org/Teams/pkg-systemd/Packaging#Using_debhelper_with_dh_systemd
New in debhelper compat 10: dh-systemd is automatically enabled if you're using the dh sequencer.
If you are using plain debhelper, make sure to run dh_systemd_enable before dh_installinit and dh_systemd_start after dh_installinit.
Trusty ships debhelper 9.20131227ubuntu1, so once we can drop support for trusty, we will be able to remove this fix from d/rules.
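For the packaging side, the d/rules fix can be sketched as follows (a hypothetical debian/rules fragment, assuming the dh sequencer on debhelper < 10, where the systemd addon from the dh-systemd package must be requested explicitly):

```makefile
# debian/rules (sketch): on debhelper < 10 the systemd sequence addon is
# not enabled automatically, so request it so that dh_systemd_enable and
# dh_systemd_start run in the right order around dh_installinit.
# Requires Build-Depends: dh-systemd in debian/control.
%:
	dh $@ --with systemd
```

On compat 10 and later the `--with systemd` flag becomes unnecessary, which is why the fix can be dropped once trusty support goes away.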