Bug #19585
monitors don't come up after reboot on xenial
Status: Closed
Description
1) set up a ceph cluster with ceph-deploy
2) reboot; the mon doesn't come up, but the other daemons have no issue
2017-04-11T21:26:42.311 DEBUG:teuthology.misc:waited 2.28795719147
2017-04-11T21:26:43.312 INFO:teuthology.misc:trying to connect to ubuntu@vpm175.front.sepia.ceph.com
2017-04-11T21:26:43.313 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'vpm175.front.sepia.ceph.com', 'key_filename': ['/home/teuthworker/.ssh/id_rsa'], 'timeout': 60}
2017-04-11T21:26:43.464 INFO:teuthology.orchestra.run.vpm175:Running: 'true'
2017-04-11T21:26:43.688 DEBUG:teuthology.misc:waited 3.66490721703
2017-04-11T21:26:44.690 INFO:teuthology.orchestra.run.vpm017:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ceph 4139 1 0 21:25 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ceph 4883 1 0 21:25 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ubuntu 7094 7092 0 21:26 ? 00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm017.stdout:ubuntu 7096 7094 0 21:26 ? 00:00:00 grep ceph
2017-04-11T21:26:44.741 INFO:teuthology.orchestra.run.vpm057:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.821 INFO:teuthology.orchestra.run.vpm057.stdout:ubuntu 6408 6406 0 21:26 ? 00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.822 INFO:teuthology.orchestra.run.vpm057.stdout:ubuntu 6410 6408 0 21:26 ? 00:00:00 grep ceph
2017-04-11T21:26:44.822 INFO:teuthology.orchestra.run.vpm175:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm175.stdout:ceph 3925 1 0 21:25 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm175.stdout:ubuntu 7921 7919 0 21:26 ? 00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm175.stdout:ubuntu 7923 7921 0 21:26 ? 00:00:00 grep ceph
2017-04-11T21:26:44.865 INFO:teuthology.orchestra.run.vpm197:Running: 'sudo ps -eaf | grep ceph'
2017-04-11T21:26:44.918 INFO:teuthology.orchestra.run.vpm197.stdout:ceph 2522 1 0 21:25 ? 00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
2017-04-11T21:26:44.918 INFO:teuthology.orchestra.run.vpm197.stdout:ubuntu 6713 6711 0 21:26 ? 00:00:00 bash -c sudo ps -eaf | grep ceph
2017-04-11T21:26:44.918 INFO:teuthology.orchestra.run.vpm197.stdout:ubuntu 6715 6713 0 21:26 ? 00:00:00 grep ceph
2017-04-11T21:26:44.919 INFO:teuthology.orchestra.run.vpm057:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage sudo ceph --cluster ceph health'
2017-04-11T21:31:45.479 INFO:teuthology.misc.health.vpm057.stderr:2017-04-11 21:31:45.494424 7fe1756c5700 0 monclient(hunting): authenticate timed out after 300
2017-04-11T21:31:45.479 INFO:teuthology.misc.health.vpm057.stderr:2017-04-11 21:31:45.494466 7fe1756c5700 0 librados: client.admin authentication error (110) Connection timed out
2017-04-11T21:31:45.488 INFO:teuthology.misc.health.vpm057.stderr:error connecting to the cluster
Updated by Марк Коренберг about 7 years ago
Exactly the same with Kraken and Debian 8.
Updated by Sage Weil about 7 years ago
- Project changed from devops to Ceph
- Priority changed from High to Urgent
Updated by Sage Weil almost 7 years ago
- Status changed from New to 12
- Priority changed from Urgent to Immediate
confirmed this is a problem;
- ceph-deploy does systemctl enable ceph-mon@$hostname
- after reboot i see
root@smithi203:~# systemctl | grep ceph
ceph.target loaded active active ceph target allowing to start/stop all ceph*@.service instances at once
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl enable ceph-mon@smithi203
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl start ceph-mon@smithi203
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2017-06-28 19:27:46 UTC; 1s ago
 Main PID: 6093 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@smithi203.service
           `-6093 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph

Jun 28 19:27:46 smithi203 systemd[1]: Started Ceph cluster monitor daemon.
- if i reboot it is the same (enabled but not started)
??
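The puzzle above can be narrowed down with a few checks. This is a sketch of the kind of interrogation that exposes the gap (the host name smithi203 is taken from the transcript; substitute your mon id): an instance unit can be "enabled" and still not start at boot if the target that is supposed to pull it in is itself disabled.

```shell
# Is the instance unit enabled? (it was, per the transcript)
systemctl is-enabled ceph-mon@smithi203

# Is the intermediate target enabled? In this bug it was not, so nothing
# pulled the instance in at boot even though the instance was "enabled".
systemctl is-enabled ceph-mon.target

# Show what ceph.target actually pulls in at boot; ceph-mon@smithi203
# will be missing from this tree if ceph-mon.target is disabled.
systemctl list-dependencies ceph.target
```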
Updated by Kefu Chai almost 7 years ago
# /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph
Corruption: VersionEdit: unknown tag
2017-07-06 15:20:55.802529 7f5df0490700 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-smithi203': (22) Invalid argument
seems the monstore of mon.smithi203 is corrupted.
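A way to confirm a suspected monstore corruption, sketched here rather than taken from the ticket: read the store directly with ceph-monstore-tool, bypassing the daemon. If the tool also fails to open or dump the store, the data on disk (not the systemd wiring) is the problem.

```shell
# Sketch, not a verified recovery procedure. Point the tool at the mon's
# data directory and try to dump its keys; a corrupted store will error
# out here just as the daemon did.
ceph-monstore-tool /var/lib/ceph/mon/ceph-smithi203 dump-keys

# If the store is unrecoverable and other mons are healthy, the usual way
# forward is to remove this mon from the monmap and re-add it so it
# resynchronizes from the surviving quorum.
```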
Updated by Sage Weil almost 7 years ago
on ceph-deploy node, i just did
  502  rm ceph.*
  503  ./ceph-deploy new smithi203
  504  ./ceph-deploy purge smithi203
  505  ./ceph-deploy purgedata smithi203
  506  ./ceph-deploy purge smithi203
  507  ./ceph-deploy install --dev master smithi203
  508  ./ceph-deploy mon create-initial
  509  ./ceph-deploy admin smithi203
and after that on smithi203 i see
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2017-07-07 03:04:03 UTC; 51s ago
 Main PID: 16140 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@smithi203.service
           `-16140 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph

Jul 07 03:04:03 smithi203 systemd[1]: Started Ceph cluster monitor daemon.
root@smithi203:~# reboot
after reboot,
ubuntu@smithi203:~$ systemctl | grep ceph
ceph.target loaded active active ceph target allowing to start/stop all ceph*@.service instances at once
ubuntu@smithi203:~$ sudo bash
root@smithi203:~# systemctl | grep ceph
ceph.target loaded active active ceph target allowing to start/stop all ceph*@.service instances at once
root@smithi203:~# systemctl status ceph.target
* ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph.target; enabled; vendor preset: enabled)
   Active: active since Fri 2017-07-07 03:06:35 UTC; 21s ago

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl start ceph.target
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# ps ax|grep ceph-
 5592 pts/0    S+     0:00 grep --color=auto ceph-
root@smithi203:~# systemctl restart ceph.target
root@smithi203:~# ps ax|grep ceph-
root@smithi203:~# ps ax|grep ceph-
 5598 pts/0    S+     0:00 grep --color=auto ceph-
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl status ceph.target
* ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph.target; enabled; vendor preset: enabled)
   Active: active since Fri 2017-07-07 03:07:39 UTC; 8s ago

Jul 07 03:07:39 smithi203 systemd[1]: Stopped target ceph target allowing to start/stop all ceph*@.service instances at once.
Jul 07 03:07:39 smithi203 systemd[1]: Stopping ceph target allowing to start/stop all ceph*@.service instances at once.
Jul 07 03:07:39 smithi203 systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
   Active: inactive (dead)
root@smithi203:~# systemctl restart ceph-mon.target
root@smithi203:~# systemctl status ceph-mon.target
* ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; disabled; vendor preset: enabled)
   Active: active since Fri 2017-07-07 03:07:56 UTC; 1s ago

Jul 07 03:07:56 smithi203 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
root@smithi203:~# systemctl status ceph-mon@smithi203
* ceph-mon@smithi203.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2017-07-07 03:07:56 UTC; 7s ago
 Main PID: 5608 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@smithi203.service
           `-5608 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph

Jul 07 03:07:56 smithi203 systemd[1]: Started Ceph cluster monitor daemon.
root@smithi203:~# ps ax|grep cpeh-
 5650 pts/0    S+     0:00 grep --color=auto cpeh-
root@smithi203:~# ps ax|grep ceph-
 5608 ?        Ssl    0:00 /usr/bin/ceph-mon -f --cluster ceph --id smithi203 --setuser ceph --setgroup ceph
 5652 pts/0    S+     0:00 grep --color=auto ceph-
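The transcript above shows the mon starting only once ceph-mon.target is brought up, while that target is marked "disabled". A workaround sketch consistent with that observation (not an official fix; smithi203 is the mon id from this ticket): enabling the intermediate target should make the chain ceph.target -> ceph-mon.target -> ceph-mon@<id> survive a reboot.

```shell
# Enable the intermediate target so boot-time activation of ceph.target
# actually reaches the mon instance units.
systemctl enable ceph-mon.target
systemctl enable ceph-mon@smithi203
# Bring the mon up now without waiting for a reboot.
systemctl start ceph-mon.target
```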
Updated by Sage Weil almost 7 years ago
- Status changed from 7 to Resolved
Updated by Kefu Chai almost 7 years ago
see https://wiki.debian.org/Teams/pkg-systemd/Packaging#Using_debhelper_with_dh_systemd
New in debhelper compat 10 is that dh-systemd is now automatically enabled if you're using dh sequencer!
3. If you are using plain debhelper, make sure to run dh_systemd_enable before dh_installinit and dh_systemd_start after dh_installinit
and trusty comes with debhelper 9.20131227ubuntu1; once we can drop support for trusty, we will be able to remove this fix from d/rules.
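For debhelper compat levels below 10, the systemd maintainer-script helpers have to be requested explicitly. A minimal debian/rules along the lines the wiki page describes might look like this (a sketch of the general pattern, not the actual ceph d/rules):

```makefile
#!/usr/bin/make -f
# Sketch for debhelper < 10 (e.g. trusty's 9.20131227ubuntu1): the systemd
# addon must be enabled by hand via --with systemd; debhelper compat 10
# runs it automatically when using the dh sequencer.
%:
	dh $@ --with systemd
```

With plain debhelper (no dh sequencer), the equivalent is calling dh_systemd_enable before dh_installinit and dh_systemd_start after it, as the quoted wiki text says.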