Bug #20171
Closed: "ceph-create-keys[229798]: INFO:ceph-create-keys:ceph-mon admin socket not ready yet." in syslog
Description
http://pulpito.ceph.com/sage-2017-06-02_16:49:20-upgrade:jewel-x-master---basic-smithi/1256758
http://pulpito.ceph.com/sage-2017-06-02_16:49:20-upgrade:jewel-x-master---basic-smithi/1256764
zgrep ceph-create-keys remote/*/syslog/misc.log* | wc -l
2470
Seems to be upgrade/jewel-x/parallel when run on xenial.
Updated by Yuri Weinstein almost 7 years ago
Updated by Nathan Cutler almost 7 years ago
This is also showing up (in the rgw suite) in jewel 10.2.8 integration testing: http://tracker.ceph.com/issues/19538#note-61
Updated by Nathan Cutler almost 7 years ago
- File misc.log.gz misc.log.gz added
This failure is plaguing the jewel nightlies something awful. Here is an example: http://qa-proxy.ceph.com/teuthology/teuthology-2017-06-19_02:00:02-rados-jewel-distro-basic-smithi/1299602/teuthology.log
There is no mds specified in the roles yaml, yet it seems that something is repeatedly trying (and failing) to start ceph-mds and logging messages like this:
2017-06-19T02:10:11.934493+00:00 smithi097 systemd[1]: Started Ceph metadata server daemon.
2017-06-19T02:10:11.934828+00:00 smithi097 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mds@.service instances at once.
2017-06-19T02:10:11.967845+00:00 smithi097 ceph-mds[25452]: 2017-06-19 02:10:11.967680 7fc297b6e200 -1 did not load config file, using default settings.
2017-06-19T02:10:11.968772+00:00 smithi097 ceph-mds[25452]: starting mds.smithi097 at :/0
2017-06-19T02:10:11.970751+00:00 smithi097 ceph-mds[25452]: no monitors specified to connect to.
2017-06-19T02:10:11.990611+00:00 smithi097 systemd[1]: ceph-mds@smithi097.service: Main process exited, code=exited, status=255/n/a
2017-06-19T02:10:11.990928+00:00 smithi097 systemd[1]: ceph-mds@smithi097.service: Unit entered failed state.
2017-06-19T02:10:11.991108+00:00 smithi097 systemd[1]: ceph-mds@smithi097.service: Failed with result 'exit-code'.
2017-06-19T02:10:12.300674+00:00 smithi097 ceph-create-keys[25195]: admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
2017-06-19T02:10:12.307489+00:00 smithi097 ceph-create-keys[25195]: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
2017-06-19T02:10:12.318338+00:00 smithi097 systemd[1]: ceph-mds@smithi097.service: Service hold-off time over, scheduling restart.
2017-06-19T02:10:12.318737+00:00 smithi097 systemd[1]: Stopped Ceph metadata server daemon.
2017-06-19T02:10:12.334574+00:00 smithi097 systemd[1]: Started Ceph metadata server daemon.
2017-06-19T02:10:12.370851+00:00 smithi097 ceph-mds[25464]: 2017-06-19 02:10:12.370690 7f0fa41ef200 -1 did not load config file, using default settings.
2017-06-19T02:10:12.371836+00:00 smithi097 ceph-mds[25464]: starting mds.smithi097 at :/0
2017-06-19T02:10:12.373765+00:00 smithi097 ceph-mds[25464]: no monitors specified to connect to.
I'm attaching the full misc.log.gz
Updated by Nathan Cutler almost 7 years ago
The first instance of the error happens after ceph-mon@smithi097 fails to start:
2017-06-19T02:09:59.920684+00:00 smithi097 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mds@.service instances at once.
2017-06-19T02:09:59.920858+00:00 smithi097 systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
2017-06-19T02:10:00.004050+00:00 smithi097 ceph-mds[25199]: 2017-06-19 02:10:00.003926 7f26bbb4f200 -1 did not load config file, using default settings.
2017-06-19T02:10:00.004875+00:00 smithi097 ceph-mds[25199]: starting mds.smithi097 at :/0
2017-06-19T02:10:00.005982+00:00 smithi097 ceph-mon[25197]: 2017-06-19 02:10:00.005777 7fb5f4af1700 -1 did not load config file, using default settings.
2017-06-19T02:10:00.006359+00:00 smithi097 ceph-mon[25197]: monitor data directory at '/var/lib/ceph/mon/ceph-smithi097' does not exist: have you run 'mkfs'?
2017-06-19T02:10:00.026394+00:00 smithi097 systemd[1]: ceph-mon@smithi097.service: Main process exited, code=exited, status=1/FAILURE
2017-06-19T02:10:00.026629+00:00 smithi097 systemd[1]: ceph-mon@smithi097.service: Unit entered failed state.
2017-06-19T02:10:00.026714+00:00 smithi097 systemd[1]: ceph-mon@smithi097.service: Failed with result 'exit-code'.
2017-06-19T02:10:00.421705+00:00 smithi097 ceph-create-keys[25195]: admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
2017-06-19T02:10:00.430523+00:00 smithi097 ceph-create-keys[25195]: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
Updated by Nathan Cutler almost 7 years ago
The test has two nodes, one running Ubuntu 16.04 (smithi097) and the other (smithi035) running CentOS. The "ceph-create-keys" syslog messages occur only on the Ubuntu 16.04 node.
Updated by Nathan Cutler almost 7 years ago
On the CentOS node, ceph-mds.target is started by %post when the ceph-mds RPM is installed, but no ceph-mds@.service unit is started and hence the ceph-mds daemon is not started, either.
Contrast this with Xenial, whose ceph-mds.postinst does:

[ -x /sbin/start ] && start ceph-mds-all || :

Apparently "start ceph-mds-all" starts ceph-mds@.service, which starts the ceph-mds daemon.
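The postinst line is the usual upstart-compat guard: attempt the start only when upstart's /sbin/start is present, and never let a failure abort package installation. A minimal sketch of the idiom (the /sbin/start path is from the postinst; the echo is illustrative):

```shell
#!/bin/sh
set -e  # maintainer scripts typically run with errexit
# '[ -x CMD ] && CMD ... || :' runs CMD only if it exists and is executable;
# the trailing '|| :' forces an overall exit status of 0 either way, so a
# script running under 'set -e' does not abort when CMD is missing or fails.
[ -x /sbin/start ] && /sbin/start ceph-mds-all || :
echo "postinst continues regardless"
```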
Anyway, the ceph-create-keys message will not appear in anything later than jewel because of https://github.com/ceph/ceph/pull/9345
Updated by Nathan Cutler almost 7 years ago
In IRC, Kefu suggested that "another way to fix/workaround this is to prevent the postinst script from starting ceph-mds"
I agree that would resolve this particular issue, but I'd be concerned that users might be relying on this "autostart on install" behavior.
Updated by Kefu Chai almost 7 years ago
diff --git a/debian/rules b/debian/rules
index d4b3c97f71..2fbf419f92 100755
--- a/debian/rules
+++ b/debian/rules
@@ -132,7 +132,9 @@ override_dh_installinit:
 	dh_installinit -p radosgw --no-start
 
 override_dh_systemd_start:
-	dh_systemd_start --no-restart-on-upgrade
+	dh_systemd_start -p ceph-mds --no-restart-on-upgrade --no-start
+	dh_systemd_start --remaining-packages --no-restart-on-upgrade
But ceph-create-keys is a dependency of ceph-mon@.service, not ceph-mds@.service, and ceph-create-keys expects a running cluster. Sounds like a chicken-and-egg problem.
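For reference, the coupling comes from the mon unit itself: in jewel, ceph-mon@.service spawned ceph-create-keys as a post-start step. A rough sketch of the relevant part of the unit (approximate, not an exact copy of the shipped file; the PR linked above removed this step after jewel):

```ini
[Service]
ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
# ExecStartPost launches ceph-create-keys, which polls the mon admin socket
# until it answers -- hence the repeated "admin socket not ready yet"
# messages whenever the mon itself fails to start.
ExecStartPost=/usr/sbin/ceph-create-keys --cluster ${CLUSTER} --id %i
```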
Updated by Kefu Chai almost 7 years ago
Nathan, I just wanted to raise the point that we might need to patch d/rules as well.
Updated by Kefu Chai almost 7 years ago
- Status changed from 12 to Fix Under Review
Updated by Kefu Chai almost 7 years ago
- Project changed from Ceph to teuthology
Updated by Nathan Cutler almost 7 years ago
I think Kefu nailed it. Before, the logs said:
2017-06-22T01:18:47.970740+00:00 smithi097 ceph-create-keys: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
and these were ignored by teuthology via grep -v 'ceph-create-keys: INFO'
But then Xenial syslog started inserting the PID like this:
2017-06-22T01:18:47.970740+00:00 smithi097 ceph-create-keys[25231]: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
and the grep no longer matched.
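The mismatch is easy to reproduce. The sample lines are taken from the logs above; the PID-tolerant pattern at the end is just an illustration of one way to match both formats, not necessarily the fix that was merged:

```shell
#!/bin/sh
old='smithi097 ceph-create-keys: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.'
new='smithi097 ceph-create-keys[25231]: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.'

# The original filter suppresses the old (PID-less) format...
printf '%s\n' "$old" | grep -v 'ceph-create-keys: INFO' || true

# ...but the PID breaks the fixed-string match, so the Xenial format
# passes through the filter and trips the syslog check:
printf '%s\n' "$new" | grep -v 'ceph-create-keys: INFO'

# A pattern that tolerates an optional [PID] suppresses both formats:
printf '%s\n%s\n' "$old" "$new" | grep -vE 'ceph-create-keys(\[[0-9]+\])?: INFO' || true
```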
Updated by Kefu Chai almost 7 years ago
- Status changed from Fix Under Review to Resolved
- Assignee set to Kefu Chai