Project

General

Profile

Actions

Bug #20171

closed

'ceph-create-keys[229798]: INFO:ceph-create-keys:ceph-mon admin socket not ready yet. ' in syslog

Added by Sage Weil almost 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description


Files

misc.log.gz (59.7 KB) misc.log.gz Nathan Cutler, 06/20/2017 08:11 AM
Actions #2

Updated by Nathan Cutler almost 7 years ago

This is also showing up (in the rgw suite) in jewel 10.2.8 integration testing: http://tracker.ceph.com/issues/19538#note-61

Actions #3

Updated by Nathan Cutler almost 7 years ago

This failure is plaguing the jewel nightlies something awful. Here is an example: http://qa-proxy.ceph.com/teuthology/teuthology-2017-06-19_02:00:02-rados-jewel-distro-basic-smithi/1299602/teuthology.log

There is no mds specified in the roles yaml, yet it seems that something is repeatedly trying (and failing) to start ceph-mds and logging messages like this:

2017-06-19T02:10:11.934493+00:00 smithi097 systemd[1]: Started Ceph metadata server daemon.
2017-06-19T02:10:11.934828+00:00 smithi097 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mds@.service instances at once.
2017-06-19T02:10:11.967845+00:00 smithi097 ceph-mds[25452]: 2017-06-19 02:10:11.967680 7fc297b6e200 -1 did not load config file, using default settings.
2017-06-19T02:10:11.968772+00:00 smithi097 ceph-mds[25452]: starting mds.smithi097 at :/0
2017-06-19T02:10:11.970751+00:00 smithi097 ceph-mds[25452]: no monitors specified to connect to.
2017-06-19T02:10:11.990611+00:00 smithi097 systemd[1]: ceph-mds@smithi097.service: Main process exited, code=exited, status=255/n/a
2017-06-19T02:10:11.990928+00:00 smithi097 systemd[1]: ceph-mds@smithi097.service: Unit entered failed state.
2017-06-19T02:10:11.991108+00:00 smithi097 systemd[1]: ceph-mds@smithi097.service: Failed with result 'exit-code'.
2017-06-19T02:10:12.300674+00:00 smithi097 ceph-create-keys[25195]: admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
2017-06-19T02:10:12.307489+00:00 smithi097 ceph-create-keys[25195]: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
2017-06-19T02:10:12.318338+00:00 smithi097 systemd[1]: ceph-mds@smithi097.service: Service hold-off time over, scheduling restart.
2017-06-19T02:10:12.318737+00:00 smithi097 systemd[1]: Stopped Ceph metadata server daemon.
2017-06-19T02:10:12.334574+00:00 smithi097 systemd[1]: Started Ceph metadata server daemon.
2017-06-19T02:10:12.370851+00:00 smithi097 ceph-mds[25464]: 2017-06-19 02:10:12.370690 7f0fa41ef200 -1 did not load config file, using default settings.
2017-06-19T02:10:12.371836+00:00 smithi097 ceph-mds[25464]: starting mds.smithi097 at :/0
2017-06-19T02:10:12.373765+00:00 smithi097 ceph-mds[25464]: no monitors specified to connect to.

I'm attaching the full misc.log.gz

Actions #4

Updated by Nathan Cutler almost 7 years ago

The first instance of the error happens after ceph-mon@smithi097 fails to start:

2017-06-19T02:09:59.920684+00:00 smithi097 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mds@.service instances at once.
2017-06-19T02:09:59.920858+00:00 smithi097 systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
2017-06-19T02:10:00.004050+00:00 smithi097 ceph-mds[25199]: 2017-06-19 02:10:00.003926 7f26bbb4f200 -1 did not load config file, using default settings.
2017-06-19T02:10:00.004875+00:00 smithi097 ceph-mds[25199]: starting mds.smithi097 at :/0
2017-06-19T02:10:00.005982+00:00 smithi097 ceph-mon[25197]: 2017-06-19 02:10:00.005777 7fb5f4af1700 -1 did not load config file, using default settings.
2017-06-19T02:10:00.006359+00:00 smithi097 ceph-mon[25197]: monitor data directory at '/var/lib/ceph/mon/ceph-smithi097' does not exist: have you run 'mkfs'?
2017-06-19T02:10:00.026394+00:00 smithi097 systemd[1]: ceph-mon@smithi097.service: Main process exited, code=exited, status=1/FAILURE
2017-06-19T02:10:00.026629+00:00 smithi097 systemd[1]: ceph-mon@smithi097.service: Unit entered failed state.
2017-06-19T02:10:00.026714+00:00 smithi097 systemd[1]: ceph-mon@smithi097.service: Failed with result 'exit-code'.
2017-06-19T02:10:00.421705+00:00 smithi097 ceph-create-keys[25195]: admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
2017-06-19T02:10:00.430523+00:00 smithi097 ceph-create-keys[25195]: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
Actions #5

Updated by Nathan Cutler almost 7 years ago

The test has two nodes, one running Ubuntu 16.04 (smithi097) and the other (smithi035) running CentOS. The "ceph-create-keys" syslog messages occur only on the Ubuntu 16.04 node.

Actions #6

Updated by Nathan Cutler almost 7 years ago

On the CentOS node, ceph-mds.target is started by %post when the ceph-mds RPM is installed, but no ceph-mds@.service unit is started and hence the ceph-mds daemon is not started, either.

Contrast this with Xenial, which does [ -x /sbin/start ] && start ceph-mds-all || : in ceph-mds.postinst . . . apparently "start ceph-mds-all" starts ceph-mds@.service, which starts the ceph-mds daemon.

Anyway, the ceph-create-keys message will not appear in anything later than jewel because of https://github.com/ceph/ceph/pull/9345

Actions #7

Updated by Nathan Cutler almost 7 years ago

In IRC, Kefu suggested that "another way to fix/workaround this is to prevent the postinst script from starting ceph-mds"

I agree that would resolve this particular issue, but I'd be concerned that users might be relying on this "autostart on install" behavior.

Actions #8

Updated by Kefu Chai almost 7 years ago

diff --git a/debian/rules b/debian/rules
index d4b3c97f71..2fbf419f92 100755
--- a/debian/rules
+++ b/debian/rules
@@ -132,7 +132,9 @@ override_dh_installinit:
        dh_installinit -p radosgw --no-start

 override_dh_systemd_start:
-       dh_systemd_start --no-restart-on-upgrade
+       dh_systemd_start -p ceph-mds --no-restart-on-upgrade --no-start
+       dh_systemd_start --remaining-packages --no-restart-on-upgrade

but ceph-create-keys is depended by ceph-mon@.service not ceph-mds@.service. but ceph-create-keys expects a running cluster. sounds like a egg-and-hen problem.

Actions #9

Updated by Kefu Chai almost 7 years ago

Nathan, i just wanted to raise the intention that we might need to patch d/rules also.

Actions #10

Updated by Kefu Chai almost 7 years ago

  • Status changed from 12 to Fix Under Review
Actions #11

Updated by Kefu Chai almost 7 years ago

  • Project changed from Ceph to teuthology
Actions #12

Updated by Nathan Cutler almost 7 years ago

I think Kefu nailed it. Before, the logs said:

2017-06-22T01:18:47.970740+00:00 smithi097 ceph-create-keys: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

and these were ignored by teuthology via grep -v 'ceph-create-keys: INFO'

But then Xenial syslog started inserting the PID like this:

2017-06-22T01:18:47.970740+00:00 smithi097 ceph-create-keys[25231]: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.

and the grep no longer matched.

Actions #13

Updated by Kefu Chai almost 7 years ago

  • Status changed from Fix Under Review to Resolved
  • Assignee set to Kefu Chai
Actions

Also available in: Atom PDF