Bug #5369

fedora18: sysvinit doesn't start mon on reboot

Added by Sage Weil almost 11 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Sandon Van Ness
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

mon log indicates it can't bind to the ip, suggesting it is starting before the network is up. however, note that the init script's LSB header already lists $network in Required-Start:

### BEGIN INIT INFO
# Provides:          ceph
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Required-Start:    $remote_fs $named $network $time
# Required-Stop:     $remote_fs $named $network $time
# Short-Description: Start Ceph distributed file system daemons at boot time
# Description:       Enable Ceph distributed file system services.
### END INIT INFO

so... dunno.
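A minimal sketch for confirming what an init script actually declares in its LSB header, assuming a POSIX shell and sed. The helper names `lsb_required_start` and `lsb_requires` are hypothetical and not part of ceph. It only reads the file; it does not show how systemd translated the header into unit ordering.

```shell
#!/bin/sh
# Print the facilities listed on the "# Required-Start:" line of an LSB
# init script header. Hypothetical helper, for illustration only.
lsb_required_start() {
    # Restrict the search to the BEGIN/END INIT INFO block.
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$1" |
        sed -n 's/^#[[:space:]]*Required-Start:[[:space:]]*//p'
}

# Succeed if the script lists the given facility, e.g. '$network'.
lsb_requires() {
    for facility in $(lsb_required_start "$1"); do
        [ "$facility" = "$2" ] && return 0
    done
    return 1
}
```

Usage, assuming the script is installed as /etc/init.d/ceph: `lsb_requires /etc/init.d/ceph '$network' && echo "requires \$network"`. Note the single quotes, which keep the shell from expanding the facility name.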

History

#1 Updated by Sage Weil over 10 years ago

(06:25:53 PM) mbiebl: systemctl enable NetworkManager-wait-online.service

here's the full exchange:


(06:03:02 PM) Topic for #systemd set by poettering!~poetterin@tango.0pointer.de at 08:56:57 AM on 05/09/2013
(06:03:25 PM) sagelap: does systemd completely ignore the Required-Start: $network line in the LSB init section of sysvinit scripts?
(06:05:13 PM) ohsix: nope
(06:05:32 PM) ohsix: it's ordered after network.target, iirc systemd.special lists them
(06:07:50 PM) sagelap: ohsix: hmm, i have a fedora18 system with ' # Required-Start:    $remote_fs $named $network $time' and my daemon is failed to bind.  maybe it's an older systemd version?
(06:08:55 PM) ohsix: no, that stuff has been in there almost as it is from very early, what's 'failed to bind'? is it trying to use a specific address?
(06:09:11 PM) sagelap: yeah
(06:09:31 PM) sagelap: it's trying to bind to a specific port/address (the machines main ip)
(06:10:06 PM) ohsix: if you start it after the machine is totally booted does it work? if yes, what's systemctl status network.target say
(06:10:28 PM) sagelap: it's fine after a full boot, yeah.  checking..
(06:13:05 PM) sagelap: network.target - Network
(06:13:05 PM) sagelap:           Loaded: loaded (/usr/lib/systemd/system/network.target; static)
(06:13:05 PM) sagelap:           Active: active since Mon, 2013-07-01 18:11:57 PDT; 24s ago
(06:13:05 PM) sagelap:             Docs: man:systemd.special(7)
(06:14:12 PM) sagelap: network seems to only start right near the end of bootup (right around when network manager starts).  maybe this machine's networking is not configured in the normal way?  (i'm not usually a fedora user :/)
(06:14:33 PM) ohsix: dunno, don't use fedora and don't know what exceptions may apply with respect to that, there are lots of people from redhat here tho
(06:20:34 PM) mbiebl: sagelap: are you using NetworkManager?
(06:23:47 PM) sagelap: i guess so..  this is a headless machine that someone else installed in our lab, so i wouldn't have expected it, but the process is running
(06:24:11 PM) sagelap: the files in /etc/sysconfig/network-scripts/ifcfg-* define the static ips
(06:24:57 PM) mbiebl: if NetworkManager is in use, you should enable the NetworkManager-wait-online service
(06:25:14 PM) sagelap: aha, how is that done?
(06:25:53 PM) mbiebl: systemctl enable NetworkManager-wait-online.service
(06:26:56 PM) sagelap: and that essentially makes the $network conditional also wait for NM to be started?
(06:27:56 PM) mbiebl: that make NetworkManager pull the network.target
(06:27:57 PM) ohsix: nah, $network stays the same, but it makes network.target actually wait until -wait-online finishes
(06:28:05 PM) mbiebl: and run nm-online
(06:29:42 PM) heftig: i think nm-online shouldn't be before network.target, only before network-online.target
(06:30:37 PM) sagelap: hrm, didn't seem to work.
(06:32:00 PM) mbiebl: heftig: network-online is something pretty new
(06:32:51 PM) heftig: though i'm not sure whether NM provides a mechanism suitable for only network.target
(06:32:57 PM) sagelap: is there a systemctl command that will dump state so i can log it when teh sysvinit script runs?
(06:33:14 PM) heftig: for that i'd want to wait until all 'static' interfaces are configured
(06:33:31 PM) heftig: nm-online potentially waits for a http probe, which can take a long time
(06:33:49 PM) sagelap: (fwiw in this case i just want to wait for static ips)
(06:34:05 PM) mbiebl: heftig: it doesn't by default
(06:34:11 PM) mbiebl: only if explicitly configured
(06:34:15 PM) ohsix: ERR
(06:34:28 PM) ohsix: heh there's a sysctl to let services to bind to anything, just set that
(06:34:37 PM) heftig: 7.262s NetworkManager-wait-online.service
(06:35:07 PM) heftig: even without, that service takes the most time to start, by far
(06:35:11 PM) mbiebl: heftig: nm-online only waits until the device has acquired and ip address and routes etc are setup
(06:35:39 PM) ohsix: this was before 'ERR' but went to the wrong channel: can the service not have a unit written? can it be socket activated? can it bind *?
(06:36:23 PM) mbiebl: the connectivity check you are referring to, is off by default, as said
(06:36:29 PM) mbiebl: heftig: and yeah, you should only use NM-wait-online
(06:36:41 PM) mbiebl: if you have services, which don't deal with networking changes properly
(06:36:49 PM) ohsix: net.ipv4.ip_nonlocal_bind
(06:36:57 PM) heftig: mbiebl: that's sensible, then. still, i think network-online.target should imply a successful probe, so NM needs two waiting services
(06:37:17 PM) ohsix: 'online' is just as fuzzy as 'network' :<
(06:37:32 PM) ohsix: problems arise when you think it implies something it doesn't
(06:44:50 PM) mbiebl: heftig: too be honest, I actually don't know what the distinction between network.target and network-online.target is in newer releases
(06:45:06 PM) mbiebl: wasn't that introduced in v204 or so?
(06:45:29 PM) heftig: http://www.freedesktop.org/software/systemd/man/systemd.special.html#network-online.target
(06:46:04 PM) mbiebl: and as ohsix said, the concept of "network online" is a broken one anyway
(06:46:41 PM) heftig: mbiebl: i guess this means is that network-online.target should want NetworkManager-wait-online.service
(06:46:59 PM) heftig: so that it's only pulled in when a unit wants network-online.target
(06:47:05 PM) mbiebl: it's still fuzzy
(06:47:18 PM) heftig: unlike the network.target mechanism, where the dependencies are the other way around
(06:47:27 PM) ohsix: mbiebl: you're going to start making me feel bad when i annoy you!  :D
(06:49:23 PM) ohsix: a service that needs a 'network' that strictly doesn't just start and not care usually implies being able to reach things they probably won't be able to, nothing can really help that but people keep making the same mistakes overloading the meaning of network access
(06:49:58 PM) ohsix: a seperate heartbeat target or something might be interesting, and NM-wait-online checking something on the internet is probably really close to everything, but still
(06:52:18 PM) ohsix: even what -wait-online might know relatively well within a few minutes of it happening it doesn't embody that it might not be true in even a few moments
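
ohsix's net.ipv4.ip_nonlocal_bind suggestion in the log above could be tried roughly as follows. This is a sketch only, untested against ceph-mon: the helper name and the drop-in file name are hypothetical, and the setting works around the ordering problem rather than fixing it, since it lets a process bind to an IPv4 address that is not configured on any interface yet.

```shell
#!/bin/sh
# Write a sysctl drop-in that enables net.ipv4.ip_nonlocal_bind.
# The default path is a hypothetical file name chosen for this example.
write_nonlocal_bind_conf() {
    conf="${1:-/etc/sysctl.d/90-nonlocal-bind.conf}"
    printf '%s\n' 'net.ipv4.ip_nonlocal_bind = 1' > "$conf"
}

# To apply the setting immediately on a running system (as root):
#   sysctl -w net.ipv4.ip_nonlocal_bind=1
```

Calling `write_nonlocal_bind_conf` with no argument (as root) writes the drop-in so the setting persists across reboots; passing a path writes the same line elsewhere, for example to inspect it first.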

#2 Updated by Sage Weil over 10 years ago

  • Status changed from New to 4

this is a network.service vs NetworkManager problem. with NetworkManager in use, the $network LSB dependency allegedly doesn't wait for NetworkManager to bring the network up unless you do

systemctl enable NetworkManager-wait-online

...except that didn't work for me, and NetworkManager-wait-online doesn't appear in systemctl -a. instead i just switched to the regular network service,

  1. chkconfig network on
  2. chkconfig NetworkManager off
  3. service NetworkManager stop

and all is well.

so i don't think there is anything for us to fix here, except to make sure that our cloud-init images and/or ceph-qa-chef do this on fedora. maybe a faq?
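
For image or provisioning scripts, the switch described above could be wrapped along these lines. A sketch only: the function names and the DRY_RUN variable are hypothetical, and the commands are the same three chkconfig/service calls listed in this comment. Starting the network service by hand is not shown in this comment, so it is not included here either.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (hypothetical convention).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Switch a Fedora 18 host from NetworkManager to the regular network
# service. Stops at the first command that fails.
use_legacy_network_service() {
    run chkconfig network on &&
    run chkconfig NetworkManager off &&
    run service NetworkManager stop
}
```

Running `DRY_RUN=1 use_legacy_network_service` prints the three commands without executing them, which is a cheap way to review the change before applying it as root.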

#3 Updated by Sage Weil over 10 years ago

  • Assignee set to Sandon Van Ness

sandon, can you make sure our images or chef or whatever is updated to avoid this pitfall?

#4 Updated by Sandon Van Ness over 10 years ago

  • Status changed from 4 to Resolved

chkconfig --list did not list NetworkManager, which threw me off a bit, but it was indeed enabled. I tried disabling it with Chef's proper 'service' method, but although Chef reported that it was disabling the service, it was not, so I ended up using 'execute' to run chkconfig manually.

This is fixed in commit:

d5c9a70439bb9786ccf514b1d41e71db0cf7d79d

of ceph-qa-chef.
