
Bug #7627

ceph-disk: does not start daemons properly under systemd

Added by Alfredo Deza over 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
03/06/2014
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

After creating a cluster with ceph-deploy, the test waits up to 15 minutes to get HEALTH_OK back, but as the output shows, the health command
starts reporting more and more OSDs down:

2014-03-05T03:00:00.069 DEBUG:teuthology.orchestra.run:Running [10.214.138.177]: 'cd /home/ubuntu/cephtest && sudo ceph health'
2014-03-05T03:00:00.326 DEBUG:teuthology.task.ceph-deploy:Ceph health: HEALTH_WARN 104 pgs degraded; 88 pgs incomplete; 104 pgs stale; 88 pgs stuck inactive; 192 pgs stuck unclean; 1/9 in osds are down
2014-03-05T03:00:10.327 DEBUG:teuthology.orchestra.run:Running [10.214.138.177]: 'cd /home/ubuntu/cephtest && sudo ceph health'
2014-03-05T03:00:10.586 DEBUG:teuthology.task.ceph-deploy:Ceph health: HEALTH_WARN 104 pgs degraded; 88 pgs incomplete; 192 pgs stale; 88 pgs stuck inactive; 192 pgs stuck unclean; 2/9 in osds are down
2014-03-05T03:00:20.586 DEBUG:teuthology.orchestra.run:Running [10.214.138.177]: 'cd /home/ubuntu/cephtest && sudo ceph health'
2014-03-05T03:00:20.857 DEBUG:teuthology.task.ceph-deploy:Ceph health: HEALTH_WARN 104 pgs degraded; 88 pgs incomplete; 192 pgs stale; 88 pgs stuck inactive; 192 pgs stuck unclean; 3/9 in osds are down
2014-03-05T03:00:30.857 DEBUG:teuthology.orchestra.run:Running [10.214.138.177]: 'cd /home/ubuntu/cephtest && sudo ceph health'
2014-03-05T03:00:31.119 DEBUG:teuthology.task.ceph-deploy:Ceph health: HEALTH_WARN 104 pgs degraded; 88 pgs incomplete; 192 pgs stale; 88 pgs stuck inactive; 192 pgs stuck unclean; 4/9 in osds are down
2014-03-05T03:00:41.120 DEBUG:teuthology.orchestra.run:Running [10.214.138.177]: 'cd /home/ubuntu/cephtest && sudo ceph health'
2014-03-05T03:00:41.385 DEBUG:teuthology.task.ceph-deploy:Ceph health: HEALTH_WARN 104 pgs degraded; 88 pgs incomplete; 192 pgs stale; 88 pgs stuck inactive; 192 pgs stuck unclean; 4/9 in osds are down
2014-03-05T03:00:51.385 DEBUG:teuthology.orchestra.run:Running [10.214.138.177]: 'cd /home/ubuntu/cephtest && sudo ceph health'
2014-03-05T03:00:51.624 DEBUG:teuthology.task.ceph-deploy:Ceph health: HEALTH_WARN 104 pgs degraded; 88 pgs incomplete; 192 pgs stale; 88 pgs stuck inactive; 192 pgs stuck unclean; 5/9 in osds are down

This seems to be the case only for Fedora 19 and Firefly.

Full log output http://qa-proxy.ceph.com/teuthology/teuthology-2014-03-05_01:10:15-ceph-deploy-firefly-distro-basic-vps/117053/teuthology.log

Other failures can be seen on the dashboard http://pulpito.ceph.com/teuthology-2014-03-05_01:10:15-ceph-deploy-firefly-distro-basic-vps/

Note that OSD log levels have been increased for this run (see the config values at the start of the log)

Attachment: boot.log (118 KB), added by John Spray, 07/16/2014 05:07 AM

Associated revisions

Revision 3e0d9800 (diff)
Added by Sage Weil about 3 years ago

init-ceph: wrap daemon startup with systemd-run when running under systemd

We want to make sure the daemon runs in its own systemd environment. Check
for systemd as pid 1 and, when present, use systemd-run -r <cmd> to do
this.

Probably fixes #7627

Signed-off-by: Sage Weil <>
Reviewed-by: Dan Mick <>
Tested-by: Dan Mick <>
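The check described in the commit message can be sketched roughly like this (init-ceph is a shell script, so a shell sketch; the function name below is ours, not the actual init-ceph code):

```shell
#!/bin/sh
# Hedged sketch of the commit's logic (wrap_daemon_cmd is a hypothetical
# name): if PID 1 is systemd, prefix the daemon command with
# `systemd-run -r` so the daemon gets its own transient unit instead of
# living in the caller's (e.g. ssh session's) cgroup.
wrap_daemon_cmd() {
    if [ "$(cat /proc/1/comm 2>/dev/null)" = "systemd" ]; then
        # -r (--remain-after-exit) keeps the transient unit around
        # after the forking daemon's parent exits.
        echo "systemd-run -r $*"
    else
        echo "$*"
    fi
}

wrap_daemon_cmd /usr/bin/ceph-osd -i 0
```

On a sysvinit host this prints the command unchanged; under systemd it prints it prefixed with `systemd-run -r`.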

Revision 67b5193f (diff)
Added by Sage Weil about 3 years ago

init-ceph: wrap daemon startup with systemd-run when running under systemd

We want to make sure the daemon runs in its own systemd environment. Check
for systemd as pid 1 and, when present, use systemd-run -r <cmd> to do
this.

Probably fixes #7627

Signed-off-by: Sage Weil <>
Reviewed-by: Dan Mick <>
Tested-by: Dan Mick <>
(cherry picked from commit 3e0d9800767018625f0e7d797c812aa44c426dab)

History

#1 Updated by Alfredo Deza over 3 years ago

  • Description updated (diff)

#2 Updated by Sage Weil over 3 years ago

  • Subject changed from OSDs start crashing after 15 minutes to filejournal crash on f19
  • Priority changed from Normal to Urgent
2014-03-05 10:45:23.549004 7f14ce44f7c0  0 filestore(/var/lib/ceph/osd/ceph-5) mount detected xfs (libxfs)
2014-03-05 10:45:23.549067 7f14ce44f7c0  1 filestore(/var/lib/ceph/osd/ceph-5)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2014-03-05 10:45:23.552358 7f14ce44f7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features: FIEMAP ioctl is supported and appears to work
2014-03-05 10:45:23.552401 7f14ce44f7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2014-03-05 10:45:23.554399 7f14ce44f7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2014-03-05 10:45:23.554496 7f14ce44f7c0  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-5) detect_feature: extsize is supported
2014-03-05 10:45:23.558626 7f14ce44f7c0  0 filestore(/var/lib/ceph/osd/ceph-5) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2014-03-05 10:45:23.568632 7f14ce44f7c0 -1 journal _check_disk_write_cache: pclose failed: (61) No data available
2014-03-05 10:45:23.568685 7f14ce44f7c0  1 journal _open /var/lib/ceph/osd/ceph-5/journal fd 21: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2014-03-05 10:45:23.573989 7f14ce44f7c0  1 journal close /var/lib/ceph/osd/ceph-5/journal
2014-03-05 10:45:23.575458 7f14ce44f7c0 -1 os/FileJournal.cc: In function 'virtual void FileJournal::close()' thread 7f14ce44f7c0 time 2014-03-05 10:45:23.574063
os/FileJournal.cc: 547: FAILED assert(fd >= 0)

 ceph version 0.77-708-g8fdfece (8fdfece9fd5419eeb1bc65b3ac4987f6e150fd9f)
 1: (FileJournal::close()+0x1b5) [0x905535]
 2: (JournalingObjectStore::journal_stop()+0x6a) [0x75b91a]
 3: (FileStore::umount()+0x178) [0x716e68]
 4: (OSD::do_convertfs(ObjectStore*)+0xab2) [0x600a12]
 5: (main()+0x2550) [0x5ec7f0]
 6: (__libc_start_main()+0xf5) [0x7f14cc010b75]
 7: /usr/bin/ceph-osd() [0x5f0929]

#3 Updated by Ian Colle over 3 years ago

  • Assignee set to Joao Luis

#4 Updated by Sage Weil over 3 years ago

  • Severity changed from 3 - minor to 2 - major

#5 Updated by Sage Weil over 3 years ago

  • Subject changed from filejournal crash on f19 to filejournal crash on f19; ceph-deploy can't go healthy
  • Status changed from New to Verified
  • Assignee changed from Joao Luis to Sage Weil
  • Source changed from other to Q/A

#6 Updated by Sage Weil over 3 years ago

I'm 90% sure this is systemd killing everything in the cgroup when ceph-deploy's ssh session closes. The log file just ends (kill -9?) right when the ceph-deploy command finishes. In the cases where we start several OSDs at once, they all stay running until it finishes and then all die at once, consistent with the ssh close.

The f19 config options don't explicitly enable KillUserProcesses or kill-session-processes (at least as far as I can see), but I can't figure out how to view what the default is.
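For what it's worth, one way to check whether the setting is present at all is to grep the logind configuration (path is the standard one; on systemd of this era the compiled-in default for KillUserProcesses was reportedly "no", usually shipped only as a commented-out line):

```shell
#!/bin/sh
# Hedged helper (our own, not a systemd tool): report what logind.conf
# says about KillUserProcesses, if anything.
check_kill_user_processes() {
    conf=${1:-/etc/systemd/logind.conf}
    if [ -r "$conf" ]; then
        grep -i 'KillUserProcesses' "$conf" \
            || echo "KillUserProcesses not set in $conf"
    else
        echo "$conf not present on this host"
    fi
}

check_kill_user_processes
```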

#7 Updated by Alfredo Deza over 3 years ago

I tried to replicate this with ceph-deploy and boy was it a nightmare to get there.

FC19 panics a few seconds after an OSD is deployed. I was able to replicate this behavior reliably by blowing away everything and reinstalling from scratch.

So after several attempts to get FC19 going and failing, Sage suggested I should try FC20, but alas, we do not have any FC20 packages. I attempted to re-use the
FC19 packages though and was not able to get the dependencies resolved:

ceph-deploy install --release fedora20 node{7,8}
[ceph_deploy.cli][INFO  ] Invoked (1.3.5): /Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy install --release fedora20 node7 node8
[ceph_deploy.install][DEBUG ] Installing stable version fedora20 on cluster ceph hosts node7 node8
[ceph_deploy.install][DEBUG ] Detecting platform for host node7 ...
[node7][DEBUG ] connected to host: node7
[node7][DEBUG ] detect platform information from remote host
[node7][DEBUG ] detect machine type
[ceph_deploy.install][INFO  ] Distro info: Fedora 20 Heisenbug
[node7][INFO  ] installing ceph on node7
[ceph_deploy.install][INFO  ] detected valid custom repositories from config file
[ceph_deploy.install][INFO  ] will use repository from conf: fedora20
[node7][INFO  ] adding custom repository file
[node7][INFO  ] Running command: sudo rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/autobuild.asc
[node7][DEBUG ] set the contents of repo file to /etc/yum.repos.d/
[node7][INFO  ] Running command: sudo yum -y -q install wget
[node7][DEBUG ] Package wget-1.14-12.fc20.x86_64 already installed and latest version
[node7][INFO  ] Running command: sudo yum -y -q install ceph
[node7][WARNIN] Error: Package: libcephfs1-0.72.2-0.fc19.x86_64 (fedora20)
[node7][WARNIN]            Requires: libboost_thread-mt.so.1.53.0()(64bit)
[node7][WARNIN] Error: Package: librbd1-0.72.2-0.fc19.x86_64 (fedora20)
[node7][WARNIN]            Requires: libboost_system-mt.so.1.53.0()(64bit)
[node7][WARNIN] Error: Package: ceph-0.72.2-0.fc19.x86_64 (fedora20)
[node7][WARNIN]            Requires: libboost_system-mt.so.1.53.0()(64bit)
[node7][WARNIN] Error: Package: libcephfs1-0.72.2-0.fc19.x86_64 (fedora20)
[node7][WARNIN]            Requires: libboost_system-mt.so.1.53.0()(64bit)
[node7][WARNIN] Error: Package: librbd1-0.72.2-0.fc19.x86_64 (fedora20)
[node7][WARNIN]            Requires: libboost_thread-mt.so.1.53.0()(64bit)
[node7][WARNIN] Error: Package: ceph-0.72.2-0.fc19.x86_64 (fedora20)
[node7][WARNIN]            Requires: libboost_thread-mt.so.1.53.0()(64bit)
[node7][WARNIN] Error: Package: librados2-0.72.2-0.fc19.x86_64 (fedora20)
[node7][WARNIN]            Requires: libboost_thread-mt.so.1.53.0()(64bit)
[node7][WARNIN] Error: Package: librados2-0.72.2-0.fc19.x86_64 (fedora20)
[node7][WARNIN]            Requires: libboost_system-mt.so.1.53.0()(64bit)
[node7][DEBUG ]  You could try using --skip-broken to work around the problem
[node7][DEBUG ]  You could try running: rpm -Va --nofiles --nodigest
[node7][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y -q install ceph

#8 Updated by Alfredo Deza over 3 years ago

About systemd not playing well with ceph-deploy: I am still doubtful, because ceph-deploy works with the host by constantly connecting and disconnecting.

It does not stay connected while running a batch of commands (say, monitors and then OSDs). So why would systemd not kill the monitor processes but do exactly that for the
OSDs?

I am thinking this is a combination of the kernel panic that kept me from getting OSDs up and running and systemd killing the process.

#9 Updated by Sage Weil over 3 years ago

  • Priority changed from Urgent to High

#10 Updated by Sage Weil over 3 years ago

  • Assignee deleted (Sage Weil)

#11 Updated by Sage Weil over 3 years ago

  • Subject changed from filejournal crash on f19; ceph-deploy can't go healthy to filejournal crash on f19; ceph-deploy can't go healthy (systemd?)

#13 Updated by Sage Weil over 3 years ago

  • Subject changed from filejournal crash on f19; ceph-deploy can't go healthy (systemd?) to ceph-disk: does not start daemons properly under systemd
  • Priority changed from High to Urgent

The fix is probably to change ceph-disk to run the systemd command to trigger the service start, instead of running 'service ...' directly.
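A rough sketch of that direction (dry-run only; start_osd and the sysvinit-compat "ceph" unit name are assumptions of ours, not what ceph-disk actually does):

```shell
#!/bin/sh
# Hedged dry-run sketch of the suggested change: pick the init system at
# runtime instead of always shelling out to `service`.
start_osd() {
    id="$1"
    if [ "$(cat /proc/1/comm 2>/dev/null)" = "systemd" ]; then
        echo "would run: systemctl start ceph   # for osd.$id"
    else
        echo "would run: service ceph start osd.$id"
    fi
}

start_osd 0
```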

#14 Updated by Alfredo Deza over 3 years ago

Fedora 19 does not have (nor does any package provide) systemd-run:

$ systemd --version
systemd 204
+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ

$ sudo yum provides systemd-run
No matches found

According to the Fedora docs it looks like it is there for FC 20: http://linuxmanpages.net/manpages/fedora20/man1/systemd-run.1.html

#15 Updated by Sage Weil over 3 years ago

Alfredo Deza wrote:

Fedora 19 does not have (nor does any package provide) systemd-run:

[...]

According to the Fedora docs it looks like it is there for FC 20: http://linuxmanpages.net/manpages/fedora20/man1/systemd-run.1.html

I think we should ignore fc19 entirely then and focus on making sure we are doing the right thing with 20+. Hopefully rhel7 has systemd-run?

#16 Updated by Alfredo Deza over 3 years ago

...and we don't have packages for F20

Going to try to build for F20, but it might take some effort to get there, as we depend on setting up hosts for this (as opposed to ephemeral ones). See issue #8325
for a reference as to why this is problematic for now.

#17 Updated by Ian Colle over 3 years ago

  • Assignee set to Alfredo Deza

Alfredo - where are we at on the F20 packages?

#18 Updated by Ian Colle over 3 years ago

  • Project changed from Ceph to devops

#19 Updated by Alfredo Deza over 3 years ago

I am having a lot of trouble relating the ssh connection to how we start the daemon and to the subsequent failure.

We are able to start the monitor correctly without the 'systemd-run' call using just 'service', and that stays up even
after ceph-deploy closes the connection:

ceph-deploy mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at: /Users/alfredo/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.4): /Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy mon create-initial
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts node8
[ceph_deploy.mon][DEBUG ] detecting platform for host node8 ...
[node8][DEBUG ] connected to host: node8
[node8][DEBUG ] detect platform information from remote host
[node8][DEBUG ] detect machine type
[ceph_deploy.mon][INFO  ] distro info: Fedora 20 Heisenbug
[node8][DEBUG ] determining if provided host has same hostname in remote
[node8][DEBUG ] get remote short hostname
[node8][DEBUG ] deploying mon to node8
[node8][DEBUG ] get remote short hostname
[node8][DEBUG ] remote hostname: node8
[node8][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[node8][DEBUG ] create the mon path if it does not exist
[node8][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-node8/done
[node8][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-node8/done
[node8][INFO  ] creating keyring file: /var/lib/ceph/tmp/ceph-node8.mon.keyring
[node8][DEBUG ] create the monitor keyring file
[node8][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs -i node8 --keyring /var/lib/ceph/tmp/ceph-node8.mon.keyring
[node8][DEBUG ] ceph-mon: mon.noname-a 192.168.111.107:6789/0 is local, renaming to mon.node8
[node8][DEBUG ] ceph-mon: set fsid to ba2b6622-a1bb-490b-9de9-47f263321591
[node8][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-node8 for mon.node8
[node8][INFO  ] unlinking keyring file /var/lib/ceph/tmp/ceph-node8.mon.keyring
[node8][DEBUG ] create a done file to avoid re-doing the mon deployment
[node8][DEBUG ] create the init path if it does not exist
[node8][DEBUG ] locating the `service` executable...
[node8][INFO  ] Running command: sudo /usr/sbin/service ceph -c /etc/ceph/ceph.conf start mon.node8
[node8][DEBUG ] === mon.node8 ===
[node8][DEBUG ] Starting Ceph mon.node8 on node8...
[node8][DEBUG ] Starting ceph-create-keys on node8...
[node8][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node8.asok mon_status
[node8][DEBUG ] ********************************************************************************
[node8][DEBUG ] status for monitor: mon.node8
[node8][DEBUG ] {
[node8][DEBUG ]   "election_epoch": 2,
[node8][DEBUG ]   "extra_probe_peers": [],
[node8][DEBUG ]   "monmap": {
[node8][DEBUG ]     "created": "0.000000",
[node8][DEBUG ]     "epoch": 1,
[node8][DEBUG ]     "fsid": "ba2b6622-a1bb-490b-9de9-47f263321591",
[node8][DEBUG ]     "modified": "0.000000",
[node8][DEBUG ]     "mons": [
[node8][DEBUG ]       {
[node8][DEBUG ]         "addr": "192.168.111.107:6789/0",
[node8][DEBUG ]         "name": "node8",
[node8][DEBUG ]         "rank": 0
[node8][DEBUG ]       }
[node8][DEBUG ]     ]
[node8][DEBUG ]   },
[node8][DEBUG ]   "name": "node8",
[node8][DEBUG ]   "outside_quorum": [],
[node8][DEBUG ]   "quorum": [
[node8][DEBUG ]     0
[node8][DEBUG ]   ],
[node8][DEBUG ]   "rank": 0,
[node8][DEBUG ]   "state": "leader",
[node8][DEBUG ]   "sync_provider": []
[node8][DEBUG ] }
[node8][DEBUG ] ********************************************************************************
[node8][INFO  ] monitor: mon.node8 is running
[node8][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node8.asok mon_status
[ceph_deploy.mon][INFO  ] processing monitor mon.node8
[node8][DEBUG ] connected to host: node8
[node8][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node8.asok mon_status
[ceph_deploy.mon][INFO  ] mon.node8 monitor has reached quorum!
[ceph_deploy.mon][INFO  ] all initial monitors are running and have formed quorum
[ceph_deploy.mon][INFO  ] Running gatherkeys...
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.client.admin.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-osd.keyring
[ceph_deploy.gatherkeys][DEBUG ] Have ceph.bootstrap-mds.keyring

After that is done I can verify that ceph-mon is still up:

$ ssh node8 ps aux | grep ceph-mon
root     13984  0.1  1.8 133824  9068 ?        Sl   20:49   0:00 /usr/bin/ceph-mon -i node8 --pid-file /var/run/ceph/mon.node8.pid -c /etc/ceph/ceph.conf --cluster ceph

I think one possibility is that ceph-disk is launching things incorrectly (for Fedora), but I cannot find where that happens.

I did add a few 'systemd-run' calls to see if that would take care of it, but that didn't seem to help:

ceph-deploy osd --zap create node8:/dev/sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /Users/alfredo/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.4): /Users/alfredo/.virtualenvs/ceph-deploy/bin/ceph-deploy osd --zap create node8:/dev/sdb
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks node8:/dev/sdb:
[node8][DEBUG ] connected to host: node8
[node8][DEBUG ] detect platform information from remote host
[node8][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Fedora 20 Heisenbug
[ceph_deploy.osd][DEBUG ] Deploying osd to node8
[node8][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[node8][INFO  ] Running command: sudo udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host node8 disk /dev/sdb journal None activate True
[node8][INFO  ] Running command: sudo systemd-run ceph-disk-prepare --zap-disk --fs-type xfs --cluster ceph -- /dev/sdb
[node8][WARNIN] Running as unit run-14206.service.
[node8][INFO  ] Running command: sudo udevadm trigger --subsystem-match=block --action=add
[node8][INFO  ] checking OSD status...
[node8][INFO  ] Running command: sudo ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host node8 is now ready for osd use.

The 'host is now ready for osd use' message is a lie; here is the 'osd stat' result:

{u'epoch': 1,
 u'full': False,
 u'nearfull': False,
 u'num_in_osds': 0,
 u'num_osds': 0,
 u'num_up_osds': 0}

Not sure what else I can try here

#20 Updated by Alfredo Deza over 3 years ago

  • Status changed from Verified to Feedback

#21 Updated by Sage Weil about 3 years ago

  • Priority changed from Urgent to High

#22 Updated by Sage Weil about 3 years ago

  • Status changed from Feedback to Testing

I think we fixed this by doing systemd-run from the init script...

#23 Updated by John Spray about 3 years ago

Some possibly related feedback from running master (aeaac69) on Fedora 20:

  • Mons don't come up because they try to bind before the network is set up. Although S60ceph runs after K90network, the network init just starts NetworkManager, and it's a few more seconds before the network is actually available, by which time the mon has already tried and failed to start (accepter.accepter.bind unable to bind to 192.168.18.3:6789: (99) Cannot assign requested address)
  • OSDs don't come up reliably; a possible smoking gun in the case of an OSD not coming up is "timeout '/usr/sbin/ceph-disk-udev 1 sdc1 sdc'".

A log of a Fedora 20 boot where the mon fails to come up because it starts before networking, and one OSD fails to come up (this is sporadic), is attached as boot.log
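For the mon-before-network symptom, the usual systemd-side cure (assuming native units rather than the sysvinit script; the unit and drop-in names below are hypothetical) is to order the service after network-online.target, e.g. via a drop-in:

```shell
#!/bin/sh
# Hedged sketch: write an ordering drop-in for a hypothetical
# ceph-mon.service into a scratch directory (a real drop-in would live
# under /etc/systemd/system/ceph-mon.service.d/).
dropin_dir=/tmp/example-dropin/ceph-mon.service.d
mkdir -p "$dropin_dir"
cat > "$dropin_dir/network.conf" <<'EOF'
[Unit]
# Wait until the network is actually usable, not just NetworkManager started.
Wants=network-online.target
After=network-online.target
EOF
cat "$dropin_dir/network.conf"
```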

#24 Updated by Sage Weil about 3 years ago

  • Status changed from Testing to Resolved

commit:3e0d9800767018625f0e7d797c812aa44c426dab
