Bug #15583

open

systemd warns about TasksMax= setting on older distros

Added by Nathan Cutler about 8 years ago. Updated over 7 years ago.

Status: New
Priority: Low
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The TasksMax= setting was introduced in systemd 227 (see https://github.com/systemd/systemd/blob/master/NEWS )

In ceph, this setting was introduced by https://github.com/ceph/ceph/commit/05cafcf1 - however, no measure was included to prevent ceph from being installed on systems running older systemd versions that do not yet support this setting.

The result is that the units do not start, e.g.:

ceph-osd@0.service - Ceph object storage daemon
Loaded: loaded (/lib/systemd/system/ceph-osd@.service; disabled)
Active: inactive (dead)

Apr 22 17:55:38 elara systemd[1]: [/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'

See http://tracker.ceph.com/issues/15553#note-10
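
For context, the directive that triggers the warning sits in the [Service] section of the installed unit files and looks roughly like this (the value shown is illustrative, not copied from the commit):

# excerpt from /lib/systemd/system/ceph-osd@.service (illustrative)
[Service]
TasksMax=infinity

On a host whose systemd is older than 227, one stop-gap (a sketch only, and it will be undone by the next package upgrade) is to comment the directive out of the installed ceph unit files and reload systemd:

sed -i 's/^TasksMax=/#TasksMax=/' /lib/systemd/system/ceph-*.service
systemctl daemon-reload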


Related issues 2 (0 open, 2 closed)

Related to devops - Bug #15553: /var/run/ceph permissions are borked after every reboot (Resolved, 04/20/2016)
Related to devops - Bug #15560: Unknown lvalue 'TasksMax' in section 'Service' (Duplicate, 04/21/2016)
#1 - Updated by Nathan Cutler about 8 years ago

  • Related to Bug #15553: /var/run/ceph permissions are borked after every reboot added
#2 - Updated by Nathan Cutler about 8 years ago

  • Description updated (diff)
#3 - Updated by Nathan Cutler about 8 years ago

  • Has duplicate Bug #15581: OSD doesn't start added
#4 - Updated by Heath Jepson about 8 years ago

This can't just be affecting me. After running apt-get update and then apt-get dist-upgrade, I'm still in the same boat.

A quick look at Debian's repository reveals that the latest version of systemd for jessie is 215, and there is nothing useful in the backports.

What is your recommended OS for jewel? What do you guys use internally? I've been using Debian and ceph together for the last 3.5 years and it's been perfect until now.

I commented out all occurrences of TasksMax; now my mon starts, but everything else is still dead as a doornail. I have yet to get anything to start without major intervention since upgrading from infernalis to jewel 10.2.0. ceph-disk gives me the following, and the data drives are not getting mounted:

root@elara:/etc/ceph# systemctl status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon
Loaded: loaded (/lib/systemd/system/ceph-osd@.service; disabled)
Active: failed (Result: start-limit) since Sat 2016-04-23 19:10:00 MDT; 44min ago
Process: 4141 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Process: 4089 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
Main PID: 4141 (code=exited, status=1/FAILURE)

Apr 23 19:10:00 elara systemd[1]: Unit entered failed state.
Apr 23 19:10:00 elara systemd[1]: start request repeated too quickly, refusing to start.
Apr 23 19:10:00 elara systemd[1]: Failed to start Ceph object storage daemon.
Apr 23 19:10:00 elara systemd[1]: Unit entered failed state.
Apr 23 19:25:07 elara systemd[1]: start request repeated too quickly, refusing to start.
Apr 23 19:25:07 elara systemd[1]: Failed to start Ceph object storage daemon.
root@elara:/etc/ceph# systemctl status ceph-disk@0
● ceph-disk@0.service - Ceph disk activation: /0
Loaded: loaded (/lib/systemd/system/ceph-disk@.service; static)
Active: inactive (dead)
root@elara:/etc/ceph# systemctl start ceph-disk@0
Job for ceph-disk@0.service failed. See 'systemctl status ceph-disk@0.service' and 'journalctl -xn' for details.
root@elara:/etc/ceph# systemctl status ceph-disk@0
● ceph-disk@0.service - Ceph disk activation: /0
Loaded: loaded (/lib/systemd/system/ceph-disk@.service; static)
Active: failed (Result: exit-code) since Sat 2016-04-23 19:55:08 MDT; 4s ago
Process: 10705 ExecStart=/bin/sh -c flock /var/lock/ceph-disk /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f (code=exited, status=1/FAILURE)
Main PID: 10705 (code=exited, status=1/FAILURE)

Apr 23 19:55:08 elara sh[10705]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4964, in run
Apr 23 19:55:08 elara sh[10705]: main(sys.argv[1:])
Apr 23 19:55:08 elara sh[10705]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4915, in main
Apr 23 19:55:08 elara sh[10705]: args.func(args)
Apr 23 19:55:08 elara sh[10705]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4347, in main_trigger
Apr 23 19:55:08 elara sh[10705]: raise Error('unrecognized partition type %s' % parttype)
Apr 23 19:55:08 elara sh[10705]: ceph_disk.main.Error: Error: unrecognized partition type None
Apr 23 19:55:08 elara systemd[1]: ceph-disk@0.service: main process exited, code=exited, status=1/FAILURE
Apr 23 19:55:08 elara systemd[1]: Failed to start Ceph disk activation: /0.
Apr 23 19:55:08 elara systemd[1]: Unit entered failed state.
root@elara:/etc/ceph#

The only way it works is for me to manually mount the OSDs, which all previous versions of ceph did automatically, saving me the trouble of mucking around in /etc/fstab.

By running the following two commands, I finally end up with a running OSD:
mount -t xfs /dev/disk/by-partlabel/Ceph_OSD.0.XFSdata /srv/ceph/osd/osd.0 -o noatime,nodiratime,logbsize=256k,logbufs=8,allocsize=4M
systemctl status ceph-osd@0
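
(For what it's worth, if one did want that mount to persist across reboots while ceph-disk activation is broken, an /etc/fstab entry built from the same command might look like the line below. This is only a sketch reusing the device, mount point, and options shown above, not something the ceph packages set up.)

/dev/disk/by-partlabel/Ceph_OSD.0.XFSdata  /srv/ceph/osd/osd.0  xfs  noatime,nodiratime,logbsize=256k,logbufs=8,allocsize=4M  0  0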

I could not find anyone else reporting this, but I can't be going crazy. My config has worked perfectly all the way through hammer, besides the pidfile issue, which was an amazingly simple fix. I'd like to know why ceph is misbehaving so much when previously it was the most rock-solid software of this level of complexity I've ever used.

#5 - Updated by Heath Jepson about 8 years ago

Correction:

mount -t xfs /dev/disk/by-partlabel/Ceph_OSD.0.XFSdata /srv/ceph/osd/osd.0 -o noatime,nodiratime,logbsize=256k,logbufs=8,allocsize=4M
systemctl start ceph-osd@0

#6 - Updated by Nathan Cutler about 8 years ago

  • Has duplicate deleted (Bug #15581: OSD doesn't start)
#7 - Updated by Nathan Cutler about 8 years ago

  • Status changed from New to Rejected

As the systemd maintainers just told me, the "Unknown lvalue 'TasksMax' in section 'Service'" message is benign: the presence of an unsupported/unrecognized "TasksMax=" in the unit file does not prevent the unit from starting.

#8 - Updated by Ken Dreyer almost 8 years ago

  • Subject changed from "systemd units refuse to start due to presence of TasksMax=" to "systemd warns about TasksMax= setting on older distros"
  • Status changed from Rejected to New
  • Priority changed from Urgent to Low

This still strikes me as a poor usability experience. We should eliminate the warning to the user, if we can help it.

Ubuntu Xenial ships systemd 229, so that is not affected. My thought is to make the RPM strip out these lines if we're building on RHEL 7, with an RPM conditional.
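
A rough sketch of what that conditional could look like in the spec file's %install section (macro names and exact placement are assumptions, not the actual change):

%if 0%{?rhel} && 0%{?rhel} <= 7
# systemd in RHEL 7 is older than 227 and does not recognize TasksMax=, so drop the line
sed -i '/^TasksMax=/d' %{buildroot}%{_unitdir}/ceph-*.service
%endif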

#9 - Updated by Ken Dreyer over 7 years ago

It's possible that the kernel and systemd updates in RHEL 7.4 will make this warning go away.

#10 - Updated by Марк Коренберг over 7 years ago

related issue: #15560

#11 - Updated by Nathan Cutler over 7 years ago

  • Related to Bug #15560: Unknown lvalue 'TasksMax' in section 'Service' added