Bug #15581

closed

OSD doesn't start

Added by Shinobu Kinjo about 8 years ago. Updated almost 8 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

[ceph@octopus conf]$ sudo systemctl status ceph-osd@1
● ceph-osd@1.service - Ceph object storage daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2016-04-22 22:16:45 EDT; 2min 37s ago
Main PID: 3780 (ceph-osd)
CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@1.service
└─3780 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph

Apr 22 22:16:45 octopus.fullstack.go ceph-osd[3780]: 2016-04-22 22:16:45.301872 7f8b63a60800 -1 osd.1 0 log_to_monitors {default=true}
Apr 22 22:16:45 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Apr 22 22:16:45 octopus.fullstack.go systemd[1]: Started Ceph object storage daemon.
Apr 22 22:16:46 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Apr 22 22:16:46 octopus.fullstack.go systemd[1]: Started Ceph object storage daemon.
Apr 22 22:17:43 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Apr 22 22:17:44 octopus.fullstack.go systemd[1]: Started Ceph object storage daemon.
Apr 22 22:18:18 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'
Apr 22 22:18:18 octopus.fullstack.go systemd[1]: Started Ceph object storage daemon.
Apr 22 22:18:24 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'

[ceph@octopus conf]$ cat /usr/lib/systemd/system/ceph-osd\@.service
[Unit]
Description=Ceph object storage daemon
After=network-online.target local-fs.target
Wants=network-online.target local-fs.target
PartOf=ceph-osd.target

[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
EnvironmentFile=-/etc/sysconfig/ceph
Environment=CLUSTER=ceph
ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
ExecReload=/bin/kill -HUP $MAINPID
ProtectHome=true
ProtectSystem=full
PrivateTmp=true
TasksMax=infinity
Restart=on-failure
StartLimitInterval=30min
StartLimitBurst=3

[Install]
WantedBy=ceph-osd.target

[ceph@octopus conf]$ ceph -v
ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)


Related issues 1 (0 open, 1 closed)

Is duplicate of devops - Bug #14941: ceph-{mds,mon,osd} packages need scriptlets with systemd code (Resolved, Nathan Cutler, 03/01/2016)

Actions #1

Updated by Nathan Cutler about 8 years ago

  • Project changed from Ceph to devops
  • Status changed from New to Duplicate
Actions #2

Updated by Nathan Cutler about 8 years ago

  • Is duplicate of Bug #15583: systemd warns about TasksMax= setting on older distros added
Actions #3

Updated by Nathan Cutler about 8 years ago

You could be running an older version of systemd that doesn't support TasksMax=
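That is easy to check: TasksMax= was introduced in systemd v227, and CentOS 7.2 ships systemd 219, which logs the "Unknown lvalue" warning and ignores the directive. A minimal sketch of the check; it assumes the first line of `systemctl --version` has the usual `systemd NNN` shape:

```shell
#!/bin/sh
# TasksMax= first appeared in systemd v227; older versions warn about
# the unknown lvalue and otherwise ignore it.

supports_tasksmax() {
    # true (exit 0) if the given systemd version number knows TasksMax=
    [ "$1" -ge 227 ]
}

# First line of `systemctl --version` looks like "systemd 219".
installed=$(systemctl --version 2>/dev/null | awk 'NR==1 {print $2}')

if [ -z "$installed" ]; then
    echo "systemctl not found; cannot determine systemd version"
elif supports_tasksmax "$installed"; then
    echo "systemd $installed supports TasksMax="
else
    echo "systemd $installed predates TasksMax= (needs >= 227)"
fi
```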

Actions #4

Updated by Nathan Cutler almost 8 years ago

  • Status changed from Duplicate to New

Changing status back to "New" because, as the systemd maintainers just told me, the "Unknown lvalue 'TasksMax' in section 'Service'" message is benign: the presence of an unsupported/unrecognized "TasksMax=" in the unit file does not prevent the unit from starting.
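For anyone who wants to silence the (harmless) warning on an older distro anyway, one hedged workaround, not an upstream fix, is to shadow the packaged unit with a copy that drops the unsupported directive. The filter can be tried safely on a scratch copy first:

```shell
#!/bin/sh
# Demonstrate stripping the TasksMax= line from a unit file.
# Operates on a scratch copy, so it is safe to run anywhere.
tmp=$(mktemp)
printf '[Service]\nTasksMax=infinity\nRestart=on-failure\n' > "$tmp"

# Drop the unsupported directive; every other line passes through.
sed '/^TasksMax=/d' "$tmp"

rm -f "$tmp"
```

Applied for real, that would be `sed '/^TasksMax=/d' /usr/lib/systemd/system/ceph-osd@.service | sudo tee /etc/systemd/system/ceph-osd@.service` followed by `sudo systemctl daemon-reload` (units under /etc shadow those under /usr/lib). Since the warning is benign, this is purely cosmetic.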

Actions #5

Updated by Nathan Cutler almost 8 years ago

  • Is duplicate of deleted (Bug #15583: systemd warns about TasksMax= setting on older distros)
Actions #6

Updated by Nathan Cutler almost 8 years ago

  • Status changed from New to Need More Info

In what way does the OSD not start? systemctl status says "Active: active (running)"

Also, what distro are you running?

Actions #7

Updated by Shinobu Kinjo almost 8 years ago

I just built a Ceph cluster using ceph-deploy with stable jewel.

The problem affects not only the OSD but the MON as well; maybe that deserves a separate bug.

The MON service is enabled:

[ceph@octopus conf]$ sudo systemctl is-enabled ceph-mon@octopus
enabled

But...

[ceph@octopus conf]$ sudo systemctl status ceph-mon@octopus
● ceph-mon@octopus.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: inactive (dead)

Once I start the MON manually, it runs:

[ceph@octopus conf]$ sudo systemctl status ceph-mon@octopus
● ceph-mon@octopus.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2016-04-25 18:37:32 EDT; 2s ago
Main PID: 3577 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@octopus.service
└─3577 /usr/bin/ceph-mon -f --cluster ceph --id octopus --setuser ceph --setgroup ceph

Apr 25 18:37:32 octopus.fullstack.go systemd[1]: Started Ceph cluster monitor daemon.
Apr 25 18:37:32 octopus.fullstack.go systemd[1]: Starting Ceph cluster monitor daemon...
Apr 25 18:37:32 octopus.fullstack.go ceph-mon[3577]: 2016-04-25 18:37:32.336537 7f591911d4c0 -1 WARNING: the following dangerous and experimental features are enabled: *
Apr 25 18:37:32 octopus.fullstack.go ceph-mon[3577]: 2016-04-25 18:37:32.337119 7f591911d4c0 -1 WARNING: the following dangerous and experimental features are enabled: *
Apr 25 18:37:32 octopus.fullstack.go ceph-mon[3577]: 2016-04-25 18:37:32.385628 7f591911d4c0 -1 WARNING: the following dangerous and experimental features are enabled: *
Apr 25 18:37:32 octopus.fullstack.go ceph-mon[3577]: starting mon.octopus rank 0 at 172.16.0.2:6789/0 mon_data /var/lib/ceph/mon/ceph-octopus fsid dbc219e2-cbd4-4538-8486-6dab65a57c18

Now I understand: the OSD doesn't run because the MON is NOT running.

[ceph@octopus conf]$ sudo systemctl status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2016-04-25 18:38:26 EDT; 25s ago
Main PID: 3852 (ceph-osd)
CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
└─3852 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

Apr 25 18:38:26 octopus.fullstack.go ceph-osd[3852]: unsupported, and may result in data corruption, data loss,
Apr 25 18:38:26 octopus.fullstack.go ceph-osd[3852]: and/or irreparable damage to your cluster. Do not use
Apr 25 18:38:26 octopus.fullstack.go ceph-osd[3852]: feature with important data.
Apr 25 18:38:27 octopus.fullstack.go ceph-osd[3852]: 2016-04-25 18:38:27.385077 7fce90b25800 -1 WARNING: experimental feature 'rocksdb' is enabled
Apr 25 18:38:27 octopus.fullstack.go ceph-osd[3852]: Please be aware that this feature is experimental, untested,
Apr 25 18:38:27 octopus.fullstack.go ceph-osd[3852]: unsupported, and may result in data corruption, data loss,
Apr 25 18:38:27 octopus.fullstack.go ceph-osd[3852]: and/or irreparable damage to your cluster. Do not use
Apr 25 18:38:27 octopus.fullstack.go ceph-osd[3852]: feature with important data.
Apr 25 18:38:27 octopus.fullstack.go ceph-osd[3852]: 2016-04-25 18:38:27.537116 7fce90b25800 -1 osd.0 0 log_to_monitors {default=true}
Apr 25 18:38:32 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'

Let's reboot and have a look:

[ceph@octopus conf]$ sudo reboot

OK, after the reboot the MON is not running:

[ceph@octopus ~]$ sudo systemctl status ceph-mon@octopus
● ceph-mon@octopus.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: inactive (dead)

Apr 25 18:45:28 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' in section 'Service'

The OSD is innocent; it's simply not started either:

[ceph@octopus ~]$ sudo systemctl status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
Active: inactive (dead)

Apr 25 18:45:21 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'

$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core

$ uname -r
4.6.0-rc4+

Actions #8

Updated by Nathan Cutler almost 8 years ago

The ceph-mon service not starting after reboot most likely means that ceph-mon.target is disabled. It's likely that ceph-osd.target is also disabled. I have a PR open to fix this.

Please check if these targets are disabled:

systemctl is-enabled ceph-mon.target
systemctl is-enabled ceph-osd.target

If, as I suspect, they are disabled, please enable them and try reboot again.
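The check can be scripted; a small hedged sketch using the two target names from this thread (it only reports what to run, so it is safe on any box):

```shell
#!/bin/sh
# Report any Ceph target that is not enabled, with the command to fix it.

needs_enable() {
    # true if the reported enablement state is anything but "enabled"
    [ "$1" != "enabled" ]
}

for tgt in ceph-mon.target ceph-osd.target; do
    state=$(systemctl is-enabled "$tgt" 2>/dev/null)
    if needs_enable "$state"; then
        echo "$tgt is '${state:-unknown}'; run: sudo systemctl enable $tgt"
    fi
done
```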

Actions #9

Updated by Shinobu Kinjo almost 8 years ago

You're right. By default, they're disabled.

[ceph@octopus ~]$ sudo systemctl is-enabled ceph-mon.target
disabled

[ceph@octopus ~]$ sudo systemctl is-enabled ceph-osd.target
disabled

After enabling them:

[ceph@octopus ~]$ sudo systemctl enable ceph-mon.target
Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-mon.target to /usr/lib/systemd/system/ceph-mon.target.
Created symlink from /etc/systemd/system/ceph.target.wants/ceph-mon.target to /usr/lib/systemd/system/ceph-mon.target.

[ceph@octopus ~]$ sudo systemctl enable ceph-osd.target
Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-osd.target to /usr/lib/systemd/system/ceph-osd.target.
Created symlink from /etc/systemd/system/ceph.target.wants/ceph-osd.target to /usr/lib/systemd/system/ceph-osd.target.

[ceph@octopus ~]$ sudo systemctl status ceph-mon@octopus
● ceph-mon@octopus.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2016-04-26 18:16:45 EDT; 9min ago
Main PID: 1703 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@octopus.service
└─1703 /usr/bin/ceph-mon -f --cluster ceph --id octopus --setuser ceph --setgroup ceph

Apr 26 18:16:45 octopus.fullstack.go systemd[1]: Started Ceph cluster monitor daemon.
Apr 26 18:16:45 octopus.fullstack.go systemd[1]: Starting Ceph cluster monitor daemon...
Apr 26 18:16:46 octopus.fullstack.go ceph-mon[1703]: 2016-04-26 18:16:46.374772 7efc5ee7e4c0 -1 WARNING: the following dangerous and experimental features are enabled: *
Apr 26 18:16:46 octopus.fullstack.go ceph-mon[1703]: 2016-04-26 18:16:46.382433 7efc5ee7e4c0 -1 WARNING: the following dangerous and experimental features are enabled: *
Apr 26 18:16:46 octopus.fullstack.go ceph-mon[1703]: 2016-04-26 18:16:46.488410 7efc5ee7e4c0 -1 WARNING: the following dangerous and experimental features are enabled: *
Apr 26 18:16:46 octopus.fullstack.go ceph-mon[1703]: starting mon.octopus rank 0 at 172.16.0.2:6789/0 mon_data /var/lib/ceph/mon/ceph-octopus fsid dbc219e2-c...b65a57c18
Apr 26 18:16:51 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-mon@.service:24] Unknown lvalue 'TasksMax' in section 'Service'
Hint: Some lines were ellipsized, use -l to show in full.

[ceph@octopus ~]$ sudo systemctl status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2016-04-26 18:16:47 EDT; 9min ago
Main PID: 2941 (ceph-osd)
CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
└─2941 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

Apr 26 18:16:47 octopus.fullstack.go ceph-osd[2941]: unsupported, and may result in data corruption, data loss,
Apr 26 18:16:47 octopus.fullstack.go ceph-osd[2941]: and/or irreparable damage to your cluster. Do not use
Apr 26 18:16:47 octopus.fullstack.go ceph-osd[2941]: feature with important data.
Apr 26 18:16:48 octopus.fullstack.go ceph-osd[2941]: 2016-04-26 18:16:48.186222 7f01135b4800 -1 WARNING: experimental feature 'rocksdb' is enabled
Apr 26 18:16:48 octopus.fullstack.go ceph-osd[2941]: Please be aware that this feature is experimental, untested,
Apr 26 18:16:48 octopus.fullstack.go ceph-osd[2941]: unsupported, and may result in data corruption, data loss,
Apr 26 18:16:48 octopus.fullstack.go ceph-osd[2941]: and/or irreparable damage to your cluster. Do not use
Apr 26 18:16:48 octopus.fullstack.go ceph-osd[2941]: feature with important data.
Apr 26 18:16:48 octopus.fullstack.go ceph-osd[2941]: 2016-04-26 18:16:48.466579 7f01135b4800 -1 osd.0 27 log_to_monitors {default=true}
Apr 26 18:16:51 octopus.fullstack.go systemd[1]: [/usr/lib/systemd/system/ceph-osd@.service:18] Unknown lvalue 'TasksMax' in section 'Service'

So since Jewel we need this kind of dependency chain:

ceph.target
 -> ceph-mon.target
    -> ceph-mon@.service

Why do we need ceph-{mon|osd}.target?
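The glue for that chain lives in the unit files themselves: the instance unit declares `PartOf=` and `WantedBy=` toward its target, and `systemctl enable` turns the `[Install]` section into the `.wants` symlinks shown earlier. A self-contained sketch that extracts those two directives from a recreated fragment of the `ceph-osd@.service` printed above:

```shell
#!/bin/sh
# PartOf=/WantedBy= are what tie ceph-osd@.service to ceph-osd.target.
# Recreate the relevant fragment of the unit file and pull them out.
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Unit]
PartOf=ceph-osd.target
[Install]
WantedBy=ceph-osd.target
EOF

grep -E '^(PartOf|WantedBy)=' "$unit"
rm -f "$unit"
```

On a live node, `systemctl list-dependencies ceph.target` shows the whole chain, and enabling ceph-osd.target is what materializes the multi-user.target.wants and ceph.target.wants symlinks seen in the earlier comment.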

Actions #10

Updated by Nathan Cutler almost 8 years ago

  • Status changed from Need More Info to In Progress

I don't know that we absolutely need them, but we have them. It is a product of the split of the ceph package into ceph-base, ceph-mds, ceph-mon, ceph-osd.

If you have a MON-only node, for example, you only need to install the ceph-mon package, and then you have only ceph-mon.target.

Due to an outstanding bug #14941 the *.target units are not enabled when the RPM is installed. There is a PR https://github.com/ceph/ceph/pull/8714 open to fix that and it should be merged soon.

Actions #11

Updated by Nathan Cutler almost 8 years ago

  • Is duplicate of Bug #14941: ceph-{mds,mon,osd} packages need scriptlets with systemd code added
Actions #12

Updated by Nathan Cutler almost 8 years ago

  • Status changed from In Progress to Duplicate