Bug #23332

kclient: with fstab entry is not coming up reboot

Added by Shreekara Shastry about 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With 4 clients (2 FUSE and 2 kernel clients), I was running an automated script to create the fstab entry before rebooting the clients. After rebooting each client, the client did not come up, even after the 600-second timeout.

reboot.jpg (343 KB) Shreekara Shastry, 03/13/2018 01:44 PM

ceph-mds.ceph-sshreeka-run379-node6-mds.log (56.4 KB) Shreekara Shastry, 03/16/2018 09:49 AM

cierror.txt (1.33 KB) Shreekara Shastry, 03/16/2018 09:49 AM

ceph-mds.ceph-sshreeka-run379-node4-mds.log (891 KB) Shreekara Shastry, 03/16/2018 09:49 AM

messages_ceph-sshreeka-run379-node7-client (323 KB) Shreekara Shastry, 03/16/2018 09:49 AM

messages_ceph-sshreeka-run379-node5-client (782 KB) Shreekara Shastry, 03/16/2018 09:49 AM

messages_ceph-sshreeka-run379-node8-client (467 KB) Shreekara Shastry, 03/16/2018 09:49 AM

messages_ceph-sshreeka-run379-node9-client (369 KB) Shreekara Shastry, 03/16/2018 09:49 AM


Related issues

Related to CephFS - Bug #24879: mds: create health warning if we detect metadata (journal) writes are slow Resolved 07/11/2018

History

#1 Updated by Patrick Donnelly about 6 years ago

  • Subject changed from Client with fstab entry is not coming up reboot to kclient: with fstab entry is not coming up reboot
  • Category deleted (Testing)
  • Status changed from New to Need More Info
  • Component(FS) kceph added
  • Component(FS) deleted (ceph-fuse)

You probably have the wrong mons in the mount configuration.

#2 Updated by Zheng Yan about 6 years ago

need to add the _netdev mount option
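For reference, a kernel-client fstab entry using _netdev might look like the following; the monitor address, mount point, client name, and secret file path here are illustrative placeholders, not values from this cluster:

```
# /etc/fstab -- hypothetical example; addresses and paths are placeholders
172.16.115.100:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,_netdev  0  0
```

_netdev tells the init system to defer the mount until the network is up, which matters for boot-time mounts even when a later manual mount -a succeeds.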

#3 Updated by Shreekara Shastry about 6 years ago

Patrick Donnelly wrote:

You probably have the wrong mons in the mount configuration.

I have only one mon.

#7 Updated by Vasu Kulkarni about 6 years ago

Can someone please check the logs? We are hitting this issue in 50% of sanity runs.

#8 Updated by Patrick Donnelly almost 6 years ago

  • Status changed from Need More Info to New
  • Assignee set to Zheng Yan

#9 Updated by Zheng Yan almost 6 years ago

In messages_ceph-sshreeka-run379-node5-client

Mar 14 11:02:55 ceph-sshreeka-run379-node5-client mount: mount error 1 = Operation not permitted
Mar 14 11:02:55 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -1

looks like the fstab entry didn't include the correct secret
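As a quick sanity check (a hypothetical sketch, not an official Ceph tool), one can at least confirm that the secret string copied into fstab is well-formed base64; a truncated or mangled key would fail this check before ever reaching the cluster:

```python
import base64

def looks_like_cephx_secret(s: str) -> bool:
    """Rough check: a CephX secret in fstab is a base64 string.
    This validates only the base64 form, not the key itself."""
    try:
        raw = base64.b64decode(s, validate=True)
    except Exception:
        return False
    return len(raw) > 0

# Dummy strings, not real keys
print(looks_like_cephx_secret("QVZBdmF2YXZhdmF2"))  # well-formed base64
print(looks_like_cephx_secret("broken key!!"))      # malformed
```

This does not rule out a wrong-but-valid key, only a corrupted one; the authoritative key comparison is against the cluster's keyring.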

#10 Updated by Shreekara Shastry almost 6 years ago

Zheng Yan wrote:

In messages_ceph-sshreeka-run379-node5-client
[...]

looks like the fstab entry didn't include the correct secret

The command mount -a works with the same entries in fstab.

#11 Updated by Luis Henriques almost 6 years ago

Actually, the first failure seems to be a bit before:

Mar 14 08:10:00 ceph-sshreeka-run379-node5-client systemd: Stopping User Slice of root.
Mar 14 08:10:42 ceph-sshreeka-run379-node5-client kernel: ceph: mds1 caps stale
Mar 14 08:11:02 ceph-sshreeka-run379-node5-client kernel: ceph: mds1 caps stale
Mar 14 08:14:43 ceph-sshreeka-run379-node5-client kernel: libceph: mds0 172.16.115.100:6800 socket closed (con state OPEN)
Mar 14 08:14:46 ceph-sshreeka-run379-node5-client chronyd[499]: Source 45.76.244.202 replaced with 64.113.44.54
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: libceph: mds0 172.16.115.100:6800 connection reset
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: libceph: reset on mds0
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: ceph: mds0 closed our session
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: ceph: mds0 reconnect start
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: ceph: mds0 reconnect denied
Mar 14 08:15:20 ceph-sshreeka-run379-node5-client systemd-logind: Removed session 20.
Mar 14 08:15:20 ceph-sshreeka-run379-node5-client systemd: Removed slice User Slice of cephuser.
Mar 14 08:15:20 ceph-sshreeka-run379-node5-client systemd: Stopping User Slice of cephuser.
Mar 14 08:23:05 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
Mar 14 08:23:15 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
Mar 14 08:23:25 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
...

The client seems to be trying to renew caps while the connection is being closed.

It's possible that this has been fixed in recent kernels and the fix hasn't been backported to your distro. It may be worth trying a recent kernel and attempting to reproduce the issue.

#12 Updated by Patrick Donnelly almost 6 years ago

  • Priority changed from Normal to High
  • Target version set to v13.0.0

#13 Updated by Shreekara Shastry almost 6 years ago

Luis Henriques wrote:

Actually, the first failure seems to be a bit before:
[...]
The client seems to be trying to renew caps while the connection is being closed.

It's possible that this has been fixed in recent kernels and the fix hasn't been backported to your distro. It may be worth trying a recent kernel and attempting to reproduce the issue.

I've checked with the latest kernel version; the result was still the same.

#14 Updated by Zheng Yan almost 6 years ago

The kexec in the dmesg logs looks suspicious: the client mounted CephFS, then used kexec to load the kernel image again. All issues happened after the kexec.

#15 Updated by Patrick Donnelly almost 6 years ago

  • Status changed from New to Need More Info
  • Target version changed from v13.0.0 to v14.0.0
  • ceph-qa-suite deleted (fs)

Zheng Yan wrote:

The kexec in the dmesg logs looks suspicious: the client mounted CephFS, then used kexec to load the kernel image again. All issues happened after the kexec.

What do you need from Shreekara to verify this is the issue?

#16 Updated by Zheng Yan almost 6 years ago

I still don't think this is a kernel issue. Please patch the kernel with the change below and try again.

diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
index 584fdbef2088..3ca2d0c65d20 100644
--- a/net/ceph/ceph_common.c
+++ b/net/ceph/ceph_common.c
@@ -432,6 +432,7 @@ ceph_parse_options(char *options, const char *dev_name,
                                err = -ENOMEM;
                                goto out;
                        }
+                       printk("name: %s\n", opt->name);
                        break;
                case Opt_secret:
                        ceph_crypto_key_destroy(opt->key);
@@ -442,6 +443,7 @@ ceph_parse_options(char *options, const char *dev_name,
                                err = -ENOMEM;
                                goto out;
                        }
+                       printk("secret: %s\n", argstr[0].from);
                        err = ceph_crypto_key_unarmor(opt->key, argstr[0].from);
                        if (err < 0)
                                goto out;

#17 Updated by Patrick Donnelly over 5 years ago

  • Related to Bug #24879: mds: create health warning if we detect metadata (journal) writes are slow added

#18 Updated by Patrick Donnelly over 5 years ago

  • Status changed from Need More Info to Closed

Closing this as it's a consequence of #24879.
