Bug #23332

kclient: with fstab entry is not coming up reboot

Added by Shreekara Shastry 8 months ago. Updated 4 months ago.

Status: Closed
Priority: High
Assignee:
Category: -
Target version:
Start date: 03/13/2018
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph
Labels (FS):
Pull request ID:

Description

With 4 clients (2 FUSE and 2 kernel clients), I was running an automated script that adds an fstab entry before rebooting the clients. After each client rebooted, it did not come back up, and the wait timed out after 600 seconds.

reboot.jpg (343 KB) Shreekara Shastry, 03/13/2018 01:44 PM

ceph-mds.ceph-sshreeka-run379-node6-mds.log (56.4 KB) Shreekara Shastry, 03/16/2018 09:49 AM

cierror.txt (1.33 KB) Shreekara Shastry, 03/16/2018 09:49 AM

ceph-mds.ceph-sshreeka-run379-node4-mds.log (891 KB) Shreekara Shastry, 03/16/2018 09:49 AM

messages_ceph-sshreeka-run379-node7-client (323 KB) Shreekara Shastry, 03/16/2018 09:49 AM

messages_ceph-sshreeka-run379-node5-client (782 KB) Shreekara Shastry, 03/16/2018 09:49 AM

messages_ceph-sshreeka-run379-node8-client (467 KB) Shreekara Shastry, 03/16/2018 09:49 AM

messages_ceph-sshreeka-run379-node9-client (369 KB) Shreekara Shastry, 03/16/2018 09:49 AM


Related issues

Related to fs - Bug #24879: mds: create health warning if we detect metadata (journal) writes are slow Resolved 07/11/2018

History

#1 Updated by Patrick Donnelly 8 months ago

  • Subject changed from Client with fstab entry is not coming up reboot to kclient: with fstab entry is not coming up reboot
  • Category deleted (Testing)
  • Status changed from New to Need More Info
  • Component(FS) kceph added
  • Component(FS) deleted (ceph-fuse)

You probably have the wrong mons in the mount configuration.

#2 Updated by Zheng Yan 8 months ago

You need to add the _netdev mount option.
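
mount(8) documents _netdev as marking a filesystem that requires network access, so the system does not try to mount it until the network is up. A quick way to audit the automated fstab entries is to pull out the fourth fstab field (the comma-separated mount options, fs_mntops in fstab(5)) and look for _netdev as a whole token. This is a minimal sketch under that assumption; the helper names fstab_options_field and option_list_has are hypothetical and not part of Ceph or util-linux.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the comma-separated option list `opts` contains `opt`
 * as a whole token, so "_netdev" does not match "x_netdev". */
bool option_list_has(const char *opts, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = opts;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t tok_len = end ? (size_t)(end - p) : strlen(p);

        if (tok_len == len && strncmp(p, opt, len) == 0)
            return true;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return false;
}

/* Copy the 4th whitespace-separated field (the mount options) of an
 * fstab line into `out`. Returns false if the line has fewer than four
 * fields or the field does not fit. Hypothetical helper. */
bool fstab_options_field(const char *line, char *out, size_t out_size)
{
    const char *p = line;

    for (int field = 0; field < 4; field++) {
        p += strspn(p, " \t\n");         /* skip blanks before the field */
        if (*p == '\0')
            return false;                /* ran out of fields */
        size_t n = strcspn(p, " \t\n");  /* length of this field */
        if (field == 3) {
            if (n >= out_size)
                return false;
            memcpy(out, p, n);
            out[n] = '\0';
            return true;
        }
        p += n;
    }
    return false;
}
```

For a hypothetical entry such as "mon1:6789:/ /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/admin.secret,_netdev,noatime 0 2", the options field is extracted whole and option_list_has(opts, "_netdev") reports whether the entry will be deferred until networking is available.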

#3 Updated by Shreekara Shastry 8 months ago

Patrick Donnelly wrote:

You probably have the wrong mons in the mount configuration.

I have only one mon.

#7 Updated by Vasu Kulkarni 8 months ago

Can someone please check the logs? We are hitting this issue in about 50% of sanity runs.

#8 Updated by Patrick Donnelly 7 months ago

  • Status changed from Need More Info to New
  • Assignee set to Zheng Yan

#9 Updated by Zheng Yan 7 months ago

In messages_ceph-sshreeka-run379-node5-client

Mar 14 11:02:55 ceph-sshreeka-run379-node5-client mount: mount error 1 = Operation not permitted
Mar 14 11:02:55 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -1

It looks like the fstab entry didn't include the correct secret.
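
Both numbers in these two log lines are errno values: mount prints the positive code 1, and libceph prints the negated code -1 for the cephx ("x") auth method. On Linux, errno 1 is EPERM ("Operation not permitted"), which fits the reading that the credentials were rejected. The sketch below decodes such values with the C library's strerror; the helper name describe_kernel_err is hypothetical, and it also names -22 (EINVAL), which appears in later log excerpts in this thread.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Kernel code such as libceph reports failures as negative errno values,
 * while mount(8) prints the positive code. Accept either form and write
 * "NAME (description)" into buf. Hypothetical helper; only the two codes
 * seen in this thread get a symbolic name. */
const char *describe_kernel_err(int err, char *buf, size_t len)
{
    int code = abs(err);

    switch (code) {
    case EPERM:   /* 1: "mount error 1", "auth method 'x' error -1" */
        snprintf(buf, len, "EPERM (%s)", strerror(code));
        break;
    case EINVAL:  /* 22: "auth method 'x' error -22" */
        snprintf(buf, len, "EINVAL (%s)", strerror(code));
        break;
    default:
        snprintf(buf, len, "errno %d (%s)", code, strerror(code));
        break;
    }
    return buf;
}
```

With glibc, describe_kernel_err(-1, buf, sizeof buf) produces "EPERM (Operation not permitted)", the same text mount printed.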

#10 Updated by Shreekara Shastry 7 months ago

Zheng Yan wrote:

In messages_ceph-sshreeka-run379-node5-client
[...]

It looks like the fstab entry didn't include the correct secret.

Running mount -a with the same fstab entries works.

#11 Updated by Luis Henriques 7 months ago

Actually, the first failure seems to occur a bit earlier:

Mar 14 08:10:00 ceph-sshreeka-run379-node5-client systemd: Stopping User Slice of root.
Mar 14 08:10:42 ceph-sshreeka-run379-node5-client kernel: ceph: mds1 caps stale
Mar 14 08:11:02 ceph-sshreeka-run379-node5-client kernel: ceph: mds1 caps stale
Mar 14 08:14:43 ceph-sshreeka-run379-node5-client kernel: libceph: mds0 172.16.115.100:6800 socket closed (con state OPEN)
Mar 14 08:14:46 ceph-sshreeka-run379-node5-client chronyd[499]: Source 45.76.244.202 replaced with 64.113.44.54
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: libceph: mds0 172.16.115.100:6800 connection reset
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: libceph: reset on mds0
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: ceph: mds0 closed our session
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: ceph: mds0 reconnect start
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: ceph: mds0 reconnect denied
Mar 14 08:15:20 ceph-sshreeka-run379-node5-client systemd-logind: Removed session 20.
Mar 14 08:15:20 ceph-sshreeka-run379-node5-client systemd: Removed slice User Slice of cephuser.
Mar 14 08:15:20 ceph-sshreeka-run379-node5-client systemd: Stopping User Slice of cephuser.
Mar 14 08:23:05 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
Mar 14 08:23:15 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
Mar 14 08:23:25 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
...

The client seems to be trying to renew caps while the connection is being closed.

It's possible that this has been fixed in recent kernels and that the fix hasn't been backported to your distro. It may be worth trying to reproduce it with a recent kernel.
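
Finding the first failure, as done above, means scanning the client's messages file for the earliest ceph or libceph kernel line that signals trouble, rather than starting from the later auth errors. A minimal sketch, assuming syslog-format lines like the ones quoted above; the marker strings are copied from this log, and the helper name first_ceph_failure is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Kernel messages from the log above that indicate a CephFS client
 * problem; the order does not matter. */
static const char *const failure_markers[] = {
    "caps stale",
    "socket closed",
    "connection reset",
    "closed our session",
    "reconnect denied",
    "auth method",
};

/* Return the index of the first line in lines[0..n) that comes from the
 * kernel ceph/libceph modules and contains a failure marker, or -1. */
long first_ceph_failure(const char *const *lines, size_t n)
{
    size_t nmarkers = sizeof failure_markers / sizeof failure_markers[0];

    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "kernel: ceph:") == NULL &&
            strstr(lines[i], "kernel: libceph:") == NULL)
            continue;                  /* not a ceph kernel message */
        for (size_t m = 0; m < nmarkers; m++)
            if (strstr(lines[i], failure_markers[m]) != NULL)
                return (long)i;
    }
    return -1;
}
```

Run over the excerpt above in order, this stops at the 08:10:42 "mds1 caps stale" line, well before the 08:23 auth errors.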

#12 Updated by Patrick Donnelly 7 months ago

  • Priority changed from Normal to High
  • Target version set to v13.0.0

#13 Updated by Shreekara Shastry 7 months ago

Luis Henriques wrote:

Actually, the first failure seems to occur a bit earlier:
[...]
The client seems to be trying to renew caps while the connection is being closed.

It's possible that this has been fixed in recent kernels and that the fix hasn't been backported to your distro. It may be worth trying to reproduce it with a recent kernel.

I've checked with the latest kernel version; the result was still the same.

#14 Updated by Zheng Yan 7 months ago

The kexec in the dmesg output looks suspicious: the client mounted CephFS, then used kexec to load the kernel image again. All of the issues happened after the kexec.

#15 Updated by Patrick Donnelly 6 months ago

  • Status changed from New to Need More Info
  • Target version changed from v13.0.0 to v14.0.0
  • ceph-qa-suite deleted (fs)

Zheng Yan wrote:

The kexec in the dmesg output looks suspicious: the client mounted CephFS, then used kexec to load the kernel image again. All of the issues happened after the kexec.

What do you need from Shreekara to verify this is the issue?

#16 Updated by Zheng Yan 6 months ago

I still don't think this is a kernel issue. Please patch the kernel with the change below and try again.

diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
index 584fdbef2088..3ca2d0c65d20 100644
--- a/net/ceph/ceph_common.c
+++ b/net/ceph/ceph_common.c
@@ -432,6 +432,7 @@ ceph_parse_options(char *options, const char *dev_name,
                                err = -ENOMEM;
                                goto out;
                        }
+                       printk("name: %s\n", opt->name);
                        break;
                case Opt_secret:
                        ceph_crypto_key_destroy(opt->key);
@@ -442,6 +443,7 @@ ceph_parse_options(char *options, const char *dev_name,
                                err = -ENOMEM;
                                goto out;
                        }
+                       printk("secret: %s\n", argstr[0].from);
                        err = ceph_crypto_key_unarmor(opt->key, argstr[0].from);
                        if (err < 0)
                                goto out;
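
The patch makes the kernel log the name= and secret= values exactly as ceph_parse_options received them, so a mount at boot can be compared with a manual mount -a. A similar check can be run in user space on an option string before it reaches the kernel. This sketch is not the kernel's parser (ceph_parse_options switches on Opt_* tokens, as the diff shows), and the helper name get_option_value is hypothetical. It matches whole keys only, so "secret" does not match a secretfile= option.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Look up `key` in a comma-separated "k=v" option string and copy its
 * value into `out`. Returns false if the key is absent or the value does
 * not fit. Hypothetical helper, not the kernel's parser. */
bool get_option_value(const char *opts, const char *key,
                      char *out, size_t out_size)
{
    size_t klen = strlen(key);
    const char *p = opts;

    for (;;) {
        const char *end = strchr(p, ',');
        size_t tok_len = end ? (size_t)(end - p) : strlen(p);

        /* Match "key=" at the start of this token; requiring '=' right
         * after the key stops "secret" from matching "secretfile=". */
        if (tok_len > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = tok_len - klen - 1;
            if (vlen >= out_size)
                return false;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return true;
        }
        if (end == NULL)
            return false;
        p = end + 1;
    }
}
```

For example, get_option_value("name=admin,noatime", "name", v, sizeof v) copies "admin" into v, while looking up "secret" in the same string returns false, flagging a missing secret.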

#17 Updated by Patrick Donnelly 4 months ago

  • Related to Bug #24879: mds: create health warning if we detect metadata (journal) writes are slow added

#18 Updated by Patrick Donnelly 4 months ago

  • Status changed from Need More Info to Closed

Closing this as it's a consequence of #24879.
