Bug #23332
kclient: with fstab entry is not coming up reboot
Description
With 4 clients (2 FUSE and 2 kernel clients), I was running an automated script that adds an fstab entry before rebooting the clients. After rebooting each client, I could see that the client did not come back up within the 600-second timeout.
Related issues
History
#1 Updated by Patrick Donnelly about 6 years ago
- Subject changed from Client with fstab entry is not coming up reboot to kclient: with fstab entry is not coming up reboot
- Category deleted (Testing)
- Status changed from New to Need More Info
- Component(FS) kceph added
- Component(FS) deleted (ceph-fuse)
You probably have the wrong mons in the mount configuration.
#2 Updated by Zheng Yan about 6 years ago
You need to add the _netdev mount option.
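As a rough illustration of this suggestion: _netdev marks a filesystem as needing the network, so the mount is not attempted before the network is up. The sketch below scans fstab text for ceph entries whose option list lacks _netdev. The helper name, monitor address, mount points, and secretfile path are hypothetical and not taken from this report.

```python
def ceph_entries_missing_netdev(fstab_text):
    """Return mount points of ceph-type fstab entries that lack _netdev."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # An fstab entry needs at least: device, mount point, type, options.
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type == "ceph" and "_netdev" not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical sample entries (addresses and paths are assumptions).
SAMPLE_FSTAB = """\
# kernel client mounts
192.0.2.10:6789:/ /mnt/kcephfs ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime 0 0
192.0.2.10:6789:/ /mnt/kcephfs2 ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev 0 0
"""

print(ceph_entries_missing_netdev(SAMPLE_FSTAB))
```

Only the first sample entry is reported, since the second one already carries _netdev.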
#3 Updated by Shreekara Shastry about 6 years ago
Patrick Donnelly wrote:
You probably have the wrong mons in the mount configuration.
I have only one mon.
#5 Updated by Shreekara Shastry about 6 years ago
- File ceph-mds.ceph-sshreeka-run379-node4-mds.log added
- File ceph-mds.ceph-sshreeka-run379-node6-mds.log added
- File cierror.txt added
- File messages_ceph-sshreeka-run379-node5-client added
- File messages_ceph-sshreeka-run379-node7-client added
- File messages_ceph-sshreeka-run379-node8-client added
- File messages_ceph-sshreeka-run379-node9-client added
#7 Updated by Vasu Kulkarni about 6 years ago
Can someone please check the logs? We are hitting this issue 50% of the time in sanity runs.
#8 Updated by Patrick Donnelly almost 6 years ago
- Status changed from Need More Info to New
- Assignee set to Zheng Yan
#9 Updated by Zheng Yan almost 6 years ago
In messages_ceph-sshreeka-run379-node5-client:
Mar 14 11:02:55 ceph-sshreeka-run379-node5-client mount: mount error 1 = Operation not permitted
Mar 14 11:02:55 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -1
Looks like the fstab entry didn't include the correct secret.
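For reading these messages, it may help to map the numeric code to its errno name: the kernel reports negative errno values, so -1 corresponds to EPERM and -22 to EINVAL. A minimal sketch that extracts and decodes the libceph auth errors from syslog text follows. The function name is made up for illustration, and the sample lines are copied from the logs quoted in this thread.

```python
import errno
import re

# Matches kernel lines such as: libceph: auth method 'x' error -1
AUTH_ERROR = re.compile(r"libceph: auth method '([^']+)' error (-?\d+)")


def decode_auth_errors(log_text):
    """Return (method, code, errno_name) for each libceph auth error found."""
    decoded = []
    for match in AUTH_ERROR.finditer(log_text):
        method = match.group(1)
        code = int(match.group(2))
        # Kernel error codes are negated errno values.
        name = errno.errorcode.get(abs(code), "UNKNOWN")
        decoded.append((method, code, name))
    return decoded


SAMPLE_LOG = """\
Mar 14 11:02:55 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -1
Mar 14 08:23:05 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
"""

for method, code, name in decode_auth_errors(SAMPLE_LOG):
    print(method, code, name)
```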
#10 Updated by Shreekara Shastry almost 6 years ago
Zheng Yan wrote:
In messages_ceph-sshreeka-run379-node5-client
[...]
Looks like the fstab entry didn't include the correct secret.
Running mount -a works with the same entries in fstab.
#11 Updated by Luis Henriques almost 6 years ago
Actually, the first failure seems to be a bit before:
Mar 14 08:10:00 ceph-sshreeka-run379-node5-client systemd: Stopping User Slice of root.
Mar 14 08:10:42 ceph-sshreeka-run379-node5-client kernel: ceph: mds1 caps stale
Mar 14 08:11:02 ceph-sshreeka-run379-node5-client kernel: ceph: mds1 caps stale
Mar 14 08:14:43 ceph-sshreeka-run379-node5-client kernel: libceph: mds0 172.16.115.100:6800 socket closed (con state OPEN)
Mar 14 08:14:46 ceph-sshreeka-run379-node5-client chronyd[499]: Source 45.76.244.202 replaced with 64.113.44.54
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: libceph: mds0 172.16.115.100:6800 connection reset
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: libceph: reset on mds0
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: ceph: mds0 closed our session
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: ceph: mds0 reconnect start
Mar 14 08:14:47 ceph-sshreeka-run379-node5-client kernel: ceph: mds0 reconnect denied
Mar 14 08:15:20 ceph-sshreeka-run379-node5-client systemd-logind: Removed session 20.
Mar 14 08:15:20 ceph-sshreeka-run379-node5-client systemd: Removed slice User Slice of cephuser.
Mar 14 08:15:20 ceph-sshreeka-run379-node5-client systemd: Stopping User Slice of cephuser.
Mar 14 08:23:05 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
Mar 14 08:23:15 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
Mar 14 08:23:25 ceph-sshreeka-run379-node5-client kernel: libceph: auth method 'x' error -22
...
The client seems to be trying to renew caps while the connection is being closed.
It's possible that this has been fixed in recent kernels and the fix hasn't been backported to your distro. It may be worth trying to reproduce it with a recent kernel.
#12 Updated by Patrick Donnelly almost 6 years ago
- Priority changed from Normal to High
- Target version set to v13.0.0
#13 Updated by Shreekara Shastry almost 6 years ago
Luis Henriques wrote:
Actually, the first failure seems to be a bit before:
[...]
The client seems to be trying to renew caps while the connection is being closed. It's possible that this has been fixed in recent kernels and the fix hasn't been backported to your distro. It may be worth trying to reproduce it with a recent kernel.
I've checked with the latest kernel version; the result was still the same.
#14 Updated by Zheng Yan almost 6 years ago
The kexec in the dmesg output looks suspicious. The client mounted CephFS, then used kexec to load the kernel image again. All issues happened after the kexec.
#15 Updated by Patrick Donnelly almost 6 years ago
- Status changed from New to Need More Info
- Target version changed from v13.0.0 to v14.0.0
- ceph-qa-suite deleted (fs)
Zheng Yan wrote:
The kexec in the dmesg output looks suspicious. The client mounted CephFS, then used kexec to load the kernel image again. All issues happened after the kexec.
What do you need from Shreekara to verify this is the issue?
#16 Updated by Zheng Yan almost 6 years ago
I still don't think this is a kernel issue. Please patch the kernel with the change below and try again.
diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
index 584fdbef2088..3ca2d0c65d20 100644
--- a/net/ceph/ceph_common.c
+++ b/net/ceph/ceph_common.c
@@ -432,6 +432,7 @@ ceph_parse_options(char *options, const char *dev_name,
 				err = -ENOMEM;
 				goto out;
 			}
+			printk("name: %s\n", opt->name);
 			break;
 		case Opt_secret:
 			ceph_crypto_key_destroy(opt->key);
@@ -442,6 +443,7 @@ ceph_parse_options(char *options, const char *dev_name,
 				err = -ENOMEM;
 				goto out;
 			}
+			printk("secret: %s\n", argstr[0].from);
 			err = ceph_crypto_key_unarmor(opt->key, argstr[0].from);
 			if (err < 0)
 				goto out;
#17 Updated by Patrick Donnelly over 5 years ago
- Related to Bug #24879: mds: create health warning if we detect metadata (journal) writes are slow added
#18 Updated by Patrick Donnelly over 5 years ago
- Status changed from Need More Info to Closed
Closing this as it's a consequence of #24879.