Bug #26968

open

kclient: mount fails during MDS failover

Added by Patrick Donnelly over 5 years ago. Updated about 5 years ago.

Status: New
Priority: Urgent
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-08-18T13:23:04.302 INFO:tasks.mds_thrash.fs.[cephfs]:waiting till mds map indicates mds.c is laggy/crashed, in failed state, or mds.c is removed from mdsmap
2018-08-18T13:23:05.947 INFO:teuthology.orchestra.run.smithi177.stdout:parsing options: name=0,secretfile=/home/ubuntu/cephtest/ceph.data/client.0.secret,norequire_active_mds
2018-08-18T13:23:05.947 INFO:teuthology.orchestra.run.smithi177.stdout:mount error 5 = Input/output error
2018-08-18T13:23:05.948 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 89, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20180817.222213/qa/tasks/kclient.py", line 108, in task
    kernel_mount.mount()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20180817.222213/qa/tasks/cephfs/kernel_mount.py", line 87, in mount
    opts
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 423, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 155, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 177, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi177 with status 5: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage /sbin/mount.ceph 172.21.15.150:6789,172.21.15.150:6790,172.21.15.19:6789:/ /home/ubuntu/cephtest/mnt.0 -v -o name=0,secretfile=/home/ubuntu/cephtest/ceph.data/client.0.secret,norequire_active_mds'

From: /ceph/teuthology-archive/pdonnell-2018-08-18_00:37:08-multimds-wip-pdonnell-testing-20180817.222213-testing-basic-smithi/2917598/teuthology.log

2018-08-18T13:22:05.925507+00:00 smithi177 kernel: libceph: mon0 172.21.15.150:6789 session established
2018-08-18T13:22:05.933328+00:00 smithi177 kernel: libceph: client4465 fsid 5c830fb4-2ae2-4b86-abf7-eba170e5bd75
2018-08-18T13:22:05.948427+00:00 smithi177 kernel: libceph: mds1 172.21.15.19:6813 socket closed (con state OPEN)

From: /ceph/teuthology-archive/pdonnell-2018-08-18_00:37:08-multimds-wip-pdonnell-testing-20180817.222213-testing-basic-smithi/2917598/remote/smithi177/syslog/kern.log.gz

Actions #1

Updated by Zheng Yan over 5 years ago

It's a mount timeout. I think it's related to socket failure injection:

2018-08-18 13:22:05.934 7f0ce1678700  1 -- 172.21.15.19:6813/4195740184 --> 172.21.15.177:0/4094852171 -- client_session(open) v3 -- 0x563e7f915200 con 0
2018-08-18 13:22:05.934 7f0ce9688700 10 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0).handle_write
2018-08-18 13:22:05.934 7f0ce9688700  0 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0)._try_send injecting socket failure
2018-08-18 13:22:05.934 7f0ce9688700  1 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0)._try_send send error: (32) Broken pipe
2018-08-18 13:22:05.934 7f0ce9688700  1 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0).write_message error sending 0x563e7f915200, (32) Broken pipe
2018-08-18 13:22:05.934 7f0ce9688700  1 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0).handle_write send msg failed
2018-08-18 13:22:05.934 7f0ce9688700  1 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0).handle_write send msg failed
Actions #2

Updated by Patrick Donnelly over 5 years ago

Zheng Yan wrote:

It's a mount timeout. I think it's related to socket failure injection

[...]

Okay, how do you suggest we fix the test/kclient?
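
One possible test-side direction, as a rough sketch only: retry the mount when it fails with status 5 (EIO), on the assumption that the failure is transient and caused by the injected socket failure. Nothing here is existing qa code. `MountFailed`, `mount_with_retry`, and the `do_mount` callable are hypothetical names standing in for whatever the harness actually raises and calls, and the attempt count and delay are arbitrary. Whether retrying in the test is the right fix, as opposed to changing the kclient, is still an open question.

```python
import errno
import time


class MountFailed(Exception):
    """Hypothetical stand-in for the harness's command-failure exception."""

    def __init__(self, status):
        super(MountFailed, self).__init__(
            "mount failed with status %d" % status)
        self.status = status


def mount_with_retry(do_mount, attempts=3, delay=5.0, sleep=time.sleep):
    """Call do_mount() until it succeeds or the attempts run out.

    Only EIO (status 5, the error seen in this report) is retried.
    Any other status, or EIO on the final attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return do_mount()
        except MountFailed as e:
            if e.status != errno.EIO or attempt == attempts:
                raise
            # Assumed transient: give the MDS session time to recover.
            sleep(delay)
```

In a real test, `do_mount` would wrap the existing mount call and translate its failure into the status checked above.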

Actions #3

Updated by Patrick Donnelly about 5 years ago

  • Assignee deleted (Zheng Yan)