Bug #26968
kclient: mount fails during MDS failover
Status: New
Priority: Urgent
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2018-08-18T13:23:04.302 INFO:tasks.mds_thrash.fs.[cephfs]:waiting till mds map indicates mds.c is laggy/crashed, in failed state, or mds.c is removed from mdsmap
2018-08-18T13:23:05.947 INFO:teuthology.orchestra.run.smithi177.stdout:parsing options: name=0,secretfile=/home/ubuntu/cephtest/ceph.data/client.0.secret,norequire_active_mds
2018-08-18T13:23:05.947 INFO:teuthology.orchestra.run.smithi177.stdout:mount error 5 = Input/output error
2018-08-18T13:23:05.948 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 89, in run_tasks
    manager.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20180817.222213/qa/tasks/kclient.py", line 108, in task
    kernel_mount.mount()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20180817.222213/qa/tasks/cephfs/kernel_mount.py", line 87, in mount
    opts
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 193, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 423, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 155, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 177, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi177 with status 5: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage /sbin/mount.ceph 172.21.15.150:6789,172.21.15.150:6790,172.21.15.19:6789:/ /home/ubuntu/cephtest/mnt.0 -v -o name=0,secretfile=/home/ubuntu/cephtest/ceph.data/client.0.secret,norequire_active_mds'
From: /ceph/teuthology-archive/pdonnell-2018-08-18_00:37:08-multimds-wip-pdonnell-testing-20180817.222213-testing-basic-smithi/2917598/teuthology.log
2018-08-18T13:22:05.925507+00:00 smithi177 kernel: libceph: mon0 172.21.15.150:6789 session established
2018-08-18T13:22:05.933328+00:00 smithi177 kernel: libceph: client4465 fsid 5c830fb4-2ae2-4b86-abf7-eba170e5bd75
2018-08-18T13:22:05.948427+00:00 smithi177 kernel: libceph: mds1 172.21.15.19:6813 socket closed (con state OPEN)
From: /ceph/teuthology-archive/pdonnell-2018-08-18_00:37:08-multimds-wip-pdonnell-testing-20180817.222213-testing-basic-smithi/2917598/remote/smithi177/syslog/kern.log.gz
Updated by Zheng Yan over 5 years ago
It's a mount timeout. I think it's related to socket failure injection:
2018-08-18 13:22:05.934 7f0ce1678700 1 -- 172.21.15.19:6813/4195740184 --> 172.21.15.177:0/4094852171 -- client_session(open) v3 -- 0x563e7f915200 con 0
2018-08-18 13:22:05.934 7f0ce9688700 10 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0).handle_write
2018-08-18 13:22:05.934 7f0ce9688700 0 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0)._try_send injecting socket failure
2018-08-18 13:22:05.934 7f0ce9688700 1 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0)._try_send send error: (32) Broken pipe
2018-08-18 13:22:05.934 7f0ce9688700 1 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0).write_message error sending 0x563e7f915200, (32) Broken pipe
2018-08-18 13:22:05.934 7f0ce9688700 1 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0).handle_write send msg failed
2018-08-18 13:22:05.934 7f0ce9688700 1 -- 172.21.15.19:6813/4195740184 >> 172.21.15.177:0/4094852171 conn(0x563e7fbaae00 legacy :6813 s=STATE_OPEN pgs=3 cs=1 l=0).handle_write send msg failed
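A quick way to check the timeout reading is to compare the timestamps quoted in this report. The sketch below is only an illustration: the helper name parse_ts is made up here, the teuthology.log timestamp is assumed to be UTC because it carries no offset, and the 60-second figure is assumed to be the kernel client's default mount_timeout rather than something confirmed for this run.

```python
from datetime import datetime, timezone

def parse_ts(text):
    """Parse an ISO-like timestamp; treat it as UTC when no offset is given (assumption)."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S.%f"):
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed
    raise ValueError("unrecognized timestamp: " + text)

# Timestamps copied from the kern.log and teuthology.log excerpts in this report.
session_established = parse_ts("2018-08-18T13:22:05.925507+00:00")  # mon0 session established
socket_closed = parse_ts("2018-08-18T13:22:05.948427+00:00")        # mds1 socket closed
mount_error = parse_ts("2018-08-18T13:23:05.947")                   # mount error 5

# Assumed default kernel client mount_timeout, in seconds.
ASSUMED_MOUNT_TIMEOUT = 60.0

elapsed = (mount_error - session_established).total_seconds()
after_close = (mount_error - socket_closed).total_seconds()

print("session established -> mount error: %.3f s" % elapsed)
print("mds socket closed   -> mount error: %.3f s" % after_close)
print("within 1 s of assumed timeout:", abs(elapsed - ASSUMED_MOUNT_TIMEOUT) < 1.0)
```

The gap comes out at almost exactly one minute, which fits a mount that stalled right after the injected failure and then gave up at the timeout. The timestamps come from two different log files, so small clock or formatting differences cannot be ruled out.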
Updated by Patrick Donnelly over 5 years ago
Zheng Yan wrote:
It's a mount timeout. I think it's related to socket failure injection
[...]
Okay, how do you suggest we fix the test/kclient?