Project

General

Profile

Actions

Bug #46883

closed

kclient: ghost kernel mount

Added by Patrick Donnelly over 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph, qa-suite
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Relevant snippets of teuthology log:

2020-08-08T16:09:17.162 INFO:teuthology.orchestra.run.smithi114:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 900 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok --format=json session evict 6737
...
2020-08-08T16:09:18.433 DEBUG:tasks.cephfs.kernel_mount:Unmounting client client.0...
2020-08-08T16:09:18.433 INFO:teuthology.orchestra.run:Running command with timeout 900
2020-08-08T16:09:18.434 INFO:teuthology.orchestra.run.smithi072:> sudo umount /home/ubuntu/cephtest/mnt.0 -f
2020-08-08T16:09:18.479 INFO:teuthology.orchestra.run.smithi072.stderr:umount: /home/ubuntu/cephtest/mnt.0: target is busy.
...
2020-08-08T16:09:20.022 INFO:teuthology.orchestra.run.smithi072:> sudo umount -f -l /home/ubuntu/cephtest/mnt.0
2020-08-08T16:09:20.270 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-08-08T16:09:20.271 INFO:teuthology.orchestra.run.smithi072:> (cd /home/ubuntu/cephtest && exec rmdir -- /home/ubuntu/cephtest/mnt.0)
...
020-08-08T16:09:22.489 INFO:tasks.cephfs.mount:Ready to start KernelMount...
2020-08-08T16:09:22.489 INFO:tasks.cephfs.mount:Setting the 'ceph-ns--home-ubuntu-cephtest-mnt.0' netns for '/home/ubuntu/cephtest/mnt.0'
...
2020-08-08T16:09:22.841 INFO:teuthology.orchestra.run.smithi072:> mkdir -p /home/ubuntu/cephtest/mnt.0
2020-08-08T16:09:22.895 INFO:teuthology.orchestra.run:Running command with timeout 1800
2020-08-08T16:09:22.896 INFO:teuthology.orchestra.run.smithi072:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage nsenter --net=/var/run/netns/ceph-ns--home-ubuntu-cephtest-mnt.0 /bin/mount -t ceph :/ /home/ubuntu/cephtest/mnt.0 -v -o name=0,norequire_active_mds,conf=/etc/ceph/ceph.conf
2020-08-08T16:09:22.992 INFO:teuthology.orchestra.run.smithi072.stdout:parsing options: rw,name=0,norequire_active_mds,conf=/etc/ceph/ceph.conf
2020-08-08T16:09:22.993 INFO:teuthology.orchestra.run.smithi072.stdout:mount.ceph: options "name=0,norequire_active_mds" will pass to kernel.
2020-08-08T16:09:22.995 INFO:teuthology.orchestra.run.smithi072.stdout:parsing options: rw,name=0,norequire_active_mds,conf=/etc/ceph/ceph.conf
2020-08-08T16:09:22.996 INFO:teuthology.orchestra.run.smithi072.stdout:mount.ceph: options "name=0,norequire_active_mds" will pass to kernel.
2020-08-08T16:09:22.998 INFO:teuthology.orchestra.run:Running command with timeout 300
2020-08-08T16:09:22.998 INFO:teuthology.orchestra.run.smithi072:> sudo chmod 1777 /home/ubuntu/cephtest/mnt.0
2020-08-08T16:09:25.966 INFO:teuthology.orchestra.run.smithi072.stderr:chmod: cannot access '/home/ubuntu/cephtest/mnt.0': Permission denied
2020-08-08T16:09:25.968 DEBUG:teuthology.orchestra.run:got remote process result: 1

From: /ceph/teuthology-archive/pdonnell-2020-08-08_02:19:19-kcephfs-wip-pdonnell-testing-20200808.001303-distro-basic-smithi/5319130/teuthology.log

Then from the kernel syslog:

2020-08-08T16:09:05.425102+00:00 smithi072 kernel: libceph: mon2 (1)172.21.15.197:6790 session established
2020-08-08T16:09:05.425176+00:00 smithi072 kernel: libceph: client6737 fsid 2a4fe1bc-6c9e-4855-8e2e-c3febc31e26b
2020-08-08T16:09:18.271102+00:00 smithi072 kernel: libceph: mds0 (1)172.21.15.114:6835 socket closed (con state OPEN)
2020-08-08T16:09:18.430096+00:00 smithi072 kernel: ceph-brx: port 1(brx.0) entered disabled state
2020-08-08T16:09:18.478710+00:00 smithi072 kernel: ceph:  dropping file locks for 000000007fa0e095 1099511627776
2020-08-08T16:09:18.478762+00:00 smithi072 kernel: ceph:  dropping file locks for 000000007dd06430 1099511627777
2020-08-08T16:09:22.843915+00:00 smithi072 kernel: IPv6: ADDRCONF(NETDEV_UP): brx.0: link is not ready
2020-08-08T16:09:22.843998+00:00 smithi072 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): brx.0: link becomes ready
2020-08-08T16:09:22.844030+00:00 smithi072 kernel: ceph-brx: port 1(brx.0) entered blocking state 
2020-08-08T16:09:22.855959+00:00 smithi072 kernel: ceph-brx: port 1(brx.0) entered forwarding state 
2020-08-08T16:09:25.578999+00:00 smithi072 kernel: libceph: mds0 (1)172.21.15.114:6835 connection reset
2020-08-08T16:09:25.579048+00:00 smithi072 kernel: libceph: reset on mds0
2020-08-08T16:09:25.579072+00:00 smithi072 kernel: ceph: mds0 closed our session
2020-08-08T16:09:25.583476+00:00 smithi072 kernel: ceph: mds0 reconnect start
2020-08-08T16:09:25.597111+00:00 smithi072 kernel: libceph: mds0 (1)172.21.15.114:6835 socket closed (con state NEGOTIATING)
2020-08-08T16:09:25.965071+00:00 smithi072 kernel: ceph: mds0 rejected session

From: /ceph/teuthology-archive/pdonnell-2020-08-08_02:19:19-kcephfs-wip-pdonnell-testing-20200808.001303-distro-basic-smithi/5319130/remote/smithi072/syslog/kern.log.gz

So the client started at 16:09:05 was killed by the test suite 16:09:17. The test kills the mount and cleans up:

https://github.com/ceph/ceph/blob/9b2d0a90fc890029ff559bcc5cc88b01bebb4dea/qa/tasks/cephfs/test_client_recovery.py#L433-L434

and then does a new mount but gets "Permission denied" when running chmod.

What's odd is I don't see the usual evidence of a new kernel mount, i.e. something like

libceph: mon2 (1)172.21.15.197:6790 session established

in the log at 16:09:22. It appears the old mount persists and that is what the MDS is showing in some of its traffic:

2020-08-08T16:09:25.586+0000 7f7509fad700  1 --1- [v2:172.21.15.114:6834/980265380,v1:172.21.15.114:6835/980265380] >>  conn(0x563d8dc88800 0x563d8dbf9000 :6835 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=38 legacy v1:172.21.15.114:6835/980265380 socket_addr v1:172.21.15.114:6835/980265380 target_addr v1:172.21.15.72:47482/0
2020-08-08T16:09:25.586+0000 7f7509fad700 10 mds.a parse_caps: parsing auth_cap_str='allow'
2020-08-08T16:09:25.586+0000 7f75077a8700 10 mds.a  new session 0x563d8dbc4880 for client.6737 v1:172.21.15.72:0/1351493818 con 0x563d8dc88800
2020-08-08T16:09:25.586+0000 7f75077a8700 10 mds.a parse_caps: parsing auth_cap_str='allow'
2020-08-08T16:09:25.586+0000 7f75077a8700 10 mds.a ms_handle_accept v1:172.21.15.72:0/1351493818 con 0x563d8dc88800 session 0x563d8dbc4880
2020-08-08T16:09:25.586+0000 7f75077a8700  1 -- [v2:172.21.15.114:6834/980265380,v1:172.21.15.114:6835/980265380] <== client.6737 v1:172.21.15.72:0/1351493818 1 ==== client_reconnect(0 caps 0 realms ) v3 ==== 0+0+4 (unknown 0 0 0) 0x563d8db04ea0 con 0x563d8dc88800
2020-08-08T16:09:25.586+0000 7f75077a8700  7 mds.0.server handle_client_reconnect client.6737
2020-08-08T16:09:25.586+0000 7f75077a8700 20 mds.0.82 get_session have 0x563d8dbc4880 client.6737 v1:172.21.15.72:0/1351493818 state closed
2020-08-08T16:09:25.586+0000 7f75077a8700  0 mds.0.server  ignoring msg from not-open sessionclient_reconnect(0 caps 0 realms ) v3

From: /ceph/teuthology-archive/pdonnell-2020-08-08_02:19:19-kcephfs-wip-pdonnell-testing-20200808.001303-distro-basic-smithi/5319130/remote/smithi114/log/ceph-mds.a.log.gz

Actions

Also available in: Atom PDF