Bug #23972
Ceph MDS Crash from client mounting aufs over cephfs
Status: open
Description
Here is a rough outline of my topology
https://pastebin.com/HQqbMxyj
---
I can reliably crash every CephFS MDS daemon (2 in my case) from a client by trying to mount CephFS under aufs. I am not sure what the client does to cause this, but the MDS daemons refuse to start again until I (1) reboot my client to stop any further requests and (2) mark the current active MDS server as failed.
`ceph -s` reports that the monitors are up, but the MDS processes are dead on both MDS servers:
Ceph health prior to trying to mount cephfs as an aufs branch
----------------------------------------------
ceph -s
  cluster:
    id:     9f58ee5a-7c5d-4d68-81ee-debe16322544
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
    mgr: kh08-8(active)
    mds: cephfs-1/1/1 up {0=kh09-8=up:active}, 1 up:standby
    osd: 570 osds: 570 up, 570 in
Client tries to mount aufs (no output here, the command just hangs):
mount -vvv -t aufs -o br=/cephfs=rw:/mnt/aufs=rw -o udba=reval none /aufs
Monitors now report a HEALTH_WARN state
----------------------------------------------
root@kh08-8:~# ceph -s
  cluster:
    id:     9f58ee5a-7c5d-4d68-81ee-debe16322544
    health: HEALTH_WARN
            insufficient standby MDS daemons available
  services:
    mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
    mgr: kh08-8(active)
    mds: cephfs-1/1/1 up {0=kh10-8=up:active(laggy or crashed)}
At this point all mounts hang until I stop the client, mark the mds servers as failed, and restart the mds servers.
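The recovery steps described above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: it assumes rank 0 is the stuck MDS rank (as in the `ceph -s` output above), that the MDS daemons run under systemd units named `ceph-mds@<hostname>`, and that ssh access to the MDS hosts is available. The `run` and `recover_mds` helpers and the `DRY_RUN` switch are hypothetical names introduced for illustration. The offending client still has to be stopped or rebooted first.

```shell
#!/bin/sh
# Sketch of the manual recovery: mark the stuck rank as failed, then
# restart the MDS daemon on each host. Assumes systemd-managed daemons
# named ceph-mds@<hostname>; adjust for your deployment.

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: recover_mds RANK HOST [HOST...]
recover_mds() {
    rank="$1"; shift
    # Mark the laggy/crashed MDS rank as failed.
    run ceph mds fail "$rank"
    # Restart the MDS daemon on every host given.
    for host in "$@"; do
        run ssh "$host" systemctl restart "ceph-mds@$host"
    done
}

# Example (dry run, prints the commands only):
#   DRY_RUN=1; recover_mds 0 kh09-8 kh10-8
```

Running it in dry-run mode first shows exactly which commands would be issued before anything touches the cluster.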
I tried installing the following debug packages to get better backtraces (ceph-mds-dbg ceph-mgr-dbg ceph-mon-dbg ceph-osd-dbg ceph-test-dbg):
kh10-8 mds backtrace -- https://pastebin.com/bwqZGcfD
kh09-8 mds backtrace -- https://pastebin.com/vvGiXYVY
The log files are pretty large (one is 4.1GB and the other 200MB):
kh10-8 (200MB) mds log -- https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh10-8.log
kh09-8 (4.1GB) mds log -- https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh09-8.log
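Because the logs are this large, it may be easier to pull out only the lines around the aufs activity than to read them whole. A minimal sketch, assuming a log has already been downloaded locally and that the string `aufs` appears in the relevant entries; the `log_context` helper and its default of 20 context lines are illustrative choices, not part of any Ceph tooling.

```shell
#!/bin/sh
# Print every match of a pattern in a large log, with surrounding context.
# Usage: log_context PATTERN LOGFILE [CONTEXT_LINES]
log_context() {
    pattern="$1"
    logfile="$2"
    context="${3:-20}"
    # -n prefixes line numbers; -C prints that many lines before and
    # after each match.
    grep -n -C "$context" -- "$pattern" "$logfile"
}

# Example: keep only the tail end of the matches from the smaller log.
#   log_context aufs ceph-mds.kh10-8.log 20 | tail -n 100
```

The line numbers printed by `-n` make it straightforward to go back to the full log for more context around a specific entry.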
I am trying to mount aufs over the cephfs directory /aufstest so here are the last few lines from kh10-8 (secondary MDS server at the time) around the aufs mention.