Bug #64856
openmds crashes when extracting from a tar is cancelled
Description
On a fresh vstart cluster, the following commands were run:
./bin/ceph config set mon mon_allow_pool_delete true
./bin/ceph config set mds mds_cache_memory_limit 1G
./bin/ceph config set mds mds_health_cache_threshold 1.000001
./bin/ceph fs set a allow_standby_replay true
sleep 2
./bin/ceph status | grep "hot standby"
./bin/ceph-fuse cephfs1
tar -xv -f linux-6.7.9.tar.xz
This extraction takes some time. When it is cancelled with Ctrl-C 2-5 seconds in, the tar command hangs and the prompt never returns. Hitting Ctrl-C multiple times has no effect; the tar command remains stuck and the prompt doesn't return. Hitting Enter a few times doesn't help either. According to ps output, the tar process is still running. Killing it with kill <tar-pid>, sudo kill <tar-pid> and sudo kill -9 <tar-pid> has no effect.
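A process that ignores kill -9 is often blocked in uninterruptible sleep (state "D" in ps), though the report above does not confirm which state tar was in. As a minimal sketch for checking this while the hang is in progress, the helper below prints the one-letter state of a given PID. The function names proc_state and is_uninterruptible are hypothetical, chosen here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a stuck process; not part of Ceph.

# Print the one-letter state of a process (R, S, D, Z, T, ...).
# Prints nothing and returns non-zero if the PID does not exist.
proc_state() {
    state=$(ps -o stat= -p "$1" 2>/dev/null) || return 1
    # Keep only the first character; the rest are modifier flags such as "s" or "+".
    state=$(printf '%s' "$state" | tr -d ' ' | cut -c1)
    [ -n "$state" ] && printf '%s\n' "$state"
}

# Succeed only if the process is currently in uninterruptible sleep ("D").
is_uninterruptible() {
    [ "$(proc_state "$1")" = "D" ]
}
```

For example, proc_state <tar-pid> could be run from a second terminal after the kill attempts. A result of "D" would be consistent with signals having no effect, since a task in that state does not act on signals until the blocking call returns.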
After unmounting CephFS using sudo umount -lf <cephfs-mntp>, the prompt returns, ps no longer reports tar, and the MDS crashes immediately. In some cases tar doesn't hang, but the MDS crashes every time.
The issue was also reproduced successfully with the MDS cache size set to 50M, 100M, 500M, 1G, 2G and 4G.
Reproducibility: 10/10 times, though slightly less often with 2G and less often still with 4G. I've reproduced this around 70 times with different cache sizes on the main branch as well as on the feature branch on which this issue was discovered.
Updated by Greg Farnum about 1 month ago
- Assignee changed from Dhairya Parmar to Rishabh Dave
Rishabh, this sounds a lot like you're just putting too much of a metadata workload in for the MDS to handle with constrained memory. Have you debugged what is going on at all beyond the apparent hang? Is the MDS swapping or spending all its time doing RADOS IO?
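One way to answer the swapping question on Linux is to read the VmSwap field from /proc/<pid>/status for the MDS process. The following is a rough sketch, not an established Ceph debugging procedure; the function name proc_swap_kb is hypothetical, and the approach assumes a Linux host with /proc mounted.

```shell
#!/bin/sh
# Hypothetical helper; assumes Linux with /proc available.

# Print the amount of swapped-out memory for a PID, in kB, as reported
# by the VmSwap field of /proc/<pid>/status.
# Prints nothing if the PID does not exist or the field is absent.
proc_swap_kb() {
    awk '/^VmSwap:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}
```

Assuming the daemon runs under the process name ceph-mds, something like proc_swap_kb "$(pgrep -x ceph-mds | head -n 1)" could be sampled before and during the tar extraction. A value that stays at 0 would suggest the MDS is not swapping.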