Bug #53082
closed
ceph-fuse: segmentation fault in Client::handle_mds_map
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2021-10-26T22:49:17.843 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr:*** Caught signal (Segmentation fault) **
2021-10-26T22:49:17.843 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: in thread 7f20f77fe700 thread_name:ms_dispatch
2021-10-26T22:49:17.844 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: ceph version 17.0.0-8469-g3dd8f159 (3dd8f1596bf5dc8769506e3cff803328189f20b1) quincy (dev)
2021-10-26T22:49:17.844 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: 1: /lib64/libpthread.so.0(+0x12b20) [0x7f2114ceeb20]
2021-10-26T22:49:17.845 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: 2: (Client::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0x617) [0x55dfb7943287]
2021-10-26T22:49:17.845 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: 3: (Client::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x7fd) [0x55dfb79448ad]
2021-10-26T22:49:17.845 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: 4: (DispatchQueue::entry()+0x14fa) [0x7f2115b3c37a]
2021-10-26T22:49:17.845 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: 5: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f2115bf2741]
2021-10-26T22:49:17.846 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: 6: /lib64/libpthread.so.0(+0x814a) [0x7f2114ce414a]
2021-10-26T22:49:17.846 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: 7: clone()
From: /ceph/teuthology-archive/pdonnell-2021-10-26_22:12:03-fs-wip-pdonnell-testing-20211025.000447-distro-basic-smithi/6462553/teuthology.log
I tried digging into the core file but getting the right packages set up with centos:stream has been difficult. So far I can't see what caused this. Hopefully we can reproduce this again with centos:8.
It's possible but unlikely this is related to #52436.
Updated by Xiubo Li over 2 years ago
There are some logs from just before the crash:
-8> 2021-10-26T22:49:17.842+0000 7f20f77fe700 1 client.4692 handle_mds_map epoch 87
-7> 2021-10-26T22:49:17.842+0000 7f20f77fe700 1 -- 192.168.0.1:0/3403711952 >> [v2:172.21.15.73:6838/301218788,v1:172.21.15.73:6839/301218788] conn(0x7f20b809bbf0 msgr2=0x7f20b80d1b20 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=0).mark_down
-6> 2021-10-26T22:49:17.842+0000 7f20f77fe700 1 --2- 192.168.0.1:0/3403711952 >> [v2:172.21.15.73:6838/301218788,v1:172.21.15.73:6839/301218788] conn(0x7f20b809bbf0 0x7f20b80d1b20 crc :-1 s=READY pgs=172 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).stop
-5> 2021-10-26T22:49:17.842+0000 7f20f77fe700 5 client.4692 _closed_mds_session mds.2 seq 0
-4> 2021-10-26T22:49:17.842+0000 7f20f77fe700 1 -- 192.168.0.1:0/3403711952 >> [v2:172.21.15.73:6838/301218788,v1:172.21.15.73:6839/301218788] conn(0x7f20b809bbf0 msgr2=0x7f20b80d1b20 crc :-1 s=STATE_CLOSED l=0).mark_down
-3> 2021-10-26T22:49:17.842+0000 7f20f77fe700 1 --2- 192.168.0.1:0/3403711952 >> [v2:172.21.15.73:6838/301218788,v1:172.21.15.73:6839/301218788] conn(0x7f20b809bbf0 0x7f20b80d1b20 crc :-1 s=CLOSED pgs=172 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).stop
-2> 2021-10-26T22:49:17.842+0000 7f20f77fe700 10 client.4692 remove_session_caps mds.2
-1> 2021-10-26T22:49:17.842+0000 7f20f77fe700 10 client.4692 kick_requests_closed for mds.2
0> 2021-10-26T22:49:17.843+0000 7f20f77fe700 -1 *** Caught signal (Segmentation fault) **
The crash should have happened in kick_requests_closed() or right after it. I checked the code, since the top frame of the backtrace is in libpthread.so:
2021-10-26T22:49:17.844 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.smithi017.stderr: 1: /lib64/libpthread.so.0(+0x12b20) [0x7f2114ceeb20]
It seems to be caused by some pthread_XX call? For example, `req->caller_cond->notify_all();` will eventually call `pthread_cond_broadcast()`?
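To illustrate the pattern being discussed, here is a minimal standalone sketch of a thread waking a waiter through a condition-variable pointer stored in a request. It is not the Ceph client code: `DemoRequest`, `kick_request` and `run_demo` are hypothetical names, and the only thing borrowed from the report is the `caller_cond->notify_all()` call. The assumption is that the condition variable lives on the waiter's stack, so the notifier keeps the lock held while it notifies and the waiter cannot return and destroy the condition variable in the middle of the call.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a client request; not Ceph's real request type.
struct DemoRequest {
  std::condition_variable *caller_cond = nullptr;  // owned by the waiting caller
  bool kicked = false;
};

// Wake the waiter, roughly what a kick_requests_closed()-style loop would do.
// The lock stays held across notify_all(), so the waiter cannot wake up,
// return, and destroy its condition variable while the notification runs.
void kick_request(std::mutex &lock, DemoRequest &req) {
  std::lock_guard<std::mutex> guard(lock);
  req.kicked = true;
  if (req.caller_cond) {
    req.caller_cond->notify_all();
  }
}

// Runs one waiter and one kick; returns true if the waiter saw the kick.
bool run_demo() {
  std::mutex lock;
  DemoRequest req;
  bool woke = false;

  std::thread waiter([&] {
    std::condition_variable cond;  // lives on the waiter's stack
    std::unique_lock<std::mutex> l(lock);
    req.caller_cond = &cond;       // publish the pointer under the lock
    cond.wait(l, [&] { return req.kicked; });
    req.caller_cond = nullptr;     // unpublish before cond is destroyed
    woke = true;
  });

  kick_request(lock, req);
  waiter.join();
  return woke;
}
```

Calling `run_demo()` from a test or a small `main` exercises both orderings safely: if the kick arrives first, the waiter's predicate is already true and it never blocks. If the pointer were left set after the waiter returned, or `notify_all()` were called without the lock, the notifier could touch a destroyed condition variable, which is one way a crash inside a pthread call could arise. Whether that is what happened here is not established by the logs above.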
Updated by Xiubo Li over 2 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 44038
Updated by Venky Shankar over 2 years ago
- Status changed from Fix Under Review to Resolved