Bug #21275
test hang after mds evicts kclient
Status: Closed
Description
http://pulpito.ceph.com/zyan-2017-09-07_03:18:23-kcephfs-master-testing-basic-mira/
http://qa-proxy.ceph.com/teuthology/zyan-2017-09-07_03:18:23-kcephfs-master-testing-basic-mira/1603494/teuthology.log
A python process hung at:

    [ 4862.107710] RIP: 0033:0x7f8d18b55e8c
    [ 4862.107714] RSP: 002b:00007ffc96ef6a38 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
    [ 4862.107721] RAX: ffffffffffffffda RBX: 00007f8d18f18c30 RCX: 00007f8d18b55e8c
    [ 4862.107724] RDX: 0000000000000000 RSI: 00007ffc96ef6a60 RDI: 0000000000001c6a
    [ 4862.107728] RBP: 000000000091bc60 R08: 00000000005c2242 R09: 0000000000000000
    [ 4862.107732] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8d18f0dd00
    [ 4862.107736] R13: 000000000000006a R14: 00007f8d18f0dd00 R15: 00007f8d18f0bd32
    [ 4862.107759] python          D    0  7274   7272 0x00000006
    [ 4862.107766] Call Trace:
    [ 4862.107778]  __schedule+0x41d/0xb60
    [ 4862.107795]  schedule+0x3d/0x90
    [ 4862.107801]  schedule_timeout+0x268/0x570
    [ 4862.107811]  ? wait_for_completion_killable_timeout+0x110/0x1a0
    [ 4862.107821]  ? trace_hardirqs_on_caller+0x11f/0x190
    [ 4862.107831]  wait_for_completion_killable_timeout+0x118/0x1a0
    [ 4862.107836]  ? wait_for_completion_killable_timeout+0x118/0x1a0
    [ 4862.107844]  ? wake_up_q+0x70/0x70
    [ 4862.107876]  ceph_mdsc_do_request+0x1da/0x2d0 [ceph]
    [ 4862.107899]  ceph_lock_message+0x12f/0x2c0 [ceph]
    [ 4862.107925]  ceph_lock+0x91/0x1d0 [ceph]
    [ 4862.107937]  vfs_lock_file+0x30/0x50
    [ 4862.107943]  locks_remove_posix+0xb8/0x210
    [ 4862.107964]  ? rcu_read_lock_sched_held+0x89/0xa0
    [ 4862.107970]  ? kmem_cache_free+0x2c4/0x2f0
    [ 4862.107990]  filp_close+0x4e/0x70
    [ 4862.107999]  put_files_struct+0x75/0xe0
    [ 4862.108010]  exit_files+0x47/0x50
    [ 4862.108019]  do_exit+0x2fd/0xc80
    [ 4862.108027]  ? get_signal+0x317/0x8f0
    [ 4862.108038]  do_group_exit+0x50/0xd0
    [ 4862.108046]  get_signal+0x254/0x8f0
    [ 4862.108066]  do_signal+0x28/0x720
    [ 4862.108083]  ? _copy_to_user+0x5b/0x70
    [ 4862.108092]  ? poll_select_copy_remaining+0xd9/0x120
    [ 4862.108109]  exit_to_usermode_loop+0x80/0xc0
    [ 4862.108119]  syscall_return_slowpath+0xc8/0xd0
    [ 4862.108127]  entry_SYSCALL_64_fastpath+0xc0/0xc2
Updated by Zheng Yan over 6 years ago
    static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u64 seq)
    {
        struct ceph_msg *msg;
        struct ceph_mds_session_head *h;
        int i = -1;
        int metadata_bytes = 0;
        int metadata_key_count = 0;
        struct ceph_options *opt = mdsc->fsc->client->options;
        struct ceph_mount_options *fsopt = mdsc->fsc->mount_options;
        void *p;

        const char* metadata[][2] = {
            {"hostname", utsname()->nodename},
            {"kernel_version", utsname()->release},
            {"entity_id", opt->name ? : ""},
            {"root", fsopt->server_path ? : "/"},
            {NULL, NULL}
        };
The panic is caused by utsname() returning NULL while the process is exiting.
Updated by Jeff Layton over 6 years ago
Got it. I think we've hit problems like that in NFS, and what we had to do is save copies of the fields from utsname() that we'll need later (see rpc_clnt_set_nodename()). In this case, I think you want copies of nodename and release, maybe put them in the mdsc?
That said...once the MDS has evicted the client, we should just tear down any state that it holds on it (including locks). There really should be no reason to issue calls to the MDS to tear down state that we no longer hold, right?
In fact, note too that you have a signal pending here, so a call to wait_for_completion_killable_timeout is going to immediately return, most likely.
Updated by Patrick Donnelly over 6 years ago
- Status changed from New to Fix Under Review
Patch is on ceph-devel.
Updated by Zheng Yan over 6 years ago
With the kernel fixes, the test case still hangs at umount. http://qa-proxy.ceph.com/teuthology/zyan-2017-09-12_01:10:12-kcephfs-master-testing-basic-mira/1620665/teuthology.log
Updated by Patrick Donnelly over 6 years ago
- Status changed from Fix Under Review to Resolved
Updated by Patrick Donnelly over 6 years ago
- Has duplicate Bug #21468: kcephfs: hang during umount added
Updated by Patrick Donnelly over 6 years ago
- Status changed from Resolved to Pending Backport
- Backport set to luminous
Updated by Nathan Cutler over 6 years ago
- Copied to Backport #21473: luminous: test hang after mds evicts kclient added
Updated by Nathan Cutler over 6 years ago
- Status changed from Pending Backport to Resolved