Bug #2224
Oops in __cfh_to_dentry
Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
I set up an HA pair of NFS servers that re-export Ceph to NFS clients.
The pair runs in active/standby mode, using Heartbeat with a virtual IP.
Only the active node acquires the virtual IP, mounts Ceph, and runs the NFS server.
I got the following oops when I tested the failover scenario:
1. On one NFS client, mount the NFS export at /mnt/nfs and run "ls -l /mnt/nfs" to list the existing files.
2. Reboot the active NFS server.
3. On the NFS client, run "ls -l /mnt/nfs" again.
The oops happens when the standby node becomes active and runs the NFS server. It is always reproducible.
Mar 23 11:02:16 ecctsrv11 kernel: [37542.584997] BUG: unable to handle kernel NULL pointer dereference at (null)
Mar 23 11:02:16 ecctsrv11 kernel: [37542.587453] IP: [<ffffffff812dfb72>] strlen+0x2/0x20
Mar 23 11:02:16 ecctsrv11 kernel: [37542.588997] PGD 0
Mar 23 11:02:16 ecctsrv11 kernel: [37542.589617] Oops: 0000 [#1] SMP
Mar 23 11:02:16 ecctsrv11 kernel: [37542.590641] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
Mar 23 11:02:16 ecctsrv11 kernel: [37542.593043] CPU 2
Mar 23 11:02:16 ecctsrv11 kernel: [37542.593635] Modules linked in: ceph libceph nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc vesafb dcdbas psmouse serio_raw ghes hed joydev i7core_edac edac_core lp parport mptsas mptscsih igb mptbase e1000e usbhid hid dca btrfs scsi_transport_sas zlib_deflate libcrc32c
Mar 23 11:02:16 ecctsrv11 kernel: [37542.602240]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.602673] Pid: 27635, comm: nfsd Not tainted 2.6.38-13-server #52 Dell DCS CS24-SC /345678
Mar 23 11:02:16 ecctsrv11 kernel: [37542.606578] RIP: 0010:[<ffffffff812dfb72>] [<ffffffff812dfb72>] strlen+0x2/0x20
Mar 23 11:02:16 ecctsrv11 kernel: [37542.608869] RSP: 0018:ffff880bd7937958 EFLAGS: 00010246
Mar 23 11:02:16 ecctsrv11 kernel: [37542.610487] RAX: 0000000000000000 RBX: ffff880bd7937a04 RCX: 000001000000001e
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612673] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612676] RBP: ffff880bd7937990 R08: ffff880bd79379f8 R09: ffff880bd7937a0c
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612678] R10: 00000000ea46b424 R11: ffff880bd5ba89c0 R12: ffff880bd7d62400
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612680] R13: ffff880bd79379e8 R14: 0000000000000000 R15: ffff880bd7d62428
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612683] FS: 0000000000000000(0000) GS:ffff8800bf440000(0000) knlGS:0000000000000000
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612686] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612688] CR2: 0000000000000000 CR3: 0000000001a03000 CR4: 00000000000006e0
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612690] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612692] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612695] Process nfsd (pid: 27635, threadinfo ffff880bd7936000, task ffff880bd78b16e0)
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612697] Stack:
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612699] ffffffffa0347e28 ffff880bd7937a0c ffffffff813a7664 0000000000000000
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612702] ffff880bd63fe000 ffff880bd7d62400 0000000000000000 ffff880bd7937a40
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612705] ffffffffa0347ed9 ffff880bd79379e8 ffff880bd7937a04 0000000000000000
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612708] Call Trace:
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612719] [<ffffffffa0347e28>] ? set_request_path_attr+0x148/0x170 [ceph]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612725] [<ffffffff813a7664>] ? extract_entropy+0x94/0xd0
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612732] [<ffffffffa0347ed9>] create_request_message.clone.25+0x89/0x4f0 [ceph]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612736] [<ffffffff813a7790>] ? get_random_bytes+0x20/0x30
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612742] [<ffffffffa034ae96>] ? ceph_mdsmap_get_random_mds+0x76/0xc0 [ceph]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612749] [<ffffffffa0348432>] __prepare_send_request+0xf2/0x1a0 [ceph]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612755] [<ffffffffa034869b>] __do_request+0x1bb/0x260 [ceph]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612761] [<ffffffffa034669c>] ? __register_request+0xac/0x170 [ceph]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612768] [<ffffffffa0349e32>] ceph_mdsc_do_request+0xa2/0x1b0 [ceph]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612774] [<ffffffffa033d1fe>] __cfh_to_dentry+0x13e/0x1c0 [ceph]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612780] [<ffffffffa033d293>] ceph_fh_to_dentry+0x13/0x30 [ceph]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612784] [<ffffffffa013163c>] exportfs_decode_fh+0x5c/0x2b0 [exportfs]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612793] [<ffffffffa02e5750>] ? nfsd_acceptable+0x0/0x120 [nfsd]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612810] [<ffffffffa026aaec>] ? cache_check+0x6c/0x250 [sunrpc]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612818] [<ffffffffa02eb353>] ? exp_find_key+0x63/0xb0 [nfsd]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612823] [<ffffffff81153dcf>] ? kmem_cache_alloc_trace+0xef/0x110
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612828] [<ffffffff812a624d>] ? aa_dup_task_context+0x3d/0x70
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612832] [<ffffffff812ab6e0>] ? apparmor_cred_prepare+0x40/0x60
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612839] [<ffffffffa02e5aa2>] nfsd_set_fh_dentry+0x232/0x3a0 [nfsd]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612844] [<ffffffff81091d90>] ? getboottime+0x30/0x40
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612850] [<ffffffffa02e63e3>] fh_verify+0x1b3/0x270 [nfsd]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612858] [<ffffffffa02f166c>] nfsd3_proc_getattr+0x6c/0xe0 [nfsd]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612864] [<ffffffffa02e29ee>] nfsd_dispatch+0xfe/0x240 [nfsd]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612876] [<ffffffffa025ffe5>] svc_process_common+0x345/0x690 [sunrpc]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612881] [<ffffffff8105f3d0>] ? default_wake_function+0x0/0x20
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612892] [<ffffffffa0260436>] svc_process+0x106/0x160 [sunrpc]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612897] [<ffffffffa02e2102>] nfsd+0xc2/0x160 [nfsd]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612902] [<ffffffffa02e2040>] ? nfsd+0x0/0x160 [nfsd]
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612907] [<ffffffff81086f86>] kthread+0x96/0xa0
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612913] [<ffffffff8100cde4>] kernel_thread_helper+0x4/0x10
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612916] [<ffffffff81086ef0>] ? kthread+0x0/0xa0
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612919] [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612921] Code: 11 eb 1b 66 0f 1f 44 00 00 48 83 ea 01 48 39 d0 77 0c 0f b6 0a f6 81 00 73 63 81 20 75 eb c6 42 01 00 c9 c3 0f 1f 44 00 00 31 c0 <80> 3f 00 55 48 89 e5 74 11 48 89 f8 66 90 48 83 c0 01 80 38 00
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612942] RIP [<ffffffff812dfb72>] strlen+0x2/0x20
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612945] RSP <ffff880bd7937958>
Mar 23 11:02:16 ecctsrv11 kernel: [37542.612946] CR2: 0000000000000000
Mar 23 11:02:16 ecctsrv11 kernel: [37542.742497] ---[ end trace 437e9018f75dd144 ]---
I tried to fix it with the patch below. However, with the patch applied, the kernel hit a page fault in handle_reply when it received the reply from the MDS. (See the attached video.)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index f8ba653..88aecd6 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1601,11 +1601,13 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
 		r = build_dentry_path(rdentry, ppath, pathlen, ino, freepath);
 		dout(" dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen, *ppath);
-	} else if (rpath || rino) {
+	} else {
 		*ino = rino;
-		*ppath = rpath;
-		*pathlen = strlen(rpath);
-		dout(" path %.*s\n", *pathlen, rpath);
+		if (rpath) {
+			*ppath = rpath;
+			*pathlen = strlen(rpath);
+			dout(" path %.*s\n", *pathlen, rpath);
+		}
 	}
 	return r;
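To make the failure mode in the trace easier to see, here is a minimal user-space sketch of the branch the patch touches. It is not the kernel code: the function names `original_branch_would_call_strlen_on_null` and `path_attr_guarded` are hypothetical, and it assumes (as the trace and the patch suggest, but the report does not state outright) that the file-handle lookup reaches this branch with an inode number set and no path string. It also assumes the caller pre-initializes the output path and length, which the guarded variant leaves untouched when there is no path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of the original branch: it is entered when either rpath or rino
 * is set, but its body calls strlen(rpath) unconditionally.
 * Returns 1 when that combination would pass NULL to strlen().
 * (Hypothetical helper, for illustration only.)
 */
int original_branch_would_call_strlen_on_null(const char *rpath, uint64_t rino)
{
    int enters_branch = (rpath != NULL) || (rino != 0);
    return enters_branch && rpath == NULL;
}

/*
 * Model of the patched branch: always record the inode number, and only
 * touch the path outputs when a path string is actually present.
 * Assumption: the caller initialized *ppath and *pathlen beforehand.
 */
void path_attr_guarded(const char *rpath, uint64_t rino,
                       const char **ppath, int *pathlen, uint64_t *ino)
{
    assert(ppath && pathlen && ino);
    *ino = rino;
    if (rpath) {
        *ppath = rpath;
        *pathlen = (int)strlen(rpath);
    }
}
```

Calling the guarded variant with a NULL path and a nonzero inode number leaves the path outputs at their initial values and only sets the inode number. This sketch covers only the NULL dereference in strlen; it says nothing about the later page fault in handle_reply.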
History
#1 Updated by Sage Weil about 10 years ago
- Status changed from New to Rejected