Bug #2444
closed: null pointer dereference in ceph_d_prune inside kvm
Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Community (user)
Description
From http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/6180:
Hello.

I have been stress testing ceph for some time now, with quite good results. I really like ceph and will probably use it in some pre-production services.

Anyway, I've seen some bugs. One of them is instability when the kernel is running inside KVM, leading to a very fast (and reproducible) kernel oops. On bare metal this particular oops doesn't happen. The kernel oops itself involves ceph, but it could be a real bug in kvm too.

The host machine is running 3.2.2, kvm is quite ancient (0.14), and the guest OS is ubuntu 12.04 with its standard kernel. I retried with a custom 3.2 kernel and hit the same problem.

I'm mounting ceph with:

mount -t ceph mon_adress:/ /mnt/temp

A simple recursive copy of /home leads to this kernel oops:

May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675559] BUG: unable to handle kernel NULL pointer dereference at (null)
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675569] IP: [<f8379d8d>] ceph_d_prune+0x1d/0x30 [ceph]
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675579] *pde = 00000000
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675583] Oops: 0002 [#1] SMP
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675587] Modules linked in: ceph libceph libcrc32c zram(C) parport_pc rfcomm ppdev bnep lp bluetooth parport dm_crypt binfmt_misc psmouse mac_hid virtio_balloon serio_raw i2c_piix4 nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 floppy
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675605]
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675609] Pid: 27, comm: kswapd0 Tainted: G S WC 3.2.0-24-generic #37-Ubuntu Bochs Bochs
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675614] EIP: 0060:[<f8379d8d>] EFLAGS: 00010282 CPU: 0
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675618] EIP is at ceph_d_prune+0x1d/0x30 [ceph]
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675621] EAX: 00000000 EBX: ed311480 ECX: cdf35a4c EDX: cdf35a00
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675623] ESI: ed3114e0 EDI: c8de4ccc EBP: f3e4bdec ESP: f3e4bdec
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675625] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675628] Process kswapd0 (pid: 27, ti=f3e4a000 task=f3d45860 task.ti=f3e4a000)
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675630] Stack:
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675632] f3e4bdfc c114503e ed311480 cdf35a00 f3e4be28 c1146bdf c8de5764 ed31164c
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675638] cdf35a4c f3e4be44 ed3114cc ed17dbe0 f1b5ac00 f1b5ac80 eafb38e0 f3e4be58
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675644] c114763e eafb38cc 00000000 f3e4be3c f3e4be3c f3e4be3c c91a5e60 ed3114e0
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675685] Call Trace:
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675769] [<c114503e>] dentry_lru_prune+0x6e/0x70
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675774] [<c1146bdf>] shrink_dentry_list+0x14f/0x270
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675777] [<c114763e>] prune_dcache_sb+0x10e/0x130
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675786] [<c113584a>] prune_super+0xfa/0x160
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675790] [<c10f6056>] shrink_slab+0x166/0x2e0
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675793] [<c10f7c47>] ? shrink_zone+0x137/0x190
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675796] [<c10f8074>] balance_pgdat+0x3d4/0x540
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675800] [<c10f82d1>] kswapd+0xf1/0x1b0
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675803] [<c10f81e0>] ? balance_pgdat+0x540/0x540
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675812] [<c1069b8d>] kthread+0x6d/0x80
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675815] [<c1069b20>] ? flush_kthread_worker+0x80/0x80
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675828] [<c157e37e>] kernel_thread_helper+0x6/0x10
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675830] Code: e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 8b 50 10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c <f0> 80 20 fd 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675861] EIP: [<f8379d8d>] ceph_d_prune+0x1d/0x30 [ceph] SS:ESP 0068:f3e4bdec
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675867] CR2: 0000000000000000
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675872] ---[ end trace a7919e7f17c0a727 ]---

Retried on another machine with kvm 1.0:

May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.962997] BUG: unable to handle kernel NULL pointer dereference at (null)
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963015] IP: [<f882fd8d>] ceph_d_prune+0x1d/0x30 [ceph]
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963038] *pde = 7f686067
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963045] Oops: 0002 [#1] SMP
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963057] Modules linked in: ceph libceph libcrc32c nfs lockd fscache auth_rpcgss nfs_acl sunrpc zram(C) dm_crypt rfcomm bnep parport_pc bluetooth ppdev lp parport mac_hid binfmt_misc psmouse serio_raw virtio_balloon i2c_piix4 nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 floppy
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963084]
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963091] Pid: 27, comm: kswapd0 Tainted: G S WC 3.2.0-24-generic #37-Ubuntu Bochs Bochs
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963099] EIP: 0060:[<f882fd8d>] EFLAGS: 00010282 CPU: 0
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963104] EIP is at ceph_d_prune+0x1d/0x30 [ceph]
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963106] EAX: 00000000 EBX: eb00cd00 ECX: eb26f4cc EDX: eb26f480
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963108] ESI: eb00cd60 EDI: eb00cd60 EBP: f3e6ddec ESP: f3e6ddec
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963110] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963113] Process kswapd0 (pid: 27, ti=f3e6c000 task=f3d45860 task.ti=f3e6c000)
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963114] Stack:
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963116] f3e6ddfc c114503e eb00cd00 eb00cd4c f3e6de28 c1146c77 eb29a1fc eb26f4cc
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963124] eb26f480 f3e6de44 eb00cd00 eb00cd60 ebdc3400 ebdc3480 e9908060 f3e6de58
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963129] c114763e e990804c 00000000 f3e6de3c f3e6de3c f3e6de3c eb26f1e0 eb00cd60
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963134] Call Trace:
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963164] [<c114503e>] dentry_lru_prune+0x6e/0x70
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963169] [<c1146c77>] shrink_dentry_list+0x1e7/0x270
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963172] [<c114763e>] prune_dcache_sb+0x10e/0x130
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963179] [<c113584a>] prune_super+0xfa/0x160
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963183] [<c10f6056>] shrink_slab+0x166/0x2e0
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963190] [<c10f7c47>] ? shrink_zone+0x137/0x190
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963193] [<c10f8074>] balance_pgdat+0x3d4/0x540
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963196] [<c10f82d1>] kswapd+0xf1/0x1b0
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963199] [<c10f81e0>] ? balance_pgdat+0x540/0x540
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963207] [<c1069b8d>] kthread+0x6d/0x80
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963210] [<c1069b20>] ? flush_kthread_worker+0x80/0x80
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963223] [<c157e37e>] kernel_thread_helper+0x6/0x10
May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.963225] Code: e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 8b 50 10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c <f0> 80 20 fd 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89
Updated by Christian Krafft almost 12 years ago
hi,
same bug here on native x86 and amd64 machines.
It affects debian wheezy and ubuntu 12.04 LTS.
I did not check upstream kernel though.
Updated by Alexandre Dupouy over 11 years ago
same bug here with Ceph 0.49 on Ubuntu 12.04 LTS (GNU/Linux 3.2.0-27-generic x86_64)
Updated by Alexandre Dupouy over 11 years ago
The problem doesn't seem to be reproducible after upgrading to 3.5.0-9-generic (Ubuntu Quantal).
Updated by Sage Weil over 11 years ago
Can you try
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index e5b7731..858be3f 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1127,7 +1127,8 @@ static void ceph_d_prune(struct dentry *dentry)
 	 * cleared until d_release
 	 */
 	di = ceph_dentry(dentry->d_parent);
-	clear_bit(CEPH_D_COMPLETE, &di->flags);
+	if (di)
+		clear_bit(CEPH_D_COMPLETE, &di->flags);
 }
 
 /*
and see if that resolves it? (That is probably not the best fix, but it'll tell us more about what is going on.)
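For anyone following along, here is a minimal userspace sketch of what that guard does, assuming simplified stand-in structures rather than the real kernel definitions. The bytes `<f0> 80 20 fd` in the Code line look like a locked byte-wise AND with mask 0xfd through EAX, which is zero in both oopses; that would fit clear_bit() being applied to a NULL di. The names `prune_guarded` and the `hashed` field are hypothetical, and the bit index 1 for CEPH_D_COMPLETE is inferred from that mask, not taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: bit index inferred from the 0xfd mask seen in the oops. */
#define CEPH_D_COMPLETE 1

/* Hypothetical stand-ins; NOT the real kernel structures. */
struct ceph_dentry_info {
	unsigned long flags;
};

struct dentry {
	struct dentry *d_parent;
	int hashed;      /* stand-in for !d_unhashed(dentry) */
	void *d_fsdata;  /* per-dentry ceph info, may be NULL */
};

struct ceph_dentry_info *ceph_dentry(struct dentry *dentry)
{
	return (struct ceph_dentry_info *)dentry->d_fsdata;
}

/*
 * Model of ceph_d_prune() with the proposed NULL check.
 * Returns 1 if the parent's COMPLETE bit was cleared, 0 if skipped.
 */
int prune_guarded(struct dentry *dentry)
{
	struct ceph_dentry_info *di;

	/* no valid parent, or the dentry is its own parent (root) */
	if (!dentry->d_parent || dentry->d_parent == dentry)
		return 0;

	/* unhashed dentries do not affect the parent's COMPLETE flag */
	if (!dentry->hashed)
		return 0;

	di = ceph_dentry(dentry->d_parent);
	if (!di)	/* the guard added by the patch */
		return 0;

	/* non-atomic stand-in for the kernel's clear_bit() */
	di->flags &= ~(1UL << CEPH_D_COMPLETE);
	return 1;
}
```

Calling `prune_guarded()` on a child whose parent has no `d_fsdata` returns 0 instead of writing through a NULL pointer, which is the case the oops appears to hit. It does not explain why the parent's `d_fsdata` is NULL in the first place.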
Updated by Sage Weil over 11 years ago
- Project changed from Linux kernel client to CephFS
- Category deleted (fs/ceph)
Updated by Sage Weil over 11 years ago
- Status changed from New to Can't reproduce