Bug #2444

closed

null pointer dereference in ceph_d_prune inside kvm

Added by Josh Durgin almost 12 years ago. Updated over 11 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)

Description

From http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/6180:

Hello. I have been stress testing Ceph for some time now, with quite good results. I really like Ceph and will probably use it in some pre-production services.

However, I have seen some bugs.

One of them is instability when the kernel is running inside KVM, which leads to a very fast (and reproducible) kernel oops. On bare metal this particular oops doesn't happen.

The kernel oops itself involves ceph, but it could be a real bug in KVM too.

The host machine is running 3.2.2.
KVM is quite old (0.14).
The guest OS is Ubuntu 12.04 with its standard kernel. I retried with a custom 3.2 kernel and hit the same problem.

I mount Ceph with: mount -t ceph mon_adress:/ /mnt/temp

A simple recursive copy of /home leads to this kernel oops:

May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675559] BUG: unable to handle kernel NULL pointer dereference at   (null)
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675569] IP: [<f8379d8d>] ceph_d_prune+0x1d/0x30 [ceph]
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675579] *pde = 00000000
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675583] Oops: 0002 [#1] SMP
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675587] Modules linked in: ceph libceph libcrc32c zram(C) parport_pc rfcomm ppdev bnep lp bluetooth parport dm_crypt binfmt_misc psmouse mac_hid virtio_balloon serio_raw i2c_piix4 nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 floppy
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675605]
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675609] Pid: 27, comm: kswapd0 Tainted: G S      WC   3.2.0-24-generic #37-Ubuntu Bochs Bochs
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675614] EIP: 0060:[<f8379d8d>] EFLAGS: 00010282 CPU: 0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675618] EIP is at ceph_d_prune+0x1d/0x30 [ceph]
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675621] EAX: 00000000 EBX: ed311480 ECX: cdf35a4c EDX: cdf35a00
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675623] ESI: ed3114e0 EDI: c8de4ccc EBP: f3e4bdec ESP: f3e4bdec
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675625]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675628] Process kswapd0 (pid: 27, ti=f3e4a000 task=f3d45860 task.ti=f3e4a000)
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675630] Stack:
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675632] f3e4bdfc c114503e ed311480 cdf35a00 f3e4be28 c1146bdf c8de5764 ed31164c
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675638] cdf35a4c f3e4be44 ed3114cc ed17dbe0 f1b5ac00 f1b5ac80 eafb38e0 f3e4be58
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675644] c114763e eafb38cc 00000000 f3e4be3c f3e4be3c f3e4be3c c91a5e60 ed3114e0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675685] Call Trace:
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675769] [<c114503e>] dentry_lru_prune+0x6e/0x70
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675774] [<c1146bdf>] shrink_dentry_list+0x14f/0x270
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675777] [<c114763e>] prune_dcache_sb+0x10e/0x130
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675786] [<c113584a>] prune_super+0xfa/0x160
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675790] [<c10f6056>] shrink_slab+0x166/0x2e0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675793] [<c10f7c47>] ? shrink_zone+0x137/0x190
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675796] [<c10f8074>] balance_pgdat+0x3d4/0x540
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675800] [<c10f82d1>] kswapd+0xf1/0x1b0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675803] [<c10f81e0>] ? balance_pgdat+0x540/0x540
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675812] [<c1069b8d>] kthread+0x6d/0x80
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675815] [<c1069b20>] ? flush_kthread_worker+0x80/0x80
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675828] [<c157e37e>] kernel_thread_helper+0x6/0x10
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675830] Code: e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 8b 50 10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c <f0> 80 20 fd 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675861] EIP: [<f8379d8d>] ceph_d_prune+0x1d/0x30 [ceph] SS:ESP 0068:f3e4bdec
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675867] CR2: 0000000000000000
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675872] ---[ end trace a7919e7f17c0a727 ]---

Retried on another machine with KVM 1.0:

May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.962997] BUG: unable to handle kernel NULL pointer dereference at   (null)
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963015] IP: [<f882fd8d>] ceph_d_prune+0x1d/0x30 [ceph]
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963038] *pde = 7f686067
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963045] Oops: 0002 [#1] SMP
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963057] Modules linked in: ceph libceph libcrc32c nfs lockd fscache auth_rpcgss nfs_acl sunrpc zram(C) dm_crypt rfcomm bnep parport_pc bluetooth ppdev lp parport mac_hid binfmt_misc psmouse serio_raw virtio_balloon i2c_piix4 nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 floppy
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963084]
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963091] Pid: 27, comm: kswapd0 Tainted: G S      WC   3.2.0-24-generic #37-Ubuntu Bochs Bochs
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963099] EIP: 0060:[<f882fd8d>] EFLAGS: 00010282 CPU: 0
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963104] EIP is at ceph_d_prune+0x1d/0x30 [ceph]
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963106] EAX: 00000000 EBX: eb00cd00 ECX: eb26f4cc EDX: eb26f480
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963108] ESI: eb00cd60 EDI: eb00cd60 EBP: f3e6ddec ESP: f3e6ddec
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963110]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963113] Process kswapd0 (pid: 27, ti=f3e6c000 task=f3d45860 task.ti=f3e6c000)
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963114] Stack:
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963116] f3e6ddfc c114503e eb00cd00 eb00cd4c f3e6de28 c1146c77 eb29a1fc eb26f4cc
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963124] eb26f480 f3e6de44 eb00cd00 eb00cd60 ebdc3400 ebdc3480 e9908060 f3e6de58
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963129] c114763e e990804c 00000000 f3e6de3c f3e6de3c f3e6de3c eb26f1e0 eb00cd60
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963134] Call Trace:
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963164] [<c114503e>] dentry_lru_prune+0x6e/0x70
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963169] [<c1146c77>] shrink_dentry_list+0x1e7/0x270
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963172] [<c114763e>] prune_dcache_sb+0x10e/0x130
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963179] [<c113584a>] prune_super+0xfa/0x160
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963183] [<c10f6056>] shrink_slab+0x166/0x2e0
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963190] [<c10f7c47>] ? shrink_zone+0x137/0x190
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963193] [<c10f8074>] balance_pgdat+0x3d4/0x540
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963196] [<c10f82d1>] kswapd+0xf1/0x1b0
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963199] [<c10f81e0>] ? balance_pgdat+0x540/0x540
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963207] [<c1069b8d>] kthread+0x6d/0x80
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963210] [<c1069b20>] ? flush_kthread_worker+0x80/0x80
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963223] [<c157e37e>] kernel_thread_helper+0x6/0x10
May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.963225] Code: e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 8b 50 10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c <f0> 80 20 fd 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89
Actions #1

Updated by Christian Krafft almost 12 years ago

Hi,
Same bug here on native x86 and amd64 machines.
It affects Debian wheezy and Ubuntu 12.04 LTS.
I did not check the upstream kernel, though.

Actions #2

Updated by Alexandre Dupouy over 11 years ago

Same bug here with Ceph 0.49 on Ubuntu 12.04 LTS (GNU/Linux 3.2.0-27-generic x86_64).

Actions #3

Updated by Alexandre Dupouy over 11 years ago

The problem doesn't seem to be reproducible after upgrading to 3.5.0-9-generic (Ubuntu Quantal).

Actions #4

Updated by Sage Weil over 11 years ago

Can you try

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index e5b7731..858be3f 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1127,7 +1127,8 @@ static void ceph_d_prune(struct dentry *dentry)
         * cleared until d_release
         */
        di = ceph_dentry(dentry->d_parent);
-       clear_bit(CEPH_D_COMPLETE, &di->flags);
+       if (di)
+               clear_bit(CEPH_D_COMPLETE, &di->flags);
 }

 /*

and see if that resolves it? (That is probably not the best fix, but it'll tell us more about what is going on.)

Actions #5

Updated by Sage Weil over 11 years ago

  • Project changed from Linux kernel client to CephFS
  • Category deleted (fs/ceph)
Actions #6

Updated by Sage Weil over 11 years ago

  • Status changed from New to Can't reproduce