Bug #3088

closed

NULL pointer dereference at ceph_d_prune

Added by Matt Garner over 11 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When I mount a specific folder (via fstab):
10.10.140.210:6789:/pool-hs-san-1 /mnt/ceph1-pool-hs-san-1 ceph name=admin,noauto,rw,noexec,nodev,noatime,nodiratime 0 2

the mount is successful:

[112151.148435] libceph: loaded (mon/osd proto 15/24, osdmap 5/6 5/6)
[112151.170881] ceph: loaded (mds proto 32)

When I mount the cephfs root (via fstab):
10.10.140.210:6789:/ /mnt/ceph1-kernel ceph name=admin,noauto,rw,noexec,nodev,noatime,nodiratime 0 2

I get the following in dmesg:

[112151.173070] libceph: client0 fsid 7e9dfb5f-4af9-4f65-a2fe-47cc19766243
[112151.173902] libceph: mon0 10.10.140.210:6789 session established
[112186.650920] BUG: unable to handle kernel NULL pointer dereference at (null)
[112186.651826] IP: [<ffffffffa02590d2>] ceph_d_prune+0x22/0x30 [ceph]
[112186.652536] PGD 4010067 PUD 5237067 PMD 0
[112186.653010] Oops: 0002 [#1] SMP
[112186.653010] CPU 1
[112186.653010] Modules linked in: ceph libceph libcrc32c dcdbas bonding serio_raw xgifb(C) shpchp i3000_edac edac_core mac_hid lp parport xfs raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov e1000 tg3 raid6_pq async_tx raid1 raid0 multipath linear
[112186.653010]
[112186.653010] Pid: 11104, comm: umount Tainted: G C 3.2.0-23-generic #36-Ubuntu Dell Computer Corporation PowerEdge 850/0FJ365
[112186.653010] RIP: 0010:[<ffffffffa02590d2>] [<ffffffffa02590d2>] ceph_d_prune+0x22/0x30 [ceph]
[112186.653010] RSP: 0018:ffff880036e69e08 EFLAGS: 00010282
[112186.653010] RAX: 0000000000000000 RBX: ffff8800058bcc00 RCX: 0000000180150010
[112186.653010] RDX: ffff8800058bcca0 RSI: ffffea00012c1100 RDI: ffff8800058bcc00
[112186.653010] RBP: ffff880036e69e08 R08: 0000000000000001 R09: 0000000000000000
[112186.653010] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800058bcc80
[112186.653010] R13: 0000000001f113c0 R14: 0000000000000000 R15: 0000000000000000
[112186.653010] FS: 00007fb1b7979800(0000) GS:ffff88010fd00000(0000) knlGS:0000000000000000
[112186.653010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112186.653010] CR2: 0000000000000000 CR3: 0000000003c25000 CR4: 00000000000006e0
[112186.653010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[112186.653010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[112186.653010] Process umount (pid: 11104, threadinfo ffff880036e68000, task ffff8800051ec4d0)
[112186.653010] Stack:
[112186.653010] ffff880036e69e28 ffffffff8118c717 ffff8800058bcc00 ffffffffa026f0e0
[112186.653010] ffff880036e69e58 ffffffff8118d6ea ffff880036e69e18 ffff880036e69e18
[112186.653010] ffff880104979800 ffffffffa026f0e0 ffff880036e69e78 ffffffff8118f6a9
[112186.653010] Call Trace:
[112186.653010] [<ffffffff8118c717>] dentry_lru_prune+0x97/0xa0
[112186.653010] [<ffffffff8118d6ea>] shrink_dcache_for_umount_subtree+0x7a/0x1e0
[112186.653010] [<ffffffff8118f6a9>] shrink_dcache_for_umount+0x49/0x60
[112186.735879] [<ffffffff81179bbc>] generic_shutdown_super+0x2c/0xe0
[112186.735879] [<ffffffff81179d06>] kill_anon_super+0x16/0x30
[112186.735879] [<ffffffffa02550c0>] ceph_kill_sb+0x30/0x50 [ceph]
[112186.802472] [<ffffffff8117a34c>] deactivate_locked_super+0x3c/0xa0
[112186.802472] [<ffffffff8117abbe>] deactivate_super+0x4e/0x70
[112186.832171] [<ffffffff811971dd>] mntput_no_expire+0x9d/0xf0
[112186.832171] [<ffffffff811984fb>] sys_umount+0x5b/0xd0
[112186.832171] [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b
[112186.832171] Code: 66 90 b8 01 00 00 00 5d c3 55 48 89 e5 66 66 66 66 90 48 8b 47 18 48 85 c0 74 14 48 39 c7 74 0f 48 83 7f 10 00 74 08 48 8b 40 78 <f0> 80 20 fd 5d c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[112186.832171] RIP [<ffffffffa02590d2>] ceph_d_prune+0x22/0x30 [ceph]
[112186.832171] RSP <ffff880036e69e08>
[112186.832171] CR2: 0000000000000000
[112187.047029] ---[ end trace 4e4a2b237a888ee4 ]---

Running in a production environment with:
5 mons
1 mds
4 osd

All machines are vanilla Ubuntu Server 12.04
Linux rmi-orem-ceph1-mds1.readymicro.local 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Ceph from packages:
ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)

The MDS machine exports the cephfs (mounted via the kernel driver) over Samba (smbd version 3.6.3).

Once this message appears, all access to the cephfs via the kernel driver blocks indefinitely.

Please suggest what other information and logs will be helpful.

This is on production equipment; however, I'm in the process of configuring a similar setup in my lab to try to reproduce the issue where I can enable more logging.


Related issues 1 (0 open, 1 closed)

Has duplicate: CephFS - Bug #3640: kclient: hang and kernel panic (Duplicate, 12/18/2012)

Actions #1

Updated by Matt Garner over 11 years ago

Probably a duplicate of BUG #2444.

Actions #2

Updated by Matt Garner over 11 years ago

Matt Garner wrote:

Probably a duplicate of BUG #2444.

  • Possibly
Actions #3

Updated by Sage Weil over 11 years ago

Can you try

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index e5b7731..858be3f 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1127,7 +1127,8 @@ static void ceph_d_prune(struct dentry *dentry)
         * cleared until d_release
         */
        di = ceph_dentry(dentry->d_parent);
-       clear_bit(CEPH_D_COMPLETE, &di->flags);
+       if (di)
+               clear_bit(CEPH_D_COMPLETE, &di->flags);
 }

 /*

and see if that resolves it? (That is probably not the best fix, but it'll tell us more about what is going on.)

Actions #4

Updated by Sage Weil over 11 years ago

  • Status changed from New to Need More Info
Actions #5

Updated by Sage Weil about 11 years ago

  • Project changed from Linux kernel client to CephFS
  • Category deleted (libceph)

This code may be gone now with Yan's d_prune changes...

Actions #6

Updated by Zheng Yan over 10 years ago

  • Status changed from Need More Info to Resolved