Bug #2092

closed

BUG at fs/ceph/caps.c:999

Added by Sage Weil about 12 years ago. Updated over 11 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)

Description

Feb 20 17:18:57 fcmsnode0 kernel: ------------[ cut here ]------------
Feb 20 17:18:57 fcmsnode0 kernel: kernel BUG at fs/ceph/caps.c:999!
Feb 20 17:18:57 fcmsnode0 kernel: invalid opcode: 0000 [#1] SMP 
Feb 20 17:18:57 fcmsnode0 kernel: CPU 11 
Feb 20 17:18:57 fcmsnode0 kernel: Modules linked in: xt_tcpudp ip6table_filter ip6_tables ipt_LOG xt_mac xt_physdev xt_multiport iptable_filter ip_tables kvm_amd vhost_net rbd zram(C) ebt_arp ebtable_filter ebtables x_tables kvm ceph libceph 8021q garp bridge stp xfs exportfs snd_pcm snd_timer snd soundcore snd_page_alloc amd64_edac_mod psmouse edac_core tpm_tis tpm i2c_piix4 edac_mce_amd tpm_bios joydev pcspkr serio_raw k10temp btrfs zlib_deflate crc32c libcrc32c usbhid hid ixgbe mdio mpt2sas igb ahci scsi_transport_sas e1000e dca libahci raid_class 3w_sas [last unloaded: scsi_wait_scan]
Feb 20 17:18:57 fcmsnode0 kernel:
Feb 20 17:18:57 fcmsnode0 kernel: Pid: 26647, comm: sudo Tainted: G         C   3.2.1 #1 Supermicro H8DG6/H8DGi/H8DG6/H8DGi
Feb 20 17:18:57 fcmsnode0 kernel: RIP: 0010:[<ffffffffa03bc366>]  [<ffffffffa03bc366>] __queue_cap_release+0x3d/0xe0 [ceph]
Feb 20 17:18:57 fcmsnode0 kernel: RSP: 0018:ffff881828525b08  EFLAGS: 00010246
Feb 20 17:18:57 fcmsnode0 kernel: RAX: 000000000000021c RBX: ffff8808b0b88000 RCX: 0000000000000001
Feb 20 17:18:57 fcmsnode0 kernel: RDX: 00000000021c021c RSI: 0000010000000008 RDI: ffff8808b0b884d0
Feb 20 17:18:57 fcmsnode0 kernel: RBP: ffff881828525b38 R08: 0000000000000001 R09: ffff8803b4e59310
Feb 20 17:18:57 fcmsnode0 kernel: R10: ffff8803b4e59348 R11: ffff881828525d18 R12: 0000010000000008
Feb 20 17:18:57 fcmsnode0 kernel: R13: 0000000000001bd0 R14: 0000000000000001 R15: ffff8819b837da40
Feb 20 17:18:57 fcmsnode0 kernel: FS:  00007f4e64227700(0000) GS:ffff881007d40000(0000) knlGS:0000000000000000
Feb 20 17:18:57 fcmsnode0 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 20 17:18:57 fcmsnode0 kernel: CR2: 00007f4e6422c000 CR3: 0000000f802b9000 CR4: 00000000000006e0
Feb 20 17:18:57 fcmsnode0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 20 17:18:57 fcmsnode0 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 20 17:18:57 fcmsnode0 kernel: Process sudo (pid: 26647, threadinfo ffff881828524000, task ffff88196706adc0)
Feb 20 17:18:57 fcmsnode0 kernel: Stack:
Feb 20 17:18:57 fcmsnode0 kernel: ffff8803b4e592b0 0000000000000001 ffff8802e4226d80 ffff8803b4e58fa8
Feb 20 17:18:57 fcmsnode0 kernel: ffff8802e4226d88 0000000000000001 ffff881828525b68 ffffffffa03bed84
Feb 20 17:18:57 fcmsnode0 kernel: ffff881828525b48 ffff8803b4e592b0 ffff8803b4e58fa8 ffff8803b4e593a8
Feb 20 17:18:57 fcmsnode0 kernel: Call Trace:
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffffa03bed84>] ceph_queue_caps_release+0x47/0x67 [ceph]
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffffa03b55ce>] ceph_destroy_inode+0x25/0x118 [ceph]
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff811420cf>] destroy_inode+0x40/0x5a
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff81142741>] evict+0x165/0x16e
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8114290e>] iput+0x1c4/0x1cd
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8113f9f2>] dentry_kill+0x161/0x181
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8114013b>] dput+0x101/0x10e
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff811387ac>] walk_component+0x299/0x3ca
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8113939c>] path_lookupat+0xb0/0x305
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8111d5a4>] ? kmem_cache_alloc+0x31/0xf6
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff81139619>] do_path_lookup+0x28/0xa4
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8113a2e0>] user_path_at_empty+0x55/0x94
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff810fd498>] ? tlb_finish_mmu+0x14/0x39
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff81101689>] ? unmap_region+0x102/0x114
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8113a330>] user_path_at+0x11/0x13
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8113165a>] vfs_fstatat+0x42/0x6d
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8113173a>] vfs_stat+0x1b/0x1d
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8113175b>] sys_newstat+0x1f/0x3d
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff810723ae>] ? put_cred+0x18/0x1a
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff8107268e>] ? commit_creds+0x168/0x172
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff81064162>] ? sys_setreuid+0xe5/0x105
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff81063d58>] ? sys_setregid+0xb8/0xd1
Feb 20 17:18:57 fcmsnode0 kernel: [<ffffffff81501e82>] system_call_fastpath+0x16/0x1b
Feb 20 17:18:57 fcmsnode0 kernel: Code: 66 90 48 89 fb 48 8d bf d0 04 00 00 44 89 45 d8 49 89 f4 49 89 d5 41 89 ce e8 91 ee 13 e1 83 bb f8 04 00 00 00 44 8b 45 d8 75 04 <0f> 0b eb fe 48 8b 83 00 05 00 00 48 8b 48 d8 48 8d 50 88 48 83 
Feb 20 17:18:57 fcmsnode0 kernel: RIP  [<ffffffffa03bc366>] __queue_cap_release+0x3d/0xe0 [ceph]
Feb 20 17:18:57 fcmsnode0 kernel: RSP <ffff881828525b08>
Feb 20 17:18:57 fcmsnode0 kernel: ---[ end trace 1c4d90c12a23789c ]---

full log at fatty:~sage/tmp/oliver/kern.log.osd0
#1

Updated by Alex Elder about 12 years ago

Just a quick look at this.

Here's the code:

static void __queue_cap_release(struct ceph_mds_session *session,
                                u64 ino, u64 cap_id, u32 migrate_seq,
                                u32 issue_seq)
{
        struct ceph_msg *msg;
        struct ceph_mds_cap_release *head;
        struct ceph_mds_cap_item *item;

        spin_lock(&session->s_cap_lock);
        BUG_ON(!session->s_num_cap_releases);    <--
        . . .

A quick review of where session->s_num_cap_releases gets modified:
- Initialized to zero in register_session()
- Increased by CEPH_CAPS_PER_RELEASE for every client cap release message
allocated in ceph_add_cap_releases().
- The number of such messages allocated is enough to ensure a release
is allocated for each capability issued by the session.
- In addition, a mount-option tunable "safety" cushion is added to
the number of releases desired.
- And finally, if the session already has some cap release messages,
the number of releases that will be allocated in addition to those
(to reach a "full" cap release message) is accounted for.
- Decreased by (CEPH_CAPS_PER_RELEASE - num), where num is the number
of release entries already present in that "partial" message, when the
partial message is queued to the s_cap_releases_done list.
==> I think this is OK--it's just accounting for the unused releases
represented by the partial message.
- Increased by the number of cap releases represented by each of the
entries in the s_cap_releases_done list in discard_cap_releases(),
when tossing out old state to prepare for reconnecting to an MDS.
The messages queued there are pulled from the done list and re-queued
onto the cap_releases list.
- Decremented (by 1) in __queue_cap_release(), to account for the
capability release entry that gets consumed in the cap release
message by this function.

Note that this last spot is just after the BUG, which fired
because there were no entries left to consume at the point this
function was called.

This probably means ceph_add_cap_releases() wasn't called in time
to ensure there was always at least one available.
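
For illustration, here is a minimal userspace sketch (not the kernel code) of the
reserve-then-consume pattern that the BUG_ON enforces. The names release_pool,
SLOTS_PER_MSG, reserve_releases() and queue_one_release() are hypothetical
stand-ins for s_num_cap_releases, CEPH_CAPS_PER_RELEASE,
ceph_add_cap_releases() and __queue_cap_release().

#include <assert.h>
#include <stdio.h>

#define SLOTS_PER_MSG 32        /* stand-in for CEPH_CAPS_PER_RELEASE */

static int release_pool;        /* stand-in for session->s_num_cap_releases */

/* Preallocate enough release slots to cover 'needed' capabilities. */
static void reserve_releases(int needed)
{
        while (release_pool < needed)
                release_pool += SLOTS_PER_MSG;  /* one more message's worth */
}

/* Consume one preallocated slot; mirrors the decrement after the BUG_ON. */
static void queue_one_release(void)
{
        assert(release_pool > 0);  /* plays the role of the BUG_ON above */
        release_pool--;
}

int main(void)
{
        reserve_releases(2);       /* the ceph_add_cap_releases() step */
        queue_one_release();
        queue_one_release();
        /* A third queue_one_release() without another reserve_releases()
         * call would trip the assert -- the situation the oops shows. */
        printf("slots left: %d\n", release_pool);
        return 0;
}

In this model the assert only fires when some path queues a release without
having gone through the reservation step first, which is the failure mode
suggested above.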

I note one more thing here: despite this comment above
ceph_add_cap_releases():

    * Called under s_mutex.

one of the three callers of that function (handle_reply())
explicitly drops the mutex (around mds_client.c:2203):

mutex_unlock(&mdsc->mutex);
ceph_add_cap_releases(mdsc, req->r_session);
mutex_unlock(&session->s_mutex);

Maybe this is the problem.

#2

Updated by Sage Weil about 12 years ago

mdsc->mutex protects the globalish mds client state (the request/session lists); that's a different lock from session->s_mutex, which is held until the next line.
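
for illustration only, the nesting being described looks roughly like the sketch
below. this is not the kernel code: pthread mutexes stand in for the kernel's
struct mutex, and handle_reply_sketch()/add_cap_releases() are hypothetical
stand-ins for handle_reply() and ceph_add_cap_releases().

#include <pthread.h>

static pthread_mutex_t mdsc_mutex = PTHREAD_MUTEX_INITIALIZER; /* mdsc->mutex */
static pthread_mutex_t s_mutex = PTHREAD_MUTEX_INITIALIZER;    /* session->s_mutex */

static void add_cap_releases(void)
{
        /* in this sketch, the caller is expected to already hold s_mutex */
}

static void handle_reply_sketch(void)
{
        pthread_mutex_lock(&s_mutex);      /* already held at this point */
        pthread_mutex_lock(&mdsc_mutex);   /* global mds-client state */

        /* ... work on the request/session lists ... */

        pthread_mutex_unlock(&mdsc_mutex); /* the unlock noted above */
        add_cap_releases();                /* still under s_mutex here */
        pthread_mutex_unlock(&s_mutex);    /* dropped only on the next line */
}

int main(void)
{
        handle_reply_sketch();
        return 0;
}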

there is one path i know about that doesn't properly do the reservation, triggered by IMPORT messages when metadata migrates between servers. however, that isn't what's happening here with only 1 mds active.

#3

Updated by Sage Weil over 11 years ago

  • Project changed from Linux kernel client to CephFS
  • Category deleted (fs/ceph)
#4

Updated by Sage Weil over 11 years ago

  • Status changed from New to Can't reproduce

commit:561cf283173360c39db19dc735da4a319be68ff6 fixes the multi-mds case. we haven't seen this again for single-mds, so marking can't reproduce!
