Bug #4722

closed

kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000

Added by Matthew Roy about 11 years ago. Updated almost 8 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Top of Call trace:

ceph_queue_caps_release
ceph_destroy_inode
evict
iput
d_kill
shrink_dentry_list
prune_dcache_sb
prune_super
shrink_slab

This roughly coincided with some MDS/mon crashes. The crash occurred on Ubuntu 12.10 (3.5 series kernel), but the line in question appears to have last been modified in 2010, so the problem might still be present in newer kernels.

Unfortunately none of this made it into the log files. Screenshots are attached and are also posted at https://plus.google.com/102493641635816029247/posts/SpiK8kPAyuX


Files

IMG_20130413_105003.jpg (2.01 MB) Matthew Roy, 04/13/2013 09:46 AM
IMG_20130413_104944.jpg (2.26 MB) Matthew Roy, 04/13/2013 09:46 AM
IMG_20130413_104936.jpg (1.91 MB) Matthew Roy, 04/13/2013 09:46 AM
IMG_20130413_104956.jpg (2.25 MB) Matthew Roy, 04/13/2013 09:46 AM
Actions #1

Updated by Greg Farnum about 11 years ago

I did a checkout of v3.5, and caps.c:1006 is

BUG_ON(msg->front.iov_len + sizeof(*item) > PAGE_CACHE_SIZE);

Alex tells me this matches up with the invalid opcode (probably; he's not sure the code is the right one), so I guess that's good? He's not sure whether it matters that the message front is larger than a page (I have no idea, kernel-side), but this happened following some MDS trouble so it makes sense that the client could be instructed to drop >4KB worth of encoded caps.

Thanks for the report!
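
For readers who want to see what the check at caps.c:1006 guards against, here is a minimal userspace sketch of the same arithmetic. It is not the kernel code: the 4096-byte page size, the `sketch_` names, and the layout of `struct sketch_cap_item` are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; the comment above talks about ">4KB". */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for one encoded cap release entry (24 bytes here). */
struct sketch_cap_item {
    uint64_t ino;
    uint64_t cap_id;
    uint32_t migrate_seq;
    uint32_t seq;
};

/*
 * Mirrors the shape of the BUG_ON condition: returns true when appending
 * one more item to a message front of front_len bytes would push the
 * front past a single page.
 */
static bool sketch_would_bug(size_t front_len)
{
    return front_len + sizeof(struct sketch_cap_item) > SKETCH_PAGE_SIZE;
}
```

Under these assumptions, a front that already holds 4072 bytes can still take one more 24-byte item, while a front of 4073 bytes or more would trip the check.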

Actions #2

Updated by Greg Farnum about 10 years ago

  • Category set to 53
  • Priority changed from Normal to High

Sounds like this might require some protocol work and it's in the kernel client — high!

Actions #3

Updated by Zheng Yan about 10 years ago

Who cares about the 3.5 kernel?

Actions #4

Updated by Greg Farnum about 10 years ago

Unless this part has been fixed by a newer kernel, we still need to deal with it. In particular we were concerned that this BUG_ON might be because we always pre-allocate the space needed for a message reply, but with this message the "front" portion can sometimes be bigger than the allocated space.

Actions #5

Updated by Zheng Yan about 10 years ago

  • Status changed from New to Can't reproduce

It's more likely that there is no pre-allocated message. The variable 'msg' is pointing to the pre-allocated message list.

Actions #6

Updated by Greg Farnum about 10 years ago

You think the msg pointer is invalid, and so it's overflowing?
I'm a little concerned at just closing this unless we can guarantee the size of a cap drop message is effectively limited by something, which I don't believe we've changed recently.

Actions #7

Updated by Zheng Yan about 10 years ago

__queue_cap_release has code which limits the size of the cap release message.
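
As an illustration of how a per-message limit of this kind can work, the sketch below caps the number of items per message so that the header plus items never exceed one page, and splits larger batches across several messages. This is not the body of __queue_cap_release: the `sketch_` names, the 4-byte header, the 24-byte item, and the 4096-byte page are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical message header: just a count of the items that follow. */
struct sketch_release_head {
    uint32_t num;
};

/* Hypothetical encoded cap release entry (24 bytes here). */
struct sketch_cap_item {
    uint64_t ino;
    uint64_t cap_id;
    uint32_t migrate_seq;
    uint32_t seq;
};

/* Largest number of items that fits in one page after the header. */
static size_t sketch_items_per_msg(void)
{
    return (SKETCH_PAGE_SIZE - sizeof(struct sketch_release_head))
           / sizeof(struct sketch_cap_item);
}

/* Number of messages needed to release ncaps caps under that limit. */
static size_t sketch_msgs_needed(size_t ncaps)
{
    size_t per = sketch_items_per_msg();
    return (ncaps + per - 1) / per;   /* round up */
}
```

With these assumed sizes a message holds at most 170 items, so a sender that starts a new message once the current one is full never reaches the oversized front that the BUG_ON checks for.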

Actions #8

Updated by Greg Farnum almost 8 years ago

  • Component(FS) kceph added