Bug #4722
closed
kernel BUG at fs/ceph/caps.c:1006 invalid opcode: 0000
Description
Top of Call trace:
ceph_queue_caps_release ceph_destroy_inode evict iput d_kill shrink_dentry_list prune_dcache_sb prune_super shrink_slab
This roughly coincided with some MDS/mon crashes. The crash occurred on Ubuntu 12.10 (3.5-series kernel), but this line looks like it was last modified in 2010, so the problem might still be present.
Unfortunately, none of this made it into the log files; screenshots are attached and also at https://plus.google.com/102493641635816029247/posts/SpiK8kPAyuX
Files
Updated by Greg Farnum about 11 years ago
I did a checkout of v3.5, and caps.c:1006 is
BUG_ON(msg->front.iov_len + sizeof(*item) > PAGE_CACHE_SIZE);
Alex tells me this matches up with the invalid opcode (probably; he's not sure the code is the right one), so I guess that's good? He's not sure whether it matters that the message front is larger than a page (I have no idea, kernel-side), but this happened following some MDS trouble so it makes sense that the client could be instructed to drop >4KB worth of encoded caps.
Thanks for the report!
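To make the size check in that BUG_ON concrete, here is a minimal userspace sketch of the same comparison. It is not the kernel code: the struct layouts (a 4-byte count header followed by 24-byte items), the 4 KiB page size, and all `sketch_*` / helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, matching the ">4KB" figure in the comment. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for the message header: a count of items. */
struct sketch_release_head {
    uint32_t num;
};

/* Hypothetical stand-in for one released cap entry. */
struct sketch_cap_item {
    uint64_t ino;
    uint64_t cap_id;
    uint32_t migrate_seq;
    uint32_t seq;
};

/* The arithmetic below relies on these sizes. */
static_assert(sizeof(struct sketch_release_head) == 4, "head is 4 bytes");
static_assert(sizeof(struct sketch_cap_item) == 24, "item is 24 bytes");

/* Same comparison as the quoted BUG_ON: returns 1 if appending one
 * more item to a front of front_len bytes would exceed one page. */
static int append_would_overflow(size_t front_len)
{
    return front_len + sizeof(struct sketch_cap_item) > SKETCH_PAGE_SIZE;
}

/* Number of whole items that fit in one page after the header. */
static size_t items_per_page(void)
{
    return (SKETCH_PAGE_SIZE - sizeof(struct sketch_release_head))
           / sizeof(struct sketch_cap_item);
}
```

Under these assumed sizes, 170 items fit in a page (4 + 170 * 24 = 4084 bytes), so the check would only fire when something tries to append a 171st item to the same message.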
Updated by Greg Farnum about 10 years ago
- Category set to 53
- Priority changed from Normal to High
Sounds like this might require some protocol work, and it's in the kernel client, so: high!
Updated by Greg Farnum about 10 years ago
Unless this part has been fixed by a newer kernel, we still need to deal with it. In particular we were concerned that this BUG_ON might be because we always pre-allocate the space needed for a message reply, but with this message the "front" portion can sometimes be bigger than the allocated space.
Updated by Zheng Yan about 10 years ago
- Status changed from New to Can't reproduce
It's more likely that there was no pre-allocated message, in which case the variable 'msg' points at the pre-allocated message list itself rather than at a real message.
Updated by Greg Farnum about 10 years ago
You think the msg pointer is invalid, and so it's overflowing?
I'm a little concerned about just closing this unless we can guarantee that the size of a cap drop message is effectively limited by something, which I don't believe we've changed recently.
Updated by Zheng Yan about 10 years ago
__queue_cap_release has code that limits the size of a cap release message.
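A rough userspace sketch of how such a limit can work is below. It is not the actual __queue_cap_release: it assumes the limit is a per-message item count derived from the page size, and that a full message is retired so the next append goes to the next pre-allocated one. The sizes and every `sketch_*` name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes: 4 KiB page, 4-byte header, 24-byte item. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_HEAD_SIZE 4u
#define SKETCH_ITEM_SIZE 24u
#define SKETCH_CAPS_PER_RELEASE \
    ((SKETCH_PAGE_SIZE - SKETCH_HEAD_SIZE) / SKETCH_ITEM_SIZE)

struct sketch_msg {
    size_t front_len;   /* bytes used in the message front */
    unsigned num;       /* items queued so far */
};

struct sketch_session {
    struct sketch_msg *msgs;  /* pre-allocated release messages */
    size_t nr_msgs;           /* how many were pre-allocated */
    size_t cur;               /* index of the message being filled */
};

/* Queue one cap release. Returns 0 on success, or -1 if no
 * pre-allocated message is left (the situation suspected above). */
static int sketch_queue_release(struct sketch_session *s)
{
    struct sketch_msg *m;

    if (s->cur >= s->nr_msgs)
        return -1;

    m = &s->msgs[s->cur];
    if (m->front_len == 0)
        m->front_len = SKETCH_HEAD_SIZE;  /* account for the header once */

    /* Stand-in for the BUG_ON: cannot fire while the count cap holds. */
    assert(m->front_len + SKETCH_ITEM_SIZE <= SKETCH_PAGE_SIZE);

    m->front_len += SKETCH_ITEM_SIZE;
    m->num++;

    /* Message is full: retire it so the next item uses a fresh one. */
    if (m->num == SKETCH_CAPS_PER_RELEASE)
        s->cur++;
    return 0;
}
```

In this sketch a valid message can never grow past one page, so the size check only trips if `m` does not point at a real pre-allocated message, which is consistent with the explanation given earlier in the thread. The sketch reports that case with -1 instead of dereferencing a bogus pointer.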