Bug #3572 (closed)

High CPU after a pause equivalent to the node being very busy

Added by David Zafman over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I dropped a ceph kernel client node into the debugger and left it there for a while. After resuming execution, the machine is looping heavily in the kernel. If I set breakpoints on ceph_mdsmap_get_random_mds() and __do_request(), they are hit repeatedly. It appears that a long chain of requests is being processed in __wake_requests().

Another issue is that with VMs, getting entropy can be difficult, and calling get_random_bytes(), which is the exported kernel interface for getting random numbers, drains the entropy pool. This means that it will block waiting for more entropy. I wish we could get at the non-blocking variant, although it would be best to only call __choose_mds() once after a possible loss.

This isn't a real world scenario, but I'm concerned that on a heavily loaded machine the same circumstances could occur.

Below is a typical stack trace if I just randomly interrupt into the debugger.

#0 0xffffffff813393f8 in sha_transform (digest=0xffff88001c5239b0, data=<optimized out>, array=0xffff88001c523970) at lib/sha1.c:155
#1 0xffffffff8140cf1a in extract_buf (r=0xffffffff81c83b00, out=0xffff88001c523a8e "") at drivers/char/random.c:926
#2 0xffffffff8140dadf in extract_entropy (r=0xffffffff81c83b00, buf=0xffff88001c523aff, nbytes=1, min=0, reserved=0) at drivers/char/random.c:965
#3 0xffffffff8140dbc0 in get_random_bytes (buf=<optimized out>, nbytes=<optimized out>) at drivers/char/random.c:1035
#4 0xffffffff812a5178 in ceph_mdsmap_get_random_mds (m=0xffff880030c90960) at fs/ceph/mdsmap.c:33
#5 0xffffffff812a21bb in __choose_mds (req=0xffff88002fc2ec00, mdsc=0xffff8800241c4800) at fs/ceph/mds_client.c:755
#6 __do_request (mdsc=0xffff8800241c4800, req=0xffff88002fc2ec00) at fs/ceph/mds_client.c:1820
#7 0xffffffff812a24c2 in __wake_requests (mdsc=0xffff8800241c4800, head=0xffff88003dbbe570) at fs/ceph/mds_client.c:1883
#8 0xffffffff812a4698 in handle_session (msg=0xffff88002fce5900, session=0xffff88003dbbe000) at fs/ceph/mds_client.c:2338
#9 dispatch (con=<optimized out>, msg=0xffff88002fce5900) at fs/ceph/mds_client.c:3359
#10 0xffffffff81654fc9 in process_message (con=0xffff88003dbbe040) at net/ceph/messenger.c:2006
#11 try_read (con=<optimized out>) at net/ceph/messenger.c:2221
#12 con_work (work=0xffff88003dbbe438) at net/ceph/messenger.c:2332
#13 0xffffffff810740e6 in process_one_work (worker=<optimized out>, work=0xffff88003dbbe438) at kernel/workqueue.c:2080
#14 0xffffffff81075380 in worker_thread (__worker=0xffff880030d5f780) at kernel/workqueue.c:2201
#15 0xffffffff8107a423 in kthread (_create=0xffff88003e371cb8) at kernel/kthread.c:121
#16 0xffffffff8169ea84 in ?? () at arch/x86/kernel/entry_64.S:1216
#17 0x0000000000000000 in ?? ()

Actions #1

Updated by Sage Weil over 11 years ago

  • Description updated (diff)

The good news is that we don't need high quality entropy here; the crappy stuff will do...
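
As an illustration only (not a patch from this ticket), here is a minimal sketch of how the random MDS pick could avoid the entropy pool, assuming the ceph_mdsmap layout from include/linux/ceph/mdsmap.h (m_max_mds, m_info[i].state) and substituting random32() -- the kernel's non-blocking, non-cryptographic PRNG of this era, later renamed prandom_u32() -- for get_random_bytes():

#include <linux/random.h>
#include <linux/ceph/mdsmap.h>

/* Hypothetical sketch, not the committed fix: pick a random "up" MDS
 * without draining the blocking entropy pool.  random32() never blocks,
 * and we only need a roughly uniform pick here, not cryptographic
 * randomness. */
int ceph_mdsmap_get_random_mds(struct ceph_mdsmap *m)
{
        int n = 0;
        int i, r;

        /* count how many MDS daemons are up */
        for (i = 0; i < m->m_max_mds; i++)
                if (m->m_info[i].state > 0)
                        n++;
        if (n == 0)
                return -1;

        /* pick one of the up MDS daemons at random */
        r = random32() % n;
        for (i = 0; i < m->m_max_mds; i++)
                if (m->m_info[i].state > 0 && r-- == 0)
                        return i;

        return -1;
}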

Actions #2

Updated by David Zafman over 11 years ago

There is a bug in this routine:

static void __wake_requests(struct ceph_mds_client *mdsc,
                            struct list_head *head)
{
        struct ceph_mds_request *req, *nreq;

        list_for_each_entry_safe(req, nreq, head, r_wait) {
                list_del_init(&req->r_wait);
                __do_request(mdsc, req);
        }
}

list_for_each_entry_safe() isn't the right routine to use here, because head is a pointer to the s_waiting member of a struct ceph_mds_session. However, the list macro assumes that head is the same pointer type as the first argument, in this case a struct ceph_mds_request *. So the condition check &pos->member != (head) will NEVER be false if the list wasn't empty. It does work for an empty list because, even though pos is a bogus pointer, &pos->member == head.
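
For reference, list_for_each_entry_safe() in <linux/list.h> expands to roughly the following (simplified); the termination test is the &pos->member != (head) comparison referred to above:

#define list_for_each_entry_safe(pos, n, head, member)                   \
        for (pos = list_entry((head)->next, typeof(*pos), member),       \
                n = list_entry(pos->member.next, typeof(*pos), member);  \
             &pos->member != (head);                                     \
             pos = n, n = list_entry(n->member.next, typeof(*n), member))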

Actions #3

Updated by David Zafman over 11 years ago

There is already a fix for this in the wip-fs branch. I'm not clear whether we only need to process through the list once, or need to keep retrying because, for example, we are waiting for the session to open.

commit cd197cc5490a3110b9dfba26ee9d90a586a923af
Author: Yan, Zheng <zheng.z.yan@intel.com>
Date:   Mon Nov 19 10:49:06 2012 +0800

    ceph: Fix infinite loop in __wake_requests

    __wake_requests() will enter infinite loop if we use it to wake
    requests in the session->s_waiting list. __wake_requests() deletes
    requests from the list and __do_request() adds requests back to
    the list.

    Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    Signed-off-by: Sage Weil <sage@inktank.com>

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 62d2342..9165eb8 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1876,9 +1876,14 @@ finish:
 static void __wake_requests(struct ceph_mds_client *mdsc,
                            struct list_head *head)
 {
-       struct ceph_mds_request *req, *nreq;
+       struct ceph_mds_request *req;
+       LIST_HEAD(tmp_list);
+
+       list_splice_init(head, &tmp_list);

-       list_for_each_entry_safe(req, nreq, head, r_wait) {
+       while (!list_empty(&tmp_list)) {
+               req = list_entry(tmp_list.next,
+                                struct ceph_mds_request, r_wait);
                list_del_init(&req->r_wait);
                __do_request(mdsc, req);
        }
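
For context, a simplified sketch of the requeue path in __do_request() (paraphrased from fs/ceph/mds_client.c, not an exact quote): when the chosen session is not yet open, the request is parked back on session->s_waiting. With the old loop that re-add landed on the very list being walked; with the patch it lands on the now-empty s_waiting while the loop drains the private tmp_list, so the walk terminates after one pass.

/* paraphrased from __do_request(): if the target session is not open yet,
 * initiate the open and park the request on s_waiting -- the same list
 * __wake_requests() used to iterate in place */
if (session->s_state != CEPH_MDS_SESSION_OPEN &&
    session->s_state != CEPH_MDS_SESSION_HUNG) {
        /* (session open is initiated here if needed) */
        list_add(&req->r_wait, &session->s_waiting);
        goto out_session;
}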

Actions #4

Updated by Sage Weil over 11 years ago

Ah, nice catch! We just need to cycle through the list once to requeue the requests for the MDS. If Yan's patch looks good, go ahead and cherry-pick it into testing (and add your Tested-by or Reviewed-by, if you like). For that matter, if there are other patches in that branch you want to review, that's great too :).

I queued that branch for a qa run several days ago and it passed. It's not a particularly thorough test suite for MDS behavior, though... at least not yet.

Actions #5

Updated by David Zafman over 11 years ago

  • Status changed from New to Resolved

commit:ed75ec2cd19b47efcd292b6e23f58e56f4c5bc34
