Bug #3572

High CPU after a stall equivalent to the node being very busy

Added by David Zafman over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development

Description

I dropped a ceph kernel client node into the debugger and left it there for a while. After resuming execution, the machine loops heavily in the kernel. If I set breakpoints on ceph_mdsmap_get_random_mds() and __do_request(), they are hit repeatedly. It appears that a long chain of requests is being processed in __wake_requests().
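
For context, the wakeup path looks roughly like this. The following is a simplified sketch paraphrased from fs/ceph/mds_client.c of that era, not verbatim kernel code: every request queued on the waiting list is re-driven through __do_request(), and each one that still has no target session calls __choose_mds(), and therefore ceph_mdsmap_get_random_mds(), again.

static void __wake_requests(struct ceph_mds_client *mdsc,
                            struct list_head *head)
{
        struct ceph_mds_request *req, *nreq;

        /* Re-drive every request queued on this wait list. */
        list_for_each_entry_safe(req, nreq, head, r_wait) {
                list_del_init(&req->r_wait);
                __do_request(mdsc, req);  /* may call __choose_mds() */
        }
}

So after a long stall, a single handle_session() message can trigger an arbitrarily long run of __do_request() calls, each of which may hit the random number generator.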

Another issue is that entropy can be scarce in VMs, so calling get_random_bytes(), the exported kernel interface for getting random numbers, drains the entropy pool. This means that it will block waiting for more entropy. I wish we could get at the non-blocking variant, although it would be best to call __choose_mds() only once after a possible loss.
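
For illustration only, a non-blocking pick could look like the sketch below. This assumes the kernel's pseudo-random helper prandom_u32() (random32() on older kernels) is acceptable for MDS selection; the helper name pick_random_rank() is hypothetical and not from the kernel source.

#include <linux/random.h>

/*
 * Hypothetical helper: pick one of n_active MDS ranks without
 * draining the entropy pool.  prandom_u32() is the non-blocking
 * pseudo-random interface; get_random_bytes() is the CSPRNG
 * interface this report describes as blocking.
 */
static int pick_random_rank(int n_active)
{
        if (n_active <= 0)
                return -1;
        return prandom_u32() % n_active;  /* never blocks */
}

MDS selection only needs load spreading, not cryptographic strength, so a PRNG seems sufficient here.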

This isn't a real-world scenario, but I'm concerned that the same circumstances could occur on a heavily loaded machine.

Below is a typical stack trace from breaking into the debugger at a random point.

#0 0xffffffff813393f8 in sha_transform (digest=0xffff88001c5239b0, data=<optimized out>, array=0xffff88001c523970) at lib/sha1.c:155
#1 0xffffffff8140cf1a in extract_buf (r=0xffffffff81c83b00, out=0xffff88001c523a8e "") at drivers/char/random.c:926
#2 0xffffffff8140dadf in extract_entropy (r=0xffffffff81c83b00, buf=0xffff88001c523aff, nbytes=1, min=0, reserved=0) at drivers/char/random.c:965
#3 0xffffffff8140dbc0 in get_random_bytes (buf=<optimized out>, nbytes=<optimized out>) at drivers/char/random.c:1035
#4 0xffffffff812a5178 in ceph_mdsmap_get_random_mds (m=0xffff880030c90960) at fs/ceph/mdsmap.c:33
#5 0xffffffff812a21bb in __choose_mds (req=0xffff88002fc2ec00, mdsc=0xffff8800241c4800) at fs/ceph/mds_client.c:755
#6 __do_request (mdsc=0xffff8800241c4800, req=0xffff88002fc2ec00) at fs/ceph/mds_client.c:1820
#7 0xffffffff812a24c2 in __wake_requests (mdsc=0xffff8800241c4800, head=0xffff88003dbbe570) at fs/ceph/mds_client.c:1883
#8 0xffffffff812a4698 in handle_session (msg=0xffff88002fce5900, session=0xffff88003dbbe000) at fs/ceph/mds_client.c:2338
#9 dispatch (con=<optimized out>, msg=0xffff88002fce5900) at fs/ceph/mds_client.c:3359
#10 0xffffffff81654fc9 in process_message (con=0xffff88003dbbe040) at net/ceph/messenger.c:2006
#11 try_read (con=<optimized out>) at net/ceph/messenger.c:2221
#12 con_work (work=0xffff88003dbbe438) at net/ceph/messenger.c:2332
#13 0xffffffff810740e6 in process_one_work (worker=<optimized out>, work=0xffff88003dbbe438) at kernel/workqueue.c:2080
#14 0xffffffff81075380 in worker_thread (__worker=0xffff880030d5f780) at kernel/workqueue.c:2201
#15 0xffffffff8107a423 in kthread (_create=0xffff88003e371cb8) at kernel/kthread.c:121
#16 0xffffffff8169ea84 in ?? () at arch/x86/kernel/entry_64.S:1216
#17 0x0000000000000000 in ?? ()
