Actions
Bug #2081
closedmsgr: spinlock badness?
% Done:
0%
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
captured this console fragment from a crashed qa run
[ 8598.813081] Process kworker/0:3 (pid: 25843, threadinfo ffff8800df23a000, task ffff880037eb1f70) [ 8598.813082] Stack: [ 8598.813084] ffff8800df23b728 ffffffff8107f285 ffff8800df23b728 ffff880111c13b80 [ 8598.813087] ffff8800df23b7c8 ffffffff81602db3 0000000000000000 ffffffff81067021 [ 8598.813090] ffff880037eb1f70 0000000000013b80 ffff8800df23bfd8 ffff8800df23a010 [ 8598.813093] Call Trace: [ 8598.813096] [<ffffffff8107f285>] wq_worker_sleeping+0x15/0xa0 [ 8598.813099] [<ffffffff81602db3>] __schedule+0x5c3/0x940 [ 8598.813102] [<ffffffff81067021>] ? do_exit+0x551/0x880 [ 8598.813105] [<ffffffff81065f1d>] ? release_task+0x1d/0x470 [ 8598.813108] [<ffffffff8160345f>] schedule+0x3f/0x60 [ 8598.813111] [<ffffffff81067097>] do_exit+0x5c7/0x880 [ 8598.813113] [<ffffffff81063bd5>] ? kmsg_dump+0x75/0x140 [ 8598.813117] [<ffffffff81607100>] oops_end+0xb0/0xf0 [ 8598.813120] [<ffffffff8103f88d>] no_context+0xfd/0x270 [ 8598.813123] [<ffffffff8103fb45>] __bad_area_nosemaphore+0x145/0x230 [ 8598.813127] [<ffffffff81609b61>] ? do_page_fault+0x221/0x4b0 [ 8598.813130] [<ffffffff8103fc43>] bad_area_nosemaphore+0x13/0x20 [ 8598.813133] [<ffffffff81609c8e>] do_page_fault+0x34e/0x4b0 [ 8598.813136] [<ffffffff814f04e4>] ? release_sock+0x144/0x190 [ 8598.813140] [<ffffffff81545245>] ? tcp_sendpage+0xc5/0x590 [ 8598.813143] [<ffffffff81315d5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 8598.813147] [<ffffffff81606535>] page_fault+0x25/0x30 [ 8598.813154] [<ffffffffa048ec30>] ? try_write+0x570/0xef0 [libceph] [ 8598.813158] [<ffffffff814eb1b4>] ? kernel_recvmsg+0x44/0x60 [ 8598.813165] [<ffffffffa048d858>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph] [ 8598.813173] [<ffffffffa04901b0>] con_work+0xc00/0x1b10 [libceph] [ 8598.813176] [<ffffffff8107fe76>] process_one_work+0x1a6/0x520 [ 8598.813179] [<ffffffff8107fe07>] ? process_one_work+0x137/0x520 [ 8598.813186] [<ffffffffa048f5b0>] ? try_write+0xef0/0xef0 [libceph] [ 8598.813189] [<ffffffff810821a3>] worker_thread+0x173/0x400 [ 8598.813192] [<ffffffff81082030>] ? manage_workers+0x210/0x210 [ 8598.813195] [<ffffffff810871e6>] kthread+0xb6/0xc0 [ 8598.813198] [<ffffffff8109f1a5>] ? trace_hardirqs_on_caller+0x105/0x190 [ 8598.813201] [<ffffffff816103c4>] kernel_thread_helper+0x4/0x10 [ 8598.813204] [<ffffffff816062b4>] ? retint_restore_args+0x13/0x13 [ 8598.813207] [<ffffffff81087130>] ? __init_kthread_worker+0x70/0x70 [ 8598.813210] [<ffffffff816103c0>] ? gs_change+0x13/0x13 [ 8598.813212] Code: 66 66 66 90 65 48 8b 04 25 40 c4 00 00 48 8b 80 48 03 00 00 8b 40 f0 c9 c3 66 90 55 48 89 e5 66 66 66 66 90 48 8b 87 48 03 00 00 <48> 8b 40 f8 c9 c3 eb 08 90 90 90 90 90 90 90 90 55 48 89 e5 66 [ 8598.813239] RIP [<ffffffff81086bd0>] kthread_data+0x10/0x20 [ 8598.813242] RSP <ffff8800df23b708> [ 8598.813243] CR2: fffffffffffffff8 [ 8598.813245] ---[ end trace 96bfe67b43fe6e90 ]--- [ 8598.813246] Fixing recursive fault but reboot is needed! [ 8599.827067] BUG: spinlock lockup on CPU#0, kworker/0:3/25843 [ 8599.827067] lock: ffff880111c13b80, .magic: dead4ead, .owner: kworker/0:3/25843, .owner_cpu: 0 [ 8599.827067] Pid: 25843, comm: kworker/0:3 Tainted: G D W 3.2.0-ceph-00161-g27772da #1 [ 8599.827067] Call Trace: [ 8599.827067] [<ffffffff8131bf18>] spin_dump+0x78/0xc0 [ 8599.827067] [<ffffffff8131c13d>] do_raw_spin_lock+0xed/0x120 [ 8599.827067] [<ffffffff81605835>] _raw_spin_lock_irq+0x45/0x50 [ 8599.827067] [<ffffffff816028c7>] ? __schedule+0xd7/0x940 [ 8599.827067] [<ffffffff816028c7>] __schedule+0xd7/0x940 [ 8599.827067] [<ffffffff8160345f>] schedule+0x3f/0x60 [ 8599.827067] [<ffffffff81067304>] do_exit+0x834/0x880 [ 8599.827067] [<ffffffff81063c65>] ? kmsg_dump+0x105/0x140 [ 8599.827067] [<ffffffff81063bd5>] ? kmsg_dump+0x75/0x140 [ 8599.827067] [<ffffffff81607100>] oops_end+0xb0/0xf0 [ 8599.827067] [<ffffffff8103f88d>] no_context+0xfd/0x270 [ 8599.827067] [<ffffffffa01a9a19>] ? drm_mode_object_find+0x39/0x90 [drm] [ 8599.827067] [<ffffffff8103fb45>] __bad_area_nosemaphore+0x145/0x230 [ 8599.827067] [<ffffffff8103fc43>] bad_area_nosemaphore+0x13/0x20 [ 8599.827067] [<ffffffff81609c8e>] do_page_fault+0x34e/0x4b0 [ 8599.827067] [<ffffffff81055d62>] ? cpuacct_charge+0x22/0xb0 [ 8599.827067] [<ffffffff81055d62>] ? cpuacct_charge+0x22/0xb0 [ 8599.827067] [<ffffffff81315d5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 8599.827067] [<ffffffff81606535>] page_fault+0x25/0x30 [ 8599.827067] [<ffffffff81086bd0>] ? kthread_data+0x10/0x20 [ 8599.827067] [<ffffffff8107f285>] wq_worker_sleeping+0x15/0xa0 [ 8599.827067] [<ffffffff81602db3>] __schedule+0x5c3/0x940 [ 8599.827067] [<ffffffff81067021>] ? do_exit+0x551/0x880 [ 8599.827067] [<ffffffff81065f1d>] ? release_task+0x1d/0x470 [ 8599.827067] [<ffffffff8160345f>] schedule+0x3f/0x60 [ 8599.827067] [<ffffffff81067097>] do_exit+0x5c7/0x880 [ 8599.827067] [<ffffffff81063bd5>] ? kmsg_dump+0x75/0x140 [ 8599.827067] [<ffffffff81607100>] oops_end+0xb0/0xf0 [ 8599.827067] [<ffffffff8103f88d>] no_context+0xfd/0x270 [ 8599.827067] [<ffffffff8103fb45>] __bad_area_nosemaphore+0x145/0x230 [ 8599.827067] [<ffffffff81609b61>] ? do_page_fault+0x221/0x4b0 [ 8599.827067] [<ffffffff8103fc43>] bad_area_nosemaphore+0x13/0x20 [ 8599.827067] [<ffffffff81609c8e>] do_page_fault+0x34e/0x4b0 [ 8599.827067] [<ffffffff814f04e4>] ? release_sock+0x144/0x190 [ 8599.827067] [<ffffffff81545245>] ? tcp_sendpage+0xc5/0x590 [ 8599.827067] [<ffffffff81315d5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 8599.827067] [<ffffffff81606535>] page_fault+0x25/0x30 [ 8599.827067] [<ffffffffa048ec30>] ? try_write+0x570/0xef0 [libceph] [ 8599.827067] [<ffffffff814eb1b4>] ? kernel_recvmsg+0x44/0x60 [ 8599.827067] [<ffffffffa048d858>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph] [ 8599.827067] [<ffffffffa04901b0>] con_work+0xc00/0x1b10 [libceph] [ 8599.827067] [<ffffffff8107fe76>] process_one_work+0x1a6/0x520 [ 8599.827067] [<ffffffff8107fe07>] ? process_one_work+0x137/0x520 [ 8599.827067] [<ffffffffa048f5b0>] ? try_write+0xef0/0xef0 [libceph] [ 8599.827067] [<ffffffff810821a3>] worker_thread+0x173/0x400 [ 8599.827067] [<ffffffff81082030>] ? manage_workers+0x210/0x210 [ 8599.827067] [<ffffffff810871e6>] kthread+0xb6/0xc0 [ 8599.827067] [<ffffffff8109f1a5>] ? trace_hardirqs_on_caller+0x105/0x190 [ 8599.827067] [<ffffffff816103c4>] kernel_thread_helper+0x4/0x10 [ 8599.827067] [<ffffffff816062b4>] ? retint_restore_args+0x13/0x13 [ 8599.827067] [<ffffffff81087130>] ? __init_kthread_worker+0x70/0x70 [ 8599.827067] [<ffffffff816103c0>] ? gs_change+0x13/0x13
The job was
ubuntu@teuthology:/a/nightly_coverage_2012-02-15-b/12139$ cat config.yaml kernel: &id001 sha1: 27772dabb75b1072a81c0215b61b066bf8810f6c nuke-on-error: true overrides: ceph: conf: osd: osd op complaint time: 120 coverage: true fs: btrfs log-whitelist: - clocks not synchronized - old request sha1: e32668f8b83abad74e858d9e9fffbd456968a918 roles: - - mon.a - mon.c - osd.0 - - mon.b - mds.a - osd.1 - - client.0 targets: ubuntu@sepia55.ceph.dreamhost.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDZ0HauoMdA5zgEmINOhrUYdVO1V3PDt4dp/dfm+AwB3iunb6Y/BXifGRDznlrqgIZ8+Sxl7U6lPsMsj0+MOysoWA1GPmpQqwQlDLF3SMOKvRXnGALB2x2spotMgQstoSYQWUAWK+6tC41oZq2R6ht0bljU0jiS8TP3qCZ+I5izt/p31vUjZUAvtCZBQkGr8vLZDrc+/R5dCPkGn/D+IoNA0P+l75WO1nHOpAfuYN7q0xCUN42NBJl13lCN9xZBaQZyDC4OyFp7sCsSEO16SkHpiLgALITUREksvWmS/OAcIk6ayvQNWr8mBCYpiiGgr8Arvdn826UB6PDCZ4Ym9J3H ubuntu@sepia56.ceph.dreamhost.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUTeKEAoW2ZB3Rl0YEdWLBCfL4PuA3QA6KC2BiPsR3UXfI92jFNnZ8Yr4Qz7pJQkI9ELdC1p4T9ro1zHv/fY7U5jtFeT4aJ1daTY2fw5R5dFuMTaBnvCnWczJshoFEaEgvwh8CHXyykPtpl9l329kWOK8VjNEyN9xCork9i7NLkrPXj6Mq44baZWw32IaFWnpcTfhL1NprMeu7pSLmZooRWElIXrT2WD8FJGp8t1FlSpvOwou5pIcW1EzBLNoR8YG8lVAl5r9xKsRI1zlOtmkKeKosasoNNx8PYmSGXR/ToiMlCpxVtqMxtd7bT1VQLqDEu7W/MMHnj2T9M8ZTF759 ubuntu@sepia6.ceph.dreamhost.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDS1viQGCX9RNrDOwhu61WNR7UUIfR83pRvN2K2Xw8uWoUB5jEkpU6E3sYyvzEA+kEQo9TDbIZSsm8uYFciP7iy5ONj7aQYwqueec9QQS/v4ivRCg9SuD/s146ZiaIHyaOWqHVTleslUwAHRGEKRwFw8KoKIUKO119KrWfx57ahaeWa8ex9wq96TmScPcU4hB4RH2BQCYD9Qexn3JIxdf0L8dwNDHAqsKPV6g5ewlszfNw30cXip7WgL6sWkOPzv9A2xsUIPUkbTj9lLyk9JsUtyDEPTFziKEfsKbDREX2ayyr0d43H8SlhFkMzPRTrlRixPlebS82ZdEvvqYPrW63n tasks: - internal.lock_machines: 3 - internal.save_config: null - internal.check_lock: null - internal.connect: null - internal.check_conflict: null - kernel: *id001 - internal.base: null - internal.archive: null - internal.coredump: null - internal.syslog: null - chef: null - ceph: null - rbd: all: null - workunit: all: - kernel_untar_build.sh
Actions