Bug #2790
libceph: crash in read_partial_message_section on ffsb
% Done:
0%
Source:
Q/A
Regression:
No
Severity:
3 - minor
Description
[0]kdb> bt
Stack traceback for pid 4558
0xffff88020a66bf00     4558        2  1    0   R  0xffff88020a66c358 *kworker/0:2
 ffff8801121019e0 0000000000000018 0000000000000000 ffff88020a66bf00
 0000000000000000 ffffffff00000001 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 ffff88020a66bf00
Call Trace:
 [<ffffffff8108126c>] ? ttwu_stat+0x4c/0x140
 [<ffffffff810894c4>] ? __enqueue_entity+0x74/0x80
 [<ffffffff810812d8>] ? ttwu_stat+0xb8/0x140
 [<ffffffff8108126c>] ? ttwu_stat+0x4c/0x140
 [<ffffffffa04befaa>] ? con_work+0x1aba/0x2ed0 [libceph]
 [<ffffffffa04befaa>] ? con_work+0x1aba/0x2ed0 [libceph]
 [<ffffffff81507996>] ? kernel_recvmsg+0x46/0x60
 [<ffffffffa04bb778>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
 [<ffffffffa04bc73b>] ? read_partial_message_section.isra.17+0x6b/0xb0 [libceph]
 [<ffffffffa04bddbe>] ? con_work+0x8ce/0x2ed0 [libceph]
 [<ffffffff8108cee0>] ? load_balance+0xd0/0x7e0
 [<ffffffff8108da63>] ? idle_balance+0x133/0x180
 [<ffffffff8107fb28>] ? finish_task_switch+0x48/0x110
 [<ffffffffa04bd4f0>] ? ceph_msg_new+0x2e0/0x2e0 [libceph]
 [<ffffffff8106d18a>] ? process_one_work+0x18a/0x510
 [<ffffffff8106d11e>] ? process_one_work+0x11e/0x510
[0]more> [<ffffffff8106ecdf>] ? worker_thread+0x15f/0x350
 [<ffffffff8106eb80>] ? manage_workers.isra.27+0x230/0x230
 [<ffffffff8107411e>] ? kthread+0xae/0xc0
 [<ffffffff810ae29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816368f4>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff8162cb50>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff8162d370>] ? retint_restore_args+0x13/0x13
 [<ffffffff81074070>] ? __init_kthread_worker+0x70/0x70

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-17_07:00:05-marginal-master-testing-basic/12842$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: 381448be4f27edab3cdeea88bbf6670e19bf4b8a
nuke-on-error: true
overrides:
  ceph:
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: abe05a3fbbb120d8d354623258d9104584db66f7
  workunit:
    sha1: abe05a3fbbb120d8d354623258d9104584db66f7
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana25.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuXajaQgHe9XnbLOzI8WWFYVz6+TnOiTzbkIJPGOZpzQEjnUtJraQIEt5ABSeovMjiEj+V4XvunfyuSmEd0H9giRSyjmCHTPGlpndfTeCdVtCBpNqf5GkUqHaEY1Hp57XPbya2rGlwtFm0NeIDYx6pfkejKnsTOUqwhgUb6950TRhjHQhMjFgyALSyfAm/4y6vGZfjm57+yyih6XgDkqWiiQ6Y/aJVR2n+iCzvqEzV7JSCU+Brn+k8IQLHho1fadYqc5PjYct5BaVlHcP6c+T8nJE/DvqGwZ4gQaVJcuWJiDfLOPPYo1g/0AFicxauLwVNJ6HFR9FjLLGtGU+2DcVN
  ubuntu@plana26.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDx0US96hot7gygZ69W4nxJQ9myYnn3I22YtOaSPe+yFWOPJVQUuOST+aw5K6JDcjdO2Gq0aS6s01mgoWpZlO/FVDKss7vZ2KjMp3uPkGMpDZarNbR3QTe5YZYrl7Wfw4pMu4jh92hCWJEzy5nH0H3X2YJhOd5BdOYz0P97qsMSPQGxhlvDBYBhDl9MLgsS3lKm/Js/OPLO+Uf3/SZceCjUqO2m3WsrJSiQJKh8XUWUu3z+6C1Wg6TXSSlA/jdVCiokDg7WYwPN9zMwzzGkGv+GUGHKMZaPGRZb9LQJLTBf/OjwRSgclAVdDc3vnZeYAS5+sDnt2grnJnlBd1rBUj3n
  ubuntu@plana30.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYDpptyOTrVH3RqiwH5A//Q2CkkVz5dPTpd/s8qG/Q4EHVA4WDMu80pcDvdSewOfFJl83MEtDKKjuJOuEzI4OGn0DPptDN5wHC1OWrXqFMcIaWVe/KBYOdWEZbA7FECeXgEZR1Sid2bH7XDUE9AYalpS2/SmuuHEU1ObL6zSpAqoY6AIPCR6LgFrtxAqrYmIdpb8YfSuI5uPBv6qikl0yvam06WNerUNQ9lnZXFmFm1wBeicRvWH3jZ6w/xlQBIp/zG6k9IJa0vaLm+FqztLkDWri8Qz1dbdsz0bNjyzD6iRuDOpgmz0Kf8m2IjaJRgRgz2ARcOOdBJKmwnnW/knk5
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh

Very reproducible.
History
#1 Updated by Sage Weil about 11 years ago
- Assignee set to Sage Weil
#2 Updated by Sage Weil about 11 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-19_08:03:32-marginal-master-testing-basic/14125
#3 Updated by Sage Weil about 11 years ago
    4f9c:  31 f6                   xor    %esi,%esi
        if (con->ops->alloc_msg) {
                int skip = 0;

                mutex_unlock(&con->mutex);
                con->in_msg = con->ops->alloc_msg(con, hdr, &skip);
    4f9e:  48 89 83 28 04 00 00    mov    %rax,0x428(%rbx)
                mutex_lock(&con->mutex);
    4fa5:  e8 00 00 00 00          callq  4faa <con_work+0x1aba>
                        4fa6: R_X86_64_PC32  mutex_lock_nested-0x4
                if (con->in_msg) {
    4faa:  4c 8b a3 28 04 00 00    mov    0x428(%rbx),%r12
    4fb1:  4d 85 e4                test   %r12,%r12
    4fb4:  74 21                   je     4fd7 <con_work+0x1ae7>
                        con->in_msg->con = con->ops->get(con);
    4fb6:  48 8b 43 08             mov    0x8(%rbx),%rax
    4fba:  48 89 df                mov    %rbx,%rdi
    4fbd:  ff 10                   callq  *(%rax)
    4fbf:  49 89 44 24 78          mov    %rax,0x78(%r12)
                        BUG_ON(con->in_msg->con == NULL);
    4fc4:  4c 8b a3 28 04 00 00    mov    0x428(%rbx),%r12
    4fcb:  49 83 7c 24 78 00       cmpq   $0x0,0x78(%r12)
    4fd1:  0f 84 8e 0c 00 00       je     5c65 <con_work+0x2775>
Problem appears to be the `if (con->in_msg)` line.. %bx (con) is NULL???
[0]kdb> rd
ax: 0000000000000000  bx: 0000000000000000  cx: 0000000000004040
dx: 000000000000006a  si: 000000000000006a  di: 0000000000000000
bp: ffff8801f7855b60  sp: ffff8801f78559e0  r8: 000000000000006a
r9: 0000000000004040  r10: 0000000000000000  r11: 0000000000000000
r12: 000000000000006a  r13: 0000000000004040  r14: ffff8801f7855b98
r15: ffff8801f7855ab8  ip: ffffffff815078ef  flags: 00010246
cs: 00000010  ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018
[0]kdb> bt
Stack traceback for pid 10121
0xffff880216f6de80    10121        2  1    0   R  0xffff880216f6e2d8 *kworker/0:1
 ffff8801f78559e0 0000000000000018 0000000000000000 ffffffff8107fb28
 0000000000000000 ffffffff00000001 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 ffff880216f6de80
Call Trace:
 [<ffffffff8107fb28>] ? finish_task_switch+0x48/0x110
 [<ffffffff810ae29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8162cb50>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff8107fb65>] ? finish_task_switch+0x85/0x110
 [<ffffffff8107fb28>] ? finish_task_switch+0x48/0x110
 [<ffffffff8162b172>] ? __schedule+0x402/0x820
 [<ffffffffa042dfaa>] ? con_work+0x1aba/0x2ed0 [libceph]
 [<ffffffffa042dfaa>] ? con_work+0x1aba/0x2ed0 [libceph]
 [<ffffffff81507996>] ? kernel_recvmsg+0x46/0x60
 [<ffffffffa042a778>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
 [<ffffffffa042b73b>] ? read_partial_message_section.isra.17+0x6b/0xb0 [libceph]
 [<ffffffffa042cdbe>] ? con_work+0x8ce/0x2ed0 [libceph]
 [<ffffffff8108cee0>] ? load_balance+0xd0/0x7e0
 [<ffffffff8108da63>] ? idle_balance+0x133/0x180
 [<ffffffff8107fb28>] ? finish_task_switch+0x48/0x110
 [<ffffffffa042c4f0>] ? ceph_msg_new+0x2e0/0x2e0 [libceph]
[0]more> [<ffffffff8106d18a>] ? process_one_work+0x18a/0x510
 [<ffffffff8106d11e>] ? process_one_work+0x11e/0x510
 [<ffffffff8106ecdf>] ? worker_thread+0x15f/0x350
 [<ffffffff8106eb80>] ? manage_workers.isra.27+0x230/0x230
 [<ffffffff8107411e>] ? kthread+0xae/0xc0
 [<ffffffff810ae29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816368f4>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff8162d370>] ? retint_restore_args+0x13/0x13
 [<ffffffff81074070>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff816368f0>] ? gs_change+0x13/0x13
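For reference, the source interleaved in the objdump above corresponds to the pattern sketched below. This is a simplified, stand-alone reconstruction, not the real libceph code: the struct layouts, the two-argument `alloc_msg` signature (the real one also takes the message header), and the stub callbacks are all assumptions for illustration. The point is the window where `con->mutex` is dropped around the `->alloc_msg()` callback, so connection state can change underneath before `in_msg` is re-checked under the lock, which is one way `con` could be stale by the time the `if (con->in_msg)` check runs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct ceph_connection;

/* Simplified message and ops types; the real libceph structs have many
 * more fields. */
struct ceph_msg {
    struct ceph_connection *con;
};

struct ceph_connection_operations {
    struct ceph_msg *(*alloc_msg)(struct ceph_connection *con, int *skip);
    struct ceph_connection *(*get)(struct ceph_connection *con);
};

struct ceph_connection {
    pthread_mutex_t mutex;
    const struct ceph_connection_operations *ops;
    struct ceph_msg *in_msg;
};

/* The allocation path from the dump: the mutex is dropped around the
 * ->alloc_msg() callback and retaken afterwards, so another thread can
 * tear down or reuse the connection in between. */
static void con_alloc_in_msg(struct ceph_connection *con)
{
    if (con->ops->alloc_msg) {
        int skip = 0;

        pthread_mutex_unlock(&con->mutex);  /* race window opens here */
        con->in_msg = con->ops->alloc_msg(con, &skip);
        pthread_mutex_lock(&con->mutex);    /* ...and closes here */

        if (con->in_msg) {
            /* tie the new message back to its connection */
            con->in_msg->con = con->ops->get(con);
            /* the BUG_ON from the dump */
            assert(con->in_msg->con != NULL);
        }
    }
}

/* Hypothetical stub callbacks so the sketch runs stand-alone. */
static struct ceph_msg stub_msg;

static struct ceph_msg *stub_alloc_msg(struct ceph_connection *con, int *skip)
{
    (void)con;
    *skip = 0;
    return &stub_msg;
}

static struct ceph_connection *stub_get(struct ceph_connection *con)
{
    return con;  /* the real op takes a reference on the connection */
}

static const struct ceph_connection_operations stub_ops = {
    .alloc_msg = stub_alloc_msg,
    .get       = stub_get,
};
```

In the single-threaded stub case the path completes normally; the hazard only appears when a second thread touches the connection while the mutex is dropped.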
#4 Updated by Sage Weil about 11 years ago
- Status changed from New to Duplicate