Bug #2790

closed

libceph: crash in read_partial_message_section on ffsb

Added by Sage Weil almost 12 years ago. Updated over 11 years ago.

Status: Duplicate
Priority: Urgent
Assignee: Sage Weil
Category: libceph
Target version: -
% Done: 0%
Source: Q/A

Description


[0]kdb> bt
Stack traceback for pid 4558
0xffff88020a66bf00     4558        2  1    0   R  0xffff88020a66c358 *kworker/0:2
<c> ffff8801121019e0<c> 0000000000000018<c> 0000000000000000<c> ffff88020a66bf00<c>
<c> 0000000000000000<c> ffffffff00000001<c> 0000000000000000<c> 0000000000000000<c>
<c> 0000000000000000<c> 0000000000000000<c> 0000000000000000<c> ffff88020a66bf00<c>
Call Trace:
 [<ffffffff8108126c>] ? ttwu_stat+0x4c/0x140
 [<ffffffff810894c4>] ? __enqueue_entity+0x74/0x80
 [<ffffffff810812d8>] ? ttwu_stat+0xb8/0x140
 [<ffffffff8108126c>] ? ttwu_stat+0x4c/0x140
 [<ffffffffa04befaa>] ? con_work+0x1aba/0x2ed0 [libceph]
 [<ffffffffa04befaa>] ? con_work+0x1aba/0x2ed0 [libceph]
 [<ffffffff81507996>] ? kernel_recvmsg+0x46/0x60
 [<ffffffffa04bb778>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
 [<ffffffffa04bc73b>] ? read_partial_message_section.isra.17+0x6b/0xb0 [libceph]
 [<ffffffffa04bddbe>] ? con_work+0x8ce/0x2ed0 [libceph]
 [<ffffffff8108cee0>] ? load_balance+0xd0/0x7e0
 [<ffffffff8108da63>] ? idle_balance+0x133/0x180
 [<ffffffff8107fb28>] ? finish_task_switch+0x48/0x110
 [<ffffffffa04bd4f0>] ? ceph_msg_new+0x2e0/0x2e0 [libceph]
 [<ffffffff8106d18a>] ? process_one_work+0x18a/0x510
 [<ffffffff8106d11e>] ? process_one_work+0x11e/0x510
[0]more> 
 [<ffffffff8106ecdf>] ? worker_thread+0x15f/0x350
 [<ffffffff8106eb80>] ? manage_workers.isra.27+0x230/0x230
 [<ffffffff8107411e>] ? kthread+0xae/0xc0
 [<ffffffff810ae29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816368f4>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff8162cb50>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff8162d370>] ? retint_restore_args+0x13/0x13
 [<ffffffff81074070>] ? __init_kthread_worker+0x70/0x70

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-17_07:00:05-marginal-master-testing-basic/12842$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 381448be4f27edab3cdeea88bbf6670e19bf4b8a
nuke-on-error: true
overrides:
  ceph:
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: abe05a3fbbb120d8d354623258d9104584db66f7
  workunit:
    sha1: abe05a3fbbb120d8d354623258d9104584db66f7
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana25.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDuXajaQgHe9XnbLOzI8WWFYVz6+TnOiTzbkIJPGOZpzQEjnUtJraQIEt5ABSeovMjiEj+V4XvunfyuSmEd0H9giRSyjmCHTPGlpndfTeCdVtCBpNqf5GkUqHaEY1Hp57XPbya2rGlwtFm0NeIDYx6pfkejKnsTOUqwhgUb6950TRhjHQhMjFgyALSyfAm/4y6vGZfjm57+yyih6XgDkqWiiQ6Y/aJVR2n+iCzvqEzV7JSCU+Brn+k8IQLHho1fadYqc5PjYct5BaVlHcP6c+T8nJE/DvqGwZ4gQaVJcuWJiDfLOPPYo1g/0AFicxauLwVNJ6HFR9FjLLGtGU+2DcVN
  ubuntu@plana26.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDx0US96hot7gygZ69W4nxJQ9myYnn3I22YtOaSPe+yFWOPJVQUuOST+aw5K6JDcjdO2Gq0aS6s01mgoWpZlO/FVDKss7vZ2KjMp3uPkGMpDZarNbR3QTe5YZYrl7Wfw4pMu4jh92hCWJEzy5nH0H3X2YJhOd5BdOYz0P97qsMSPQGxhlvDBYBhDl9MLgsS3lKm/Js/OPLO+Uf3/SZceCjUqO2m3WsrJSiQJKh8XUWUu3z+6C1Wg6TXSSlA/jdVCiokDg7WYwPN9zMwzzGkGv+GUGHKMZaPGRZb9LQJLTBf/OjwRSgclAVdDc3vnZeYAS5+sDnt2grnJnlBd1rBUj3n
  ubuntu@plana30.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYDpptyOTrVH3RqiwH5A//Q2CkkVz5dPTpd/s8qG/Q4EHVA4WDMu80pcDvdSewOfFJl83MEtDKKjuJOuEzI4OGn0DPptDN5wHC1OWrXqFMcIaWVe/KBYOdWEZbA7FECeXgEZR1Sid2bH7XDUE9AYalpS2/SmuuHEU1ObL6zSpAqoY6AIPCR6LgFrtxAqrYmIdpb8YfSuI5uPBv6qikl0yvam06WNerUNQ9lnZXFmFm1wBeicRvWH3jZ6w/xlQBIp/zG6k9IJa0vaLm+FqztLkDWri8Qz1dbdsz0bNjyzD6iRuDOpgmz0Kf8m2IjaJRgRgz2ARcOOdBJKmwnnW/knk5
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- kclient: null
- workunit:
    clients:
      all:
      - suites/ffsb.sh

Very reproducible.

Related issues: 1 (0 open, 1 closed)

Related to Linux kernel client - Bug #2867: kclient: crash from ffsb in con_work -> kernel_sendmsg (Resolved, Sage Weil, 07/27/2012)

Actions #1

Updated by Sage Weil almost 12 years ago

  • Assignee set to Sage Weil
Actions #2

Updated by Sage Weil over 11 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-19_08:03:32-marginal-master-testing-basic/14125

Actions #3

Updated by Sage Weil over 11 years ago


    4f9c:       31 f6                   xor    %esi,%esi

        if (con->ops->alloc_msg) {
                int skip = 0;

                mutex_unlock(&con->mutex);
                con->in_msg = con->ops->alloc_msg(con, hdr, &skip);
    4f9e:       48 89 83 28 04 00 00    mov    %rax,0x428(%rbx)
                mutex_lock(&con->mutex);
    4fa5:       e8 00 00 00 00          callq  4faa <con_work+0x1aba>
                        4fa6: R_X86_64_PC32     mutex_lock_nested-0x4
                if (con->in_msg) {
    4faa:       4c 8b a3 28 04 00 00    mov    0x428(%rbx),%r12
    4fb1:       4d 85 e4                test   %r12,%r12
    4fb4:       74 21                   je     4fd7 <con_work+0x1ae7>
                        con->in_msg->con = con->ops->get(con);
    4fb6:       48 8b 43 08             mov    0x8(%rbx),%rax
    4fba:       48 89 df                mov    %rbx,%rdi
    4fbd:       ff 10                   callq  *(%rax)
    4fbf:       49 89 44 24 78          mov    %rax,0x78(%r12)
                        BUG_ON(con->in_msg->con == NULL);
    4fc4:       4c 8b a3 28 04 00 00    mov    0x428(%rbx),%r12
    4fcb:       49 83 7c 24 78 00       cmpq   $0x0,0x78(%r12)
    4fd1:       0f 84 8e 0c 00 00       je     5c65 <con_work+0x2775>

Problem appears to be the "if (con->in_msg)" line.. %rbx (con) is NULL???
[0]kdb> rd
ax: 0000000000000000  bx: 0000000000000000  cx: 0000000000004040
dx: 000000000000006a  si: 000000000000006a  di: 0000000000000000
bp: ffff8801f7855b60  sp: ffff8801f78559e0  r8: 000000000000006a
r9: 0000000000004040  r10: 0000000000000000  r11: 0000000000000000
r12: 000000000000006a  r13: 0000000000004040  r14: ffff8801f7855b98
r15: ffff8801f7855ab8  ip: ffffffff815078ef  flags: 00010246  cs: 00000010
ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018
[0]kdb> bt      
Stack traceback for pid 10121
0xffff880216f6de80    10121        2  1    0   R  0xffff880216f6e2d8 *kworker/0:1
<c> ffff8801f78559e0<c> 0000000000000018<c> 0000000000000000<c> ffffffff8107fb28<c>
<c> 0000000000000000<c> ffffffff00000001<c> 0000000000000000<c> 0000000000000000<c>
<c> 0000000000000000<c> 0000000000000000<c> 0000000000000000<c> ffff880216f6de80<c>
Call Trace:
 [<ffffffff8107fb28>] ? finish_task_switch+0x48/0x110
 [<ffffffff810ae29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8162cb50>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff8107fb65>] ? finish_task_switch+0x85/0x110
 [<ffffffff8107fb28>] ? finish_task_switch+0x48/0x110
 [<ffffffff8162b172>] ? __schedule+0x402/0x820
 [<ffffffffa042dfaa>] ? con_work+0x1aba/0x2ed0 [libceph]
 [<ffffffffa042dfaa>] ? con_work+0x1aba/0x2ed0 [libceph]
 [<ffffffff81507996>] ? kernel_recvmsg+0x46/0x60
 [<ffffffffa042a778>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
 [<ffffffffa042b73b>] ? read_partial_message_section.isra.17+0x6b/0xb0 [libceph]
 [<ffffffffa042cdbe>] ? con_work+0x8ce/0x2ed0 [libceph]
 [<ffffffff8108cee0>] ? load_balance+0xd0/0x7e0
 [<ffffffff8108da63>] ? idle_balance+0x133/0x180
 [<ffffffff8107fb28>] ? finish_task_switch+0x48/0x110
 [<ffffffffa042c4f0>] ? ceph_msg_new+0x2e0/0x2e0 [libceph]
[0]more> 
 [<ffffffff8106d18a>] ? process_one_work+0x18a/0x510
 [<ffffffff8106d11e>] ? process_one_work+0x11e/0x510
 [<ffffffff8106ecdf>] ? worker_thread+0x15f/0x350
 [<ffffffff8106eb80>] ? manage_workers.isra.27+0x230/0x230
 [<ffffffff8107411e>] ? kthread+0xae/0xc0
 [<ffffffff810ae29d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816368f4>] ? kernel_thread_helper+0x4/0x10
 [<ffffffff8162d370>] ? retint_restore_args+0x13/0x13
 [<ffffffff81074070>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff816368f0>] ? gs_change+0x13/0x13

Actions #4

Updated by Sage Weil over 11 years ago

  • Status changed from New to Duplicate