Bug #5480

libceph: unexpected old state in con_sock_state_change

Added by Josh Durgin over 10 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This happened after running this job in a loop for a while:

machine_type: mira
interactive-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      global:
        ms inject socket failures: 500
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    branch: cuttlefish
  install:
    ceph:
      branch: cuttlefish
  workunit:
    branch: cuttlefish
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- exec:
    client.0:
    - modprobe rbd
    - echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
    - echo 'module rbd +p' > /sys/kernel/debug/dynamic_debug/control
- workunit:
    clients:
      client.0:
      - rbd/map-unmap.sh

The crash:

<4>[131857.709460] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/net/ceph/messenger.c:360 ceph_sock_state_change+0x128/0x220 [libceph]()

<4>[131857.758949] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W    3.10.0-rc7-ceph-00046-g2eb04fe #1
<4>[131857.768041] Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.1 05/27/2010
<4>[131857.774684]  ffffffffa0707448 ffff88043fc038d8 ffffffff81630d8f ffff88043fc03918
<4>[131857.782298]  ffffffff8103fae0 ffff88043fc038f8 000000003148e000 ffff88042c743418
<4>[131857.789931]  ffff8800234665a2 0000000000000000 ffff88002346658e ffff88043fc03928
<4>[131857.797570] Call Trace:
<4>[131857.800126]  <IRQ>  [<ffffffff81630d8f>] dump_stack+0x19/0x1b
<4>[131857.806028]  [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
<4>[131857.812151]  [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
<4>[131857.818111]  [<ffffffffa06ecd58>] ceph_sock_state_change+0x128/0x220 [libceph]
<4>[131857.825502]  [<ffffffff815760d9>] tcp_fin+0x149/0x1c0
<4>[131857.830668]  [<ffffffff81576ef0>] tcp_data_queue+0x750/0xcb0
<4>[131857.836444]  [<ffffffff81579b78>] tcp_rcv_established+0x258/0x880
<4>[131857.842648]  [<ffffffff815852cc>] tcp_v4_do_rcv+0x2cc/0x500
<4>[131857.848337]  [<ffffffff81586915>] tcp_v4_rcv+0x895/0xc00
<4>[131857.853763]  [<ffffffff8155e2f8>] ? ip_local_deliver_finish+0x48/0x370
<4>[131857.860415]  [<ffffffff815490cb>] ? sch_direct_xmit+0x6b/0x280
<4>[131857.866411]  [<ffffffff8155e3bd>] ip_local_deliver_finish+0x10d/0x370
<4>[131857.872967]  [<ffffffff8155e2f8>] ? ip_local_deliver_finish+0x48/0x370
<4>[131857.879619]  [<ffffffff81533147>] ? __skb_dst_set_noref+0x27/0xb0
<4>[131857.885879]  [<ffffffff8155edd8>] ip_local_deliver+0x88/0x90
<4>[131857.891649]  [<ffffffff8155e7a7>] ip_rcv_finish+0x187/0x5f0
<4>[131857.897340]  [<ffffffff8155f031>] ip_rcv+0x251/0x320
<4>[131857.903773]  [<ffffffff8152a853>] __netif_receive_skb_core+0x563/0x750
<4>[131857.910591]  [<ffffffff8152a357>] ? __netif_receive_skb_core+0x67/0x750
<4>[131857.917448]  [<ffffffff8152aa66>] __netif_receive_skb+0x26/0x70
<4>[131857.923846]  [<ffffffff8152ac7d>] netif_receive_skb+0x2d/0xf0
<4>[131857.929911]  [<ffffffff8152b608>] napi_gro_receive+0xe8/0x140
<4>[131857.936020]  [<ffffffffa0083ada>] e1000_receive_skb+0x7a/0xf0 [e1000e]
<4>[131857.943028]  [<ffffffffa0084cae>] e1000_clean_rx_irq+0x24e/0x430 [e1000e]
<4>[131857.950045]  [<ffffffffa008c96f>] e1000e_poll+0xaf/0x310 [e1000e]
<4>[131857.956262]  [<ffffffff810d9603>] ? handle_irq_event+0x53/0x70
<4>[131857.962212]  [<ffffffff8152b404>] net_rx_action+0xb4/0x1d0
<4>[131857.967814]  [<ffffffff81048195>] __do_softirq+0xe5/0x2a0
<4>[131857.973324]  [<ffffffff810484ce>] irq_exit+0x9e/0xc0
<4>[131857.978424]  [<ffffffff81641d83>] do_IRQ+0x63/0xe0
<4>[131857.983328]  [<ffffffff81637caf>] common_interrupt+0x6f/0x6f
<4>[131857.989174]  <EOI>  [<ffffffff814ed5f3>] ? cpuidle_enter_state+0x63/0xe0
<4>[131857.996033]  [<ffffffff814ed5ef>] ? cpuidle_enter_state+0x5f/0xe0
<4>[131858.002240]  [<ffffffff814ed73e>] cpuidle_idle_call+0xce/0x240
<4>[131858.008241]  [<ffffffff8100ac1e>] arch_cpu_idle+0xe/0x30
<4>[131858.013663]  [<ffffffff81091dee>] cpu_startup_entry+0xce/0x2b0
<4>[131858.019615]  [<ffffffff8161ba0c>] rest_init+0xbc/0xd0
<4>[131858.024784]  [<ffffffff8161b955>] ? rest_init+0x5/0xd0
<4>[131858.030035]  [<ffffffff81d00e79>] start_kernel+0x3f1/0x3fe
<4>[131858.035635]  [<ffffffff81d00888>] ? repair_env_string+0x5a/0x5a
<4>[131858.041669]  [<ffffffff81d005a6>] x86_64_start_reservations+0x2a/0x2c
<4>[131858.048225]  [<ffffffff81d0068d>] x86_64_start_kernel+0xe5/0xec
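
To make the trace above easier to skim, here is a rough Python sketch that pulls out only the frames the unwinder did not mark with "?". The regular expression and the helper name reliable_frames are my own and only cover the frame format seen in this report; the sample lines are copied from the trace.

```python
import re

# Matches a frame such as:
#   [<ffffffffa06ecd58>] ceph_sock_state_change+0x128/0x220 [libceph]
# A leading "?" marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_lines):
    """Return (symbol, module) for every frame not flagged with '?'."""
    frames = []
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            # Frames without a [module] suffix belong to the core kernel image.
            frames.append((m.group("symbol"), m.group("module") or "kernel"))
    return frames


sample = [
    "<4>[131857.800126]  <IRQ>  [<ffffffff81630d8f>] dump_stack+0x19/0x1b",
    "<4>[131857.818111]  [<ffffffffa06ecd58>] ceph_sock_state_change+0x128/0x220 [libceph]",
    "<4>[131857.825502]  [<ffffffff815760d9>] tcp_fin+0x149/0x1c0",
    "<4>[131857.853763]  [<ffffffff8155e2f8>] ? ip_local_deliver_finish+0x48/0x370",
    "<4>[131857.936020]  [<ffffffffa0083ada>] e1000_receive_skb+0x7a/0xf0 [e1000e]",
]

for symbol, module in reliable_frames(sample):
    print(f"{module:8} {symbol}")
```

Read bottom to top, the filtered frames show the callback being reached from the e1000e receive path in softirq context via tcp_fin, i.e. the peer's FIN arriving, not a call made from the libceph worker.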

From dmesg:

<7>[131856.801962] rbd:  rbd_client_release: rbdc ffff88042d17da60
<7>[131856.801966] libceph:  destroy_client ffff88041eedb000
<7>[131856.801972] libceph:  try_read tag 1 in_base_pos 0
<7>[131856.801979] libceph:  try_read done on ffff8800592fb030 ret 0
<7>[131856.801984] libceph:  try_write start ffff8800592fb030 state 5
<7>[131856.801988] libceph:  try_write out_kvec_bytes 0
<7>[131856.802100] libceph:  prepare_write_ack ffff8800592fb030 16 -> 17
<7>[131856.802107] libceph:  try_write out_kvec_bytes 9
<7>[131856.802108] libceph:  write_partial_kvec ffff8800592fb030 9 left
<7>[131856.802112] libceph:  write_partial_kvec ffff8800592fb030 0 left in 0 kvecs ret = 1
<7>[131856.802114] libceph:  try_write nothing else to write.
<7>[131856.802115] libceph:  try_write done on ffff8800592fb030 ret 0
<7>[131856.802116] libceph:  put_osd ffff8800592fb000 2 -> 1
<7>[131856.802128] libceph:  osdmap_destroy ffff8802bf78c500
<7>[131856.802140] libceph:  remove_all_osds ffff88041eedb950
<7>[131856.802143] libceph:  __remove_osd ffff8800592fe800
<7>[131856.802148] libceph:  con_close ffff8800592fe830 peer 10.214.134.130:6800
<7>[131856.802152] libceph:  reset_connection ffff8800592fe830
<7>[131856.802155] libceph:  con_close_socket on ffff8800592fe830 sock ffff880411f5bc00
<7>[131856.802170] libceph:  ceph_sock_state_change ffff8800592fe830 state = 1 sk_state = 4
<7>[131856.802181] libceph:  con_sock_state_closed con ffff8800592fe830 sock 3 -> 1
<7>[131856.802184] libceph:  put_osd ffff8800592fe800 4 -> 3
<7>[131856.802187] libceph:  __remove_osd ffff8800592fb000
<7>[131856.802191] libceph:  con_close ffff8800592fb030 peer 10.214.134.134:6800
<7>[131856.802194] libceph:  reset_connection ffff8800592fb030
<7>[131856.802197] libceph:  con_close_socket on ffff8800592fb030 sock ffff88042cef7c00
<7>[131856.802205] libceph:  ceph_sock_state_change ffff8800592fb030 state = 1 sk_state = 4
<7>[131856.802213] libceph:  con_sock_state_closed con ffff8800592fb030 sock 3 -> 1
<7>[131856.802222] libceph:  put_osd ffff8800592fb000 1 -> 0
<7>[131856.802226] libceph:  buffer_release ffff88020e9c2200
<7>[131856.802233] libceph:  msgpool osd_op destroy
<7>[131856.802236] libceph:  msgpool_release osd_op ffff880412e53c18
<7>[131856.802239] libceph:  ceph_msg_put last one on ffff880412e53c18
<7>[131856.802242] libceph:  msg_kfree ffff880412e53c18
<7>[131856.802245] libceph:  msgpool_release osd_op ffff880412e52cb0
<7>[131856.802248] libceph:  ceph_msg_put last one on ffff880412e52cb0
<7>[131856.802251] libceph:  msg_kfree ffff880412e52cb0
<7>[131856.802254] libceph:  msgpool_release osd_op ffff880412e52e80
<7>[131856.802257] libceph:  ceph_msg_put last one on ffff880412e52e80
<7>[131856.802263] libceph:  msg_kfree ffff880412e52e80
<7>[131856.802264] libceph:  msgpool_release osd_op ffff880412e520e8
<7>[131856.802266] libceph:  ceph_msg_put last one on ffff880412e520e8
<7>[131856.802267] libceph:  msg_kfree ffff880412e520e8
<7>[131856.802268] libceph:  msgpool_release osd_op ffff880412e52d98
<7>[131856.802269] libceph:  ceph_msg_put last one on ffff880412e52d98
<7>[131856.802270] libceph:  msg_kfree ffff880412e52d98
<7>[131856.802272] libceph:  msgpool_release osd_op ffff880412e52000
<7>[131856.802273] libceph:  ceph_msg_put last one on ffff880412e52000
<7>[131856.802274] libceph:  msg_kfree ffff880412e52000
<7>[131856.802275] libceph:  msgpool_release osd_op ffff880412e53138
<7>[131856.802277] libceph:  ceph_msg_put last one on ffff880412e53138
<7>[131856.802278] libceph:  msg_kfree ffff880412e53138
<7>[131856.802279] libceph:  msgpool_release osd_op ffff880412e52740
<7>[131856.802280] libceph:  ceph_msg_put last one on ffff880412e52740
<7>[131856.802281] libceph:  msg_kfree ffff880412e52740
<7>[131856.802283] libceph:  msgpool_release osd_op ffff880412e52488
<7>[131856.802285] libceph:  ceph_msg_put last one on ffff880412e52488
<7>[131856.802286] libceph:  msg_kfree ffff880412e52488
<7>[131856.802287] libceph:  msgpool_release osd_op ffff880412e534d8
<7>[131856.802288] libceph:  ceph_msg_put last one on ffff880412e534d8
<7>[131856.802289] libceph:  msg_kfree ffff880412e534d8
<7>[131856.802291] libceph:  msgpool osd_op_reply destroy
<7>[131856.802292] libceph:  msgpool_release osd_op_reply ffff880412e53de8
<7>[131856.802294] libceph:  ceph_msg_put last one on ffff880412e53de8
<7>[131856.802295] libceph:  msg_kfree ffff880412e53de8
<7>[131856.802296] libceph:  msgpool_release osd_op_reply ffff880412e52828
<7>[131856.802297] libceph:  ceph_msg_put last one on ffff880412e52828
<7>[131856.802299] libceph:  msg_kfree ffff880412e52828
<7>[131856.802300] libceph:  msgpool_release osd_op_reply ffff880412e53ed0
<7>[131856.802301] libceph:  ceph_msg_put last one on ffff880412e53ed0
<7>[131856.802302] libceph:  msg_kfree ffff880412e53ed0
<7>[131856.802304] libceph:  msgpool_release osd_op_reply ffff880412e53960
<7>[131856.802305] libceph:  ceph_msg_put last one on ffff880412e53960
<7>[131856.802306] libceph:  msg_kfree ffff880412e53960
<7>[131856.802307] libceph:  msgpool_release osd_op_reply ffff880412e52ae0
<7>[131856.802308] libceph:  ceph_msg_put last one on ffff880412e52ae0
<7>[131856.802310] libceph:  msg_kfree ffff880412e52ae0
<7>[131856.802311] libceph:  msgpool_release osd_op_reply ffff880412e535c0
<7>[131856.802312] libceph:  ceph_msg_put last one on ffff880412e535c0
<7>[131856.802313] libceph:  msg_kfree ffff880412e535c0
<7>[131856.802315] libceph:  msgpool_release osd_op_reply ffff880412e52f68
<7>[131856.802316] libceph:  ceph_msg_put last one on ffff880412e52f68
<7>[131856.802317] libceph:  msg_kfree ffff880412e52f68
<7>[131856.802318] libceph:  msgpool_release osd_op_reply ffff880412e529f8
<7>[131856.802319] libceph:  ceph_msg_put last one on ffff880412e529f8
<7>[131856.802320] libceph:  msg_kfree ffff880412e529f8
<7>[131856.802322] libceph:  msgpool_release osd_op_reply ffff880412e53050
<7>[131856.802323] libceph:  ceph_msg_put last one on ffff880412e53050
<7>[131856.802324] libceph:  msg_kfree ffff880412e53050
<7>[131856.802325] libceph:  msgpool_release osd_op_reply ffff880412e53790
<7>[131856.802326] libceph:  ceph_msg_put last one on ffff880412e53790
<7>[131856.802328] libceph:  msg_kfree ffff880412e53790
<7>[131856.802329] libceph:  stop
<7>[131856.802331] libceph:  __close_session closing mon1
<7>[131856.802333] libceph:  ceph_msg_revoke_incoming msg ffff880412e53d00 null con
<7>[131856.802334] libceph:  ceph_msg_revoke_incoming msg ffff880412e53a48 null con
<7>[131856.802336] libceph:  con_close ffff88041eedb418 peer 10.214.134.134:6789
<7>[131856.802337] libceph:  reset_connection ffff88041eedb418
<7>[131856.802339] libceph:  con_close_socket on ffff88041eedb418 sock ffff88042cef6c00
<7>[131856.802343] libceph:  ceph_sock_state_change ffff88041eedb418 state = 1 sk_state = 4
<7>[131856.802347] libceph:  con_sock_state_closed con ffff88041eedb418 sock 3 -> 1
<7>[131856.802349] libceph:  auth_reset ffff880412790700
<7>[131856.802350] libceph:  reset
<7>[131856.802354] libceph:  auth_destroy ffff880412790700
<7>[131856.802356] libceph:  ceph_x_destroy ffff880412790700
<7>[131856.802357] libceph:  remove_ticket_handler ffff880429dd79c0 1
<7>[131856.802359] libceph:  buffer_release ffff8802bf687400
<7>[131856.802360] libceph:  remove_ticket_handler ffff880429dd7300 2
<7>[131856.802361] libceph:  buffer_release ffff8802bf687080
<7>[131856.802363] libceph:  remove_ticket_handler ffff880429dd71e0 4
<7>[131856.802364] libceph:  buffer_release ffff8802bf687700
<7>[131856.802365] libceph:  remove_ticket_handler ffff880429dd7060 32
<7>[131856.802366] libceph:  buffer_release ffff8802bf687f40
<7>[131856.802368] libceph:  buffer_release ffff8802bf687000
<7>[131856.802369] libceph:  ceph_msg_put last one on ffff880412e52910
<7>[131856.802370] libceph:  msg_kfree ffff880412e52910
<7>[131856.802372] libceph:  ceph_msg_put last one on ffff880412e53d00
<7>[131856.802373] libceph:  msg_kfree ffff880412e53d00
<7>[131856.802374] libceph:  ceph_msg_put last one on ffff880412e533f0
<7>[131856.802375] libceph:  msg_kfree ffff880412e533f0
<7>[131856.802376] libceph:  ceph_msg_put last one on ffff880412e53a48
<7>[131856.802378] libceph:  msg_kfree ffff880412e53a48
<7>[131856.802379] libceph:  ceph_debugfs_client_cleanup ffff88041eedb000
<7>[131856.802397] libceph:  destroy_options ffff880412791900
<7>[131856.802399] libceph:  destroy_client ffff88041eedb000 done
<7>[131857.704702] libceph:  ceph_sock_state_change ffff88042c743418 state = 4096 sk_state = 8
<7>[131857.704716] libceph:  ceph_sock_state_change TCP_CLOSE_WAIT
<4>[131858.058988] con_sock_state_closing: unexpected old state 826859520
<7>[131858.065326] libceph:  con_sock_state_closing con ffff88042c743418 sock 826859520 -> 4
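
For anyone reading the numeric states in the excerpt above, a small Python sketch that decodes them. The TCP values are the ones from include/net/tcp_states.h. The two libceph tables are my assumption of the CON_STATE_* and CON_SOCK_STATE_* defines in net/ceph/messenger.c for kernels of this era and should be checked against the exact tree. The helper names are made up for illustration.

```python
# TCP socket states as numbered in include/net/tcp_states.h.
TCP_STATES = {
    1: "TCP_ESTABLISHED", 2: "TCP_SYN_SENT", 3: "TCP_SYN_RECV",
    4: "TCP_FIN_WAIT1", 5: "TCP_FIN_WAIT2", 6: "TCP_TIME_WAIT",
    7: "TCP_CLOSE", 8: "TCP_CLOSE_WAIT", 9: "TCP_LAST_ACK",
    10: "TCP_LISTEN", 11: "TCP_CLOSING",
}

# ASSUMPTION: values of the CON_STATE_* and CON_SOCK_STATE_* defines in
# net/ceph/messenger.c around 3.10; verify against the tree under test.
CON_STATES = {
    1: "CLOSED", 2: "PREOPEN", 3: "CONNECTING",
    4: "NEGOTIATING", 5: "OPEN", 6: "STANDBY",
}
CON_SOCK_STATES = {
    0: "NEW", 1: "CLOSED", 2: "CONNECTING", 3: "CONNECTED", 4: "CLOSING",
}


def name(table, value):
    """Look up a state name, flagging values outside the table."""
    return table.get(value, f"INVALID (0x{value:x})")


def describe(con_state, sk_state):
    """Decode the 'state = X sk_state = Y' pair printed by ceph_sock_state_change."""
    return (f"con state {name(CON_STATES, con_state)}, "
            f"sk_state {name(TCP_STATES, sk_state)}")


# The three normal teardown callbacks in the excerpt.
print(describe(1, 4))       # con state CLOSED, sk_state TCP_FIN_WAIT1
# The last callback, on con ffff88042c743418.
print(describe(4096, 8))    # con state INVALID (0x1000), sk_state TCP_CLOSE_WAIT
# The "unexpected old state" value from the warning.
print(name(CON_SOCK_STATES, 826859520))   # INVALID (0x3148e000)
```

Under those assumed tables, the "sock 3 -> 1" transitions are the expected CONNECTED -> CLOSED on teardown, while both the con state (4096) and the old socket state (826859520) on ffff88042c743418 are outside the valid range. 826859520 is 0x3148e000, the same word that shows up in the raw stack dump of the warning, and that con pointer is not one of the three closed in this excerpt. That would be consistent with the callback running against a connection whose memory was no longer valid, but nothing in the logs here confirms it.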

mira087 is in kdb.

History

#1 Updated by Sage Weil over 9 years ago

  • Status changed from New to Can't reproduce
