Bug #2819

krbd: lockup on large writes, msgr fault injection

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status: Won't Fix
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

krbd + osd socket failure injection + iozone -> crash in uml, with no useful debugging information. uml itself either locks up, produces a core that gdb won't read, or produces a backtrace and core with no useful context.

iozone on cephfs with failure injection does not reproduce it.
a loop of mkfs.ext4 on rbd does.
dd with 1k writes to raw rbd does not.
dd with 1M writes to raw rbd does (a sketch of that large-write load follows).
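
For reference, the failing load is essentially a stream of large direct writes to the mapped rbd block device while the OSD sockets have failure injection enabled (the injection is normally driven by the OSD-side "ms inject socket failures" option). A minimal C sketch of that write pattern, assuming the image is mapped at /dev/rbd0 (the device path, write count, and fill byte are placeholders, not part of the original report):

/* Sketch only: stream 1 MiB O_DIRECT writes at a mapped rbd device. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const size_t chunk = 1 << 20;            /* 1 MiB, the size that hangs */
	void *buf;
	int fd, i;

	if (posix_memalign(&buf, 4096, chunk))   /* O_DIRECT wants aligned buffers */
		return 1;
	memset(buf, 0xab, chunk);

	fd = open("/dev/rbd0", O_WRONLY | O_DIRECT);
	if (fd < 0) {
		perror("open /dev/rbd0");
		return 1;
	}
	for (i = 0; i < 1024; i++) {             /* ~1 GiB of sequential writes */
		if (write(fd, buf, chunk) != (ssize_t)chunk) {
			perror("write");
			break;
		}
	}
	close(fd);
	free(buf);
	return 0;
}

The same path can be exercised with dd using bs=1M and oflag=direct against the mapped device.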

History

#1 Updated by Sage Weil over 11 years ago

  • Priority changed from Urgent to High

i'm unable to reproduce this on a real kernel; it only happens on uml.

here is a full backtrace:


(gdb) bt
#0  0x00007fc3cb65f757 in kill () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000060037fe6 in uml_abort () at arch/um/os-Linux/util.c:93
#2  0x000000006003824a in os_dump_core () at arch/um/os-Linux/util.c:138
#3  0x0000000060025b9d in panic_exit (self=<optimized out>, 
    unused1=<optimized out>, unused2=<optimized out>)
    at arch/um/kernel/um_arch.c:240
#4  0x0000000060062918 in notifier_call_chain (nl=<optimized out>, val=0, 
    v=0x609983e0, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
#5  0x00000000600629a7 in __atomic_notifier_call_chain (nr_calls=0x0, 
    nr_to_call=-1, v=<optimized out>, val=<optimized out>, nh=<optimized out>)
    at kernel/notifier.c:182
#6  atomic_notifier_call_chain (nh=<optimized out>, val=<optimized out>, 
    v=<optimized out>) at kernel/notifier.c:191
#7  0x000000006067f4c1 in panic (fmt=0x60764a24 "Segfault with no mm")
    at kernel/panic.c:120
#8  0x0000000060025709 in segv (fi=..., ip=1617460152, is_user=0, 
    regs=0x60851850) at arch/um/kernel/trap.c:232
#9  0x000000006002590f in segv_handler (sig=<optimized out>, 
    regs=<optimized out>) at arch/um/kernel/trap.c:184
#10 0x0000000060036f1c in sig_handler_common (sig=11, mc=0x60851c28)
    at arch/um/os-Linux/signal.c:43
#11 0x0000000060037074 in sig_handler (sig=<optimized out>, mc=<optimized out>)
    at arch/um/os-Linux/signal.c:230
#12 0x0000000060036b57 in hard_handler (sig=<optimized out>, 
    info=<optimized out>, p=0x60851c00) at arch/um/os-Linux/signal.c:164
#13 <signal handler called>
#14 0x0000000060687bb8 in task_pid_nr (tsk=0x0) at include/linux/sched.h:1667
#15 spin_dump (lock=0x60854080, msg=0x607d2ccf "wrong owner")
    at lib/spinlock_debug.c:58
#16 0x0000000060687c7f in spin_bug (lock=<optimized out>, msg=<optimized out>)
    at lib/spinlock_debug.c:75
#17 0x000000006056328b in debug_spin_unlock (lock=0x60854080)
    at lib/spinlock_debug.c:99
#18 do_raw_spin_unlock (lock=0x60854080) at lib/spinlock_debug.c:154
#19 0x000000006068aece in __raw_spin_unlock (lock=<optimized out>)
    at include/linux/spinlock_api_smp.h:152
#20 _raw_spin_unlock (lock=<optimized out>) at kernel/spinlock.c:169
#21 0x0000000060023d0b in spin_unlock (lock=0x60854080)
    at include/linux/spinlock.h:325
#22 sigio_unlock () at arch/um/kernel/sigio.c:49
#23 0x0000000060036671 in add_sigio_fd (fd=<optimized out>)
    at arch/um/os-Linux/sigio.c:198
#24 0x00000000600225fd in reactivate_fd (fd=13, irqnum=<optimized out>)
    at arch/um/kernel/irq.c:240
#25 0x000000006002ca80 in uml_net_interrupt (dev_id=0x9f0a1800, 
    irq=<optimized out>) at arch/um/drivers/net_kern.c:140
#26 uml_net_interrupt (irq=<optimized out>, dev_id=0x9f0a1800)
    at arch/um/drivers/net_kern.c:115
#27 0x000000006007ca23 in handle_irq_event_percpu (desc=0x60864cf8, 
    action=0x9f1ddcc0) at kernel/irq/handle.c:142
#28 0x000000006007cbb7 in handle_irq_event (desc=0x60864cf8)
    at kernel/irq/handle.c:192
#29 0x000000006007f211 in handle_edge_irq (irq=<optimized out>, 
    desc=0x60864cf8) at kernel/irq/chip.c:519
#30 0x000000006007c320 in generic_handle_irq_desc (desc=<optimized out>, 
    irq=<optimized out>) at include/linux/irqdesc.h:114
#31 generic_handle_irq (irq=<optimized out>) at kernel/irq/irqdesc.c:314
#32 0x00000000600226ab in do_IRQ (irq=<optimized out>, regs=<optimized out>)
    at arch/um/kernel/irq.c:294
#33 0x0000000060022737 in sigio_handler (sig=<optimized out>, regs=0x92d0a240)
    at arch/um/kernel/irq.c:53
#34 0x0000000060036f1c in sig_handler_common (sig=29, mc=0x0)
    at arch/um/os-Linux/signal.c:43
#35 0x0000000060036e69 in unblock_signals () at arch/um/os-Linux/signal.c:278
#36 0x0000000060036fbb in set_signals (enable=<optimized out>)
    at arch/um/os-Linux/signal.c:298
#37 0x000000006068aef3 in arch_local_irq_restore (flags=1)
    at /mnt/sdd/ceph-client/arch/um/include/asm/irqflags.h:16
#38 __raw_spin_unlock_irqrestore (flags=1, lock=<optimized out>)
    at include/linux/spinlock_api_smp.h:161
#39 _raw_spin_unlock_irqrestore (lock=<optimized out>, flags=<optimized out>)
    at kernel/spinlock.c:177
#40 0x000000006002bc68 in spin_unlock_irqrestore (flags=1, 
    lock=<optimized out>) at include/linux/spinlock.h:340
#41 uml_net_start_xmit (skb=0x8f9b5f20, dev=0x9f0a1800)
    at arch/um/drivers/net_kern.c:240
#42 0x00000000605bd21d in dev_hard_start_xmit (skb=0x8f9b5f20, dev=0x9f0a1800, 
    txq=0x9f7f7ef0) at net/core/dev.c:2212
#43 0x00000000605d02aa in sch_direct_xmit (skb=0x8f9b5f20, q=0x9c4c1c00, 
    dev=0x9f0a1800, txq=0x9f7f7ef0, root_lock=0x9c4c1ca0)
    at net/sched/sch_generic.c:124
#44 0x00000000605bd576 in __dev_xmit_skb (txq=<optimized out>, dev=0x9f0a1800, 
    q=0x9c4c1c00, skb=0x8f9b5f20) at net/core/dev.c:2415
#45 dev_queue_xmit (skb=0x8f9b5f20) at net/core/dev.c:2508
---Type <return> to continue, or q <return> to quit---
#46 0x00000000605df650 in neigh_hh_output (skb=<optimized out>, 
    hh=<optimized out>) at include/net/neighbour.h:351
#47 neigh_output (skb=0x8f9b5f20, n=0x9b6d0df0) at include/net/neighbour.h:358
#48 ip_finish_output2 (skb=0x8f9b5f20) at net/ipv4/ip_output.c:210
#49 ip_finish_output (skb=<optimized out>) at net/ipv4/ip_output.c:243
#50 0x00000000605e09c1 in ip_output (skb=0x8f9b5f20)
    at net/ipv4/ip_output.c:316
#51 0x00000000605e024a in dst_output (skb=0x8f9b5f20) at include/net/dst.h:435
#52 ip_local_out (skb=0x8f9b5f20) at net/ipv4/ip_output.c:110
#53 0x00000000605e057e in ip_queue_xmit (skb=0x8f9b5f20, fl=0x9ccc0c48)
    at net/ipv4/ip_output.c:412
#54 0x00000000605f59bb in tcp_transmit_skb (sk=0x9ccc0900, skb=0x8f9b5f20, 
    clone_it=<optimized out>, gfp_mask=<optimized out>)
    at net/ipv4/tcp_output.c:905
#55 0x00000000605f639e in tcp_write_xmit (sk=0x9ccc0900, mss_now=1448, 
    nonagle=0, push_one=0, gfp=32) at net/ipv4/tcp_output.c:1814
#56 0x00000000605f6573 in __tcp_push_pending_frames (sk=0x9ccc0900, 
    cur_mss=<optimized out>, nonagle=<optimized out>)
    at net/ipv4/tcp_output.c:1852
#57 0x00000000605f2f2b in tcp_push_pending_frames (sk=0x9ccc0900)
    at include/net/tcp.h:1396
#58 tcp_data_snd_check (sk=0x9ccc0900) at net/ipv4/tcp_input.c:5198
#59 tcp_rcv_established (sk=0x9ccc0900, skb=<optimized out>, th=0x97aa37e2, 
    len=<optimized out>) at net/ipv4/tcp_input.c:5712
#60 0x00000000605fa052 in tcp_v4_do_rcv (sk=0x9ccc0900, skb=0x9b6cff28)
    at net/ipv4/tcp_ipv4.c:1652
#61 0x00000000605fbfb0 in tcp_v4_rcv (skb=0x9b6cff28)
    at net/ipv4/tcp_ipv4.c:1755
#62 0x00000000605db9e3 in ip_local_deliver_finish (skb=0x9b6cff28)
    at net/ipv4/ip_input.c:226
#63 ip_local_deliver (skb=0x9b6cff28) at net/ipv4/ip_input.c:264
#64 0x00000000605dbf01 in dst_input (skb=0x9b6cff28) at include/net/dst.h:441
#65 ip_rcv_finish (skb=0x9b6cff28) at net/ipv4/ip_input.c:365
#66 ip_rcv (skb=<optimized out>, dev=<optimized out>, pt=<optimized out>, 
    orig_dev=<optimized out>) at net/ipv4/ip_input.c:443
#67 0x00000000605b8896 in __netif_receive_skb (skb=0x9b6cff28)
    at net/core/dev.c:3235
#68 0x00000000605b8975 in process_backlog (napi=0x6088d960, quota=1)
    at net/core/dev.c:3685
#69 0x00000000605bb2b9 in net_rx_action (h=<optimized out>)
    at net/core/dev.c:3843
#70 0x0000000060047d65 in __do_softirq () at kernel/softirq.c:238
#71 0x0000000060048007 in do_softirq () at kernel/softirq.c:285
#72 do_softirq () at kernel/softirq.c:272
#73 0x0000000060048267 in invoke_softirq () at kernel/softirq.c:319
#74 irq_exit () at kernel/softirq.c:338
#75 0x00000000600226b7 in do_IRQ (irq=<optimized out>, regs=<optimized out>)
    at arch/um/kernel/irq.c:295
#76 0x0000000060022737 in sigio_handler (sig=<optimized out>, regs=0x92d0acf0)
    at arch/um/kernel/irq.c:53
#77 0x0000000060036f1c in sig_handler_common (sig=29, mc=0x0)
    at arch/um/os-Linux/signal.c:43
#78 0x0000000060036e69 in unblock_signals () at arch/um/os-Linux/signal.c:278
#79 0x0000000060036fbb in set_signals (enable=<optimized out>)
    at arch/um/os-Linux/signal.c:298
#80 0x00000000600425ca in arch_local_irq_restore (flags=1)
    at /mnt/sdd/ceph-client/arch/um/include/asm/irqflags.h:16
#81 vprintk_emit (facility=0, level=7, dict=0x0, dictlen=0, 
    fmt=<optimized out>, args=<optimized out>) at kernel/printk.c:1550
#82 0x000000006067f694 in printk (fmt=<optimized out>) at kernel/printk.c:1612
#83 0x000000006056c9f3 in __dynamic_pr_debug (descriptor=<optimized out>, 
    fmt=<optimized out>) at lib/dynamic_debug.c:564
#84 0x000000006066c56f in get_osd (osd=0x9c1067f0) at net/ceph/osd_client.c:654
#85 get_osd_con (con=0x9c106820) at net/ceph/osd_client.c:2122
#86 0x0000000060664b59 in ceph_con_send (con=0x9c106820, msg=0x97a9aef0)
    at net/ceph/messenger.c:2490
#87 0x000000006066ce81 in __send_request (osdc=<optimized out>, req=0x97a97bf0)
    at net/ceph/osd_client.c:1051
#88 0x000000006066dceb in ceph_osdc_start_request (osdc=0x9ce717a8, 
    req=0x97a97bf0, nofail=false) at net/ceph/osd_client.c:1751
#89 0x000000006058c547 in rbd_do_request (rq=<optimized out>, 
    rbd_dev=0x9c59fbf0, snapc=0x9f2e1420, snapid=18446744073709551614, 
    object_name=0x97a96f30 "rb.0.100d.2ea0144e.00000000082b", ofs=0, 
    len=147456, bio=0x97a6cf30, pages=0x0, flags=36, ops=0x9b68e8f0, 
    coll=0x979edea8, coll_index=1, rbd_cb=0x6058bcf0 <rbd_req_cb>, 
    linger_req=0x0, ver=0x0, num_pages=<optimized out>)
    at drivers/block/rbd.c:946
#90 0x000000006058ce88 in rbd_do_op (rq=0x9766ae98, rbd_dev=0x9c59fbf0, 
    snapc=0x9f2e1420, snapid=18446744073709551614, opcode=8705, flags=36, 
    ofs=8770289664, len=147456, bio=0x97a6cf30, coll=0x979edea8, coll_index=1)
    at drivers/block/rbd.c:1116
#91 0x000000006058d5dc in rbd_req_read (coll_index=<optimized out>, 
    coll=<optimized out>, bio=<optimized out>, len=<optimized out>, 
    ofs=<optimized out>, snapid=<optimized out>, rbd_dev=<optimized out>, 
---Type <return> to continue, or q <return> to quit---q

the socket (sk=..) in question is connected to ceph_connection 000000009c106820. here is the last bit of the log:
[  316.240000] libceph:   messenger.c:2671 : ceph_msg_new 0000000097a98ef0 front 512
[  316.240000] libceph:   messenger.c:2671 : ceph_msg_new 0000000097a9aef0 front 208
[  316.240000] libceph:      osdmap.c:1000 : mapping 0~147456  osize 1073741824 fl_su 1073741824
[  316.240000] libceph:      osdmap.c:1003 : osize 1073741824 / su 1073741824 = su_per_object 1
[  316.240000] libceph:      osdmap.c:1010 : off 0 / su 1073741824 = bl 0
[  316.240000] libceph:      osdmap.c:1017 : objset 0 * sc 1 = ono 0
[  316.240000] libceph:      osdmap.c:1032 :  obj extent 0~147456
[  316.240000] libceph:  osd_client.c:85   : calc_layout bno=0 0~147456 (36 pages)
[  316.240000] libceph:  osd_client.c:821  : __register_request 0000000097a97bf0 tid 3057
[  316.240000] libceph:  osd_client.c:963  : map_request 0000000097a97bf0 tid 3057
[  316.240000] libceph:      osdmap.c:1063 : calc_object_layout 'rb.0.100d.2ea0144e.00000000082b' pgid 2.c63f57a9
[  316.240000] libceph:  osd_client.c:989  : map_request tid 3057 pgid 2.57a9 osd0 (was osd-1)
[  316.240000] libceph:  osd_client.c:712  : __remove_osd_from_lru 000000009c1067f0
[  316.240000] libceph:  osd_client.c:1040 : send_request 0000000097a97bf0 tid 3057 to osd0 flags 36
[  316.240000] libceph:   messenger.c:1089 : write_partial_msg_pages 000000009c106820 msg 000000008d900ef0 done
[  316.240000] libceph:   messenger.c:705  : prepare_write_message_footer 000000009c106820
[  316.240000] libceph:   messenger.c:937  : write_partial_kvec 000000009c106820 13 left
[  316.240000] libceph:   messenger.c:966  : write_partial_kvec 000000009c106820 0 left in 0 kvecs ret = 1
[  316.240000] libceph:   messenger.c:759  : prepare_write_message 000000008d948ef0 seq 8 type 42 len 139+0+266240 65 pgs
[  316.240000] libceph:   messenger.c:786  : prepare_write_message front_crc 2611082766 middle_crc 0
[  316.240000] libceph:   messenger.c:2019 : try_write out_kvec_bytes 193
[  316.240000] libceph:   messenger.c:937  : write_partial_kvec 000000009c106820 193 left
[  316.240000] libceph:   messenger.c:966  : write_partial_kvec 000000009c106820 0 left in 0 kvecs ret = 1
[  316.240000] libceph:   messenger.c:1020 : write_partial_msg_pages 000000009c106820 msg 000000008d948ef0 page 0/65 offset 0
[  316.260000] libceph:   messenger.c:1089 : write_partial_msg_pages 000000009c106820 msg 000000008d948ef0 done
[  316.260000] libceph:   messenger.c:705  : prepare_write_message_footer 000000009c106820
[  316.260000] libceph:   messenger.c:937  : write_partial_kvec 000000009c106820 13 left
[  316.260000] libceph:   messenger.c:966  : write_partial_kvec 000000009c106820 0 left in 0 kvecs ret = 1
[  316.260000] libceph:   messenger.c:759  : prepare_write_message 000000007552eef0 seq 9 type 42 len 139+0+258048 63 pgs
[  316.260000] libceph:   messenger.c:786  : prepare_write_message front_crc 2684432628 middle_crc 0
[  316.260000] libceph:   messenger.c:2019 : try_write out_kvec_bytes 193
[  316.260000] libceph:   messenger.c:937  : write_partial_kvec 000000009c106820 193 left
[  316.260000] libceph:   messenger.c:966  : write_partial_kvec 000000009c106820 0 left in 0 kvecs ret = 1
[  316.260000] libceph:   messenger.c:1020 : write_partial_msg_pages 000000009c106820 msg 000000007552eef0 page 0/63 offset 0
[  316.270000] libceph:   messenger.c:1089 : write_partial_msg_pages 000000009c106820 msg 000000007552eef0 done
[  316.270000] libceph:   messenger.c:705  : prepare_write_message_footer 000000009c106820
[  316.270000] libceph:   messenger.c:937  : write_partial_kvec 000000009c106820 13 left
[  316.270000] libceph:   messenger.c:966  : write_partial_kvec 000000009c106820 0 left in 0 kvecs ret = 1
[  316.270000] libceph:   messenger.c:759  : prepare_write_message 0000000088428ef0 seq 10 type 42 len 139+0+524288 128 pgs
[  316.270000] libceph:   messenger.c:786  : prepare_write_message front_crc 2256008681 middle_crc 0
[  316.270000] libceph:   messenger.c:2019 : try_write out_kvec_bytes 193
[  316.270000] libceph:   messenger.c:937  : write_partial_kvec 000000009c106820 193 left
[  316.270000] libceph:   messenger.c:966  : write_partial_kvec 000000009c106820 0 left in 0 kvecs ret = 1
[  316.270000] libceph:   messenger.c:1020 : write_partial_msg_pages 000000009c106820 msg 0000000088428ef0 page 0/128 offset 0
[  316.300000] libceph:   messenger.c:1089 : write_partial_msg_pages 000000009c106820 msg 0000000088428ef0 done
[  316.300000] libceph:   messenger.c:705  : prepare_write_message_footer 000000009c106820
[  316.300000] libceph:   messenger.c:937  : write_partial_kvec 000000009c106820 13 left
[  316.300000] libceph:   messenger.c:966  : write_partial_kvec 000000009c106820 0 left in 0 kvecs ret = 1
[  316.300000] libceph:   messenger.c:759  : prepare_write_message 00000000884afef0 seq 11 type 42 len 139+0+524288 128 pgs
[  316.300000] libceph:   messenger.c:786  : prepare_write_message front_crc 3704656303 middle_crc 0
[  316.300000] libceph:   messenger.c:2019 : try_write out_kvec_bytes 193
[  316.300000] libceph:   messenger.c:937  : write_partial_kvec 000000009c106820 193 left
[  316.300000] libceph:   messenger.c:966  : write_partial_kvec 000000009c106820 0 left in 0 kvecs ret = 1
[  316.300000] libceph:   messenger.c:1020 : write_partial_msg_pages 000000009c106820 msg 00000000884afef0 page 0/128 offset 0
[  316.300000] libceph:   messenger.c:2097 : try_write done on 000000009c106820 ret 0
[  316.300000] libceph:  osd_client.c:666  : put_osd 000000009c1067f0 49 -> 48
[  316.300000] libceph:   messenger.c:2111 : try_read start on 000000009c433820 state 5
[  316.300000] libceph:   messenger.c:2120 : try_read tag 1 in_base_pos 0
[  316.300000] libceph:   messenger.c:2181 : try_read got tag 8
[  316.300000] libceph:   messenger.c:1138 : prepare_read_ack 000000009c433820
[  316.300000] libceph:   messenger.c:1703 : got ack for seq 1 type 42 at 000000009c2eeef0
[  316.300000] libceph:  osd_client.c:666  : put_osd 000000009c4337f0 96 -> 95
[  316.300000] libceph:   messenger.c:1144 : prepare_read_tag 000000009c433820
[  316.300000] libceph:   messenger.c:2111 : try_read start on 000000009c433820 state 5
[  316.300000] libceph:   messenger.c:2120 : try_read tag 1 in_base_pos 0
[  316.300000] libceph:   messenger.c:2181 : try_read got tag 7
[  316.300000] libceph:   messenger.c:1154 : prepare_read_message 000000009c433820
[  316.300000] libceph:   messenger.c:1818 : read_partial_message con 000000009c433820 msg           (null)
[  316.300000] libceph:   messenger.c:1868 : got hdr type 43 front 93 data 0
[  316.300000] libceph:  osd_client.c:655  : get_osd 000000009c1067f0 48 -> 49
[  316.300000] 
[  316.300000] Modules linked in:
[  316.300000] Pid: -1831820912, comm:  Not tainted 3.5.0-00117-g4821f45
[  316.300000] RIP: 0033:[<0000000060687bb8>]
[  316.300000] RSP: 0000000092d09fc0  EFLAGS: 00010206
[  316.300000] RAX: 0000000000000000 RBX: 0000000060854080 RCX: 0000000000000001
[  316.300000] RDX: 000000006085fad0 RSI: 00000000607d2ccf RDI: 0000000060854080
[  316.300000] RBP: 0000000092d09fe0 R08: 0000000000000000 R09: 0000000000000000
[  316.300000] R10: 0000000000000000 R11: 00007fc3cb6b968c R12: 0000000000000000
[  316.300000] R13: 000000006002bd56 R14: 000000000000000d R15: 000000009b68f7a0
...

i don't see anything out of the ordinary.

#2 Updated by Alex Elder over 11 years ago

That's a huge stack, with lots of network interrupts. I don't know whether UML has the same stack limits as the normal kernel does. But it looks to me like maybe a debug print got interrupted by an incoming network packet, and that led to outgoing packets, and somehow another interrupt for incoming work too.

No thoughts on a fix, just a little info is all (sorry).

#3 Updated by Sage Weil over 11 years ago

maybe this is just a matter of increasing my stack size? a stack overflow might explain my unhelpful uml crash...

#4 Updated by David Zafman over 11 years ago

I saw a lockup. My UML kernel appears to be wedged (no console echo). I'm getting continuous SIGSEGVs. The stack has 134 frames.

Stack overview:
SEGV hit by account_group_system_time()
From do_IRQ() which interrupted spin lock debugging
From debugging in kmem_cache_free() called by uml_net_rx()
Further down was a do_softirq() interrupting printk() from ceph OSD code.

The tsk is garbage
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
account_group_system_time (cputime=1, tsk=0x60aca778) at /home/dzafman/linux/kernel/sched/stats.h:202
202 if (!cputimer->running)
(gdb) list 190
185 }
186
187 /**
188  * account_group_system_time - Maintain stime for a thread group.
189  *
190  * @tsk:        Pointer to task structure.
191  * @cputime:    Time value by which to increment the stime field of the
192  *              thread_group_cputime structure.
193  *
194  * If thread group time is being maintained, get the structure for the
(gdb) list
195  * running CPU and update the stime field there.
196  */
197 static inline void account_group_system_time(struct task_struct *tsk,
198                                              cputime_t cputime)
199 {
200         struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;
201
202         if (!cputimer->running)
203                 return;
204
(gdb) p tsk
$8 = (struct task_struct *) 0x60aca778
(gdb) p tsk->state
$11 = -2401176521382297600

The tsk is pointing at garbage.

#5 Updated by Alex Elder over 11 years ago

(Sadly, my machine crashed in the middle of a somewhat elaborate update to this. I'll try to recapture what I had already said...)

I think this may just be a UML bug, or at least a problem that it's up to whoever develops UML to fix.

Looking again at your original stack:
- you were doing an rbd read request
- that started working on sending a request to an osd
- that required getting the connection for an osd
- while working on that, a dout() call was made
- in the midst of that a network receive interrupt arrived (via SIGIO)
- that landed in sig_handler() (arch/um/os-Linux/signal.c:230)
- while processing the incoming data, the network driver started transmitting some data queued for output
(Here it starts getting fuzzy)
- when that was done, the uml network code was called to handle another receive interrupt (?)
- in doing so we re-enter the same path, through sigio handling, which goes through sigio_handler() and leads to sigio_lock(), which tries to acquire sigio_spinlock
- I think that this lock may already be held by the processing earlier on in this chain.

In any case, this may be a problem, but I think it's probably not our problem (directly), so this bug should be closed or set aside somehow.
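
To make the suspected re-entrancy concrete: per the trace, sigio_lock()/sigio_unlock() (arch/um/kernel/sigio.c) just take and release a single non-recursive spinlock, the sigio_spinlock mentioned above, so a nested SIGIO-driven interrupt re-entering that path while the outer code still holds the lock could confuse the spinlock debug owner tracking, which would be consistent with the "wrong owner" splat. A purely illustrative userspace analogy (toy names, not UML code):

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t lock_held;   /* stands in for sigio_spinlock */

static void toy_sigio_lock(void)
{
	if (lock_held) {
		/* Re-entry: a real spinlock would spin forever here, or trip
		 * the CONFIG_DEBUG_SPINLOCK owner/recursion checks. */
		static const char msg[] = "nested SIGIO: lock already held\n";
		write(STDERR_FILENO, msg, sizeof(msg) - 1);
		return;
	}
	lock_held = 1;
}

static void toy_sigio_unlock(void)
{
	lock_held = 0;
}

static void sigio_like_handler(int sig)
{
	(void)sig;
	toy_sigio_lock();    /* same lock path re-entered from the "interrupt" */
	toy_sigio_unlock();
}

int main(void)
{
	signal(SIGIO, sigio_like_handler);

	toy_sigio_lock();    /* outer path (e.g. add_sigio_fd) holds the lock */
	raise(SIGIO);        /* nested receive interrupt arrives here */
	toy_sigio_unlock();
	return 0;
}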

#6 Updated by Sage Weil over 11 years ago

  • Status changed from New to Won't Fix
