Bug #1793
NULL pointer dereference at try_write+0x627/0x1060
Status: Closed
Description
Found in sepia50's console:
[ 8727.491802] Call Trace:
[ 8727.491802] [<ffffffff8107f0c5>] wq_worker_sleeping+0x15/0xa0
[ 8727.491802] [<ffffffff81600763>] __schedule+0x5c3/0x940
[ 8727.491802] [<ffffffff81066e81>] ? do_exit+0x551/0x880
[ 8727.491802] [<ffffffff81065d7d>] ? release_task+0x1d/0x470
[ 8727.491802] [<ffffffff81600e0f>] schedule+0x3f/0x60
[ 8727.491802] [<ffffffff81066ef7>] do_exit+0x5c7/0x880
[ 8727.491802] [<ffffffff81063b05>] ? kmsg_dump+0x75/0x140
[ 8727.491802] [<ffffffff81604ac0>] oops_end+0xb0/0xf0
[ 8727.491802] [<ffffffff8103f85d>] no_context+0xfd/0x270
[ 8727.491802] [<ffffffff814edfd5>] ? release_sock+0x35/0x190
[ 8727.491802] [<ffffffff8103fb15>] __bad_area_nosemaphore+0x145/0x230
[ 8727.491802] [<ffffffff810509a3>] ? get_parent_ip+0x33/0x50
[ 8727.491802] [<ffffffff8103fc13>] bad_area_nosemaphore+0x13/0x20
[ 8727.491802] [<ffffffff8160763e>] do_page_fault+0x34e/0x4b0
[ 8727.491802] [<ffffffff814ee0e4>] ? release_sock+0x144/0x190
[ 8727.491802] [<ffffffff815429e5>] ? tcp_sendpage+0xc5/0x590
[ 8727.491802] [<ffffffff81313fbd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 8727.491802] [<ffffffff81603ef5>] page_fault+0x25/0x30
[ 8727.491802] [<ffffffffa00c6a87>] ? try_write+0x627/0x1060 [libceph]
[ 8727.491802] [<ffffffff814e8db4>] ? kernel_recvmsg+0x44/0x60
[ 8727.491802] [<ffffffffa00c5748>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
[ 8727.491802] [<ffffffffa00c80d0>] con_work+0xc10/0x1b00 [libceph]
[ 8727.491802] [<ffffffff81313f7e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 8727.491802] [<ffffffff81053978>] ? finish_task_switch+0x48/0x120
[ 8727.491802] [<ffffffff8107fcb6>] process_one_work+0x1a6/0x520
[ 8727.491802] [<ffffffff8107fc47>] ? process_one_work+0x137/0x520
[ 8727.491802] [<ffffffffa00c74c0>] ? try_write+0x1060/0x1060 [libceph]
[ 8727.491802] [<ffffffff81081fe3>] worker_thread+0x173/0x400
[ 8727.491802] [<ffffffff81081e70>] ? manage_workers+0x210/0x210
[ 8727.491802] [<ffffffff81087026>] kthread+0xb6/0xc0
[ 8727.491802] [<ffffffff8160dd44>] kernel_thread_helper+0x4/0x10
[ 8727.491802] [<ffffffff81603c74>] ? retint_restore_args+0x13/0x13
[ 8727.491802] [<ffffffff81086f70>] ? __init_kthread_worker+0x70/0x70
[ 8727.491802] [<ffffffff8160dd40>] ? gs_change+0x13/0x13
[ 8727.491802] Code: 66 66 66 90 65 48 8b 04 25 40 c4 00 00 48 8b 80 48 03 00 00 8b 40 f0 c9 c3 66 90 55 48 89 e5 66 66 66 66 90 48 8b 87 48 03 00 00
[ 8727.491802] 8b 40 f8 c9 c3 eb 08 90 90 90 90 90 90 90 90 55 48 89 e5 66
[ 8727.491802] RIP [<ffffffff81086a10>] kthread_data+0x10/0x20
[ 8727.491802] RSP <ffff8800455b76f8>
[ 8727.491802] CR2: fffffffffffffff8
[ 8727.491802] ---[ end trace 9654ef6c74784bc5 ]---
[ 8727.491802] Fixing recursive fault but reboot is needed!
[ 8727.491802] BUG: spinlock lockup on CPU#0, kworker/0:2/5057
[ 8727.491802] lock: ffff8800fbc13b80, .magic: dead4ead, .owner: kworker/0:2/5057, .owner_cpu: 0
[ 8727.491802] Pid: 5057, comm: kworker/0:2 Tainted: G D 3.1.0-ceph-08936-gb643987 #1
[ 8727.491802] Call Trace:
[ 8727.491802] [<ffffffff8131a178>] spin_dump+0x78/0xc0
[ 8727.491802] [<ffffffff8131a39d>] do_raw_spin_lock+0xed/0x120
[ 8727.491802] [<ffffffff816031e5>] _raw_spin_lock_irq+0x45/0x50
[ 8727.491802] [<ffffffff81600277>] ? __schedule+0xd7/0x940
[ 8727.491802] [<ffffffff81600277>] __schedule+0xd7/0x940
[ 8727.491802] [<ffffffff81600e0f>] schedule+0x3f/0x60
[ 8727.491802] [<ffffffff81067164>] do_exit+0x834/0x880
[ 8727.491802] [<ffffffff81063b95>] ? kmsg_dump+0x105/0x140
[ 8727.491802] [<ffffffff81063b05>] ? kmsg_dump+0x75/0x140
[ 8727.491802] [<ffffffff81604ac0>] oops_end+0xb0/0xf0
[ 8727.491802] [<ffffffff8103f85d>] no_context+0xfd/0x270
[ 8727.491802] [<ffffffff8103fb15>] __bad_area_nosemaphore+0x145/0x230
[ 8727.491802] [<ffffffff811b3af0>] ? fsnotify_clear_marks_by_inode+0x30/0xf0
[ 8727.491802] [<ffffffff8103fc13>] bad_area_nosemaphore+0x13/0x20
[ 8727.491802] [<ffffffff8160763e>] do_page_fault+0x34e/0x4b0
[ 8727.491802] [<ffffffff81057bc4>] ? cpuacct_charge+0x24/0xb0
[ 8727.491802] [<ffffffff81057bc4>] ? cpuacct_charge+0x24/0xb0
[ 8727.491802] [<ffffffff81313fbd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 8727.491802] [<ffffffff81603ef5>] page_fault+0x25/0x30
[ 8727.491802] [<ffffffff81086a10>] ? kthread_data+0x10/0x20
[ 8727.491802] [<ffffffff8107f0c5>] wq_worker_sleeping+0x15/0xa0
[ 8727.491802] [<ffffffff81600763>] __schedule+0x5c3/0x940
[ 8727.491802] [<ffffffff81066e81>] ? do_exit+0x551/0x880
[ 8727.491802] [<ffffffff81065d7d>] ? release_task+0x1d/0x470
[ 8727.491802] [<ffffffff81600e0f>] schedule+0x3f/0x60
[ 8727.491802] [<ffffffff81066ef7>] do_exit+0x5c7/0x880
[ 8727.491802] [<ffffffff81063b05>] ? kmsg_dump+0x75/0x140
[ 8727.491802] [<ffffffff81604ac0>] oops_end+0xb0/0xf0
[ 8727.491802] [<ffffffff8103f85d>] no_context+0xfd/0x270
[ 8727.491802] [<ffffffff814edfd5>] ? release_sock+0x35/0x190
[ 8727.491802] [<ffffffff8103fb15>] __bad_area_nosemaphore+0x145/0x230
[ 8727.491802] [<ffffffff810509a3>] ? get_parent_ip+0x33/0x50
[ 8727.491802] [<ffffffff8103fc13>] bad_area_nosemaphore+0x13/0x20
[ 8727.491802] [<ffffffff8160763e>] do_page_fault+0x34e/0x4b0
[ 8727.491802] [<ffffffff814ee0e4>] ? release_sock+0x144/0x190
[ 8727.491802] [<ffffffff815429e5>] ? tcp_sendpage+0xc5/0x590
[ 8727.491802] [<ffffffff81313fbd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 8727.491802] [<ffffffff81603ef5>] page_fault+0x25/0x30
[ 8727.491802] [<ffffffffa00c6a87>] ? try_write+0x627/0x1060 [libceph]
[ 8727.491802] [<ffffffff814e8db4>] ? kernel_recvmsg+0x44/0x60
[ 8727.491802] [<ffffffffa00c5748>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
[ 8727.491802] [<ffffffffa00c80d0>] con_work+0xc10/0x1b00 [libceph]
[ 8727.491802] [<ffffffff81313f7e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 8727.491802] [<ffffffff81053978>] ? finish_task_switch+0x48/0x120
[ 8727.491802] [<ffffffff8107fcb6>] process_one_work+0x1a6/0x520
[ 8727.491802] [<ffffffff8107fc47>] ? process_one_work+0x137/0x520
[ 8727.491802] [<ffffffffa00c74c0>] ? try_write+0x1060/0x1060 [libceph]
[ 8727.491802] [<ffffffff81081fe3>] worker_thread+0x173/0x400
[ 8727.491802] [<ffffffff81081e70>] ? manage_workers+0x210/0x210
[ 8727.491802] [<ffffffff81087026>] kthread+0xb6/0xc0
[ 8727.491802] [<ffffffff8160dd44>] kernel_thread_helper+0x4/0x10
[ 8727.491802] [<ffffffff81603c74>] ? retint_restore_args+0x13/0x13
[ 8727.491802] [<ffffffff81086f70>] ? __init_kthread_worker+0x70/0x70
[ 8727.491802] [<ffffffff8160dd40>] ? gs_change+0x13/0x13
Updated by Josh Durgin over 12 years ago
Got the same trace on sepia18 while running mkfs.ext3 on an rbd image.
Updated by Josh Durgin over 12 years ago
A disk error prevented me from getting logs before:
[ 226.628240] sd 2:0:0:0: [sda] Unhandled error code
[ 226.637615] sd 2:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 226.649859] sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 39 84 61 f0 00 00 10 00
[ 226.661756] end_request: I/O error, dev sda, sector 964977136
[ 226.672023] Aborting journal on device sda1-8.
[ 226.680985] sd 2:0:0:0: [sda] Unhandled error code
[ 226.690213] sd 2:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 226.702446] sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 39 84 08 00 00 00 08 00
[ 226.714316] end_request: I/O error, dev sda, sector 964954112
[ 226.724576] quiet_error: 24 callbacks suppressed
[ 226.733687] Buffer I/O error on device sda1, logical block 120619008
[ 226.744608] lost page write due to I/O error on sda1
[ 226.744620] JBD2: I/O error detected when updating journal superblock for sda1-8.
But now I got the full trace of what happens before the page fault handling:
Dec 9 15:44:44 sepia18 kernel: [ 126.472614] libceph: client4104 fsid b10de2f1-02b3-4bd7-90cb-544fa7dca7d1
Dec 9 15:44:44 sepia18 kernel: [ 126.472802] libceph: mon0 10.3.14.141:6789 session established
Dec 9 15:44:44 sepia18 kernel: [ 126.474825] rbd0: unknown partition table
Dec 9 15:44:44 sepia18 kernel: [ 126.474907] rbd: rbd0: added with size 0x500000000
Dec 9 15:44:45 sepia18 kernel: [ 127.043581] libceph: osd1 up
Dec 9 15:44:45 sepia18 kernel: [ 127.043585] libceph: osd1 weight 0x10000 (in)
Dec 9 15:44:45 sepia18 kernel: [ 127.059822] libceph: osd0 10.3.14.141:6800 socket closed
Dec 9 15:44:45 sepia18 kernel: [ 127.073590] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
Dec 9 15:44:45 sepia18 kernel: [ 127.081594] IP: [<ffffffffa02c5a87>] try_write+0x627/0x1060 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] PGD 37b7c067 PUD 37bbc067 PMD 0
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] Oops: 0000 [#1] SMP
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] CPU 1
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] Modules linked in: cryptd aes_x86_64 aes_generic rbd libceph radeon ttm drm_kms_helper drm lp ppdev parport_pc parport shpchp i2c_algo_bit i3000_edac edac_core serio_raw ahci libahci floppy e1000e btrfs zlib_deflate crc32c libcrc32c
Dec 9 15:44:45 sepia18 kernel: [ 127.083535]
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] Pid: 46, comm: kworker/1:2 Not tainted 3.1.0-ceph-08936-gfb6ee37 #1 Supermicro PDSMi/PDSMi+
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] RIP: 0010:[<ffffffffa02c5a87>] [<ffffffffa02c5a87>] try_write+0x627/0x1060 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] RSP: 0018:ffff88010368bb00 EFLAGS: 00010246
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] RAX: 0000000000000000 RBX: ffff880037be3830 RCX: 0000000000000000
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] RDX: 0000000000061000 RSI: 0000000000000001 RDI: 0000000000000000
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] RBP: ffff88010368bc20 R08: 0000000000000000 R09: 0000000000000000
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0002d6b1c0
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] R13: 000000000001f000 R14: ffff880037848b00 R15: 0000000000000000
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] FS: 0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] CR2: 0000000000000048 CR3: 0000000037af4000 CR4: 00000000000006e0
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 9 15:44:45 sepia18 kernel: [ 127.083535] Process kworker/1:2 (pid: 46, threadinfo ffff88010368a000, task ffff88010364bf60)
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] Stack:
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] 0000000100004040 ffff88010319a800 0000000000000000 0000000000000000
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] ffff88010368bbd0 0000000000000000 ffff88010860be91 ffff88010860be00
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] ffff880037be3c10 ffff880037be3a40 ffff880037be3870 ffff880037be3a30
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] Call Trace:
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff814e8db4>] ? kernel_recvmsg+0x44/0x60
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffffa02c4748>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffffa02c70d0>] con_work+0xc10/0x1b00 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff81053978>] ? finish_task_switch+0x48/0x120
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff8107fcb6>] process_one_work+0x1a6/0x520
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff8107fc47>] ? process_one_work+0x137/0x520
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffffa02c64c0>] ? try_write+0x1060/0x1060 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff81081fe3>] worker_thread+0x173/0x400
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff81081e70>] ? manage_workers+0x210/0x210
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff81087026>] kthread+0xb6/0xc0
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff8109ee55>] ? trace_hardirqs_on_caller+0x105/0x190
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff8160dd44>] kernel_thread_helper+0x4/0x10
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff81603c74>] ? retint_restore_args+0x13/0x13
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff81086f70>] ? __init_kthread_worker+0x70/0x70
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] [<ffffffff8160dd40>] ? gs_change+0x13/0x13
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] Code: 7c fb ff ff 49 83 be 90 00 00 00 00 0f 84 d2 04 00 00 49 63 8e a0 00 00 00 49 8b 86 98 00 00 00 44 8b 9d 5c ff ff ff 48 c1 e1 04
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] RIP [<ffffffffa02c5a87>] try_write+0x627/0x1060 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] RSP <ffff88010368bb00>
Dec 9 15:44:45 sepia18 kernel: [ 127.260013] CR2: 0000000000000048
Dec 9 15:44:45 sepia18 kernel: [ 127.264188] ---[ end trace 6f04513fdb7876aa ]---
Dec 9 15:44:45 sepia18 kernel: [ 127.264332] BUG: unable to handle kernel paging request at fffffffffffffff8
Dec 9 15:44:45 sepia18 kernel: [ 127.264334] IP: [<ffffffff81086a10>] kthread_data+0x10/0x20
Dec 9 15:44:45 sepia18 kernel: [ 127.264339] PGD 1c07067 PUD 1c08067 PMD 0
Dec 9 15:44:45 sepia18 kernel: [ 127.264342] Oops: 0000 [#2] SMP
Dec 9 15:44:45 sepia18 kernel: [ 127.264344] CPU 1
Dec 9 15:44:45 sepia18 kernel: [ 127.264345] Modules linked in: cryptd aes_x86_64 aes_generic rbd libceph radeon ttm drm_kms_helper drm lp ppdev parport_pc parport shpchp i2c_algo_bit i3000_edac edac_core serio_raw ahci libahci floppy e1000e btrfs zlib_deflate crc32c libcrc32c
Dec 9 15:44:45 sepia18 kernel: [ 127.264359]
Dec 9 15:44:45 sepia18 kernel: [ 127.264361] Pid: 46, comm: kworker/1:2 Tainted: G D 3.1.0-ceph-08936-gfb6ee37 #1 Supermicro PDSMi/PDSMi+
Dec 9 15:44:45 sepia18 kernel: [ 127.264364] RIP: 0010:[<ffffffff81086a10>] [<ffffffff81086a10>] kthread_data+0x10/0x20
Dec 9 15:44:45 sepia18 kernel: [ 127.264368] RSP: 0018:ffff88010368b6f8 EFLAGS: 00010092
Dec 9 15:44:45 sepia18 kernel: [ 127.264370] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
Dec 9 15:44:45 sepia18 kernel: [ 127.264372] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88010364bf60
Dec 9 15:44:45 sepia18 kernel: [ 127.264374] RBP: ffff88010368b6f8 R08: 0000000000989680 R09: 0000000000000001
Dec 9 15:44:45 sepia18 kernel: [ 127.264376] R10: 0000000000000400 R11: 0000000000000000 R12: ffff88010364c300
Dec 9 15:44:45 sepia18 kernel: [ 127.264378] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88010368b820
Dec 9 15:44:45 sepia18 kernel: [ 127.264381] FS: 0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
Dec 9 15:44:45 sepia18 kernel: [ 127.264383] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 9 15:44:45 sepia18 kernel: [ 127.264385] CR2: fffffffffffffff8 CR3: 0000000037af4000 CR4: 00000000000006e0
Dec 9 15:44:45 sepia18 kernel: [ 127.264387] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 9 15:44:45 sepia18 kernel: [ 127.264389] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 9 15:44:45 sepia18 kernel: [ 127.264391] Process kworker/1:2 (pid: 46, threadinfo ffff88010368a000, task ffff88010364bf60)
Dec 9 15:44:45 sepia18 kernel: [ 127.264393] Stack:
Dec 9 15:44:45 sepia18 kernel: [ 127.264394] ffff88010368b718 ffffffff8107f0c5 ffff88010368b718 ffff88011fc93b80
Dec 9 15:44:45 sepia18 kernel: [ 127.264398] ffff88010368b7b8 ffffffff81600763 0000000000000000 ffffffff81066e81
Dec 9 15:44:45 sepia18 kernel: [ 127.264401] ffff88010364bf60 0000000000013b80 ffff88010368bfd8 ffff88010368a010
Dec 9 15:44:45 sepia18 kernel: [ 127.264404] Call Trace:
Dec 9 15:44:45 sepia18 kernel: [ 127.264407] [<ffffffff8107f0c5>] wq_worker_sleeping+0x15/0xa0
Dec 9 15:44:45 sepia18 kernel: [ 127.264410] [<ffffffff81600763>] __schedule+0x5c3/0x940
Dec 9 15:44:45 sepia18 kernel: [ 127.264414] [<ffffffff81066e81>] ? do_exit+0x551/0x880
Dec 9 15:44:45 sepia18 kernel: [ 127.264416] [<ffffffff81065d7d>] ? release_task+0x1d/0x470
Dec 9 15:44:45 sepia18 kernel: [ 127.264419] [<ffffffff81600e0f>] schedule+0x3f/0x60
Dec 9 15:44:45 sepia18 kernel: [ 127.264421] [<ffffffff81066ef7>] do_exit+0x5c7/0x880
Dec 9 15:44:45 sepia18 kernel: [ 127.264424] [<ffffffff81063b05>] ? kmsg_dump+0x75/0x140
Dec 9 15:44:45 sepia18 kernel: [ 127.264427] [<ffffffff81604ac0>] oops_end+0xb0/0xf0
Dec 9 15:44:45 sepia18 kernel: [ 127.264431] [<ffffffff8103f85d>] no_context+0xfd/0x270
Dec 9 15:44:45 sepia18 kernel: [ 127.264434] [<ffffffff814edfd5>] ? release_sock+0x35/0x190
Dec 9 15:44:45 sepia18 kernel: [ 127.264437] [<ffffffff8103fb15>] __bad_area_nosemaphore+0x145/0x230
Dec 9 15:44:45 sepia18 kernel: [ 127.264440] [<ffffffff810509a3>] ? get_parent_ip+0x33/0x50
Dec 9 15:44:45 sepia18 kernel: [ 127.264443] [<ffffffff8103fc13>] bad_area_nosemaphore+0x13/0x20
Dec 9 15:44:45 sepia18 kernel: [ 127.264446] [<ffffffff8160763e>] do_page_fault+0x34e/0x4b0
Dec 9 15:44:45 sepia18 kernel: [ 127.264449] [<ffffffff814ee0e4>] ? release_sock+0x144/0x190
Dec 9 15:44:45 sepia18 kernel: [ 127.264453] [<ffffffff815429e5>] ? tcp_sendpage+0xc5/0x590
Dec 9 15:44:45 sepia18 kernel: [ 127.264457] [<ffffffff81313fbd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
Dec 9 15:44:45 sepia18 kernel: [ 127.264460] [<ffffffff81603ef5>] page_fault+0x25/0x30
Dec 9 15:44:45 sepia18 kernel: [ 127.264467] [<ffffffffa02c5a87>] ? try_write+0x627/0x1060 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.264470] [<ffffffff814e8db4>] ? kernel_recvmsg+0x44/0x60
Dec 9 15:44:45 sepia18 kernel: [ 127.264476] [<ffffffffa02c4748>] ? ceph_tcp_recvmsg+0x48/0x60 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.264483] [<ffffffffa02c70d0>] con_work+0xc10/0x1b00 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.264486] [<ffffffff81053978>] ? finish_task_switch+0x48/0x120
Dec 9 15:44:45 sepia18 kernel: [ 127.264489] [<ffffffff8107fcb6>] process_one_work+0x1a6/0x520
Dec 9 15:44:45 sepia18 kernel: [ 127.264491] [<ffffffff8107fc47>] ? process_one_work+0x137/0x520
Dec 9 15:44:45 sepia18 kernel: [ 127.264498] [<ffffffffa02c64c0>] ? try_write+0x1060/0x1060 [libceph]
Dec 9 15:44:45 sepia18 kernel: [ 127.264501] [<ffffffff81081fe3>] worker_thread+0x173/0x400
Dec 9 15:44:45 sepia18 kernel: [ 127.264504] [<ffffffff81081e70>] ? manage_workers+0x210/0x210
Dec 9 15:44:45 sepia18 kernel: [ 127.264506] [<ffffffff81087026>] kthread+0xb6/0xc0
Dec 9 15:44:45 sepia18 kernel: [ 127.264510] [<ffffffff8109ee55>] ? trace_hardirqs_on_caller+0x105/0x190
Dec 9 15:44:45 sepia18 kernel: [ 127.264513] [<ffffffff8160dd44>] kernel_thread_helper+0x4/0x10
Dec 9 15:44:45 sepia18 kernel: [ 127.264516] [<ffffffff81603c74>] ? retint_restore_args+0x13/0x13
Dec 9 15:44:45 sepia18 kernel: [ 127.264519] [<ffffffff81086f70>] ? __init_kthread_worker+0x70/0x70
Dec 9 15:44:45 sepia18 kernel: [ 127.264522] [<ffffffff8160dd40>] ? gs_change+0x13/0x13
Dec 9 15:44:45 sepia18 kernel: [ 127.264523] Code: 66 66 66 90 65 48 8b 04 25 40 c4 00 00 48 8b 80 48 03 00 00 8b 40 f0 c9 c3 66 90 55 48 89 e5 66 66 66 66 90 48 8b 87 48 03 00 00
Dec 9 15:44:45 sepia18 kernel: [ 127.264546] RIP [<ffffffff81086a10>] kthread_data+0x10/0x20
Dec 9 15:44:45 sepia18 kernel: [ 127.264549] RSP <ffff88010368b6f8>
Dec 9 15:44:45 sepia18 kernel: [ 127.264551] CR2: fffffffffffffff8
Dec 9 15:44:45 sepia18 kernel: [ 127.264553] ---[ end trace 6f04513fdb7876ab ]---
Dec 9 15:44:45 sepia18 kernel: [ 127.264554] Fixing recursive fault but reboot is needed!
Updated by Sage Weil over 12 years ago
- Category set to libceph
- Priority changed from Normal to High
- Target version set to v3.2
Updated by Josh Durgin over 12 years ago
This happened again on sepia70 during the kernel untar build workunit on rbd.
Updated by Josh Durgin over 12 years ago
This happened again with the same workload on sepia81.
Updated by Sage Weil over 12 years ago
Looks like a NULL pointer dereference... look for a struct member that's at offset 0x48?
Updated by Sage Weil over 12 years ago
- Position set to 1
Updated by Sage Weil over 12 years ago
- Target version set to v3.3
- Position deleted (2)
- Position set to 699
Updated by Sage Weil over 12 years ago
- Subject changed from spinlock lockup to NULL pointer dereference at try_write+0x627/0x1060
Updated by Alex Elder about 12 years ago
Is there a core file for this problem anywhere?
It would really be nice to poke around in the message, or the
connection structure, or basically anything. I was unable to
reproduce the problem before when trying to repeatedly run
the kernel_untar_build workunit, which was mentioned in bug
1866 as what was running when it happened there.
Updated by Sage Weil about 12 years ago
Hmm.. yeah, I don't think we have anything beyond these console dumps. And we don't capture any kind of kernel core dumps (can/should we?).
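For reference, one common way to capture kernel core dumps is kdump/kexec; the sketch below is a generic outline, and the package names and reserved-memory size are assumptions that vary by distro and machine.

```shell
# Sketch: capture kernel core dumps with kdump/kexec (assumptions noted inline).

# 1. Reserve memory for the crash kernel via the kernel command line,
#    e.g. add to the boot parameters:
#      crashkernel=256M

# 2. Install the kexec/kdump tooling (package names vary by distro):
apt-get install kexec-tools kdump-tools

# 3. After rebooting with the crashkernel= parameter, verify the crash
#    kernel is loaded (prints 1 when ready):
cat /sys/kernel/kexec_crash_loaded

# 4. On the next panic, the crash kernel boots and writes a vmcore
#    (typically under /var/crash) that the "crash" utility can inspect.
```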
Actually, I don't think we've seen this in more than a month. :/
Updated by Sage Weil about 12 years ago
- Status changed from New to Need More Info
Updated by Alex Elder about 12 years ago
Bugs 1793 and 2081 have a signature of a page fault/bad memory reference
from process_one_work() -> con_work(), and they may well be the same issue.
So I have been treating them as such, and have been attempting to reproduce
them by running the kernel_untar_build.sh workunit repeatedly.
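That kind of reproduction attempt can be sketched as a soak loop that stops on the first failure, so the failing iteration's console output stays near the end of the log. The workunit path is an assumption (it lives in ceph.git's workunits directory):

```shell
#!/bin/sh
# Hedged sketch of a soak loop for reproducing the crash.

run_workunit() {
    # Assumed location of the workunit; adjust to your checkout.
    ./kernel_untar_build.sh
}

soak() {
    iters="$1"
    i=1
    while [ "$i" -le "$iters" ]; do
        echo "iteration $i"
        run_workunit || { echo "failed on iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $iters iterations without error"
}
```

For example, `soak 100` matches the 100-iteration runs described below.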
The last day or two I have been testing with the latest version of the
ceph-client code (c666601a93). One time I hit a kernel failure that had a
signature different from either 1793 or 2081--it reported:
INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 4, t=492980 jiffies)
I believe this is bug 2174, and am adding an update to that to record
this occurrence.
I restarted testing and completed 50 more iterations of kernel_untar_build.sh
without error. I will try a bit more, but I think we should keep these two bugs
on a sort of inactive status (1793 is already "Need More Info" but 2081 was
not until now) until we start seeing occurrences again. A kernel core dump
would also be helpful, once we get that in place (see bug 2127).
Updated by Alex Elder about 12 years ago
Another 100 iterations of kernel_untar_build.sh using the current
master branch (c666601a935b94cc0f3310339411b6940de751ba) without
error. I'm going to quit testing this now and wait to see if
our regular automated testing produces it again.
Updated by Alex Elder about 12 years ago
- Status changed from Need More Info to Can't reproduce
Marking this Can't Reproduce. Will reopen if it shows up again.