Bug #4706

closed

kclient: Oops when two clients concurrently write a file

Added by Zheng Yan about 11 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

[  229.868015] Modules linked in: netconsole ceph libceph libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle bridge lockd sunrpc bnep bluetooth stp llc rfkill be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_net pcspkr microcode virtio_balloon uinput virtio_blk
[  229.868015] CPU 1 
[  229.868015] Pid: 50, comm: kworker/1:2 Tainted: G      D      3.8.0+ #1 Bochs Bochs
[  229.868015] RIP: 0010:[<ffffffff81084540>]  [<ffffffff81084540>] kthread_data+0x10/0x20
[  229.868015] RSP: 0018:ffff88003711b528  EFLAGS: 00010092
[  229.868015] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000000000000e
[  229.868015] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880037120000
[  229.868015] RBP: ffff88003711b528 R08: ffff880037120070 R09: 0000000000000000
[  229.868015] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003fd14800
[  229.868015] R13: 0000000000000001 R14: ffff88003711fff0 R15: ffff880037120000
[  229.868015] FS:  0000000000000000(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[  229.868015] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  229.868015] CR2: ffffffffffffffa8 CR3: 000000003ce33000 CR4: 00000000000006e0
[  229.868015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  229.868015] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  229.868015] Process kworker/1:2 (pid: 50, threadinfo ffff88003711a000, task ffff880037120000)
[  229.868015] Stack:
[  229.868015]  ffff88003711b548 ffffffff8107f375 ffff88003711b548 ffff8800371203d0
[  229.868015]  ffff88003711b5b8 ffffffff81677802 ffff880037120000 ffff88003711bfd8
[  229.868015]  ffff88003711bfd8 ffff88003711bfd8 ffff88003711b340 ffff880037120000
[  229.868015] Call Trace:
[  229.868015]  [<ffffffff8107f375>] wq_worker_sleeping+0x15/0xc0
[  229.868015]  [<ffffffff81677802>] __schedule+0x5e2/0x800
[  229.868015]  [<ffffffff81677d49>] schedule+0x29/0x70
[  229.868015]  [<ffffffff810649f2>] do_exit+0x6a2/0x9f0
[  229.868015]  [<ffffffff8167a8ed>] oops_end+0x9d/0xe0
[  229.868015]  [<ffffffff8166d4e6>] no_context+0x253/0x27e
[  229.868015]  [<ffffffff81312962>] ? put_dec+0x72/0x90
[  229.868015]  [<ffffffff8166d6dc>] __bad_area_nosemaphore+0x1cb/0x1ea
[  229.868015]  [<ffffffff8166d70e>] bad_area_nosemaphore+0x13/0x15
[  229.868015]  [<ffffffff8167d70e>] __do_page_fault+0x36e/0x500
[  229.868015]  [<ffffffff81314c94>] ? vsnprintf+0x354/0x640
[  229.868015]  [<ffffffff81314fc0>] ? sprintf+0x40/0x50
[  229.868015]  [<ffffffff8167d8ae>] do_page_fault+0xe/0x10
[  229.868015]  [<ffffffff8167d025>] do_async_page_fault+0x35/0x90
[  229.868015]  [<ffffffff81679c78>] async_page_fault+0x28/0x30
[  229.868015]  [<ffffffff810c1bd1>] ? __lock_acquire+0x61/0x1dc0
[  229.868015]  [<ffffffff8166dc0b>] ? printk+0x61/0x63
[  229.868015]  [<ffffffff810c3ef1>] lock_acquire+0xa1/0x120
[  229.868015]  [<ffffffffa03260ff>] ? sync_write_commit+0x4f/0xb0 [ceph]
[  229.868015]  [<ffffffff81678c81>] _raw_spin_lock+0x31/0x40
[  229.868015]  [<ffffffffa03260ff>] ? sync_write_commit+0x4f/0xb0 [ceph]
[  229.868015]  [<ffffffffa03260ff>] sync_write_commit+0x4f/0xb0 [ceph]
[  229.868015]  [<ffffffffa02e0a81>] complete_request+0x21/0x40 [libceph]
[  229.868015]  [<ffffffffa02e5364>] dispatch+0x6b4/0x920 [libceph]
[  229.868015]  [<ffffffff81676b1b>] ? __mutex_unlock_slowpath+0xdb/0x170
[  229.868015]  [<ffffffffa02dbbb8>] con_work+0x1428/0x2e00 [libceph]
[  229.868015]  [<ffffffff810c1f8a>] ? __lock_acquire+0x41a/0x1dc0
[  229.868015]  [<ffffffff8107c25b>] ? process_one_work+0x13b/0x550
[  229.868015]  [<ffffffff8107c2c1>] process_one_work+0x1a1/0x550
[  229.868015]  [<ffffffff8107c25b>] ? process_one_work+0x13b/0x550
[  229.868015]  [<ffffffffa02da790>] ? ceph_con_close+0xd0/0xd0 [libceph]
[  229.868015]  [<ffffffff8107ea9e>] worker_thread+0x15e/0x440
[  229.868015]  [<ffffffff8107e940>] ? busy_worker_rebind_fn+0x100/0x100
[  229.868015]  [<ffffffff810843fa>] kthread+0xea/0xf0
[  229.868015]  [<ffffffff81084310>] ? flush_kthread_work+0x1b0/0x1b0
[  229.868015]  [<ffffffff81681f6c>] ret_from_fork+0x7c/0xb0
[  229.868015]  [<ffffffff81084310>] ? flush_kthread_work+0x1b0/0x1b0

I got the Oops above when doing concurrent writes with the current "testing" branch.
When two clients write data to a file at the same time, they perform sync writes
even if the file is not opened in sync mode. I think the issue is new; it can
be reproduced by running the following command on two kclients.

dd if=/dev/zero bs=4k conv=notrunc of=test1
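
For scripted testing, a bounded equivalent of the dd command above may be handy, since dd without a count keeps writing until it is interrupted or the file system fills up. The following is only a minimal sketch, not part of the original report: the function name write_zero_blocks and the count parameter are hypothetical. It writes zero-filled 4 KiB blocks from offset 0 without truncating the file and without O_SYNC, mirroring bs=4k and conv=notrunc. It has not been verified to trigger the Oops.

```python
import os

BLOCK_SIZE = 4096  # mirrors bs=4k in the dd command


def write_zero_blocks(path, count):
    """Write `count` zero-filled 4 KiB blocks starting at offset 0.

    Hypothetical helper (not from the report). The file is opened without
    O_TRUNC, which mirrors conv=notrunc, and without O_SYNC, matching the
    report's note that the file is not opened in sync mode.
    Returns the number of bytes written.
    """
    block = bytes(BLOCK_SIZE)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    written = 0
    try:
        for _ in range(count):
            view = memoryview(block)
            # os.write may write fewer bytes than requested, so loop
            # until the whole block has been written.
            while len(view) > 0:
                n = os.write(fd, view)
                view = view[n:]
                written += n
    finally:
        os.close(fd)
    return written
```

To approximate the reproducer, one would call write_zero_blocks("test1", 1024) at roughly the same time on two kclients that mount the same file system, with test1 resolving to the same file on both. The count of 1024 is an arbitrary choice.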

Files

patch (3.12 KB) patch Zheng Yan, 04/11/2013 09:23 AM

Related issues 1 (0 open, 1 closed)

Related to CephFS - Bug #4679: ceph: hang while running blogbench on mira nodes (Resolved, Alex Elder, 04/08/2013)
