Project

General

Profile

Actions

Bug #742

closed

ceph-client.git unstable , tiotest ,printk "timeout, ceph: mds0 caps stale",then oops

Added by changping Wu over 13 years ago. Updated about 13 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
Category:
fs/ceph
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

hi ,
i git ceph-client.git ,checkout to unstable.
build it at ubuntu 10.04 ,make install,reboot,
then run as linux-2.6.37+ kernel.
system infos:
1.ubuntu 10.04 server + linux-2.6.37+(ceph-client.git unstable);
2.cpu intel(R) Core(TM)2 Uuad CPU Q8400 @2.66HGz
3.mem 4Gb
4.SATA disk:WDC WD3200AAKS-7 Rev:02.0
5. one mon ,one mds,two osd

6. run tiotest example shell script: =======================================
#!/bin/bash

i=0
while [ $i -le 100 ]
do
./tiotest -f 10 -t 30 -d /mnt/ceph -T -D 20 -r 100
let i+=1
done
exit0

============
7.logs:

run tiotest example ,for several minutes,
printk "timeout ,reset osd" ,
then printk "mds0 caps stale" ,then a OOPS
maybe

at __cap_is_valid(struct ceph_cap *cap),exist a invalid spinlock.

The detail logs attached.


[ 1475.050194] libceph: tid 57429 timed out on osd1, will reset osd
[ 1548.990035] ceph: mds0 caps stale
[ 1558.349986] INFO: rcu_sched_state detected stall on CPU 1 (t=6500 jiffies)


[ 1346.330300] libceph: tid 50475 timed out on osd0, will reset osd
[ 1366.400033] handle_timeout:timeout
[ 1366.403740] libceph: tid 49617 timed out on osd0, will reset osd
[ 1412.640037] handle_timeout:timeout
[ 1434.920041] handle_timeout:timeout
[ 1454.960029] handle_timeout:timeout
[ 1454.963664] libceph: tid 55571 timed out on osd1, will reset osd
[ 1454.970235] libceph: tid 55675 timed out on osd0, will reset osd
[ 1475.040045] handle_timeout:timeout
[ 1475.043719] libceph: tid 57408 timed out on osd0, will reset osd
[ 1475.050194] libceph: tid 57429 timed out on osd1, will reset osd
[ 1548.990035] ceph: mds0 caps stale
[ 1558.349986] INFO: rcu_sched_state detected stall on CPU 1 (t=6500 jiffies)
[ 1558.350008] sending NMI to all CPUs:
[ 1558.350008] NMI backtrace for cpu 1
[ 1558.350008] CPU 1
[ 1558.350008] Modules linked in: ceph libceph crc32c libcrc32c fbcon tileblit font bitblit softcursor i915 snd_hda_codec_analog drm_kms_helper drm snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd i2c_algo_bit i2c_core ppdev psmouse intel_agp video soundcore lp intel_gtt snd_page_alloc parport_pc shpchp output dell_wmi parport dcdbas serio_raw sparse_keymap usbhid r8169 mii hid ata_piix [last unloaded: libceph]
[ 1558.350008]
[ 1558.350008] Pid: 2783, comm: cosd Not tainted 2.6.37+ #1 0V4W66/OptiPlex 780
[ 1558.350008] RIP: 0010:[<ffffffff812b76a7>] [<ffffffff812b76a7>] delay_tsc+0x27/0x80
[ 1558.350008] RSP: 0018:ffff8800dc403cd8 EFLAGS: 00000006
[ 1558.350008] RAX: 000003cf49baea28 RBX: 0000000000000001 RCX: 0000000049baea28
[ 1558.350008] RDX: 00000000000003cf RSI: 0000000000000000 RDI: 00000000002896a5
[ 1558.350008] RBP: ffff8800dc403cf8 R08: 0000000000000000 R09: 0000000000000001
[ 1558.350008] R10: ffffffff8163b960 R11: 0000000000000001 R12: ffffffff81a21b80
[ 1558.350008] R13: 0000000000000001 R14: 00000000002896a5 R15: ffffffff8109c120
[ 1558.350008] FS: 00007f28daace700(0000) GS:ffff8800dc400000(0000) knlGS:0000000000000000
[ 1558.350008] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1558.350008] CR2: 000000000f153000 CR3: 000000010ba17000 CR4: 00000000000406e0
[ 1558.350008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1558.350008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1558.350008] Process cosd (pid: 2783, threadinfo ffff8801118de000, task ffff88010b598000)
[ 1558.350008] Stack:
[ 1558.350008] 0000000000000001 ffffffff81a21b80 ffffffff81a21b80 ffff8800dc403e88
[ 1558.350008] ffff8800dc403d08 ffffffff812b7677 ffff8800dc403d28 ffffffff8102e895
[ 1558.350008] 0000000000000000 ffff8800dc5d0360 ffff8800dc403d78 ffffffff810e2de9
[ 1558.350008] Call Trace:
[ 1558.350008] <IRQ>
[ 1558.350008] [<ffffffff812b7677>] __const_udelay+0x47/0x50
[ 1558.350008] [<ffffffff8102e895>] arch_trigger_all_cpu_backtrace+0x55/0x70
[ 1558.350008] [<ffffffff810e2de9>] __rcu_pending+0x169/0x3b0
[ 1558.350008] [<ffffffff8109c120>] ? tick_sched_timer+0x0/0xc0
[ 1558.350008] [<ffffffff810e309c>] rcu_check_callbacks+0x6c/0x120
[ 1558.350008] [<ffffffff81076998>] update_process_times+0x48/0x90
[ 1558.350008] [<ffffffff8109c186>] tick_sched_timer+0x66/0xc0
[ 1558.350008] [<ffffffff8108f5d3>] __run_hrtimer+0x93/0x200
[ 1558.350008] [<ffffffff8108f9de>] hrtimer_interrupt+0xde/0x250
[ 1558.350008] [<ffffffff81091ebf>] ? local_clock+0x6f/0x80
[ 1558.350008] [<ffffffff810351d7>] hpet_interrupt_handler+0x17/0x40
[ 1558.350008] [<ffffffff810dcf05>] handle_IRQ_event+0x55/0x190
[ 1558.350008] [<ffffffff812c94fe>] ? do_raw_spin_unlock+0x5e/0xb0
[ 1558.350008] [<ffffffff810df636>] handle_edge_irq+0xd6/0x180
[ 1558.350008] [<ffffffff8100eef9>] handle_irq+0x49/0xa0
[ 1558.350008] [<ffffffff81595948>] do_IRQ+0x68/0xf0
[ 1558.350008] [<ffffffff8158db53>] ret_from_intr+0x0/0x16
[ 1558.350008] <EOI>
[ 1558.350008] [<ffffffff810a4a9c>] ? lock_acquire+0xcc/0x150
[ 1558.350008] [<ffffffffa03ecfb0>] ? __cap_is_valid+0x30/0xf0 [ceph]
[ 1558.350008] [<ffffffff8158cca6>] _raw_spin_lock+0x36/0x70
[ 1558.350008] [<ffffffffa03ecfb0>] ? __cap_is_valid+0x30/0xf0 [ceph]
[ 1558.350008] [<ffffffffa03ecfb0>] __cap_is_valid+0x30/0xf0 [ceph]
[ 1558.350008] [<ffffffffa03edcdc>] __ceph_caps_issued+0x5c/0x100 [ceph]
[ 1558.350008] [<ffffffffa03f05e6>] ceph_check_caps+0x146/0xc60 [ceph]
[ 1558.350008] [<ffffffff81013ae3>] ? native_sched_clock+0x13/0x60
[ 1558.350008] [<ffffffff81013949>] ? sched_clock+0x9/0x10
[ 1558.350008] [<ffffffff81013ae3>] ? native_sched_clock+0x13/0x60
[ 1558.350008] [<ffffffff81013949>] ? sched_clock+0x9/0x10
[ 1558.350008] [<ffffffff81091cd5>] ? sched_clock_local+0x25/0x90
[ 1558.350008] [<ffffffffa03f12d2>] ceph_flush_dirty_caps+0xd2/0x240 [ceph]
[ 1558.350008] [<ffffffff81197dc0>] ? sync_one_sb+0x0/0x30
[ 1558.350008] [<ffffffffa03dbb6f>] ceph_sync_fs+0x2f/0x160 [ceph]
[ 1558.350008] [<ffffffff81197d83>] __sync_filesystem+0x63/0xa0
[ 1558.350008] [<ffffffff81197de0>] sync_one_sb+0x20/0x30
[ 1558.350008] [<ffffffff8116f4b7>] iterate_supers+0x77/0xe0
[ 1558.350008] [<ffffffff81197e1f>] sys_sync+0x2f/0x70
[ 1558.350008] [<ffffffff8100c082>] system_call_fastpath+0x16/0x1b
[ 1558.350008] Code: 00 00 00 00 55 48 89 e5 41 56 41 55 41 54 53 0f 1f 44 00 00 65 44 8b 2c 25 40 dc 00 00 49 89 fe 66 66 90 0f ae e8 e8 c9 ce d5 ff <66> 90 4c 63 e0 eb 11 66 90 f3 90 65 8b 1c 25 40 dc 00 00 44 39
[ 1558.350008] Call Trace:


Files

ceph-client.git_tiotest_logs (90 KB) ceph-client.git_tiotest_logs changping Wu, 01/25/2011 11:47 PM
mds_caps_stale_logs (14.2 KB) mds_caps_stale_logs changping Wu, 01/28/2011 12:27 AM
ceph.conf (2.38 KB) ceph.conf changping Wu, 02/09/2011 07:55 PM
filetype (490 Bytes) filetype changping Wu, 02/09/2011 07:55 PM
tiotest_log_20110210 (113 KB) tiotest_log_20110210 changping Wu, 02/09/2011 07:55 PM
cephs (434 Bytes) cephs ceph -s changping Wu, 02/09/2011 07:55 PM
Actions

Also available in: Atom PDF