Bug #750

closed

run "dd", printk " libceph: osd1 172.16.10.68:6805 socket closed"

Added by changping Wu about 13 years ago. Updated about 13 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%


Description

Hi,
I cloned ceph-client-standalone.git (branch master-backport), built it, and loaded it with insmod.

Ceph server: ceph 0.24.2
OS: linux-2.6.37+ (ceph-client.git unstable)
Disk: SATA disk
Ceph config: one mon, one mds, two osd, all on one machine.

Run:
dd if=/dev/zero of=/mnt/ceph/dddd bs=1M count=2048

Sometimes the kernel prints:

[ 1205.636693] libceph: osd1 172.16.10.68:6805 socket closed

After that, handling of the timed-out requests fails.


Files

ceph.conf (2.38 KB) ceph.conf changping Wu, 02/09/2011 07:49 PM
dd_log_20110210 (65.5 KB) dd_log_20110210 changping Wu, 02/09/2011 07:49 PM
filetype (490 Bytes) filetype changping Wu, 02/09/2011 07:49 PM
Actions #1

Updated by Sage Weil about 13 years ago

From the error it sounds like the OSDs are down. Can you include 'ceph -s' output?

Actions #2

Updated by changping Wu about 13 years ago

I tried to reproduce this issue, this time with ceph-client.git unstable + ceph 0.24.2,
with one mon, one mds, and two osd on the same host machine.
Run:

dd if=/dev/zero of=/mnt/ceph/dd bs=1M count=4096

The kernel prints these logs:

=====================

[ 715.520033] handle_timeout:timeout
[ 715.523435] libceph: tid 684 timed out on osd1, will reset osd
[ 735.600038] handle_timeout:timeout
[ 735.603443] libceph: tid 876 timed out on osd0, will reset osd
[ 735.609381] libceph: tid 844 timed out on osd1, will reset osd
[ 755.680032] handle_timeout:timeout
[ 755.683438] libceph: tid 909 timed out on osd1, will reset osd
[ 755.689368] libceph: tid 954 timed out on osd0, will reset osd
[ 775.760024] handle_timeout:timeout
[ 775.763423] libceph: tid 1014 timed out on osd0, will reset osd
[ 775.769425] libceph: tid 1004 timed out on osd1, will reset osd
[ 800.960048] handle_timeout:timeout
[ 800.963458] libceph: tid 1061 timed out on osd0, will reset osd
[ 821.040041] handle_timeout:timeout
[ 841.120047] handle_timeout:timeout
[ 841.123452] libceph: tid 1251 timed out on osd0, will reset osd
[ 841.129472] libceph: tid 1292 timed out on osd1, will reset osd
[ 847.320046] INFO: task cosd:1210 blocked for more than 120 seconds.
[ 847.326306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 847.334165] cosd D 00000001000096b1 4688 1210 1 0x00000000
[ 847.341073] ffff880110a91c18 0000000000000046 ffff880110a91b68 ffffffff00000000
[ 847.348492] 00000000001d4140 ffff88010c632310 ffff88010c6326a8 ffff880110a91fd8
[ 847.355949] ffff88010c6326b0 00000000001d4140 ffff880110a90010 00000000001d4140
[ 847.363405] Call Trace:
[ 847.365848] [<ffffffff815aab65>] schedule_timeout+0x245/0x320
[ 847.371708] [<ffffffff8108fc38>] ? sched_clock_cpu+0xb8/0x110
[ 847.377527] [<ffffffff815a9b69>] ? wait_for_common+0x119/0x190
[ 847.383471] [<ffffffff815a9b71>] wait_for_common+0x121/0x190
[ 847.389204] [<ffffffff810584d0>] ? default_wake_function+0x0/0x20
[ 847.395448] [<ffffffff815adab4>] ? _raw_spin_unlock_bh+0x34/0x40
[ 847.401559] [<ffffffff815a9cbd>] wait_for_completion+0x1d/0x20
[ 847.407468] [<ffffffff81187ebd>] sync_inodes_sb+0x13d/0x260
[ 847.413159] [<ffffffff815a9a99>] ? wait_for_common+0x49/0x190
[ 847.419824] [<ffffffff8118d260>] ? sync_one_sb+0x0/0x30
[ 847.426016] [<ffffffff8118d248>] __sync_filesystem+0x88/0xa0
[ 847.432689] [<ffffffff8118d280>] sync_one_sb+0x20/0x30
[ 847.438773] [<ffffffff811656e7>] iterate_supers+0x77/0xe0
[ 847.445178] [<ffffffff8118d2d5>] sys_sync+0x45/0x70
[ 847.451053] [<ffffffff8100c0c2>] system_call_fastpath+0x16/0x1b
[ 847.457947] 1 lock held by cosd/1210:
[ 847.462531] #0: (&type->s_umount_key#21){......}, at: [<ffffffff811656d7>] iterate_supers+0x67/0xe0
[ 847.472744] INFO: task cosd:1267 blocked for more than 120 seconds.
[ 847.479909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 847.488717] cosd D 000000010000935a 4688 1267 1 0x00000000
[ 847.496558] ffff8801115f7c18 0000000000000046 ffff8801115f7b68 ffffffff00000000
[ 847.504956] 00000000001d4140 ffff88010b3cc620 ffff88010b3cc9b8 ffff8801115f7fd8
[ 847.513375] ffff88010b3cc9c0 00000000001d4140 ffff8801115f6010 00000000001d4140
[ 847.522038] Call Trace:
[ 847.525517] [<ffffffff815aab65>] schedule_timeout+0x245/0x320
[ 847.532429] [<ffffffff8108fc38>] ? sched_clock_cpu+0xb8/0x110
[ 847.539200] [<ffffffff815a9b69>] ? wait_for_common+0x119/0x190
[ 847.546119] [<ffffffff815a9b71>] wait_for_common+0x121/0x190
[ 847.552842] [<ffffffff810584d0>] ? default_wake_function+0x0/0x20
[ 847.559960] [<ffffffff815adab4>] ? _raw_spin_unlock_bh+0x34/0x40
[ 847.567035] [<ffffffff815a9cbd>] wait_for_completion+0x1d/0x20
[ 847.573935] [<ffffffff81187ebd>] sync_inodes_sb+0x13d/0x260
[ 847.580583] [<ffffffff815a9a99>] ? wait_for_common+0x49/0x190
[ 847.587345] [<ffffffff8118d260>] ? sync_one_sb+0x0/0x30
[ 847.593626] [<ffffffff8118d248>] __sync_filesystem+0x88/0xa0
[ 847.600326] [<ffffffff8118d280>] sync_one_sb+0x20/0x30
[ 847.606449] [<ffffffff811656e7>] iterate_supers+0x77/0xe0
[ 847.612873] [<ffffffff8118d2d5>] sys_sync+0x45/0x70
[ 847.618736] [<ffffffff8100c0c2>] system_call_fastpath+0x16/0x1b
[ 847.625675] 1 lock held by cosd/1267:
[ 847.630258] #0: (&type->s_umount_key#21){......}, at: [<ffffffff811656d7>] iterate_supers+0x67/0xe0
[ 861.200032] handle_timeout:timeout
[ 861.204384] libceph: tid 1350 timed out on osd0, will reset osd
[ 881.280023] handle_timeout:timeout
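
To see at a glance whether the timeouts in a log like the one above are concentrated on one OSD, the "timed out on osdN" lines can be tallied. This is only a rough sketch: the regular expression assumes the message format exactly as it appears in this report, and the function name count_osd_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Assumed message format, taken from the log in this report:
# [ 715.523435] libceph: tid 684 timed out on osd1, will reset osd
TIMEOUT_RE = re.compile(r"libceph: tid (\d+) timed out on (osd\d+)")


def count_osd_timeouts(log_text):
    """Return a Counter mapping each OSD name to its number of timed-out tids."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Calling count_osd_timeouts() on saved dmesg output returns a per-OSD count. In the excerpt above both osd0 and osd1 appear repeatedly, so the timeouts are not limited to a single OSD.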

Actions #3

Updated by changping Wu about 13 years ago

ceph client driver is from ceph-client.git unstable

Actions #4

Updated by Sage Weil about 13 years ago

Jeff Wu wrote:

ceph client driver is from ceph-client.git unstable

There are some error handling fixes recently pushed to -standalone.git that may address this. Can you confirm whether you've tried the latest? (commit:64a4571cc1a04fee7f6a16d40a40e4664dd15278)

Actions #5

Updated by changping Wu about 13 years ago

Hi,

1. I cloned ceph-client and checked out unstable (commit 9c01177349b435186025a088f612a6f5ce2f3de9):

make menuconfig
make all
make modules_install
make install
reboot

2. I cloned ceph-client-standalone, master (commit 64a4571cc1a04fee7f6a16d40a40e4664dd15278):
cd ceph-client-standalone
make -C ./libceph
cp ./libceph/Modules.* ./ceph
make -C ./ceph
modprobe libcrc32c
insmod ./libceph/libceph.ko
insmod ./ceph/ceph.ko

=====
mount.ceph 172.16.10.68:6789:/ /mnt/ceph -o mount_timeout=10
File system type: ext4
Run:
dd if=/dev/zero of=/mnt/ceph/dddd bs=1M count=2048 (or count=4096)

Now the "libceph: osd1 172.16.10.68:6805 socket closed" issue can no longer be reproduced.

However, the kernel prints:

[ 268.385392] CE: hpet3 increased min_delta_ns to 11250 nsec
[ 300.237899] dd used greatest stack depth: 3488 bytes left
[ 607.310049] INFO: task cosd:1194 blocked for more than 120 seconds.
[ 607.317397] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 607.326329] cosd D 0000000100001af4 4688 1194 1 0x00000000
[ 607.334281] ffff88010c94bc18 0000000000000046 ffff88010c94bb68 ffffffff00000000
[ 607.342792] 00000000001d4140 ffff88010c5ba310 ffff88010c5ba6a8 ffff88010c94bfd8
[ 607.351307] ffff88010c5ba6b0 00000000001d4140 ffff88010c94a010 00000000001d4140
[ 607.359783] Call Trace:
[ 607.363354] [<ffffffff815aab65>] schedule_timeout+0x245/0x320
[ 607.370285] [<ffffffff8108fc38>] ? sched_clock_cpu+0xb8/0x110
[ 607.377140] [<ffffffff815a9b69>] ? wait_for_common+0x119/0x190
[ 607.384103] [<ffffffff815a9b71>] wait_for_common+0x121/0x190
[ 607.390861] [<ffffffff810584d0>] ? default_wake_function+0x0/0x20
[ 607.398006] [<ffffffff815adab4>] ? _raw_spin_unlock_bh+0x34/0x40
[ 607.405100] [<ffffffff815a9cbd>] wait_for_completion+0x1d/0x20
[ 607.411995] [<ffffffff81187ebd>] sync_inodes_sb+0x13d/0x260
[ 607.418589] [<ffffffff815a9a99>] ? wait_for_common+0x49/0x190
[ 607.425408] [<ffffffff8118d260>] ? sync_one_sb+0x0/0x30
[ 607.431682] [<ffffffff8118d248>] __sync_filesystem+0x88/0xa0
[ 607.438343] [<ffffffff8118d280>] sync_one_sb+0x20/0x30
[ 607.444517] [<ffffffff811656e7>] iterate_supers+0x77/0xe0
[ 607.450936] [<ffffffff8118d2d5>] sys_sync+0x45/0x70
[ 607.456789] [<ffffffff8100c0c2>] system_call_fastpath+0x16/0x1b

The detailed logs are attached.
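
For anyone going through the attached logs, the hung-task reports can be pulled out with a short script. This is a minimal sketch, assuming the "INFO: task NAME:PID blocked for more than N seconds" wording shown in the traces in this report; the function name find_hung_tasks is hypothetical.

```python
import re

# Assumed message format, taken from the traces in this report:
# [ 607.310049] INFO: task cosd:1194 blocked for more than 120 seconds.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(log_text):
    """Return a list of (task_name, pid, seconds) tuples, one per hung-task report."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in HUNG_RE.finditer(log_text)
    ]
```

In the logs quoted in this thread, every such report is for a cosd process blocked in sys_sync.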

Actions #6

Updated by Greg Farnum about 13 years ago

  • Status changed from New to Won't Fix

I imagine this has the same cause as #742. Please re-open if you manage to reproduce while using btrfs for the cosd store, or while running the kernel client on a machine that isn't serving as an OSD.
