Bug #674
tiobench stress test, OSD timeout (Closed)
Description
Hi,
We ran a multi-threaded stress test against Ceph 0.23.1, and the Ceph client printk'd OSD timeouts.
1. Test tool: tiobench-0.3.3; test command:
./tiobench-0.3.3/tiotest -t 4 -f 4096 -r 1000 -b 131072 -d /mnt/ceph -T
2. Ceph server version: 0.23-1
3. Ceph client: cloned from git://ceph.newdream.net/git/ceph-client-standalone.git, branch unstable-backport
4. Ceph server host OS: Ubuntu 10.04 server x86_64, kernel 2.6.32-21-server
5. Ceph client host OS: Ubuntu 10.04 server x86_64, kernel 2.6.32-21-server
6. Ceph config: ceph-none; two OSD servers (osd0 and osd1), one MDS server, and one MON server
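For context, a minimal ceph.conf sketch matching the topology described above (one MON, one MDS, two OSDs on a single host). The host name, section names, and data paths are illustrative assumptions; only the monitor address 172.16.10.63:6789 appears in the logs below:

```ini
; hypothetical ceph.conf for a one-host test cluster
; (host name and data paths are placeholders, not from the actual setup)
[mon0]
        host = cephhost
        mon addr = 172.16.10.63:6789

[mds0]
        host = cephhost

[osd0]
        host = cephhost
        osd data = /data/osd0

[osd1]
        host = cephhost
        osd data = /data/osd1
```

The client would then mount the filesystem with something like `mount -t ceph 172.16.10.63:/ /mnt/ceph` before running tiotest against the mount point.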
===========================================
1. tiobench log:
===================================================================================
Run #1: ./tiobench-0.3.3/tiotest -t 4 -f 4096 -r 1000 -b 131072 -d /mnt/ceph -T
Unit information
================
File size = megabytes
Blk Size = bytes
Rate = megabytes per second
CPU% = percentage of CPU used during the test
Latency = milliseconds
Lat% = percent of requests that took longer than X seconds
CPU Eff = Rate divided by CPU% - throughput per cpu load
File Blk Num Avg Maximum Lat% Lat% CPU
Identifier Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff
---------------------------- ------ ----- --- ------ ------ --------- ----------- -------- -------- -----
Sequential Reads
2.6.32-21-server 16384 4096 4 66.62 18.81% 0.227 251.11 0.00000 0.00000 354
2.6.32-21-server 16384 8192 4 65.77 16.16% 0.452 458.29 0.00000 0.00000 407
2.6.32-21-server 16384 16384 4 63.92 16.11% 0.968 559.74 0.00000 0.00000 397
2.6.32-21-server 16384 32768 4 64.61 14.39% 1.815 1046.43 0.00000 0.00000 449
2.6.32-21-server 16384 65536 4 66.26 15.32% 3.765 837.34 0.00000 0.00000 432
2.6.32-21-server 16384 13107 4 64.53 14.43% 7.566 1349.89 0.00000 0.00000 447
Random Reads
2.6.32-21-server 16384 4096 4 0.75 1.108% 20.405 106.68 0.00000 0.00000 68
2.6.32-21-server 16384 8192 4 1.66 0.797% 17.845 109.55 0.00000 0.00000 208
2.6.32-21-server 16384 16384 4 2.90 1.812% 20.977 333.91 0.00000 0.00000 160
2.6.32-21-server 16384 32768 4 6.25 1.849% 18.955 279.49 0.00000 0.00000 338
2.6.32-21-server 16384 65536 4 11.87 3.702% 20.782 439.90 0.00000 0.00000 321
2.6.32-21-server 16384 13107 4 22.14 6.199% 22.256 367.77 0.00000 0.00000 357
Sequential Writes
2.6.32-21-server 16384 4096 4 17.18 7.038% 0.862 14455.88 0.01473 0.00026 244
2.6.32-21-server 16384 8192 4 16.09 5.656% 1.831 18047.37 0.02990 0.00076 284
2.6.32-21-server 16384 16384 4 16.19 5.111% 3.671 13165.22 0.06485 0.00057 317
2.6.32-21-server 16384 32768 4 13.84 4.163% 8.497 24246.45 0.14687 0.00782 332
2.6.32-21-server 16384 65536 4 13.27 3.808% 17.584 24053.48 0.29259 0.01717 348
2.6.32-21-server 16384 13107 4 13.45 3.814% 35.383 23944.59 0.62485 0.03281 353
Random Writes
2.6.32-21-server 16384 4096 4 0.60 0.880% 0.220 330.88 0.00000 0.00000 68
2.6.32-21-server 16384 8192 4 1.00 0.669% 0.945 3338.01 0.02500 0.00000 149
2.6.32-21-server 16384 16384 4 1.41 0.699% 0.853 2497.44 0.02500 0.00000 202
2.6.32-21-server 16384 32768 4 2.61 0.814% 0.264 493.69 0.00000 0.00000 321
2.6.32-21-server 16384 65536 4 0.77 0.353% 1.176 1520.99 0.00000 0.00000 217
2.6.32-21-server 16384 13107 4 1.44 0.610% 0.838 1001.35 0.00000 0.00000 236
2. dmesg log:
========================================================
..............................................
..............................................
[75004.800179] INFO: task tiotest:2337 blocked for more than 120 seconds.
[75004.800213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[75004.800240] tiotest D 00000000ffffffff 0 2337 2180 0x00000008
[75004.800243] ffff880116257cc8 0000000000000082 0000000000015bc0 0000000000015bc0
[75004.800246] ffff880115825f80 ffff880116257fd8 0000000000015bc0 ffff880115825bc0
[75004.800249] 0000000000015bc0 ffff880116257fd8 0000000000015bc0 ffff880115825f80
[75004.800252] Call Trace:
[75004.800254] [<ffffffff810f3460>] ? sync_page+0x0/0x50
[75004.800257] [<ffffffff815555c7>] io_schedule+0x47/0x70
[75004.800259] [<ffffffff810f349d>] sync_page+0x3d/0x50
[75004.800262] [<ffffffff81555bef>] __wait_on_bit+0x5f/0x90
[75004.800264] [<ffffffff810f3653>] wait_on_page_bit+0x73/0x80
[75004.800267] [<ffffffff81084fe0>] ? wake_bit_function+0x0/0x40
[75004.800269] [<ffffffff810fd9f5>] ? pagevec_lookup_tag+0x25/0x40
[75004.800272] [<ffffffff810f3ae5>] wait_on_page_writeback_range+0xf5/0x190
[75004.800275] [<ffffffff810f3cb8>] filemap_write_and_wait_range+0x78/0x90
[75004.800277] [<ffffffff8116a62e>] vfs_fsync_range+0x7e/0xe0
[75004.800280] [<ffffffff8116a6fd>] vfs_fsync+0x1d/0x20
[75004.800282] [<ffffffff8116a73e>] do_fsync+0x3e/0x60
[75004.800284] [<ffffffff8116a790>] sys_fsync+0x10/0x20
[75004.800286] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
[75058.860051] libceph: tid 232244 timed out on osd0, will reset osd
[75058.861555] libceph: tid 232902 timed out on osd1, will reset osd
[75118.860035] libceph: tid 232596 timed out on osd0, will reset osd
[75118.860925] libceph: tid 233897 timed out on osd1, will reset osd
[75178.860021] libceph: tid 233467 timed out on osd0, will reset osd
[75846.350031] libceph: tid 280625 timed out on osd0, will reset osd
[75911.350030] libceph: tid 280786 timed out on osd0, will reset osd
[75986.350025] libceph: tid 281143 timed out on osd0, will reset osd
[76046.350037] libceph: tid 281253 timed out on osd0, will reset osd
[76106.350039] libceph: tid 281459 timed out on osd0, will reset osd
[76166.350038] libceph: tid 281661 timed out on osd0, will reset osd
[76226.350038] libceph: tid 281859 timed out on osd0, will reset osd
[76286.360026] libceph: tid 282082 timed out on osd0, will reset osd
[76346.360029] libceph: tid 282286 timed out on osd0, will reset osd
[76406.360025] libceph: tid 282478 timed out on osd0, will reset osd
[76466.360033] libceph: tid 282702 timed out on osd0, will reset osd
[76526.360029] libceph: tid 282899 timed out on osd0, will reset osd
[76586.360035] libceph: tid 283090 timed out on osd0, will reset osd
[76646.360033] libceph: tid 283304 timed out on osd0, will reset osd
[76754.770199] libceph: tid 283819 timed out on osd0, will reset osd
[76754.773797] libceph: tid 284035 timed out on osd1, will reset osd
[76814.770080] libceph: tid 284624 timed out on osd0, will reset osd
[76814.772596] libceph: tid 285401 timed out on osd1, will reset osd
[76874.770052] libceph: tid 285191 timed out on osd0, will reset osd
[76874.771749] libceph: tid 286271 timed out on osd1, will reset osd
[76934.770035] libceph: tid 285730 timed out on osd0, will reset osd
[76934.771057] libceph: tid 287155 timed out on osd1, will reset osd
[76994.770023] libceph: tid 286576 timed out on osd0, will reset osd
[77612.270030] libceph: tid 328771 timed out on osd0, will reset osd
==========
1. Test tool attached; test command, for instance: ./tiobench.sh /mnt/ceph 4096
Updated by changping Wu over 13 years ago
It printks many timed-out messages, and then the tiobench test fails.
Updated by changping Wu over 13 years ago
Pulled the latest ceph-client-standalone (master-backport) and ran:
dd if=/dev/zero of=/mnt/ceph bs=4k count=500000
This also printks OSD timeouts; logs:
=================================
dmesg:
...........................
............................
[118468.643817] PM: resume of drv:usbhid dev:3-1:1.0 complete after 1403.326 msecs
[118468.643825] PM: resume of drv: dev:ep_81 complete after 1108.365 msecs
[118472.630009] ata1: link is slow to respond, please be patient (ready=0)
[118472.690015] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[118472.697432] ata1.00: configured for UDMA/133
[118472.701101] PM: resume of drv:sd dev:0:0:0:0 complete after 5460.546 msecs
[118472.701110] PM: resume of drv:scsi_device dev:0:0:0:0 complete after 5460.546 msecs
[118472.701114] PM: resume of drv:scsi_disk dev:0:0:0:0 complete after 4057.260 msecs
[118472.701157] PM: resume of devices complete after 5464.718 msecs
[118472.701342] PM: resume devices took 5.470 seconds
[118472.701370] PM: Finishing wakeup.
[118472.701371] Restarting tasks ... done.
[118472.812045] e1000e 0000:00:19.0: irq 48 for MSI/MSI-X
[118472.871916] e1000e 0000:00:19.0: irq 48 for MSI/MSI-X
[118472.872179] ADDRCONF: eth0: link is not ready
[118474.441709] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[118474.441713] e1000e 0000:00:19.0: eth0: 10/100 speed: disabling TSO
[118474.441959] ADDRCONF: eth0: link becomes ready
[118483.876611] EXT4-fs (sda5): re-mounted. Opts: errors=remount-ro,user_xattr,commit=0
[118484.890013] eth0: no IPv6 routers present
[118487.501593] EXT4-fs (sda6): re-mounted. Opts: user_xattr,commit=0
[118489.511361] libceph: mon0 172.16.10.63:6789 socket closed
[118489.557216] libceph: mon0 172.16.10.63:6789 session lost, hunting for new mon
[118489.954784] libceph: mon0 172.16.10.63:6789 session established
[118500.474077] ceph: mds0 reconnect start
[118500.564930] ceph: mds0 reconnect success
[118547.953053] ceph: mds0 recovery completed
[119165.763499] npviewer.bin29917: segfault at 418 ip 00000000f60b4b66 sp 00000000ffe47768 error 6 in libflashplayer.so[f5e66000+b2c000]
[122995.993102] libceph: loaded (mon/osd proto 15/24, osdmap 5/5 5/5)
[123002.540923] ceph: loaded (mds proto 32)
[123061.089543] libceph: client4120 fsid 12d0d091-c0f9-10a3-d4d5-6e858826cdb6
[123061.089662] libceph: mon0 172.16.10.63:6789 session established
[123196.780685] libceph: tid 9 timed out on osd0, will reset osd
[123196.781112] libceph: tid 22 timed out on osd1, will reset osd
[123230.961895] ceph: mds0 caps stale
[123245.961887] ceph: mds0 caps stale
[123256.910053] libceph: tid 75 timed out on osd0, will reset osd
[123256.910409] libceph: tid 81 timed out on osd1, will reset osd
[123317.030666] libceph: tid 206 timed out on osd0, will reset osd
[123317.030955] libceph: tid 81 timed out on osd1, will reset osd
[123377.150044] libceph: tid 206 timed out on osd0, will reset osd
[123377.150331] libceph: tid 81 timed out on osd1, will reset osd
[123437.270668] libceph: tid 206 timed out on osd0, will reset osd
[123437.270957] libceph: tid 81 timed out on osd1, will reset osd
[123484.390044] INFO: task cosd:18013 blocked for more than 120 seconds.
[123484.443872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123484.510054] cosd D 0000000000000002 0 18013 1 0x00000000
[123484.510059] ffff8800780afc18 0000000000000086 ffff8800780afb88 0000000000015980
[123484.510063] ffff8800780affd8 0000000000015980 ffff8800780affd8 ffff880014a12dc0
[123484.510066] 0000000000015980 0000000000015980 ffff8800780affd8 0000000000015980
[123484.510069] Call Trace:
[123484.510076] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123484.510080] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123484.510082] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123484.510084] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123484.510087] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123484.510090] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123484.510093] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123484.510095] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123484.510098] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123484.510102] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123484.510104] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123484.510107] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123484.510110] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123484.510112] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123484.510115] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123484.510117] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123484.510121] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123484.510127] INFO: task cosd:18063 blocked for more than 120 seconds.
[123484.563950] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123484.630127] cosd D 0000000100bba75b 0 18063 1 0x00000000
[123484.630130] ffff8800783fbc18 0000000000000082 ffff880000000000 0000000000015980
[123484.630134] ffff8800783fbfd8 0000000000015980 ffff8800783fbfd8 ffff880051b28000
[123484.630137] 0000000000015980 0000000000015980 ffff8800783fbfd8 0000000000015980
[123484.630140] Call Trace:
[123484.630144] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123484.630146] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123484.630149] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123484.630151] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123484.630153] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123484.630156] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123484.630159] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123484.630162] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123484.630164] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123484.630168] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123484.630171] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123484.630173] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123484.630176] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123484.630178] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123484.630181] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123484.630183] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123484.630187] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123497.390665] libceph: tid 206 timed out on osd0, will reset osd
[123497.390954] libceph: tid 81 timed out on osd1, will reset osd
[123557.515370] libceph: tid 206 timed out on osd0, will reset osd
[123557.515626] libceph: tid 81 timed out on osd1, will reset osd
[123604.620049] INFO: task cosd:18013 blocked for more than 120 seconds.
[123604.673879] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123604.740066] cosd D 0000000000000002 0 18013 1 0x00000000
[123604.740070] ffff8800780afc18 0000000000000086 ffff8800780afb88 0000000000015980
[123604.740073] ffff8800780affd8 0000000000015980 ffff8800780affd8 ffff880014a12dc0
[123604.740076] 0000000000015980 0000000000015980 ffff8800780affd8 0000000000015980
[123604.740079] Call Trace:
[123604.740087] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123604.740091] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123604.740094] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123604.740096] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123604.740098] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123604.740102] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123604.740105] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123604.740107] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123604.740110] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123604.740114] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123604.740116] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123604.740119] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123604.740121] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123604.740124] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123604.740127] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123604.740129] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123604.740133] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123604.740138] INFO: task cosd:18063 blocked for more than 120 seconds.
[123604.793982] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123604.860197] cosd D 0000000100bba75b 0 18063 1 0x00000000
[123604.860200] ffff8800783fbc18 0000000000000082 ffff880000000000 0000000000015980
[123604.860204] ffff8800783fbfd8 0000000000015980 ffff8800783fbfd8 ffff880051b28000
[123604.860207] 0000000000015980 0000000000015980 ffff8800783fbfd8 0000000000015980
[123604.860210] Call Trace:
[123604.860214] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123604.860216] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123604.860218] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123604.860220] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123604.860223] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123604.860225] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123604.860228] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123604.860230] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123604.860233] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123604.860236] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123604.860239] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123604.860242] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123604.860244] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123604.860246] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123604.860248] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123604.860251] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123604.860253] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123617.630046] libceph: tid 206 timed out on osd0, will reset osd
[123617.630333] libceph: tid 81 timed out on osd1, will reset osd
[123677.760665] libceph: tid 206 timed out on osd0, will reset osd
[123677.760953] libceph: tid 81 timed out on osd1, will reset osd
[123724.860048] INFO: task cosd:18013 blocked for more than 120 seconds.
[123724.913869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123724.980045] cosd D 0000000000000002 0 18013 1 0x00000000
[123724.980049] ffff8800780afc18 0000000000000086 ffff8800780afb88 0000000000015980
[123724.980052] ffff8800780affd8 0000000000015980 ffff8800780affd8 ffff880014a12dc0
[123724.980056] 0000000000015980 0000000000015980 ffff8800780affd8 0000000000015980
[123724.980059] Call Trace:
[123724.980065] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123724.980069] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123724.980072] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123724.980074] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123724.980076] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123724.980079] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123724.980082] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123724.980085] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123724.980088] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123724.980091] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123724.980094] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123724.980096] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123724.980099] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123724.980101] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123724.980104] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123724.980106] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123724.980110] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123724.980116] INFO: task cosd:18063 blocked for more than 120 seconds.
[123725.033949] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123725.100154] cosd D 0000000100bba75b 0 18063 1 0x00000000
[123725.100158] ffff8800783fbc18 0000000000000082 ffff880000000000 0000000000015980
[123725.100161] ffff8800783fbfd8 0000000000015980 ffff8800783fbfd8 ffff880051b28000
[123725.100164] 0000000000015980 0000000000015980 ffff8800783fbfd8 0000000000015980
[123725.100167] Call Trace:
[123725.100171] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123725.100173] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123725.100176] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123725.100178] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123725.100180] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123725.100183] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123725.100185] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123725.100188] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123725.100191] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123725.100194] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123725.100197] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123725.100200] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123725.100202] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123725.100204] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123725.100207] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123725.100210] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123725.100213] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123737.870668] libceph: tid 206 timed out on osd0, will reset osd
[123737.870956] libceph: tid 81 timed out on osd1, will reset osd
[123797.990667] libceph: tid 206 timed out on osd0, will reset osd
[123797.990955] libceph: tid 81 timed out on osd1, will reset osd
[123845.100049] INFO: task cosd:18013 blocked for more than 120 seconds.
[123845.153892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123845.220093] cosd D 0000000000000002 0 18013 1 0x00000000
[123845.220098] ffff8800780afc18 0000000000000086 ffff8800780afb88 0000000000015980
[123845.220102] ffff8800780affd8 0000000000015980 ffff8800780affd8 ffff880014a12dc0
[123845.220105] 0000000000015980 0000000000015980 ffff8800780affd8 0000000000015980
[123845.220108] Call Trace:
[123845.220115] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123845.220119] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123845.220121] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123845.220124] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123845.220126] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123845.220130] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123845.220133] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123845.220135] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123845.220138] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123845.220142] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123845.220144] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123845.220147] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123845.220149] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123845.220152] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123845.220155] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123845.220157] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123845.220161] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123845.220168] INFO: task cosd:18063 blocked for more than 120 seconds.
[123845.274010] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123845.340222] cosd D 0000000100bba75b 0 18063 1 0x00000000
[123845.340228] ffff8800783fbc18 0000000000000082 ffff880000000000 0000000000015980
[123845.340231] ffff8800783fbfd8 0000000000015980 ffff8800783fbfd8 ffff880051b28000
[123845.340234] 0000000000015980 0000000000015980 ffff8800783fbfd8 0000000000015980
[123845.340238] Call Trace:
[123845.340246] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123845.340251] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123845.340254] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123845.340256] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123845.340259] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123845.340262] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123845.340266] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123845.340268] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123845.340271] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123845.340275] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123845.340278] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123845.340281] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123845.340283] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123845.340285] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123845.340288] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123845.340291] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123845.340295] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123858.110668] libceph: tid 206 timed out on osd0, will reset osd
[123858.110956] libceph: tid 81 timed out on osd1, will reset osd
[123918.230673] libceph: tid 206 timed out on osd0, will reset osd
[123918.230962] libceph: tid 81 timed out on osd1, will reset osd
[123965.340053] INFO: task cosd:18013 blocked for more than 120 seconds.
[123965.393886] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123965.460087] cosd D 0000000000000002 0 18013 1 0x00000000
[123965.460094] ffff8800780afc18 0000000000000086 ffff8800780afb88 0000000000015980
[123965.460098] ffff8800780affd8 0000000000015980 ffff8800780affd8 ffff880014a12dc0
[123965.460101] 0000000000015980 0000000000015980 ffff8800780affd8 0000000000015980
[123965.460104] Call Trace:
[123965.460112] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123965.460116] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123965.460119] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123965.460121] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123965.460123] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123965.460127] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123965.460130] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123965.460133] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123965.460135] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123965.460144] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123965.460147] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123965.460150] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123965.460152] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123965.460154] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123965.460157] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123965.460159] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123965.460163] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123965.460170] INFO: task cosd:18063 blocked for more than 120 seconds.
[123965.513993] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[123965.580202] cosd D 0000000100bba75b 0 18063 1 0x00000000
[123965.580207] ffff8800783fbc18 0000000000000082 ffff880000000000 0000000000015980
[123965.580210] ffff8800783fbfd8 0000000000015980 ffff8800783fbfd8 ffff880051b28000
[123965.580213] 0000000000015980 0000000000015980 ffff8800783fbfd8 0000000000015980
[123965.580216] Call Trace:
[123965.580223] [<ffffffff81101600>] ? sync_page+0x0/0x50
[123965.580226] [<ffffffff81587023>] io_schedule+0x73/0xc0
[123965.580228] [<ffffffff8110163d>] sync_page+0x3d/0x50
[123965.580230] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
[123965.580232] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
[123965.580236] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
[123965.580239] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
[123965.580241] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
[123965.580244] [<ffffffff81101d7b>] filemap_fdatawait+0x2b/0x30
[123965.580247] [<ffffffff81175a61>] wait_sb_inodes+0xa1/0x100
[123965.580250] [<ffffffff81175b51>] sync_inodes_sb+0x91/0xc0
[123965.580253] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
[123965.580255] [<ffffffff8117a648>] __sync_filesystem+0x88/0xa0
[123965.580257] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
[123965.580260] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
[123965.580262] [<ffffffff8117a6d5>] sys_sync+0x45/0x70
[123965.580266] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[123978.350667] libceph: tid 206 timed out on osd0, will reset osd
[123978.350961] libceph: tid 81 timed out on osd1, will reset osd
[124038.470668] libceph: tid 206 timed out on osd0, will reset osd
[124038.470956] libceph: tid 81 timed out on osd1, will reset osd
Updated by Sage Weil over 13 years ago
Are the cosd instances running? What does 'ceph -s' show? It looks like it's just blocking because the osds aren't reachable.
Updated by changping Wu over 13 years ago
Hi,
With Ceph 0.24 and the master-backport branch of ceph-client-standalone,
it is very easy to reproduce this issue with tiotest.
example.sh:
#!/bin/bash
#while echo `date`; do ./tiotest -f 400 -t 20 -d /dev/$1 -R -T -c -D 20 -r 2000;done
while echo `date`; do ./tiotest -f 20 -t 10 -d $1 -T -c -D 20 -r 1000;done
------------------
According to the logs, the cosd instances are running, but osd0 appears to be down.
The tiotest tool and logs are attached.
jeff@cephhost:/usr/local/etc/ceph$ sudo ceph -s
2011-01-04 10:15:46.238974 pg v94: 528 pgs: 528 active+clean; 395 MB data, 44324 MB used, 62595 MB / 110 GB avail
2011-01-04 10:15:46.239897 mds e5: 1/1/1 up {0=up:active}, 1 up:standby
2011-01-04 10:15:46.239923 osd e8: 2 osds: 1 up, 2 in
2011-01-04 10:15:46.239963 log 2011-01-04 10:12:35.032217 mon0 172.16.10.63:6789/0 9 : [INF] osd0 172.16.10.63:6802/9022 failed (by osd1 172.16.10.63:6805/9082)
2011-01-04 10:15:46.240003 mon e1: 1 mons at {0=172.16.10.63:6789/0}
jeff@cephhost:/usr/local/etc/ceph$ ps -ef | grep cosd
root 9023 1 3 10:03 ? 00:00:24 /usr/local/bin/cosd -i 0 -c ceph.conf
root 9083 1 3 10:03 ? 00:00:27 /usr/local/bin/cosd -i 1 -c ceph.conf
jeff 9575 3866 0 10:15 pts/1 00:00:00 grep --color=auto cosd
jeff@cephhost:/usr/local/etc/ceph$
jeff@cephhost:/usr/local/etc/ceph$ sudo ceph -s
2011-01-04 10:18:58.982886 pg v95: 528 pgs: 528 active+clean; 395 MB data, 22164 MB used, 31295 MB / 56320 MB avail
2011-01-04 10:18:58.983853 mds e5: 1/1/1 up {0=up:active}, 1 up:standby
2011-01-04 10:18:58.983879 osd e9: 2 osds: 1 up, 1 in
2011-01-04 10:18:58.983913 log 2011-01-04 10:17:36.663650 mon0 172.16.10.63:6789/0 10 : [INF] osd0 out (down for 301.311964)
2011-01-04 10:18:58.983949 mon e1: 1 mons at {0=172.16.10.63:6789/0}
======================================
jeff@cephhost:/usr/local/etc/ceph$ sudo tailf /var/log/messages
[sudo] password for jeff:
Jan 4 09:35:28 cephhost rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="851" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
Jan 4 10:03:54 cephhost kernel: [ 3220.198539] libceph: loaded (mon/osd proto 15/24, osdmap 5/5 5/5)
Jan 4 10:04:00 cephhost kernel: [ 3226.543170] ceph: loaded (mds proto 32)
Jan 4 10:05:25 cephhost kernel: [ 3311.217016] libceph: client4101 fsid 20a2bbea-d2b7-fc70-e375-43c84f817bc9
Jan 4 10:05:25 cephhost kernel: [ 3311.217122] libceph: mon0 172.16.10.63:6789 session established
Jan 4 10:12:35 cephhost kernel: [ 3741.050294] libceph: osd0 down
Jan 4 10:13:36 cephhost kernel: [ 3801.930665] libceph: tid 17654 timed out on osd1, will reset osd
Jan 4 10:14:18 cephhost kernel: [ 3844.374119] cosd D 0000000000000000 0 9042 1 0x00000000
Jan 4 10:14:18 cephhost kernel: [ 3844.374122] ffff88002f355ce8 0000000000000082 ffff88002f355cd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.374126] ffff88002f355fd8 0000000000015980 ffff88002f355fd8 ffff88002f345b80
Jan 4 10:14:18 cephhost kernel: [ 3844.374129] 0000000000015980 0000000000015980 ffff88002f355fd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.374132] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.374139] [<ffffffff8158760d>] schedule_timeout+0x22d/0x310
Jan 4 10:14:18 cephhost kernel: [ 3844.374143] [<ffffffff8104ed12>] ? enqueue_entity+0x132/0x1b0
Jan 4 10:14:18 cephhost kernel: [ 3844.374145] [<ffffffff8104f1d5>] ? enqueue_task_fair+0x45/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.374148] [<ffffffff81052bd7>] ? enqueue_task+0x77/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.374151] [<ffffffff815871eb>] wait_for_common+0xdb/0x180
Jan 4 10:14:18 cephhost kernel: [ 3844.374154] [<ffffffff81056770>] ? default_wake_function+0x0/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.374156] [<ffffffff8158736d>] wait_for_completion+0x1d/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.374160] [<ffffffff81175713>] writeback_inodes_sb+0xb3/0xe0
Jan 4 10:14:18 cephhost kernel: [ 3844.374162] [<ffffffff8117a660>] ? sync_one_sb+0x0/0x30
Jan 4 10:14:18 cephhost kernel: [ 3844.374165] [<ffffffff8117a60e>] __sync_filesystem+0x4e/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.374167] [<ffffffff8117a680>] sync_one_sb+0x20/0x30
Jan 4 10:14:18 cephhost kernel: [ 3844.374170] [<ffffffff81154f5b>] iterate_supers+0x8b/0xd0
Jan 4 10:14:18 cephhost kernel: [ 3844.374172] [<ffffffff8117a6bf>] sys_sync+0x2f/0x70
Jan 4 10:14:18 cephhost kernel: [ 3844.374176] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
Jan 4 10:14:18 cephhost kernel: [ 3844.388497] tiotest D 0000000000000002 0 9483 9381 0x00020000
Jan 4 10:14:18 cephhost kernel: [ 3844.388501] ffff88006175bcd8 0000000000000086 ffff88006175bc58 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.388505] ffff88006175bfd8 0000000000015980 ffff88006175bfd8 ffff880062f096e0
Jan 4 10:14:18 cephhost kernel: [ 3844.388510] 0000000000015980 0000000000015980 ffff88006175bfd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.388514] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.388518] [<ffffffff81101600>] ? sync_page+0x0/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.388521] [<ffffffff81587023>] io_schedule+0x73/0xc0
Jan 4 10:14:18 cephhost kernel: [ 3844.388523] [<ffffffff8110163d>] sync_page+0x3d/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.388526] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.388529] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.388532] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.388535] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.388538] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
Jan 4 10:14:18 cephhost kernel: [ 3844.388541] [<ffffffff81101e50>] filemap_write_and_wait_range+0x70/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.388544] [<ffffffff8117a39a>] vfs_fsync_range+0x5a/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.388547] [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.388550] [<ffffffff8117a48a>] do_fsync+0x3a/0x60
Jan 4 10:14:18 cephhost kernel: [ 3844.388553] [<ffffffff8117a4e0>] sys_fsync+0x10/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.388556] [<ffffffff810461a0>] sysenter_dispatch+0x7/0x2e
Jan 4 10:14:18 cephhost kernel: [ 3844.402880] tiotest D 0000000000000002 0 9484 9381 0x00020000
Jan 4 10:14:18 cephhost kernel: [ 3844.402884] ffff88005433fcd8 0000000000000086 ffff88005433fc58 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.402889] ffff88005433ffd8 0000000000015980 ffff88005433ffd8 ffff880062f0adc0
Jan 4 10:14:18 cephhost kernel: [ 3844.402893] 0000000000015980 0000000000015980 ffff88005433ffd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.402898] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.402901] [<ffffffff81101600>] ? sync_page+0x0/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.402904] [<ffffffff81587023>] io_schedule+0x73/0xc0
Jan 4 10:14:18 cephhost kernel: [ 3844.402906] [<ffffffff8110163d>] sync_page+0x3d/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.402909] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.402912] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.402914] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.402917] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.402920] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
Jan 4 10:14:18 cephhost kernel: [ 3844.402923] [<ffffffff81101e50>] filemap_write_and_wait_range+0x70/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.402926] [<ffffffff8117a39a>] vfs_fsync_range+0x5a/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.402930] [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.402932] [<ffffffff8117a48a>] do_fsync+0x3a/0x60
Jan 4 10:14:18 cephhost kernel: [ 3844.402935] [<ffffffff8117a4e0>] sys_fsync+0x10/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.402938] [<ffffffff810461a0>] sysenter_dispatch+0x7/0x2e
Jan 4 10:14:18 cephhost kernel: [ 3844.417262] tiotest D 000000010005386f 0 9485 9381 0x00020000
Jan 4 10:14:18 cephhost kernel: [ 3844.417266] ffff880047d39cd8 0000000000000086 ffff880047d39c48 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.417273] ffff880047d39fd8 0000000000015980 ffff880047d39fd8 ffff880062f0c4a0
Jan 4 10:14:18 cephhost kernel: [ 3844.417276] 0000000000015980 0000000000015980 ffff880047d39fd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.417279] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.417283] [<ffffffff81101600>] ? sync_page+0x0/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.417285] [<ffffffff81587023>] io_schedule+0x73/0xc0
Jan 4 10:14:18 cephhost kernel: [ 3844.417287] [<ffffffff8110163d>] sync_page+0x3d/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.417290] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.417292] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.417295] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.417298] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.417301] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
Jan 4 10:14:18 cephhost kernel: [ 3844.417304] [<ffffffff81101e50>] filemap_write_and_wait_range+0x70/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.417307] [<ffffffff8117a39a>] vfs_fsync_range+0x5a/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.417310] [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.417313] [<ffffffff8117a48a>] do_fsync+0x3a/0x60
Jan 4 10:14:18 cephhost kernel: [ 3844.417316] [<ffffffff8117a4e0>] sys_fsync+0x10/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.417319] [<ffffffff810461a0>] sysenter_dispatch+0x7/0x2e
Jan 4 10:14:18 cephhost kernel: [ 3844.431643] tiotest D 0000000000000002 0 9486 9381 0x00020000
Jan 4 10:14:18 cephhost kernel: [ 3844.431647] ffff880047d5bcd8 0000000000000086 ffff880047d5bc58 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.431652] ffff880047d5bfd8 0000000000015980 ffff880047d5bfd8 ffff880062f08000
Jan 4 10:14:18 cephhost kernel: [ 3844.431656] 0000000000015980 0000000000015980 ffff880047d5bfd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.431660] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.431665] [<ffffffff81101600>] ? sync_page+0x0/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.431669] [<ffffffff81587023>] io_schedule+0x73/0xc0
Jan 4 10:14:18 cephhost kernel: [ 3844.431671] [<ffffffff8110163d>] sync_page+0x3d/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.431674] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.431676] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.431680] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.431683] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.431686] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
Jan 4 10:14:18 cephhost kernel: [ 3844.431689] [<ffffffff81101e50>] filemap_write_and_wait_range+0x70/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.431692] [<ffffffff8117a39a>] vfs_fsync_range+0x5a/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.431695] [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.431698] [<ffffffff8117a48a>] do_fsync+0x3a/0x60
Jan 4 10:14:18 cephhost kernel: [ 3844.431701] [<ffffffff8117a4e0>] sys_fsync+0x10/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.431705] [<ffffffff810461a0>] sysenter_dispatch+0x7/0x2e
Jan 4 10:14:18 cephhost kernel: [ 3844.446015] tiotest D 0000000000000002 0 9487 9381 0x00020000
Jan 4 10:14:18 cephhost kernel: [ 3844.446019] ffff880047c73cd8 0000000000000086 ffff880047c73c58 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.446024] ffff880047c73fd8 0000000000015980 ffff880047c73fd8 ffff880062f0db80
Jan 4 10:14:18 cephhost kernel: [ 3844.446028] 0000000000015980 0000000000015980 ffff880047c73fd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.446032] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.446035] [<ffffffff81101600>] ? sync_page+0x0/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.446038] [<ffffffff81587023>] io_schedule+0x73/0xc0
Jan 4 10:14:18 cephhost kernel: [ 3844.446041] [<ffffffff8110163d>] sync_page+0x3d/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.446043] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.446046] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.446049] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.446052] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.446055] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
Jan 4 10:14:18 cephhost kernel: [ 3844.446058] [<ffffffff81101e50>] filemap_write_and_wait_range+0x70/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.446061] [<ffffffff8117a39a>] vfs_fsync_range+0x5a/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.446064] [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.446067] [<ffffffff8117a48a>] do_fsync+0x3a/0x60
Jan 4 10:14:18 cephhost kernel: [ 3844.446070] [<ffffffff8117a4e0>] sys_fsync+0x10/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.446073] [<ffffffff810461a0>] sysenter_dispatch+0x7/0x2e
Jan 4 10:14:18 cephhost kernel: [ 3844.460366] tiotest D 0000000000000002 0 9488 9381 0x00020000
Jan 4 10:14:18 cephhost kernel: [ 3844.460369] ffff880047ca9cd8 0000000000000086 ffff880047ca9c58 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.460372] ffff880047ca9fd8 0000000000015980 ffff880047ca9fd8 ffff88006e4f2dc0
Jan 4 10:14:18 cephhost kernel: [ 3844.460375] 0000000000015980 0000000000015980 ffff880047ca9fd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.460378] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.460381] [<ffffffff81101600>] ? sync_page+0x0/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.460383] [<ffffffff81587023>] io_schedule+0x73/0xc0
Jan 4 10:14:18 cephhost kernel: [ 3844.460385] [<ffffffff8110163d>] sync_page+0x3d/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.460388] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.460390] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.460392] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.460395] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.460398] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
Jan 4 10:14:18 cephhost kernel: [ 3844.460401] [<ffffffff81101e50>] filemap_write_and_wait_range+0x70/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.460403] [<ffffffff8117a39a>] vfs_fsync_range+0x5a/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.460406] [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.460408] [<ffffffff8117a48a>] do_fsync+0x3a/0x60
Jan 4 10:14:18 cephhost kernel: [ 3844.460411] [<ffffffff8117a4e0>] sys_fsync+0x10/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.460413] [<ffffffff810461a0>] sysenter_dispatch+0x7/0x2e
Jan 4 10:14:18 cephhost kernel: [ 3844.474782] tiotest D 00000001000533a8 0 9489 9381 0x00020000
Jan 4 10:14:18 cephhost kernel: [ 3844.474788] ffff880047cfdcd8 0000000000000086 ffff880047cfdc48 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.474791] ffff880047cfdfd8 0000000000015980 ffff880047cfdfd8 ffff88006e4f5b80
Jan 4 10:14:18 cephhost kernel: [ 3844.474794] 0000000000015980 0000000000015980 ffff880047cfdfd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.474797] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.474804] [<ffffffff81101600>] ? sync_page+0x0/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.474808] [<ffffffff81587023>] io_schedule+0x73/0xc0
Jan 4 10:14:18 cephhost kernel: [ 3844.474810] [<ffffffff8110163d>] sync_page+0x3d/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.474812] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.474814] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.474818] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.474821] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.474823] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
Jan 4 10:14:18 cephhost kernel: [ 3844.474826] [<ffffffff81101e50>] filemap_write_and_wait_range+0x70/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.474831] [<ffffffff8117a39a>] vfs_fsync_range+0x5a/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.474834] [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.474836] [<ffffffff8117a48a>] do_fsync+0x3a/0x60
Jan 4 10:14:18 cephhost kernel: [ 3844.474838] [<ffffffff8117a4e0>] sys_fsync+0x10/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.474844] [<ffffffff810461a0>] sysenter_dispatch+0x7/0x2e
Jan 4 10:14:18 cephhost kernel: [ 3844.489160] tiotest D 00000001000531f5 0 9490 9381 0x00020000
Jan 4 10:14:18 cephhost kernel: [ 3844.489163] ffff880047c4dcd8 0000000000000086 ffff880000000000 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.489166] ffff880047c4dfd8 0000000000015980 ffff880047c4dfd8 ffff88006e4f44a0
Jan 4 10:14:18 cephhost kernel: [ 3844.489169] 0000000000015980 0000000000015980 ffff880047c4dfd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.489172] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.489177] [<ffffffff81101600>] ? sync_page+0x0/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.489180] [<ffffffff81587023>] io_schedule+0x73/0xc0
Jan 4 10:14:18 cephhost kernel: [ 3844.489182] [<ffffffff8110163d>] sync_page+0x3d/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.489184] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.489187] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.489189] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.489192] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.489195] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
Jan 4 10:14:18 cephhost kernel: [ 3844.489198] [<ffffffff81101e50>] filemap_write_and_wait_range+0x70/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.489201] [<ffffffff8117a39a>] vfs_fsync_range+0x5a/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.489204] [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.489206] [<ffffffff8117a48a>] do_fsync+0x3a/0x60
Jan 4 10:14:18 cephhost kernel: [ 3844.489208] [<ffffffff8117a4e0>] sys_fsync+0x10/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.489212] [<ffffffff810461a0>] sysenter_dispatch+0x7/0x2e
Jan 4 10:14:18 cephhost kernel: [ 3844.503653] tiotest D 0000000100053418 0 9491 9381 0x00020000
Jan 4 10:14:18 cephhost kernel: [ 3844.503656] ffff8800542c3cd8 0000000000000086 ffff880000000000 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.503660] ffff8800542c3fd8 0000000000015980 ffff8800542c3fd8 ffff8800020216e0
Jan 4 10:14:18 cephhost kernel: [ 3844.503662] 0000000000015980 0000000000015980 ffff8800542c3fd8 0000000000015980
Jan 4 10:14:18 cephhost kernel: [ 3844.503665] Call Trace:
Jan 4 10:14:18 cephhost kernel: [ 3844.503669] [<ffffffff81101600>] ? sync_page+0x0/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.503672] [<ffffffff81587023>] io_schedule+0x73/0xc0
Jan 4 10:14:18 cephhost kernel: [ 3844.503674] [<ffffffff8110163d>] sync_page+0x3d/0x50
Jan 4 10:14:18 cephhost kernel: [ 3844.503676] [<ffffffff815878ff>] __wait_on_bit+0x5f/0x90
Jan 4 10:14:18 cephhost kernel: [ 3844.503679] [<ffffffff811017f3>] wait_on_page_bit+0x73/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.503681] [<ffffffff8107f650>] ? wake_bit_function+0x0/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.503684] [<ffffffff8110bee5>] ? pagevec_lookup_tag+0x25/0x40
Jan 4 10:14:18 cephhost kernel: [ 3844.503686] [<ffffffff81101cbd>] filemap_fdatawait_range+0x10d/0x1a0
Jan 4 10:14:18 cephhost kernel: [ 3844.503689] [<ffffffff81101e50>] filemap_write_and_wait_range+0x70/0x80
Jan 4 10:14:18 cephhost kernel: [ 3844.503692] [<ffffffff8117a39a>] vfs_fsync_range+0x5a/0xa0
Jan 4 10:14:18 cephhost kernel: [ 3844.503695] [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.503697] [<ffffffff8117a48a>] do_fsync+0x3a/0x60
Jan 4 10:14:18 cephhost kernel: [ 3844.503699] [<ffffffff8117a4e0>] sys_fsync+0x10/0x20
Jan 4 10:14:18 cephhost kernel: [ 3844.503702] [<ffffffff810461a0>] sysenter_dispatch+0x7/0x2e
Jan 4 10:14:36 cephhost kernel: [ 3862.052078] libceph: tid 17654 timed out on osd1, will reset osd
Updated by changping Wu over 13 years ago
Hi, I tried some code changes that reduce the timeout issue.
For instance:
mount.ceph monip:6789:/ /mnt/ceph
1. Reduce the tiotest I/O stress.
example:
-----------
#!/bin/bash
while echo `date`; do ./tiotest -f 1 -t 32 -d /mnt/ceph -T -D -r 10; done
--------------
2.
tiotest reads/writes random data, so performance is poor. After reading/writing and syncing 3000 MB of data to the OSDs, the 5-second interval from "#define CEPH_OSD_KEEPALIVE_DEFAULT 5", used as the OSD timeout schedule time, is not enough; we can lengthen the timeout schedule time with "#define CEPH_OSD_TIMEOUT_DEFAULT 60".
3. In handle_timeout:
1. OSD keepalive: sometimes the kernel printks "socket closed", so I have to keep the OSD connection alive before resending timed-out requests.
2. Handle the req->r_resend requests;
3. If the OSD isn't up or its address has changed, reset the OSD and send the timed-out request;
4. If that still fails, reset the OSD and resend the request;
5. If it still fails, abort any requests on that OSD, to make sure we don't printk an OOPS.
I also wonder: if the timer is kept per OSD, how do we make sure every request has been handled completely? It may lose some requests.
Could we instead attach a timer to each request?
Jeff Wu
Updated by Colin McCabe about 13 years ago
- Assignee changed from Colin McCabe to Samuel Just
Updated by Sage Weil about 13 years ago
- Status changed from New to Can't reproduce