Bug #674 (closed)

tiobench stress test, OSD timeout

Added by changping Wu over 13 years ago. Updated about 13 years ago.

Status: Can't reproduce
Priority: Normal
Category: OSD
% Done: 0%

Description

Hi,
we ran a multi-threaded stress test against Ceph 0.23.1, and the Ceph client logged (printk) OSD timeouts.

1. Test tool: tiobench-0.3.3; test command:
./tiobench-0.3.3/tiotest -t 4 -f 4096 -r 1000 -b 131072 -d /mnt/ceph -T

2. Ceph server version: 0.23-1
3. Ceph client: from git, git://ceph.newdeam.net/git/ceph-client-standalone.git, branch unstable-backport
4. Ceph server host OS: Ubuntu 10.04 Server x86_64, kernel 2.6.32-21-server
5. Ceph client host OS: Ubuntu 10.04 Server x86_64, kernel 2.6.32-21-server
6. Ceph configuration: ceph-none; two OSD servers (osd0 and osd1), one MDS server and one MON server

===========================================
1. tiobench log:

===================================================================================

Run #1: ./tiobench-0.3.3/tiotest -t 4 -f 4096 -r 1000 -b 131072 -d /mnt/ceph -T

Unit information
================
File size = megabytes
Blk Size = bytes
Rate = megabytes per second
CPU% = percentage of CPU used during the test
Latency = milliseconds
Lat% = percent of requests that took longer than X seconds
CPU Eff = Rate divided by CPU% - throughput per cpu load

                  File   Blk    Num                 Avg      Maximum   Lat%     Lat%     CPU
Identifier        Size   Size   Thr  Rate   (CPU%)  Latency  Latency   >2s      >10s     Eff
----------------  -----  -----  ---  -----  ------  -------  --------  -------  -------  ---

Sequential Reads
2.6.32-21-server  16384  4096   4    66.62  18.81%  0.227    251.11    0.00000  0.00000  354
2.6.32-21-server  16384  8192   4    65.77  16.16%  0.452    458.29    0.00000  0.00000  407
2.6.32-21-server  16384  16384  4    63.92  16.11%  0.968    559.74    0.00000  0.00000  397
2.6.32-21-server  16384  32768  4    64.61  14.39%  1.815    1046.43   0.00000  0.00000  449
2.6.32-21-server  16384  65536  4    66.26  15.32%  3.765    837.34    0.00000  0.00000  432
2.6.32-21-server  16384  13107  4    64.53  14.43%  7.566    1349.89   0.00000  0.00000  447

Random Reads
2.6.32-21-server  16384  4096   4    0.75   1.108%  20.405   106.68    0.00000  0.00000  68
2.6.32-21-server  16384  8192   4    1.66   0.797%  17.845   109.55    0.00000  0.00000  208
2.6.32-21-server  16384  16384  4    2.90   1.812%  20.977   333.91    0.00000  0.00000  160
2.6.32-21-server  16384  32768  4    6.25   1.849%  18.955   279.49    0.00000  0.00000  338
2.6.32-21-server  16384  65536  4    11.87  3.702%  20.782   439.90    0.00000  0.00000  321
2.6.32-21-server  16384  13107  4    22.14  6.199%  22.256   367.77    0.00000  0.00000  357

Sequential Writes
2.6.32-21-server  16384  4096   4    17.18  7.038%  0.862    14455.88  0.01473  0.00026  244
2.6.32-21-server  16384  8192   4    16.09  5.656%  1.831    18047.37  0.02990  0.00076  284
2.6.32-21-server  16384  16384  4    16.19  5.111%  3.671    13165.22  0.06485  0.00057  317
2.6.32-21-server  16384  32768  4    13.84  4.163%  8.497    24246.45  0.14687  0.00782  332
2.6.32-21-server  16384  65536  4    13.27  3.808%  17.584   24053.48  0.29259  0.01717  348
2.6.32-21-server  16384  13107  4    13.45  3.814%  35.383   23944.59  0.62485  0.03281  353

Random Writes
2.6.32-21-server  16384  4096   4    0.60   0.880%  0.220    330.88    0.00000  0.00000  68
2.6.32-21-server  16384  8192   4    1.00   0.669%  0.945    3338.01   0.02500  0.00000  149
2.6.32-21-server  16384  16384  4    1.41   0.699%  0.853    2497.44   0.02500  0.00000  202
2.6.32-21-server  16384  32768  4    2.61   0.814%  0.264    493.69    0.00000  0.00000  321
2.6.32-21-server  16384  65536  4    0.77   0.353%  1.176    1520.99   0.00000  0.00000  217
2.6.32-21-server  16384  13107  4    1.44   0.610%  0.838    1001.35   0.00000  0.00000  236
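The unit list defines CPU Eff as Rate divided by CPU%. Treating CPU% as a fraction and rounding, the reported CPU Eff values can be reproduced from the Rate and CPU% columns, which is a quick sanity check on the table. The Python sketch below checks a few rows; the function and variable names are illustrative, and the rounding rule is inferred from the data (for example 65.77 / 0.1616 is about 406.99 and is reported as 407), not taken from tiobench's source.

```python
# Recompute tiobench's "CPU Eff" column from rows of the table above.
# Assumption (inferred from the data): CPU Eff = Rate / (CPU% / 100),
# rounded to the nearest integer.


def cpu_eff(rate_mb_s: float, cpu_percent: float) -> int:
    """Throughput per unit of CPU load (MB/s per 100% CPU)."""
    return round(rate_mb_s / (cpu_percent / 100.0))


# (section, block size in bytes, Rate in MB/s, CPU%, CPU Eff as reported)
ROWS = [
    ("Sequential Reads", 4096, 66.62, 18.81, 354),
    ("Sequential Reads", 8192, 65.77, 16.16, 407),
    ("Random Reads", 4096, 0.75, 1.108, 68),
    ("Sequential Writes", 4096, 17.18, 7.038, 244),
    ("Random Writes", 4096, 0.60, 0.880, 68),
]

if __name__ == "__main__":
    for section, blk, rate, cpu, reported in ROWS:
        print(f"{section:<18} {blk:>6} computed={cpu_eff(rate, cpu)} reported={reported}")
```

For every row listed, the computed value matches the reported one.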

2. dmesg log:

========================================================

..............................................
..............................................
[75004.800179] INFO: task tiotest:2337 blocked for more than 120 seconds.
[75004.800213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[75004.800240] tiotest D 00000000ffffffff 0 2337 2180 0x00000008
[75004.800243] ffff880116257cc8 0000000000000082 0000000000015bc0 0000000000015bc0
[75004.800246] ffff880115825f80 ffff880116257fd8 0000000000015bc0 ffff880115825bc0
[75004.800249] 0000000000015bc0 ffff880116257fd8 0000000000015bc0 ffff880115825f80
[75004.800252] Call Trace:
[75004.800254] [<ffffffff810f3460>] ? sync_page+0x0/0x50
[75004.800257] [<ffffffff815555c7>] io_schedule+0x47/0x70
[75004.800259] [<ffffffff810f349d>] sync_page+0x3d/0x50
[75004.800262] [<ffffffff81555bef>] __wait_on_bit+0x5f/0x90
[75004.800264] [<ffffffff810f3653>] wait_on_page_bit+0x73/0x80
[75004.800267] [<ffffffff81084fe0>] ? wake_bit_function+0x0/0x40
[75004.800269] [<ffffffff810fd9f5>] ? pagevec_lookup_tag+0x25/0x40
[75004.800272] [<ffffffff810f3ae5>] wait_on_page_writeback_range+0xf5/0x190
[75004.800275] [<ffffffff810f3cb8>] filemap_write_and_wait_range+0x78/0x90
[75004.800277] [<ffffffff8116a62e>] vfs_fsync_range+0x7e/0xe0
[75004.800280] [<ffffffff8116a6fd>] vfs_fsync+0x1d/0x20
[75004.800282] [<ffffffff8116a73e>] do_fsync+0x3e/0x60
[75004.800284] [<ffffffff8116a790>] sys_fsync+0x10/0x20
[75004.800286] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b

[75058.860051] libceph: tid 232244 timed out on osd0, will reset osd
[75058.861555] libceph: tid 232902 timed out on osd1, will reset osd
[75118.860035] libceph: tid 232596 timed out on osd0, will reset osd
[75118.860925] libceph: tid 233897 timed out on osd1, will reset osd
[75178.860021] libceph: tid 233467 timed out on osd0, will reset osd
[75846.350031] libceph: tid 280625 timed out on osd0, will reset osd
[75911.350030] libceph: tid 280786 timed out on osd0, will reset osd
[75986.350025] libceph: tid 281143 timed out on osd0, will reset osd
[76046.350037] libceph: tid 281253 timed out on osd0, will reset osd
[76106.350039] libceph: tid 281459 timed out on osd0, will reset osd
[76166.350038] libceph: tid 281661 timed out on osd0, will reset osd
[76226.350038] libceph: tid 281859 timed out on osd0, will reset osd
[76286.360026] libceph: tid 282082 timed out on osd0, will reset osd
[76346.360029] libceph: tid 282286 timed out on osd0, will reset osd
[76406.360025] libceph: tid 282478 timed out on osd0, will reset osd
[76466.360033] libceph: tid 282702 timed out on osd0, will reset osd
[76526.360029] libceph: tid 282899 timed out on osd0, will reset osd
[76586.360035] libceph: tid 283090 timed out on osd0, will reset osd
[76646.360033] libceph: tid 283304 timed out on osd0, will reset osd
[76754.770199] libceph: tid 283819 timed out on osd0, will reset osd
[76754.773797] libceph: tid 284035 timed out on osd1, will reset osd
[76814.770080] libceph: tid 284624 timed out on osd0, will reset osd
[76814.772596] libceph: tid 285401 timed out on osd1, will reset osd
[76874.770052] libceph: tid 285191 timed out on osd0, will reset osd
[76874.771749] libceph: tid 286271 timed out on osd1, will reset osd
[76934.770035] libceph: tid 285730 timed out on osd0, will reset osd
[76934.771057] libceph: tid 287155 timed out on osd1, will reset osd
[76994.770023] libceph: tid 286576 timed out on osd0, will reset osd
[77612.270030] libceph: tid 328771 timed out on osd0, will reset osd
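The libceph timeout messages repeat for tens of minutes, mostly on osd0, and many consecutive messages for the same OSD are roughly 60 s apart. To summarize such a log per OSD (message count and spacing between messages), a small Python sketch like the one below could be run over saved dmesg output. The regular expression, the `summarize_timeouts` helper and the `SAMPLE` text are illustrative assumptions that only match the message format shown in this log; the roughly 60 s spacing is read off the timestamps, not derived from any libceph setting.

```python
import re
from collections import defaultdict

# Hypothetical pattern for libceph lines such as:
#   [75058.860051] libceph: tid 232244 timed out on osd0, will reset osd
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+libceph: tid (?P<tid>\d+) timed out on (?P<osd>osd\d+)"
)


def summarize_timeouts(lines):
    """Return {osd: (count, gaps)}, where gaps are the seconds between
    consecutive timeout messages for that OSD, rounded to 0.1 s."""
    stamps = defaultdict(list)
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            stamps[m.group("osd")].append(float(m.group("ts")))
    summary = {}
    for osd, ts in stamps.items():
        gaps = [round(later - earlier, 1) for earlier, later in zip(ts, ts[1:])]
        summary[osd] = (len(ts), gaps)
    return summary


# First five timeout lines from the dmesg log above.
SAMPLE = """\
[75058.860051] libceph: tid 232244 timed out on osd0, will reset osd
[75058.861555] libceph: tid 232902 timed out on osd1, will reset osd
[75118.860035] libceph: tid 232596 timed out on osd0, will reset osd
[75118.860925] libceph: tid 233897 timed out on osd1, will reset osd
[75178.860021] libceph: tid 233467 timed out on osd0, will reset osd
"""

if __name__ == "__main__":
    for osd, (count, gaps) in sorted(summarize_timeouts(SAMPLE.splitlines()).items()):
        print(osd, count, gaps)
```

On the five-line sample it prints osd0 3 [60.0, 60.0] and osd1 2 [60.0]. For the full log, pass the lines of the saved dmesg output instead of SAMPLE.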

===========

The test tool is attached. Example test command: ./tiobench.sh /mnt/ceph 4096
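In the dmesg log, tiotest is blocked in fsync(2) (sys_fsync → vfs_fsync → filemap_write_and_wait_range → wait_on_page_writeback_range) for more than 120 seconds while OSD requests time out. To check whether plain fsync calls on the Ceph mount stall without running tiobench, a minimal Python sketch like the following could be used. It is not part of the attached tool: the script name (a hypothetical fsync_probe.py), the `time_fsync` helper, the probe-file prefix and the default sizes (8 blocks of 131072 bytes, chosen to match `-b 131072` in the test command) are all illustrative assumptions.

```python
import os
import sys
import tempfile
import time


def time_fsync(directory: str, block_size: int = 131072, blocks: int = 8) -> float:
    """Write `blocks` blocks of `block_size` bytes to a fresh file in
    `directory`, then return how long the fsync() call took, in seconds."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        with os.fdopen(fd, "wb") as f:
            buf = b"\0" * block_size
            for _ in range(blocks):
                f.write(buf)
            f.flush()  # move Python's buffer into the kernel page cache first
            start = time.monotonic()
            os.fsync(f.fileno())  # blocks until the dirty pages are written back
            return time.monotonic() - start
    finally:
        os.unlink(path)  # remove the probe file even if fsync fails


if __name__ == "__main__":
    # e.g. python3 fsync_probe.py /mnt/ceph
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(f"fsync took {time_fsync(target):.3f} s")
```

Running it in a loop against /mnt/ceph while the OSD timeouts appear would show whether individual fsync calls take anywhere near the 120 s hung-task threshold.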


Files

tiobench-0.3.3.tgz (827 KB), changping Wu, 12/27/2010 01:45 AM
tiotest_osd_timeout.txt (8.62 KB), changping Wu, 12/27/2010 01:45 AM
tiotest (32.6 KB), changping Wu, 01/03/2011 06:09 PM