Bug #13134

old leveldb is slow (firefly->hammer upgrade leads to slow requests)

Added by Corin Langosch over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Yesterday I upgraded a firefly cluster (started with argonaut several years ago) to hammer 0.94.3. The cluster was completely healthy and seemed to work fine.

During the night at 2015-09-17 04:51:14 osd.0 suddenly died. The log is attached. I only noticed it this morning, when recovery was already finished. But ceph still reported "284 requests are blocked > 32 sec":

ceph -w
    cluster 4ac0e21b-6ea2-4ac7-8114-122bd9ba55d6
     health HEALTH_WARN
            284 requests are blocked > 32 sec
     monmap e5: 3 mons at {a=10.0.0.5:6789/0,b=10.0.0.6:6789/0,c=10.0.0.7:6789/0}
            election epoch 798, quorum 0,1,2 a,b,c
     osdmap e21073: 19 osds: 9 up, 9 in
      pgmap v73120784: 4096 pgs, 1 pools, 3183 GB data, 797 kobjects
            6373 GB used, 2077 GB / 8451 GB avail
                4096 active+clean
  client io 29236 kB/s rd, 4152 kB/s wr, 1647 op/s

ceph health detail
HEALTH_WARN 284 requests are blocked > 32 sec; 1 osds have slow requests
284 ops are blocked > 4194.3 sec
284 ops are blocked > 4194.3 sec on osd.9
1 osds have slow requests
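
(4194.3 sec is roughly 70 minutes, so by this point these ops had already been blocked for well over an hour.)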

Looking at osd.9, everything seemed normal: cpu, memory, iostat (0% - 5% util). I waited a few more minutes but nothing changed. A few qemu machines were unresponsive during that period. As the cluster was healthy except for that, I decided to simply restart osd.9 - and yes, that solved the problem: the cluster recovered to a healthy state within a few seconds. Logs of osd.9 are attached too. Now, around 1h later, the cluster is still HEALTHY.

Looking at the logs, I suspect there's a locking bug in the Ceph filestore backend, or possibly just a "tuning bug" in my Ceph config?

Configuration:
  • hammer 0.94.3
  • full ssd, no spinning disks
  • 19 osds in total, but normally only 10 up/in (currently 9 because of the one that died this night)
  • only used for qemu vms (around 200 and growing)
  • no errors in syslog/dmesg, iostat looks good
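
Here is the OSD tree ("ceph osd tree"):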
ID  WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY 
 -4 10.00000 root ssd                                               
 -2  2.00000     host r-ch102-ssd                                   
  0  1.00000         osd.0           down        0          1.00000 
 15  1.00000         osd.15            up  1.00000          1.00000 
 -5  2.00000     host r-ch103-ssd                                   
  3  1.00000         osd.3             up  1.00000          1.00000 
 16  1.00000         osd.16            up  1.00000          1.00000 
 -6  2.00000     host r-ch104-ssd                                   
  6  1.00000         osd.6             up  1.00000          1.00000 
 17  1.00000         osd.17            up  1.00000          1.00000 
 -7  2.00000     host r-ch105-ssd                                   
  9  1.00000         osd.9             up  1.00000          1.00000 
 18  1.00000         osd.18            up  1.00000          1.00000 
 -8  2.00000     host r-ch106-ssd                                   
 12  1.00000         osd.12            up  1.00000          1.00000 
 14  1.00000         osd.14            up  1.00000          1.00000 
 -1 26.09995 root hdd                                               
 -3  5.79999     host r-ch102-hdd                                   
  1  2.89999         osd.1           down        0          1.00000 
  2  2.89999         osd.2           down        0          1.00000 
 -9  5.79999     host r-ch103-hdd                                   
  4  2.89999         osd.4           down        0          1.00000 
  5  2.89999         osd.5           down        0          1.00000 
-10  5.79999     host r-ch104-hdd                                   
  7  2.89999         osd.7           down        0          1.00000 
  8  2.89999         osd.8           down        0          1.00000 
-11  5.79999     host r-ch105-hdd                                   
 10  2.89999         osd.10          down        0          1.00000 
 11  2.89999         osd.11          down        0          1.00000 
-12  2.89999     host r-ch106-hdd                                   
 13  2.89999         osd.13          down        0          1.00000 
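
And here is my ceph.conf: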
[global]
  max open files = 65536
  auth cluster required = cephx
  auth service required = cephx
  auth client required = cephx
  cephx require signatures = true
  public network = 10.0.0.0/24
  cluster network = 10.0.0.0/24
  debug mon = 0
  debug paxos = 0
  mon osd down out interval = 3600

[client]
  rbd cache = true
  rbd cache size = 33554432
  rbd cache max dirty = 25165824
  rbd cache target dirty = 16777216
  rbd cache max dirty age = 3
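  # (the size values above are bytes: 32 MiB cache, 24 MiB max dirty, 16 MiB target dirty)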

[mon]
  osd pool default flag hashpspool = true
  mon data avail warn = 15
  mon data avail crit = 5

[osd]
  osd journal size = 1000
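  # journal size is in MB, so 1000 = ~1 GB journal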
  osd journal dio = true
  osd journal aio = true
  osd op threads = 8
  osd min pg log entries = 250
  osd max pg log entries = 1000
  osd crush update on start = false
  filestore op threads = 16
  filestore max sync interval = 10
  filestore min sync interval = 3

[mon.a]
  host = r-ch103
  mon addr = 10.0.0.5:6789

[mon.b]
  host = r-ch104
  mon addr = 10.0.0.6:6789

[mon.c]
  host = r-ch105
  mon addr = 10.0.0.7:6789

[osd.0]
  host = r-ch102

[osd.1]
  host = r-ch102

[osd.2]
  host = r-ch102

[osd.3]
  host = r-ch103

[osd.4]
  host = r-ch103

[osd.5]
  host = r-ch103

[osd.6]
  host = r-ch104

[osd.7]
  host = r-ch104

[osd.8]
  host = r-ch104

[osd.9]
  host = r-ch105

[osd.10]
  host = r-ch105

[osd.11]
  host = r-ch105

[osd.12]
  host = r-ch106

[osd.13]
  host = r-ch106

[osd.14]
  host = r-ch106

[osd.15]
  host = r-ch102

[osd.16]
  host = r-ch103

[osd.17]
  host = r-ch104

[osd.18]
  host = r-ch105

Sidenotes: Upgrading pgs on old osds took very long (20 minutes per osd), while upgrading pgs on newer osds was very fast (15s per osd). IO on the particular OSD's disk was idle during that time, but the osd process constantly used 100% cpu. After everything was healthy again, I restarted again, and this time the osd booted fast. So I think it had to clean up a lot of old garbage during the upgrade, and in the end everything was fine.
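
If old leveldb garbage is indeed the culprit, here is a minimal sketch of forcing a compaction of an OSD's omap store, assuming hammer's FileStore honors the generic "leveldb compact on mount" option (I haven't verified the option name against 0.94.x, so treat it as an assumption):

# assumed option name; verify against the installed hammer build
# add to the [osd] section of ceph.conf, then restart the affected osd:
[osd]
  leveldb compact on mount = true

restart ceph-osd id=3    # Ubuntu upstart; use your init system's equivalent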

Firefly was running really stable, and I'm now quite concerned there's something wrong with my cluster:
  • why did osd.0 suddenly die?
  • why do blocked requests exist even though system resources are idle?

Files

logs.tar.gz (791 KB) - Corin Langosch, 09/17/2015 06:58 AM
#1

Updated by Corin Langosch over 8 years ago

It seems I found a way to reliably reproduce the blocked requests. A simple "rbd info" on this image always takes around 43 seconds, while "rbd info" on any other image (of the same pool; I only have 1 pool) takes only a few milliseconds. All osds are almost idle and the cluster is completely healthy, but during this particular "rbd info" the cluster reports slow operations...
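
(For timing a single run, the shell builtin works too: "time rbd info <image>" captures the same latency in one command.)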

~# date; rbd info 3e59ad5a-6bcb-4679-9f92-1f7c107f7f40; date
Thu Sep 17 10:28:01 CEST 2015
rbd image '3e59ad5a-6bcb-4679-9f92-1f7c107f7f40':
    size 10240 MB in 2560 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.193202eb141f2
    format: 2
    features: layering, striping
    flags: 
    stripe unit: 4096 kB
    stripe count: 1
Thu Sep 17 10:28:44 CEST 2015
root@r-ch106:~# date; ceph -w
Thu Sep 17 10:27:56 CEST 2015
    cluster 4ac0e21b-6ea2-4ac7-8114-122bd9ba55d6
     health HEALTH_OK
     monmap e5: 3 mons at {a=10.0.0.5:6789/0,b=10.0.0.6:6789/0,c=10.0.0.7:6789/0}
            election epoch 804, quorum 0,1,2 a,b,c
     osdmap e21077: 19 osds: 9 up, 9 in
      pgmap v73125278: 4096 pgs, 1 pools, 3183 GB data, 797 kobjects
            6373 GB used, 2077 GB / 8451 GB avail
                4096 active+clean
  client io 34482 kB/s rd, 2483 kB/s wr, 588 op/s

2015-09-17 10:27:53.386983 mon.0 [INF] pgmap v73125277: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 72520 kB/s rd, 2939 kB/s wr, 1421 op/s
2015-09-17 10:27:56.279605 mon.0 [INF] pgmap v73125278: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 34482 kB/s rd, 2483 kB/s wr, 588 op/s
2015-09-17 10:27:57.292504 mon.0 [INF] pgmap v73125279: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 23209 kB/s rd, 2007 kB/s wr, 422 op/s
2015-09-17 10:27:58.320158 mon.0 [INF] pgmap v73125280: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 63944 kB/s rd, 6970 kB/s wr, 1411 op/s
2015-09-17 10:27:59.329315 mon.0 [INF] pgmap v73125281: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 51018 kB/s rd, 8907 kB/s wr, 1107 op/s
2015-09-17 10:28:01.348203 mon.0 [INF] pgmap v73125282: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 13451 kB/s rd, 2079 kB/s wr, 232 op/s
2015-09-17 10:28:02.366314 mon.0 [INF] pgmap v73125283: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 15647 kB/s rd, 577 kB/s wr, 305 op/s
2015-09-17 10:28:03.418008 mon.0 [INF] pgmap v73125284: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 55268 kB/s rd, 4311 kB/s wr, 1037 op/s
2015-09-17 10:28:06.287864 mon.0 [INF] pgmap v73125285: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 25854 kB/s rd, 2205 kB/s wr, 470 op/s
2015-09-17 10:28:07.302170 mon.0 [INF] pgmap v73125286: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 14179 kB/s rd, 780 kB/s wr, 295 op/s
2015-09-17 10:28:08.324754 mon.0 [INF] pgmap v73125287: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 34326 kB/s rd, 4316 kB/s wr, 890 op/s
2015-09-17 10:28:09.327834 mon.0 [INF] pgmap v73125288: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 29329 kB/s rd, 4349 kB/s wr, 724 op/s
2015-09-17 10:28:11.347225 mon.0 [INF] pgmap v73125289: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 9738 kB/s rd, 1015 kB/s wr, 188 op/s
2015-09-17 10:28:12.363401 mon.0 [INF] pgmap v73125290: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 7025 kB/s rd, 2237 kB/s wr, 263 op/s
2015-09-17 10:28:13.385668 mon.0 [INF] pgmap v73125291: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 20703 kB/s rd, 11170 kB/s wr, 755 op/s
2015-09-17 10:28:16.285629 mon.0 [INF] pgmap v73125292: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 7030 kB/s rd, 4370 kB/s wr, 253 op/s
2015-09-17 10:28:17.289644 mon.0 [INF] pgmap v73125293: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 700 kB/s rd, 293 kB/s wr, 58 op/s
2015-09-17 10:28:18.316729 mon.0 [INF] pgmap v73125294: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 7655 kB/s rd, 1588 kB/s wr, 339 op/s
2015-09-17 10:28:19.317385 mon.0 [INF] pgmap v73125295: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 6285 kB/s rd, 1524 kB/s wr, 285 op/s
2015-09-17 10:28:21.338014 mon.0 [INF] pgmap v73125296: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 1333 kB/s rd, 507 kB/s wr, 96 op/s
2015-09-17 10:28:22.349435 mon.0 [INF] pgmap v73125297: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 3822 kB/s rd, 745 kB/s wr, 217 op/s
2015-09-17 10:28:23.371238 mon.0 [INF] pgmap v73125298: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 11290 kB/s rd, 3649 kB/s wr, 800 op/s
2015-09-17 10:28:26.283661 mon.0 [INF] pgmap v73125299: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 5648 kB/s rd, 1765 kB/s wr, 381 op/s
2015-09-17 10:28:27.297412 mon.0 [INF] pgmap v73125300: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 5168 kB/s rd, 522 kB/s wr, 178 op/s
2015-09-17 10:28:28.315345 mon.0 [INF] pgmap v73125301: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 13038 kB/s rd, 2345 kB/s wr, 543 op/s
2015-09-17 10:28:29.315664 mon.0 [INF] pgmap v73125302: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 8068 kB/s rd, 2027 kB/s wr, 399 op/s
2015-09-17 10:28:31.334383 mon.0 [INF] pgmap v73125303: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 4496 kB/s rd, 364 kB/s wr, 116 op/s
2015-09-17 10:28:32.345808 mon.0 [INF] pgmap v73125304: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 9767 kB/s rd, 1122 kB/s wr, 338 op/s
2015-09-17 10:28:33.380041 mon.0 [INF] pgmap v73125305: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 28904 kB/s rd, 3447 kB/s wr, 1065 op/s
2015-09-17 10:28:36.285225 mon.0 [INF] pgmap v73125306: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 10043 kB/s rd, 1226 kB/s wr, 384 op/s
2015-09-17 10:28:37.291872 mon.0 [INF] pgmap v73125307: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 486 kB/s rd, 438 kB/s wr, 68 op/s
2015-09-17 10:28:38.317072 mon.0 [INF] pgmap v73125308: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 5387 kB/s rd, 2291 kB/s wr, 262 op/s
2015-09-17 10:28:39.437826 mon.0 [INF] pgmap v73125309: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 5011 kB/s rd, 1909 kB/s wr, 243 op/s
2015-09-17 10:28:41.618586 mon.0 [INF] pgmap v73125310: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 268 kB/s rd, 285 kB/s wr, 56 op/s
2015-09-17 10:28:42.632947 mon.0 [INF] pgmap v73125311: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 940 kB/s rd, 678 kB/s wr, 118 op/s
2015-09-17 10:28:43.648385 mon.0 [INF] pgmap v73125312: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 4112 kB/s rd, 1514 kB/s wr, 299 op/s
2015-09-17 10:28:39.058878 osd.3 [WRN] 3 slow requests, 3 included below; oldest blocked for > 30.827032 secs
2015-09-17 10:28:39.058942 osd.3 [WRN] slow request 30.827032 seconds old, received at 2015-09-17 10:28:08.231560: osd_op(client.1947038.0:3706449 rbd_data.b926f3d1b58ba.0000000000000400 [set-alloc-hint object_size 4194304 write_size 4194304,write 1839104~4096] 5.7f0fe221 ack+ondisk+write e21077) currently waiting for subops from 14
2015-09-17 10:28:39.058957 osd.3 [WRN] slow request 30.017233 seconds old, received at 2015-09-17 10:28:09.041360: osd_op(client.1558293.0:88136706 rbd_data.c3fc12eb141f2.0000000000000000 [set-alloc-hint object_size 4194304 write_size 4194304,write 1097728~8192] 5.440460d2 ack+ondisk+write e21077) currently no flag points reached
2015-09-17 10:28:39.058968 osd.3 [WRN] slow request 30.016593 seconds old, received at 2015-09-17 10:28:09.042000: osd_op(client.1558293.0:88136707 rbd_data.c3fc12eb141f2.0000000000000000 [set-alloc-hint object_size 4194304 write_size 4194304,write 1110016~4096] 5.440460d2 ack+ondisk+write e21077) currently no flag points reached
2015-09-17 10:28:41.059611 osd.3 [WRN] 5 slow requests, 2 included below; oldest blocked for > 32.827882 secs
2015-09-17 10:28:41.059626 osd.3 [WRN] slow request 30.026765 seconds old, received at 2015-09-17 10:28:11.032678: osd_op(client.1947024.0:395679389 rbd_data.c5e0238e1f29.00000000000013d0 [set-alloc-hint object_size 4194304 write_size 4194304,write 3629056~4096] 5.da9eaa0a ack+ondisk+write e21077) currently waiting for subops from 12
2015-09-17 10:28:41.059639 osd.3 [WRN] slow request 30.025517 seconds old, received at 2015-09-17 10:28:11.033926: osd_op(client.1947024.0:395679409 rbd_data.c5e0238e1f29.0000000000000e0a [set-alloc-hint object_size 4194304 write_size 4194304,write 1548288~4096] 5.471fd195 ack+ondisk+write e21077) currently no flag points reached
2015-09-17 10:28:45.060796 osd.3 [WRN] 6 slow requests, 1 included below; oldest blocked for > 36.829131 secs
2015-09-17 10:28:45.060820 osd.3 [WRN] slow request 30.877864 seconds old, received at 2015-09-17 10:28:14.182828: osd_op(client.1947039.0:719533 rbd_data.192ed3d1b58ba.0000000000000426 [set-alloc-hint object_size 4194304 write_size 4194304,write 3440640~16384] 5.e2d8b35c ack+ondisk+write e21077) currently waiting for subops from 18
2015-09-17 10:28:46.660312 mon.0 [INF] pgmap v73125313: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 1991 kB/s rd, 521 kB/s wr, 98 op/s
2015-09-17 10:28:47.684900 mon.0 [INF] pgmap v73125314: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 4205 kB/s rd, 1050 kB/s wr, 200 op/s
2015-09-17 10:28:48.701272 mon.0 [INF] pgmap v73125315: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 14017 kB/s rd, 3620 kB/s wr, 922 op/s
2015-09-17 10:28:51.292972 mon.0 [INF] pgmap v73125316: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 5205 kB/s rd, 1436 kB/s wr, 477 op/s
2015-09-17 10:28:52.310169 mon.0 [INF] pgmap v73125317: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 3777 kB/s rd, 1172 kB/s wr, 489 op/s
2015-09-17 10:28:48.319915 osd.15 [INF] 5.cab scrub starts
2015-09-17 10:28:48.788769 osd.15 [INF] 5.cab scrub ok
2015-09-17 10:28:50.319070 osd.15 [INF] 5.cb6 scrub starts
2015-09-17 10:28:50.830432 osd.15 [INF] 5.cb6 scrub ok
2015-09-17 10:28:53.341418 mon.0 [INF] pgmap v73125318: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 34322 kB/s rd, 5197 kB/s wr, 2018 op/s
2015-09-17 10:28:54.343829 mon.0 [INF] pgmap v73125319: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 31042 kB/s rd, 6442 kB/s wr, 1705 op/s
2015-09-17 10:28:46.061163 osd.3 [WRN] 7 slow requests, 1 included below; oldest blocked for > 37.829468 secs
2015-09-17 10:28:46.061211 osd.3 [WRN] slow request 30.744623 seconds old, received at 2015-09-17 10:28:15.316406: osd_op(client.2043956.0:109 rbd_header.193202eb141f2 [watch ping cookie 140165049747648] 5.168f9a14 ondisk+write+known_if_redirected e21077) currently no flag points reached
2015-09-17 10:28:56.367811 mon.0 [INF] pgmap v73125320: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 10306 kB/s rd, 2187 kB/s wr, 498 op/s
2015-09-17 10:28:57.391893 mon.0 [INF] pgmap v73125321: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 14949 kB/s rd, 1815 kB/s wr, 1007 op/s
2015-09-17 10:28:58.497070 mon.0 [INF] pgmap v73125322: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 29024 kB/s rd, 9487 kB/s wr, 2754 op/s
2015-09-17 10:29:01.303379 mon.0 [INF] pgmap v73125323: 4096 pgs: 4096 active+clean; 3183 GB data, 6373 GB used, 2077 GB / 8451 GB avail; 13976 kB/s rd, 5324 kB/s wr, 1026 op/s
root@r-ch103:~# date; iostat -x -d 1
Thu Sep 17 10:27:56 CEST 2015
Linux 3.11.0-26-generic (r-ch103)     09/17/2015     _x86_64_    (12 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.01     5.63   12.52  108.78  1290.20  2868.91    68.58     0.48    3.99    0.81    4.35   0.18   2.12
sdb               0.01     1.42   11.32   80.89  1190.96  2186.45    73.25     0.97   10.51    6.67   11.04   0.62   5.69

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   25.00     0.00   675.50    54.04     0.00    0.16    0.00    0.16   0.16   0.40
sdb               0.00     0.00    0.00   10.00     0.00   280.00    56.00     0.01    1.20    0.00    1.20   1.20   1.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    0.00   28.00     0.00   438.50    31.32     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    1.00   14.00     4.00   260.00    35.20     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    1.00   27.00     4.00   651.50    46.82     0.00    0.14    0.00    0.15   0.14   0.40
sdb               0.00     0.00    1.00   18.00     4.00   344.00    36.63     0.01    0.42    0.00    0.44   0.42   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    0.00   22.00     0.00   292.50    26.59     0.00    0.18    0.00    0.18   0.18   0.40
sdb               0.00     0.00    0.00   19.00     0.00   396.00    41.68     0.00    0.21    0.00    0.21   0.21   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   12.00     0.00   953.00   158.83     0.01    1.00    0.00    1.00   0.67   0.80
sdb               0.00     0.00    0.00    7.00     0.00    60.00    17.14     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     3.00    0.00   27.00     0.00   349.00    25.85     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    9.00     0.00   108.00    24.00     0.00    0.44    0.00    0.44   0.44   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     4.00    3.00   23.00    12.00   463.00    36.54     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     3.00    4.00  226.00    16.00  8951.00    77.97     4.18   18.17    0.00   18.50   0.16   3.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    0.00   27.00     0.00   497.50    36.85     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    1.00   17.00     4.00   236.00    26.67     0.00    0.22    0.00    0.24   0.22   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    4.00   19.00    20.00  1493.50   131.61     0.02    0.87    1.00    0.84   0.52   1.20
sdb               0.00     0.00    0.00   18.00     0.00   240.00    26.67     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00  160.00     0.00  4678.50    58.48     1.36    8.50    0.00    8.50   0.15   2.40
sdb               0.00     0.00    1.00   17.00     4.00   280.00    31.56     0.00    0.22    0.00    0.24   0.22   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     9.00    4.00   22.00    16.00   275.00    22.38     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    1.00   10.00     4.00   212.00    39.27     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    0.00   30.00     0.00  2044.00   136.27     0.02    0.53    0.00    0.53   0.40   1.20
sdb               0.00     0.00    0.00   16.00     0.00   300.00    37.50     0.00    0.25    0.00    0.25   0.25   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    4.00   40.00    20.00  1573.00    72.41     0.01    0.27    0.00    0.30   0.27   1.20
sdb               0.00     0.00    0.00   10.00     0.00   156.00    31.20     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    0.00   15.00     0.00   195.00    26.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    7.00     0.00   152.00    43.43     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    1.00   29.00     4.00   560.50    37.63     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    3.00   32.00    12.00  4428.00   253.71     0.10    2.97    1.33    3.12   0.69   2.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   18.00     0.00   264.00    29.33     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    5.00     0.00    72.00    28.80     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     7.00    0.00   28.00     0.00   597.00    42.64     0.00    0.14    0.00    0.14   0.14   0.40
sdb               0.00     2.00    0.00  155.00     0.00  5962.50    76.94     0.96    6.19    0.00    6.19   0.18   2.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     4.00    0.00   20.00     0.00   923.50    92.35     0.01    0.60    0.00    0.60   0.40   0.80
sdb               0.00     0.00    0.00    4.00     0.00    80.00    40.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   23.00     0.00   291.00    25.30     0.00    0.17    0.00    0.17   0.17   0.40
sdb               0.00     0.00    6.00   25.00   208.00   388.00    38.45     0.00    0.13    0.00    0.16   0.13   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.00    0.00  158.00     0.00  5063.00    64.09     1.63   10.30    0.00   10.30   0.15   2.40
sdb               0.00     0.00    0.00    3.00     0.00   112.00    74.67     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    15.00    5.00  537.00    72.00  3420.50    12.89     2.24    4.13    0.80    4.16   0.05   2.80
sdb               0.00     0.00    2.00    4.00     8.00   112.00    40.00     0.00    0.67    0.00    1.00   0.67   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    1.00  100.00     4.00  2567.50    50.92     0.01    0.12    0.00    0.12   0.08   0.80
sdb               0.00     0.00    0.00   33.00     0.00   360.00    21.82     0.01    0.24    0.00    0.24   0.24   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     4.00    0.00   32.00     0.00   553.50    34.59     0.00    0.12    0.00    0.12   0.12   0.40
sdb               0.00     0.00    0.00   17.00     0.00   272.00    32.00     0.00    0.24    0.00    0.24   0.24   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    0.00   28.00     0.00   375.50    26.82     0.00    0.14    0.00    0.14   0.14   0.40
sdb               0.00     0.00    0.00    9.00     0.00   108.00    24.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   33.00     0.00   689.00    41.76     0.01    0.36    0.00    0.36   0.36   1.20
sdb               0.00     0.00    0.00   11.00     0.00    96.00    17.45     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    5.00   13.00    20.00   201.00    24.56     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    6.00     0.00   220.00    73.33     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    2.00   25.00     8.00   421.00    31.78     0.01    0.30    0.00    0.32   0.30   0.80
sdb               0.00     1.00    0.00  147.00     0.00  2438.00    33.17     0.49    3.32    0.00    3.32   0.08   1.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     7.00    0.00   31.00     0.00   642.00    41.42     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00   25.00     0.00   320.00    25.60     0.01    0.32    0.00    0.32   0.32   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   13.00     0.00  1035.00   159.23     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    8.00     0.00   320.00    80.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     4.00    0.00  265.00     0.00  5425.00    40.94     2.45    9.24    0.00    9.24   0.11   2.80
sdb               0.00    18.00    0.00  346.00     0.00  2148.00    12.42     1.40    4.03    0.00    4.03   0.05   1.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   17.00     0.00   377.00    44.35     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    8.00     0.00   328.00    82.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    1.00   19.00     4.00   188.50    19.25     0.00    0.20    0.00    0.21   0.20   0.40
sdb               0.00     0.00    2.00    5.00     8.00    40.00    13.71     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    0.00   26.00     0.00   405.00    31.15     0.00    0.15    0.00    0.15   0.15   0.40
sdb               0.00     0.00    0.00    7.00     0.00    56.00    16.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    1.00   20.00     4.00   292.00    28.19     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00   16.00     0.00   320.00    40.00     0.00    0.25    0.00    0.25   0.25   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   51.00     0.00   791.50    31.04     0.00    0.08    0.00    0.08   0.08   0.40
sdb               0.00     0.00    0.00   21.00     0.00   288.00    27.43     0.00    0.19    0.00    0.19   0.19   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   36.00     0.00   626.00    34.78     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    4.00   36.00    16.00  1832.00    92.40     0.00    0.10    0.00    0.11   0.10   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    0.00   21.00     0.00   329.00    31.33     0.00    0.19    0.00    0.19   0.19   0.40
sdb               0.00     2.00    0.00  221.00     0.00  3095.00    28.01     1.44    6.53    0.00    6.53   0.07   1.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     7.00    0.00   15.00     0.00   367.50    49.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     3.00    0.00   17.00     0.00  1850.50   217.71     0.01    0.47    0.00    0.47   0.24   0.40
sdb               0.00     0.00    0.00   15.00     0.00   128.00    17.07     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.00    0.00  188.00     0.00  4645.50    49.42     1.26    6.68    0.00    6.68   0.11   2.00
sdb               0.00     0.00    2.00   10.00     8.00   268.00    46.00     0.01    0.67    2.00    0.40   0.67   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     3.00    0.00   14.00     0.00   333.50    47.64     0.01    0.86    0.00    0.86   0.86   1.20
sdb               0.00     0.00    0.00    6.00     0.00   224.00    74.67     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    0.00   31.00     0.00   433.00    27.94     0.00    0.13    0.00    0.13   0.13   0.40
sdb               0.00     0.00    0.00   15.00     0.00   236.00    31.47     0.00    0.27    0.00    0.27   0.27   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     7.00    0.00   23.00     0.00   369.00    32.09     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00   11.00     0.00   308.00    56.00     0.01    0.73    0.00    0.73   0.73   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     4.00   20.00   20.00   204.00   237.00    22.05     0.71   17.80    1.00   34.60   7.50  30.00
sdb               0.00     0.00    0.00   10.00     0.00   228.00    45.60     0.00    0.40    0.00    0.40   0.40   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   19.00     0.00   347.00    36.53     0.00    0.21    0.00    0.21   0.21   0.40
sdb               0.00     0.00    0.00   10.00     0.00   124.00    24.80     0.01    0.80    0.00    0.80   0.80   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   17.00     0.00   340.50    40.06     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00   10.00     0.00   164.00    32.80     0.01    0.80    0.00    0.80   0.80   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     4.00    0.00   30.00     0.00   625.50    41.70     0.01    0.27    0.00    0.27   0.27   0.80
sdb               0.00     2.00    0.00  124.00     0.00  1964.50    31.69     0.48    3.84    0.00    3.84   0.16   2.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    17.00    0.00   28.00     0.00   497.50    35.54     0.00    0.14    0.00    0.14   0.14   0.40
sdb               0.00     0.00    0.00   13.00     0.00   648.00    99.69     0.01    0.62    0.00    0.62   0.62   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   16.00     0.00   337.00    42.12     0.00    0.25    0.00    0.25   0.25   0.40
sdb               0.00     0.00    0.00    8.00     0.00    72.00    18.00     0.01    1.00    0.00    1.00   1.00   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    1.00  170.00     8.00  2876.00    33.73     0.92    5.36    0.00    5.39   0.07   1.20
sdb               0.00     0.00    0.00    3.00     0.00    96.00    64.00     0.00    1.33    0.00    1.33   1.33   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    12.00    0.00  490.00     0.00  3305.00    13.49     1.79    3.66    0.00    3.66   0.03   1.60
sdb               0.00     0.00    5.00   34.00    28.00  1512.00    78.97     0.05    1.23    2.40    1.06   1.23   4.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    0.00   57.00     0.00  1030.00    36.14     0.01    0.14    0.00    0.14   0.14   0.80
sdb               0.00     0.00    0.00  118.00     0.00  1592.00    26.98     0.10    0.85    0.00    0.85   0.85  10.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     4.00    0.00   21.00     0.00   362.00    34.48     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    1.00    6.00     4.00    84.00    25.14     0.00    0.57    0.00    0.67   0.57   0.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     8.00    0.00   20.00     0.00  2710.50   271.05     0.04    1.80    0.00    1.80   0.60   1.20
sdb               0.00     0.00    0.00    6.00     0.00    88.00    29.33     0.01    1.33    0.00    1.33   1.33   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00  119.00     0.00  1528.00    25.68     0.01    0.07    0.00    0.07   0.07   0.80
sdb               0.00     0.00  282.00   79.00  1520.00  1400.00    16.18     0.34    0.94    0.99    0.76   0.93  33.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   65.00     0.00   762.50    23.46     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00  143.00     0.00  6166.50    86.24     0.26    1.85    0.00    1.85   0.64   9.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     6.00    0.00  256.00     0.00  6303.00    49.24     0.08    0.30    0.00    0.30   0.11   2.80
sdb               0.00     3.00    0.00  796.00     0.00 11837.00    29.74    17.36   21.80    0.00   21.80   0.31  24.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    0.00   84.00     0.00  4876.50   116.11     0.24    2.90    0.00    2.90   0.33   2.80
sdb               0.00     0.00    0.00   10.00     0.00   128.00    25.60     0.01    1.20    0.00    1.20   1.20   1.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     4.00    0.00  171.00     0.00  1481.00    17.32     0.10    0.56    0.00    0.56   0.09   1.60
sdb               0.00     0.00    0.00   36.00     0.00   572.00    31.78     0.02    0.56    0.00    0.56   0.56   2.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     9.00    0.00  505.00     0.00 16820.50    66.62     4.22    8.36    0.00    8.36   0.15   7.60
sdb               0.00    33.00    0.00  413.00     0.00  2717.50    13.16     2.74    6.25    0.00    6.25   0.22   9.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     5.00    3.00  143.00    16.00  2515.00    34.67     0.01    0.08    0.00    0.08   0.08   1.20
sdb               0.00     0.00    0.00   52.00     0.00   424.00    16.31     0.05    4.15    0.00    4.15   0.15   0.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     4.00    1.00  168.00     4.00  1781.00    21.12     0.00    0.02    0.00    0.02   0.02   0.40
sdb               0.00     0.00    0.00   31.00     0.00   348.00    22.45     0.03    1.03    0.00    1.03   1.03   3.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    30.00    0.00  217.00     0.00 46273.00   426.48     0.93    4.28    0.00    4.28   0.87  18.80
sdb               0.00     0.00    0.00  124.00     0.00  5804.00    93.61     0.20    1.58    0.00    1.58   0.87  10.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     2.00    0.00   25.00     0.00   358.00    28.64     0.01    0.32    0.00    0.32   0.32   0.80
sdb               0.00     0.00    1.00   45.00     4.00   612.00    26.78     0.04    0.78    0.00    0.80   0.78   3.60
#2

Updated by Corin Langosch over 8 years ago

As discussed with sjusthm, I ran the following commands. Strangely, this time "ceph -w" didn't show any warnings, but the "rbd info" command was as slow as before (around 40 s). The logs of osd.3 are here: http://www.netskin.com/system/ceph-osd.3.log.gz

root@r-ch104:~# ceph tell osd.3 injectargs --debug-osd 20 --debug-filestore 20 --debug-ms 1
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1

root@r-ch104:~# date; rbd info numatrix-ssd/3e59ad5a-6bcb-4679-9f92-1f7c107f7f40; date
Thu Sep 17 20:12:34 CEST 2015
rbd image '3e59ad5a-6bcb-4679-9f92-1f7c107f7f40':
  size 10240 MB in 2560 objects
  order 22 (4096 kB objects)
  block_name_prefix: rbd_data.193202eb141f2
  format: 2
  features: layering, striping
  flags:
  stripe unit: 4096 kB
  stripe count: 1
Thu Sep 17 20:13:15 CEST 2015

root@r-ch104:~# ceph tell osd.3 injectargs --debug-osd 0 --debug-filestore 1 --debug-ms 0
debug_osd=0/0 debug_filestore=1/1 debug_ms=0/0
#3

Updated by Corin Langosch over 8 years ago

I deleted the logs of osd.3, restarted it and tried the "rbd info" again. Same as before:

2015-09-17 20:41:05.136212 7f6336f48780  0 osd.3 21077 crush map has features 33816576, adjusting msgr requires for clients
2015-09-17 20:41:05.136230 7f6336f48780  0 osd.3 21077 crush map has features 33816576 was 8705, adjusting msgr requires for mons
2015-09-17 20:41:05.136241 7f6336f48780  0 osd.3 21077 crush map has features 33816576, adjusting msgr requires for osds
2015-09-17 20:41:05.136273 7f6336f48780  0 osd.3 21077 load_pgs
2015-09-17 20:41:08.411932 7f6336f48780  0 osd.3 21077 load_pgs opened 869 pgs
2015-09-17 20:41:08.415196 7f6336f48780 -1 osd.3 21077 log_to_monitors {default=true}
2015-09-17 20:41:08.419860 7f632012f700  0 osd.3 21077 ignoring osdmap until we have initialized
2015-09-17 20:41:08.420028 7f632012f700  0 osd.3 21077 ignoring osdmap until we have initialized
2015-09-17 20:41:08.521550 7f6336f48780  0 osd.3 21077 done with init, starting boot process
2015-09-17 20:41:12.523118 7f6301283700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.5:6802/19380 pipe(0x14f97000 sd=245 :6806 s=0 pgs=0 cs=0 l=0 c=0x14fa0000).accept connect_seq 0 vs existing 0 state wait
2015-09-17 20:41:51.126226 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:51.126261 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:51.895525 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:51.895562 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:51.947040 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:51.947090 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:52.447481 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:52.448064 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:52.996809 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:52.997145 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:53.240467 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:53.241384 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:54.435066 7f63333af700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:54.538215 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:54.538412 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:54.697703 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:54.697751 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:54.748404 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:54.748582 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:55.045169 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:55.045221 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:55.353572 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:55.353613 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:56.140959 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:56.141038 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:56.238829 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:56.238916 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:56.999183 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:56.999223 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:57.026940 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:57.026979 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:57.054248 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:57.054680 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:57.939558 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:57.939615 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:57.945518 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:57.946211 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:59.435244 7f63333af700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:59.449118 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:59.449238 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:59.641858 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:41:59.641886 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:00.846524 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:00.846572 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:01.347313 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:01.347353 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:02.300370 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:02.300441 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:02.328337 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:02.328414 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:02.955103 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:02.955132 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:03.240421 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:03.241690 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:03.741095 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:03.741345 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:03.742310 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:03.742769 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:04.149846 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:04.149909 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:04.435410 7f63333af700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:04.842234 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:04.842283 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:05.443214 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:05.443262 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:05.856205 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:05.856257 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:06.048227 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:06.048493 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:06.401338 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:06.401473 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:06.430464 7f632e2b8700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.768873 secs
2015-09-17 20:42:06.430479 7f632e2b8700  0 log_channel(cluster) log [WRN] : slow request 30.768873 seconds old, received at 2015-09-17 20:41:35.661425: osd_op(client.2055025.0:5 rbd_header.193202eb141f2 [call rbd.get_stripe_unit_count] 5.168f9a14 ack+read+known_if_redirected e21081) currently started
2015-09-17 20:42:07.029628 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:07.029666 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:08.250748 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:08.251164 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:08.342858 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:08.343185 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:09.301827 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:09.301864 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:09.351381 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:09.351431 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:09.435574 7f63333af700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:09.544204 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:09.544354 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:09.957232 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:09.957319 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:10.149038 7f631c127700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:10.149318 7f631a924700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:10.369577 7f6308bf5700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6801/25510 pipe(0x13263000 sd=59 :58949 s=2 pgs=55 cs=1 l=0 c=0x132346e0).fault with nothing to send, going to standby
2015-09-17 20:42:10.370203 7f6308ff9700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6805/29196 pipe(0x13239000 sd=44 :42173 s=2 pgs=57 cs=1 l=0 c=0x13234160).fault with nothing to send, going to standby
2015-09-17 20:42:10.371043 7f6308df7700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=73 :47201 s=2 pgs=35 cs=1 l=0 c=0x13234840).fault with nothing to send, going to standby
2015-09-17 20:42:10.371395 7f6308ef8700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=43 :45630 s=2 pgs=40 cs=1 l=0 c=0x13234000).fault with nothing to send, going to standby
2015-09-17 20:42:10.372003 7f63089f3700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6801/26538 pipe(0x1323d500 sd=45 :49919 s=2 pgs=49 cs=1 l=0 c=0x132342c0).fault with nothing to send, going to standby
2015-09-17 20:42:10.372302 7f6308af4700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=53 :43745 s=2 pgs=11 cs=1 l=0 c=0x13234580).fault with nothing to send, going to standby
2015-09-17 20:42:10.373347 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=49 :50745 s=2 pgs=31 cs=1 l=0 c=0x13234420).fault with nothing to send, going to standby
2015-09-17 20:42:10.492426 7f6301283700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.5:6802/19380 pipe(0x14f97000 sd=245 :6806 s=2 pgs=46 cs=1 l=0 c=0x14fa0160).fault with nothing to send, going to standby
2015-09-17 20:42:11.076474 7f6313115700  0 -- 10.0.0.5:6805/19649 submit_message osd_op_reply(5 rbd_header.193202eb141f2 [call rbd.get_stripe_unit_count] v0'0 uv11442 ondisk = 0) v6 remote, 10.0.0.6:0/1007094, failed lossy con, dropping message 0x14469b80
2015-09-17 20:42:11.076565 7f6313115700  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15
2015-09-17 20:42:11.080919 7f63092fc700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=53 :43787 s=1 pgs=11 cs=2 l=0 c=0x13234580).connect got RESETSESSION
2015-09-17 20:42:11.082839 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=50 :45675 s=1 pgs=40 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.084617 7f63090fa700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=63 :47243 s=1 pgs=35 cs=2 l=0 c=0x13234840).connect got RESETSESSION
2015-09-17 20:42:11.085039 7f63094fe700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6801/26538 pipe(0x1323d500 sd=45 :49964 s=1 pgs=49 cs=2 l=0 c=0x132342c0).connect got RESETSESSION
2015-09-17 20:42:11.085723 7f63093fd700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=49 :50791 s=1 pgs=31 cs=2 l=0 c=0x13234420).connect got RESETSESSION
2015-09-17 20:42:11.087806 7f63095ff700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6805/29196 pipe(0x13239000 sd=44 :42223 s=1 pgs=57 cs=2 l=0 c=0x13234160).connect got RESETSESSION
2015-09-17 20:42:11.089523 7f63091fb700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6801/25510 pipe(0x13263000 sd=59 :58995 s=1 pgs=55 cs=2 l=0 c=0x132346e0).connect got RESETSESSION
2015-09-17 20:42:11.090641 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=50 :45675 s=2 pgs=41 cs=1 l=0 c=0x13234000).fault, initiating reconnect
2015-09-17 20:42:11.090660 7f6308af4700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=53 :43787 s=2 pgs=12 cs=1 l=0 c=0x13234580).fault, initiating reconnect
2015-09-17 20:42:11.091143 7f6308ef8700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6801/25510 pipe(0x13263000 sd=59 :58995 s=2 pgs=56 cs=1 l=0 c=0x132346e0).fault, initiating reconnect
2015-09-17 20:42:11.091180 7f6308df7700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=63 :47243 s=2 pgs=36 cs=1 l=0 c=0x13234840).fault, initiating reconnect
2015-09-17 20:42:11.091480 7f6308ff9700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6805/29196 pipe(0x13239000 sd=44 :42223 s=2 pgs=58 cs=1 l=0 c=0x13234160).fault, initiating reconnect
2015-09-17 20:42:11.091687 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45683 s=1 pgs=41 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.091754 7f6308bf5700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=49 :50791 s=2 pgs=32 cs=1 l=0 c=0x13234420).fault, initiating reconnect
2015-09-17 20:42:11.091980 7f63092fc700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43793 s=1 pgs=12 cs=2 l=0 c=0x13234580).connect got RESETSESSION
2015-09-17 20:42:11.092402 7f63091fb700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6801/25510 pipe(0x13263000 sd=59 :58998 s=1 pgs=56 cs=2 l=0 c=0x132346e0).connect got RESETSESSION
2015-09-17 20:42:11.092477 7f63095ff700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6805/29196 pipe(0x13239000 sd=44 :42229 s=1 pgs=58 cs=2 l=0 c=0x13234160).connect got RESETSESSION
2015-09-17 20:42:11.092566 7f63090fa700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=63 :47250 s=1 pgs=36 cs=2 l=0 c=0x13234840).connect got RESETSESSION
2015-09-17 20:42:11.094263 7f63093fd700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=49 :50799 s=1 pgs=32 cs=2 l=0 c=0x13234420).connect got RESETSESSION
2015-09-17 20:42:11.095850 7f6308af4700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43793 s=2 pgs=13 cs=1 l=0 c=0x13234580).fault, initiating reconnect
2015-09-17 20:42:11.095890 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45683 s=2 pgs=42 cs=1 l=0 c=0x13234000).fault, initiating reconnect
2015-09-17 20:42:11.095972 7f6308df7700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=63 :47250 s=2 pgs=37 cs=1 l=0 c=0x13234840).fault, initiating reconnect
2015-09-17 20:42:11.096604 7f6308bf5700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=49 :50799 s=2 pgs=33 cs=1 l=0 c=0x13234420).fault, initiating reconnect
2015-09-17 20:42:11.096949 7f6308ff9700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6805/29196 pipe(0x13239000 sd=44 :42229 s=2 pgs=59 cs=1 l=0 c=0x13234160).fault, initiating reconnect
2015-09-17 20:42:11.097030 7f63090fa700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=63 :47255 s=1 pgs=37 cs=2 l=0 c=0x13234840).connect got RESETSESSION
2015-09-17 20:42:11.097242 7f63092fc700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43800 s=1 pgs=13 cs=2 l=0 c=0x13234580).connect got RESETSESSION
2015-09-17 20:42:11.097282 7f6308ef8700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6801/25510 pipe(0x13263000 sd=59 :58998 s=2 pgs=57 cs=1 l=0 c=0x132346e0).fault, initiating reconnect
2015-09-17 20:42:11.097519 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45688 s=1 pgs=42 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.098212 7f6308df7700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=63 :47255 s=2 pgs=38 cs=1 l=0 c=0x13234840).fault, initiating reconnect
2015-09-17 20:42:11.098445 7f63095ff700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6805/29196 pipe(0x13239000 sd=44 :42235 s=1 pgs=59 cs=2 l=0 c=0x13234160).connect got RESETSESSION
2015-09-17 20:42:11.098499 7f63091fb700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6801/25510 pipe(0x13263000 sd=59 :59007 s=1 pgs=57 cs=2 l=0 c=0x132346e0).connect got RESETSESSION
2015-09-17 20:42:11.098728 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45688 s=2 pgs=43 cs=1 l=0 c=0x13234000).fault, initiating reconnect
2015-09-17 20:42:11.098796 7f63093fd700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=49 :50803 s=1 pgs=33 cs=2 l=0 c=0x13234420).connect got RESETSESSION
2015-09-17 20:42:11.099418 7f63090fa700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=63 :47259 s=1 pgs=38 cs=2 l=0 c=0x13234840).connect got RESETSESSION
2015-09-17 20:42:11.099508 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45695 s=1 pgs=43 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.099609 7f6308af4700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43800 s=2 pgs=14 cs=1 l=0 c=0x13234580).fault, initiating reconnect
2015-09-17 20:42:11.099985 7f6308ff9700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6805/29196 pipe(0x13239000 sd=44 :42235 s=2 pgs=60 cs=1 l=0 c=0x13234160).fault, initiating reconnect
2015-09-17 20:42:11.100254 7f6308ef8700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6801/25510 pipe(0x13263000 sd=59 :59007 s=2 pgs=58 cs=1 l=0 c=0x132346e0).fault, initiating reconnect
2015-09-17 20:42:11.100450 7f6308df7700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=63 :47259 s=2 pgs=39 cs=1 l=0 c=0x13234840).fault, initiating reconnect
2015-09-17 20:42:11.100674 7f6308bf5700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=49 :50803 s=2 pgs=34 cs=1 l=0 c=0x13234420).fault, initiating reconnect
2015-09-17 20:42:11.101017 7f63092fc700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43807 s=1 pgs=14 cs=2 l=0 c=0x13234580).connect got RESETSESSION
2015-09-17 20:42:11.101273 7f63095ff700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.6:6805/29196 pipe(0x13239000 sd=44 :42240 s=1 pgs=60 cs=2 l=0 c=0x13234160).connect got RESETSESSION
2015-09-17 20:42:11.101517 7f63091fb700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6801/25510 pipe(0x13263000 sd=59 :59012 s=1 pgs=58 cs=2 l=0 c=0x132346e0).connect got RESETSESSION
2015-09-17 20:42:11.101671 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45695 s=2 pgs=44 cs=1 l=0 c=0x13234000).fault, initiating reconnect
2015-09-17 20:42:11.101782 7f63090fa700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=113 :47265 s=1 pgs=39 cs=2 l=0 c=0x13234840).connect got RESETSESSION
2015-09-17 20:42:11.102423 7f6308af4700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43807 s=2 pgs=15 cs=1 l=0 c=0x13234580).fault, initiating reconnect
2015-09-17 20:42:11.102616 7f6308df7700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=113 :47265 s=2 pgs=40 cs=1 l=0 c=0x13234840).fault, initiating reconnect
2015-09-17 20:42:11.102840 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45701 s=1 pgs=44 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.102836 7f63093fd700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=63 :50811 s=1 pgs=34 cs=2 l=0 c=0x13234420).connect got RESETSESSION
2015-09-17 20:42:11.103709 7f63092fc700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43813 s=1 pgs=15 cs=2 l=0 c=0x13234580).connect got RESETSESSION
2015-09-17 20:42:11.103724 7f63090fa700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=113 :47268 s=1 pgs=40 cs=2 l=0 c=0x13234840).connect got RESETSESSION
2015-09-17 20:42:11.104743 7f6308df7700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=113 :47268 s=2 pgs=41 cs=1 l=0 c=0x13234840).fault, initiating reconnect
2015-09-17 20:42:11.105024 7f6308af4700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43813 s=2 pgs=16 cs=1 l=0 c=0x13234580).fault, initiating reconnect
2015-09-17 20:42:11.105291 7f6308bf5700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=63 :50811 s=2 pgs=35 cs=1 l=0 c=0x13234420).fault, initiating reconnect
2015-09-17 20:42:11.105701 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45701 s=2 pgs=45 cs=1 l=0 c=0x13234000).fault, initiating reconnect
2015-09-17 20:42:11.105956 7f63090fa700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=113 :47269 s=1 pgs=41 cs=2 l=0 c=0x13234840).connect got RESETSESSION
2015-09-17 20:42:11.106301 7f63092fc700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43816 s=1 pgs=16 cs=2 l=0 c=0x13234580).connect got RESETSESSION
2015-09-17 20:42:11.106992 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45707 s=1 pgs=45 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.107245 7f6308df7700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=113 :47269 s=2 pgs=42 cs=1 l=0 c=0x13234840).fault, initiating reconnect
2015-09-17 20:42:11.107467 7f63093fd700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=63 :50818 s=1 pgs=35 cs=2 l=0 c=0x13234420).connect got RESETSESSION
2015-09-17 20:42:11.108005 7f6308af4700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43816 s=2 pgs=17 cs=1 l=0 c=0x13234580).fault, initiating reconnect
2015-09-17 20:42:11.108262 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45707 s=2 pgs=46 cs=1 l=0 c=0x13234000).fault, initiating reconnect
2015-09-17 20:42:11.109226 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45710 s=1 pgs=46 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.109574 7f63090fa700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=113 :47273 s=1 pgs=42 cs=2 l=0 c=0x13234840).connect got RESETSESSION
2015-09-17 20:42:11.109840 7f63092fc700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43820 s=1 pgs=17 cs=2 l=0 c=0x13234580).connect got RESETSESSION
2015-09-17 20:42:11.110056 7f6308bf5700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=63 :50818 s=2 pgs=36 cs=1 l=0 c=0x13234420).fault, initiating reconnect
2015-09-17 20:42:11.110435 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45710 s=2 pgs=47 cs=1 l=0 c=0x13234000).fault, initiating reconnect
2015-09-17 20:42:11.110939 7f6308df7700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=113 :47273 s=2 pgs=43 cs=1 l=0 c=0x13234840).fault, initiating reconnect
2015-09-17 20:42:11.111456 7f6308af4700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43820 s=2 pgs=18 cs=1 l=0 c=0x13234580).fault, initiating reconnect
2015-09-17 20:42:11.111585 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45712 s=1 pgs=47 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.112296 7f63090fa700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6805/26244 pipe(0x13267500 sd=113 :47278 s=1 pgs=43 cs=2 l=0 c=0x13234840).connect got RESETSESSION
2015-09-17 20:42:11.112623 7f63093fd700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=63 :50823 s=1 pgs=36 cs=2 l=0 c=0x13234420).connect got RESETSESSION
2015-09-17 20:42:11.112650 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45712 s=2 pgs=48 cs=1 l=0 c=0x13234000).fault, initiating reconnect
2015-09-17 20:42:11.113552 7f63092fc700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43825 s=1 pgs=18 cs=2 l=0 c=0x13234580).connect got RESETSESSION
2015-09-17 20:42:11.113716 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45715 s=1 pgs=48 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.114725 7f6308cf6700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45715 s=2 pgs=49 cs=1 l=0 c=0x13234000).fault, initiating reconnect
2015-09-17 20:42:11.114905 7f6308af4700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43825 s=2 pgs=19 cs=1 l=0 c=0x13234580).fault, initiating reconnect
2015-09-17 20:42:11.115155 7f6308bf5700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=63 :50823 s=2 pgs=37 cs=1 l=0 c=0x13234420).fault, initiating reconnect
2015-09-17 20:42:11.115837 7f6309700700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.8:6801/25818 pipe(0x1320e500 sd=53 :45716 s=1 pgs=49 cs=2 l=0 c=0x13234000).connect got RESETSESSION
2015-09-17 20:42:11.116210 7f63092fc700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.7:6805/21465 pipe(0x13250500 sd=50 :43828 s=1 pgs=19 cs=2 l=0 c=0x13234580).connect got RESETSESSION
2015-09-17 20:42:11.117253 7f63093fd700  0 -- 10.0.0.5:6806/19649 >> 10.0.0.4:6808/10767 pipe(0x1324c000 sd=63 :50830 s=1 pgs=37 cs=2 l=0 c=0x13234420).connect got RESETSESSION
2015-09-17 20:42:13.260676 7f632012f700  0 log_channel(cluster) log [WRN] : map e21083 wrongly marked me down
2015-09-17 20:42:13.267702 7f6304dbe700  0 -- 10.0.0.5:0/19649 >> 10.0.0.7:6803/25510 pipe(0x13587500 sd=137 :45537 s=4 pgs=0 cs=0 l=1 c=0x13d97dc0).connect got RESETSESSION but no longer connecting
2015-09-17 20:42:14.484224 7f63039aa700  0 -- 10.0.0.5:6801/1019649 >> 10.0.0.5:6802/19380 pipe(0x13648000 sd=206 :6801 s=0 pgs=0 cs=0 l=0 c=0x1373b2c0).accept connect_seq 0 vs existing 0 state connecting
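
While requests are stuck like this, the blocked ops can usually be inspected live through the OSD admin socket. A minimal sketch using hammer-era command names (assumes the default admin socket setup; adjust the OSD id as needed):

ceph daemon osd.9 dump_ops_in_flight    # ops currently inside the OSD, with their age and current state
ceph daemon osd.9 dump_historic_ops     # recently completed slow ops and their event timelines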
Actions #4

Updated by Corin Langosch over 8 years ago

I just deleted the logs of osd.3, restarted it, and tried the "rbd info" again. This time I ran these commands again:

root@r-ch104:~# ceph tell osd.3 injectargs --debug-osd 20 --debug-filestore 20 --debug-ms 1
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1 

root@r-ch104:~# date; rbd info numatrix-ssd/3e59ad5a-6bcb-4679-9f92-1f7c107f7f40; date
Thu Sep 17 20:46:21 CEST 2015
rbd image '3e59ad5a-6bcb-4679-9f92-1f7c107f7f40':
    size 10240 MB in 2560 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.193202eb141f2
    format: 2
    features: layering, striping
    flags: 
    stripe unit: 4096 kB
    stripe count: 1
Thu Sep 17 20:47:01 CEST 2015

root@r-ch104:~# ceph tell osd.3 injectargs --debug-osd 0 --debug-filestore 1 --debug-ms 0
debug_osd=0/0 debug_filestore=1/1 debug_ms=0/0 

It was as slow as before, but again there was no "ceph -w" output, just as in the first run with debug output enabled?! The log can be found here: http://www.netskin.com/system/ceph-osd-again.3.log.gz

Actions #5

Updated by Sage Weil over 8 years ago

  • Subject changed from blocked requests after upgrade firefly -> hammer to old leveldb is slow (firefly->hammer upgrade leads to slow requests)

2015-09-17 20:42:08.250748 7f631a924700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6313115700' had timed out after 15

This suggests your disks are slow, but not necessarily.

Oh, as discussed on IRC, your leveldb stores are very slow, but it is unclear why. I suggest recycling your OSDs.
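
One quick way to gauge how bloated such an old store is: on a FileStore OSD the leveldb omap store lives under the data directory (paths below assume the default /var/lib/ceph layout):

du -sh /var/lib/ceph/osd/ceph-3/current/omap        # on-disk size of the leveldb omap store
ls /var/lib/ceph/osd/ceph-3/current/omap | wc -l    # rough file count; old stores accumulate many .sst files

Hammer also has a leveldb_compact_on_mount option that compacts the store at OSD startup, which may help if the store is merely fragmented rather than otherwise degraded; verify the option exists on your build before relying on it.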

Actions #6

Updated by Corin Langosch over 8 years ago

Yes, I recycled them in the meantime, and now everything seems fine. So it can be closed, I guess.
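
For reference, "recycling" here means draining the OSD, removing it, and re-creating it on the same disk so it comes back with a fresh leveldb store. A hedged sketch for a hammer/sysvinit setup (osd.3 and /dev/sdX are placeholders; adapt each step to your deployment):

ceph osd out 3                  # drain: let PGs migrate off the OSD
# wait until all PGs are active+clean again
service ceph stop osd.3         # stop the daemon
ceph osd crush remove osd.3     # remove it from the CRUSH map
ceph auth del osd.3             # drop its auth key
ceph osd rm 3                   # delete the OSD id
ceph-disk prepare /dev/sdX      # re-provision the same disk
ceph-disk activate /dev/sdX1    # bring it back with a fresh store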

Actions #7

Updated by Samuel Just over 8 years ago

  • Status changed from New to Closed