Bug #7765

Bogus bandwidth statistics during pg creation

Added by John Spray about 10 years ago. Updated about 9 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Noticed this while growing the number of PGs in an existing pool on a cluster running the firefly branch.

ceph version 0.77-904-g76beb5c (76beb5ce82d6e7c6bd6dd3b7f844851ad67db18d)
root@gravel1:~# ceph -w
    cluster 1de82a3e-7ac7-4ca2-a123-95c21f525bfb
     health HEALTH_WARN 96 pgs stuck inactive; 96 pgs stuck unclean; pool data pg_num 736 > pgp_num 64; clock skew detected on mon.gravel2, mon.gravel3
     monmap e1: 3 mons at {gravel1=192.168.18.1:6789/0,gravel2=192.168.18.2:6789/0,gravel3=192.168.18.3:6789/0}, election epoch 76, quorum 0,1,2 gravel1,gravel2,gravel3
     mdsmap e37: 1/1/1 up {0=gravel1=up:active}
     osdmap e146: 3 osds: 3 up, 3 in
      pgmap v1600: 864 pgs, 3 pools, 320 GB data, 83652 objects
            97840 MB used, 2682 GB / 2778 GB avail
                  96 creating
                 768 active+clean

2014-03-18 15:05:29.852257 mon.0 [INF] osdmap e146: 3 osds: 3 up, 3 in
2014-03-18 15:05:29.943042 mon.0 [INF] pgmap v1600: 864 pgs: 96 creating, 768 active+clean; 320 GB data, 97840 MB used, 2682 GB / 2778 GB avail
2014-03-18 15:05:58.695455 mon.0 [INF] pgmap v1601: 864 pgs: 96 creating, 34 peering, 734 active+clean; 320 GB data, 97835 MB used, 2682 GB / 2778 GB avail
2014-03-18 15:05:59.861263 mon.0 [INF] pgmap v1602: 864 pgs: 96 creating, 54 peering, 714 active+clean; 320 GB data, 97831 MB used, 2682 GB / 2778 GB avail
2014-03-18 15:06:03.895804 mon.0 [INF] pgmap v1603: 864 pgs: 95 creating, 55 peering, 714 active+clean; 320 GB data, 97832 MB used, 2682 GB / 2778 GB avail; 100417 kB/s wr, 27 op/s
2014-03-18 15:06:05.087930 mon.0 [INF] pgmap v1604: 864 pgs: 94 creating, 98 peering, 672 active+clean; 321 GB data, 97832 MB used, 2682 GB / 2778 GB avail; 199 MB/s wr, 56 op/s
2014-03-18 15:06:07.243323 mon.0 [INF] pgmap v1605: 864 pgs: 89 creating, 103 peering, 672 active+clean; 323 GB data, 97833 MB used, 2682 GB / 2778 GB avail; 970 MB/s wr, 276 op/s
2014-03-18 15:06:09.402235 mon.0 [INF] pgmap v1606: 864 pgs: 88 creating, 104 peering, 672 active+clean; 323 GB data, 97833 MB used, 2682 GB / 2778 GB avail; 755 MB/s wr, 216 op/s
2014-03-18 15:06:10.576723 mon.0 [INF] pgmap v1607: 864 pgs: 84 creating, 108 peering, 672 active+clean; 326 GB data, 97833 MB used, 2682 GB / 2778 GB avail; 908 MB/s wr, 254 op/s

This is on three 1 TB SATA OSDs, so the ~1 GB/s of writes being reported is impossible. The disks are actually seeing ~150-200 kB/s of writes during this period, and there is no significant traffic on the frontend or backend networks.

I'm guessing this is an accounting issue, where the data being reassigned from one PG to another during splitting is being erroneously counted in the overall I/O figures. This is the first time I've noticed this, but I haven't compared the firefly code with anything older for this specific case.
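A toy model of the rate derivation (this is not the actual monitor code; the delta-over-interval formula and the numbers are purely illustrative) shows how a jump in the cumulative per-pool write counter during a split would surface as a huge "MB/s wr" figure even with no client traffic:

# Toy model (not the real Ceph code): treat "MB/s wr" as the delta of a
# cumulative write counter (in kB) between two pgmap versions, divided by
# the interval between them.
def write_rate_mbs(prev_kb, cur_kb, dt_seconds):
    return (cur_kb - prev_kb) / 1024.0 / dt_seconds

# Steady state: ~100 MB of real writes over a 2 s pgmap interval -> ~50 MB/s.
print(write_rate_mbs(1000000, 1102400, 2.0))

# If a split double-counts per-PG stats, the pool-wide counter can jump by
# far more than the real traffic; an extra ~1.8 GB over the same interval
# shows up as the ~900 MB/s seen in the log above, with no client I/O at all.
print(write_rate_mbs(1000000, 2860000, 2.0))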

As well as being misleading in itself, this output has a nasty side effect: it causes problems for anyone plotting charts of I/O. Once an invalid maximum value has appeared, it stretches the chart's y-axis so that the real I/O rates are much harder to see. For example, it pushes my graph to a 0-1 GB/s axis while the valid data are only in the 0-50 MB/s range, so they appear as a squiggle across the bottom (see attached).
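Until the accounting is fixed, one possible mitigation on the charting side (just a sketch with made-up sample data, not anything from Calamari) is to clamp the y-axis to a high percentile of the series so a single bogus reading cannot dictate the scale:

# Charting-side workaround sketch (illustrative data): cap the y-axis at a
# high percentile of the series so one bogus ~1 GB/s sample does not
# flatten the real 0-50 MB/s data.
import numpy as np
import matplotlib.pyplot as plt

samples_mbs = np.array([10, 12, 35, 40, 45, 42, 38, 908, 15, 11, 9, 14])

cap = 1.2 * np.percentile(samples_mbs, 90)  # ~54 here, ignoring the spike

plt.plot(samples_mbs)
plt.ylim(0, cap)        # the spike is clipped instead of stretching the axis
plt.ylabel("MB/s wr")
plt.show()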

Screen Shot 2014-03-18 at 15.11.25.png (22.9 KB) John Spray, 03/18/2014 08:13 AM

Screen Shot 2014-05-14 at 16.37.56.png (17.1 KB) John Spray, 05/14/2014 08:37 AM

History

#1 Updated by John Spray about 10 years ago

On the attachment, the traffic around 09:40 is actual client I/O maxing out the cluster. Around 09:55 is where I start creating PGs: note that it spikes in a way that dwarfs the real throughput numbers.

The data plotted here is the sum of "num_write_kb" across all pools in the output of "ceph -f json-pretty pg dump pools". This corresponds to the "MB/s wr" numbers at the end of the lines in the "ceph -w" output pasted in the report.
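For reference, a minimal sketch of how that number can be gathered (the command and field name are taken from the comment above; the recursive walk is just a convenience, since the exact JSON nesting of the per-pool stat sums may vary between releases):

# Sum "num_write_kb" across all pools from the JSON output of
# "ceph pg dump pools". Walks the whole parsed structure rather than
# assuming a particular nesting of the per-pool stat sums.
import json
import subprocess

def sum_field(node, field):
    if isinstance(node, dict):
        total = 0
        for key, value in node.items():
            if key == field and isinstance(value, (int, float)):
                total += value
            else:
                total += sum_field(value, field)
        return total
    if isinstance(node, list):
        return sum(sum_field(item, field) for item in node)
    return 0

out = subprocess.check_output(["ceph", "-f", "json-pretty", "pg", "dump", "pools"])
print(sum_field(json.loads(out.decode("utf-8")), "num_write_kb"))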

#2 Updated by Samuel Just about 10 years ago

  • Priority changed from Normal to High

#3 Updated by John Spray almost 10 years ago

Confirmed this is still an issue on master; the spike it generates completely breaks the Calamari IOPS chart :-/

#4 Updated by Ian Colle almost 10 years ago

  • Status changed from New to 12

#5 Updated by Samuel Just about 9 years ago

  • Status changed from 12 to Can't reproduce
