Bug #7765
Bogus bandwidth statistics during pg creation
% Done: 0%
Description
Noticed this while growing the number of PGs in an existing pool on a cluster running firefly branch.
ceph version 0.77-904-g76beb5c (76beb5ce82d6e7c6bd6dd3b7f844851ad67db18d)

root@gravel1:~# ceph -w
    cluster 1de82a3e-7ac7-4ca2-a123-95c21f525bfb
     health HEALTH_WARN 96 pgs stuck inactive; 96 pgs stuck unclean; pool data pg_num 736 > pgp_num 64; clock skew detected on mon.gravel2, mon.gravel3
     monmap e1: 3 mons at {gravel1=192.168.18.1:6789/0,gravel2=192.168.18.2:6789/0,gravel3=192.168.18.3:6789/0}, election epoch 76, quorum 0,1,2 gravel1,gravel2,gravel3
     mdsmap e37: 1/1/1 up {0=gravel1=up:active}
     osdmap e146: 3 osds: 3 up, 3 in
      pgmap v1600: 864 pgs, 3 pools, 320 GB data, 83652 objects
            97840 MB used, 2682 GB / 2778 GB avail
                  96 creating
                 768 active+clean

2014-03-18 15:05:29.852257 mon.0 [INF] osdmap e146: 3 osds: 3 up, 3 in
2014-03-18 15:05:29.943042 mon.0 [INF] pgmap v1600: 864 pgs: 96 creating, 768 active+clean; 320 GB data, 97840 MB used, 2682 GB / 2778 GB avail
2014-03-18 15:05:58.695455 mon.0 [INF] pgmap v1601: 864 pgs: 96 creating, 34 peering, 734 active+clean; 320 GB data, 97835 MB used, 2682 GB / 2778 GB avail
2014-03-18 15:05:59.861263 mon.0 [INF] pgmap v1602: 864 pgs: 96 creating, 54 peering, 714 active+clean; 320 GB data, 97831 MB used, 2682 GB / 2778 GB avail
2014-03-18 15:06:03.895804 mon.0 [INF] pgmap v1603: 864 pgs: 95 creating, 55 peering, 714 active+clean; 320 GB data, 97832 MB used, 2682 GB / 2778 GB avail; 100417 kB/s wr, 27 op/s
2014-03-18 15:06:05.087930 mon.0 [INF] pgmap v1604: 864 pgs: 94 creating, 98 peering, 672 active+clean; 321 GB data, 97832 MB used, 2682 GB / 2778 GB avail; 199 MB/s wr, 56 op/s
2014-03-18 15:06:07.243323 mon.0 [INF] pgmap v1605: 864 pgs: 89 creating, 103 peering, 672 active+clean; 323 GB data, 97833 MB used, 2682 GB / 2778 GB avail; 970 MB/s wr, 276 op/s
2014-03-18 15:06:09.402235 mon.0 [INF] pgmap v1606: 864 pgs: 88 creating, 104 peering, 672 active+clean; 323 GB data, 97833 MB used, 2682 GB / 2778 GB avail; 755 MB/s wr, 216 op/s
2014-03-18 15:06:10.576723 mon.0 [INF] pgmap v1607: 864 pgs: 84 creating, 108 peering, 672 active+clean; 326 GB data, 97833 MB used, 2682 GB / 2778 GB avail; 908 MB/s wr, 254 op/s
This cluster has three 1 TB SATA OSDs, so the ~1 GB/s of writes reported is impossible. The disks are actually seeing only ~150-200 kB/s of writes during this period, and there is no significant traffic on the frontend or backend networks.
I'm guessing this is an accounting issue, where the data being reassigned from one PG to another during splitting is being erroneously counted in the overall I/O figures. This is the first time I've noticed this, but I haven't compared the firefly code with anything older for this specific case.
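To illustrate the suspected accounting issue (this is a sketch of the hypothesis, not the Ceph source): if a rate is derived from the delta of per-PG write counters between pgmap versions, and a split causes a child PG to re-report the parent's historical writes, that re-counted data shows up as an apparent burst of client I/O. The PG names, counter values, and `write_rate_kbs` helper below are all illustrative assumptions.

```python
def write_rate_kbs(prev_stats, cur_stats, interval_s):
    """Rate computed as the delta of summed per-PG write counters.

    prev_stats / cur_stats: dicts mapping PG id -> cumulative num_write_kb.
    """
    prev_total = sum(prev_stats.values())
    cur_total = sum(cur_stats.values())
    return (cur_total - prev_total) / interval_s

# Before the split: one PG carrying 1 GB of historical writes.
before = {"0.0": 1048576}  # num_write_kb per PG (hypothetical values)

# After the split: the parent keeps its counter, and the child PG
# re-reports its share of the historical writes (the suspected bug).
after = {"0.0": 1048576, "0.40": 524288}

# No client wrote anything, yet the delta looks like 256 MB/s over 2 s.
print(write_rate_kbs(before, after, 2.0))
```

With real client I/O idle, a correct rate would be 0; the double-counted 512 MB of historical writes instead produces the kind of multi-hundred-MB/s spike seen in the `ceph -w` output above.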
As well as being misleading in itself, this output has a nasty side effect: it causes problems for anyone plotting charts of I/O. Once an invalid maximum has appeared, it stretches the chart's y axis so that the real I/O rates are much harder to see. For example, it pushes my graph to a 0-1 GB/s axis, while the valid data are only in the 0-50 MB/s range and so appear as a squiggle across the bottom (see attached).
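One possible chart-side workaround (an assumption on my part, not something the report proposes) is to clamp outlier samples before plotting, so a single bogus reading cannot stretch the y axis. The `clamp_outliers` helper and its `factor` threshold below are hypothetical:

```python
def clamp_outliers(samples, factor=10.0):
    """Cap each sample at `factor` times the series median.

    A crude guard: genuine rates cluster near the median, so a lone
    sample an order of magnitude above it is almost certainly bogus.
    """
    ordered = sorted(samples)
    median = ordered[len(ordered) // 2]
    limit = median * factor
    return [min(s, limit) for s in samples]

rates_mb_s = [30, 45, 50, 970, 40, 35]  # one bogus ~1 GB/s sample
print(clamp_outliers(rates_mb_s))       # the 970 is capped near the real range
```

This keeps the real 0-50 MB/s data readable, at the cost of hiding the true magnitude of the bogus spike.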
History
#1 Updated by John Spray about 10 years ago
On the attachment, around 09:40 is some actual client I/O maxing out the cluster. Around 09:55 is where I start creating PGs: see that it spikes in a way that dwarfs the real throughput numbers.
The data plotted here are the sum of "num_write_kb" across all pools in the output of "ceph -f json-pretty pg dump pools". This corresponds to the "MB/s wr" numbers at the end of the "ceph -w" lines pasted in the report.
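The metric described above can be sketched as follows. This assumes the JSON from `ceph -f json pg dump pools` is a list of per-pool entries each containing a `stat_sum` section with `num_write_kb`; the exact field layout is an assumption for that era's output, and the sample JSON is fabricated for illustration:

```python
import json

def sum_write_kb(pools_json):
    """Sum num_write_kb over all pools in `pg dump pools` JSON output."""
    return sum(p["stat_sum"]["num_write_kb"] for p in json.loads(pools_json))

# Fabricated sample resembling `ceph -f json pg dump pools` output.
sample = (
    '[{"poolid": 0, "stat_sum": {"num_write_kb": 1024}},'
    ' {"poolid": 1, "stat_sum": {"num_write_kb": 2048}}]'
)
print(sum_write_kb(sample))  # 3072
```

In practice the JSON would come from invoking the ceph CLI on a schedule, with the plotted rate being the difference between successive sums divided by the polling interval.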
#2 Updated by Samuel Just about 10 years ago
- Priority changed from Normal to High
#3 Updated by John Spray almost 10 years ago
- File Screen Shot 2014-05-14 at 16.37.56.png View added
Confirmed this is still an issue on master; the generated spike completely breaks the Calamari IOPS chart :-/
#4 Updated by Ian Colle almost 10 years ago
- Status changed from New to 12
#5 Updated by Samuel Just about 9 years ago
- Status changed from 12 to Can't reproduce