Bug #13939

closed

PGs stuck in "creating" when created just after deleting another pool

Added by John Spray over 8 years ago. Updated about 7 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/jspray-2015-12-01_03:22:35-fs:recovery-wip-testing-jcsp---basic-multi/1164775

The PG status is stuck at "24 pgs: 5 creating+peering, 7 creating, 12 active+clean; 0 bytes data, 60099 MB used, 1681 GB / 1832 GB avail"

The sequence of admin operations from the audit log is:

2015-12-01 03:45:08.338782 mon.0 10.214.134.24:6789/0 144 : audit [INF] from='client.? 10.214.134.24:0/230392284' entity='client.manila' cmd=[{"prefix": "osd pool create", "pg_num": 2997, "pool": "fsvolume_volid", "format": "json"}]: dispatch
2015-12-01 03:45:08.569992 mon.0 10.214.134.24:6789/0 145 : audit [INF] from='client.? 10.214.134.24:0/230392284' entity='client.manila' cmd='[{"prefix": "osd pool create", "pg_num": 2997, "pool": "fsvolume_volid", "format": "json"}]': finished
2015-12-01 03:45:08.640734 mon.0 10.214.134.24:6789/0 149 : audit [INF] from='client.? 10.214.134.24:0/230392284' entity='client.manila' cmd=[{"prefix": "mds add_data_pool", "pool": "fsvolume_volid", "format": "json"}]: dispatch
2015-12-01 03:45:09.573157 mon.0 10.214.134.24:6789/0 151 : audit [INF] from='client.? 10.214.134.24:0/230392284' entity='client.manila' cmd='[{"prefix": "mds add_data_pool", "pool": "fsvolume_volid", "format": "json"}]': finished
2015-12-01 03:45:20.841502 mon.0 10.214.134.24:6789/0 159 : audit [INF] from='client.? 10.214.134.24:0/3993649872' entity='client.admin' cmd=[{"prefix": "mds cluster_down"}]: dispatch
2015-12-01 03:45:21.670012 mon.0 10.214.134.24:6789/0 160 : audit [INF] from='client.? 10.214.134.24:0/3993649872' entity='client.admin' cmd='[{"prefix": "mds cluster_down"}]': finished
2015-12-01 03:45:22.000448 mon.0 10.214.134.24:6789/0 162 : audit [INF] from='client.? 10.214.134.24:0/1610017905' entity='client.admin' cmd=[{"prefix": "mds fail", "who": "a"}]: dispatch
2015-12-01 03:45:22.007578 mon.0 10.214.134.24:6789/0 164 : audit [INF] from='client.? 10.214.134.24:0/3671013995' entity='client.admin' cmd=[{"prefix": "mds fail", "who": "b"}]: dispatch
2015-12-01 03:45:22.672821 mon.0 10.214.134.24:6789/0 168 : audit [INF] from='client.? 10.214.134.24:0/1610017905' entity='client.admin' cmd='[{"prefix": "mds fail", "who": "a"}]': finished
2015-12-01 03:45:22.672927 mon.0 10.214.134.24:6789/0 169 : audit [INF] from='client.? 10.214.134.24:0/3671013995' entity='client.admin' cmd='[{"prefix": "mds fail", "who": "b"}]': finished
2015-12-01 03:45:23.242781 mon.0 10.214.134.24:6789/0 172 : audit [INF] from='client.? 10.214.134.24:0/3315988170' entity='client.admin' cmd=[{"prefix": "fs rm", "sure": "--yes-i-really-mean-it", "fs_name": "default"}]: dispatch
2015-12-01 03:45:23.674214 mon.0 10.214.134.24:6789/0 173 : audit [INF] from='client.? 10.214.134.24:0/3315988170' entity='client.admin' cmd='[{"prefix": "fs rm", "sure": "--yes-i-really-mean-it", "fs_name": "default"}]': finished
2015-12-01 03:45:23.937452 mon.0 10.214.134.24:6789/0 175 : audit [INF] from='client.? 10.214.134.24:0/858233497' entity='client.admin' cmd=[{"pool2": "metadata", "prefix": "osd pool delete", "sure": "--yes-i-really-really-mean-it", "pool": "metadata"}]: dispatch
2015-12-01 03:45:24.679272 mon.0 10.214.134.24:6789/0 176 : audit [INF] from='client.? 10.214.134.24:0/858233497' entity='client.admin' cmd='[{"pool2": "metadata", "prefix": "osd pool delete", "sure": "--yes-i-really-really-mean-it", "pool": "metadata"}]': finished
2015-12-01 03:45:24.933647 mon.0 10.214.134.24:6789/0 179 : audit [INF] from='client.? 10.214.134.24:0/1223047985' entity='client.admin' cmd=[{"pool2": "data", "prefix": "osd pool delete", "sure": "--yes-i-really-really-mean-it", "pool": "data"}]: dispatch
2015-12-01 03:45:25.687323 mon.0 10.214.134.24:6789/0 180 : audit [INF] from='client.? 10.214.134.24:0/1223047985' entity='client.admin' cmd='[{"pool2": "data", "prefix": "osd pool delete", "sure": "--yes-i-really-really-mean-it", "pool": "data"}]': finished
2015-12-01 03:45:25.944247 mon.0 10.214.134.24:6789/0 183 : audit [INF] from='client.? 10.214.134.24:0/2892973683' entity='client.admin' cmd=[{"pool2": "fsvolume_volid", "prefix": "osd pool delete", "sure": "--yes-i-really-really-mean-it", "pool": "fsvolume_volid"}]: dispatch
2015-12-01 03:45:26.694666 mon.0 10.214.134.24:6789/0 184 : audit [INF] from='client.? 10.214.134.24:0/2892973683' entity='client.admin' cmd='[{"pool2": "fsvolume_volid", "prefix": "osd pool delete", "sure": "--yes-i-really-really-mean-it", "pool": "fsvolume_volid"}]': finished
2015-12-01 03:45:27.051385 mon.0 10.214.134.24:6789/0 187 : audit [INF] from='client.? 10.214.134.24:0/3168311384' entity='client.admin' cmd=[{"prefix": "osd pool create", "pg_num": 6, "pool": "metadata"}]: dispatch
2015-12-01 03:45:27.838922 mon.0 10.214.134.24:6789/0 188 : audit [INF] from='client.? 10.214.134.24:0/3168311384' entity='client.admin' cmd='[{"prefix": "osd pool create", "pg_num": 6, "pool": "metadata"}]': finished
2015-12-01 03:45:28.098285 mon.0 10.214.134.24:6789/0 191 : audit [INF] from='client.? 10.214.134.24:0/3452224738' entity='client.admin' cmd=[{"prefix": "osd pool create", "pg_num": 6, "pool": "data"}]: dispatch
2015-12-01 03:45:28.842732 mon.0 10.214.134.24:6789/0 192 : audit [INF] from='client.? 10.214.134.24:0/3452224738' entity='client.admin' cmd='[{"prefix": "osd pool create", "pg_num": 6, "pool": "data"}]': finished
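
For reference, the same sequence can be approximated directly with the ceph CLI. This is a hedged sketch of the reproduction, not the actual teuthology task: the pool names and pg_num values are taken from the audit log above, while the starting state (a fresh cluster with the default "metadata"/"data" pools and filesystem "default") is assumed.

# approximate CLI reproduction of the audit-log sequence above (a sketch, not
# the exact teuthology task)
ceph osd pool create fsvolume_volid 2997
ceph mds add_data_pool fsvolume_volid
ceph mds cluster_down
ceph mds fail a
ceph mds fail b
ceph fs rm default --yes-i-really-mean-it
ceph osd pool delete metadata metadata --yes-i-really-really-mean-it
ceph osd pool delete data data --yes-i-really-really-mean-it
ceph osd pool delete fsvolume_volid fsvolume_volid --yes-i-really-really-mean-it
ceph osd pool create metadata 6     # these PGs end up stuck in "creating"
ceph osd pool create data 6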

The resulting cluster log is:

2015-12-01 03:45:02.235418 mon.0 10.214.134.24:6789/0 139 : cluster [INF] pgmap v26: 24 pgs: 24 active+clean; 4694 bytes data, 60047 MB used, 1681 GB / 1832 GB avail; 1603 B/s wr, 6 op/s
2015-12-01 03:45:08.631426 mon.0 10.214.134.24:6789/0 146 : cluster [INF] osdmap e16: 3 osds: 3 up, 3 in
2015-12-01 03:45:08.682657 mon.0 10.214.134.24:6789/0 150 : cluster [INF] pgmap v27: 3021 pgs: 2997 creating, 24 active+clean; 4694 bytes data, 60047 MB used, 1681 GB / 1832 GB avail; 461 B/s wr, 0 op/s
2015-12-01 03:45:09.573297 mon.0 10.214.134.24:6789/0 152 : cluster [INF] mdsmap e14: 1/1/1 up {0=b=up:active}, 1 up:standby
2015-12-01 03:45:11.753375 mon.0 10.214.134.24:6789/0 153 : cluster [INF] pgmap v28: 3021 pgs: 673 creating+peering, 2324 creating, 24 active+clean; 4694 bytes data, 60049 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:12.748227 mon.0 10.214.134.24:6789/0 154 : cluster [INF] pgmap v29: 3021 pgs: 792 creating+peering, 2205 creating, 24 active+clean; 10196 bytes data, 60052 MB used, 1681 GB / 1832 GB avail; 1478 B/s wr, 0 op/s
2015-12-01 03:45:16.426829 mon.0 10.214.134.24:6789/0 156 : cluster [INF] pgmap v30: 3021 pgs: 1078 creating+peering, 1919 creating, 24 active+clean; 10196 bytes data, 60086 MB used, 1681 GB / 1832 GB avail; 1311 B/s wr, 0 op/s
2015-12-01 03:45:17.430091 mon.0 10.214.134.24:6789/0 157 : cluster [INF] pgmap v31: 3021 pgs: 1101 creating+peering, 1896 creating, 24 active+clean; 18567 bytes data, 60089 MB used, 1681 GB / 1832 GB avail; 2406 B/s wr, 0 op/s
2015-12-01 03:45:21.670090 mon.0 10.214.134.24:6789/0 161 : cluster [INF] mdsmap e15: 1/1/1 up {0=b=up:active}, 1 up:standby
2015-12-01 03:45:22.007443 mon.0 10.214.134.24:6789/0 163 : cluster [INF] osdmap e17: 3 osds: 3 up, 3 in
2015-12-01 03:45:22.008114 mon.0 10.214.134.24:6789/0 165 : cluster [INF] pgmap v32: 3021 pgs: 1101 creating+peering, 1896 creating, 24 active+clean; 18567 bytes data, 60089 MB used, 1681 GB / 1832 GB avail; 2014 B/s wr, 0 op/s
2015-12-01 03:45:22.014833 mon.0 10.214.134.24:6789/0 166 : cluster [INF] osdmap e18: 3 osds: 3 up, 3 in
2015-12-01 03:45:22.015510 mon.0 10.214.134.24:6789/0 167 : cluster [INF] pgmap v33: 3021 pgs: 1101 creating+peering, 1896 creating, 24 active+clean; 18567 bytes data, 60089 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:22.672981 mon.0 10.214.134.24:6789/0 170 : cluster [INF] mdsmap e16: 0/1/1 up, 1 failed
2015-12-01 03:45:23.674292 mon.0 10.214.134.24:6789/0 174 : cluster [INF] mdsmap e17: 0/0/0 up
2015-12-01 03:45:24.685010 mon.0 10.214.134.24:6789/0 177 : cluster [INF] osdmap e19: 3 osds: 3 up, 3 in
2015-12-01 03:45:24.685637 mon.0 10.214.134.24:6789/0 178 : cluster [INF] pgmap v34: 3015 pgs: 1101 creating+peering, 1896 creating, 18 active+clean; 0 bytes data, 60089 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:25.692352 mon.0 10.214.134.24:6789/0 181 : cluster [INF] osdmap e20: 3 osds: 3 up, 3 in
2015-12-01 03:45:25.693011 mon.0 10.214.134.24:6789/0 182 : cluster [INF] pgmap v35: 3009 pgs: 1101 creating+peering, 1896 creating, 12 active+clean; 0 bytes data, 60089 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:26.814425 mon.0 10.214.134.24:6789/0 185 : cluster [INF] osdmap e21: 3 osds: 3 up, 3 in
2015-12-01 03:45:26.844237 mon.0 10.214.134.24:6789/0 186 : cluster [INF] pgmap v36: 12 pgs: 12 active+clean; 0 bytes data, 60089 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:27.840194 mon.0 10.214.134.24:6789/0 189 : cluster [INF] osdmap e22: 3 osds: 3 up, 3 in
2015-12-01 03:45:27.841210 mon.0 10.214.134.24:6789/0 190 : cluster [INF] pgmap v37: 18 pgs: 6 creating, 12 active+clean; 0 bytes data, 60089 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:28.844349 mon.0 10.214.134.24:6789/0 193 : cluster [INF] osdmap e23: 3 osds: 3 up, 3 in
2015-12-01 03:45:28.845289 mon.0 10.214.134.24:6789/0 194 : cluster [INF] pgmap v38: 24 pgs: 12 creating, 12 active+clean; 0 bytes data, 60089 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:29.096513 mon.0 10.214.134.24:6789/0 196 : cluster [INF] osdmap e24: 3 osds: 3 up, 3 in
2015-12-01 03:45:29.097590 mon.0 10.214.134.24:6789/0 197 : cluster [INF] pgmap v39: 24 pgs: 12 creating, 12 active+clean; 0 bytes data, 60089 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:29.846241 mon.0 10.214.134.24:6789/0 199 : cluster [INF] mdsmap e18: 0/0/1 up
2015-12-01 03:45:30.850410 mon.0 10.214.134.24:6789/0 204 : cluster [INF] osdmap e25: 3 osds: 3 up, 3 in
2015-12-01 03:45:30.850947 mon.0 10.214.134.24:6789/0 205 : cluster [INF] pgmap v40: 24 pgs: 12 creating, 12 active+clean; 0 bytes data, 60089 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:31.931398 mon.0 10.214.134.24:6789/0 208 : cluster [INF] osdmap e26: 3 osds: 3 up, 3 in
2015-12-01 03:45:31.932053 mon.0 10.214.134.24:6789/0 209 : cluster [INF] pgmap v41: 24 pgs: 5 creating+peering, 7 creating, 12 active+clean; 0 bytes data, 60099 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:31.932646 mon.0 10.214.134.24:6789/0 210 : cluster [INF] pgmap v42: 24 pgs: 5 creating+peering, 7 creating, 12 active+clean; 0 bytes data, 60099 MB used, 1681 GB / 1832 GB avail
2015-12-01 03:45:38.941661 mon.0 10.214.134.24:6789/0 227 : cluster [INF] mds.? 10.214.134.24:6804/25754 up:boot
2015-12-01 03:45:38.941753 mon.0 10.214.134.24:6789/0 228 : cluster [INF] mds.? 10.214.134.24:6805/25755 up:boot
2015-12-01 03:45:38.942392 mon.0 10.214.134.24:6789/0 229 : cluster [INF] mdsmap e19: 0/0/1 up, 2 up:standby
2015-12-01 03:45:38.943092 mon.0 10.214.134.24:6789/0 230 : cluster [INF] mdsmap e20: 1/1/1 up {0=a=up:creating}, 1 up:standby
2015-12-01 03:46:09.064535 osd.0 10.214.134.24:6800/22555 1 : cluster [WRN] 5 slow requests, 5 included below; oldest blocked for > 30.120034 secs
2015-12-01 03:46:09.064539 osd.0 10.214.134.24:6800/22555 2 : cluster [WRN] slow request 30.119714 seconds old, received at 2015-12-01 03:45:38.944757: osd_op(mds.0.3:9 6.2a8b6bd0 (undecoded) ondisk+write+known_if_redirected+full_force e26) currently waiting for peered
2015-12-01 03:46:09.064543 osd.0 10.214.134.24:6800/22555 3 : cluster [WRN] slow request 30.119681 seconds old, received at 2015-12-01 03:45:38.944790: osd_op(mds.0.3:10 6.2a8b6bd0 (undecoded) ondisk+write+known_if_redirected+full_force e26) currently waiting for peered
2015-12-01 03:46:09.064546 osd.0 10.214.134.24:6800/22555 4 : cluster [WRN] slow request 30.119646 seconds old, received at 2015-12-01 03:45:38.944825: osd_op(mds.0.3:2 6.64e96f8f (undecoded) ondisk+write+known_if_redirected+full_force e26) currently waiting for peered
2015-12-01 03:46:09.064549 osd.0 10.214.134.24:6800/22555 5 : cluster [WRN] slow request 30.119611 seconds old, received at 2015-12-01 03:45:38.944860: osd_op(mds.0.3:11 6.7a4d91b0 (undecoded) ondisk+write+known_if_redirected+full_force e26) currently waiting for peered
2015-12-01 03:46:09.064552 osd.0 10.214.134.24:6800/22555 6 : cluster [WRN] slow request 30.119554 seconds old, received at 2015-12-01 03:45:38.944916: osd_op(mds.0.3:12 6.7a4d91b0 (undecoded) ondisk+write+known_if_redirected+full_force e26) currently waiting for peered
2015-12-01 03:46:10.064776 osd.0 10.214.134.24:6800/22555 7 : cluster [WRN] 10 slow requests, 5 included below; oldest blocked for > 31.120289 secs

Actions #1

Updated by John Spray over 8 years ago

I've seen this twice in a row when scheduling wip-testing-jcsp (a branch of the same name in both ceph and ceph-qa-suite) for the "fs/recovery" suite.

Actions #2

Updated by John Spray over 8 years ago

  • Project changed from CephFS to Ceph
  • Category set to OSD
Actions #3

Updated by Sage Weil over 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
  • Priority changed from Normal to Urgent
Actions #4

Updated by Sage Weil over 8 years ago

This looks like a thread in the OSD got stuck. Specifically, osd.1's last message on thread 7fc8fc428700 is:

2015-12-01 03:45:08.688664 7fc8fc428700 1 -- 10.214.132.11:6801/11666 <== osd.0 10.214.134.24:6801/22555 69 ==== osd_map(16..16 src has 1..16) v3 ==== 467+0+0 (3038103570 0 0) 0x561e8acc53c0 con 0x561e8ab07780

and thereafter it sees nothing from osd.0, and peering is consequently stuck. I don't see any obvious culprits in wip-testing-jcsp, though... :/
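
A hedged way to confirm the stuck-thread theory on a live reproduction (not something that was run as part of this investigation) would be to dump every thread's backtrace on the suspect OSD and look for the messenger thread blocked in a wait; the pid lookup below is an assumption about the deployment.

# hypothetical debugging sketch, not part of the original investigation:
# dump all thread backtraces on osd.1 and inspect the messenger thread
pid=$(pgrep -f 'ceph-osd -i 1')
gdb -p "$pid" --batch -ex 'thread apply all bt' > /tmp/osd1-threads.txt
grep -A 20 '0x7fc8fc428700' /tmp/osd1-threads.txt   # thread id from the log line above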

Actions #5

Updated by John Spray over 8 years ago

Update: after rebasing the test branch on latest master I saw a run where we didn't hit the issue, so it's either fixed or intermittent (http://pulpito.ceph.com/jspray-2015-12-02_18:28:43-fs:recovery-wip-testing-jcsp---basic-multi/1166586/).

Another run should reach the front of the queue later today...

Actions #6

Updated by John Spray over 8 years ago

Nope, the issue is still there; it showed up again in the next run:
http://pulpito.ceph.com/jspray-2015-12-03_04:11:11-fs:recovery-wip-testing-jcsp---basic-multi/1166586/

I notice in this case that the stuck OSD ops actually do eventually complete (at 2015-12-03 16:07:24).

The mons aren't getting osd_stat messages from the OSDs for the period during which we're stuck:

2015-12-03 16:03:10.356268 7fad351d7700 10 mon.a@0(leader).pg v29  got osd.1 osd_stat(25333 MB used, 410 GB avail, 458 GB total, peers [0,2]/[] op hist [0,0,0,0,0,0,0,0,0,0,0,1,3,1]) (was osd_stat(25330 MB used, 410 GB avail, 458 GB total, peers [0,2]/[] op hist []))
2015-12-03 16:03:24.685819 7fad351d7700 10 mon.a@0(leader).pg v37  got osd.0 osd_stat(62239 MB used, 808 GB avail, 916 GB total, peers [1,2]/[] op hist []) (was osd_stat(62234 MB used, 808 GB avail, 916 GB total, peers [1,2]/[] op hist []))
2015-12-03 16:07:07.583402 7fad351d7700 10 mon.a@0(leader).pg v40  got osd.2 osd_stat(25385 MB used, 410 GB avail, 458 GB total, peers [0,1]/[] op hist [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1]) (was osd_stat(25332 MB used, 410 GB avail, 458 GB total, peers [0,1]/[] op hist [0,0,0,0,0,0,0,0,0,1,0,0,1]))
2015-12-03 16:07:07.591963 7fad351d7700 10 mon.a@0(leader).pg v40  got osd.2 osd_stat(25385 MB used, 410 GB avail, 458 GB total, peers [0,1]/[] op hist [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1]) (was osd_stat(25332 MB used, 410 GB avail, 458 GB total, peers [0,1]/[] op hist [0,0,0,0,0,0,0,0,0,1,0,0,1]))

During the dead period, OSD 0 is still communicating with the mon, but OSDs 1 and 2 are both sitting there trying to re-establish a connection that they marked down after not seeing timely pgstat acks. I'm not familiar enough with all the mechanics here to quickly say which part is getting stuck, so it may or may not be the exact same case that Sage noted above.
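
Next time this reproduces, a rough plan for what to look at (these are standard commands, but which of them actually turns out to be informative here is an assumption on my part):

# hedged diagnostic sketch for the next reproduction, not commands that were
# run as part of this report
ceph pg dump_stuck inactive                                # which PGs never leave "creating"
ceph daemon osd.1 dump_ops_in_flight                       # ops stuck "waiting for peered"
ceph tell osd.* injectargs '--debug_ms 1 --debug_osd 10'   # watch the reconnect loop in the OSD logs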

Actions #7

Updated by John Spray over 8 years ago

I think one of the ingredients here must have been that we were creating a rudely large number of PGs for the number of OSDs (my test was using "mon pg warn max per osd" to guide its PG count, but that is forced to 10000 by default in teuthology). Having overloaded OSDs (too many PGs) is probably necessary to reproduce this.
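
For context, a sketch of the kind of sizing logic the test was doing; the option name and commands are real, but the exact formula is an assumption (the audit log above shows the test arrived at pg_num=2997 on this 3-OSD cluster).

# hypothetical sketch of sizing pg_num from the warning threshold; the exact
# arithmetic the test used is an assumption
num_osds=$(ceph osd ls | wc -l)
warn_max=$(ceph-conf --show-config-value mon_pg_warn_max_per_osd)
pg_num=$(( num_osds * warn_max ))   # grows absurdly once the threshold is forced to 10000
ceph osd pool create fsvolume_volid "$pg_num"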

Actions #8

Updated by Greg Farnum over 8 years ago

Is there any reason to think this isn't just the general slowness that comes from overloading OSDs, then? I don't think we have any reports of this on our regular branches.

Actions #9

Updated by John Spray over 8 years ago

It appears that one or more threads are getting entirely stuck while others continue, so on the face of it this doesn't seem like general slowness. Hard to say, though.

Actions #10

Updated by Sage Weil about 8 years ago

John, have you seen this again? What's weird is that I don't think it's happening in the rados suite...

Actions #11

Updated by Greg Farnum about 8 years ago

  • Priority changed from Urgent to Normal

We dropped the number of PGs down to a more polite level; anybody wanting to reproduce this would probably have to subject RADOS to that extreme workload again.

I'm dropping the priority, but not closing the bug.

Actions #12

Updated by Sage Weil over 7 years ago

  • Status changed from In Progress to Need More Info
  • Assignee deleted (Sage Weil)

Need to verify this still happens.

Actions #13

Updated by Sage Weil about 7 years ago

  • Status changed from Need More Info to Can't reproduce