Bug #4675
Closed
mon: pg creations don't get queued on mon startup
0%
Description
PGMonitor::send_pg_creates() also divvies up pg creations among the OSDs they currently map to. This happens from update_from_paxos(), and presumably also when the osdmap updates. On startup, it happens from preinit() -> init_paxos(), which calls PGMonitor::update_from_paxos() before the OSDMonitor's, which means the OSDMap is not yet loaded and every pg maps to no OSD. Until there is an osdmap update, a reconnecting osd will not see the creations queued for it.
The fix is probably to break the divvying out of send_pg_creates(), and then ensure that it is called at some other point during startup.
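The ordering problem and the proposed fix can be illustrated with a toy model. This is a minimal sketch in Python, not Ceph code: ToyOSDMap, ToyPGMonitor, map_pg_creates() and start_mon() are hypothetical names, and the modulo placement is an assumption made only so the example has a deterministic mapping. It assumes the fix takes the form described above, with the divvying split into its own step that is run again once the OSDMap is available.

```python
# Toy model of the startup ordering problem; not Ceph code.
# All class and method names here are hypothetical stand-ins.

class ToyOSDMap:
    def __init__(self):
        self.loaded = False
        self.num_osds = 0

    def load(self, num_osds):
        self.loaded = True
        self.num_osds = num_osds

    def primary_for(self, pg):
        # Before the map is loaded, every pg maps to no OSD.
        if not self.loaded or self.num_osds == 0:
            return None
        # Assumed placement rule, for illustration only.
        return pg % self.num_osds


class ToyPGMonitor:
    def __init__(self, osdmap, creating_pgs):
        self.osdmap = osdmap
        self.creating_pgs = list(creating_pgs)
        self.creates_by_osd = {}  # osd id -> set of pgs queued for it

    def map_pg_creates(self):
        """Divvy pending creations among the OSDs they currently map to."""
        self.creates_by_osd = {}
        for pg in self.creating_pgs:
            osd = self.osdmap.primary_for(pg)
            if osd is not None:
                self.creates_by_osd.setdefault(osd, set()).add(pg)

    def send_pg_creates(self, osd):
        """Return the creations queued for a (re)connecting OSD."""
        return sorted(self.creates_by_osd.get(osd, set()))


def start_mon(fixed):
    osdmap = ToyOSDMap()
    pgmon = ToyPGMonitor(osdmap, creating_pgs=[0, 1, 2, 3])
    pgmon.map_pg_creates()      # pg monitor refreshes first: map is empty
    osdmap.load(num_osds=2)     # osd monitor refreshes afterwards
    if fixed:
        pgmon.map_pg_creates()  # redo the mapping once the OSDMap exists
    return pgmon
```

With fixed=False, start_mon(False).send_pg_creates(0) returns an empty list, which corresponds to the reconnecting osd seeing no creations. With fixed=True, the second mapping pass queues pgs 0 and 2 for osd 0 and pgs 1 and 3 for osd 1.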
The result is that a pool creation that races with a mon restart will hang. Either some other path also gets into this state, or there is a different bug, since the hang was triggered by the job below (osd thrashing only, no mon restarts). In any case, after that hang, restarting the mon got into this buggy state, so it should get fixed regardless.
ubuntu@teuthology:/a/sage-2013-04-06_09:10:56-rados-wip-osd-throttle-testing-basic/9833$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: b0bb70d12c365872547f10d185bf88eba3ed6083
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: ext4
    log-whitelist:
    - slow request
    sha1: aca0aea1bfbafba9cab1b2c693760b824bd82d30
  s3tests:
    branch: master
  workunit:
    sha1: aca0aea1bfbafba9cab1b2c693760b824bd82d30
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test.sh