Okay, the problem seems to be simply that the PG went into a backfill state but never told the mon. For example, in run
/a/sage-2016-12-29_20:50:13-rados-wip-sage-testing---basic-smithi/675453
pg 0.f did
2016-12-30 00:49:35.607752 7fe5d2c70700 15 osd.2 pg_epoch: 14 pg[0.f( v 11'1324 (11'1300,11'1324] local-les=14 n=1324 ec=1 les/c/f 14/9/0 13/13/5) [0,1]/[2,3] r=0 lpr=13 pi=8-12/2 bft=0,1 crt=11'1324 lcod 11'1323 mlcod 0'0 active+remapped] publish_stats_to_osd 14: no change since 2016-12-30 00:49:35.607509
...
2016-12-30 00:49:35.612789 7fe5d3471700 10 osd.2 pg_epoch: 14 pg[0.f( v 11'1324 (11'1300,11'1324] local-les=14 n=1324 ec=1 les/c/f 14/9/0 13/13/5) [0,1]/[2,3] r=0 lpr=13 pi=8-12/2 bft=0,1 crt=11'1324 lcod 11'1323 mlcod 0'0 active+remapped+backfill_wait] queue_recovery -- queuing
but there were no subsequent stat updates that reflect the backfill_wait state bit.
https://github.com/ceph/ceph/pull/12727
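
For anyone who wants to check other runs for the same pattern, here is a rough sketch of a log scan. It is only a heuristic and not part of Ceph: the function name and the regex are my own, it assumes the osd log line layout shown above (state string as the last token inside pg[...]), and it treats any publish_stats_to_osd line as evidence that the state printed on that line was reflected in the stats.

```python
import re

# Hypothetical helper, not a Ceph tool. Assumes the debug-osd line layout
# quoted above: "pg[<pgid>( ... <state>] <message>".
LINE_RE = re.compile(
    r"pg\[(?P<pgid>[0-9a-f]+\.[0-9a-f]+)\(.*\s(?P<state>[a-z_+]+)\]\s(?P<msg>.*)$"
)


def unpublished_state_bits(lines):
    """Return {pgid: [state bits]} for bits seen in the PG's latest logged
    state that were absent from the state on its most recent
    publish_stats_to_osd line."""
    latest = {}     # pgid -> state bits on the most recent line of any kind
    published = {}  # pgid -> state bits on the most recent publish_stats_to_osd line
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        pgid = m["pgid"]
        bits = set(m["state"].split("+"))
        latest[pgid] = bits
        if m["msg"].startswith("publish_stats_to_osd"):
            published[pgid] = bits
    result = {}
    for pgid, bits in latest.items():
        missing = bits - published.get(pgid, set())
        if missing:
            result[pgid] = sorted(missing)
    return result
```

Feeding it the two lines quoted above for pg 0.f should flag backfill_wait as a bit that appeared after the last stats publish. A PG with no publish_stats_to_osd line at all in the scanned window gets all of its bits flagged, so trim the input to the interesting time range first.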