unrecoverable 0.*p* PGs
Several times, with 0.25.1, presumably because of random btrfs hangs (now strongly linked with an active local ceph mount), I've had these p PGs go nuts: they become unrecoverable.
The scenario involved 3-replica data (plus 4-replica metadata, and unused 4-replica casdata and rbd), with about 1TB being actively uploaded to 8 OSDs: two large ones, each on its own host, and 6 much smaller ones on a third host. The 6 small OSDs were divided into 2 groups of 3 for the 4-replica pools, and regarded as a single group for the 3-replica pool. How I got into this state might be relevant, but I can't tell, and I no longer have those logs. What is relevant is that not even 0.26, with its new recovery algorithm, managed to bring these PGs back to an active+clean state.
One of them remained in peering, some in active+degraded, some in active+degraded+peering, and others in crashed+down+something+else. Although everything else recovered fine after disk failures, these didn't. I tried bringing disks down and out one by one, or entire groups at a time, to no avail. The best I could get was some movement between the degraded/down categories, but after I brought all disks back up, the PGs returned to this state.
It was particularly odd that some of the PGs listed only one OSD (say ) and stayed that way, even after I changed the replication count for data down to 1 and then back up. Others of these PGs listed OSDs that were out as active (say [5,0,1] when 5 was out). It was also odd that only the data (0.) PGs ran into this.
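The second oddity above (an OSD that is out still showing up in a PG's listed set) can be stated as a simple consistency check. What follows is only a minimal sketch in Python, not Ceph code: the function name, the PG ids, and the dictionary input are all hypothetical, and it assumes the listed OSD sets and the set of out OSDs have already been collected by hand.

```python
def osds_out_but_listed(listed_sets, out_osds):
    """Return {pg_id: [osd, ...]} for PGs whose listed set names an OSD marked out.

    listed_sets: dict mapping a PG id (hypothetical strings here) to the list of
                 OSD numbers reported for that PG.
    out_osds:    iterable of OSD numbers currently marked out.
    """
    out = set(out_osds)
    suspicious = {}
    for pg_id, osds in listed_sets.items():
        stale = [osd for osd in osds if osd in out]
        if stale:
            suspicious[pg_id] = stale
    return suspicious


# Hypothetical PG ids; [5, 0, 1] with OSD 5 out is the example from this report.
listed_sets = {
    "pg-a": [5, 0, 1],
    "pg-b": [0, 1, 2],
}
print(osds_out_but_listed(listed_sets, out_osds=[5]))
```

Any PG this reports is in the inconsistent condition described above, which is the kind of state recovery would need to detect and repair.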
I ended up re-creating the filesystem, because I couldn't figure out how to recover from this. I hope the logs are not important for figuring this out, because I no longer have them. Anyhow, the focus of this report is not so much on how we got into such an odd state, but rather on having recovery code to get out of it.