To add more context to the problem:
min_size was set to 1 and the replication size is 2.
There was a flaky power connection to one of the enclosures. With min_size 1, we were able to continue IO, and recovery became active once the power came back. But if there is another power failure while recovery is in progress, some of the PGs go into the down+peering state.
Extract from pg query:
$ ceph pg 1.143 query
{ "state": "down+peering",
"snap_trimq": "[]",
"epoch": 3918,
"up": [
17],
"acting": [
17],
"info": { "pgid": "1.143",
"last_update": "3166'40424",
"last_complete": "3166'40424",
"log_tail": "2577'36847",
"last_user_version": 40424,
"last_backfill": "MAX",
"purged_snaps": "[]",
...... "recovery_state": [
{ "name": "Started\/Primary\/Peering\/GetInfo",
"enter_time": "2015-07-15 12:48:51.372676",
"requested_info_from": []},
{ "name": "Started\/Primary\/Peering",
"enter_time": "2015-07-15 12:48:51.372675",
"past_intervals": [
{ "first": 3147,
"last": 3166,
"maybe_went_rw": 1,
"up": [
17,
4],
"acting": [
17,
4],
"primary": 17,
"up_primary": 17},
{ "first": 3167,
"last": 3167,
"maybe_went_rw": 0,
"up": [
10,
20],
"acting": [
10,
20],
"primary": 10,
"up_primary": 10},
{ "first": 3168,
"last": 3181,
"maybe_went_rw": 1,
"up": [
10,
20],
"acting": [
10,
4],
"primary": 10,
"up_primary": 10},
{ "first": 3182,
"last": 3184,
"maybe_went_rw": 0,
"up": [
20],
"acting": [
4],
"primary": 4,
"up_primary": 20},
{ "first": 3185,
"last": 3188,
"maybe_went_rw": 1,
"up": [
20],
"acting": [
20],
"primary": 20,
"up_primary": 20}],
"probing_osds": [
"17",
"20"],
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
4,
10],
"peering_blocked_by": [
{ "osd": 4,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may let us proceed"},
{ "osd": 10,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may let us proceed"}]},
{ "name": "Started",
"enter_time": "2015-07-15 12:48:51.372671"}],
"agent_state": {}}
The PGs do not return to active+clean until power is restored again. During this period no IO is allowed to the cluster. I am not able to follow why the PGs end up in the peering state. Each PG has two copies, one in each enclosure. If one enclosure is down for some time, we should be able to serve IO from the second one. That holds as long as no recovery IO is involved. When recovery is in progress, some PGs end up in the down+peering state.
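
One possible way to read the query output is sketched below in Python. It is a simplified model, not Ceph's actual peering code: it assumes that peering needs at least one reachable OSD from the acting set of every past interval with maybe_went_rw set to 1, and that the OSDs listed under probing_osds (17 and 20) are the only reachable ones. The interval values are copied by hand from the extract; the function name blocking_intervals is made up for illustration.

```python
# Values copied by hand from the `ceph pg 1.143 query` extract.
# Each tuple is (first_epoch, last_epoch, maybe_went_rw, acting_set).
PAST_INTERVALS = [
    (3147, 3166, 1, [17, 4]),
    (3167, 3167, 0, [10, 20]),
    (3168, 3181, 1, [10, 4]),
    (3182, 3184, 0, [4]),
    (3185, 3188, 1, [20]),
]

# Assumption: the OSDs under "probing_osds" are the reachable ones.
PROBING_OSDS = {17, 20}

# OSD ids listed under "peering_blocked_by" in the extract.
REPORTED_BLOCKERS = {4, 10}


def blocking_intervals(intervals, up_osds):
    """Return intervals that may have accepted writes but have no
    reachable member of their acting set (simplified model)."""
    blocked = []
    for first, last, maybe_went_rw, acting in intervals:
        if maybe_went_rw and not any(osd in up_osds for osd in acting):
            blocked.append((first, last, acting))
    return blocked


blocked = blocking_intervals(PAST_INTERVALS, PROBING_OSDS)
for first, last, acting in blocked:
    print(f"epochs {first}-{last}: acting {acting} has no reachable OSD")
```

Under these assumptions the only interval flagged is 3168-3181, whose acting set was [10, 4]. Those are the same two OSDs reported in peering_blocked_by, which would be consistent with writes having been accepted in that interval by OSDs that are now both down.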