Bug #7014
Status: Closed
rados: stuck degraded, possibly related to acting_backfill changes
Description
End of ceph.log:
2013-12-15 23:51:19.781079 mon.0 10.214.131.3:6789/0 2397 : [INF] pgmap v1364: 213 pgs: 212 active+clean, 1 active+degraded; 2658 MB data, 729 MB used, 928 GB / 931 GB avail
2013-12-15 23:51:19.944286 mon.0 10.214.131.3:6789/0 2398 : [INF] osdmap e950: 6 osds: 5 up, 2 in
2013-12-15 23:51:20.061608 mon.0 10.214.131.3:6789/0 2399 : [INF] pgmap v1365: 213 pgs: 212 active+clean, 1 active+degraded; 2658 MB data, 729 MB used, 928 GB / 931 GB avail
2013-12-15 23:51:21.153941 mon.0 10.214.131.3:6789/0 2400 : [INF] osdmap e951: 6 osds: 5 up, 2 in
2013-12-15 23:51:21.305460 mon.0 10.214.131.3:6789/0 2401 : [INF] pgmap v1366: 202 pgs: 201 active+clean, 1 active+degraded; 23058 bytes data, 729 MB used, 928 GB / 931 GB avail
2013-12-15 23:51:22.384573 mon.0 10.214.131.3:6789/0 2402 : [INF] pgmap v1367: 202 pgs: 201 active+clean, 1 active+degraded; 23058 bytes data, 736 MB used, 928 GB / 931 GB avail
...
2013-12-16 00:09:59.981507 mon.0 10.214.131.3:6789/0 2434 : [INF] pgmap v1399: 202 pgs: 201 active+clean, 1 active+degraded; 23058 bytes data, 693 MB used, 928 GB / 931 GB avail
2013-12-16 00:09:59.981507 mon.0 10.214.131.3:6789/0 2434 : [INF] pgmap v1399: 202 pgs: 201 active+clean, 1 active+degraded; 23058 bytes data, 693 MB used, 928 GB / 931 GB avail
2013-12-16 00:10:23.917536 mon.0 10.214.131.3:6789/0 2435 : [INF] pgmap v1400: 202 pgs: 201 active+clean, 1 active+degraded; 23058 bytes data, 693 MB used, 928 GB / 931 GB avail
5/6 osds were up. The down osd appears to have been taken down by the min_size test, not by a crash. I suggest grabbing the latest osdmap from the monstore to determine how the pgs were mapped to start with. I suspect a temp pg mapping was lingering for one of the pgs.
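For anyone checking this on a live cluster, the mapping (including any lingering pg_temp entries) can be inspected roughly like this (a sketch; assumes ceph CLI access, and /tmp/osdmap is just an illustrative path):

```shell
# Fetch the latest osdmap from the monitors and print it
ceph osd getmap -o /tmp/osdmap
osdmaptool --print /tmp/osdmap

# pg_temp mappings also appear in the osd dump
ceph osd dump | grep pg_temp

# Check the up/acting sets for the stuck pg
ceph pg dump | grep degraded
```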
ubuntu@teuthology:/a/teuthology-2013-12-15_23:00:15-rados-master-testing-basic-plana/4634/remote
Updated by Samuel Just over 10 years ago
Another option would be to reproduce with logging. If you catch it before it gets cleaned up, it should be pretty obvious what's going on.
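For a repro attempt with logging, something like the usual osd debug settings in ceph.conf should be enough to see the peering decisions (a sketch; the exact levels are a judgment call):

```ini
[osd]
    debug osd = 20
    debug ms = 1
    debug filestore = 20
```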
Updated by David Zafman over 10 years ago
- Status changed from New to Can't reproduce
This might have been fixed by the fix for #6905, which increases the timeout in suites/rados/thrash/thrashers/mapgap.yaml.