Bug #7014 (closed): rados: stuck degraded, possibly related to acting_backfill changes

Added by Samuel Just over 10 years ago. Updated over 10 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor

Description

End of ceph.log:
2013-12-15 23:51:19.781079 mon.0 10.214.131.3:6789/0 2397 : [INF] pgmap v1364: 213 pgs: 212 active+clean, 1 active+degraded; 2658 MB data, 729 MB used, 928 GB / 931 GB avail
2013-12-15 23:51:19.944286 mon.0 10.214.131.3:6789/0 2398 : [INF] osdmap e950: 6 osds: 5 up, 2 in
2013-12-15 23:51:20.061608 mon.0 10.214.131.3:6789/0 2399 : [INF] pgmap v1365: 213 pgs: 212 active+clean, 1 active+degraded; 2658 MB data, 729 MB used, 928 GB / 931 GB avail
2013-12-15 23:51:21.153941 mon.0 10.214.131.3:6789/0 2400 : [INF] osdmap e951: 6 osds: 5 up, 2 in
2013-12-15 23:51:21.305460 mon.0 10.214.131.3:6789/0 2401 : [INF] pgmap v1366: 202 pgs: 201 active+clean, 1 active+degraded; 23058 bytes data, 729 MB used, 928 GB / 931 GB avail
2013-12-15 23:51:22.384573 mon.0 10.214.131.3:6789/0 2402 : [INF] pgmap v1367: 202 pgs: 201 active+clean, 1 active+degraded; 23058 bytes data, 736 MB used, 928 GB / 931 GB avail
...
2013-12-16 00:09:59.981507 mon.0 10.214.131.3:6789/0 2434 : [INF] pgmap v1399: 202 pgs: 201 active+clean, 1 active+degraded; 23058 bytes data, 693 MB used, 928 GB / 931 GB avail
2013-12-16 00:10:23.917536 mon.0 10.214.131.3:6789/0 2435 : [INF] pgmap v1400: 202 pgs: 201 active+clean, 1 active+degraded; 23058 bytes data, 693 MB used, 928 GB / 931 GB avail

5/6 osds were up. The down osd appears to have been taken down by the min_size test rather than by a crash. I suggest grabbing the latest osdmap from the mon store to determine how the pgs were mapped to start with. I suspect a pg_temp mapping was lingering for one of the pgs.

ubuntu@teuthology:/a/teuthology-2013-12-15_23:00:15-rados-master-testing-basic-plana/4634/remote
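
A rough sketch of commands that could answer the mapping question on a live cluster (the pgid below is a placeholder; for this archived run the osdmap would instead have to be extracted from the mon store under the run directory above):

# Grab the current osdmap and see where the stuck PG maps:
ceph osd getmap -o /tmp/osdmap
osdmaptool /tmp/osdmap --test-map-pg 0.7    # 0.7 is a placeholder pgid

# Lingering pg_temp mappings show up in the osdmap dump:
ceph osd dump | grep pg_temp

# The PG's own view of its up/acting sets:
ceph pg 0.7 query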

#1

Updated by Samuel Just over 10 years ago

Another option would be to reproduce with logging. If you catch it before it gets cleaned up, it should be pretty obvious what's going on.
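
If reproducing, turning up OSD debug logging along these lines (a sketch of the usual levels; adjust the OSD ids to the cluster) should make the peering decisions visible:

# Inject debug levels into a running OSD (repeat per OSD):
ceph tell osd.0 injectargs '--debug-osd 20 --debug-ms 1'

# Or set them in ceph.conf under [osd] before the run:
#   debug osd = 20
#   debug ms = 1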

#2

Updated by David Zafman over 10 years ago

  • Status changed from New to Can't reproduce

This might have been fixed by the fix for #6905, which increases the timeout in suites/rados/thrash/thrashers/mapgap.yaml.
