Bug #15313: cluster stuck and thrashosd waiting for it to be clean
Status: Closed
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
http://167.114.252.97:8081/ubuntu-2016-03-29_19:53:23-rados-wip-15171---basic-openstack/
The test above succeeded only after logging in to one of the instances and setting the primary affinity of all OSDs to 1.
The logs from one of the targets are at http://teuthology-logs.public.ceph.com/logs-15313/; the rest were lost because the job ran with "archive-on-error" and succeeded.
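The manual workaround described above can be sketched as a loop over the OSDs. This is a dry run (it only echoes the commands); OSD ids 0..5 are taken from the `ceph osd tree` output below. Drop the `echo` to actually apply it.

```shell
# Reset primary affinity to 1 on every OSD (dry run: echoes the commands).
# The osd.0..osd.5 ids match the 6-OSD cluster shown in `ceph osd tree`.
for id in 0 1 2 3 4 5; do
  echo sudo ceph osd primary-affinity "osd.$id" 1
done
```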
ubuntu@target167114227065:~$ sudo ceph -s
    cluster 5d792e64-8050-4f6e-908c-8df3060d7e8d
     health HEALTH_WARN
            1 pgs backfill
            1 pgs backfilling
            3 pgs degraded
            1 pgs recovering
            2 pgs recovery_wait
            3 pgs stuck degraded
            5 pgs stuck unclean
            recovery 5859/23854 objects degraded (24.562%)
            recovery 3145/23854 objects misplaced (13.184%)
            pool rbd pg_num 64 > pgp_num 34
            mon.b has mon_osd_down_out_interval set to 0
     monmap e1: 3 mons at {a=167.114.227.65:6789/0,b=167.114.227.66:6789/0,c=167.114.227.65:6790/0}
            election epoch 4, quorum 0,1,2 a,b,c
     osdmap e102: 6 osds: 6 up, 6 in; 2 remapped pgs
      pgmap v1691: 72 pgs, 3 pools, 24548 MB data, 8609 objects
            44129 MB used, 136 GB / 179 GB avail
            5859/23854 objects degraded (24.562%)
            3145/23854 objects misplaced (13.184%)
                  67 active+clean
                   2 active+recovery_wait+degraded
                   1 active+recovering+degraded
                   1 active+remapped+backfilling
                   1 active+remapped+wait_backfill
recovery io 12294 kB/s, 4 objects/s

ubuntu@target167114227065:~$ sudo ceph osd tree
ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 6.00000 root default
-3 6.00000     rack localrack
-2 6.00000         host localhost
 0 1.00000             osd.0       up  1.00000          1.00000
 1 1.00000             osd.1       up  1.00000          0.21001
 2 1.00000             osd.2       up  1.00000                0
 3 1.00000             osd.3       up  1.00000          1.00000
 4 1.00000             osd.4       up  1.00000                0
 5 1.00000             osd.5       up  1.00000          1.00000
ubuntu@target167114227065:~$ sudo ceph -s
    cluster 5d792e64-8050-4f6e-908c-8df3060d7e8d
     health HEALTH_WARN
            1 pgs backfill
            2 pgs degraded
            2 pgs recovering
            1 pgs stuck degraded
            3 pgs stuck unclean
            recovery 2769/22869 objects degraded (12.108%)
            recovery 1970/22869 objects misplaced (8.614%)
            pool rbd pg_num 64 > pgp_num 34
            mon.c has mon_osd_down_out_interval set to 0
     monmap e1: 3 mons at {a=167.114.227.65:6789/0,b=167.114.227.66:6789/0,c=167.114.227.65:6790/0}
            election epoch 4, quorum 0,1,2 a,b,c
     osdmap e110: 6 osds: 6 up, 6 in; 1 remapped pgs
      pgmap v1858: 72 pgs, 3 pools, 24548 MB data, 8609 objects
            41379 MB used, 139 GB / 179 GB avail
            2769/22869 objects degraded (12.108%)
            1970/22869 objects misplaced (8.614%)
                  69 active+clean
                   2 active+recovering+degraded
                   1 active+remapped+wait_backfill
2016-03-29 21:57:58.630797 mon.0 [INF] pgmap v1869: 72 pgs: 1 active+remapped+wait_backfill, 69 active+clean, 2 active+recovering+degraded; 24548 MB data, 41734 MB used, 139 GB / 179 GB avail; 2549/22869 objects degraded (11.146%); 1970/22869 objects misplaced (8.614%)
2016-03-29 21:58:00.863474 mon.0 [INF] pgmap v1870: 72 pgs: 1 active+remapped+wait_backfill, 69 active+clean, 2 active+recovering+degraded; 24548 MB data, 41734 MB used, 139 GB / 179 GB avail; 2523/22869 objects degraded (11.032%); 1970/22869 objects misplaced (8.614%)
2016-03-29 21:57:54.640063 osd.5 [INF] 0.10 scrub starts
2016-03-29 21:57:54.706198 osd.5 [INF] 0.10 scrub ok
2016-03-29 21:58:02.339356 osd.3 [INF] 0.13 scrub starts
2016-03-29 21:58:02.354004 osd.3 [INF] 0.13 scrub ok
bjects/s recovering
2016-03-29 22:09:12.497228 osd.4 [INF] 0.2d scrub starts
2016-03-29 22:09:12.498801 osd.4 [INF] 0.2d scrub ok
2016-03-29 22:09:14.705764 mon.0 [INF] pgmap v2234: 72 pgs: 1 active+remapped+backfilling, 71 active+clean; 24548 MB data, 42626 MB used, 138 GB / 179 GB avail; 1150/22869 objects misplaced (5.029%)
2016-03-29 22:09:07.106207 osd.3 [INF] 2.0 scrub ok
ubuntu@target167114227065:~$ sudo ceph -w
    cluster 5d792e64-8050-4f6e-908c-8df3060d7e8d
     health HEALTH_WARN
            pool rbd pg_num 64 > pgp_num 34
            mon.b has mon_osd_down_out_interval set to 0
     monmap e1: 3 mons at {a=167.114.227.65:6789/0,b=167.114.227.66:6789/0,c=167.114.227.65:6790/0}
            election epoch 4, quorum 0,1,2 a,b,c
     osdmap e112: 6 osds: 6 up, 6 in
      pgmap v2281: 72 pgs, 3 pools, 24548 MB data, 8609 objects
            41910 MB used, 138 GB / 179 GB avail
                  71 active+clean
                   1 active+clean+scrubbing
recovery io 13157 kB/s, 4 objects/s
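For reference, the lingering "pool rbd pg_num 64 > pgp_num 34" warning in the status output clears once pgp_num is raised to match pg_num. A dry-run sketch (pool name and target value taken from the HEALTH_WARN above; drop the `echo` to actually apply it):

```shell
# Align pgp_num with pg_num for pool "rbd" (dry run: echoes the command).
echo sudo ceph osd pool set rbd pgp_num 64
```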