Bug #12716
Cluster health_warn stuck on active+remapped
Status: Can't reproduce
Priority: Normal
Assignee: -
Category: ceph cli
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Description
I ran a ceph osd reweight-by-utilization last week, and partway through it there was a network interruption. After the network was restored, the cluster continued to rebalance but eventually went idle with active+remapped and degraded PGs. I added 2 OSDs to the cluster and ran another reweight-by-utilization, and now the cluster is idle with 3 PGs active+remapped.
- ceph -s
cluster af859ff1-c394-4c9a-95e2-0e0e4c87445c
health HEALTH_WARN
3 pgs stuck unclean
recovery 24379/66089446 objects misplaced (0.037%)
monmap e24: 3 mons at {mon1=10.0.231.53:6789/0,mon2=10.0.231.54:6789/0,mon3=10.0.231.55:6789/0}
election epoch 268, quorum 0,1,2 mon1,mon2,mon3
osdmap e186553: 102 osds: 102 up, 102 in; 3 remapped pgs
pgmap v3178336: 4144 pgs, 7 pools, 125 TB data, 32270 kobjects
251 TB used, 118 TB / 370 TB avail
24379/66089446 objects misplaced (0.037%)
4141 active+clean
3 active+remapped
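The misplaced figure in this status output is a plain ratio of misplaced to tracked copies. The denominator (66089446) is roughly twice the 32270 kobjects on the pgmap line, so it appears to count object copies rather than objects. A minimal sketch that recomputes the percentage from the two numbers above, rounded to three decimals to match the display; the function name misplaced_pct is hypothetical:

```python
# Recompute the "objects misplaced" percentage from the `ceph -s` output above.
# The two inputs are copied verbatim from that output.
misplaced = 24379
total = 66089446


def misplaced_pct(misplaced: int, total: int) -> float:
    """Return misplaced/total as a percentage, rounded to 3 decimal places."""
    return round(100.0 * misplaced / total, 3)


pct = misplaced_pct(misplaced, total)
print(f"{misplaced}/{total} objects misplaced ({pct}%)")
```

The small value (0.037%) is consistent with only 3 of the 4144 PGs being affected.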
- ceph health detail
HEALTH_WARN 3 pgs stuck unclean; recovery 24379/66089446 objects misplaced (0.037%)
pg 2.e7f is stuck unclean for 517058.124297, current state active+remapped, last acting [58,5]
pg 2.b16 is stuck unclean for 434261.024579, current state active+remapped, last acting [40,90]
pg 2.782 is stuck unclean for 307997.053475, current state active+remapped, last acting [76,101]
recovery 24379/66089446 objects misplaced (0.037%)
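Each stuck line in this health detail output carries a PG id, a duration, a state and the last acting set. The sketch below pulls those fields out with a regular expression so the three PGs can be compared at a glance. It assumes the duration is in seconds, and it relies on the exact line wording shown above, which may differ between Ceph releases. StuckPG and parse_stuck are hypothetical names.

```python
import re
from typing import NamedTuple

# Matches lines such as:
# pg 2.e7f is stuck unclean for 517058.124297, current state active+remapped, last acting [58,5]
STUCK_RE = re.compile(
    r"pg (?P<pgid>\S+) is stuck unclean for (?P<secs>[\d.]+), "
    r"current state (?P<state>\S+), last acting \[(?P<acting>[\d,]*)\]"
)


class StuckPG(NamedTuple):
    pgid: str
    stuck_days: float  # assumes the reported duration is in seconds
    state: str
    acting: list[int]


def parse_stuck(lines):
    """Extract stuck-PG records from `ceph health detail` text; other lines are skipped."""
    out = []
    for line in lines:
        m = STUCK_RE.search(line)
        if not m:
            continue
        acting = [int(x) for x in m["acting"].split(",") if x]
        out.append(StuckPG(m["pgid"], float(m["secs"]) / 86400, m["state"], acting))
    return out


detail = """\
HEALTH_WARN 3 pgs stuck unclean; recovery 24379/66089446 objects misplaced (0.037%)
pg 2.e7f is stuck unclean for 517058.124297, current state active+remapped, last acting [58,5]
pg 2.b16 is stuck unclean for 434261.024579, current state active+remapped, last acting [40,90]
pg 2.782 is stuck unclean for 307997.053475, current state active+remapped, last acting [76,101]
recovery 24379/66089446 objects misplaced (0.037%)"""

for pg in parse_stuck(detail.splitlines()):
    print(f"{pg.pgid}: {pg.state} for {pg.stuck_days:.1f} days, acting {pg.acting}")
```

Under the seconds assumption, the three PGs had been stuck for roughly 6.0, 5.0 and 3.6 days when this was captured.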
- ceph pg 2.e7f query|head
{
"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 186553,
"up": [
58
],
"acting": [
58,
5
[root@ceph1 media]# ceph pg 2.b16 query|head
{
"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 186553,
"up": [
40
],
"acting": [
40,
90
[root@ceph1 media]# ceph pg 2.782 query|head
{
"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 186553,
"up": [
76
],
"acting": [
76,
101
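All three queries show the same pattern. The "up" set, which is what CRUSH currently computes, has one OSD. The "acting" set has two, presumably because a temporary mapping keeps the old replica serving. The sketch below makes that comparison explicit on the fields visible in the truncated output. The pool size of 2 is an assumption inferred from the two-member acting sets, and diagnose is a hypothetical helper, not a Ceph API.

```python
import json


def diagnose(query: dict, pool_size: int) -> str:
    """Classify a PG from `ceph pg <pgid> query` JSON by comparing its up and acting sets."""
    up, acting = query["up"], query["acting"]
    if len(up) < pool_size:
        # CRUSH returned fewer OSDs than the pool wants; any extra acting members
        # are presumably a temporary mapping that keeps pool_size copies online.
        extra = [osd for osd in acting if osd not in up]
        return f"CRUSH mapped only {len(up)}/{pool_size} OSDs; acting keeps extra {extra}"
    if up != acting:
        return "up and acting differ; backfill/remap in progress"
    return "up == acting; mapping is clean"


# Fields copied from the truncated query output above; pool_size=2 is an assumption.
samples = {
    "2.e7f": '{"state": "active+remapped", "up": [58], "acting": [58, 5]}',
    "2.b16": '{"state": "active+remapped", "up": [40], "acting": [40, 90]}',
    "2.782": '{"state": "active+remapped", "up": [76], "acting": [76, 101]}',
}
for pgid, text in samples.items():
    print(pgid, "->", diagnose(json.loads(text), pool_size=2))
```

If this reading is right, recovery has nothing left to move. The PGs stay remapped until CRUSH can compute a full-size up set for them.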
The full pg query output, the crushmap, and debug logs for the OSDs listed above are attached.
Updated by Sage Weil over 8 years ago
- Status changed from New to Need More Info
My guess is you need to set the vary_r tunable? Or, can you attach the osdmap so we can see why those pgs are only getting 2 replicas.
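The vary_r suggestion refers to the CRUSH tunable chooseleaf_vary_r. When it is 0 (the legacy behavior), chooseleaf can give up and return fewer OSDs than requested on some topologies, which would match the one-OSD up sets above. A minimal sketch of checking it in the JSON printed by ceph osd crush show-tunables follows. The sample input is a hypothetical excerpt, and treating a missing key as 0 is an assumption for older releases. Changing tunables on a live cluster triggers data movement.

```python
import json


def needs_vary_r(tunables: dict) -> bool:
    """True if chooseleaf_vary_r is 0, or absent (assumed to mean the legacy value 0)."""
    return tunables.get("chooseleaf_vary_r", 0) == 0


# Hypothetical excerpt of `ceph osd crush show-tunables` output; a real cluster prints more keys.
example = json.loads(
    '{"choose_total_tries": 50, "chooseleaf_descend_once": 1, "chooseleaf_vary_r": 0}'
)
print("chooseleaf_vary_r unset:", needs_vary_r(example))
```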
Updated by Sage Weil about 7 years ago
- Status changed from Need More Info to Can't reproduce