Support #23050
PG doesn't move to down state in replica pool
Description
Hello,
Environment used: 3-node cluster
Replication: 3
# ceph osd pool ls detail
pool 16 'cdvr_ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 2119 flags hashpspool stripe_width 0 application freeform
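For reference, the same settings can also be read back key by key; a minimal sketch (the cdvr_ec pool name and the values shown come from the detail line above):
# ceph osd pool get cdvr_ec size
size: 3
# ceph osd pool get cdvr_ec min_size
min_size: 2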
Scenario :
1. For a specific PG (for example, 16.29e), I stopped all of its OSDs.
[root@pl12-cn1 ~]# ceph pg dump | grep 16.29e
dumped all
16.29e 0 0 0 0 0 0 0 0 active+clean 2018-02-20 10:06:54.392885 0'0 2241:89 [15,6,28] 15 [15,6,28] 15 0'0 2018-02-20 06:02:53.117922 0'0 2018-02-20 06:02:53.117922
[root@pl12-cn1 ~]#
  cluster:
    id:     c36fb424-038a-4c38-84a4-1469481ad5c8
    health: HEALTH_WARN
            3 osds down
            Reduced data availability: 4 pgs inactive
            Degraded data redundancy: 140 pgs unclean, 230 pgs degraded

  services:
    mon: 3 daemons, quorum pl12-cn1,pl12-cn2,pl12-cn3
    mgr: pl12-cn3(active), standbys: pl12-cn1, pl12-cn2
    osd: 36 osds: 33 up, 36 in

  data:
    pools:   1 pools, 1024 pgs
    objects: 0 objects, 0 bytes
    usage:   41063 MB used, 196 TB / 196 TB avail
    pgs:     1.465% pgs not active
             794 active+clean
             215 active+undersized+degraded
             13  undersized+degraded+peered
             2   stale+undersized+degraded+peered
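For anyone reproducing this, the acting set of the PG can be looked up and its OSDs stopped roughly as below (a sketch assuming a systemd-based deployment; the OSD ids 15, 6 and 28 come from the up/acting set shown above, and the map output line is illustrative):
# ceph pg map 16.29e
osdmap e2241 pg 16.29e (16.29e) -> up [15,6,28] acting [15,6,28]
# then, on the node hosting each of those OSDs:
# systemctl stop ceph-osd@15
# systemctl stop ceph-osd@6
# systemctl stop ceph-osd@28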
2. After stopping all 3 OSDs (replication 3), I can see that the respective PG is marked as stale; no PG is marked as down.
osd.28 was stopped last.
[root@pl12-cn1 ~]# ceph pg dump | grep 16.29e
dumped all
16.29e 0 0 0 0 0 0 0 0 stale+undersized+degraded+peered 2018-02-20 10:00:44.999756 0'0 2233:80 [28] 28 [28] 28 0'0 2018-02-20 06:02:53.117922 0'0 2018-02-20 06:02:53.117922
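As a side note, the stale PGs can also be listed cluster-wide; either of the following should show the same information (a sketch, output not captured from this cluster):
# ceph health detail | grep stale
# ceph pg dump_stuck stale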
3. I stopped more OSDs across all nodes and see the same behavior: PGs are marked stale, but not down.
  cluster:
    id:     c36fb424-038a-4c38-84a4-1469481ad5c8
    health: HEALTH_WARN
            18 osds down
            Reduced data availability: 431 pgs inactive, 72 pgs stale
            Degraded data redundancy: 861 pgs unclean, 889 pgs degraded, 708 pgs undersized

  services:
    mon: 3 daemons, quorum pl12-cn1,pl12-cn2,pl12-cn3
    mgr: pl12-cn3(active), standbys: pl12-cn1, pl12-cn2
    osd: 36 osds: 18 up, 36 in

  data:
    pools:   1 pools, 1024 pgs
    objects: 0 objects, 0 bytes
    usage:   40739 MB used, 196 TB / 196 TB avail
    pgs:     47.363% pgs not active
             404 active+undersized+degraded
             360 undersized+degraded+peered
             135 active+clean
             125 stale+undersized+degraded+peered
For a replicated pool, shouldn't we expect these PGs to be marked down?
Updated by Nokia ceph-users about 6 years ago
Please let me know if any additional logs or information are required.
Updated by Josh Durgin about 6 years ago
- Tracker changed from Bug to Support
- Project changed from Ceph to RADOS
- Status changed from New to Closed
'stale' means there haven't been any reports from the primary in a while. Since there is no OSD left up to report the status of these PGs, they stay stale.
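A quick way to see this in practice (a sketch, assuming a systemd-based deployment and osd.28 from the example above): once any OSD from the PG's previous acting set comes back, the primary starts reporting again and the PG leaves the stale state.
# systemctl start ceph-osd@28
# ceph pg dump_stuck stale | grep 16.29e    <- should no longer list the PG once osd.28 has reported in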