Bug #5884
negative num_objects_degraded in pool stats
Status: Resolved
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
On `mira103` I'm seeing a negative value for `num_objects_degraded` for pool 5. Running `ceph pg dump pools` shows the following:
{ "poolid": 5,
  "stat_sum": { "num_bytes": 20053257,
      "num_objects": 136364,
      "num_object_clones": 0,
      "num_object_copies": 513847,
      "num_objects_missing_on_primary": 0,
      "num_objects_degraded": -192762,
      "num_objects_unfound": 0,
      "num_read": 12897042,
      "num_read_kb": 7765301,
      "num_write": 10580921,
      "num_write_kb": 2694858,
      "num_scrub_errors": 0,
      "num_shallow_scrub_errors": 0,
      "num_deep_scrub_errors": 0,
      "num_objects_recovered": 7351,
      "num_bytes_recovered": 1085823,
      "num_keys_recovered": 0},
  "stat_cat_sum": {},
  "log_size": 9987861,
  "ondisk_log_size": 9987861},
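For context on why this stat can go negative at all: the degraded count is, roughly, the expected number of object copies minus the copies actually accounted for. The sketch below is an assumption about the shape of that arithmetic (it is not the actual OSD code); the field names and values come from the pool 5 dump above, and the over-counted copy total is hypothetical.

```python
def degraded(expected_copies: int, copies_found: int) -> int:
    # Degraded objects ~= expected copies minus copies actually present.
    # Illustration only: any over-counting of present copies pushes the
    # result below zero, which is what a negative stat suggests happened.
    return expected_copies - copies_found

# num_object_copies from the pool 5 dump above is 513847; a copy count
# over-reported by 192762 would reproduce the negative stat.
print(degraded(513847, 513847 + 192762))  # -192762
```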
#1
Updated by John Spray about 10 years ago
- File 1.3_query.json 1.3_query.json added
- File 1.5_query.json 1.5_query.json added
- File 1.32_query.json 1.32_query.json added
- File 1.33_query.json 1.33_query.json added
Seen on a cluster that's been running for the past 2 weeks on the firefly branch.
Potentially noteworthy things that have happened to this cluster in its brief lifetime:
- reaching full_ratio and then doing some writes beyond it from the MDS (while testing the fix for #7780).
- running
while(true) ; do ceph osd reweight 1 0.9 ; ceph osd reweight 1 1.0 ; done
for a number of hours.
# ceph --version
ceph version 0.78-393-gc5682e7 (c5682e78e9fb9d49c03ca983e0b4100696055eb7)
# ceph -w
    cluster 1de82a3e-7ac7-4ca2-a123-95c21f525bfb
     health HEALTH_WARN 127 pgs stuck unclean; recovery -10/627627 objects degraded (-0.002%); 3 near full osd(s)
     monmap e1: 3 mons at {gravel1=192.168.18.1:6789/0,gravel2=192.168.18.2:6789/0,gravel3=192.168.18.3:6789/0}, election epoch 140, quorum 0,1,2 gravel1,gravel2,gravel3
     mdsmap e92: 1/1/1 up {0=gravel1=up:active}
     osdmap e21001: 3 osds: 3 up, 3 in
      pgmap v48042: 1152 pgs, 3 pools, 796 GB data, 204 kobjects
            2393 GB used, 385 GB / 2778 GB avail
            -10/627627 objects degraded (-0.002%)
                 127 active+remapped
                1025 active+clean

2014-03-28 15:44:29.850076 mon.0 [INF] pgmap v48042: 1152 pgs: 127 active+remapped, 1025 active+clean; 796 GB data, 2393 GB used, 385 GB / 2778 GB avail; -10/627627 objects degraded (-0.002%)
Four PGs report negative degraded counts:
1.3  19 0  0 0 4194304 3002 3002 active+clean    2014-03-28 15:26:34.952046 20989'11015 21001:20812 [1,2,0] 1 [1,2,0] 1 20989'11015 2014-03-28 15:19:01.605414 20975'11014 2014-03-27 13:11:02.998676
1.32 20 0 -2 0 8388608 3001 3001 active+remapped 2014-03-28 15:26:25.015938 21000'13468 21001:46267 [0,2]   0 [0,2,1] 0 183'24      2014-03-25 21:27:15.307672 183'24      2014-03-25 21:27:15.307672
1.33 18 0 -2 0 0       3001 3001 active+remapped 2014-03-28 15:26:32.145577 21000'15726 21001:51508 [2,0]   2 [2,0,1] 2 165'1407    2014-03-20 19:06:16.965426 88'1        2014-03-17 16:40:00.949961
1.5  29 0 -4 0 0       3001 3001 active+remapped 2014-03-28 15:26:27.262403 21000'15679 21001:40825 [2,0]   2 [2,0,1] 2 154'1413    2014-03-20 19:03:44.868298 0'0         2014-03-17 16:39:09.928912
#2
Updated by John Spray about 10 years ago
- File 1.30_query.json 1.30_query.json added
Oops, replace 1.3 with 1.30 in previous message.
1.30 27 0 -2 0 0 3001 3001 active+remapped 2014-03-28 15:26:32.123438 21000'11044 21002:34170 [2,0] 2 [2,0,1] 2 154'1413 2014-03-20 19:06:15.976037 88'2 2014-03-17 16:39:58.949699
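The per-PG degraded counts above account exactly for the cluster-wide figure in the `ceph -w` output. A quick check, with the values transcribed from the pg dump rows (1.32, 1.33, 1.5, and the corrected 1.30):

```python
# Degraded column transcribed from the pg dump rows quoted above.
per_pg_degraded = {"1.32": -2, "1.33": -2, "1.5": -4, "1.30": -2}

total = sum(per_pg_degraded.values())
# Matches "recovery -10/627627 objects degraded" in the ceph -w output.
print(total)  # -10
```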
#3
Updated by Quentin M about 9 years ago
Hello there,
Same issue here on a small and (almost) empty 3-node cluster with a MON+OSD on each node. One of the nodes is currently down.
# ceph --version
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
# ceph -s
    cluster 8bd6398b-65a2-4254-bb00-1ff2468d2806
     health HEALTH_WARN 768 pgs degraded; 768 pgs stuck unclean; recovery -20/15 objects degraded (-133.333%); 1 mons down, quorum 0,1 ceph-1,ceph-2
     monmap e3: 3 mons at {ceph-1=192.168.1.1:6789/0,ceph-2=192.168.1.2:6789/0,ceph-3=192.168.1.3:6789/0}, election epoch 150, quorum 0,1 ceph-1,ceph-2
     osdmap e320: 3 osds: 2 up, 2 in
      pgmap v2375: 768 pgs, 6 pools, 32 bytes data, 5 objects
            10333 MB used, 340 GB / 350 GB avail
            -20/15 objects degraded (-133.333%)
                 768 active+degraded
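The startling -133.333% is just the degraded ratio shown in the health line: degraded objects over total expected copies, times 100. With 5 objects in size-3 pools that is 15 expected copies, so a degraded count of -20 yields:

```python
# Percentage shown in the health line: degraded / expected copies * 100.
degraded_objs, expected_copies = -20, 15
pct = degraded_objs / expected_copies * 100
print(f"{pct:.3f}%")  # -133.333%
```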
# ceph osd tree
# id    weight  type name       up/down reweight
-1      0.51    root default
-2      0.17            host ceph-1
1       0.17                    osd.1   up      1
-3      0.17            host ceph-2
2       0.17                    osd.2   up      1
-4      0.17            host ceph-3
4       0.17                    osd.4   down    0
# ceph osd dump
epoch 320
fsid 8bd6398b-65a2-4254-bb00-1ff2468d2806
created 2015-01-26 21:25:52.864081
modified 2015-02-11 17:01:31.634283
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 79 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 77 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 81 flags hashpspool stripe_width 0
pool 3 'images' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 313 flags hashpspool stripe_width 0 removed_snaps [1~3]
pool 4 'volumes' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 296 flags hashpspool stripe_width 0
pool 5 'vms' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 302 flags hashpspool stripe_width 0
max_osd 7
osd.1 up   in  weight 1 up_from 318 up_thru 318 down_at 310 last_clean_interval [293,303) 192.168.1.1:6800/25249 172.16.64.1:6800/25249 172.16.64.1:6801/25249 192.168.1.1:6801/25249 exists,up c24bdef1-d279-4176-966a-1931a51d5315
osd.2 up   in  weight 1 up_from 316 up_thru 318 down_at 315 last_clean_interval [308,313) 192.168.1.2:6800/2422 172.16.64.2:6800/2422 172.16.64.2:6801/2422 192.168.1.2:6801/2422 exists,up abe9ca5c-f060-4cd9-a05a-207ebd6ef2fc
osd.4 down out weight 0 up_from 305 up_thru 310 down_at 314 last_clean_interval [284,303) 192.168.1.3:6800/3662 172.16.64.3:6800/3662 172.16.64.3:6801/3662 192.168.1.3:6801/3662 autoout,exists 402cc9bc-fbfb-40bd-a51f-84626ff39f05
Have Fun.