Bug #5884

negative num_objects_degraded in pool stats

Added by Noah Watkins almost 7 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

On `mira103` I'm seeing a negative value for `num_objects_degraded` for `pool 5`. I run `ceph pg dump pools` and see the following:

    { "poolid": 5,
      "stat_sum": { "num_bytes": 20053257,
          "num_objects": 136364,
          "num_object_clones": 0,
          "num_object_copies": 513847,
          "num_objects_missing_on_primary": 0,
          "num_objects_degraded": -192762,
          "num_objects_unfound": 0,
          "num_read": 12897042,
          "num_read_kb": 7765301,
          "num_write": 10580921,
          "num_write_kb": 2694858,
          "num_scrub_errors": 0,
          "num_shallow_scrub_errors": 0,
          "num_deep_scrub_errors": 0,
          "num_objects_recovered": 7351,
          "num_bytes_recovered": 1085823,
          "num_keys_recovered": 0},
      "stat_cat_sum": {},
      "log_size": 9987861,
      "ondisk_log_size": 9987861},
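A check like the following could flag this condition automatically. It is a minimal sketch, assuming the pool stats are available as a JSON list of objects shaped like the entry above; `negative_counters` is a hypothetical helper name, not part of any Ceph tool, and the sample keeps only a few of the counters from the dump.

```python
import json


def negative_counters(pool_stats):
    """Return {poolid: {counter: value}} for every negative stat_sum counter."""
    bad = {}
    for pool in pool_stats:
        negatives = {
            name: value
            for name, value in pool.get("stat_sum", {}).items()
            if isinstance(value, int) and value < 0
        }
        if negatives:
            bad[pool["poolid"]] = negatives
    return bad


# Subset of the pool 5 entry shown above, wrapped in a list (assumed shape).
sample = json.loads("""
[{"poolid": 5,
  "stat_sum": {"num_objects": 136364,
               "num_object_copies": 513847,
               "num_objects_degraded": -192762,
               "num_objects_unfound": 0}}]
""")

print(negative_counters(sample))
```

Counters such as these are object counts, so any negative value indicates an accounting error rather than a real cluster state.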

1.3_query.json (11.1 KB) John Spray, 03/28/2014 08:50 AM

1.5_query.json (11 KB) John Spray, 03/28/2014 08:50 AM

1.32_query.json (11 KB) John Spray, 03/28/2014 08:50 AM

1.33_query.json (11 KB) John Spray, 03/28/2014 08:50 AM

1.30_query.json (11 KB) John Spray, 03/28/2014 08:53 AM


Related issues

Related to Ceph - Bug #7737: osd: deletes vs backfill makes degraded go negative Resolved 03/15/2014

History

#1 Updated by John Spray over 6 years ago

Seen on a cluster that's been running for the past 2 weeks on the firefly branch.

Potentially noteworthy things that have happened to this cluster in its brief lifetime:

  • reaching full_ratio and then issuing some further writes from the MDS (while testing the fix for #7780).
  • running `while true; do ceph osd reweight 1 0.9; ceph osd reweight 1 1.0; done` for a number of hours.

# ceph --version
ceph version 0.78-393-gc5682e7 (c5682e78e9fb9d49c03ca983e0b4100696055eb7)

# ceph -w
    cluster 1de82a3e-7ac7-4ca2-a123-95c21f525bfb
     health HEALTH_WARN 127 pgs stuck unclean; recovery -10/627627 objects degraded (-0.002%); 3 near full osd(s)
     monmap e1: 3 mons at {gravel1=192.168.18.1:6789/0,gravel2=192.168.18.2:6789/0,gravel3=192.168.18.3:6789/0}, election epoch 140, quorum 0,1,2 gravel1,gravel2,gravel3
     mdsmap e92: 1/1/1 up {0=gravel1=up:active}
     osdmap e21001: 3 osds: 3 up, 3 in
      pgmap v48042: 1152 pgs, 3 pools, 796 GB data, 204 kobjects
            2393 GB used, 385 GB / 2778 GB avail
            -10/627627 objects degraded (-0.002%)
                 127 active+remapped
                1025 active+clean

2014-03-28 15:44:29.850076 mon.0 [INF] pgmap v48042: 1152 pgs: 127 active+remapped, 1025 active+clean; 796 GB data, 2393 GB used, 385 GB / 2778 GB avail; -10/627627 objects degraded (-0.002%)
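The percentage in the health line appears to be the degraded count divided by the total number of object copies, so a negative numerator produces a negative percentage. A minimal sketch of that arithmetic, assuming three-decimal formatting as seen in the output (`degraded_percent` is a hypothetical helper, not Ceph code):

```python
def degraded_percent(degraded, copies):
    """Format degraded/copies as a percentage with three decimals."""
    if copies == 0:
        # Avoid dividing by zero on an empty cluster.
        return "0.000"
    return "%.3f" % (100.0 * degraded / copies)


# Figures taken from the status output in this report.
print(degraded_percent(-10, 627627))  # -0.002
print(degraded_percent(-20, 15))      # -133.333
```

The second pair of figures comes from the 0.80.8 cluster described later in this report.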

Four PGs report a negative degraded count:

1.3    19    0    0    0    4194304    3002    3002    active+clean    2014-03-28 15:26:34.952046    20989'11015    21001:20812    [1,2,0]    1    [1,2,0]    1    20989'11015    2014-03-28 15:19:01.605414    20975'11014    2014-03-27 13:11:02.998676
1.32    20    0    -2    0    8388608    3001    3001    active+remapped    2014-03-28 15:26:25.015938    21000'13468    21001:46267    [0,2]    0    [0,2,1]    0    183'24    2014-03-25 21:27:15.307672    183'24    2014-03-25 21:27:15.307672
1.33    18    0    -2    0    0    3001    3001    active+remapped    2014-03-28 15:26:32.145577    21000'15726    21001:51508    [2,0]    2    [2,0,1]    2    165'1407    2014-03-20 19:06:16.965426    88'1    2014-03-17 16:40:00.949961
1.5    29    0    -4    0    0    3001    3001    active+remapped    2014-03-28 15:26:27.262403    21000'15679    21001:40825    [2,0]    2    [2,0,1]    2    154'1413    2014-03-20 19:03:44.868298    0'0    2014-03-17 16:39:09.928912

#2 Updated by John Spray over 6 years ago

Oops, replace 1.3 with 1.30 in the previous message:

1.30    27    0    -2    0    0    3001    3001    active+remapped    2014-03-28 15:26:32.123438    21000'11044    21002:34170    [2,0]    2    [2,0,1]    2    154'1413    2014-03-20 19:06:15.976037    88'2    2014-03-17 16:39:58.949699
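With the corrected row, the per-PG values add up to the cluster-wide figure. The sketch below sums the degraded column for the four PGs, assuming the fourth whitespace-separated field of each `ceph pg dump` row is the degraded count (the column order is an assumption about this release); rows are truncated to their first nine fields and `degraded_by_pg` is a hypothetical helper.

```python
def degraded_by_pg(text, degr_col=3):
    """Map PG id to the integer found in the assumed degraded column."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= degr_col:
            continue  # skip blank or short lines
        result[fields[0]] = int(fields[degr_col])
    return result


# First nine fields of the four affected rows quoted in this report.
rows = """\
1.32 20 0 -2 0 8388608 3001 3001 active+remapped
1.33 18 0 -2 0 0 3001 3001 active+remapped
1.5 29 0 -4 0 0 3001 3001 active+remapped
1.30 27 0 -2 0 0 3001 3001 active+remapped
"""

per_pg = degraded_by_pg(rows)
total = sum(per_pg.values())
print(per_pg, total)
```

The total of -10 matches the `-10/627627 objects degraded` line in the `ceph -w` output above.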

#3 Updated by Quentin M over 5 years ago

Hello there,

Same issue here on a small, (almost) empty 3-node cluster with a MON and an OSD on each node. One of the nodes is currently down.

# ceph --version
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)

# ceph -s
    cluster 8bd6398b-65a2-4254-bb00-1ff2468d2806
     health HEALTH_WARN 768 pgs degraded; 768 pgs stuck unclean; recovery -20/15 objects degraded (-133.333%); 1 mons down, quorum 0,1 ceph-1,ceph-2
     monmap e3: 3 mons at {ceph-1=192.168.1.1:6789/0,ceph-2=192.168.1.2:6789/0,ceph-3=192.168.1.3:6789/0}, election epoch 150, quorum 0,1 ceph-1,ceph-2
     osdmap e320: 3 osds: 2 up, 2 in
      pgmap v2375: 768 pgs, 6 pools, 32 bytes data, 5 objects
            10333 MB used, 340 GB / 350 GB avail
            -20/15 objects degraded (-133.333%)
                 768 active+degraded
# ceph osd tree

# id    weight    type name    up/down    reweight
-1    0.51    root default
-2    0.17        host ceph-1
1    0.17            osd.1    up    1    
-3    0.17        host ceph-2
2    0.17            osd.2    up    1    
-4    0.17        host ceph-3
4    0.17            osd.4    down    0
# ceph osd dump
epoch 320
fsid 8bd6398b-65a2-4254-bb00-1ff2468d2806
created 2015-01-26 21:25:52.864081
modified 2015-02-11 17:01:31.634283
flags 
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 79 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 77 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 81 flags hashpspool stripe_width 0
pool 3 'images' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 313 flags hashpspool stripe_width 0
    removed_snaps [1~3]
pool 4 'volumes' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 296 flags hashpspool stripe_width 0
pool 5 'vms' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 302 flags hashpspool stripe_width 0
max_osd 7
osd.1 up   in  weight 1 up_from 318 up_thru 318 down_at 310 last_clean_interval [293,303) 192.168.1.1:6800/25249 172.16.64.1:6800/25249 172.16.64.1:6801/25249 192.168.1.1:6801/25249 exists,up c24bdef1-d279-4176-966a-1931a51d5315
osd.2 up   in  weight 1 up_from 316 up_thru 318 down_at 315 last_clean_interval [308,313) 192.168.1.2:6800/2422 172.16.64.2:6800/2422 172.16.64.2:6801/2422 192.168.1.2:6801/2422 exists,up abe9ca5c-f060-4cd9-a05a-207ebd6ef2fc
osd.4 down out weight 0 up_from 305 up_thru 310 down_at 314 last_clean_interval [284,303) 192.168.1.3:6800/3662 172.16.64.3:6800/3662 172.16.64.3:6801/3662 192.168.1.3:6801/3662 autoout,exists 402cc9bc-fbfb-40bd-a51f-84626ff39f05
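The denominator of 15 in the status line is consistent with 5 objects at a replicated size of 3. A minimal sketch that reads the size from the pool lines of `ceph osd dump` and does the multiplication; the regular expression only targets the line format shown above, and `pool_sizes` is a hypothetical helper.

```python
import re

# Matches pool lines of the form shown in the dump above.
POOL_RE = re.compile(r"^pool (\d+) '([^']+)' replicated size (\d+) min_size (\d+)")


def pool_sizes(osd_dump_text):
    """Return {pool name: replicated size} parsed from osd dump text."""
    sizes = {}
    for line in osd_dump_text.splitlines():
        match = POOL_RE.match(line)
        if match:
            sizes[match.group(2)] = int(match.group(3))
    return sizes


# Two of the six pool lines from the dump above.
sample = """\
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 79 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 5 'vms' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 302 flags hashpspool stripe_width 0
"""

sizes = pool_sizes(sample)
num_objects = 5  # from the pgmap line: "32 bytes data, 5 objects"
# Every pool in the dump has size 3, so the split of objects across pools
# does not change the expected copy count.
expected_copies = num_objects * 3
print(sizes, expected_copies)
```

Against 15 expected copies, a degraded count of -20 cannot correspond to any real placement of objects.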

Have Fun.

#4 Updated by Sage Weil over 3 years ago

  • Status changed from New to Resolved
