Bug #5884: negative num_objects_degraded in pool stats - Ceph - Ceph

closed

Added by Noah Watkins over 10 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Category: OSD
Source: other
Severity: 3 - minor

Description

On `mira103` I'm seeing a negative value for `num_objects_degraded` for `pool 5`. I run `ceph pg dump pools` and see the following:

    { "poolid": 5,
      "stat_sum": { "num_bytes": 20053257,
          "num_objects": 136364,
          "num_object_clones": 0,
          "num_object_copies": 513847,
          "num_objects_missing_on_primary": 0,
          "num_objects_degraded": -192762,
          "num_objects_unfound": 0,
          "num_read": 12897042,
          "num_read_kb": 7765301,
          "num_write": 10580921,
          "num_write_kb": 2694858,
          "num_scrub_errors": 0,
          "num_shallow_scrub_errors": 0,
          "num_deep_scrub_errors": 0,
          "num_objects_recovered": 7351,
          "num_bytes_recovered": 1085823,
          "num_keys_recovered": 0},
      "stat_cat_sum": {},
      "log_size": 9987861,
      "ondisk_log_size": 9987861},

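A quick way to spot this kind of problem is to scan a pool's `stat_sum` for counters that can never legitimately be negative. The following is a minimal sketch, not part of Ceph: the helper name `negative_counters` is hypothetical, and the input is a trimmed copy of the pool 5 output above rather than a live query.

```python
import json

# Trimmed copy of the pool 5 entry from the `ceph pg dump pools` output above.
POOL_STATS_JSON = """
{ "poolid": 5,
  "stat_sum": { "num_bytes": 20053257,
      "num_objects": 136364,
      "num_object_copies": 513847,
      "num_objects_missing_on_primary": 0,
      "num_objects_degraded": -192762,
      "num_objects_unfound": 0 } }
"""


def negative_counters(pool):
    """Return {counter_name: value} for every negative integer in stat_sum.

    Hypothetical helper for illustration; these counters are object and
    byte counts, so any negative value indicates a bookkeeping error.
    """
    return {
        name: value
        for name, value in pool.get("stat_sum", {}).items()
        if isinstance(value, int) and value < 0
    }


pool = json.loads(POOL_STATS_JSON)
bad = negative_counters(pool)
for name, value in bad.items():
    print(f"pool {pool['poolid']}: {name} = {value}")
```

For the sample above this flags only `num_objects_degraded`.
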
Files

1.3_query.json (11.1 KB), John Spray, 03/28/2014 08:50 AM
1.5_query.json (11 KB), John Spray, 03/28/2014 08:50 AM
1.32_query.json (11 KB), John Spray, 03/28/2014 08:50 AM
1.33_query.json (11 KB), John Spray, 03/28/2014 08:50 AM
1.30_query.json (11 KB), John Spray, 03/28/2014 08:53 AM

Related issues 1 (0 open — 1 closed)

Updated by John Spray about 10 years ago

File 1.3_query.json added
File 1.5_query.json added
File 1.32_query.json added
File 1.33_query.json added

Seen on a cluster that's been running for the past 2 weeks on the firefly branch.

Potentially noteworthy things that have happened to this cluster in its brief lifetime:

- Reaching `full_ratio` and doing some writes beyond it from the MDS (while testing the fix for #7780).
- Running `while(true) ; do ceph osd reweight 1 0.9 ; ceph osd reweight 1 1.0 ; done` for a number of hours.

    # ceph --version
    ceph version 0.78-393-gc5682e7 (c5682e78e9fb9d49c03ca983e0b4100696055eb7)

    # ceph -w
        cluster 1de82a3e-7ac7-4ca2-a123-95c21f525bfb
         health HEALTH_WARN 127 pgs stuck unclean; recovery -10/627627 objects degraded (-0.002%); 3 near full osd(s)
         monmap e1: 3 mons at {gravel1=192.168.18.1:6789/0,gravel2=192.168.18.2:6789/0,gravel3=192.168.18.3:6789/0}, election epoch 140, quorum 0,1,2 gravel1,gravel2,gravel3
         mdsmap e92: 1/1/1 up {0=gravel1=up:active}
         osdmap e21001: 3 osds: 3 up, 3 in
          pgmap v48042: 1152 pgs, 3 pools, 796 GB data, 204 kobjects
                2393 GB used, 385 GB / 2778 GB avail
                -10/627627 objects degraded (-0.002%)
                     127 active+remapped
                    1025 active+clean

    2014-03-28 15:44:29.850076 mon.0 [INF] pgmap v48042: 1152 pgs: 127 active+remapped, 1025 active+clean; 796 GB data, 2393 GB used, 385 GB / 2778 GB avail; -10/627627 objects degraded (-0.002%)
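The negative percentage in the status line follows directly from the negative numerator. As a rough illustration (a sketch of the arithmetic only, not Ceph's actual formatting code; `degraded_percent` is a hypothetical name), dividing the reported degraded count by the reported number of object copies reproduces the figure shown:

```python
def degraded_percent(degraded, copies):
    """Degraded objects as a percentage of all object copies."""
    return 100.0 * degraded / copies


# Values copied from the `ceph -w` output above.
degraded, copies = -10, 627627

pct = degraded_percent(degraded, copies)
line = f"{degraded}/{copies} objects degraded ({pct:.3f}%)"
print(line)

# A sanity check a monitoring script could apply: a degraded count
# should never fall outside the range [0, copies].
is_valid = 0 <= degraded <= copies
print("valid" if is_valid else "out of range")
```
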

Four PGs report a negative degraded count (the fourth column below):

1.3    19    0    0    0    4194304    3002    3002    active+clean    2014-03-28 15:26:34.952046    20989'11015    21001:20812    [1,2,0]    1    [1,2,0]    1    20989'11015    2014-03-28 15:19:01.605414    20975'11014    2014-03-27 13:11:02.998676
1.32    20    0    -2    0    8388608    3001    3001    active+remapped    2014-03-28 15:26:25.015938    21000'13468    21001:46267    [0,2]    0    [0,2,1]    0    183'24    2014-03-25 21:27:15.307672    183'24    2014-03-25 21:27:15.307672
1.33    18    0    -2    0    0    3001    3001    active+remapped    2014-03-28 15:26:32.145577    21000'15726    21001:51508    [2,0]    2    [2,0,1]    2    165'1407    2014-03-20 19:06:16.965426    88'1    2014-03-17 16:40:00.949961
1.5    29    0    -4    0    0    3001    3001    active+remapped    2014-03-28 15:26:27.262403    21000'15679    21001:40825    [2,0]    2    [2,0,1]    2    154'1413    2014-03-20 19:03:44.868298    0'0    2014-03-17 16:39:09.928912

Updated by John Spray about 10 years ago

File 1.30_query.json added

Oops, replace 1.3 with 1.30 in previous message.

1.30    27    0    -2    0    0    3001    3001    active+remapped    2014-03-28 15:26:32.123438    21000'11044    21002:34170    [2,0]    2    [2,0,1]    2    154'1413    2014-03-20 19:06:15.976037    88'2    2014-03-17 16:39:58.949699
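With the corrected row, the per-PG values add up to the cluster-wide figure. The sketch below sums the fourth column of the four negative rows (1.30, 1.32, 1.33 and 1.5); the rows are truncated to their leading columns, and treating column index 3 as the degraded count is an assumption based on the values shown in this report.

```python
# Leading columns of the four PG rows quoted in this thread
# (1.3 replaced by 1.30, as corrected above).
ROWS = """\
1.30 27 0 -2 0 0 3001 3001 active+remapped
1.32 20 0 -2 0 8388608 3001 3001 active+remapped
1.33 18 0 -2 0 0 3001 3001 active+remapped
1.5 29 0 -4 0 0 3001 3001 active+remapped
"""

# Assumption: the fourth whitespace-separated field is the degraded count.
DEGRADED_COLUMN = 3

per_pg = {}
for row in ROWS.splitlines():
    fields = row.split()
    per_pg[fields[0]] = int(fields[DEGRADED_COLUMN])

total = sum(per_pg.values())
print(per_pg)
print("total degraded:", total)
```

The total of -10 matches the `-10/627627 objects degraded` reported by `ceph -w`.
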

Updated by Quentin M about 9 years ago

Hello there,

Same issue here on a small, (almost) empty 3-node cluster with a MON and an OSD on each node. One of the nodes is currently down.

    # ceph --version
    ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)

    # ceph -s
        cluster 8bd6398b-65a2-4254-bb00-1ff2468d2806
         health HEALTH_WARN 768 pgs degraded; 768 pgs stuck unclean; recovery -20/15 objects degraded (-133.333%); 1 mons down, quorum 0,1 ceph-1,ceph-2
         monmap e3: 3 mons at {ceph-1=192.168.1.1:6789/0,ceph-2=192.168.1.2:6789/0,ceph-3=192.168.1.3:6789/0}, election epoch 150, quorum 0,1 ceph-1,ceph-2
         osdmap e320: 3 osds: 2 up, 2 in
          pgmap v2375: 768 pgs, 6 pools, 32 bytes data, 5 objects
                10333 MB used, 340 GB / 350 GB avail
                -20/15 objects degraded (-133.333%)
                     768 active+degraded

    # ceph osd tree

    # id    weight    type name    up/down    reweight
    -1    0.51    root default
    -2    0.17        host ceph-1
    1    0.17            osd.1    up    1
    -3    0.17        host ceph-2
    2    0.17            osd.2    up    1
    -4    0.17        host ceph-3
    4    0.17            osd.4    down    0

    # ceph osd dump
    epoch 320
    fsid 8bd6398b-65a2-4254-bb00-1ff2468d2806
    created 2015-01-26 21:25:52.864081
    modified 2015-02-11 17:01:31.634283
    flags
    pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 79 flags hashpspool crash_replay_interval 45 stripe_width 0
    pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 77 flags hashpspool stripe_width 0
    pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 81 flags hashpspool stripe_width 0
    pool 3 'images' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 313 flags hashpspool stripe_width 0
        removed_snaps [1~3]
    pool 4 'volumes' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 296 flags hashpspool stripe_width 0
    pool 5 'vms' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 302 flags hashpspool stripe_width 0
    max_osd 7
    osd.1 up   in  weight 1 up_from 318 up_thru 318 down_at 310 last_clean_interval [293,303) 192.168.1.1:6800/25249 172.16.64.1:6800/25249 172.16.64.1:6801/25249 192.168.1.1:6801/25249 exists,up c24bdef1-d279-4176-966a-1931a51d5315
    osd.2 up   in  weight 1 up_from 316 up_thru 318 down_at 315 last_clean_interval [308,313) 192.168.1.2:6800/2422 172.16.64.2:6800/2422 172.16.64.2:6801/2422 192.168.1.2:6801/2422 exists,up abe9ca5c-f060-4cd9-a05a-207ebd6ef2fc
    osd.4 down out weight 0 up_from 305 up_thru 310 down_at 314 last_clean_interval [284,303) 192.168.1.3:6800/3662 172.16.64.3:6800/3662 172.16.64.3:6801/3662 192.168.1.3:6801/3662 autoout,exists 402cc9bc-fbfb-40bd-a51f-84626ff39f05

Have Fun.
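For comparison, here is roughly what a sane value would look like on this second cluster. This is a simplified model and not Ceph's accounting: it assumes every object has one replica per OSD host, so that with pool size 3 and only 2 OSDs up each of the 5 objects is missing exactly one copy. The function name `expected_degraded` is hypothetical.

```python
def expected_degraded(num_objects, pool_size, replicas_up):
    """Return (degraded, copies) under the simplified one-replica-per-host model."""
    copies = num_objects * pool_size
    missing_per_object = max(pool_size - replicas_up, 0)
    return num_objects * missing_per_object, copies


# 5 objects, replicated size 3, 2 of 3 OSDs up (from `ceph -s` and `ceph osd dump`).
degraded, copies = expected_degraded(num_objects=5, pool_size=3, replicas_up=2)
percent = 100.0 * degraded / copies

reported = -20  # numerator reported by `ceph -s` above
in_range = 0 <= reported <= copies

print(f"expected {degraded}/{copies} objects degraded ({percent:.3f}%)")
print(f"reported value {reported} within [0, {copies}]: {in_range}")
```

The denominator of 15 agrees with the `ceph -s` output; only the numerator is off, which is consistent with a miscounted degraded total rather than a miscounted object total.
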

Updated by Sage Weil about 7 years ago

Status changed from New to Resolved
