Bug #35808


ceph osd ok-to-stop result doesn't match the real situation

Added by frank lin over 5 years ago. Updated over 3 years ago.

Status: Rejected
Priority: Normal
Assignee: David Zafman
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The cluster is in a healthy state. When I tried to run ceph osd ok-to-stop 0, it returned:

Error EBUSY: 4 PGs are already degraded or might become unavailable

ceph -s shows there are no PGs in a degraded state:
 cluster:
    id:     6b204640-60fb-4ed6-bb06-fe67e3c2ac1f
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon:         1 daemons, quorum mnv001
    mgr:         mnv001(active)
    osd:         44 osds: 44 up, 44 in
                 flags noout
    tcmu-runner: 5 daemons active

  data:
    pools:   5 pools, 1472 pgs
    objects: 49241k objects, 192 TB
    usage:   288 TB used, 111 TB / 400 TB avail
    pgs:     1472 active+clean
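
For reference, one way to see which PGs ok-to-stop is evaluating is to list the PGs whose acting set contains osd.0 and the per-pool replication settings. This is only a diagnostic sketch using standard Ceph CLI commands; the output is not from this cluster.

    # PGs that currently have osd.0 in their acting set
    ceph pg ls-by-osd 0

    # size / min_size of every pool
    ceph osd pool ls detail

    # repeat the check for osd.0
    ceph osd ok-to-stop 0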

After I set osd.0 out, I got this result:

  cluster:
    id:     6b204640-60fb-4ed6-bb06-fe67e3c2ac1f
    health: HEALTH_WARN
            noout flag(s) set
            1 osds down
            6835800/302432789 objects misplaced (2.260%)
            Reduced data availability: 5 pgs incomplete
            Degraded data redundancy: 6889753/302432789 objects degraded (2.278%), 162 pgs degraded

  services:
    mon:         1 daemons, quorum mnv001
    mgr:         mnv001(active)
    osd:         44 osds: 43 up, 43 in; 142 remapped pgs
                 flags noout
    tcmu-runner: 5 daemons active

  data:
    pools:   5 pools, 1472 pgs
    objects: 49241k objects, 192 TB
    usage:   288 TB used, 111 TB / 400 TB avail
    pgs:     0.340% pgs not active
             6889753/302432789 objects degraded (2.278%)
             6835800/302432789 objects misplaced (2.260%)
             1156 active+clean
             160  active+undersized+degraded
             124  active+clean+remapped
             18   active+remapped+backfilling
             7    active+undersized
             5    incomplete
             2    active+recovering+degraded

  io:
    client:   2831 B/s rd, 2 op/s rd, 0 op/s wr
    recovery: 181 MB/s, 53 objects/s

ceph health detail shows the 5 incomplete PGs:

    pg 5.1 is incomplete, acting [33,2147483647,35] (reducing pool rbd_pool_test min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 5.4 is incomplete, acting [45,2147483647,18] (reducing pool rbd_pool_test min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 5.5 is incomplete, acting [2147483647,10,29] (reducing pool rbd_pool_test min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 5.16 is incomplete, acting [15,9,2147483647] (reducing pool rbd_pool_test min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 5.1e is incomplete, acting [2147483647,34,11] (reducing pool rbd_pool_test min_size from 3 may help; search ceph.com/docs for 'incomplete')
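
I suspect this is related to the min_size mentioned in those messages: if pool rbd_pool_test has min_size equal to the number of OSDs in each PG's acting set (3 here), then losing any single member of an acting set drops the PG below min_size, which is exactly the "might become unavailable" condition ok-to-stop warns about even on a healthy cluster. That is only a guess; it can be checked with the standard pool-get commands (pool name taken from the messages above):

    ceph osd pool get rbd_pool_test size
    ceph osd pool get rbd_pool_test min_size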

Is this a bug, or is it expected behavior?


Related issues: 1 (0 open, 1 closed)

Follows RADOS - Bug #39099: Give recovery for inactive PGs a higher priority (Resolved, David Zafman, 04/03/2019)
