Bug #24224

The cluster does not go into the OK state

Added by Vasilii Alekseenko almost 6 years ago. Updated almost 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
-
% Done:
0%

Source:
Tags:
rule, rack
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-disk
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a Ceph test cluster in a virtual environment.

  cluster:
    id:     22d6464d-f137-423e-b8aa-bec5e9219755
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum cn1,cn2,cn3
    mgr: cn1(active)
    osd: 12 osds: 12 up, 12 in

  data:
    pools:   1 pools, 256 pgs
    objects: 5 objects, 487 kB
    usage:   61914 MB used, 60845 MB / 119 GB avail
    pgs:     256 active+clean

$ ceph versions

{
    "mon": {
        "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 1
    },
    "osd": {
        "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 12
    },
    "mds": {},
    "overall": {
        "ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)": 16
    }
}

I created a hierarchy using the following commands:

$ ceph osd crush add-bucket rack1 rack
$ ceph osd crush add-bucket rack2 rack
$ ceph osd crush move cn1 rack=rack1

...

The final configuration:
$ ceph osd tree

ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF 
 -1       0.11755 root default                               
-15       0.05878     rack rack1                             
 -2       0.01959         host cn1                           
  0   hdd 0.00980             osd.0      up  1.00000 1.00000 
  1   hdd 0.00980             osd.1      up  1.00000 1.00000 
 -4       0.01959         host cn3                           
  4   hdd 0.00980             osd.4      up  1.00000 1.00000 
  5   hdd 0.00980             osd.5      up  1.00000 1.00000 
 -6       0.01959         host cn5                           
  8   hdd 0.00980             osd.8      up  1.00000 1.00000 
  9   hdd 0.00980             osd.9      up  1.00000 1.00000 
-16       0.05878     rack rack2                             
 -3       0.01959         host cn2                           
  2   hdd 0.00980             osd.2      up  1.00000 1.00000 
  3   hdd 0.00980             osd.3      up  1.00000 1.00000 
 -5       0.01959         host cn4                           
  6   hdd 0.00980             osd.6      up  1.00000 1.00000 
  7   hdd 0.00980             osd.7      up  1.00000 1.00000 
 -7       0.01959         host cn6                           
 10   hdd 0.00980             osd.10     up  1.00000 1.00000 
 11   hdd 0.00980             osd.11     up  1.00000 1.00000

The current rule is "replicated_ruleset" with type: host.

I created a new rule with "rack" as the failure domain.

$ ceph osd crush rule create-replicated RackStar default rack
$ ceph osd pool set rbd crush_rule RackStar

set pool 0 crush_rule to RackStar

$ ceph osd dump | grep rule

pool 0 'rbd' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 291 flags hashpspool stripe_width 0 application rbd
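For reference, the rule that create-replicated generates can be inspected with "ceph osd crush rule dump RackStar"; decompiled from the crushmap it typically looks something like the sketch below (the id matches crush_rule 1 from the dump above; the min_size/max_size defaults may differ slightly):

rule RackStar {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type rack
    step emit
}

The "chooseleaf firstn 0 type rack" step asks CRUSH to place every replica in a distinct rack.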

My cluster does not go into the "HEALTH_OK" state. I rebooted all the servers in my cluster, but the problem did not go away.

  cluster:
    id:     22d6464d-f137-423e-b8aa-bec5e9219755
    health: HEALTH_WARN
            2/6 objects misplaced (33.333%)

  services:
    mon: 3 daemons, quorum cn1,cn2,cn3
    mgr: cn1(active)
    osd: 12 osds: 12 up, 12 in; 256 remapped pgs

  data:
    pools:   1 pools, 256 pgs
    objects: 2 objects, 19 bytes
    usage:   61918 MB used, 60841 MB / 119 GB avail
    pgs:     2/6 objects misplaced (33.333%)
             256 active+clean+remapped

This video shows the problem in more detail: https://youtu.be/UtM7vItjsWY

#1

Updated by John Spray almost 6 years ago

  • Status changed from New to Closed

Thanks for the comprehensive information. In this case, you've created a rule that requires each copy to be on a separate rack, and a pool that requires three copies. However, you only have two racks, so Ceph can't satisfy that; you'd need a third rack to place three copies on separate racks.
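One way to confirm this from the information already in the report (the pool size appears in the "ceph osd dump" line and the rack entries in "ceph osd tree" above):

$ ceph osd pool get rbd size
size: 3
$ ceph osd tree | grep rack
-15       0.05878     rack rack1
-16       0.05878     rack rack2

With only two racks, the rule can map just two of the three copies; the third copy stays on its previous location, which is why the PGs sit in active+clean+remapped with objects reported as misplaced instead of the cluster returning to HEALTH_OK.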

If what you really want is two copies on one rack and one copy on another rack, then you can construct a slightly more complicated rule to do that -- ask on ceph-users for advice.
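For illustration, one common way to build such a rule (untested here; the rule name rack_2plus1 and id 2 are placeholders) is to decompile the crushmap, append a rule along the lines of the sketch below, recompile it and inject it back:

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt
$ crushtool -c crushmap.txt -o crushmap.new     # after adding the rule below
$ ceph osd setcrushmap -i crushmap.new

rule rack_2plus1 {
    id 2                                    # any unused rule id
    type replicated
    min_size 2
    max_size 3
    step take default
    step choose firstn 2 type rack          # pick two racks
    step chooseleaf firstn 2 type host      # up to two hosts (one OSD each) per rack
    step emit
}

With pool size 3 this places two copies in the first chosen rack and one in the second; the pool can then be switched over with "ceph osd pool set rbd crush_rule rack_2plus1".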
