Bug #43296

Ceph assimilate-conf results in config entries which can not be removed

Added by David Herselman 2 months ago. Updated 7 days ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: nautilus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature:

Description

We assimilated our Ceph configuration file and now have a minimal config file. Since then, we have not been able to remove configuration entries.

Present on 7 Ceph 14.2.4 clusters; Stefan Kooman confirmed that it also affected him when he upgraded to Mimic.

Commands:

cd /etc/pve;
ceph config assimilate-conf -i ceph.conf -o ceph.conf.new;
mv ceph.conf.new ceph.conf;

Minimal config file:

[global]
         cluster_network = 10.248.1.0/24
         filestore_xattr_use_omap = true
         fsid = 31f6ea46-12cb-47e8-a6f3-60fb6bbd1782
         mon_host = 10.248.1.60 10.248.1.61 10.248.1.62
         public_network = 10.248.1.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

Configuration dump:

[admin@kvm1b ~]# ceph config dump
WHO    MASK LEVEL    OPTION                             VALUE          RO
global      advanced auth_client_required               cephx          *
global      advanced auth_cluster_required              cephx          *
global      advanced auth_service_required              cephx          *
global      advanced cluster_network                    10.248.1.0/24  *
global      advanced debug_filestore                    0/0
global      advanced debug_journal                      0/0
global      advanced debug_ms                           0/0
global      advanced debug_osd                          0/0
global      basic    device_failure_prediction_mode     cloud
global      advanced mon_allow_pool_delete              true
global      advanced mon_osd_down_out_subtree_limit     host
global      advanced osd_deep_scrub_interval            1209600.000000
global      advanced osd_pool_default_min_size          2
global      advanced osd_pool_default_size              3
global      advanced osd_scrub_begin_hour               19
global      advanced osd_scrub_end_hour                 6
global      advanced osd_scrub_sleep                    0.100000
global      advanced public_network                     10.248.1.0/24  *
global      advanced rbd_default_features               7
global      advanced rbd_default_features               31
  mgr       advanced mgr/balancer/active                true
  mgr       advanced mgr/balancer/mode                  upmap
  mgr       advanced mgr/devicehealth/enable_monitoring true

Note the duplicate 'rbd_default_features' entry.
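
Duplicates like this can be spotted mechanically. The following is a minimal sketch, not part of the original report: the function name find_duplicate_options is hypothetical, and it assumes the LEVEL column only ever contains basic, advanced or dev, which it uses to locate the OPTION column because MASK is usually empty.

```shell
# find_duplicate_options: read "ceph config dump" output on stdin and print
# every WHO [MASK] OPTION combination that appears more than once.
# Assumption: the LEVEL column is one of basic, advanced or dev.
find_duplicate_options() {
    awk '
        {
            # The header row has no lowercase level keyword, so it never matches.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^(basic|advanced|dev)$/) {
                    # Build the key from WHO, the optional MASK and the OPTION name.
                    key = $1
                    for (j = 2; j < i; j++) key = key " " $j
                    key = key " " $(i + 1)
                    # Report each duplicated key once, on its second occurrence.
                    if (++seen[key] == 2) print key
                    break
                }
            }
        }
    '
}
```

Usage: ceph config dump | find_duplicate_options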

We're not able to remove the original 'rbd_default_features' entry that the assimilate-conf command created:

[admin@kvm1b ~]# ceph config dump | grep -e WHO -e rbd_default_features
WHO    MASK LEVEL    OPTION                             VALUE          RO
global      advanced rbd_default_features               7

[admin@kvm1b ~]# ceph config rm global rbd_default_features
[admin@kvm1b ~]# ceph config rm global rbd_default_features
[admin@kvm1b ~]# ceph config rm global rbd_default_features

[admin@kvm1b ~]# ceph config dump | grep -e WHO -e rbd_default_features
WHO    MASK LEVEL    OPTION                             VALUE          RO
global      advanced rbd_default_features               7

[admin@kvm1b ~]# ceph config set global rbd_default_features 31
[admin@kvm1b ~]# ceph config dump | grep -e WHO -e rbd_default_features
WHO    MASK LEVEL    OPTION                             VALUE          RO
global      advanced rbd_default_features               7
global      advanced rbd_default_features               31

[admin@kvm1b ~]# ceph config rm global rbd_default_features
[admin@kvm1b ~]# ceph config rm global rbd_default_features
[admin@kvm1b ~]# ceph config rm global rbd_default_features

[admin@kvm1b ~]# ceph config dump | grep -e WHO -e rbd_default_features
WHO    MASK LEVEL    OPTION                             VALUE          RO
global      advanced rbd_default_features               7


Related issues

Copied to RADOS - Backport #43822: nautilus: Ceph assimilate-conf results in config entries which can not be removed Resolved

History

#1 Updated by David Herselman 2 months ago

Setting debug_rbd to 5/5 unfortunately doesn't reveal anything:

Commands:

[root@kvm1a ~]# ceph config dump | grep -e WHO -e rbd_default_features
WHO    MASK LEVEL    OPTION                             VALUE          RO
global      advanced rbd_default_features               7
global      advanced rbd_default_features               31
[root@kvm1a ~]# ceph config rm global rbd_default_features
[root@kvm1a ~]# ceph config rm global rbd_default_features
[root@kvm1a ~]# ceph config rm global rbd_default_features
[root@kvm1a ~]# ceph config rm global rbd_default_features
[root@kvm1a ~]# ceph config dump | grep -e WHO -e rbd_default_features
WHO    MASK LEVEL    OPTION                             VALUE          RO
global      advanced rbd_default_features               7
[root@kvm1a ~]# ceph config set global rbd_default_features 31
[root@kvm1a ~]# ceph config dump | grep -e WHO -e rbd_default_features
WHO    MASK LEVEL    OPTION                             VALUE          RO
global      advanced rbd_default_features               7
global      advanced rbd_default_features               31

Monitor log - kvm1a:

2019-12-13 18:35:26.293 7feaac2f5700  0 mon.kvm1a@0(leader) e4 handle_command mon_command({"prefix": "config rm", "who": "global", "name": "rbd_default_features"} v 0) v1
2019-12-13 18:35:26.293 7feaac2f5700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd=[{"prefix": "config rm", "who": "global", "name": "rbd_default_features"}]: dispatch
2019-12-13 18:35:26.305 7feaaaaf2700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd='[{"prefix": "config rm", "who": "global", "name": "rbd_default_features"}]': finished
2019-12-13 18:35:30.401 7feaac2f5700  0 mon.kvm1a@0(leader) e4 handle_command mon_command({"prefix": "config rm", "who": "global", "name": "rbd_default_features"} v 0) v1
2019-12-13 18:35:30.401 7feaac2f5700  0 log_channel(audit) log [INF] : from='client.? 10.248.1.60:0/2894657506' entity='client.admin' cmd=[{"prefix": "config rm", "who": "global", "name": "rbd_default_features"}]: dispatch
2019-12-13 18:35:32.289 7feaac2f5700  0 mon.kvm1a@0(leader) e4 handle_command mon_command({"prefix": "config rm", "who": "global", "name": "rbd_default_features"} v 0) v1
2019-12-13 18:35:32.289 7feaac2f5700  0 log_channel(audit) log [INF] : from='client.? 10.248.1.60:0/3241273675' entity='client.admin' cmd=[{"prefix": "config rm", "who": "global", "name": "rbd_default_features"}]: dispatch
2019-12-13 18:35:33.833 7feaac2f5700  0 mon.kvm1a@0(leader) e4 handle_command mon_command({"prefix": "config rm", "who": "global", "name": "rbd_default_features"} v 0) v1
2019-12-13 18:35:33.833 7feaac2f5700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd=[{"prefix": "config rm", "who": "global", "name": "rbd_default_features"}]: dispatch
2019-12-13 18:35:41.689 7feaac2f5700  0 mon.kvm1a@0(leader) e4 handle_command mon_command({"prefix": "config set", "who": "global", "name": "rbd_default_features", "value": "31"} v 0) v1
2019-12-13 18:35:41.689 7feaac2f5700  0 log_channel(audit) log [INF] : from='client.? 10.248.1.60:0/1930731897' entity='client.admin' cmd=[{"prefix": "config set", "who": "global", "name": "rbd_default_features", "value": "31"}]: dispatch
2019-12-13 18:35:41.705 7feaaaaf2700  0 log_channel(audit) log [INF] : from='client.? 10.248.1.60:0/1930731897' entity='client.admin' cmd='[{"prefix": "config set", "who": "global", "name": "rbd_default_features", "value": "31"}]': finished
2019-12-13 18:35:42.497 7feaac2f5700  0 mon.kvm1a@0(leader) e4 handle_command mon_command({"prefix": "config dump"} v 0) v1
2019-12-13 18:35:42.497 7feaac2f5700  0 log_channel(audit) log [DBG] : from='client.? 10.248.1.60:0/1087170905' entity='client.admin' cmd=[{"prefix": "config dump"}]: dispatch

Monitor log - kvm1b:

2019-12-13 18:35:26.294 7fc9eff72700  0 mon.kvm1b@1(peon) e4 handle_command mon_command({"prefix": "config rm", "who": "global", "name": "rbd_default_features"} v 0) v1
2019-12-13 18:35:26.294 7fc9eff72700  0 log_channel(audit) log [INF] : from='client.? 10.248.1.60:0/465427509' entity='client.admin' cmd=[{"prefix": "config rm", "who": "global", "name": "rbd_default_features"}]: dispatch
2019-12-13 18:35:40.826 7fc9eff72700  0 mon.kvm1b@1(peon) e4 handle_command mon_command({"prefix": "config dump"} v 0) v1
2019-12-13 18:35:40.826 7fc9eff72700  0 log_channel(audit) log [DBG] : from='client.? 10.248.1.60:0/3230089280' entity='client.admin' cmd=[{"prefix": "config dump"}]: dispatch

Monitor log - kvm1c:

2019-12-13 18:35:25.668 7f3dea759700  0 mon.kvm1c@2(peon) e4 handle_command mon_command({"prefix": "config dump"} v 0) v1
2019-12-13 18:35:25.668 7f3dea759700  0 log_channel(audit) log [DBG] : from='client.? 10.248.1.60:0/214678714' entity='client.admin' cmd=[{"prefix": "config dump"}]: dispatch
2019-12-13 18:35:33.836 7f3dea759700  0 mon.kvm1c@2(peon) e4 handle_command mon_command({"prefix": "config rm", "who": "global", "name": "rbd_default_features"} v 0) v1
2019-12-13 18:35:33.836 7f3dea759700  0 log_channel(audit) log [INF] : from='client.? 10.248.1.60:0/3381809958' entity='client.admin' cmd=[{"prefix": "config rm", "who": "global", "name": "rbd_default_features"}]: dispatch

Cluster is healthy:

[admin@kvm1c ~]# ceph -s
  cluster:
    id:     31f6ea46-12cb-47e8-a6f3-60fb6bbd1782
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum kvm1a,kvm1b,kvm1c (age 3d)
    mgr: kvm1c(active, since 3d), standbys: kvm1b, kvm1a
    mds: cephfs:1 {0=kvm1c=up:active} 2 up:standby
    osd: 30 osds: 30 up (since 14h), 30 in (since 3d)

  data:
    pools:   8 pools, 417 pgs
    objects: 1.49M objects, 5.4 TiB
    usage:   15 TiB used, 75 TiB / 90 TiB avail
    pgs:     289 active+clean
             91  active+clean+snaptrim_wait
             37  active+clean+snaptrim

  io:
    client:   2.6 MiB/s rd, 135 MiB/s wr, 43 op/s rd, 164 op/s wr
    cache:    130 MiB/s flush

#2 Updated by David Herselman 2 months ago

Alwin from Proxmox provided a workaround, but this still appears to be a bug:
https://forum.proxmox.com/threads/ceph-unable-to-remove-config-entry.61470/

[admin@kvm1a ~]# ceph config rm global rbd_default_features; ceph config-key rm config/global/rbd_default_features; ceph config dump | grep -e WHO -e rbd_default_features; ceph config set global rbd_default_features 31; ceph config dump | grep -e WHO -e rbd_default_features
key deleted
WHO    MASK LEVEL    OPTION                             VALUE          RO
WHO    MASK LEVEL    OPTION                             VALUE          RO
global      advanced rbd_default_features               31
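
The two removal steps of the workaround above could be wrapped in a small shell function for reuse across clusters. This is only a sketch: the function name purge_config_option and the CEPH override variable are hypothetical, and it assumes the stale entry is stored under the config-key path config/<who>/<option>, as in the command above.

```shell
# CEPH allows the ceph binary to be overridden (hypothetical convenience variable).
CEPH="${CEPH:-ceph}"

# purge_config_option WHO OPTION
# Removes the option with "config rm", then deletes the raw config-key entry
# that "config rm" leaves behind on affected clusters.
purge_config_option() {
    who="$1"
    opt="$2"
    "$CEPH" config rm "$who" "$opt"
    "$CEPH" config-key rm "config/$who/$opt"
}
```

Usage: purge_config_option global rbd_default_features, followed by ceph config set global rbd_default_features 31 to store the intended value again.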

#3 Updated by Greg Farnum 2 months ago

  • Project changed from Ceph to RADOS
  • Category deleted (common)

#4 Updated by Patrick Donnelly 2 months ago

  • Project changed from RADOS to Ceph

Might be related to #42964?

#5 Updated by Patrick Donnelly 2 months ago

  • Project changed from Ceph to RADOS
  • Component(RADOS) Monitor added

#6 Updated by Sage Weil 2 months ago

  • Status changed from New to Need More Info

Can you attach the (relevant) output from "ceph config-key dump | grep config"? I think the keys are being installed in the wrong location for the global options (things changed a bit between mimic and nautilus IIRC; the bug probably is related to that change).

#7 Updated by Sage Weil 2 months ago

  • Assignee set to Sage Weil
  • Priority changed from Normal to High

#8 Updated by Sage Weil about 1 month ago

  • Status changed from Need More Info to Fix Under Review
  • Pull request ID set to 32726

#9 Updated by Nathan Cutler about 1 month ago

  • Backport set to nautilus

Adding nautilus backport since the bug was reported against that version.

#10 Updated by Sage Weil 30 days ago

  • Pull request ID changed from 32726 to 32786

#11 Updated by Sage Weil 28 days ago

  • Status changed from Fix Under Review to Pending Backport

#12 Updated by Nathan Cutler 26 days ago

  • Copied to Backport #43822: nautilus: Ceph assimilate-conf results in config entries which can not be removed added

#13 Updated by Nathan Cutler 7 days ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
