Bug #56626

"ceph fs volume create" fails with error ERANGE

Added by Victoria Martinez de la Cruz 5 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Creating a CephFS filesystem in a cluster deployed with cephadm fails.

Steps followed

1. sudo cephadm --image "quincy" bootstrap --fsid <fsid> --config <bootstrap-config> --output-config <ceph-config> --output-keyring <keyring> --output-pub-ssh-key <pub-key> --allow-overwrite --allow-fqdn-hostname --skip-monitoring-stack --skip-dashboard --single-host-defaults --skip-firewalld --skip-mon-network --mon-ip <host-ip>

bootstrap.config

[global]
log to file = true
osd crush chooseleaf type = 0
osd_pool_default_pg_num = 8
osd_pool_default_pgp_num = 8
osd_pool_default_size = 1
[mon]
mon_warn_on_pool_no_redundancy = False
[osd]
osd_memory_target_autotune = true
osd_numa_auto_affinity = true
[mgr]
mgr/cephadm/autotune_memory_target_ratio = 0.2

2. sudo cephadm shell --fsid <fsid> --config <ceph_config> --keyring <keyring> -- ceph orch daemon add osd "<hostname>:<osd>"

3. sudo cephadm shell --fsid <fsid> --config <ceph_config> --keyring <keyring> -- ceph fs volume create <my-fs>

Expected output: new filesystem is created successfully, with the data and metadata pools

Actual output: Error ERANGE: 'pgp_num' must be greater than 0 and lower or equal than 'pg_num', which in this case is 1

Environment status:

root@quincy-cephadm:/# ceph -s
cluster:
id: 4cbcbb65-3109-47ee-bd9a-2aab3f4debe7
health: HEALTH_OK

services:
mon: 1 daemons, quorum quincy-cephadm (age 19m)
mgr: quincy-cephadm.jbzzql(active, since 18m), standbys: quincy-cephadm.telyok
osd: 1 osds: 1 up (since 17m), 1 in (since 18m)
data:
pools: 1 pools, 1 pgs
objects: 2 objects, 449 KiB
usage: 20 MiB used, 30 GiB / 30 GiB avail
pgs: 1 active+clean

Image used: v17.2.1 in Quay

History

#1 Updated by Patrick Donnelly 5 months ago

  • Category set to Correctness/Safety
  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version changed from v17.2.2 to v18.0.0
  • Backport set to quincy,pacific
  • Component(FS) mgr/volumes added

Kotresh, PTAL.

#2 Updated by Patrick Donnelly 5 months ago

  • Status changed from New to Triaged

#3 Updated by Kotresh Hiremath Ravishankar 4 months ago

Hi Victoria,

I am not very familiar with the OSD configs, but according to the code, if 'osd_pool_default_pg_autoscale_mode' is 'on' and no 'pg_num' is passed, 'pg_num' is set to 1. Please see the code below.

int OSDMonitor::prepare_new_pool(....)
{
...
...
  if (pg_num == 0) {
    auto pg_num_from_mode =
      [pg_num=g_conf().get_val<uint64_t>("osd_pool_default_pg_num")]
      (const string& mode) {
      return mode == "on" ? 1 : pg_num;
    };
    pg_num = pg_num_from_mode(
      pg_autoscale_mode.empty() ?
      g_conf().get_val<string>("osd_pool_default_pg_autoscale_mode") :
      pg_autoscale_mode);
  }
...
...

Your config sets 'osd_pool_default_pgp_num = 8', which is greater than the resulting 'pg_num' (i.e., 8 > 1). Setting 'osd_pool_default_pgp_num = 1' should fix the issue.
Please see the code below.

int OSDMonitor::prepare_new_pool(....)
{
...
...
  if (pgp_num > pg_num) {
    *ss << "'pgp_num' must be greater than 0 and lower or equal than 'pg_num'" 
        << ", which in this case is " << pg_num;
    return -ERANGE;
  }
... 
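The two excerpts above can be modelled together in a few lines. This is only a rough Python sketch of the quoted logic, not the actual OSDMonitor code: the function names, the exception class, and the idea of passing the config values in as plain arguments are all assumptions made for illustration.

```python
class PoolRangeError(ValueError):
    """Stands in for the -ERANGE return in the quoted C++ (hypothetical name)."""


def resolve_pg_num(requested_pg_num: int, default_pg_num: int, autoscale_mode: str) -> int:
    # Mirrors the first excerpt: only when no pg_num was requested (0)
    # does the autoscale mode decide between 1 and the configured default.
    if requested_pg_num != 0:
        return requested_pg_num
    return 1 if autoscale_mode == "on" else default_pg_num


def check_pgp_num(pgp_num: int, pg_num: int) -> int:
    # Mirrors the second excerpt: pgp_num may not exceed pg_num.
    if pgp_num > pg_num:
        raise PoolRangeError(
            "'pgp_num' must be greater than 0 and lower or equal than 'pg_num'"
            f", which in this case is {pg_num}"
        )
    return pgp_num


if __name__ == "__main__":
    # Values from the reported bootstrap config, with autoscale mode 'on'.
    pg_num = resolve_pg_num(requested_pg_num=0, default_pg_num=8, autoscale_mode="on")
    try:
        check_pgp_num(pgp_num=8, pg_num=pg_num)
    except PoolRangeError as err:
        print(f"Error ERANGE: {err}")
```

With the reported values, 'pg_num' resolves to 1 and the check rejects a 'pgp_num' of 8, which matches the error message in the description. With autoscale mode 'off', the same sketch resolves 'pg_num' to 8 and the check passes.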

#4 Updated by Kotresh Hiremath Ravishankar 4 months ago

  • Status changed from Triaged to In Progress

#5 Updated by Ramana Raja 4 months ago

  • Status changed from In Progress to Need More Info

This does not seem like a bug in the mgr/volumes code. The mgr/volumes module creates FS pools using `osd pool create <pool-name>` with no additional CLI arguments. As Kotresh mentioned, the bootstrap_config settings in devstack-plugin-ceph need to be changed.

I suggest removing the settings 'osd_pool_default_pg_num = 8' and 'osd_pool_default_pgp_num = 8' from the file src/devstack/lib/cephadm. See, https://opendev.org/openstack/devstack-plugin-ceph/src/commit/400b1011be405b240be52c6d61f5257e3a0e39b5/devstack/lib/cephadm#L162 . If 'osd_pool_default_pgp_num' is not set, then pgp_num is set to follow pg_num. See, https://github.com/ceph/ceph/commit/0ff1ef3de76e#diff-c44edb74ef654677b989657a8dad6cd87b952012f21fc055c2f531e7ab637b3bR6689
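For anyone who wants to apply the same change to a local bootstrap config rather than to the plugin, a minimal sketch using Python's standard configparser is shown here. The helper name drop_pg_defaults and the inline sample config are assumptions for illustration; the sketch only matches the underscore spelling of the two option names used in this report, and configparser rewrites the file layout (comments are dropped, spacing is normalised).

```python
import configparser
import io

# Shortened copy of the bootstrap config from the description.
BOOTSTRAP_CONF = """\
[global]
log to file = true
osd crush chooseleaf type = 0
osd_pool_default_pg_num = 8
osd_pool_default_pgp_num = 8
osd_pool_default_size = 1
[mon]
mon_warn_on_pool_no_redundancy = False
"""

KEYS_TO_DROP = ("osd_pool_default_pg_num", "osd_pool_default_pgp_num")


def drop_pg_defaults(conf_text: str) -> str:
    """Return conf_text without the two pool default settings (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    for section in parser.sections():
        for key in KEYS_TO_DROP:
            # remove_option returns False when the key is absent, so this is safe.
            parser.remove_option(section, key)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(drop_pg_defaults(BOOTSTRAP_CONF))
```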

Victoria, can you check whether my suggestion works?

#6 Updated by Victoria Martinez de la Cruz 4 months ago

I tested the deployment with "osd_pool_default_pg_autoscale_mode = off" in bootstrap_conf, and it seems to fix the issue. Was the default for this option changed to on between Pacific and Quincy? Is setting this value to off advisable, or should I try a different config?

The current config and cluster status are:

stack@devstack-quincy:~$ cat bootstrap_ceph.conf
[global]
log to file = true
osd crush chooseleaf type = 0
osd_pool_default_pg_num = 8
osd_pool_default_pgp_num = 8
osd_pool_default_size = 1
osd_pool_default_pg_autoscale_mode = off
[mon]
mon_warn_on_pool_no_redundancy = False
[osd]
osd_memory_target_autotune = true
osd_numa_auto_affinity = true
[mgr]
mgr/cephadm/autotune_memory_target_ratio = 0.2

stack@devstack-quincy:~/devstack$ sudo cephadm shell
Inferring fsid ddc1d35b-4f75-4956-a55d-6a9d7e77f1d1
Inferring config /var/lib/ceph/ddc1d35b-4f75-4956-a55d-6a9d7e77f1d1/mon.devstack-quincy/config
Using ceph image with id 'e5af760fa1c1' and tag 'v17.2.1' created on 2022-06-23 19:49:45 +0000 UTC
quay.io/ceph/ceph@sha256:d3f3e1b59a304a280a3a81641ca730982da141dad41e942631e4c5d88711a66b
root@devstack-quincy:/# ceph -s
cluster:
id: ddc1d35b-4f75-4956-a55d-6a9d7e77f1d1
health: HEALTH_OK

services:
mon: 1 daemons, quorum devstack-quincy (age 15m)
mgr: devstack-quincy.repaot(active, since 14m), standbys: devstack-quincy.marjpi
mds: 1/1 daemons up, 1 standby
osd: 1 osds: 1 up (since 13m), 1 in (since 14m)
data:
volumes: 1/1 healthy
pools: 4 pools, 25 pgs
objects: 27 objects, 451 KiB
usage: 21 MiB used, 30 GiB / 30 GiB avail
pgs: 25 active+clean
io:
client: 85 B/s rd, 0 op/s rd, 0 op/s wr

#7 Updated by Ramana Raja 4 months ago

Once we confirm that removing osd_pool_default_pgp_num and osd_pool_default_pg_num in devstack-plugin-ceph works, we can close this tracker.
https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/851521

#8 Updated by Ramana Raja 4 months ago

  • Status changed from Need More Info to Closed

Closing the bug. The changes in devstack-plugin-ceph (https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/851521) fixed the issue in the OpenStack upstream CI.
