Bug #47758 (closed)

fail to create OSDs because the requested extent is too large

Added by Kiefer Chang over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Backport: pacific,octopus,nautilus
Regression: No

Description

This was observed in this e2e test failure: https://tracker.ceph.com/issues/47742

Some context about this e2e test:
- We need additional disks to test the cephadm OSD creation feature, but there are no spare disks on the smithi node, so (roughly as sketched below):
- 3 sparse files are created and exported as iSCSI target LUNs (LIO).
- After logging in to the target, 3 disks appear on the host.
- The Dashboard then asks cephadm to create 3 new OSDs with these disks.
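
A rough sketch of that setup, for illustration only (not the exact teuthology task; the file name, size, and IQN below are made up, and target ACL/auth configuration is omitted):

# create a sparse backing file and export it as an iSCSI LUN via LIO (targetcli),
# then log in locally with open-iscsi so a new /dev/sdX appears on the host
truncate -s 15G /var/tmp/lun1.img
targetcli /backstores/fileio create name=lun1 file_or_dev=/var/tmp/lun1.img
targetcli /iscsi create iqn.2020-10.com.example:e2e
targetcli /iscsi/iqn.2020-10.com.example:e2e/tpg1/luns create /backstores/fileio/lun1
iscsiadm -m discovery -t sendtargets -p 127.0.0.1
iscsiadm -m node -T iqn.2020-10.com.example:e2e -p 127.0.0.1 --login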

[2020-10-06 06:55:10,428][ceph_volume.main][INFO  ] Running command: ceph-volume  lvm batch --no-auto /dev/sdf --yes --no-systemd
...

[2020-10-06 06:55:10,958][ceph_volume.process][INFO  ] Running command: /usr/bin/lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /dev/sdf
[2020-10-06 06:55:10,961][ceph_volume.process][INFO  ] stdout NAME="sdf" KNAME="sdf" MAJ:MIN="8:80" FSTYPE="" MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="0" MODEL="lun1            " SIZE="15G" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL="" 
[2020-10-06 06:55:10,961][ceph_volume.devices.lvm.prepare][DEBUG ] data device size: 15.00 GB
[2020-10-06 06:55:10,961][ceph_volume.process][INFO  ] Running command: /usr/sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size /dev/sdf
[2020-10-06 06:55:10,973][ceph_volume.process][INFO  ] stderr Failed to find physical volume "/dev/sdf".
[2020-10-06 06:55:10,974][ceph_volume.process][INFO  ] Running command: /usr/sbin/vgcreate --force --yes ceph-755d5f48-33f7-4bc2-b909-d8e2bea211d7 /dev/sdf
[2020-10-06 06:55:10,986][ceph_volume.process][INFO  ] stdout Physical volume "/dev/sdf" successfully created.
[2020-10-06 06:55:10,999][ceph_volume.process][INFO  ] stdout Volume group "ceph-755d5f48-33f7-4bc2-b909-d8e2bea211d7" successfully created
[2020-10-06 06:55:11,007][ceph_volume.process][INFO  ] Running command: /usr/sbin/vgs --noheadings --readonly --units=b --nosuffix --separator=";" -S vg_name=ceph-755d5f48-33f7-4bc2-b909-d8e2bea211d7 -o vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size
[2020-10-06 06:55:11,026][ceph_volume.process][INFO  ] stdout ceph-755d5f48-33f7-4bc2-b909-d8e2bea211d7";"1";"0";"wz--n-";"3838";"3838";"4194304
[2020-10-06 06:55:11,026][ceph_volume.api.lvm][DEBUG ] size was passed: 15.00 GB -> 3839
[2020-10-06 06:55:11,027][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvcreate --yes -l 3839 -n osd-block-21867cd0-f8ad-4f5e-ad41-18510386c0c5 ceph-755d5f48-33f7-4bc2-b909-d8e2bea211d7
[2020-10-06 06:55:11,062][ceph_volume.process][INFO  ] stderr Volume group "ceph-755d5f48-33f7-4bc2-b909-d8e2bea211d7" has insufficient free space (3838 extents): 3839 required.
[2020-10-06 06:55:11,066][ceph_volume.devices.lvm.prepare][ERROR ] lvm prepare was unable to complete
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 250, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 361, in prepare
    block_lv = self.prepare_data_device('block', osd_fsid)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/prepare.py", line 219, in prepare_data_device
    **kwargs)
  File "/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 949, in create_lv
    process.run(command)
  File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", line 153, in run
    raise RuntimeError(msg)
RuntimeError: command returned non-zero exit status: 5

The lvcreate command asks for one more extent than the volume group has free (3839 requested vs. 3838 available in this example).


Related issues 5 (0 open, 5 closed)

Related to Dashboard - Bug #47742: cephadm/test_dashboard_e2e.sh: OSDs are not created (Resolved, Kiefer Chang)
Has duplicate ceph-volume - Bug #48383: OSD creation fails because volume group has insufficient free space to place a logical volume (Duplicate, Juan Miguel Olmo Martínez)
Copied to ceph-volume - Backport #49140: nautilus: fail to create OSDs because the requested extent is too large (Resolved, Jan Fajerski)
Copied to ceph-volume - Backport #49141: octopus: fail to create OSDs because the requested extent is too large (Resolved, Jan Fajerski)
Copied to ceph-volume - Backport #49142: pacific: fail to create OSDs because the requested extent is too large (Resolved, Jan Fajerski)
#1

Updated by Kiefer Chang over 3 years ago

  • Related to Bug #47742: cephadm/test_dashboard_e2e.sh: OSDs are not created added
#2

Updated by Kiefer Chang over 3 years ago

The cause of this problem is that the underlying block device reports a larger optimal I/O size (8M) than the default extent size (4M), which makes pvcreate align the PV's first physical extent to an 8M offset.

ceph-volume calculates the VG's extent count by assuming all space after the LVM metadata is usable, so it requests one more extent than the PV can actually allocate.

 
[root@mgr0 ~]# lsblk -b /dev/sde /dev/sdf -o NAME,VENDOR,SIZE,MIN-IO,OPT-IO
NAME VENDOR          SIZE MIN-IO  OPT-IO
sde  ATA      16106127360    512       0         <--- qemu 15 GB disk
sdf  LIO-ORG  16106127360    512 8388608         <--- iSCSI 15 GB disk

        Optimal I/O Size
+----          8MB        ----+
                                                              16106127360 bytes = 15360 MB
+-------------+---------------+---------------------------------------------+
|LVM metadata |   reserved    |              VG                             |
|             |               |                                             |
+-------------+---------------+---------------------------------------------+

     4 MB            4 MB           (15360 MB - 8MB)/4MB = 3838 extents

ceph-volume calculates extents: (15360-4)/4 = 3839
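
For illustration, the same arithmetic with the numbers from this report (plain shell arithmetic, not the actual ceph-volume code):

# 15 GiB iSCSI disk, 4 MiB extents, 8 MiB optimal I/O size reported by the LIO LUN
dev_size=16106127360
extent=$((4 * 1024 * 1024))
opt_io=$((8 * 1024 * 1024))
echo $(( (dev_size - extent) / extent ))   # 3839 <- ceph-volume's request (assumes only a 4 MiB offset)
echo $(( (dev_size - opt_io) / extent ))   # 3838 <- what the VG can actually allocate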

I'm not sure whether any ordinary disks report a larger optimal I/O size like this; I can change the reported size to work around #47742.

#3

Updated by Jan Fajerski over 3 years ago

hmm ok interesting. I'd propose we work around that for now in the test. In the real world we're unlikely to encounter this scenario.

Let's keep this issue open, since the free size calculation could certainly be a bit smarter.

Sound good?

#4

Updated by Kiefer Chang over 3 years ago

Jan Fajerski wrote:

hmm ok interesting. I'd propose we work around that for now in the test. In the real world we're unlikely to encounter this scenario.

Let's keep this issue open, since the free size calculation could certainly be a bit smarter.

Sound good?

Yeah, a PR was created to work around this in the test: https://github.com/ceph/ceph/pull/37575

#5

Updated by Jan Fajerski over 3 years ago

  • Is duplicate of Bug #48383: OSD creation fails because volume group has insufficient free space to place a logical volume added
#6

Updated by Jan Fajerski over 3 years ago

  • Status changed from New to Duplicate
#7

Updated by Jan Fajerski over 3 years ago

  • Is duplicate of deleted (Bug #48383: OSD creation fails because volume group has insufficient free space to place a logical volume)
#8

Updated by Jan Fajerski over 3 years ago

  • Has duplicate Bug #48383: OSD creation fails because volume group has insufficient free space to place a logical volume added
#9

Updated by Jan Fajerski over 3 years ago

  • Status changed from Duplicate to In Progress
  • Assignee set to Jan Fajerski
  • Target version deleted (v16.0.0)
  • Source deleted (Community (dev))
  • Severity deleted (3 - minor)

@Juan was this seen in a CI setup or just a rook cluster?

I'll look into adjusting the size calculation by a potential offset.

#10

Updated by Juan Miguel Olmo Martínez over 3 years ago

Jan Fajerski wrote:

@Juan was this seen in a CI setup or just a rook cluster?

I'll look into adjusting the size calculation by a potential offset.

@Jan, this was seen in a QE cluster, creating OSDs using cephadm.

#11

Updated by Juan Miguel Olmo Martínez over 3 years ago

@Jan, have you seen this?
https://github.com/ceph/ceph/pull/38335

We have verified that this change fixes the issue.

#12

Updated by Jan Fajerski over 3 years ago

Sorry Juan, I'm only now getting to look at this. I'm somewhat hesitant to solve it the way your PR does; it would be much better to actually consider the offset in the free size calculation.

Can you get the output of pvs -o all and vgs -o all from a node that triggers the bug?
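
A minimal sketch of that idea (not necessarily the change that eventually landed): size the LV from the free-extent count the VG itself reports, which already accounts for the first-PE offset, rather than deriving it from the raw device size:

vg=ceph-755d5f48-33f7-4bc2-b909-d8e2bea211d7                        # VG name from the log above
free_extents=$(vgs --noheadings -o vg_free_count "$vg" | tr -d ' ')
# osd_fsid is a placeholder for the OSD's fsid
lvcreate --yes -l "$free_extents" -n "osd-block-$osd_fsid" "$vg"
# (equivalently, `lvcreate -l 100%FREE` lets LVM size the LV to whatever the VG can allocate)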

#13

Updated by SUNIL KUMAR NAGARAJU over 3 years ago

Hi @Jan,

I was able to reproduce the issue on node 'clara013'; please find the details below:

Zap was successful:

2020-12-10 11:12:53,673.673 INFO:teuthology.orchestra.run.clara013.stderr:--> Zapping: /dev/sdd
2020-12-10 11:12:53,674.674 INFO:teuthology.orchestra.run.clara013.stderr:--> --destroy was not specified, but zapping a whole device will remove the partition table
2020-12-10 11:12:53,676.676 INFO:teuthology.orchestra.run.clara013.stderr:Running command: /usr/bin/dd if=/dev/zero of=/dev/sdd bs=1M count=10 conv=fsync
2020-12-10 11:12:53,677.677 INFO:teuthology.orchestra.run.clara013.stderr: stderr: 10+0 records in
2020-12-10 11:12:53,678.678 INFO:teuthology.orchestra.run.clara013.stderr:10+0 records out
2020-12-10 11:12:53,679.679 INFO:teuthology.orchestra.run.clara013.stderr:10485760 bytes (10 MB, 10 MiB) copied, 0.0364648 s, 288 MB/s
2020-12-10 11:12:53,680.680 INFO:teuthology.orchestra.run.clara013.stderr:--> Zapping successful for: <Raw Device: /dev/sdd>

Adding OSD failed:

2020-12-10 11:14:46,166.166 INFO:teuthology.orchestra.run.clara013:> sudo /home/ubuntu/cephtest/cephadm --image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 92f67ff0-3b01-11eb-95d0-002590fc2776 -- ceph orch daemon add osd clara013:/dev/sdd
2020-12-10 11:14:51,803.803 INFO:teuthology.orchestra.run.clara013.stderr:Error EINVAL: Traceback (most recent call last):
2020-12-10 11:14:51,804.804 INFO:teuthology.orchestra.run.clara013.stderr:  File "/usr/share/ceph/mgr/mgr_module.py", line 1195, in _handle_command
2020-12-10 11:14:51,805.805 INFO:teuthology.orchestra.run.clara013.stderr:    return self.handle_command(inbuf, cmd)
2020-12-10 11:14:51,806.806 INFO:teuthology.orchestra.run.clara013.stderr:  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 141, in handle_command
2020-12-10 11:14:51,808.808 INFO:teuthology.orchestra.run.clara013.stderr:    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
2020-12-10 11:14:51,809.809 INFO:teuthology.orchestra.run.clara013.stderr:  File "/usr/share/ceph/mgr/mgr_module.py", line 332, in call
2020-12-10 11:14:51,810.810 INFO:teuthology.orchestra.run.clara013.stderr:    return self.func(mgr, **kwargs)
2020-12-10 11:14:51,811.811 INFO:teuthology.orchestra.run.clara013.stderr:  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 103, in <lambda>
2020-12-10 11:14:51,812.812 INFO:teuthology.orchestra.run.clara013.stderr:    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
2020-12-10 11:14:51,813.813 INFO:teuthology.orchestra.run.clara013.stderr:  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 92, in wrapper
2020-12-10 11:14:51,814.814 INFO:teuthology.orchestra.run.clara013.stderr:    return func(*args, **kwargs)
2020-12-10 11:14:51,815.815 INFO:teuthology.orchestra.run.clara013.stderr:  File "/usr/share/ceph/mgr/orchestrator/module.py", line 753, in _daemon_add_osd
2020-12-10 11:14:51,816.816 INFO:teuthology.orchestra.run.clara013.stderr:    raise_if_exception(completion)
2020-12-10 11:14:51,817.817 INFO:teuthology.orchestra.run.clara013.stderr:  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 643, in raise_if_exception
2020-12-10 11:14:51,818.818 INFO:teuthology.orchestra.run.clara013.stderr:    raise e
2020-12-10 11:14:51,820.820 INFO:teuthology.orchestra.run.clara013.stderr:RuntimeError: cephadm exited with an error code: 1, stderr:/bin/podman:stderr --> passed data devices: 1 physical, 0 LVM
2020-12-10 11:14:51,821.821 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr --> relative data size: 1.0
2020-12-10 11:14:51,822.822 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr Running command: /usr/bin/ceph-authtool --gen-print-key
2020-12-10 11:14:51,823.823 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 3d259541-1df5-4033-b1cb-2daf4e4b725b
2020-12-10 11:14:51,824.824 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr Running command: /usr/sbin/vgcreate --force --yes ceph-42377c1f-b0f3-4cbd-a928-672fe70d6c54 /dev/sdd
2020-12-10 11:14:51,825.825 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr  stdout: Physical volume "/dev/sdd" successfully created.
2020-12-10 11:14:51,826.826 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr  stdout: Volume group "ceph-42377c1f-b0f3-4cbd-a928-672fe70d6c54" successfully created
2020-12-10 11:14:51,827.827 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr Running command: /usr/sbin/lvcreate --yes -l 57234 -n osd-block-3d259541-1df5-4033-b1cb-2daf4e4b725b ceph-42377c1f-b0f3-4cbd-a928-672fe70d6c54
2020-12-10 11:14:51,828.828 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr  stderr: Volume group "ceph-42377c1f-b0f3-4cbd-a928-672fe70d6c54" has insufficient free space (57233 extents): 57234 required.
2020-12-10 11:14:51,829.829 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr --> Was unable to complete a new OSD, will rollback changes
2020-12-10 11:14:51,830.830 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
2020-12-10 11:14:51,831.831 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr  stderr: purged osd.0
2020-12-10 11:14:51,832.832 INFO:teuthology.orchestra.run.clara013.stderr:/bin/podman:stderr -->  RuntimeError: command returned non-zero exit status: 5
2020-12-10 11:14:51,833.833 INFO:teuthology.orchestra.run.clara013.stderr:Traceback (most recent call last):
2020-12-10 11:14:51,834.834 INFO:teuthology.orchestra.run.clara013.stderr:  File "<stdin>", line 6113, in <module>
2020-12-10 11:14:51,836.836 INFO:teuthology.orchestra.run.clara013.stderr:  File "<stdin>", line 1300, in _infer_fsid
2020-12-10 11:14:51,837.837 INFO:teuthology.orchestra.run.clara013.stderr:  File "<stdin>", line 1383, in _infer_image
2020-12-10 11:14:51,838.838 INFO:teuthology.orchestra.run.clara013.stderr:  File "<stdin>", line 3613, in command_ceph_volume
2020-12-10 11:14:51,839.839 INFO:teuthology.orchestra.run.clara013.stderr:  File "<stdin>", line 1062, in call_throws
2020-12-10 11:14:51,840.840 INFO:teuthology.orchestra.run.clara013.stderr:RuntimeError: Failed command: /bin/podman run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk -e CONTAINER_IMAGE=registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest -e NODE_NAME=clara013 -e CEPH_VOLUME_OSDSPEC_AFFINITY=None -v /var/run/ceph/92f67ff0-3b01-11eb-95d0-002590fc2776:/var/run/ceph:z -v /var/log/ceph/92f67ff0-3b01-11eb-95d0-002590fc2776:/var/log/ceph:z -v /var/lib/ceph/92f67ff0-3b01-11eb-95d0-002590fc2776/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /tmp/ceph-tmpxfxmv55z:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpi4gkzx6y:/var/lib/ceph/bootstrap-osd/ceph.keyring:z registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest lvm batch --no-auto /dev/sdd --yes --no-systemd

[ubuntu@clara013 ~]$ sudo pvs -o all
  Fmt  PV UUID                                DevSize PV         Maj Min PMdaFree  PMdaSize  PExtVsn 1st PE  PSize    PFree    Used Attr Allocatable Exported   Missing    PE    Alloc PV Tags #PMda #PMdaUse BA Start BA Size PInUse Duplicate
  lvm2 j6UFyk-82Te-jRta-X7eD-MNff-GiWG-AHfgnU 223.57g /dev/sdd   8   48    508.00k  1020.00k       2   1.00m <223.57g <223.57g   0  a--  allocatable                       57233     0             1        1       0       0    used          

[ubuntu@clara013 ~]$ sudo vgs -o all
  Fmt  VG UUID                                VG                                        Attr   VPerms     Extendable Exported   Partial    AllocPol   Clustered  Shared  VSize    VFree    SYS ID System ID LockType VLockArgs Ext   #Ext  Free  MaxLV MaxPV #PV #PV Missing #LV #SN Seq VG Tags VProfile #VMda #VMdaUse VMdaFree  VMdaSize  #VMdaCps 
  lvm2 yVeND2-dfQD-3ClC-Heoj-EVG2-15oo-U2bXx4 ceph-42377c1f-b0f3-4cbd-a928-672fe70d6c54 wz--n- writeable  extendable                       normal                        <223.57g <223.57g                                     4.00m 57233 57233     0     0   1           0   0   0   1                      1        1   508.00k  1020.00k unmanaged

[ubuntu@clara013 ~]$ lsblk 
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 223.6G  0 disk 
└─sda1   8:1    0 223.6G  0 part /
sdb      8:16   0 223.6G  0 disk 
sdc      8:32   0 223.6G  0 disk 
sdd      8:48   0 223.6G  0 disk
[ubuntu@clara013 ~]$ sudo lvm pvscan 
  PV /dev/sdd   VG ceph-42377c1f-b0f3-4cbd-a928-672fe70d6c54   lvm2 [<223.57 GiB / <223.57 GiB free]
  Total: 1 [<223.57 GiB] / in use: 1 [<223.57 GiB] / in no VG: 0 [0   ]
#14

Updated by Jan Fajerski over 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to octopus,nautilus
  • Pull request ID set to 38687

OK, I proposed a PR that I think fixes this bug. I'd appreciate it if you could run the previous test to confirm.

#15

Updated by Enrico Bocchi over 3 years ago

Hello @Jan,

I wanted to report we hit this bug while instantiating a new cluster with physical SSDs.
I have applied the bugfix you suggested in https://github.com/ceph/ceph/pull/38687 and the creation of the OSDs went through smoothly.

Below are some details about the devices from the node that triggered the bug:

# lsscsi 
[1:0:0:0]    disk    ATA      INTEL SSDSCKKB96 1132  /dev/sda 
[14:0:0:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdb 
[14:0:1:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdc 
[14:0:2:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdd 
[14:0:3:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sde 
[14:0:4:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdf 
[14:0:5:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdg 
[14:0:6:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdh 
[14:0:7:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdi 
[14:0:8:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdj 
[14:0:9:0]   disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdk 
[14:0:10:0]  disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdl 
[14:0:11:0]  disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdm 
[14:0:12:0]  disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdn 
[14:0:13:0]  disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdo 
[14:0:14:0]  disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdp 
[14:0:15:0]  disk    ATA      INTEL SSDSC2KB01 0132  /dev/sdq 
[14:0:16:0]  enclosu LSI      VirtualSES       03    -       

# lsblk -b /dev/sdb -o NAME,VENDOR,SIZE,MIN-IO,OPT-IO
NAME VENDOR            SIZE MIN-IO OPT-IO
sdb  ATA      1920383410176   4096      0

Creating OSDs fails due to the number of extents:

[09:34][root@cephflash21a-04f5dd1763 (qa:ceph/meredith/osd*0) ~]# ceph-volume lvm batch /dev/sd[b-q]
Total OSDs: 16

  Type            Path                                                    LV Size         % of device
----------------------------------------------------------------------------------------------------
  data            /dev/sdb                                                1.75 TB         100.00%

[...cut...]

--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no) y
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new ebd46603-28f7-4e8d-b50c-9b14211b546a
Running command: /usr/sbin/vgcreate --force --yes ceph-22e54bc2-94a9-4b9f-ad6a-223335162a26 /dev/sdb
 stdout: Physical volume "/dev/sdb" successfully created.
 stdout: Volume group "ceph-22e54bc2-94a9-4b9f-ad6a-223335162a26" successfully created
Running command: /usr/sbin/lvcreate --yes -l 457855 -n osd-block-ebd46603-28f7-4e8d-b50c-9b14211b546a ceph-22e54bc2-94a9-4b9f-ad6a-223335162a26
 stderr: Volume group "ceph-22e54bc2-94a9-4b9f-ad6a-223335162a26" has insufficient free space (457854 extents): 457855 required.
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
 stderr: purged osd.0
-->  RuntimeError: command returned non-zero exit status: 5

Some details about PVs and VGs:

# pvs -o all
  Fmt  PV UUID                                DevSize PV         Maj Min PMdaFree  PMdaSize  PExtVsn 1st PE  PSize  PFree  Used Attr Allocatable Exported   Missing    PE     Alloc PV Tags #PMda #PMdaUse BA Start BA Size PInUse Duplicate
  lvm2 TBx9tR-8CNt-ppOv-Am12-uwBg-3rxl-iqDv9s  <1.75t /dev/sdb   8   16    508.00k  1020.00k       2   1.00m <1.75t <1.75t   0  a--  allocatable                       457854     0             1        1       0       0    used    

# vgs  -o all
  Fmt  VG UUID                                VG                                        Attr   VPerms     Extendable Exported   Partial    AllocPol   Clustered  Shared  VSize  VFree  SYS ID System ID LockType VLockArgs Ext   #Ext   Free   MaxLV MaxPV #PV #PV Missing #LV #SN Seq VG Tags VProfile #VMda #VMdaUse VMdaFree  VMdaSize  #VMdaCps 
  lvm2 td8u13-oJwg-s7ls-oZtr-YQPu-jN3z-Do0mRd ceph-22e54bc2-94a9-4b9f-ad6a-223335162a26 wz--n- writeable  extendable                       normal                        <1.75t <1.75t               

#16

Updated by Nathan Revo about 3 years ago

I recently experienced this issue while creating a Ceph cluster on some ARM boards (RockPro64). My build was using Ubuntu 20.04.1 with Ceph 15.2.8. Two nodes had 2 TB disks and one had a 3 TB disk. The first two nodes worked with the current codebase. The node with the 3 TB disk failed with the +1 extent error listed above. I was going to add the patch to the container, but realized I didn't know how to run a custom container.

I can validate this patch on my hardware once I figure out how to patch the container. Thanks for the hard work!
-Nate

#17

Updated by Jan Fajerski about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from octopus,nautilus to pacific,octopus,nautilus
#18

Updated by Jan Fajerski about 3 years ago

  • Copied to Backport #49140: nautilus: fail to create OSDs because the requested extent is too large added
#19

Updated by Jan Fajerski about 3 years ago

  • Copied to Backport #49141: octopus: fail to create OSDs because the requested extent is too large added
#20

Updated by Jan Fajerski about 3 years ago

  • Copied to Backport #49142: pacific: fail to create OSDs because the requested extent is too large added
#21

Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#22

Updated by Anonymous about 3 years ago

I hit this too.
When will the fix be available?

#23

Updated by Anonymous about 3 years ago

    Running command: /usr/sbin/lvcreate --yes -l 457855 -n osd-block-ea5f6942-927c-4373-9dc1-42185660ea55 ceph-15e4d1d7-57cd-4db0-9bd1-8d853e8ea1b6
     stderr: Volume group "ceph-15e4d1d7-57cd-4db0-9bd1-8d853e8ea1b6" has insufficient free space (457854 extents): 457855 required.
NAME    VENDOR          SIZE MIN-IO OPT-IO
nvme2n1        1920383410176    512    512
nvme3n1        1920383410176    512    512