Bug #36354

mgr/dashboard/rbd: throws 500s with format 1 RBD images

Added by Hector Martin over 2 years ago. Updated 6 months ago.

Status:
Resolved
Priority:
Normal
Category:
dashboard/rbd
Target version:
% Done:
0%

Source:
Tags:
Backport:
nautilus, octopus
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

/api/block/image says:

Exception: InvalidArgument: [errno 22] error getting id for image test1

This breaks the images list. Format 1 images do not have an id; the manager should handle this gracefully.
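A minimal sketch of the graceful handling asked for here, assuming the listing code calls `img.id()` on every image. `InvalidArgument` and `FakeImage` are hypothetical stand-ins for `rbd.InvalidArgument` and `rbd.Image` so the snippet runs without a cluster, and `image_stat` is an illustrative helper, not the dashboard's actual code.

```python
class InvalidArgument(Exception):
    """Hypothetical stand-in for rbd.InvalidArgument (errno 22)."""


class FakeImage:
    """Hypothetical stand-in for rbd.Image; image_id=None mimics a format 1 image."""

    def __init__(self, name, image_id=None):
        self.name = name
        self._image_id = image_id

    def id(self):
        if self._image_id is None:
            raise InvalidArgument(
                f"[errno 22] error getting id for image {self.name}")
        return self._image_id


def image_stat(img):
    """Build a stat dict, leaving 'id' as None when the image has no id."""
    stat = {"name": img.name}
    try:
        stat["id"] = img.id()
    except InvalidArgument:
        # Format 1 image: report a missing id instead of failing the request.
        stat["id"] = None
    return stat


for img in (FakeImage("test1"), FakeImage("test2", "10226b8b4567")):
    print(image_stat(img))
```

Catching only the specific error keeps other failures visible while still letting format 1 images appear in the list.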

Screenshot_rbd_list.png (124 KB) Eugen Block, 10/18/2019 02:02 PM

Screenshot_rbd_error.png (139 KB) Eugen Block, 10/21/2019 07:31 AM


Related issues

Related to mgr - Bug #42480: mgr/dashboard: searching table with data in Object types make Dashboard unresponsive Resolved
Blocked by rbd - Bug #45518: [librbd] The 'copy' method defaults to the source image format Resolved
Copied to mgr - Backport #46019: nautilus: mgr/dashboard/rbd: throws 500s with format 1 RBD images Resolved
Copied to mgr - Backport #46020: octopus: mgr/dashboard/rbd: throws 500s with format 1 RBD images Resolved

History

#1 Updated by Eugen Block over 1 year ago

The described problem still exists in Nautilus. We use Ceph as RBD backend for OpenStack (and CephFS).

ceph01:~ # ceph --version
ceph version 14.2.3-349-g7b1552ea82 (7b1552ea827cf5167b6edbba96dd1c4a9dc16937) nautilus (stable)

We have a couple of images that had to be recovered from backups, so they were imported back into the cluster as flat images:

ceph01:~ # rbd -p images ls --long | grep -E "284007bf-cd6b-42ee-9529-274d259e6812|2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5|54ba48c6-a8d9-48f9-8efe-b48acb5e9c78|931f9a1e-2022-4571-909e-6c3f5f8c3ae8|a5d472ba-208a-4bb6-a731-43d5f7eb7d8d|15ed27aa-86cd-4dc8-a312-de4a531ac9a8|2962ee0e-6015-4056-8f94-0fd76135c125|2ebba85d-0bc8-4bcc-95af-ac97f9fea277|e17068e-a36d-4d9b-9779-3af473aba033|fd07dd66-8a82-431c-99cf-9bfc3076af30|01673d5d-4b12-4a44-8793-403581f7d808" 
284007bf-cd6b-42ee-9529-274d259e6812_disk                                                   20GiB                                                    1           
2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk                                                   40GiB                                                    1           
54ba48c6-a8d9-48f9-8efe-b48acb5e9c78_disk                                                   20GiB                                                    1           
931f9a1e-2022-4571-909e-6c3f5f8c3ae8_disk                                                   40GiB                                                    1           
a5d472ba-208a-4bb6-a731-43d5f7eb7d8d_disk                                                   20GiB                                                    1           
volume-15ed27aa-86cd-4dc8-a312-de4a531ac9a8                                                 20GiB                                                    1           
volume-15ed27aa-86cd-4dc8-a312-de4a531ac9a8@20190719_snap-ebl                               20GiB                                                    1           
volume-2962ee0e-6015-4056-8f94-0fd76135c125                                                 30GiB                                                    1           
volume-2ebba85d-0bc8-4bcc-95af-ac97f9fea277                                                 40GiB                                                    1           
volume-ce17068e-a36d-4d9b-9779-3af473aba033                                                 20GiB                                                    1           
volume-fd07dd66-8a82-431c-99cf-9bfc3076af30                                                 22GiB                                                    1           
01673d5d-4b12-4a44-8793-403581f7d808_disk                                                   40GiB                                                    2      excl 
01673d5d-4b12-4a44-8793-403581f7d808_disk.config                                           450KiB                                                    2           
284007bf-cd6b-42ee-9529-274d259e6812_disk.config                                           422KiB                                                    2           
2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.config                                           450KiB                                                    2           
931f9a1e-2022-4571-909e-6c3f5f8c3ae8_disk.config                                           450KiB                                                    2          

This is the exception reported by the mgr:

2019-10-02 10:05:42.724 7f98bb49f700  0 mgr[dashboard] dashboard_exception_handler
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 100, in handle_rbd_error
    yield
  File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 218, in list
    return self._rbd_list(pool_name)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 209, in _rbd_list
    status, value = self._rbd_pool_list(pool)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 244, in wrapper
    return rvc.run(fn, args, kwargs)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 226, in run
    raise self.exception
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 147, in run
    val = self.fn(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 193, in _rbd_pool_list
    stat = cls._rbd_image(ioctx, pool_name, name)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 108, in _rbd_image
    stat['id'] = img.id()
  File "rbd.pyx", line 2996, in rbd.Image.id
rbd.InvalidArgument: [errno 22] error getting id for image b'284007bf-cd6b-42ee-9529-274d259e6812_disk'

It tries to read the same image (the first one in the list) over and over again and fails. The result is a dashboard flooded with errors, with no RBD images displayed.
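The symptom suggests that a single failing image aborts the whole pool listing. A minimal sketch of per-image error isolation, assuming a hypothetical `load_image` callable that may raise for some images: each failure is recorded next to the image name and the loop continues, so the remaining images can still be shown. `list_pool_images`, `demo_loader` and the image names are illustrative, not part of the dashboard or librbd.

```python
def list_pool_images(names, load_image):
    """Return (images, errors); one bad image does not abort the listing."""
    images, errors = [], {}
    for name in names:
        try:
            images.append(load_image(name))
        except Exception as exc:  # broad on purpose: isolate any per-image failure
            errors[name] = str(exc)
    return images, errors


FAILING = {"img-a"}  # hypothetical: pretend this image cannot be opened


def demo_loader(name):
    """Hypothetical loader that fails the way a format 1 image did here."""
    if name in FAILING:
        raise ValueError(f"[errno 22] error getting id for image {name}")
    return {"name": name}


images, errors = list_pool_images(["img-a", "img-b", "img-c"], demo_loader)
print(images)
print(errors)
```

The collected `errors` mapping could be shown once in the UI instead of retrying the same image on every refresh.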

#2 Updated by Lenz Grimmer over 1 year ago

  • Severity changed from 3 - minor to 2 - major
  • Affected Versions v14.2.3 added

Eugen Block wrote:

The described problem still exists in Nautilus. We use Ceph as RBD backend for OpenStack (and CephFS).

Thanks for the report! Raising severity - the dashboard should handle this more gracefully.

#3 Updated by Kiefer Chang over 1 year ago

Some notes:

  • Missing `id` in v1 images: this affects table row selection for displaying the details pane.
  • For v1 images, some Dashboard features are not supported (e.g. snapshots); this needs to be handled.
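One possible way to address both notes, sketched under assumptions: a hypothetical `row_key` falls back to `pool/name` when an image has no `id`, and a hypothetical `supported_actions` helper hides actions the note says the Dashboard does not support for v1 images. The action names and the `format` field are illustrative, not the dashboard's actual data model.

```python
ALL_ACTIONS = ("edit", "copy", "snapshot", "delete")  # illustrative action names
V1_UNSUPPORTED = {"snapshot"}  # per the note; extend as needed


def row_key(image):
    """Unique table key: the image id if present, else pool/name."""
    return image.get("id") or f"{image['pool_name']}/{image['name']}"


def supported_actions(image):
    """Hide actions the Dashboard does not support for format 1 images."""
    if image.get("format") == 1:
        return [a for a in ALL_ACTIONS if a not in V1_UNSUPPORTED]
    return list(ALL_ACTIONS)


v1 = {"pool_name": "images", "name": "test1", "id": None, "format": 1}
v2 = {"pool_name": "images", "name": "test2", "id": "10226b8b4567", "format": 2}
print(row_key(v1), supported_actions(v1))
print(row_key(v2), supported_actions(v2))
```

Since image names are unique within a pool, `pool/name` is a stable fallback key for selection.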

#4 Updated by Eugen Block over 1 year ago

Thanks for raising severity.
Would a possible workaround be to export those images and re-import them into Ceph? That way they would get an id and also image-format 2. I tested that with one image, and the re-imported image has image-format 2 and a new id. Since those images are already flat and don't have parent data, there's probably not much to lose, right?
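The export/import workaround can be sketched as a small dry-run helper that only builds the command lines. It assumes the standard `rbd export <image-spec> -` and `rbd import --image-format 2 - <image-spec>` forms, streaming through stdout/stdin, and `rbd rename` to keep the original name; the `_old` suffix and the helper `export_import_commands` are hypothetical. Any client using the image should be stopped first.

```python
import shlex


def export_import_commands(pool, image, suffix="_old"):
    """Dry-run plan: re-create pool/image as format 2 under its original name.

    The original is renamed to image + suffix (hypothetical naming), then
    streamed through `rbd export ... -` into `rbd import --image-format 2 - ...`.
    """
    old = shlex.quote(f"{pool}/{image}{suffix}")
    new = shlex.quote(f"{pool}/{image}")
    return [
        f"rbd rename {new} {old}",
        f"rbd export {old} - | rbd import --image-format 2 - {new}",
    ]


for cmd in export_import_commands("images", "284007bf-cd6b-42ee-9529-274d259e6812_disk"):
    print(cmd)
```

A plain `rbd export` copies only the image's current contents, not its snapshots, so an image with snapshots (like the `@20190719_snap-ebl` entry in the listing above) would lose them this way. Once the new image is verified, the `_old` copy can be removed with `rbd rm`.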

#5 Updated by Mykola Golub over 1 year ago

Eugen Block wrote:

Would a possible workaround be to export those images and re-import them into Ceph? That way they would get an id and also image-format 2. I tested that with one image, and the re-imported image has image-format 2 and a new id. Since those images are already flat and don't have parent data, there's probably not much to lose, right?

That should work. Of course, if the image has a client, it should be stopped first.

Note that since Nautilus we have the `rbd migration` command [1], which can be used for exactly this purpose. If you just need to update the image format, the steps would be:

stop client (if any)
rbd migration prepare $pool/$image_name # additionally you can specify image options and features here
start client
rbd migration execute $pool/$image_name
rbd migration commit $pool/$image_name

[1] https://docs.ceph.com/docs/master/rbd/rbd-live-migration/

#6 Updated by Eugen Block over 1 year ago

Mykola Golub wrote:

That should work. Of course, if the image has a client, it should be stopped first.

Note that since Nautilus we have the `rbd migration` command [1], which can be used for exactly this purpose. If you just need to update the image format, the steps would be:
[...]
[1] https://docs.ceph.com/docs/master/rbd/rbd-live-migration/

Great! I noticed that feature but haven't tested it yet. I'll test it in my lab cluster and then try that on one of the less important images first. Thanks!

#7 Updated by Eugen Block over 1 year ago

Update:
The live migration from format 1 to format 2 succeeded, and all VMs started successfully. Just a small note on the commands: the correct syntax for the prepare command is supposed to be:

rbd migration prepare $pool/$image_name $pool/$image_name
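For reference, the live-migration sequence from the previous comments, combined with the `prepare` syntax reported here, can be written as a dry-run helper. `migration_plan` is a hypothetical name; the `rbd migration prepare`/`execute`/`commit` subcommands are the ones listed earlier in this thread, and the client stop/start steps are left as comments because they depend on the environment (e.g. OpenStack).

```python
import shlex


def migration_plan(pool, image):
    """Return the rbd live-migration commands for pool/image (dry run)."""
    spec = shlex.quote(f"{pool}/{image}")
    return [
        # Step 0 (manual): stop the client, if any, before preparing.
        f"rbd migration prepare {spec} {spec}",  # source and destination spec, as reported
        # The client may be restarted here, against the same image name.
        f"rbd migration execute {spec}",
        f"rbd migration commit {spec}",
    ]


for step in migration_plan("images", "test1"):
    print(step)
```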

There's one thing remaining, though. Although the RBD images are now displayed, a warning message pops up every 5 seconds or so. I attached a screenshot to this report. It also appears that the list only shows images from one specific pool ("images"), not from all RBD pools. Is there a way to change that, or did I miss something? How is "images" selected?

#8 Updated by Lenz Grimmer over 1 year ago

Eugen Block wrote:

The live-migration from format 1 to format 2 succeeded, all VMs started successfully.

Glad to hear, thanks!

There's one thing remaining, though. Although the RBD images are now displayed, a warning message pops up every 5 seconds or so. I attached a screenshot to this report.

That's a feature: gathering the list of images across all pools is an expensive operation that is not performed for every page refresh, so the info is retrieved from a cache. It would probably make sense to change this from being a warning to a notification message, to avoid confusion.
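The caching behaviour described here can be illustrated with a minimal time-based (TTL) cache. This is a simplified stand-in technique: `TTLCache`, the 5-second TTL and the `from_cache` flag are assumptions for illustration, not the dashboard's implementation. Returning the flag alongside the value is what would let the UI show an informational notice instead of a warning.

```python
import time


class TTLCache:
    """Cache one expensive result; recompute only after `ttl` seconds."""

    def __init__(self, compute, ttl=5.0, clock=time.monotonic):
        self._compute = compute
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._stamp = None  # time of the last computation, None if never computed

    def get(self):
        """Return (value, from_cache)."""
        now = self._clock()
        if self._stamp is not None and now - self._stamp < self._ttl:
            return self._value, True
        self._value = self._compute()
        self._stamp = now
        return self._value, False


calls = []


def list_all_images():
    """Hypothetical expensive call standing in for listing every pool."""
    calls.append(1)
    return ["img-a", "img-b"]


t = [0.0]  # fake clock so the example is deterministic
cache = TTLCache(list_all_images, ttl=5.0, clock=lambda: t[0])
print(cache.get())  # (['img-a', 'img-b'], False): computed
t[0] = 2.0
print(cache.get())  # (['img-a', 'img-b'], True): served from cache
t[0] = 6.0
print(cache.get())  # (['img-a', 'img-b'], False): TTL expired, recomputed
print(len(calls))   # 2
```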

It also appears that the list only shows images from one specific pool ("images"), not from all RBD pools. Is there a way to change that, or did I miss something? How is "images" selected?

The Dashboard only lists RBD images from pools that have the "rbd" application label associated with them. You can add this label by editing the pool in question in the Dashboard.
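A minimal sketch of that filter, assuming pool records shaped like the pool entries of `ceph osd dump --format json`, where each pool carries a `pool_name` and an `application_metadata` mapping (for example `{"rbd": {}}`). `rbd_pools` is a hypothetical helper; on the command line the label can also be added with `ceph osd pool application enable <pool> rbd`.

```python
def rbd_pools(pools):
    """Names of pools whose application metadata includes 'rbd'."""
    return [p["pool_name"] for p in pools
            if "rbd" in p.get("application_metadata", {})]


pools = [
    {"pool_name": "images", "application_metadata": {"rbd": {}}},
    {"pool_name": "volumes", "application_metadata": {}},  # label missing: not listed
    {"pool_name": "cephfs_data", "application_metadata": {"cephfs": {"data": "cephfs"}}},
]
print(rbd_pools(pools))  # ['images']
```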

#9 Updated by Eugen Block about 1 year ago

Lenz Grimmer wrote:

The Dashboard only lists RBD images from pools that have the "rbd" application label associated with them. You can add this label by editing the pool in question in the Dashboard.

The pool I'm referring to has the "rbd" application enabled. But it seems it just took some time for the cache to update over the weekend; I see the respective images now.
Unfortunately, the search filter doesn't work on that page.
Also, after a couple of seconds the whole page stops working correctly, e.g. when selecting a specific image or reducing the number of displayed images. I have to refresh the page because even switching to a different tab (Dashboard, NFS, etc.) fails. Please let me know if you need any specific information to resolve this.

#10 Updated by Kiefer Chang about 1 year ago

Eugen Block wrote:

Also, after a couple of seconds the whole page stops working correctly, e.g. when selecting a specific image or reducing the number of displayed images. I have to refresh the page because even switching to a different tab (Dashboard, NFS, etc.) fails. Please let me know if you need any specific information to resolve this.

Are there any errors in the browser's console? In Chrome, you can right-click on the page, click Inspect, and then switch to the Console tab.

#11 Updated by Eugen Block about 1 year ago

  • File Screenshot_rbd_error.png added

Kiefer Chang wrote:

Are there any errors in the browser's console? In Chrome, you can right-click on the page, click Inspect, and then switch to the Console tab.

Yes, there is an error, the screenshot is attached.

#12 Updated by Kiefer Chang about 1 year ago

  • File deleted (Screenshot_rbd_error.png)

#13 Updated by Eugen Block about 1 year ago

Uploaded correct image.

#14 Updated by Kiefer Chang about 1 year ago

Eugen Block wrote:

Uploaded correct image.

This error is reproducible; issue #42480 has been created to track it.

#15 Updated by Kiefer Chang about 1 year ago

  • Related to Bug #42480: mgr/dashboard: searching table with data in Object types make Dashboard unresponsive added

#16 Updated by Ernesto Puerta 8 months ago

  • Status changed from New to In Progress
  • Assignee set to Ernesto Puerta
  • Target version set to v16.0.0
  • Backport set to nautilus, octopus

#17 Updated by Ernesto Puerta 8 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 35007

#18 Updated by Ernesto Puerta 8 months ago

  • Blocked by Bug #45518: [librbd] The 'copy' method defaults to the source image format added

#19 Updated by Kiefer Chang 7 months ago

  • Status changed from Fix Under Review to Pending Backport

#20 Updated by Nathan Cutler 7 months ago

  • Copied to Backport #46019: nautilus: mgr/dashboard/rbd: throws 500s with format 1 RBD images added

#21 Updated by Nathan Cutler 7 months ago

  • Copied to Backport #46020: octopus: mgr/dashboard/rbd: throws 500s with format 1 RBD images added

#22 Updated by Nathan Cutler 6 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#23 Updated by Alex Litvak 6 months ago

Will the backported fix be released in Nautilus 14.2.11?

#24 Updated by Nathan Cutler 6 months ago

Alex Litvak wrote:

Will the backported fix be released in Nautilus 14.2.11?

Quite probably.
