Bug #36354
mgr/dashboard/rbd: throws 500s with format 1 RBD images
Description
/api/block/image
says:
Exception: InvalidArgument: [errno 22] error getting id for image test1
This breaks the image list. Format 1 images do not have an id; the manager should handle this gracefully.
Related issues
History
#1 Updated by Eugen Block over 4 years ago
The described problem still exists in Nautilus. We use Ceph as RBD backend for OpenStack (and CephFS).
ceph01:~ # ceph --version
ceph version 14.2.3-349-g7b1552ea82 (7b1552ea827cf5167b6edbba96dd1c4a9dc16937) nautilus (stable)
We have a couple of images that had to be recovered from backups, so those images were imported as flat images back into the cluster:
ceph01:~ # rbd -p images ls --long | grep -E "284007bf-cd6b-42ee-9529-274d259e6812|2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5|54ba48c6-a8d9-48f9-8efe-b48acb5e9c78|931f9a1e-2022-4571-909e-6c3f5f8c3ae8|a5d472ba-208a-4bb6-a731-43d5f7eb7d8d|15ed27aa-86cd-4dc8-a312-de4a531ac9a8|2962ee0e-6015-4056-8f94-0fd76135c125|2ebba85d-0bc8-4bcc-95af-ac97f9fea277|e17068e-a36d-4d9b-9779-3af473aba033|fd07dd66-8a82-431c-99cf-9bfc3076af30|01673d5d-4b12-4a44-8793-403581f7d808"
284007bf-cd6b-42ee-9529-274d259e6812_disk                            20GiB  1
2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk                            40GiB  1
54ba48c6-a8d9-48f9-8efe-b48acb5e9c78_disk                            20GiB  1
931f9a1e-2022-4571-909e-6c3f5f8c3ae8_disk                            40GiB  1
a5d472ba-208a-4bb6-a731-43d5f7eb7d8d_disk                            20GiB  1
volume-15ed27aa-86cd-4dc8-a312-de4a531ac9a8                          20GiB  1
volume-15ed27aa-86cd-4dc8-a312-de4a531ac9a8@20190719_snap-ebl        20GiB  1
volume-2962ee0e-6015-4056-8f94-0fd76135c125                          30GiB  1
volume-2ebba85d-0bc8-4bcc-95af-ac97f9fea277                          40GiB  1
volume-ce17068e-a36d-4d9b-9779-3af473aba033                          20GiB  1
volume-fd07dd66-8a82-431c-99cf-9bfc3076af30                          22GiB  1
01673d5d-4b12-4a44-8793-403581f7d808_disk                            40GiB  2  excl
01673d5d-4b12-4a44-8793-403581f7d808_disk.config                     450KiB 2
284007bf-cd6b-42ee-9529-274d259e6812_disk.config                     422KiB 2
2dcb9d7d-3a4f-49a4-8792-b4b74f5b60e5_disk.config                     450KiB 2
931f9a1e-2022-4571-909e-6c3f5f8c3ae8_disk.config                     450KiB 2
This is the exception reported by mgr:
2019-10-02 10:05:42.724 7f98bb49f700 0 mgr[dashboard] dashboard_exception_handler
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 100, in handle_rbd_error
    yield
  File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 218, in list
    return self._rbd_list(pool_name)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 209, in _rbd_list
    status, value = self._rbd_pool_list(pool)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 244, in wrapper
    return rvc.run(fn, args, kwargs)
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 226, in run
    raise self.exception
  File "/usr/share/ceph/mgr/dashboard/tools.py", line 147, in run
    val = self.fn(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 193, in _rbd_pool_list
    stat = cls._rbd_image(ioctx, pool_name, name)
  File "/usr/share/ceph/mgr/dashboard/controllers/rbd.py", line 108, in _rbd_image
    stat['id'] = img.id()
  File "rbd.pyx", line 2996, in rbd.Image.id
rbd.InvalidArgument: [errno 22] error getting id for image b'284007bf-cd6b-42ee-9529-274d259e6812_disk'
It tries to read the same image (the first in the list) over and over again and fails. The result is an error-flooded dashboard, and no RBD images are displayed.
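For illustration only, here is a minimal sketch of what "handling this gracefully" could look like around the failing `img.id()` call from the traceback. It is not necessarily how the dashboard fix is implemented. The helper name `image_id_or_none` and the two fake image classes are hypothetical, and a stand-in exception class is used when the `rbd` Python binding is not installed.

```python
try:
    # Real exception type from the rbd Python binding, if it is available.
    from rbd import InvalidArgument
except ImportError:
    class InvalidArgument(Exception):
        """Stand-in for rbd.InvalidArgument so the sketch runs without Ceph."""


def image_id_or_none(img):
    """Return the image id, or None for images that have no id (format 1).

    Hypothetical helper: `img` is anything with an id() method that raises
    InvalidArgument when the image has no id, as seen in the traceback.
    """
    try:
        return img.id()
    except InvalidArgument:
        return None


class FakeV1Image:
    """Hypothetical test double that mimics a format 1 image."""

    def id(self):
        raise InvalidArgument("[errno 22] error getting id for image test1")


class FakeV2Image:
    """Hypothetical test double that mimics a format 2 image."""

    def id(self):
        return "10226b8b4567"  # made-up id, for illustration only


stat = {"id": image_id_or_none(FakeV1Image())}
```

With this approach, one format 1 image no longer aborts the whole listing; callers must then cope with an id of `None`.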
#2 Updated by Lenz Grimmer over 4 years ago
- Severity changed from 3 - minor to 2 - major
- Affected Versions v14.2.3 added
Eugen Block wrote:
The described problem still exists in Nautilus. We use Ceph as RBD backend for OpenStack (and CephFS).
Thanks for the report! Raising severity - the dashboard should handle this more gracefully.
#3 Updated by Kiefer Chang over 4 years ago
Some notes:
- Missing `id` in v1 images: this affects table selection for displaying the detail pane.
- For v1 images, some Dashboard features are not supported (e.g. snapshots); this needs to be handled.
#4 Updated by Eugen Block over 4 years ago
Thanks for raising severity.
Would it be a possible workaround to export those images and re-import them back into ceph? That way they'd get their id and also image-format 2? I tested that with one image and the re-imported image has image-format 2 and a new id. Since those images are already flat and don't have parent data, there's probably not much to lose, right?
#5 Updated by Mykola Golub over 4 years ago
Eugen Block wrote:
Would it be a possible workaround to export those images and re-import them back into ceph? That way they'd get their id and also image-format 2? I tested that with one image and the re-imported image has image-format 2 and a new id. Since those images are already flat and don't have parent data, there's probably not much to lose, right?
That should work. Of course, if the image has a client, it should be stopped first.
Note that since Nautilus we have the `rbd migration` command [1], which can be used for exactly this purpose. If you just need to update the image format, the steps would be:
stop client (if any)
rbd migration prepare $pool/$image_name # additionally you can specify image options and features here
start client
rbd migration execute $pool/$image_name
rbd migration commit $pool/$image_name
[1] https://docs.ceph.com/docs/master/rbd/rbd-live-migration/
#6 Updated by Eugen Block over 4 years ago
Mykola Golub wrote:
Eugen Block wrote:
Would it be a possible workaround to export those images and re-import them back into ceph? That way they'd get their id and also image-format 2? I tested that with one image and the re-imported image has image-format 2 and a new id. Since those images are already flat and don't have parent data, there's probably not much to lose, right?
That should work. Of course, if the image has a client, it should be stopped first.
Note that since Nautilus we have the `rbd migration` command [1], which can be used for exactly this purpose. If you just need to update the image format, the steps would be:
[...]
[1] https://docs.ceph.com/docs/master/rbd/rbd-live-migration/
Great! I noticed that feature but haven't tested it yet. I'll test it in my lab cluster and then try that on one of the less important images first. Thanks!
#7 Updated by Eugen Block over 4 years ago
- File Screenshot_rbd_list.png added
Update:
The live-migration from format 1 to format 2 succeeded, all VMs started successfully. One small note on the command: the correct syntax for the prepare command is supposed to be:
rbd migration prepare $pool/$image_name $pool/$image_name
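As a rough sketch, the migration sequence discussed in this thread can be generated per image like this, using the prepare syntax with both a source and a target spec as reported here. The function name `migration_commands` and the example pool/image names are hypothetical. The sketch only builds and prints the commands; stopping and restarting the client remains a manual step.

```python
import shlex


def migration_commands(pool, image):
    """Build the three rbd migration commands for an in-place format update.

    Hypothetical helper: source and target spec are identical, matching the
    prepare syntax reported to work in this thread.
    """
    spec = f"{pool}/{image}"
    return [
        ["rbd", "migration", "prepare", spec, spec],
        ["rbd", "migration", "execute", spec],
        ["rbd", "migration", "commit", spec],
    ]


# Print the commands for review instead of running them.
for cmd in migration_commands("images", "test1"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing first makes it easy to check the list against a less important image before running anything on production data.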
There's one thing remaining, though. Although the RBD images are displayed now, a warning message pops up every 5 seconds or so. I attached a screenshot to this report. It also appears that the list only shows images from one specific pool ("images"), not all RBD pools. Is there a way to change that or did I miss something? How is "images" selected?
#8 Updated by Lenz Grimmer over 4 years ago
Eugen Block wrote:
The live-migration from format 1 to format 2 succeeded, all VMs started successfully.
Glad to hear, thanks!
There's one thing remaining, though. Although the rbd images are displayed now a warning message pops up every 5 seconds or so. I attached a screenshot to this report.
That's a feature: gathering the list of images across all pools is an expensive operation that is not performed for every page refresh, so the info is retrieved from a cache. It would probably make sense to change this from being a warning to a notification message, to avoid confusion.
It also appears that the list only shows images from one specific pool ("images"), not all rbd pools. Is there a way to change that or did I miss something? How is "images" selected?
The Dashboard only lists RBDs from pools that have the "rbd" application label associated with them. You can add this label by editing the pool in question in the dashboard.
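To illustrate the selection rule described above, here is a small sketch that filters a list of pool records down to those carrying the "rbd" application label. The record layout (`pool_name` and `application_metadata` keys) is an assumption modelled on Ceph's JSON pool listing, the sample data is made up, and this is not the dashboard's actual code.

```python
def rbd_pools(pools):
    """Return the names of pools that carry the "rbd" application label.

    Assumes each pool is a dict with a "pool_name" key and an optional
    "application_metadata" dict keyed by application name.
    """
    return [
        pool["pool_name"]
        for pool in pools
        if "rbd" in pool.get("application_metadata", {})
    ]


# Made-up sample data for illustration.
sample = [
    {"pool_name": "images", "application_metadata": {"rbd": {}}},
    {"pool_name": "cephfs_data", "application_metadata": {"cephfs": {}}},
    {"pool_name": "unlabelled"},
]
print(rbd_pools(sample))
```

Besides editing the pool in the dashboard, the label can also be set from the command line with `ceph osd pool application enable <pool> rbd`.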
#9 Updated by Eugen Block over 4 years ago
Lenz Grimmer wrote:
Eugen Block wrote:
The live-migration from format 1 to format 2 succeeded, all VMs started successfully.
Glad to hear, thanks!
There's one thing remaining, though. Although the rbd images are displayed now a warning message pops up every 5 seconds or so. I attached a screenshot to this report.
That's a feature: gathering the list of images across all pools is an expensive operation that is not performed for every page refresh, so the info is retrieved from a cache. It would probably make sense to change this from being a warning to a notification message, to avoid confusion.
It also appears that the list only shows images from one specific pool ("images"), not all rbd pools. Is there a way to change that or did I miss something? How is "images" selected?
The Dashboard only lists RBDs from pools that have the "rbd" application label associated with them. You can add this label by editing the pool in question in the dashboard.
The pool I'm referring to has application "rbd" enabled. But it seems as if it just took some time to update the cache over the weekend, I see the respective images now.
Unfortunately, the search filter doesn't work on that page.
Also, after a couple of seconds the whole page stops working correctly, e.g. selecting a specific image or reducing the number of displayed images. I have to refresh the page because even a switch to a different tab (Dashboard, NFS, etc.) fails. Please let me know if you need any specific information to resolve this.
#10 Updated by Kiefer Chang over 4 years ago
Eugen Block wrote:
Also, after a couple of seconds the whole page stops working correctly, e.g. selecting a specific image or reducing the number of displayed images. I have to refresh the page because even a switch to a different tab (Dashboard, NFS, etc.) fails. Please let me know if you need any specific information to resolve this.
Are there any errors in the browser's console? In Chrome, you can right-click on the page, click Inspect, and then switch to the Console tab.
#11 Updated by Eugen Block over 4 years ago
- File Screenshot_rbd_error.png added
Kiefer Chang wrote:
Eugen Block wrote:
Also, after a couple of seconds the whole page stops working correctly, e.g. selecting a specific image or reducing the number of displayed images. I have to refresh the page because even a switch to a different tab (Dashboard, NFS, etc.) fails. Please let me know if you need any specific information to resolve this.
Are there any errors in the browser's console? In Chrome, you can right-click on the page, click Inspect, and then switch to the Console tab.
Yes, there is an error, the screenshot is attached.
#12 Updated by Kiefer Chang over 4 years ago
- File deleted (Screenshot_rbd_error.png)
#13 Updated by Eugen Block over 4 years ago
- File Screenshot_rbd_error.png added
Uploaded correct image.
#14 Updated by Kiefer Chang over 4 years ago
Eugen Block wrote:
Uploaded correct image.
This error is reproducible; issue #42480 has been created to track it.
#15 Updated by Kiefer Chang over 4 years ago
- Related to Bug #42480: mgr/dashboard: searching table with data in Object types make Dashboard unresponsive added
#16 Updated by Ernesto Puerta almost 4 years ago
- Status changed from New to In Progress
- Assignee set to Ernesto Puerta
- Target version set to v16.0.0
- Backport set to nautilus, octopus
#17 Updated by Ernesto Puerta almost 4 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 35007
#18 Updated by Ernesto Puerta almost 4 years ago
- Blocked by Bug #45518: [librbd] The 'copy' method defaults to the source image format added
#19 Updated by Kiefer Chang almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
#20 Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46019: nautilus: mgr/dashboard/rbd: throws 500s with format 1 RBD images added
#21 Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46020: octopus: mgr/dashboard/rbd: throws 500s with format 1 RBD images added
#22 Updated by Nathan Cutler over 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
#23 Updated by Alex Litvak over 3 years ago
Will the backported fix be released in Nautilus 14.2.11?
#24 Updated by Nathan Cutler over 3 years ago
Alex Litvak wrote:
Will the backported fix be released in Nautilus 14.2.11?
Quite probably.
#25 Updated by Ernesto Puerta almost 3 years ago
- Project changed from mgr to Dashboard
- Category changed from 139 to Component - RBD