Bug #16887

ceph 10.2.2 rbd status on image format 2 returns "(2) No such file or directory"

Added by Kjetil Joergensen 10 months ago. Updated 27 days ago.

Status: Resolved
Priority: Normal
Target version: -
Start date: 08/01/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport: jewel,hammer
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Release: jewel
Needs Doc: No

Description

Not quite sure when this started failing; it only became noticeable now. I can't tell you exactly when, but I'm reasonably sure this worked in 0.9.X.

The previous approach (extracting block_name_prefix, replacing rbd_data with rbd_header, and running rados listwatchers on the result) fails in the same way. Older images "work", which leads me to believe that either something changed in how rbd create --image-format=2 operates in 10.2.2 and tools/rbd/action/Status.cc is missing the matching behavior, or it's something else.
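
For readers unfamiliar with that approach, a minimal sketch of the name mapping it relies on. The helper name is hypothetical and not part of librbd; the two prefixes are the ones visible in the output in this report.

```cpp
#include <string>

// Object name prefixes for format 2 images, as seen in this report.
static const std::string kDataPrefix = "rbd_data.";
static const std::string kHeaderPrefix = "rbd_header.";

// Hypothetical helper: map a block_name_prefix ("rbd_data.<id>") to the
// header object name ("rbd_header.<id>").
// Returns an empty string if the input does not start with "rbd_data.".
std::string header_oid_from_prefix(const std::string& block_name_prefix) {
  if (block_name_prefix.compare(0, kDataPrefix.size(), kDataPrefix) != 0) {
    return std::string();
  }
  return kHeaderPrefix + block_name_prefix.substr(kDataPrefix.size());
}
```

The mapping is only as good as its input: if the block_name_prefix handed to it is already one character short, the resulting header object name is one character short as well.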

kjetil@sc2-r10-u07:~$ rbd --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
kjetil@sc2-r10-u07:~$ rbd create --image-format=2 --size=10M  watcher-fail
kjetil@sc2-r10-u07:~$ rbd status watcher-fail
rbd: show status failed: (2) No such file or directory
kjetil@sc2-r10-u07:~$ rbd --debug-rbd=20/20 status watcher-fail
2016-08-01 12:15:39.288796 7f5e2ad95d80  5 librbd::AioImageRequestWQ: 0x561e4e0830d0 : ictx=0x561e4e081170
2016-08-01 12:15:39.288872 7f5e2ad95d80 20 librbd::ImageState: open
2016-08-01 12:15:39.288876 7f5e2ad95d80 10 librbd::ImageState: 0x561e4e076860 send_open_unlock
2016-08-01 12:15:39.288882 7f5e2ad95d80 10 librbd::image::OpenRequest: 0x561e4e083a40 send_v2_detect_header
2016-08-01 12:15:39.289842 7f5dfffff700 10 librbd::image::OpenRequest: handle_v2_detect_header: r=0
2016-08-01 12:15:39.289848 7f5dfffff700 10 librbd::image::OpenRequest: 0x561e4e083a40 send_v2_get_id
2016-08-01 12:15:39.290127 7f5dfffff700 10 librbd::image::OpenRequest: handle_v2_get_id: r=0
2016-08-01 12:15:39.290132 7f5dfffff700 10 librbd::image::OpenRequest: 0x561e4e083a40 send_v2_get_immutable_metadata
2016-08-01 12:15:39.291861 7f5dfffff700 10 librbd::image::OpenRequest: handle_v2_get_immutable_metadata: r=0
2016-08-01 12:15:39.291869 7f5dfffff700 10 librbd::image::OpenRequest: 0x561e4e083a40 send_v2_get_stripe_unit_count
2016-08-01 12:15:39.292261 7f5dfffff700 10 librbd::image::OpenRequest: handle_v2_get_stripe_unit_count: r=-8
2016-08-01 12:15:39.292274 7f5dfffff700 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.1270eb77238e1f29 format rbd_data.1270eb77238e1f29.%016llx
2016-08-01 12:15:39.292276 7f5dfffff700 10 librbd::image::OpenRequest: 0x561e4e083a40 send_v2_apply_metadata: start_key=conf_
2016-08-01 12:15:39.292538 7f5dfffff700 10 librbd::image::OpenRequest: 0x561e4e083a40 handle_v2_apply_metadata: r=0
2016-08-01 12:15:39.292544 7f5dfffff700 20 librbd::ImageCtx: apply_metadata
2016-08-01 12:15:39.292632 7f5dfffff700 20 librbd::ImageCtx: enabling caching...
2016-08-01 12:15:39.292635 7f5dfffff700 20 librbd::ImageCtx: Initial cache settings: size=33554432 num_objects=10 max_dirty=25165824 target_dirty=16777216 max_dirty_age=1
2016-08-01 12:15:39.292696 7f5dfffff700 10 librbd::ImageCtx:  cache bytes 33554432 -> about 1048 objects
2016-08-01 12:15:39.292724 7f5dfffff700 10 librbd::image::OpenRequest: 0x561e4e083a40 send_refresh
2016-08-01 12:15:39.292727 7f5dfffff700 10 librbd::image::RefreshRequest: 0x7f5de8002e60 send_v2_get_mutable_metadata
2016-08-01 12:15:39.293261 7f5dfffff700 10 librbd::image::RefreshRequest: 0x7f5de8002e60 handle_v2_get_mutable_metadata: r=0
2016-08-01 12:15:39.293265 7f5dfffff700 10 librbd::image::RefreshRequest: 0x7f5de8002e60 send_v2_get_flags
2016-08-01 12:15:39.293551 7f5dfffff700 10 librbd::image::RefreshRequest: 0x7f5de8002e60 handle_v2_get_flags: r=0
2016-08-01 12:15:39.293554 7f5dfffff700 10 librbd::image::RefreshRequest: 0x7f5de8002e60 send_v2_apply
2016-08-01 12:15:39.293565 7f5dff7fe700 10 librbd::image::RefreshRequest: 0x7f5de8002e60 handle_v2_apply
2016-08-01 12:15:39.293568 7f5dff7fe700 20 librbd::image::RefreshRequest: 0x7f5de8002e60 apply
2016-08-01 12:15:39.293578 7f5dff7fe700 10 librbd::image::OpenRequest: handle_refresh: r=0
2016-08-01 12:15:39.293583 7f5dff7fe700 10 librbd::ImageState: 0x561e4e076860 handle_open: r=0
2016-08-01 12:15:39.293595 7f5e2ad95d80 20 librbd: info 0x561e4e081170
rbd: show status failed: (2) No such file or directory
2016-08-01 12:15:39.294518 7f5e2ad95d80 20 librbd::ImageState: close
2016-08-01 12:15:39.294522 7f5e2ad95d80 10 librbd::ImageState: 0x561e4e076860 send_close_unlock
2016-08-01 12:15:39.294524 7f5e2ad95d80 10 librbd::image::CloseRequest: 0x561e4e083f10 send_shut_down_aio_queue
2016-08-01 12:15:39.294526 7f5e2ad95d80  5 librbd::AioImageRequestWQ: shut_down: in_flight=0
2016-08-01 12:15:39.294537 7f5dff7fe700 10 librbd::image::CloseRequest: 0x561e4e083f10 handle_shut_down_aio_queue: r=0
2016-08-01 12:15:39.294540 7f5dff7fe700 10 librbd::image::CloseRequest: 0x561e4e083f10 send_flush
2016-08-01 12:15:39.294543 7f5dff7fe700 10 librbd::image::CloseRequest: 0x561e4e083f10 handle_flush: r=0
2016-08-01 12:15:39.294543 7f5dff7fe700 10 librbd::image::CloseRequest: 0x561e4e083f10 send_flush_readahead
2016-08-01 12:15:39.294544 7f5dff7fe700 10 librbd::image::CloseRequest: 0x561e4e083f10 handle_flush_readahead: r=0
2016-08-01 12:15:39.294545 7f5dff7fe700 10 librbd::image::CloseRequest: 0x561e4e083f10 send_shut_down_cache
2016-08-01 12:15:39.294577 7f5dff7fe700 10 librbd::image::CloseRequest: 0x561e4e083f10 handle_shut_down_cache: r=0
2016-08-01 12:15:39.294579 7f5dff7fe700 10 librbd::image::CloseRequest: 0x561e4e083f10 send_flush_op_work_queue
2016-08-01 12:15:39.294581 7f5dff7fe700 10 librbd::image::CloseRequest: 0x561e4e083f10 handle_flush_op_work_queue: r=0
2016-08-01 12:15:39.294587 7f5dff7fe700 10 librbd::ImageState: 0x561e4e076860 handle_close: r=0
kjetil@sc2-r10-u07:~$ rbd info watcher-fail
rbd image 'watcher-fail':
    size 10240 kB in 3 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.1270eb77238e1f2
    format: 2
    features: layering
    flags:

Related issues

Related to Bug #18653: Improve compatibility between librbd + krbd for the data pool Pending Backport 01/24/2017
Copied to Backport #16951: jewel: ceph 10.2.2 rbd status on image format 2 returns "(2) No such file or directory" Resolved
Copied to Backport #16952: hammer: ceph 10.2.2 rbd status on image format 2 returns "(2) No such file or directory" Resolved

History

#1 Updated by Kjetil Joergensen 10 months ago

With --debug-rados=20/20

Ok - so there seems to be some disagreement about what the rbd_header. object should be named. "rbd status" ends up truncating the last byte off of the oid.

rbd create (and presumably also delete) ends up with: 2016-08-01 13:35:52.907412 7f60963dbd80 10 librados: call oid=rbd_header.126d274c238e1f29 nspace=
rbd status on the other hand ends up with: 2016-08-01 13:36:09.916841 7fa6dea01d80 10 librados: list-watchers oid=rbd_header.126d274c238e1f2 nspace=

$ rbd create --debug-rados=20/20 --debug-rbd=20/20 --image-format=2 --size=10M deleteme
2016-08-01 13:35:52.892972 7f60963dbd80  1 librados: starting msgr at :/0
2016-08-01 13:35:52.892977 7f60963dbd80  1 librados: starting objecter
2016-08-01 13:35:52.893166 7f60963dbd80  1 librados: setting wanted keys
2016-08-01 13:35:52.893169 7f60963dbd80  1 librados: calling monclient init
2016-08-01 13:35:52.896564 7f60963dbd80  1 librados: init done
2016-08-01 13:35:52.896602 7f60963dbd80 10 librados: wait_for_osdmap waiting
2016-08-01 13:35:52.897971 7f60963dbd80 10 librados: wait_for_osdmap done waiting
2016-08-01 13:35:52.898076 7f60963dbd80 10 librbd: create name=deleteme, size=10485760, opts=[format=2, features=1, order=22, stripe_unit=0, stripe_count=0]
2016-08-01 13:35:52.898094 7f60963dbd80 20 librbd: create 0x7ffff0e35fa0 name = deleteme size = 10485760 old_format = 0 features = 1 order = 22 stripe_unit = 0 stripe_count = 0
2016-08-01 13:35:52.898102 7f60963dbd80 10 librados: stat oid=deleteme.rbd nspace=
2016-08-01 13:35:52.899180 7f60963dbd80 10 librados: Objecter returned from stat r=-2
2016-08-01 13:35:52.899189 7f60963dbd80 10 librados: stat oid=rbd_id.deleteme nspace=
2016-08-01 13:35:52.900065 7f60963dbd80 10 librados: Objecter returned from stat r=-2
2016-08-01 13:35:52.900077 7f60963dbd80 10 librados: stat oid=rbd_directory nspace=
2016-08-01 13:35:52.901161 7f60963dbd80 10 librados: Objecter returned from stat r=0
2016-08-01 13:35:52.901169 7f60963dbd80 10 librados: create oid=rbd_id.deleteme nspace=
2016-08-01 13:35:52.903054 7f60963dbd80 10 librados: Objecter returned from create r=0
2016-08-01 13:35:52.903078 7f60963dbd80 10 librados: call oid=rbd_id.deleteme nspace=
2016-08-01 13:35:52.904935 7f60963dbd80 10 librados: Objecter returned from call r=0
2016-08-01 13:35:52.904942 7f60963dbd80  2 librbd: adding rbd image to directory...
2016-08-01 13:35:52.904949 7f60963dbd80 10 librados: call oid=rbd_directory nspace=
2016-08-01 13:35:52.907395 7f60963dbd80 10 librados: Objecter returned from call r=0
2016-08-01 13:35:52.907412 7f60963dbd80 10 librados: call oid=rbd_header.126d274c238e1f29 nspace=
2016-08-01 13:35:52.911143 7f60963dbd80 10 librados: Objecter returned from call r=0
2016-08-01 13:35:52.911152 7f60963dbd80  2 librbd: done.
2016-08-01 13:35:52.911157 7f60963dbd80 10 librados: watch_flush enter
2016-08-01 13:35:52.911212 7f60963dbd80 10 librados: watch_flush exit
2016-08-01 13:35:52.911659 7f60963dbd80  1 librados: shutdown
$ rbd status --debug-rados=20/20 --debug-rbd=20/20 deleteme
2016-08-01 13:36:09.907722 7fa6dea01d80  1 librados: starting msgr at :/0
2016-08-01 13:36:09.907726 7fa6dea01d80  1 librados: starting objecter
2016-08-01 13:36:09.907918 7fa6dea01d80  1 librados: setting wanted keys
2016-08-01 13:36:09.907921 7fa6dea01d80  1 librados: calling monclient init
2016-08-01 13:36:09.910356 7fa6dea01d80  1 librados: init done
2016-08-01 13:36:09.910363 7fa6dea01d80 10 librados: wait_for_osdmap waiting
2016-08-01 13:36:09.911674 7fa6dea01d80 10 librados: wait_for_osdmap done waiting
2016-08-01 13:36:09.911844 7fa6dea01d80  5 librbd::AioImageRequestWQ: 0x55d116f1fb00 : ictx=0x55d116f1e440
2016-08-01 13:36:09.911898 7fa6dea01d80 20 librbd::ImageState: open
2016-08-01 13:36:09.911901 7fa6dea01d80 10 librbd::ImageState: 0x55d116f1efe0 send_open_unlock
2016-08-01 13:36:09.911909 7fa6dea01d80 10 librbd::image::OpenRequest: 0x55d116f20450 send_v2_detect_header
2016-08-01 13:36:09.913018 7fa6c0883700 10 librbd::image::OpenRequest: handle_v2_detect_header: r=0
2016-08-01 13:36:09.913024 7fa6c0883700 10 librbd::image::OpenRequest: 0x55d116f20450 send_v2_get_id
2016-08-01 13:36:09.913637 7fa6c0883700 10 librbd::image::OpenRequest: handle_v2_get_id: r=0
2016-08-01 13:36:09.913641 7fa6c0883700 10 librbd::image::OpenRequest: 0x55d116f20450 send_v2_get_immutable_metadata
2016-08-01 13:36:09.915175 7fa6c0883700 10 librbd::image::OpenRequest: handle_v2_get_immutable_metadata: r=0
2016-08-01 13:36:09.915181 7fa6c0883700 10 librbd::image::OpenRequest: 0x55d116f20450 send_v2_get_stripe_unit_count
2016-08-01 13:36:09.915591 7fa6c0883700 10 librbd::image::OpenRequest: handle_v2_get_stripe_unit_count: r=-8
2016-08-01 13:36:09.915600 7fa6c0883700 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.126d274c238e1f29 format rbd_data.126d274c238e1f29.%016llx
2016-08-01 13:36:09.915602 7fa6c0883700 10 librbd::image::OpenRequest: 0x55d116f20450 send_v2_apply_metadata: start_key=conf_
2016-08-01 13:36:09.915838 7fa6c0883700 10 librbd::image::OpenRequest: 0x55d116f20450 handle_v2_apply_metadata: r=0
2016-08-01 13:36:09.915845 7fa6c0883700 20 librbd::ImageCtx: apply_metadata
2016-08-01 13:36:09.915927 7fa6c0883700 20 librbd::ImageCtx: enabling caching...
2016-08-01 13:36:09.915930 7fa6c0883700 20 librbd::ImageCtx: Initial cache settings: size=33554432 num_objects=10 max_dirty=25165824 target_dirty=16777216 max_dirty_age=1
2016-08-01 13:36:09.915986 7fa6c0883700 10 librbd::ImageCtx:  cache bytes 33554432 -> about 1048 objects
2016-08-01 13:36:09.916013 7fa6c0883700 10 librbd::image::OpenRequest: 0x55d116f20450 send_refresh
2016-08-01 13:36:09.916017 7fa6c0883700 10 librbd::image::RefreshRequest: 0x7fa6a0000d00 send_v2_get_mutable_metadata
2016-08-01 13:36:09.916503 7fa6c0883700 10 librbd::image::RefreshRequest: 0x7fa6a0000d00 handle_v2_get_mutable_metadata: r=0
2016-08-01 13:36:09.916512 7fa6c0883700 10 librbd::image::RefreshRequest: 0x7fa6a0000d00 send_v2_get_flags
2016-08-01 13:36:09.916755 7fa6c0883700 10 librbd::image::RefreshRequest: 0x7fa6a0000d00 handle_v2_get_flags: r=0
2016-08-01 13:36:09.916760 7fa6c0883700 10 librbd::image::RefreshRequest: 0x7fa6a0000d00 send_v2_apply
2016-08-01 13:36:09.916772 7fa6bb189700 10 librbd::image::RefreshRequest: 0x7fa6a0000d00 handle_v2_apply
2016-08-01 13:36:09.916778 7fa6bb189700 20 librbd::image::RefreshRequest: 0x7fa6a0000d00 apply
2016-08-01 13:36:09.916784 7fa6bb189700 10 librados: set snap write context: seq = 0 and snaps = []
2016-08-01 13:36:09.916790 7fa6bb189700 10 librbd::image::OpenRequest: handle_refresh: r=0
2016-08-01 13:36:09.916795 7fa6bb189700 10 librbd::ImageState: 0x55d116f1efe0 handle_open: r=0
2016-08-01 13:36:09.916826 7fa6dea01d80 20 librbd: info 0x55d116f1e440
2016-08-01 13:36:09.916841 7fa6dea01d80 10 librados: list-watchers oid=rbd_header.126d274c238e1f2 nspace=
2016-08-01 13:36:09.917985 7fa6dea01d80 10 librados: Objecter returned from list-watchers r=-2
rbd: show status failed: (2) No such file or directory
2016-08-01 13:36:09.918009 7fa6dea01d80 20 librbd::ImageState: close
2016-08-01 13:36:09.918012 7fa6dea01d80 10 librbd::ImageState: 0x55d116f1efe0 send_close_unlock
2016-08-01 13:36:09.918014 7fa6dea01d80 10 librbd::image::CloseRequest: 0x55d116f20cb0 send_shut_down_aio_queue
2016-08-01 13:36:09.918015 7fa6dea01d80  5 librbd::AioImageRequestWQ: shut_down: in_flight=0
2016-08-01 13:36:09.918026 7fa6bb189700 10 librbd::image::CloseRequest: 0x55d116f20cb0 handle_shut_down_aio_queue: r=0
2016-08-01 13:36:09.918033 7fa6bb189700 10 librbd::image::CloseRequest: 0x55d116f20cb0 send_flush
2016-08-01 13:36:09.918036 7fa6bb189700 10 librbd::image::CloseRequest: 0x55d116f20cb0 handle_flush: r=0
2016-08-01 13:36:09.918037 7fa6bb189700 10 librbd::image::CloseRequest: 0x55d116f20cb0 send_flush_readahead
2016-08-01 13:36:09.918038 7fa6bb189700 10 librbd::image::CloseRequest: 0x55d116f20cb0 handle_flush_readahead: r=0
2016-08-01 13:36:09.918039 7fa6bb189700 10 librbd::image::CloseRequest: 0x55d116f20cb0 send_shut_down_cache
2016-08-01 13:36:09.918077 7fa6bb189700 10 librbd::image::CloseRequest: 0x55d116f20cb0 handle_shut_down_cache: r=0
2016-08-01 13:36:09.918079 7fa6bb189700 10 librbd::image::CloseRequest: 0x55d116f20cb0 send_flush_op_work_queue
2016-08-01 13:36:09.918081 7fa6bb189700 10 librbd::image::CloseRequest: 0x55d116f20cb0 handle_flush_op_work_queue: r=0
2016-08-01 13:36:09.918087 7fa6bb189700 10 librbd::ImageState: 0x55d116f1efe0 handle_close: r=0
2016-08-01 13:36:09.918119 7fa6dea01d80 20 librados: flush_aio_writes
2016-08-01 13:36:09.918122 7fa6dea01d80 20 librados: flush_aio_writes
2016-08-01 13:36:09.918133 7fa6dea01d80 10 librados: watch_flush enter
2016-08-01 13:36:09.918144 7fa6dea01d80 10 librados: watch_flush exit
2016-08-01 13:36:09.918449 7fa6dea01d80  1 librados: shutdown

#2 Updated by Kjetil Joergensen 10 months ago

As for - why this impacts us. We use/abuse a combination of advisory locking (hint) and watchers to paper over mandatory exclusive locking/mapping of rbd images with krbd. krbd itself seems to do "the right thing". This completely breaks attempting to safely reclaim orphaned advisory locks, as we also double check for any active watchers.

So - actual problem. "At some point", the id part of the rbd block_name_prefix seems to have been "extended" to 16 bytes, previously 15. rbd status/info and friends currently truncate this to 15; krbd still does the right thing.

#3 Updated by Kjetil Joergensen 10 months ago

For anybody else with the same problem, really-dirty-hack full of awful assumptions: rados -p rbd listwatchers rbd_header.$(rados -p rbd get rbd_id.$image /dev/stdout | strings)
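
A rough sketch of what the pipeline above depends on, written as a pure function so it can be read without a cluster. It assumes the rbd_id.$image object holds the image id as a 32-bit little-endian length followed by that many bytes, which is why piping it through strings happens to work. That layout and the function name are assumptions for illustration, not something confirmed in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: payload = 4-byte little-endian length + image id bytes.
// Returns "rbd_header.<id>", or an empty string if the payload is malformed.
std::string header_oid_from_rbd_id_payload(const std::string& payload) {
  if (payload.size() < 4) {
    return std::string();
  }
  // Decode the length prefix byte by byte to stay endian-independent.
  uint32_t len = 0;
  for (int i = 0; i < 4; ++i) {
    len |= static_cast<uint32_t>(static_cast<unsigned char>(payload[i])) << (8 * i);
  }
  if (payload.size() - 4 < len) {
    return std::string();
  }
  return "rbd_header." + payload.substr(4, len);
}
```

Unlike the strings-based one-liner, this does not depend on the length bytes being non-printable, but it is still only as correct as the assumed payload layout.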

#4 Updated by Kjetil Joergensen 10 months ago

Summary: It looks like the block_name_prefix/rbd_id may accidentally have been extended by one byte, "rbd status" does not support this.

#5 Updated by Kjetil Joergensen 10 months ago

And - it's not consistent. I have one 10.2.2 cluster where this is a problem and one where it's not. Is the length of rbd_id/block_prefix dynamic, or is it supposed to be fixed length?

#6 Updated by Kjetil Joergensen 10 months ago

Actually - if I read this correctly - somehow we consistently end up with block_prefix (including rbd_data.) that's 25 bytes, which is larger than RBD_MAX_BLOCK_NAME_SIZE.

On the cluster we have that's misbehaving, rados::get_instance_id() returns values in the 10^8 range; on one of our non-misbehaving clusters it's in the 10^7 range. If I read librbd's create_v2 correctly, bid_ss << std::hex << bid << std::hex << extra; has bid be rados::get_instance_id(), which ends up as a total of 16 characters. We then tack on the prefix rbd_data., which gets us to 25 characters/bytes, which is larger than RBD_MAX_BLOCK_NAME_SIZE. I haven't determined where the truncation happens for "rbd status / rbd info" yet.

However - in order to reproduce, you'll probably need a ceph cluster where rados::get_instance_id() returns a number > 0x0fffffff.
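
The length arithmetic can be checked with a small standalone sketch. The stream expression is the one quoted above; the helper names are hypothetical, and the limit of 24 is an assumption that matches the "25 is larger than RBD_MAX_BLOCK_NAME_SIZE" observation. Splitting the id from the log (126d274c238e1f29) into bid = 0x126d274c and extra = 0x238e1f29 is also an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Assumed value of RBD_MAX_BLOCK_NAME_SIZE.
static const std::size_t kMaxBlockNameSize = 24;

// Same formatting as the quoted create_v2 expression: two hex numbers
// back to back, with no zero padding.
std::string make_image_id(uint64_t bid, uint32_t extra) {
  std::ostringstream bid_ss;
  bid_ss << std::hex << bid << std::hex << extra;
  return bid_ss.str();
}

// True if "rbd_data." + id is at most kMaxBlockNameSize characters long.
bool prefix_fits(const std::string& id) {
  return std::string("rbd_data.").size() + id.size() <= kMaxBlockNameSize;
}
```

With bid at or below 0x0fffffff the id is at most 15 characters and the prefix is at most 24; one more hex digit in bid tips it over. Since nothing is zero padded, a small extra also shortens the id, so not every image on an affected cluster necessarily ends up too long.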

#7 Updated by Kjetil Joergensen 10 months ago

To un-break my previous comment slightly:

librbd's create_v2 makes the block prefix / id as: bid_ss << std::hex << bid << std::hex << extra;
  • bid comes from caller, and comes from librados' get_instance_id()
  • extra is random() & 0xffffffff

On one of our clusters, bid ends up being larger than 0x0fffffff, so bid_ss ends up at 16 characters. When this then is formatted as rbd_data. + bid_ss, we end up with 25 characters, which is larger than RBD_MAX_BLOCK_NAME_SIZE. Then, when doing rbd status or rbd info on one of these rbd images, the rbd block prefix seems to be truncated down to 24 (or 15) characters, which causes the object-name for rbd_header.$id to be one character short, which makes the fetch fail with ENOENT.
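
A standalone sketch of that truncation, with no librbd involved. It assumes RBD_MAX_BLOCK_NAME_SIZE is 24 and models rbd_image_info_t::block_name_prefix as a plain fixed-width char array; the function name is hypothetical. The memcpy and strncpy calls follow the shape of the unpatched code in librbd/internal.cc and tools/rbd/action/Status.cc.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Assumed value of RBD_MAX_BLOCK_NAME_SIZE.
static const std::size_t kMaxBlockNameSize = 24;

// Pushes an object prefix ("rbd_data.<id>") through a fixed-width field and
// derives the header object name the way the unpatched status code does.
// Assumes the input starts with "rbd_data.".
std::string header_oid_via_fixed_field(const std::string& object_prefix) {
  // Stand-in for the fixed-width info field; a 25-character prefix loses
  // its last character here and is left without a terminating NUL.
  char block_name_prefix[kMaxBlockNameSize];
  std::memset(block_name_prefix, 0, sizeof(block_name_prefix));
  std::memcpy(block_name_prefix, object_prefix.c_str(),
              std::min<std::size_t>(kMaxBlockNameSize, object_prefix.length() + 1));

  // Copy into a buffer with room for the NUL, as the status code does.
  char prefix[kMaxBlockNameSize + 1];
  std::strncpy(prefix, block_name_prefix, kMaxBlockNameSize);
  prefix[kMaxBlockNameSize] = '\0';

  std::string header_oid = "rbd_header.";
  header_oid.append(prefix + std::strlen("rbd_data."));
  return header_oid;
}
```

Feeding it rbd_data.126d274c238e1f29 yields rbd_header.126d274c238e1f2, the same one-character-short name that shows up in the list-watchers call in the log.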

The following would "fix" "rbd status", although at this point it might be more appropriate to bump RBD_MAX_BLOCK_NAME_SIZE given that we're already past that point, and probably also enforce RBD_MAX_BLOCK_NAME_SIZE when rbd_id and block_prefix are generated.

diff --git a/src/librbd/internal.cc b/src/librbd/internal.cc
index 13682df..44d82ad 100644
--- a/src/librbd/internal.cc
+++ b/src/librbd/internal.cc
@@ -477,7 +477,7 @@ int mirror_image_disable_internal(ImageCtx *ictx, bool force,
     info.num_objs = Striper::get_num_objects(ictx->layout, info.size);
     info.order = obj_order;
     memcpy(&info.block_name_prefix, ictx->object_prefix.c_str(),
-          min((size_t)RBD_MAX_BLOCK_NAME_SIZE,
+          min((size_t)RBD_MAX_BLOCK_NAME_SIZE + 1,
               ictx->object_prefix.length() + 1));
     // clear deprecated fields
     info.parent_pool = -1L;
diff --git a/src/tools/rbd/action/Status.cc b/src/tools/rbd/action/Status.cc
index ab37bc8..b4980dd 100644
--- a/src/tools/rbd/action/Status.cc
+++ b/src/tools/rbd/action/Status.cc
@@ -38,9 +38,9 @@ static int do_show_status(librados::IoCtx &io_ctx, librbd::Image &image,
     if (r < 0)
       return r;

-    char prefix[RBD_MAX_BLOCK_NAME_SIZE + 1];
-    strncpy(prefix, info.block_name_prefix, RBD_MAX_BLOCK_NAME_SIZE);
-    prefix[RBD_MAX_BLOCK_NAME_SIZE] = '\0';
+    char prefix[RBD_MAX_BLOCK_NAME_SIZE + 2];
+    strncpy(prefix, info.block_name_prefix, RBD_MAX_BLOCK_NAME_SIZE + 1);
+    prefix[RBD_MAX_BLOCK_NAME_SIZE + 1] = '\0';

     header_oid = RBD_HEADER_PREFIX;
     header_oid.append(prefix + strlen(RBD_DATA_PREFIX));

#8 Updated by Kjetil Joergensen 10 months ago

Less broken formatting:

diff --git a/src/librbd/internal.cc b/src/librbd/internal.cc
index 13682df..44d82ad 100644
--- a/src/librbd/internal.cc
+++ b/src/librbd/internal.cc
@@ -477,7 +477,7 @@ int mirror_image_disable_internal(ImageCtx *ictx, bool force,
     info.num_objs = Striper::get_num_objects(ictx->layout, info.size);
     info.order = obj_order;
     memcpy(&info.block_name_prefix, ictx->object_prefix.c_str(),
-          min((size_t)RBD_MAX_BLOCK_NAME_SIZE,
+          min((size_t)RBD_MAX_BLOCK_NAME_SIZE + 1,
               ictx->object_prefix.length() + 1));
     // clear deprecated fields
     info.parent_pool = -1L;
diff --git a/src/tools/rbd/action/Status.cc b/src/tools/rbd/action/Status.cc
index ab37bc8..b4980dd 100644
--- a/src/tools/rbd/action/Status.cc
+++ b/src/tools/rbd/action/Status.cc
@@ -38,9 +38,9 @@ static int do_show_status(librados::IoCtx &io_ctx, librbd::Image &image,
     if (r < 0)
       return r;

-    char prefix[RBD_MAX_BLOCK_NAME_SIZE + 1];
-    strncpy(prefix, info.block_name_prefix, RBD_MAX_BLOCK_NAME_SIZE);
-    prefix[RBD_MAX_BLOCK_NAME_SIZE] = '\0';
+    char prefix[RBD_MAX_BLOCK_NAME_SIZE + 2];
+    strncpy(prefix, info.block_name_prefix, RBD_MAX_BLOCK_NAME_SIZE + 1);
+    prefix[RBD_MAX_BLOCK_NAME_SIZE + 1] = '\0';

     header_oid = RBD_HEADER_PREFIX;
     header_oid.append(prefix + strlen(RBD_DATA_PREFIX));

#9 Updated by Kjetil Joergensen 10 months ago

Patch is for illustrative purposes - not intended as a solution.

#10 Updated by Jason Dillaman 10 months ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman
  • Backport set to jewel

#11 Updated by Kjetil Joergensen 10 months ago

My assertions of this being jewel specific should be treated with a grain of salt; it may well be that we tipped over the limit after we switched to jewel, rather than this being jewel specific.

#12 Updated by Jason Dillaman 10 months ago

  • Backport changed from jewel to jewel,hammer

This is a very old issue that could hit if the combination of a client's (global) instance id concatenated with a potentially large randomly generated number overflows the fixed API size limits.

#13 Updated by Jason Dillaman 10 months ago

  • Status changed from In Progress to Need Review

PR: https://github.com/ceph/ceph/pull/10581

Won't solve the issue for existing images since the API uses fixed width arrays, but it will stop new images from being created with too large of an id.
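
One way such a guard could look, as a hedged sketch only. It is not claimed to be what the PR does; the helper name is hypothetical, the limit of 24 is assumed, and keeping the trailing characters of the id is just one possible choice.

```cpp
#include <cstddef>
#include <string>

// Assumed value of RBD_MAX_BLOCK_NAME_SIZE.
static const std::size_t kMaxBlockNameSize = 24;

// Hypothetical guard: shorten an id so that "rbd_data." + id plus a
// terminating NUL fits in kMaxBlockNameSize bytes. Keeps the tail of the id.
std::string clamp_image_id(const std::string& id) {
  const std::size_t max_id_length =
      kMaxBlockNameSize - std::string("rbd_data.").size() - 1;
  if (id.size() <= max_id_length) {
    return id;
  }
  return id.substr(id.size() - max_id_length);
}
```

Under these assumptions the longest id is 14 characters, which leaves room for the NUL terminator in the fixed-width field. As noted above, a guard like this only affects newly created images.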

#14 Updated by Mykola Golub 10 months ago

  • Status changed from Need Review to Pending Backport

#15 Updated by Loic Dachary 10 months ago

  • Copied to Backport #16951: jewel: ceph 10.2.2 rbd status on image format 2 returns "(2) No such file or directory" added

#16 Updated by Loic Dachary 10 months ago

  • Copied to Backport #16952: hammer: ceph 10.2.2 rbd status on image format 2 returns "(2) No such file or directory" added

#17 Updated by Kjetil Joergensen 10 months ago

Reading into this - librbd/rbd having generated slightly out-of-spec ids is our mess to deal with? (We get to either maintain a set of out-of-tree re-implementations of bits of librbd to do the interactions we need, or re-make these images with a less broken librbd.)

#18 Updated by Jason Dillaman 10 months ago

@Kjetil: unfortunately we cannot just increase the length of rbd_image_info_t::block_name_prefix [1] since that would break the API. The next best alternative would be to introduce a new API method, filed as a new feature ticket, to retrieve the image id without the same length limitation.

[1] https://github.com/ceph/ceph/blob/master/src/include/rbd/librbd.h#L93

#19 Updated by Nathan Cutler 6 months ago

  • Status changed from Pending Backport to Resolved
  • Needs Doc set to No

#20 Updated by Ilya Dryomov 4 months ago

  • Related to Bug #18653: Improve compatibility between librbd + krbd for the data pool added

#21 Updated by Nathan Cutler 27 days ago

Since master is a moving target, this [1] might be a better ("permanent") URL to replace the one in #16887-18

[1] https://github.com/ceph/ceph/blob/e3d1a0d069c70f820e46d0f4badc03d949bbb90c/src/include/rbd/librbd.h#L93
