Project

General

Profile

Actions

Bug #52637

closed

IO cannot return to normal even though qos is removed

Added by wb song over 2 years ago. Updated over 2 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Set rbd_qos_bps_limit to a smaller value, and then remove it, the IO cannot return to normal.

Steps to reproduce:
1. rbd bench --io-type write --io-total 10G img1 --debug-rbd 15
2. rbd image-meta set img1 conf_rbd_qos_bps_limit 10 # execute on another client
3. rbd image-meta remove img1 conf_rbd_qos_bps_limit
Although rbd_qos_bps_limit has been removed, the IO of the bencher still hangs.

log:

$ rbd bench --io-type write --io-total 10G img1 --debug-rbd 15
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::Dispatcher: 0x5632253434d0 register_dispatch: dispatch_layer=8
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::QueueImageDispatch: 0x563225341920 QueueImageDispatch: ictx=0x5632252fd760
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::Dispatcher: 0x5632253434d0 register_dispatch: dispatch_layer=1
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::QosImageDispatch: 0x5632253420b0 QosImageDispatch: ictx=0x5632252fd760
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::Dispatcher: 0x5632253434d0 register_dispatch: dispatch_layer=2
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::RefreshImageDispatch: 0x5632253429e0 RefreshImageDispatch: ictx=0x5632252fd760
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::Dispatcher: 0x5632253434d0 register_dispatch: dispatch_layer=4
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::WriteBlockImageDispatch: 0x5632253464b0 WriteBlockImageDispatch: ictx=0x5632252fd760
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::Dispatcher: 0x5632253434d0 register_dispatch: dispatch_layer=6
2021-09-16T18:53:11.237+0800 7f441ed32cc0  5 librbd::io::Dispatcher: 0x5632253468a0 register_dispatch: dispatch_layer=6
2021-09-16T18:53:11.238+0800 7f441ed32cc0 10 librbd::ImageState: 0x563225341170 0x563225341170 send_open_unlock
2021-09-16T18:53:11.238+0800 7f441ed32cc0 10 librbd::image::OpenRequest: 0x563225347af0 send_v2_detect_header
2021-09-16T18:53:11.240+0800 7f44067fc700 10 librbd::image::OpenRequest: handle_v2_detect_header: r=0
2021-09-16T18:53:11.241+0800 7f44067fc700 10 librbd::image::OpenRequest: 0x563225347af0 send_v2_get_id
2021-09-16T18:53:11.241+0800 7f4405ffb700 10 librbd::image::OpenRequest: handle_v2_get_id: r=0
2021-09-16T18:53:11.242+0800 7f4405ffb700 10 librbd::image::OpenRequest: 0x563225347af0 send_v2_get_initial_metadata
2021-09-16T18:53:11.243+0800 7f44067fc700 10 librbd::image::OpenRequest: handle_v2_get_initial_metadata: r=0
2021-09-16T18:53:11.243+0800 7f44067fc700 10 librbd::image::OpenRequest: 0x563225347af0 send_v2_get_create_timestamp
2021-09-16T18:53:11.244+0800 7f4405ffb700 10 librbd::image::OpenRequest: 0x563225347af0 handle_v2_get_create_timestamp: r=0
2021-09-16T18:53:11.244+0800 7f4405ffb700 10 librbd::image::OpenRequest: 0x563225347af0 send_v2_get_access_modify_timestamp
2021-09-16T18:53:11.245+0800 7f44067fc700 10 librbd::image::OpenRequest: 0x563225347af0 handle_v2_get_access_modify_timestamp: r=0
2021-09-16T18:53:11.245+0800 7f44067fc700 10 librbd::image::OpenRequest: 0x563225347af0 send_v2_get_data_pool
2021-09-16T18:53:11.245+0800 7f4405ffb700 10 librbd::image::OpenRequest: 0x563225347af0 handle_v2_get_data_pool: r=0
2021-09-16T18:53:11.245+0800 7f4405ffb700 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rbd_data.100d57d75c40 format rbd_data.100d57d75c40.%016llx
2021-09-16T18:53:11.246+0800 7f4405ffb700 10 librbd::image::OpenRequest: 0x563225347af0 send_refresh
2021-09-16T18:53:11.246+0800 7f4405ffb700 10 librbd::ConfigWatcher: init:
2021-09-16T18:53:11.247+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e400df30 send_v2_get_mutable_metadata
2021-09-16T18:53:11.248+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e400df30 handle_v2_get_mutable_metadata: r=0
2021-09-16T18:53:11.249+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e400df30 send_v2_get_parent: legacy=0
2021-09-16T18:53:11.250+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e400df30 handle_v2_get_parent: r=0
2021-09-16T18:53:11.250+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e400df30 send_v2_get_metadata
2021-09-16T18:53:11.250+0800 7f4405ffb700 15 librbd::image::GetMetadataRequest: 0x7f43e4018470 metadata_list: start_key=conf_
2021-09-16T18:53:11.251+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43e4018470 handle_metadata_list: r=0
2021-09-16T18:53:11.251+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43e4018470 finish: r=0
2021-09-16T18:53:11.251+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e400df30 handle_v2_get_metadata: r=0
2021-09-16T18:53:11.251+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e400df30 send_v2_get_pool_metadata
2021-09-16T18:53:11.251+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43d00036f0 metadata_list: start_key=conf_
2021-09-16T18:53:11.251+0800 7f4405ffb700 15 librbd::image::GetMetadataRequest: 0x7f43d00036f0 handle_metadata_list: r=0
2021-09-16T18:53:11.252+0800 7f4405ffb700 15 librbd::image::GetMetadataRequest: 0x7f43d00036f0 finish: r=0
2021-09-16T18:53:11.252+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e400df30 handle_v2_get_pool_metadata: r=0
2021-09-16T18:53:11.252+0800 7f4405ffb700  5 librbd::ImageCtx: 0x5632252fd760: disabling zero-copy writes
2021-09-16T18:53:11.252+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e400df30 send_v2_get_group
2021-09-16T18:53:11.253+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e400df30 handle_v2_get_group: r=0
2021-09-16T18:53:11.253+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e400df30 send_v2_init_exclusive_lock
2021-09-16T18:53:11.253+0800 7f44067fc700 10 librbd::ExclusiveLock: 0x7f43d00032c0 init
2021-09-16T18:53:11.253+0800 7f44067fc700  5 librbd::io::WriteBlockImageDispatch: 0x5632253464b0 block_writes: 0x5632252fd760, num=1
2021-09-16T18:53:11.253+0800 7f44067fc700 10 librbd::io::WriteBlockImageDispatch: 0x5632253464b0 flush_io:
2021-09-16T18:53:11.253+0800 7f4405ffb700 10 librbd::ExclusiveLock: 0x7f43d00032c0 handle_init_complete: features=61
2021-09-16T18:53:11.253+0800 7f4405ffb700  5 librbd::io::Dispatcher: 0x5632253434d0 register_dispatch: dispatch_layer=3
2021-09-16T18:53:11.254+0800 7f44067fc700  5 librbd::io::WriteBlockImageDispatch: 0x5632253464b0 unblock_writes: 0x5632252fd760, num=0
2021-09-16T18:53:11.254+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e400df30 handle_v2_init_exclusive_lock: r=0
2021-09-16T18:53:11.254+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e400df30 send_v2_apply
2021-09-16T18:53:11.254+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e400df30 handle_v2_apply
2021-09-16T18:53:11.254+0800 7f44067fc700 10 librbd::image::OpenRequest: handle_refresh: r=0
2021-09-16T18:53:11.254+0800 7f44067fc700 10 librbd::image::OpenRequest: send_init_plugin_registry: plugins=
2021-09-16T18:53:11.254+0800 7f44067fc700 10 librbd::image::OpenRequest: handle_init_plugin_registry: r=0
2021-09-16T18:53:11.254+0800 7f44067fc700 10 librbd::image::OpenRequest: 0x563225347af0 send_init_cache
2021-09-16T18:53:11.254+0800 7f44067fc700  5 librbd::cache::WriteAroundObjectDispatch: 0x7f43d0003820 init:
2021-09-16T18:53:11.254+0800 7f44067fc700  5 librbd::io::Dispatcher: 0x5632253468a0 register_dispatch: dispatch_layer=1
2021-09-16T18:53:11.254+0800 7f44067fc700 10 librbd::image::OpenRequest: 0x563225347af0 send_register_watch
2021-09-16T18:53:11.254+0800 7f44067fc700 10 librbd::Watcher: 0x7f43e4004cd0 register_watch:
2021-09-16T18:53:11.274+0800 7f44067fc700 10 librbd::Watcher: 0x7f43e4004cd0 handle_register_watch: r=0
2021-09-16T18:53:11.274+0800 7f44067fc700 10 librbd::image::OpenRequest: 0x563225347af0 handle_register_watch: r=0
2021-09-16T18:53:11.274+0800 7f44067fc700  5 librbd::io::SimpleSchedulerObjectDispatch: 0x7f43d00045a0 SimpleSchedulerObjectDispatch: ictx=0x5632252fd760
2021-09-16T18:53:11.274+0800 7f44067fc700  5 librbd::io::SimpleSchedulerObjectDispatch: 0x7f43d00045a0 init:
2021-09-16T18:53:11.274+0800 7f44067fc700  5 librbd::io::Dispatcher: 0x5632253468a0 register_dispatch: dispatch_layer=5
2021-09-16T18:53:11.274+0800 7f44067fc700 10 librbd::ImageState: 0x563225341170 0x563225341170 handle_open: r=0
2021-09-16T18:53:11.275+0800 7f441ed32cc0  5 librbd::exclusive_lock::ImageDispatch: 0x7f43e4016190 needs_exclusive_lock: exclusive lock required: delaying IO
2021-09-16T18:53:11.275+0800 7f441ed32cc0 10 librbd::ManagedLock: 0x7f43d00032d8 acquire_lock:
2021-09-16T18:53:11.275+0800 7f441ed32cc0 10 librbd::ManagedLock: 0x7f43d00032d8 send_acquire_lock:
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::ExclusiveLock: 0x7f43d00032c0 pre_acquire_lock_handler
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::exclusive_lock::PreAcquireRequest: 0x7f43d0002d20 send_prepare_lock:
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::ImageState: 0x563225341170 prepare_lock
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::ImageState: 0x563225341170 0x563225341170 send_prepare_lock_unlock
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::exclusive_lock::PreAcquireRequest: 0x7f43d0002d20 handle_prepare_lock: r=0
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::exclusive_lock::PreAcquireRequest: 0x7f43d0002d20 send_flush_notifies:
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::exclusive_lock::PreAcquireRequest: 0x7f43d0002d20 handle_flush_notifies:
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::ManagedLock: 0x7f43d00032d8 handle_pre_acquire_lock: r=0
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::managed_lock::AcquireRequest: 0x7f43d0003cc0 send_get_locker:
2021-09-16T18:53:11.275+0800 7f44067fc700 10 librbd::managed_lock::GetLockerRequest: 0x7f43d0002f60 send_get_lockers:
2021-09-16T18:53:11.276+0800 7f4405ffb700 10 librbd::managed_lock::GetLockerRequest: 0x7f43d0002f60 handle_get_lockers: r=0
2021-09-16T18:53:11.276+0800 7f4405ffb700 10 librbd::managed_lock::GetLockerRequest: 0x7f43d0002f60 finish: r=-2
2021-09-16T18:53:11.276+0800 7f4405ffb700 10 librbd::managed_lock::AcquireRequest: 0x7f43d0003cc0 handle_get_locker: r=-2
2021-09-16T18:53:11.276+0800 7f4405ffb700 10 librbd::managed_lock::AcquireRequest: 0x7f43d0003cc0 send_lock: entity=client.4139, cookie=auto 139929229213296
2021-09-16T18:53:11.298+0800 7f44067fc700 10 librbd::managed_lock::AcquireRequest: 0x7f43d0003cc0 handle_lock: r=0
2021-09-16T18:53:11.298+0800 7f44067fc700 10 librbd::ManagedLock: 0x7f43d00032d8 handle_acquire_lock: r=0
2021-09-16T18:53:11.298+0800 7f44067fc700  5 librbd::ManagedLock: 0x7f43d00032d8 handle_acquire_lock: successfully acquired exclusive lock
2021-09-16T18:53:11.298+0800 7f44067fc700 10 librbd::ExclusiveLock: 0x7f43d00032c0 post_acquire_lock_handler: r=0
2021-09-16T18:53:11.298+0800 7f44067fc700 10 librbd::exclusive_lock::PostAcquireRequest: 0x7f43d0004440 send_open_object_map:
2021-09-16T18:53:11.298+0800 7f44067fc700 10 librbd::object_map::RefreshRequest: 0x7f43d0004fc0 send_lock: oid=rbd_object_map.100d57d75c40
2021-09-16T18:53:11.298+0800 7f44067fc700 10 librbd::object_map::LockRequest: 0x7f43d00050d0 send_lock: oid=rbd_object_map.100d57d75c40
2021-09-16T18:53:11.326+0800 7f4405ffb700 10 librbd::object_map::LockRequest: 0x7f43d00050d0 handle_lock: r=0
2021-09-16T18:53:11.326+0800 7f4405ffb700 10 librbd::object_map::RefreshRequest: 0x7f43d0004fc0 handle_lock
2021-09-16T18:53:11.326+0800 7f4405ffb700 10 librbd::object_map::RefreshRequest: 0x7f43d0004fc0 send_load: oid=rbd_object_map.100d57d75c40
2021-09-16T18:53:11.327+0800 7f44067fc700 10 librbd::object_map::RefreshRequest: 0x7f43d0004fc0 handle_load: r=0
2021-09-16T18:53:11.327+0800 7f44067fc700 10 librbd::exclusive_lock::PostAcquireRequest: 0x7f43d0004440 handle_open_object_map: r=0
2021-09-16T18:53:11.327+0800 7f44067fc700 10 librbd::ExclusiveLock: 0x7f43d00032c0 handle_post_acquiring_lock
2021-09-16T18:53:11.327+0800 7f44067fc700 10 librbd::ImageState: 0x563225341170 handle_prepare_lock_complete
2021-09-16T18:53:11.327+0800 7f44067fc700 10 librbd::exclusive_lock::PostAcquireRequest: 0x7f43d0004440 send_open_image_cache:
2021-09-16T18:53:11.327+0800 7f4405ffb700 10 librbd::exclusive_lock::PostAcquireRequest: 0x7f43d0004440 handle_open_image_cache: r=0
2021-09-16T18:53:11.327+0800 7f4405ffb700 10 librbd::ExclusiveLock: 0x7f43d00032c0 handle_post_acquired_lock: r=0
2021-09-16T18:53:11.327+0800 7f4405ffb700 10 librbd::ImageWatcher: 0x7f43e4004cd0 notify acquired lock
2021-09-16T18:53:11.327+0800 7f4405ffb700 10 librbd::ImageWatcher: 0x7f43e4004cd0 current lock owner: [4139,139929229213296]
2021-09-16T18:53:11.327+0800 7f4405ffb700 10 librbd::ManagedLock: 0x7f43d00032d8 handle_post_acquire_lock: r=0
2021-09-16T18:53:11.327+0800 7f4405ffb700  5 librbd::exclusive_lock::ImageDispatch: 0x7f43e4016190 handle_acquire_lock: r=0
bench  type write io_size 4096 io_threads 16 bytes 10737418240 pattern sequential
  SEC       OPS   OPS/SEC   BYTES/SEC
2021-09-16T18:53:11.328+0800 7f44067fc700  5 librbd::Watcher: 0x7f43e4004cd0 notifications_blocked: blocked=0
2021-09-16T18:53:11.328+0800 7f44067fc700 10 librbd::Watcher::C_NotifyAck 0x7f43d0004b30 C_NotifyAck: id=51539607567, handle=139929229213296
2021-09-16T18:53:11.328+0800 7f44067fc700 10 librbd::ImageWatcher: 0x7f43e4004cd0 image exclusively locked announcement
2021-09-16T18:53:11.328+0800 7f44067fc700 10 librbd::ImageWatcher: 0x7f43e4004cd0 current lock owner: [4139,139929229213296]
2021-09-16T18:53:11.328+0800 7f44067fc700 10 librbd::Watcher::C_NotifyAck 0x7f43d0004b30 finish: r=0
2021-09-16T18:53:11.328+0800 7f4405ffb700 10 librbd::io::ImageRequest: 0x7f4405ff5e50 update_timestamp: aio_write
    1      2544   2554.89    10 MiB/s
    2      5168    2593.3    10 MiB/s
    3      7744   2583.22    10 MiB/s
    4     10272   2572.64    10 MiB/s
    5     12832   2568.06    10 MiB/s
    6     15360    2563.2    10 MiB/s
    7     17888   2542.98   9.9 MiB/s
    8     20448    2540.8   9.9 MiB/s
    9     22976   2539.28   9.9 MiB/s
   10     25504    2534.4   9.9 MiB/s
2021-09-16T18:53:21.442+0800 7f44067fc700  5 librbd::Watcher: 0x7f43e4004cd0 notifications_blocked: blocked=0            # --> client B: rbd image-meta set img1 conf_rbd_qos_bps_limit 10
2021-09-16T18:53:21.442+0800 7f44067fc700 10 librbd::Watcher::C_NotifyAck 0x7f43d000a630 C_NotifyAck: id=51539607568, handle=139929229213296
2021-09-16T18:53:21.442+0800 7f44067fc700 10 librbd::ImageWatcher: 0x7f43e4004cd0 remote metadata_set request: key=conf_rbd_qos_bps_limit, value=10
2021-09-16T18:53:21.442+0800 7f44067fc700  5 librbd::Operations: 0x563225341b00 execute_metadata_set: key=conf_rbd_qos_bps_limit, value=10
2021-09-16T18:53:21.551+0800 7f44067fc700 10 librbd::Request: 0x7f43d0040eb0 finish_and_destroy: r=0
2021-09-16T18:53:21.551+0800 7f44067fc700 10 librbd::Request: 0x7f43d0040eb0 finish: r=0
2021-09-16T18:53:21.552+0800 7f44067fc700 10 librbd::ImageWatcher: 0x7f43e4004cd0: notify_header_update
2021-09-16T18:53:21.552+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x5632253564f0
2021-09-16T18:53:21.552+0800 7f44067fc700 10 librbd::ImageState: 0x563225341170 0x563225341170 send_refresh_unlock
2021-09-16T18:53:21.552+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 send_v2_get_mutable_metadata
2021-09-16T18:53:21.553+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225356e20
2021-09-16T18:53:21.553+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225357030
2021-09-16T18:53:21.553+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225355cb0
2021-09-16T18:53:21.553+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x56322534cc20
2021-09-16T18:53:21.553+0800 7f44067fc700  5 librbd::Watcher: 0x7f43e4004cd0 notifications_blocked: blocked=0
2021-09-16T18:53:21.553+0800 7f44067fc700 10 librbd::Watcher::C_NotifyAck 0x7f43d0007da0 C_NotifyAck: id=51539607569, handle=139929229213296
2021-09-16T18:53:21.553+0800 7f44067fc700 10 librbd::ImageWatcher: 0x7f43e4004cd0 image header updated
2021-09-16T18:53:21.553+0800 7f44067fc700 10 librbd::ImageWatcher: 0x7f43d0036440 C_ResponseMessage: r=0
2021-09-16T18:53:21.553+0800 7f44067fc700 10 librbd::Watcher::C_NotifyAck 0x7f43d0007da0 finish: r=0
2021-09-16T18:53:21.554+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x56322534d000
2021-09-16T18:53:21.554+0800 7f4405ffb700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225356770
2021-09-16T18:53:21.554+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225356ee0
2021-09-16T18:53:21.554+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225356950
2021-09-16T18:53:21.554+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 handle_v2_get_mutable_metadata: r=0
2021-09-16T18:53:21.554+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 send_v2_get_parent: legacy=0
2021-09-16T18:53:21.555+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225357190
2021-09-16T18:53:21.555+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225356890
2021-09-16T18:53:21.555+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225356230
2021-09-16T18:53:21.556+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225354930
2021-09-16T18:53:21.556+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225355910
2021-09-16T18:53:21.556+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 handle_v2_get_parent: r=0
2021-09-16T18:53:21.556+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 send_v2_get_metadata
2021-09-16T18:53:21.556+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43d002d130 metadata_list: start_key=conf_
2021-09-16T18:53:21.556+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x563225356430
2021-09-16T18:53:21.556+0800 7f4405ffb700 10 librbd::ImageWatcher: 0x7f43d0028860 C_ResponseMessage: r=0
2021-09-16T18:53:21.556+0800 7f4405ffb700 10 librbd::Watcher::C_NotifyAck 0x7f43d000a630 finish: r=0
2021-09-16T18:53:21.556+0800 7f44067fc700 15 librbd::io::RefreshImageDispatch: 0x5632253429e0 needs_refresh: on_dispatched=0x5632253565d0
2021-09-16T18:53:21.557+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43d002d130 handle_metadata_list: r=0
2021-09-16T18:53:21.557+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43d002d130 finish: r=0
2021-09-16T18:53:21.557+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 handle_v2_get_metadata: r=0
2021-09-16T18:53:21.557+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 send_v2_get_pool_metadata
2021-09-16T18:53:21.557+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43d000c2c0 metadata_list: start_key=conf_
2021-09-16T18:53:21.558+0800 7f4405ffb700 15 librbd::image::GetMetadataRequest: 0x7f43d000c2c0 handle_metadata_list: r=0
2021-09-16T18:53:21.558+0800 7f4405ffb700 15 librbd::image::GetMetadataRequest: 0x7f43d000c2c0 finish: r=0
2021-09-16T18:53:21.558+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 handle_v2_get_pool_metadata: r=0
2021-09-16T18:53:21.559+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 send_v2_get_group
2021-09-16T18:53:21.559+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 handle_v2_get_group: r=0
2021-09-16T18:53:21.559+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 send_v2_apply
2021-09-16T18:53:21.559+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43d000cf30 handle_v2_apply
2021-09-16T18:53:21.560+0800 7f4405ffb700 10 librbd::ImageState: 0x563225341170 0x563225341170 handle_refresh: r=0
2021-09-16T18:53:21.560+0800 7f4405ffb700 10 librbd::ImageState: 0x563225341170 0x563225341170 send_refresh_unlock
2021-09-16T18:53:21.560+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 send_v2_get_mutable_metadata
2021-09-16T18:53:21.560+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225355f10, flag=2
2021-09-16T18:53:21.561+0800 7f44067fc700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x5632253564f0, flag=2
2021-09-16T18:53:21.561+0800 7f44067fc700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225356e20, flag=2
2021-09-16T18:53:21.561+0800 7f44067fc700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225357030, flag=2
2021-09-16T18:53:21.561+0800 7f44067fc700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225355cb0, flag=2
2021-09-16T18:53:21.562+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 handle_v2_get_mutable_metadata: r=0
2021-09-16T18:53:21.562+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 send_v2_get_parent: legacy=0
2021-09-16T18:53:21.563+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 handle_v2_get_parent: r=0
2021-09-16T18:53:21.563+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 send_v2_get_metadata
2021-09-16T18:53:21.563+0800 7f4405ffb700 15 librbd::image::GetMetadataRequest: 0x7f43e4020330 metadata_list: start_key=conf_
2021-09-16T18:53:21.564+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43e4020330 handle_metadata_list: r=0
2021-09-16T18:53:21.564+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43e4020330 finish: r=0
2021-09-16T18:53:21.564+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 handle_v2_get_metadata: r=0
2021-09-16T18:53:21.564+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 send_v2_get_pool_metadata
2021-09-16T18:53:21.564+0800 7f44067fc700 15 librbd::image::GetMetadataRequest: 0x7f43d00426d0 metadata_list: start_key=conf_
2021-09-16T18:53:21.564+0800 7f4405ffb700 15 librbd::image::GetMetadataRequest: 0x7f43d00426d0 handle_metadata_list: r=0
2021-09-16T18:53:21.564+0800 7f4405ffb700 15 librbd::image::GetMetadataRequest: 0x7f43d00426d0 finish: r=0
2021-09-16T18:53:21.564+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 handle_v2_get_pool_metadata: r=0
2021-09-16T18:53:21.565+0800 7f4405ffb700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 send_v2_get_group
2021-09-16T18:53:21.613+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 handle_v2_get_group: r=0
2021-09-16T18:53:21.613+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 send_v2_apply
2021-09-16T18:53:21.613+0800 7f44067fc700 10 librbd::image::RefreshRequest: 0x7f43e401ed50 handle_v2_apply
2021-09-16T18:53:21.613+0800 7f44067fc700 10 librbd::ImageState: 0x563225341170 0x563225341170 handle_refresh: r=0
2021-09-16T18:53:21.616+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x56322534cc20, flag=2
2021-09-16T18:53:21.616+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x56322534d000, flag=2
2021-09-16T18:53:21.616+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225356770, flag=2
2021-09-16T18:53:21.616+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225356950, flag=2
2021-09-16T18:53:21.616+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225356ee0, flag=2
2021-09-16T18:53:21.617+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225357190, flag=2
2021-09-16T18:53:21.617+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225356890, flag=2
2021-09-16T18:53:21.617+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225356230, flag=2
2021-09-16T18:53:21.617+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225354930, flag=2
2021-09-16T18:53:21.617+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225355910, flag=2
2021-09-16T18:53:21.617+0800 7f4405ffb700 15 librbd::io::QosImageDispatch: 0x5632253420b0 needs_throttle: on_dispatched=0x563225356430, flag=2

2021-09-16T18:53:49.597+0800 7f4405ffb700  5 librbd::Watcher: 0x7f43e4004cd0 notifications_blocked: blocked=0            # --> client B: rbd image-meta remove img1 conf_rbd_qos_bps_limit
2021-09-16T18:53:49.597+0800 7f4405ffb700 10 librbd::Watcher::C_NotifyAck 0x7f43e4014fb0 C_NotifyAck: id=51539607570, handle=139929229213296
2021-09-16T18:53:49.597+0800 7f4405ffb700 10 librbd::ImageWatcher: 0x7f43e4004cd0 remote metadata_remove request: key=conf_rbd_qos_bps_limit
2021-09-16T18:53:49.597+0800 7f4405ffb700  5 librbd::Operations: 0x563225341b00 execute_metadata_remove: key=conf_rbd_qos_bps_limit
2021-09-16T18:53:49.608+0800 7f44067fc700 10 librbd::Request: 0x7f43e4018720 finish_and_destroy: r=0
2021-09-16T18:53:49.608+0800 7f44067fc700 10 librbd::Request: 0x7f43e4018720 finish: r=0
2021-09-16T18:53:49.608+0800 7f44067fc700 10 librbd::ImageWatcher: 0x7f43e4004cd0: notify_header_update
2021-09-16T18:53:49.608+0800 7f4405ffb700  5 librbd::Watcher: 0x7f43e4004cd0 notifications_blocked: blocked=0
2021-09-16T18:53:49.608+0800 7f4405ffb700 10 librbd::Watcher::C_NotifyAck 0x7f43e4015920 C_NotifyAck: id=51539607571, handle=139929229213296
2021-09-16T18:53:49.608+0800 7f4405ffb700 10 librbd::ImageWatcher: 0x7f43e4004cd0 image header updated
2021-09-16T18:53:49.609+0800 7f4405ffb700 10 librbd::ImageWatcher: 0x7f43e402f9b0 C_ResponseMessage: r=0
2021-09-16T18:53:49.609+0800 7f4405ffb700 10 librbd::Watcher::C_NotifyAck 0x7f43e4015920 finish: r=0
2021-09-16T18:53:49.610+0800 7f44067fc700 10 librbd::ImageWatcher: 0x7f43e4011280 C_ResponseMessage: r=0
2021-09-16T18:53:49.610+0800 7f44067fc700 10 librbd::Watcher::C_NotifyAck 0x7f43e4014fb0 finish: r=0

# IO still hungs ...

^C2021-09-16T18:54:27.856+0800 7f43ecff9700 -1 received  signal: Interrupt, si_code : 128, si_value (int): 257, si_value (ptr): 0x101, si_errno: 0, si_pid : 0, si_uid : 0, si_addr0, si_status257
^C
$

Actions #1

Updated by Ilya Dryomov over 2 years ago

After the first "rbd image-meta set" command, client A clearly picked up the header update notification (send_v2_get_mutable_metadata, send_v2_get_parent, send_v2_get_metadata, etc calls). After the second "rbd image-meta set" command, the excerpt ends at notify_header_update. Did you wait for the header updated notification to be acted on or is it never picked up?

Actions #2

Updated by wb song over 2 years ago

Ilya Dryomov wrote:

After the first "rbd image-meta set" command, client A clearly picked up the header update notification (send_v2_get_mutable_metadata, send_v2_get_parent, send_v2_get_metadata, etc calls). After the second "rbd image-meta set" command, the excerpt ends at notify_header_update. Did you wait for the header updated notification to be acted on or is it never picked up?

After the second setup, I waited 38 seconds and found that client A never picked up the header update notification. Should I wait longer? In fact, I sometimes waited more than ten minutes and still didn't receive any update notification.
Thanks.

Actions #3

Updated by Ilya Dryomov over 2 years ago

It could be that the header update notification gets "behind" all the pending I/O. You set an extremely low limit (just 10 bytes per second), so it could be related to that.

I would retest with a more realistic limit, for example 1M per second and wait long enough for all pending I/O to get through given that limit. This should tell whether it is some sort of ordering/prioritization issue or a permanent lockup.

Actions #4

Updated by wb song over 2 years ago

Ilya Dryomov wrote:

This should tell whether it is some sort of ordering/prioritization issue or a permanent lockup.

It is some sort of ordering/prioritization issue.

I think the header update notification is a control plane message with a higher priority and should not be ranked behind pending I/O. What do you think?

Actions #5

Updated by Ilya Dryomov over 2 years ago

wb song wrote:

It is some sort of ordering/prioritization issue.

So it doesn't always get "stuck"? Can you describe what you tried and your findings in more detail?

I think the header update notification is a control plane message with a higher priority and should not be ranked behind pending I/O. What do you think?

I agree but it could be that the new limit is applied to future I/O while the existing in-flight I/O that has already went through the QOS layer is processed according to the old limit. If the old limit is extremely low it can take a long time and may feel like a hang.

Actions #6

Updated by wb song over 2 years ago

Ilya Dryomov wrote:

wb song wrote:

It is some sort of ordering/prioritization issue.

So it doesn't always get "stuck"? Can you describe what you tried and your findings in more detail?

I tried to set the bps to 4096, then change it to 10, and remove bps after a while. At this time, it seemed that the IO was hung up. After about 7 minutes, the IO returned to normal.

Actions #7

Updated by Ilya Dryomov over 2 years ago

Right, so I don't think there is a bug here. The update just doesn't affect the I/O that is already in flight (read: has been throttled and contained in the QOS queue).

Actions #8

Updated by wb song over 2 years ago

Ilya Dryomov wrote:

Right, so I don't think there is a bug here. The update just doesn't affect the I/O that is already in flight (read: has been throttled and contained in the QOS queue).

Got it, please close it. Thanks.

Actions #9

Updated by Ilya Dryomov over 2 years ago

  • Status changed from New to Won't Fix
  • Assignee set to Ilya Dryomov
Actions #10

Updated by Ilya Dryomov over 2 years ago

  • Subject changed from [qos] IO cannot return to normal even though qos is removed to IO cannot return to normal even though qos is removed
Actions

Also available in: Atom PDF