Bug #36626
couldn't rewatch after network was blocked and client blacklisted
% Done: 0%
Regression: No
Severity: 3 - minor
Description
ceph version: v12.2.9
Steps to reproduce:
1. Map the image on client A:
[root@client_A ~]# rbd nbd map pool_test/image_test
/dev/nbd0
2. Map the same image on client B:
[root@client_B ~]# rbd nbd map pool_test/image_test
/dev/nbd0
3. Write data on client B and press Ctrl+C to interrupt (client B now owns the exclusive lock):
[root@client_B ~]# dd if=/dev/zero of=/dev/nbd0 bs=4K count=100000 oflag=direct
^C
314+0 records in
314+0 records out
1286144 bytes (1.3 MB) copied, 3.87275 s, 332 kB/s
4. Take client B's public network down for 40 seconds:
[root@client_B ~]# ifconfig enp3s0f0 down; sleep 40; ifconfig enp3s0f0 up
5. Write data on client A before client B's network comes back up (client B gets blacklisted; see the check after these steps):
[root@client_A ~]# dd if=/dev/zero of=/dev/nbd0 bs=4K count=100 oflag=direct
100+0 records in
100+0 records out
409600 bytes (410 kB) copied, 0.000626715 s, 654 MB/s
6. Write data on client B after its network is back up; the write hangs:
[root@client_B ~]# dd if=/dev/zero of=/dev/nbd0 bs=4K count=1000 oflag=direct    // hung, ^C^C has no effect
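To confirm that client B really was blacklisted in step 5, the OSD blacklist can be listed from any node with an admin keyring (a sketch; the address, nonce, and expiry below are hypothetical, the real entry should show client B's public address):

[root@client_A ~]# ceph osd blacklist ls
listed 1 entries
192.168.0.2:0/2966168679 2019-04-12 16:38:24.000000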
History
#1 Updated by Jason Dillaman over 4 years ago
- Status changed from New to Need More Info
Can you provide any logs from step 6? I would expect that all the IOs would be failing w/ EBLACKLISTED. Perhaps it's just the cache continuously retrying upon failure? If you disable the cache via the "rbd cache = false" config option, does it still occur?
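For reference, that option can be set in the client's ceph.conf (a minimal snippet, assuming the default [client] section; the image must be unmapped and remapped for the change to take effect):

[client]
rbd cache = false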
#2 Updated by wb song over 4 years ago
- File client-rbd-337165.zip added
It still occurs even when rbd_cache is false.
The log file is a bit big, so I deleted some content.
2019-04-12 16:08:24.114529 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 send_acquire_lock
2019-04-12 16:08:24.114531 7f8c4effd700 10 librbd::ImageWatcher: 0x7f8c4001b900 image header updated
2019-04-12 16:08:24.114532 7f8c4effd700 20 librbd::ImageState: 0x7f8c7b007660 handle_update_notification: refresh_seq = 311126, last_refresh = 311124
2019-04-12 16:08:24.114534 7f8c4effd700 20 librbd::ImageState: 0x7f8c7b0126c0 ImageUpdateWatchers::notify
2019-04-12 16:08:24.114535 7f8c4effd700 20 librbd::ImageState: 0x7f8c7b0126c0 ImageUpdateWatchers::send_notify: handle=0, watcher=0x7ffe29247c60
2019-04-12 16:08:24.114538 7f8c4effd700 10 librbd::Watcher: 0x7f8c4001b900 rewatch:
2019-04-12 16:08:24.114540 7f8c4effd700 10 librbd::watcher::RewatchRequest: 0x7f8c480471d0 rewatch
2019-04-12 16:08:24.114540 7f8c4cff9700 20 librbd::ImageState: 0x7f8c7b0126c0 ImageUpdateWatchers::handle_notify: handle=0, watcher=0x7ffe29247c60
2019-04-12 16:08:24.114543 7f8c4cff9700 20 librbd: info 0x7f8c7b011bc0
2019-04-12 16:08:24.114545 7f8c4cff9700 10 librbd::ImageState: 0x7f8c7b007660 0x7f8c7b007660 send_refresh_unlock
2019-04-12 16:08:24.114546 7f8c4cff9700 10 librbd::image::RefreshRequest: 0x7f8c44061810 send_v2_get_mutable_metadata
2019-04-12 16:08:24.114865 7f8c4f7fe700 10 librbd::watcher::RewatchRequest: 0x7f8c480471d0 handle_rewatch r=-108
2019-04-12 16:08:24.114869 7f8c4f7fe700 10 librbd::watcher::RewatchRequest: 0x7f8c480471d0 finish r=-108
2019-04-12 16:08:24.114870 7f8c4f7fe700 10 librbd::Watcher: 0x7f8c4001b900 handle_rewatch: r=-108
2019-04-12 16:08:24.114877 7f8c4effd700 10 librbd::Watcher: 0x7f8c4001b900 handle_rewatch_callback: r=-108
2019-04-12 16:08:24.114879 7f8c4effd700 10 librbd::ImageWatcher: 0x7f8c4001b900 handle_rewatch_complete: r=-108
2019-04-12 16:08:24.114880 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 reacquire_lock: woke up waiting acquire
2019-04-12 16:08:24.114881 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 send_acquire_lock
2019-04-12 16:08:24.114883 7f8c4effd700 10 librbd::ImageWatcher: 0x7f8c4001b900 image header updated
2019-04-12 16:08:24.114884 7f8c4effd700 20 librbd::ImageState: 0x7f8c7b007660 handle_update_notification: refresh_seq = 311127, last_refresh = 311124
2019-04-12 16:08:24.114886 7f8c4effd700 10 librbd::Watcher: 0x7f8c4001b900 rewatch:
2019-04-12 16:08:24.114887 7f8c4effd700 10 librbd::watcher::RewatchRequest: 0x7f8c480c6660 rewatch
2019-04-12 16:08:24.114931 7f8c4f7fe700 10 librbd::image::RefreshRequest: 0x7f8c44061810 handle_v2_get_mutable_metadata: r=-108
2019-04-12 16:08:24.114939 7f8c4effd700 10 librbd::ImageState: 0x7f8c7b007660 0x7f8c7b007660 handle_refresh: r=-108
2019-04-12 16:08:24.115164 7f8c4f7fe700 10 librbd::watcher::RewatchRequest: 0x7f8c480c6660 handle_rewatch r=-108
2019-04-12 16:08:24.115166 7f8c4f7fe700 10 librbd::watcher::RewatchRequest: 0x7f8c480c6660 finish r=-108
2019-04-12 16:08:24.115168 7f8c4f7fe700 10 librbd::Watcher: 0x7f8c4001b900 handle_rewatch: r=-108
2019-04-12 16:08:24.115174 7f8c4effd700 10 librbd::Watcher: 0x7f8c4001b900 handle_rewatch_callback: r=-108
2019-04-12 16:08:24.115175 7f8c4effd700 10 librbd::ImageWatcher: 0x7f8c4001b900 handle_rewatch_complete: r=-108
2019-04-12 16:08:24.115176 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 reacquire_lock: woke up waiting acquire
2019-04-12 16:08:24.115178 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 send_acquire_lock
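For reference, r=-108 in the lines above is -ESHUTDOWN, the errno Ceph reuses for EBLACKLISTED, so every rewatch/refresh attempt is failing because the client is still blacklisted, and the cycle send_acquire_lock -> rewatch -> handle_rewatch r=-108 -> reacquire_lock -> send_acquire_lock simply repeats. The errno name can be checked with:

[root@client_B ~]# python -c 'import errno; print(errno.errorcode[108])'
ESHUTDOWN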
#3 Updated by Jason Dillaman over 4 years ago
The provided log doesn't show any attempt to write IO while the client is blacklisted. Was that part just snipped out of the log prior to 16:08:24?