Bug #36626

Couldn't rewatch after the network was blocked and the client was blacklisted

Added by wb song 10 months ago. Updated 4 months ago.

Status: Need More Info
Priority: Normal
Assignee: -
Target version:
Start date: 10/30/2018
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

ceph version: v12.2.9

Steps to reproduce:

1. Map an image on client A

[root@client_A ~]# rbd nbd map pool_test/image_test
/dev/nbd0

2. Map the same image on client B
[root@client_B ~]# rbd nbd map pool_test/image_test
/dev/nbd0

3. Write data on client B and press Ctrl+C to interrupt (client B now owns the lock)
[root@client_B ~]# dd if=/dev/zero of=/dev/nbd0 bs=4K count=100000 oflag=direct
^C314+0 records in
314+0 records out
1286144 bytes (1.3 MB) copied, 3.87275 s, 332 kB/s

4. Take down client B's public network
[root@client_B ~]# ifconfig enp3s0f0 down;sleep 40;ifconfig enp3s0f0 up;

5. Write data on client A before client B's network comes back up (client B gets blacklisted)
[root@client_A ~]# dd if=/dev/zero of=/dev/nbd0 bs=4K count=100 oflag=direct
100+0 records in
100+0 records out
409600 bytes (410 kB) copied, 0.000626715 s, 654 MB/s

6. Write data on client B
[root@client_B ~]# dd if=/dev/zero of=/dev/nbd0 bs=4K count=1000 oflag=direct        // hangs
^C^C

client-rbd-337165.zip (546 KB) wb song, 04/12/2019 09:08 AM

History

#1 Updated by Jason Dillaman 4 months ago

  • Status changed from New to Need More Info

Can you provide any logs from step 6? I would expect that all the IOs would be failing w/ EBLACKLISTED. Perhaps it's just the cache continuously retrying upon failure? If you disable the cache via the "rbd cache = false" config option, does it still occur?
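For anyone reproducing this, the cache can be disabled for the test in the client's ceph.conf (a minimal sketch; the section name and space-separated option spelling follow the Luminous-era config style — on newer releases the underscore form `rbd_cache` also works):

```ini
[client]
rbd cache = false
```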

#2 Updated by wb song 4 months ago

It still occurs even when rbd_cache is set to false.
The log file is a bit big, so I deleted some content.

2019-04-12 16:08:24.114529 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 send_acquire_lock
2019-04-12 16:08:24.114531 7f8c4effd700 10 librbd::ImageWatcher: 0x7f8c4001b900 image header updated
2019-04-12 16:08:24.114532 7f8c4effd700 20 librbd::ImageState: 0x7f8c7b007660 handle_update_notification: refresh_seq = 311126, last_refresh = 311124
2019-04-12 16:08:24.114534 7f8c4effd700 20 librbd::ImageState: 0x7f8c7b0126c0 ImageUpdateWatchers::notify
2019-04-12 16:08:24.114535 7f8c4effd700 20 librbd::ImageState: 0x7f8c7b0126c0 ImageUpdateWatchers::send_notify: handle=0, watcher=0x7ffe29247c60
2019-04-12 16:08:24.114538 7f8c4effd700 10 librbd::Watcher: 0x7f8c4001b900 rewatch: 
2019-04-12 16:08:24.114540 7f8c4effd700 10 librbd::watcher::RewatchRequest: 0x7f8c480471d0 rewatch 
2019-04-12 16:08:24.114540 7f8c4cff9700 20 librbd::ImageState: 0x7f8c7b0126c0 ImageUpdateWatchers::handle_notify: handle=0, watcher=0x7ffe29247c60
2019-04-12 16:08:24.114543 7f8c4cff9700 20 librbd: info 0x7f8c7b011bc0
2019-04-12 16:08:24.114545 7f8c4cff9700 10 librbd::ImageState: 0x7f8c7b007660 0x7f8c7b007660 send_refresh_unlock
2019-04-12 16:08:24.114546 7f8c4cff9700 10 librbd::image::RefreshRequest: 0x7f8c44061810 send_v2_get_mutable_metadata
2019-04-12 16:08:24.114865 7f8c4f7fe700 10 librbd::watcher::RewatchRequest: 0x7f8c480471d0 handle_rewatch r=-108
2019-04-12 16:08:24.114869 7f8c4f7fe700 10 librbd::watcher::RewatchRequest: 0x7f8c480471d0 finish r=-108
2019-04-12 16:08:24.114870 7f8c4f7fe700 10 librbd::Watcher: 0x7f8c4001b900 handle_rewatch: r=-108
2019-04-12 16:08:24.114877 7f8c4effd700 10 librbd::Watcher: 0x7f8c4001b900 handle_rewatch_callback: r=-108
2019-04-12 16:08:24.114879 7f8c4effd700 10 librbd::ImageWatcher: 0x7f8c4001b900 handle_rewatch_complete: r=-108
2019-04-12 16:08:24.114880 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 reacquire_lock: woke up waiting acquire
2019-04-12 16:08:24.114881 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 send_acquire_lock
2019-04-12 16:08:24.114883 7f8c4effd700 10 librbd::ImageWatcher: 0x7f8c4001b900 image header updated
2019-04-12 16:08:24.114884 7f8c4effd700 20 librbd::ImageState: 0x7f8c7b007660 handle_update_notification: refresh_seq = 311127, last_refresh = 311124
2019-04-12 16:08:24.114886 7f8c4effd700 10 librbd::Watcher: 0x7f8c4001b900 rewatch: 
2019-04-12 16:08:24.114887 7f8c4effd700 10 librbd::watcher::RewatchRequest: 0x7f8c480c6660 rewatch 
2019-04-12 16:08:24.114931 7f8c4f7fe700 10 librbd::image::RefreshRequest: 0x7f8c44061810 handle_v2_get_mutable_metadata: r=-108
2019-04-12 16:08:24.114939 7f8c4effd700 10 librbd::ImageState: 0x7f8c7b007660 0x7f8c7b007660 handle_refresh: r=-108
2019-04-12 16:08:24.115164 7f8c4f7fe700 10 librbd::watcher::RewatchRequest: 0x7f8c480c6660 handle_rewatch r=-108
2019-04-12 16:08:24.115166 7f8c4f7fe700 10 librbd::watcher::RewatchRequest: 0x7f8c480c6660 finish r=-108
2019-04-12 16:08:24.115168 7f8c4f7fe700 10 librbd::Watcher: 0x7f8c4001b900 handle_rewatch: r=-108
2019-04-12 16:08:24.115174 7f8c4effd700 10 librbd::Watcher: 0x7f8c4001b900 handle_rewatch_callback: r=-108
2019-04-12 16:08:24.115175 7f8c4effd700 10 librbd::ImageWatcher: 0x7f8c4001b900 handle_rewatch_complete: r=-108
2019-04-12 16:08:24.115176 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 reacquire_lock: woke up waiting acquire
2019-04-12 16:08:24.115178 7f8c4effd700 10 librbd::ManagedLock: 0x7f8c4005e3d0 send_acquire_lock
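For reference, the repeated `r=-108` above maps to `ESHUTDOWN` on Linux, which Ceph reuses as its `EBLACKLISTED` error code, so the watcher really is hitting the blacklist error on every rewatch attempt. A small sketch to decode it:

```python
import errno
import os

# errno 108 is ESHUTDOWN on Linux; Ceph defines EBLACKLISTED as an
# alias for ESHUTDOWN, so "r=-108" in the librbd log is the blacklist error.
code = 108
print(errno.errorcode[code])  # -> ESHUTDOWN (on Linux)
print(os.strerror(code))      # human-readable message
```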

#3 Updated by Jason Dillaman 4 months ago

The provided log doesn't show any attempt to write IO while the client is blacklisted. Was that part just snipped out of the log prior to 16:08:24?
