Bug #22271
vdbench's IO drops to 0 when resizing the image at the same time
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
[root~ ]# rbd create test1 -s 20G --image-feature exclusive-lock
[root~ ]# rbd-nbd map test1
/dev/nbd0
[root~ ]#
Use Vdbench to write to /dev/nbd0 and, on another client, execute
rbd resize test1 -s 50G
Vdbench's IOPS then drop to 0 and the resize
hangs as well. The Vdbench write script is as follows:
hd=default,vdbench=/home/vdbench,user=root,shell=ssh
hd=hd1,system=localhost
sd=sd1,host=hd1,lun=/dev/nbd0,openflags=o_direct,thread=64
wd=wd1,sd=sd*,seekpct=100,rdpct=0,xfersize=4K
rd=rd1,wd=wd1,iorate=max,elapse=600,maxdata=2T,interval=1,warmup=30
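For reference, the whole reproduction sequence above can be collapsed into a single shell session (the image name, size, and vdbench path follow the transcript; running vdbench in the background is an assumption about how the two steps were interleaved). These commands require a running Ceph cluster:

```shell
# Create an image with only the exclusive-lock feature and map it via NBD
rbd create test1 -s 20G --image-feature exclusive-lock
rbd-nbd map test1                   # prints the device, e.g. /dev/nbd0

# Start the 4K random-write workload against the NBD device
/home/vdbench/vdbench -f 1M_rw_nbd &

# While vdbench is writing, resize the image from another client
rbd resize test1 -s 50G             # hangs, and vdbench IOPS drop to 0
```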
It appears only when the image has exclusive-lock enabled and the Vdbench thread count is > 1. My environment is the luminous branch (12.2.1-830-gecec659), where it is easy to reproduce; it works fine on the latest master branch.
Also, I tried to reproduce it with
rbd bench
, fio, and dd, but all of them succeeded. I don't understand why only Vdbench can hit this. Maybe there is a potential deadlock?
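For comparison, the alternative reproducers mentioned above could look roughly like this (the exact fio/dd parameters are assumptions, chosen to mimic the vdbench profile: 4K random O_DIRECT writes with high concurrency); none of them triggered the hang:

```shell
# librbd-level benchmark, bypassing the NBD device entirely
rbd bench --io-type write --io-size 4K --io-threads 64 --io-pattern rand test1

# fio against the mapped NBD device, similar to the vdbench workload
fio --name=nbdtest --filename=/dev/nbd0 --rw=randwrite --bs=4K \
    --direct=1 --iodepth=64 --ioengine=libaio --runtime=600 --time_based

# simple dd writes with O_DIRECT
dd if=/dev/zero of=/dev/nbd0 bs=4K count=100000 oflag=direct
```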
[root~ ]# ./vdbench -f 1M_rw_nbd
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Vdbench distribution: vdbench Wed July 20 15:49:52 MDT 2016
For documentation, see 'vdbench.pdf'.
17:05:36.294 input argument scanned: '-f1M_rw_nbd'
17:05:36.379 Starting slave: /home/vdbench/vdbench SlaveJvm -m 192.175.10.234 -n localhost-10-171129-17.05.36.252 -l hd1-0 -p 5570
17:05:36.742 All slaves are now connected
17:05:38.003 Starting RD=rd1; I/O rate: Uncontrolled MAX; elapsed=600 warmup=30; For loops: None
Nov 29, 2017 interval i/o MB/sec bytes read resp read write resp resp queue cpu% cpu%
rate 1024**2 i/o pct time resp resp max stddev depth sys+u sys
17:05:39.055 1 33.00 0.13 4096 0.00 41.067 0.000 41.067 98.463 25.709 1.4 25.0 4.1
17:05:40.049 2 52.00 0.20 4096 0.00 37.399 0.000 37.399 93.148 20.634 2.0 22.0 5.1
17:05:41.049 3 40.00 0.16 4096 0.00 48.861 0.000 48.861 105.764 24.070 2.0 40.8 8.9
17:05:42.048 4 0.00 0.00 0 0.00 0.000 0.000 0.000 0.000 0.000 2.0 35.5 6.9
17:05:43.048 5 0.00 0.00 0 0.00 0.000 0.000 0.000 0.000 0.000 2.0 32.5 7.3
17:05:43.393
17:05:43.393 Message from slave hd1-0:
17:05:43.393 New messages found on /var/adm/messages. Do they belong to you?
17:05:43.393 /var/log/messages: Nov 29 17:05:40 s247 ceph-mgr[7365]: ::ffff:192.10.10.247 - - [29/Nov/2017:17:05:40] "GET /toplevel_data HTTP/1.1" 200 201 "" "Python-urllib/2.7"
17:05:43.393 /var/log/messages: Nov 29 17:05:41 s247 ceph-mgr[7365]: ::ffff:192.10.10.247 - - [29/Nov/2017:17:05:41] "GET /rbd_pool_data/mirror HTTP/1.1" 200 304 "" "Python-urllib/2.7"
17:05:43.393 /var/log/messages: Nov 29 17:05:41 s247 ceph-mgr[7365]: ::ffff:192.10.10.247 - - [29/Nov/2017:17:05:41] "GET /rbd_pool_data/rbd HTTP/1.1" 200 2195 "" "Python-urllib/2.7"
17:05:43.393
17:05:44.010 6 0.00 0.00 0 0.00 0.000 0.000 0.000 0.000 0.000 2.0 31.7 4.0
17:05:45.049 7 0.00 0.00 0 0.00 0.000 0.000 0.000 0.000 0.000 2.0 26.4 2.9
17:05:46.048 8 0.00 0.00 0 0.00 0.000 0.000 0.000 0.000 0.000 2.0 25.9 3.0
17:05:47.050 9 0.00 0.00 0 0.00 0.000 0.000 0.000 0.000 0.000 2.0 22.7 4.3
17:05:48.050 10 0.00 0.00 0 0.00 0.000 0.000 0.000 0.000 0.000 2.0 17.0 3.2
History
#1 Updated by Mykola Golub over 6 years ago
Recently Jason fixed a deadlock triggered in rbd-nbd by a resize event [1]. Do you have a chance to try librbd from recent master, or to rebuild with this patch [2] applied?
[1] http://tracker.ceph.com/issues/22131
[2] https://github.com/ceph/ceph/pull/18947/commits/6a335481d20c6a765c84d561a01fb52172eccba4
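The rebuild suggested above could be sketched as cherry-picking the commit from [2] onto a luminous checkout (the branch name and build steps are assumptions about a standard Ceph source tree):

```shell
git clone https://github.com/ceph/ceph.git && cd ceph
git checkout luminous
# Apply the deadlock fix referenced in [2]
git cherry-pick 6a335481d20c6a765c84d561a01fb52172eccba4
# Configure and rebuild the affected components
./do_cmake.sh && cd build && make -j$(nproc) rbd-nbd
```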
#2 Updated by Mykola Golub over 6 years ago
- Status changed from New to Need More Info
#3 Updated by wb song over 6 years ago
Ah, I thought luminous had already backported this patch, but the backport is actually still in the open state.
I rebuilt luminous with this patch [1] applied and it works fine after many test runs.
Please close this issue, thank you very much.
[1] https://github.com/ceph/ceph/pull/18947/commits/6a335481d20c6a765c84d561a01fb52172eccba4
#4 Updated by Jason Dillaman over 6 years ago
- Status changed from Need More Info to Duplicate