Bug #22271

vdbench's IO drops to 0 when resizing the image at the same time

Added by wb song over 6 years ago. Updated over 6 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description


[root~ ]# rbd create test1 -s 20G --image-feature exclusive-lock
[root~ ]# rbd-nbd map test1
/dev/nbd0
[root~ ]# 

Use Vdbench to write to /dev/nbd0, and on another client execute rbd resize test1 -s 50G; Vdbench's IOPS then drop to 0 and the resize hangs as well. The Vdbench write script is as follows (a two-client reproduction sketch follows the script):

hd=default,vdbench=/home/vdbench,user=root,shell=ssh
hd=hd1,system=localhost

sd=sd1,host=hd1,lun=/dev/nbd0,openflags=o_direct,thread=64

wd=wd1,sd=sd*,seekpct=100,rdpct=0,xfersize=4K
rd=rd1,wd=wd1,iorate=max,elapse=600,maxdata=2T,interval=1,warmup=30
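The two-client reproduction is essentially the following; a minimal sketch, assuming both clients talk to the same cluster and the image lives in the default pool (prompts and host roles are placeholders):

# client A: start the random-write job against the mapped device
[root~ ]# ./vdbench -f 1M_rw_nbd

# client B: resize the image while the job is still writing
[root~ ]# rbd resize test1 -s 50G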

It appears only when the image's exclusive-lock feature is enabled and the Vdbench thread count is > 1. My environment is the luminous branch (12.2.1-830-gecec659), where it is easy to reproduce; it works fine on the latest master branch.
I also tried to reproduce it with rbd bench, fio and dd, but all of them succeeded. I don't understand why only Vdbench hits this. Maybe there is a potential deadlock?
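Roughly equivalent 4K direct random-write workloads for those attempts would be, for example (the exact options below are illustrative, not copied from my runs):

[root~ ]# rbd bench --io-type write --io-size 4096 --io-threads 64 --io-pattern rand test1
[root~ ]# fio --name=nbd-randwrite --filename=/dev/nbd0 --direct=1 --ioengine=libaio --rw=randwrite --bs=4k --iodepth=64 --runtime=600 --time_based
[root~ ]# dd if=/dev/zero of=/dev/nbd0 bs=4k count=1000000 oflag=direct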

[root~ ]# ./vdbench -f 1M_rw_nbd 

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Vdbench distribution: vdbench Wed July 20 15:49:52 MDT 2016
For documentation, see 'vdbench.pdf'.

17:05:36.294 input argument scanned: '-f1M_rw_nbd'
17:05:36.379 Starting slave: /home/vdbench/vdbench SlaveJvm -m 192.175.10.234 -n localhost-10-171129-17.05.36.252 -l hd1-0 -p 5570   
17:05:36.742 All slaves are now connected
17:05:38.003 Starting RD=rd1; I/O rate: Uncontrolled MAX; elapsed=600 warmup=30; For loops: None

Nov 29, 2017  interval        i/o   MB/sec   bytes   read     resp     read    write     resp     resp queue  cpu%  cpu%
                             rate  1024**2     i/o    pct     time     resp     resp      max   stddev depth sys+u   sys
17:05:39.055         1      33.00     0.13    4096   0.00   41.067    0.000   41.067   98.463   25.709   1.4  25.0   4.1
17:05:40.049         2      52.00     0.20    4096   0.00   37.399    0.000   37.399   93.148   20.634   2.0  22.0   5.1
17:05:41.049         3      40.00     0.16    4096   0.00   48.861    0.000   48.861  105.764   24.070   2.0  40.8   8.9
17:05:42.048         4       0.00     0.00       0   0.00    0.000    0.000    0.000    0.000    0.000   2.0  35.5   6.9
17:05:43.048         5       0.00     0.00       0   0.00    0.000    0.000    0.000    0.000    0.000   2.0  32.5   7.3
17:05:43.393 
17:05:43.393 Message from slave hd1-0: 
17:05:43.393 New messages found on /var/adm/messages. Do they belong to you?
17:05:43.393 /var/log/messages: Nov 29 17:05:40 s247 ceph-mgr[7365]: ::ffff:192.10.10.247 - - [29/Nov/2017:17:05:40] "GET /toplevel_data HTTP/1.1" 200 201 "" "Python-urllib/2.7" 
17:05:43.393 /var/log/messages: Nov 29 17:05:41 s247 ceph-mgr[7365]: ::ffff:192.10.10.247 - - [29/Nov/2017:17:05:41] "GET /rbd_pool_data/mirror HTTP/1.1" 200 304 "" "Python-urllib/2.7" 
17:05:43.393 /var/log/messages: Nov 29 17:05:41 s247 ceph-mgr[7365]: ::ffff:192.10.10.247 - - [29/Nov/2017:17:05:41] "GET /rbd_pool_data/rbd HTTP/1.1" 200 2195 "" "Python-urllib/2.7" 
17:05:43.393 
17:05:44.010         6       0.00     0.00       0   0.00    0.000    0.000    0.000    0.000    0.000   2.0  31.7   4.0
17:05:45.049         7       0.00     0.00       0   0.00    0.000    0.000    0.000    0.000    0.000   2.0  26.4   2.9
17:05:46.048         8       0.00     0.00       0   0.00    0.000    0.000    0.000    0.000    0.000   2.0  25.9   3.0
17:05:47.050         9       0.00     0.00       0   0.00    0.000    0.000    0.000    0.000    0.000   2.0  22.7   4.3
17:05:48.050        10       0.00     0.00       0   0.00    0.000    0.000    0.000    0.000    0.000   2.0  17.0   3.2

History

#1 Updated by Mykola Golub over 6 years ago

Recently Jason fixed a deadlock triggered in rbd-nbd by a resize event [1]. Do you have a chance to try librbd from the recent master, or to rebuild with this patch [2] applied?

[1] http://tracker.ceph.com/issues/22131
[2] https://github.com/ceph/ceph/pull/18947/commits/6a335481d20c6a765c84d561a01fb52172eccba4

#2 Updated by Mykola Golub over 6 years ago

  • Status changed from New to Need More Info

#3 Updated by wb song over 6 years ago

Ah, I thought luminous had already backported this patch, but actually the backport is still in the open state.
I rebuilt luminous with this patch [1] applied, and it works fine after many test runs.
Please close this issue, thank you so much.
[1] https://github.com/ceph/ceph/pull/18947/commits/6a335481d20c6a765c84d561a01fb52172eccba4
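For reference, the rebuild amounted to cherry-picking that commit onto luminous and rebuilding; a rough sketch of the steps (the build target and flags are assumptions, not the exact commands I ran):

[root~ ]# git clone https://github.com/ceph/ceph.git && cd ceph
[root~ ]# git checkout luminous && git submodule update --init --recursive
[root~ ]# git cherry-pick 6a335481d20c6a765c84d561a01fb52172eccba4
[root~ ]# ./do_cmake.sh && cd build
[root~ ]# make -j$(nproc) rbd-nbd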

#4 Updated by Jason Dillaman over 6 years ago

  • Status changed from Need More Info to Duplicate
