Bug #19897

closed

rbd may hang at 99% when removing a clone image

Added by Tang Jin almost 7 years ago. Updated over 6 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Prerequisite: rbd_op_threads is 3 and rbd_cache is disabled.

When rbd removes a clone image, the rbd command can hang at 99%.

When rbd removes a clone image, it first deletes the parent ImageCtx, which includes deleting that ImageCtx's op_work_queue. With 3 op threads, this teardown waits until the ThreadPool has finished all of its jobs.
The problem is that this drain is called from ThreadPool::worker context itself: the worker performing the drain is one of the jobs being waited on, so the ThreadPool can never finish.

Actions #1

Updated by Tang Jin almost 7 years ago

These are the logs with tp debug logging set to 15:

root@node1:jintang$ rbd rm test_pool/test_child
2017-05-10 10:37:43.889802 7f381283ed40 10 librbd::thread_pool start
2017-05-10 10:37:43.889807 7f381283ed40 10 librbd::thread_pool registering config observer on rbd_op_threads
2017-05-10 10:37:43.889812 7f381283ed40 10 librbd::thread_pool start_threads creating and starting 0x561767abe700
2017-05-10 10:37:43.889874 7f381283ed40 10 librbd::thread_pool start_threads creating and starting 0x561767af6880
2017-05-10 10:37:43.890051 7f381283ed40 10 librbd::thread_pool start_threads creating and starting 0x561767af6b50
2017-05-10 10:37:43.890074 7f381283ed40 15 librbd::thread_pool started
2017-05-10 10:37:43.890099 7f37ef7fe700 10 librbd::thread_pool worker start
2017-05-10 10:37:43.890129 7f37effff700 10 librbd::thread_pool worker start
2017-05-10 10:37:43.890148 7f37f4e05700 10 librbd::thread_pool worker start
2017-05-10 10:37:43.914399 7f37ef7fe700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37d8003920 (1 active)
2017-05-10 10:37:43.914422 7f37ef7fe700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37d8003920 (0 active)
2017-05-10 10:37:43.914425 7f37ef7fe700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37e4000fd0 (1 active)
2017-05-10 10:37:43.914434 7f37f4e05700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37d8003ad0 (2 active)
2017-05-10 10:37:43.914440 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37d80032d0 (3 active)
2017-05-10 10:37:43.914446 7f37ef7fe700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37e4000fd0 (2 active)
2017-05-10 10:37:43.914452 7f37f4e05700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37d8003ad0 (1 active)
2017-05-10 10:37:43.914466 7f37ef7fe700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc000df0 (2 active)
2017-05-10 10:37:43.914471 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37d80032d0 (1 active)
2017-05-10 10:37:43.914525 7f37ef7fe700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc000df0 (0 active)
2017-05-10 10:37:43.914865 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37d800aa00 (1 active)
2017-05-10 10:37:43.914873 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37d800aa00 (0 active)
2017-05-10 10:37:43.914873 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc000d80 (1 active)
2017-05-10 10:37:43.914880 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc000d80 (0 active)
2017-05-10 10:37:43.914881 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37d8002040 (1 active)
2017-05-10 10:37:43.914885 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37d8002040 (0 active)
2017-05-10 10:37:43.914885 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x561767af7370 (1 active)
2017-05-10 10:37:43.914907 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x561767af7370 (0 active)
2017-05-10 10:37:43.914978 7f37ef7fe700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x561767af78f0 (1 active)
2017-05-10 10:37:43.914988 7f37ef7fe700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x561767af78f0 (0 active)
2017-05-10 10:37:43.914989 7f37ef7fe700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37e40018e0 (1 active)
2017-05-10 10:37:43.915044 7f37ef7fe700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37e40018e0 (0 active)
2017-05-10 10:37:43.933738 7f37f4e05700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x561767af7350 (1 active)
2017-05-10 10:37:43.934303 7f37f4e05700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x561767af7350 (0 active)
Removing image: 99% complete...
2017-05-10 10:37:43.937271 7f37ef7fe700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x561767af78f0 (1 active)
2017-05-10 10:37:43.937304 7f37ef7fe700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x561767af78f0 (0 active)
2017-05-10 10:37:43.937305 7f37ef7fe700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37e4002df0 (1 active)
2017-05-10 10:37:43.937316 7f37ef7fe700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37e4002df0 (0 active)
2017-05-10 10:37:43.946777 7f37f4e05700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x561767a65360 (1 active)
2017-05-10 10:37:43.946845 7f37f4e05700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x561767a65360 (0 active)
2017-05-10 10:37:43.961223 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37e0003b60 (1 active)
2017-05-10 10:37:43.961230 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37e0003b60 (0 active)
2017-05-10 10:37:43.961231 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc000e40 (1 active)
2017-05-10 10:37:43.961239 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc000e40 (0 active)
2017-05-10 10:37:43.961239 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc001c30 (1 active)
2017-05-10 10:37:43.961264 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc001c30 (0 active)
2017-05-10 10:37:43.961265 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc001f00 (1 active)
2017-05-10 10:37:43.961286 7f37f4e05700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc001f50 (2 active)
2017-05-10 10:37:43.961296 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc001f00 (1 active)
2017-05-10 10:37:43.961339 7f37f4e05700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc001f50 (0 active)
2017-05-10 10:37:43.980598 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc001c50 (1 active)
2017-05-10 10:37:43.980657 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc001c50 (0 active)
2017-05-10 10:37:43.980659 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc000dd0 (1 active)
2017-05-10 10:37:43.980663 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc000dd0 (0 active)
2017-05-10 10:37:43.980683 7f37effff700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc000df0 (1 active)
2017-05-10 10:37:43.980694 7f37ef7fe700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc000d80 (2 active)
2017-05-10 10:37:43.980726 7f37f4e05700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37e4001860 (3 active)
2017-05-10 10:37:43.980736 7f37f4e05700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37e4001860 (2 active)
2017-05-10 10:37:43.980738 7f37f4e05700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37e4002df0 (3 active)
2017-05-10 10:37:43.980742 7f37f4e05700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37e4002df0 (2 active)
2017-05-10 10:37:43.980743 7f37f4e05700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37e0003ee0 (3 active)
2017-05-10 10:37:43.980747 7f37ef7fe700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc000d80 (2 active)
2017-05-10 10:37:43.980755 7f37ef7fe700 12 tp_librbd worker wq librbd::op_work_queue start processing 0x7f37dc001f50 (3 active)
2017-05-10 10:37:43.980760 7f37ef7fe700 10 librbd::thread_pool drain
2017-05-10 10:37:43.980760 7f37f4e05700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37e0003ee0 (2 active)
2017-05-10 10:37:43.980762 7f37effff700 15 tp_librbd worker wq librbd::op_work_queue done processing 0x7f37dc000df0 (1 active)

Actions #3

Updated by Nathan Cutler almost 7 years ago

  • Project changed from Ceph to rbd
  • Category deleted (librbd)
Actions #4

Updated by Nathan Cutler almost 7 years ago

  • Status changed from New to Fix Under Review
Actions #5

Updated by Jason Dillaman almost 7 years ago

  • Priority changed from High to Normal
  • Severity changed from 2 - major to 3 - minor

Note: op work threads are currently hard-coded to 1.

Actions #6

Updated by Jason Dillaman over 6 years ago

  • Status changed from Fix Under Review to Duplicate

Multithreading issues are being tracked under #17379.
