Bug #22957

[bluestore] bstore_kv_final thread seems deadlocked

Added by zhou yang about 6 years ago. Updated about 6 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph 12.2.1
ec overwrite
cephfs performance test

pool 2 'fs_data' erasure size 3 min_size 3 crush_rule 1 object_hash rjenkins pg_num 1 pgp_num 1 last_change 57 flags hashpspool,ec_overwrites stripe_width 8192 fast_read 1 application cephfs
pool 3 'fs_meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 57 flags hashpspool stripe_width 0 application cephfs

[root@node2 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-9 3.00000 root test
-10 1.00000 host test-node1
0 hdd 1.00000 osd.0 up 1.00000 1.00000
-11 1.00000 host test-node2
4 hdd 1.00000 osd.4 up 1.00000 1.00000
-12 1.00000 host test-node3
5 hdd 1.00000 osd.5 up 1.00000 1.00000
-1 9.00000 root default
-3 3.00000 host node1
0 hdd 1.00000 osd.0 up 1.00000 1.00000
3 hdd 1.00000 osd.3 up 1.00000 1.00000
6 hdd 1.00000 osd.6 up 1.00000 1.00000
-4 3.00000 host node2
1 hdd 1.00000 osd.1 up 1.00000 1.00000
4 hdd 1.00000 osd.4 up 1.00000 1.00000
7 hdd 1.00000 osd.7 up 1.00000 1.00000
-5 3.00000 host node3
2 hdd 1.00000 osd.2 up 1.00000 1.00000
5 hdd 1.00000 osd.5 up 1.00000 1.00000
8 hdd 1.00000 osd.8 up 1.00000 1.00000
[root@node2 ~]#

I found that the test suite was blocked. I then put some objects into the fs_data pool, and that I/O blocked as well.
I traced the op and found that the EC sub-op was blocked on osd.1 (op seq: 799599, see attachment).
From the call trace, the bstore_kv_final thread appears to be deadlocked.
This problem has happened twice and is difficult to reproduce.

info thread
Id Target Id Frame
66 Thread 0x7f4a18fd4700 (LWP 13527) "log" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
65 Thread 0x7f4a185c7700 (LWP 13825) "msgr-worker-0" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
64 Thread 0x7f4a17dc6700 (LWP 13826) "msgr-worker-1" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
63 Thread 0x7f4a175c5700 (LWP 13828) "msgr-worker-2" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
62 Thread 0x7f4a16dc4700 (LWP 13831) "msgr-worker-3" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
61 Thread 0x7f4a165c3700 (LWP 13834) "msgr-worker-4" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
60 Thread 0x7f4a15dc2700 (LWP 13837) "msgr-worker-5" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
59 Thread 0x7f4a155c1700 (LWP 13840) "msgr-worker-6" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
58 Thread 0x7f4a14dc0700 (LWP 13843) "msgr-worker-7" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
57 Thread 0x7f4a145bf700 (LWP 13845) "msgr-worker-8" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
56 Thread 0x7f4a13dbe700 (LWP 13849) "msgr-worker-9" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
55 Thread 0x7f4a135bd700 (LWP 13850) "msgr-worker-10" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
54 Thread 0x7f4a12dbc700 (LWP 13853) "msgr-worker-11" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
53 Thread 0x7f4a125bb700 (LWP 13855) "msgr-worker-12" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
52 Thread 0x7f4a11dba700 (LWP 13858) "msgr-worker-13" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
51 Thread 0x7f4a115b9700 (LWP 13861) "msgr-worker-14" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
50 Thread 0x7f4a10db8700 (LWP 13863) "msgr-worker-15" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
49 Thread 0x7f4a0fe40700 (LWP 13893) "service" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
48 Thread 0x7f4a0f63f700 (LWP 13896) "admin_socket" 0x00007f4a19f5ee0d in poll () from /lib64/libc.so.6
47 Thread 0x7f4a0e5f5700 (LWP 14124) "ceph-osd" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
46 Thread 0x7f4a0ddf4700 (LWP 14128) "safe_timer" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
45 Thread 0x7f4a0d5f3700 (LWP 14131) "safe_timer" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
44 Thread 0x7f4a0cdf2700 (LWP 14133) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
43 Thread 0x7f4a0c5f1700 (LWP 14136) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
42 Thread 0x7f4a0bdf0700 (LWP 14139) "bstore_aio" 0x00007f4a1cdab644 in __io_getevents_0_4 () from /lib64/libaio.so.1
41 Thread 0x7f4a0b5ef700 (LWP 14144) "bstore_aio" 0x00007f4a1cdab644 in __io_getevents_0_4 () from /lib64/libaio.so.1
40 Thread 0x7f4a035df700 (LWP 17234) "dfin" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
39 Thread 0x7f4a03de0700 (LWP 17235) "finisher" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
38 Thread 0x7f4a045e1700 (LWP 17236) "bstore_kv_sync" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
37 Thread 0x7f4a04de2700 (LWP 17237) "bstore_kv_final" 0x00007f4a1ae7af7d in __lll_lock_wait () from /lib64/libpthread.so.0
36 Thread 0x7f4a0adee700 (LWP 17238) "bstore_mempool" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
35 Thread 0x7f4a08687700 (LWP 17243) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
34 Thread 0x7f4a07e86700 (LWP 17244) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
33 Thread 0x7f4a07685700 (LWP 17245) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
32 Thread 0x7f4a06e84700 (LWP 17246) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
31 Thread 0x7f4a06683700 (LWP 17247) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
30 Thread 0x7f4a05e82700 (LWP 17248) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
29 Thread 0x7f4a05681700 (LWP 17249) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
28 Thread 0x7f4a02dde700 (LWP 17250) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
27 Thread 0x7f4a025dd700 (LWP 17251) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
26 Thread 0x7f4a01ddc700 (LWP 17252) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
25 Thread 0x7f4a015db700 (LWP 17253) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
24 Thread 0x7f4a00dda700 (LWP 17254) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23 Thread 0x7f4a005d9700 (LWP 17255) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
22 Thread 0x7f49ffdd8700 (LWP 17256) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
21 Thread 0x7f49ff5d7700 (LWP 17258) "safe_timer" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
20 Thread 0x7f49fedd6700 (LWP 17259) "fn_anonymous" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
19 Thread 0x7f49fe5d5700 (LWP 17260) "safe_timer" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
18 Thread 0x7f49fddd4700 (LWP 17261) "tp_peering" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
17 Thread 0x7f49fd5d3700 (LWP 17262) "tp_peering" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
16 Thread 0x7f49fcdd2700 (LWP 17263) "tp_osd_tp" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15 Thread 0x7f49fc5d1700 (LWP 17264) "tp_osd_tp" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
14 Thread 0x7f49fbdd0700 (LWP 17265) "tp_osd_disk" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
13 Thread 0x7f49fb5cf700 (LWP 17266) "tp_osd_cmd" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7f49fadce700 (LWP 17267) "osd_srv_heartbt" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x7f49fa5cd700 (LWP 17268) "fn_anonymous" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x7f49f9dcc700 (LWP 17269) "fn_anonymous" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x7f49f95cb700 (LWP 17270) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7f49f8dca700 (LWP 17271) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7f49f85c9700 (LWP 17272) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7f49f7dc8700 (LWP 17273) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f49f75c7700 (LWP 17274) "osd_srv_agent" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f49f6dc6700 (LWP 17276) "signal_handler" 0x00007f4a19f5ee0d in poll () from /lib64/libc.so.6
3 Thread 0x7f49f65c5700 (LWP 17710) "rocksdb:bg0" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7f49f5dc4700 (LWP 11756) "rocksdb:bg0" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
1 Thread 0x7f4a1d822d00 (LWP 13517) "ceph-osd" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
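
With this many threads, the one stuck on a mutex is easy to miss among the idle condition-variable waiters. The following is a minimal sketch, not part of the original report, that filters saved `info threads` output for threads whose top frame is `__lll_lock_wait`. The helper name `threads_in` and the regular expression are illustrative assumptions, and the pattern only covers the row layout shown above.

```python
import re

# One row of gdb's `info threads` output, in the layout shown in this report:
# <id> Thread <addr> (LWP <lwp>) "<name>" <pc> in <function> ...
ROW = re.compile(
    r'^\s*\*?\s*(\d+)\s+Thread\s+0x[0-9a-f]+\s+\(LWP\s+(\d+)\)'
    r'\s+"([^"]+)"\s+0x[0-9a-f]+\s+in\s+(\S+)'
)


def threads_in(function_prefix, info_threads_text):
    """Return (gdb_id, lwp, name) for threads whose top frame starts with function_prefix."""
    hits = []
    for line in info_threads_text.splitlines():
        m = ROW.match(line)
        if m and m.group(4).startswith(function_prefix):
            hits.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return hits


# Two rows copied from the listing above.
sample = """\
39 Thread 0x7f4a03de0700 (LWP 17235) "finisher" 0x00007f4a1ae78705 in pthread_cond_wait@GLIBC_2.3.2 () from /lib64/libpthread.so.0
37 Thread 0x7f4a04de2700 (LWP 17237) "bstore_kv_final" 0x00007f4a1ae7af7d in __lll_lock_wait () from /lib64/libpthread.so.0
"""

print(threads_in("__lll_lock_wait", sample))
```

In the listing above, thread 37 (bstore_kv_final) is the only row whose top frame is `__lll_lock_wait`; the other rows sit in condition-variable waits, `epoll_wait`, `poll`, or `__io_getevents_0_4`.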

thread 37
[Switching to thread 37 (Thread 0x7f4a04de2700 (LWP 17237))]
#0 0x00007f4a1ae7af7d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) where
#0 0x00007f4a1ae7af7d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f4a1ae76d41 in _L_lock_790 () from /lib64/libpthread.so.0
#2 0x00007f4a1ae76c47 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f4a1e2a3048 in Mutex::Lock(bool) ()
#4 0x00007f4a1e141da1 in BlueStore::_txc_committed_kv(BlueStore::TransContext*) ()
#5 0x00007f4a1e15cff1 in BlueStore::_txc_state_proc(BlueStore::TransContext*) ()
#6 0x00007f4a1e15e960 in BlueStore::_kv_finalize_thread() ()
#7 0x00007f4a1e1b4acd in BlueStore::KVFinalizeThread::entry() ()
#8 0x00007f4a1ae74df3 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f4a19f693dd in clone () from /lib64/libc.so.6
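
This backtrace shows only the thread waiting for the mutex, not the thread holding it, so backtraces of all threads are usually needed to find the other side of a suspected deadlock. The sketch below, which is an illustration rather than something taken from the report, builds a non-interactive gdb invocation for that purpose. The helper name `gdb_backtrace_command` is hypothetical, and using 13517 as the process ID is an assumption based on it being the LWP of thread 1 ("ceph-osd") in the listing above.

```python
import shlex


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, dumps every thread's backtrace, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


# Assumption: 13517 (LWP of thread 1, "ceph-osd") is the OSD's process ID.
print(shlex.join(gdb_backtrace_command(13517)))
```

The printed command can be run as root on the OSD host with its output redirected to a file, which can then be attached to the ticket.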

ec-subop-tracker-blocked.log (78.7 KB) zhou yang, 02/08/2018 07:43 AM


Related issues

Related to RADOS - Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix Resolved 09/20/2017

History

#1 Updated by Nathan Cutler about 6 years ago

  • Tracker changed from Tasks to Bug
  • Project changed from Stable releases to bluestore
  • Regression set to No
  • Severity set to 3 - minor

#2 Updated by Sage Weil about 6 years ago

  • Status changed from New to Duplicate

I'm pretty sure this is #21470, fixed in 12.2.2. Please upgrade!

#3 Updated by Sage Weil about 6 years ago

  • Related to Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix added

#4 Updated by Adam Kupczyk about 6 years ago

Hi Zhou,
1) Next time, could you attach with gdb and post the "bt" output of the bstore_kv_final and finisher threads?
2) Are you working on DPDK-capable hardware?
Regards,
Adam

#5 Updated by zhou yang about 6 years ago

Sage Weil wrote:

I'm pretty sure this is #21470, fixed in 12.2.2. Please upgrade!

Thanks a lot, I will upgrade later.

#6 Updated by zhou yang about 6 years ago

Adam Kupczyk wrote:

Hi Zhou,
1) Next time, could you attach with gdb and post the "bt" output of the bstore_kv_final and finisher threads?
2) Are you working on DPDK-capable hardware?
Regards,
Adam

OK, I will attach them next time.
I am not working on DPDK-capable hardware.
