Bug #22957
[bluestore] bstore_kv_final thread seems to be deadlocked
Description
ceph 12.2.1
ec overwrite
cephfs performance test
pool 2 'fs_data' erasure size 3 min_size 3 crush_rule 1 object_hash rjenkins pg_num 1 pgp_num 1 last_change 57 flags hashpspool,ec_overwrites stripe_width 8192 fast_read 1 application cephfs
pool 3 'fs_meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 last_change 57 flags hashpspool stripe_width 0 application cephfs
[root@node2 ~]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF
 -9       3.00000 root test
-10       1.00000     host test-node1
  0   hdd 1.00000         osd.0           up  1.00000 1.00000
-11       1.00000     host test-node2
  4   hdd 1.00000         osd.4           up  1.00000 1.00000
-12       1.00000     host test-node3
  5   hdd 1.00000         osd.5           up  1.00000 1.00000
 -1       9.00000 root default
 -3       3.00000     host node1
  0   hdd 1.00000         osd.0           up  1.00000 1.00000
  3   hdd 1.00000         osd.3           up  1.00000 1.00000
  6   hdd 1.00000         osd.6           up  1.00000 1.00000
 -4       3.00000     host node2
  1   hdd 1.00000         osd.1           up  1.00000 1.00000
  4   hdd 1.00000         osd.4           up  1.00000 1.00000
  7   hdd 1.00000         osd.7           up  1.00000 1.00000
 -5       3.00000     host node3
  2   hdd 1.00000         osd.2           up  1.00000 1.00000
  5   hdd 1.00000         osd.5           up  1.00000 1.00000
  8   hdd 1.00000         osd.8           up  1.00000 1.00000
[root@node2 ~]#
I found that the test suite was blocked. I then put some objects into the fs_data pool, and that I/O was blocked as well.
I traced the op and found that the EC sub-op was blocked by osd.1 (op seq: 799599, see attachment).
From the call trace, it seems that the bstore_kv_final thread is deadlocked.
This problem has happened twice and is difficult to reproduce.
info thread
Id Target Id Frame
66 Thread 0x7f4a18fd4700 (LWP 13527) "log" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
65 Thread 0x7f4a185c7700 (LWP 13825) "msgr-worker-0" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
64 Thread 0x7f4a17dc6700 (LWP 13826) "msgr-worker-1" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
63 Thread 0x7f4a175c5700 (LWP 13828) "msgr-worker-2" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
62 Thread 0x7f4a16dc4700 (LWP 13831) "msgr-worker-3" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
61 Thread 0x7f4a165c3700 (LWP 13834) "msgr-worker-4" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
60 Thread 0x7f4a15dc2700 (LWP 13837) "msgr-worker-5" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
59 Thread 0x7f4a155c1700 (LWP 13840) "msgr-worker-6" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
58 Thread 0x7f4a14dc0700 (LWP 13843) "msgr-worker-7" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
57 Thread 0x7f4a145bf700 (LWP 13845) "msgr-worker-8" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
56 Thread 0x7f4a13dbe700 (LWP 13849) "msgr-worker-9" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
55 Thread 0x7f4a135bd700 (LWP 13850) "msgr-worker-10" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
54 Thread 0x7f4a12dbc700 (LWP 13853) "msgr-worker-11" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
53 Thread 0x7f4a125bb700 (LWP 13855) "msgr-worker-12" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
52 Thread 0x7f4a11dba700 (LWP 13858) "msgr-worker-13" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
51 Thread 0x7f4a115b9700 (LWP 13861) "msgr-worker-14" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
50 Thread 0x7f4a10db8700 (LWP 13863) "msgr-worker-15" 0x00007f4a19f699b3 in epoll_wait () from /lib64/libc.so.6
49 Thread 0x7f4a0fe40700 (LWP 13893) "service" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
48 Thread 0x7f4a0f63f700 (LWP 13896) "admin_socket" 0x00007f4a19f5ee0d in poll () from /lib64/libc.so.6
47 Thread 0x7f4a0e5f5700 (LWP 14124) "ceph-osd" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
46 Thread 0x7f4a0ddf4700 (LWP 14128) "safe_timer" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
45 Thread 0x7f4a0d5f3700 (LWP 14131) "safe_timer" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
44 Thread 0x7f4a0cdf2700 (LWP 14133) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
43 Thread 0x7f4a0c5f1700 (LWP 14136) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
42 Thread 0x7f4a0bdf0700 (LWP 14139) "bstore_aio" 0x00007f4a1cdab644 in __io_getevents_0_4 () from /lib64/libaio.so.1
41 Thread 0x7f4a0b5ef700 (LWP 14144) "bstore_aio" 0x00007f4a1cdab644 in __io_getevents_0_4 () from /lib64/libaio.so.1
40 Thread 0x7f4a035df700 (LWP 17234) "dfin" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
39 Thread 0x7f4a03de0700 (LWP 17235) "finisher" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
38 Thread 0x7f4a045e1700 (LWP 17236) "bstore_kv_sync" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
37 Thread 0x7f4a04de2700 (LWP 17237) "bstore_kv_final" 0x00007f4a1ae7af7d in __lll_lock_wait () from /lib64/libpthread.so.0
36 Thread 0x7f4a0adee700 (LWP 17238) "bstore_mempool" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
35 Thread 0x7f4a08687700 (LWP 17243) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
34 Thread 0x7f4a07e86700 (LWP 17244) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
33 Thread 0x7f4a07685700 (LWP 17245) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
32 Thread 0x7f4a06e84700 (LWP 17246) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
31 Thread 0x7f4a06683700 (LWP 17247) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
30 Thread 0x7f4a05e82700 (LWP 17248) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
29 Thread 0x7f4a05681700 (LWP 17249) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
28 Thread 0x7f4a02dde700 (LWP 17250) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
27 Thread 0x7f4a025dd700 (LWP 17251) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
26 Thread 0x7f4a01ddc700 (LWP 17252) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
25 Thread 0x7f4a015db700 (LWP 17253) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
24 Thread 0x7f4a00dda700 (LWP 17254) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23 Thread 0x7f4a005d9700 (LWP 17255) "ms_dispatch" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
22 Thread 0x7f49ffdd8700 (LWP 17256) "ms_local" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
21 Thread 0x7f49ff5d7700 (LWP 17258) "safe_timer" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
20 Thread 0x7f49fedd6700 (LWP 17259) "fn_anonymous" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
19 Thread 0x7f49fe5d5700 (LWP 17260) "safe_timer" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
18 Thread 0x7f49fddd4700 (LWP 17261) "tp_peering" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
17 Thread 0x7f49fd5d3700 (LWP 17262) "tp_peering" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
16 Thread 0x7f49fcdd2700 (LWP 17263) "tp_osd_tp" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15 Thread 0x7f49fc5d1700 (LWP 17264) "tp_osd_tp" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
14 Thread 0x7f49fbdd0700 (LWP 17265) "tp_osd_disk" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
13 Thread 0x7f49fb5cf700 (LWP 17266) "tp_osd_cmd" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7f49fadce700 (LWP 17267) "osd_srv_heartbt" 0x00007f4a1ae78ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x7f49fa5cd700 (LWP 17268) "fn_anonymous" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x7f49f9dcc700 (LWP 17269) "fn_anonymous" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x7f49f95cb700 (LWP 17270) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7f49f8dca700 (LWP 17271) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7f49f85c9700 (LWP 17272) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
6 Thread 0x7f49f7dc8700 (LWP 17273) "safe_timer" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f49f75c7700 (LWP 17274) "osd_srv_agent" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f49f6dc6700 (LWP 17276) "signal_handler" 0x00007f4a19f5ee0d in poll () from /lib64/libc.so.6
3 Thread 0x7f49f65c5700 (LWP 17710) "rocksdb:bg0" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7f49f5dc4700 (LWP 11756) "rocksdb:bg0" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
1 Thread 0x7f4a1d822d00 (LWP 13517) "ceph-osd" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
thread 37
[Switching to thread 37 (Thread 0x7f4a04de2700 (LWP 17237))]
#0 0x00007f4a1ae7af7d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) where
#0 0x00007f4a1ae7af7d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f4a1ae76d41 in _L_lock_790 () from /lib64/libpthread.so.0
#2 0x00007f4a1ae76c47 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f4a1e2a3048 in Mutex::Lock(bool) ()
#4 0x00007f4a1e141da1 in BlueStore::_txc_committed_kv(BlueStore::TransContext*) ()
#5 0x00007f4a1e15cff1 in BlueStore::_txc_state_proc(BlueStore::TransContext*) ()
#6 0x00007f4a1e15e960 in BlueStore::_kv_finalize_thread() ()
#7 0x00007f4a1e1b4acd in BlueStore::KVFinalizeThread::entry() ()
#8 0x00007f4a1ae74df3 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f4a19f693dd in clone () from /lib64/libc.so.6
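In the thread listing above, bstore_kv_final is the only thread parked in __lll_lock_wait, i.e. waiting to acquire a mutex; the others are idle in condition waits, epoll_wait, or poll. A minimal sketch of how that filtering could be automated on saved "info threads" output follows. The helper name threads_waiting_on_mutex and the regular expression are illustrative assumptions, not part of Ceph or gdb, and a thread sitting in __lll_lock_wait only shows that it is blocked on a lock, not by itself that a deadlock exists.

```python
import re

# Matches one line of gdb "info threads" output in the layout shown in this
# report (assumed layout; other gdb versions may print it differently), e.g.
#   37 Thread 0x7f4a04de2700 (LWP 17237) "bstore_kv_final" 0x... in __lll_lock_wait () from ...
THREAD_RE = re.compile(
    r'^\*?\s*(?P<id>\d+)\s+Thread\s+0x[0-9a-f]+\s+\(LWP\s+(?P<lwp>\d+)\)\s+'
    r'"(?P<name>[^"]+)"\s+0x[0-9a-f]+\s+in\s+(?P<func>[^\s(]+)'
)


def threads_waiting_on_mutex(info_threads_text):
    """Return (gdb id, LWP, name) for every thread whose top frame is __lll_lock_wait."""
    blocked = []
    for line in info_threads_text.splitlines():
        m = THREAD_RE.match(line.strip())
        if m and m.group("func") == "__lll_lock_wait":
            blocked.append((int(m.group("id")), int(m.group("lwp")), m.group("name")))
    return blocked


# Two lines taken from the listing in this report.
sample = """\
38 Thread 0x7f4a045e1700 (LWP 17236) "bstore_kv_sync" 0x00007f4a1ae78705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
37 Thread 0x7f4a04de2700 (LWP 17237) "bstore_kv_final" 0x00007f4a1ae7af7d in __lll_lock_wait () from /lib64/libpthread.so.0
"""

print(threads_waiting_on_mutex(sample))
```

To find which thread holds the lock, the backtraces of the other threads would still need to be inspected; this only narrows down where to look.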
Related issues
History
#1 Updated by Nathan Cutler about 6 years ago
- Tracker changed from Tasks to Bug
- Project changed from Stable releases to bluestore
- Regression set to No
- Severity set to 3 - minor
#2 Updated by Sage Weil about 6 years ago
- Status changed from New to Duplicate
I'm pretty sure this is #21470, fixed in 12.2.2. Please upgrade!
#3 Updated by Sage Weil about 6 years ago
- Related to Bug #21470: Ceph OSDs crashing in BlueStore::queue_transactions() using EC after applying fix added
#4 Updated by Adam Kupczyk about 6 years ago
Hi Zhou,
1) Next time, could you attach gdb and capture "bt" for the bstore_kv_final and finisher threads?
2) Are you working on DPDK-capable hardware?
Regards,
Adam
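One possible way to collect the requested backtraces in a single pass is to run gdb in batch mode against the live OSD process. The sketch below assumes gdb and debug symbols are installed and that it runs with enough privileges to attach; the function names and the output file name are hypothetical. Note that the process is paused while gdb is attached.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, prints a backtrace
    for every thread, and exits (batch mode detaches on exit)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtraces(pid, out_path):
    """Run gdb against a live process and save its output to out_path.
    Requires gdb on PATH and permission to attach to the process."""
    with open(out_path, "w") as out:
        subprocess.run(
            gdb_backtrace_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,
        )


# 13517 is the LWP of thread 1 in the listing above, which on Linux is the
# process ID of that ceph-osd instance. Only the command is printed here.
print(" ".join(gdb_backtrace_command(13517)))
```

Calling capture_backtraces(13517, "osd-backtraces.txt") on the affected node would write all thread backtraces, including bstore_kv_final and finisher, to a file that can be attached to the ticket.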
#5 Updated by zhou yang about 6 years ago
Sage Weil wrote:
I'm pretty sure this is #21470, fixed in 12.2.2. Please upgrade!
Thanks a lot, I will upgrade later.
#6 Updated by zhou yang about 6 years ago
Adam Kupczyk wrote:
Hi Zhou,
1) Next time, could you attach gdb and capture "bt" for the bstore_kv_final and finisher threads?
2) Are you working on DPDK-capable hardware?
Regards,
Adam
OK, I will attach them next time.
I am not working on DPDK-capable hardware.