Bug #20561
bluestore: segv in _deferred_submit_unlock from deferred_try_submit, _txc_finish
Status: Can't reproduce
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2017-07-09T22:05:21.544 INFO:tasks.ceph.osd.4.smithi160.stderr:*** Caught signal (Segmentation fault) **
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: in thread 7f75760ae700 thread_name:bstore_kv_final
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: ceph version 12.0.3-2680-g2977f4c (2977f4cc13470652c1f8223770b751d4f753c85d) luminous (rc)
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 1: (()+0x9d1c19) [0x7f758afeec19]
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 2: (()+0x10330) [0x7f7588b24330]
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 3: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x67d) [0x7f758aea842d]
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 4: (BlueStore::deferred_try_submit()+0x3f0) [0x7f758aec0820]
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 5: (BlueStore::_txc_finish(BlueStore::TransContext*)+0xad8) [0x7f758aec1438]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 6: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x1f6) [0x7f758aed1eb6]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 7: (BlueStore::_kv_finalize_thread()+0x618) [0x7f758aed3878]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 8: (BlueStore::KVFinalizeThread::entry()+0xd) [0x7f758af26cad]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 9: (()+0x8184) [0x7f7588b1c184]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 10: (clone()+0x6d) [0x7f7587c0c37d]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr:2017-07-09 22:05:21.523919 7f75760ae700 -1 *** Caught signal (Segmentation fault) **
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: in thread 7f75760ae700 thread_name:bstore_kv_final
/a/sage-2017-07-09_19:43:05-rados:thrash-wip-11793-distro-basic-smithi/1379576
No log or core file, unfortunately.
Updated by Sage Weil almost 7 years ago
- Priority changed from High to Urgent
This might be related to a failure reported on the list:
Date: Mon, 10 Jul 2017 08:06:36 +0000
From: Wangwenfeng <wang.wenfeng@h3c.com>
To: "'ceph-devel@vger.kernel.org'" <ceph-devel@vger.kernel.org>
Subject: osd down when run fio randwrite 4k using bluestore

Hi,

I set up a Luminous Ceph cluster whose OSDs use BlueStore, and created a CephFS whose metadata pool is replicated and whose data pool is erasure-coded:

pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 52 flags hashpspool stripe_width 0
pool 2 'EC_2_1_8' erasure size 3 min_size 2 crush_ruleset 2 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 56 flags hashpspool,ec_overwrites stripe_width 8192 expected_num_objects 27000000

When I test the cluster with fio, using the command

fio --numjobs=16 --iodepth=16 --ioengine=libaio --runtime=600 --direct=1 --group_reporting --rw=randwrite --bs=4k --name=aa --filename=/ec/1.txt --size=500G

some OSDs are reported down after a while. The BlueStore log is:

2017-07-08 11:29:33.948698 7fb8e7682700 20 bluefs _flush_and_sync_log cleaned file file(ino 30 size 0x2f44a1 mtime 2017-07-08 11:29:33.925066 bdev 1 extents [1:0x2300000+100000,1:0x2900000+200000])
2017-07-08 11:29:33.948722 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _kv_sync_thread committed 1 cleaned 1 in 0.023813 (0.000012 flush + 0.023800 kv commit)
2017-07-08 11:29:33.948726 7fb8e7682700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x56545091e680 kv_submitted
2017-07-08 11:29:33.948728 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_committed_kv txc 0x56545091e680
2017-07-08 11:29:33.948741 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _deferred_queue txc 0x56545091e680 osr 0x56544e271800
2017-07-08 11:29:33.948758 7fb8e7682700 20 bluestore.DeferredBatch(0x56544e2fc880) prepare_write seq 155 0xdc5f0000~4000 crc 9d8983a0
2017-07-08 11:29:33.948784 7fb8e7682700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x56545091eec0 deferred_cleanup
2017-07-08 11:29:33.948787 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish 0x56545091eec0 onodes 0x5654516618c0
2017-07-08 11:29:33.948790 7fb8e7682700 20 bluestore.BufferSpace(0x565450608698 in 0x56544dc495e0) finish_write discard buffer(0x56545177e010 space 0x565450608698 0x0~80000 writing nocache)
2017-07-08 11:29:33.948804 7fb8e7682700 20 bluestore.BufferSpace(0x565450608858 in 0x56544dc495e0) finish_write discard buffer(0x56545177def0 space 0x565450608858 0x0~30000 writing nocache)
2017-07-08 11:29:33.948812 7fb8e7682700 20 bluestore.BufferSpace(0x565450609738 in 0x56544dc495e0) finish_write discard buffer(0x56545177df80 space 0x565450609738 0x0~80000 writing nocache)
2017-07-08 11:29:33.948822 7fb8e7682700 20 bluestore.BufferSpace(0x565450609c08 in 0x56544dc495e0) finish_write discard buffer(0x56545177de60 space 0x565450609c08 0x30000~7000 writing nocache)
2017-07-08 11:29:33.948836 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish txc 0x56545091eec0 done
2017-07-08 11:29:33.948838 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish txc 0x56544ddbd440 done
2017-07-08 11:29:33.948840 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish txc 0x56545091e680 deferred_queued
2017-07-08 11:29:33.948842 7fb8e7682700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_release_alloc 0x56545091eec0 []
2017-07-08 11:29:33.948864 7fb8e7682700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_release_alloc 0x56544ddbd440 []
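When reading the BlueStore log quoted in the email, extents appear in an offset~length notation such as 0xdc5f0000~4000 (the deferred write) or 0x0~80000 (a discarded buffer). The sketch below decodes them, assuming, as in BlueStore's debug output, that both numbers are hexadecimal and that the "0x" prefix is printed only once, before the offset. parse_extents is a hypothetical helper written for this note, not part of Ceph.

```python
import re

# Hypothetical log-reading helper (not part of Ceph).
# Assumption: BlueStore prints extents as "0x<offset>~<length>",
# with BOTH offset and length in hex and "0x" printed only before the offset.
EXTENT_RE = re.compile(r"0x([0-9a-f]+)~([0-9a-f]+)")


def parse_extents(line):
    """Return (offset, length) pairs, in bytes, for every extent in a log line."""
    return [(int(off, 16), int(length, 16)) for off, length in EXTENT_RE.findall(line)]


if __name__ == "__main__":
    samples = [
        "prepare_write seq 155 0xdc5f0000~4000 crc 9d8983a0",
        "finish_write discard buffer(0x56545177e010 space 0x565450608698 0x0~80000 writing nocache)",
        "finish_write discard buffer(0x56545177de60 space 0x565450609c08 0x30000~7000 writing nocache)",
    ]
    for s in samples:
        for off, length in parse_extents(s):
            # Print the offset back in hex and the length in bytes and KiB.
            print(f"offset={off:#x} length={length} bytes ({length // 1024} KiB)")
```

Under that assumption, the deferred write in the log covers 16 KiB at offset 0xdc5f0000, and the discarded buffers are 512 KiB, 192 KiB, 512 KiB and 28 KiB long. Pointer values such as 0x56545177e010 are not matched because no "~" follows them.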
Updated by Sage Weil over 6 years ago
- Status changed from Need More Info to Can't reproduce