Bug #20561

closed

bluestore: segv in _deferred_submit_unlock from deferred_try_submit, _txc_finish

Added by Sage Weil almost 7 years ago. Updated over 6 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-07-09T22:05:21.544 INFO:tasks.ceph.osd.4.smithi160.stderr:*** Caught signal (Segmentation fault) **
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: in thread 7f75760ae700 thread_name:bstore_kv_final
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: ceph version 12.0.3-2680-g2977f4c (2977f4cc13470652c1f8223770b751d4f753c85d) luminous (rc)
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 1: (()+0x9d1c19) [0x7f758afeec19]
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 2: (()+0x10330) [0x7f7588b24330]
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 3: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x67d) [0x7f758aea842d]
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 4: (BlueStore::deferred_try_submit()+0x3f0) [0x7f758aec0820]
2017-07-09T22:05:21.545 INFO:tasks.ceph.osd.4.smithi160.stderr: 5: (BlueStore::_txc_finish(BlueStore::TransContext*)+0xad8) [0x7f758aec1438]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 6: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x1f6) [0x7f758aed1eb6]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 7: (BlueStore::_kv_finalize_thread()+0x618) [0x7f758aed3878]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 8: (BlueStore::KVFinalizeThread::entry()+0xd) [0x7f758af26cad]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 9: (()+0x8184) [0x7f7588b1c184]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: 10: (clone()+0x6d) [0x7f7587c0c37d]
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr:2017-07-09 22:05:21.523919 7f75760ae700 -1 *** Caught signal (Segmentation fault) **
2017-07-09T22:05:21.546 INFO:tasks.ceph.osd.4.smithi160.stderr: in thread 7f75760ae700 thread_name:bstore_kv_final

/a/sage-2017-07-09_19:43:05-rados:thrash-wip-11793-distro-basic-smithi/1379576

No log or core, unfortunately.

Actions #1

Updated by Sage Weil almost 7 years ago

  • Priority changed from High to Urgent

This might be related to a failure reported on the list:

Date: Mon, 10 Jul 2017 08:06:36 +0000  
From: Wangwenfeng <wang.wenfeng@h3c.com>  
To: "'ceph-devel@vger.kernel.org'" <ceph-devel@vger.kernel.org>  
Subject: osd down when run fio randwrite 4k using bluestore   

Hi,  
  I set up a Luminous Ceph cluster whose OSDs use bluestore, and I created a CephFS whose metadata pool is replicated and whose data pool is erasure-coded:
   pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 52 flags hashpspool stripe_width 0
   pool 2 'EC_2_1_8' erasure size 3 min_size 2 crush_ruleset 2 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 56 flags hashpspool,ec_overwrites stripe_width 8192 expected_num_objects 27000000
  I ran fio to test the cluster with the following command:
   fio --numjobs=16 --iodepth=16 --ioengine=libaio --runtime=600 --direct=1 --group_reporting --rw=randwrite --bs=4k --name=aa --filename=/ec/1.txt --size=500G

  After a while, some OSDs were reported down.  
  The bluestore log is:

2017-07-08 11:29:33.948698 7fb8e7682700 20 bluefs _flush_and_sync_log cleaned file file(ino 30 size 0x2f44a1 mtime 2017-07-08 11:29:33.925066 bdev 1 extents [1:0x2300000+100000,1:0x2900000+200000])
2017-07-08 11:29:33.948722 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _kv_sync_thread committed 1 cleaned 1 in 0.023813 (0.000012 flush + 0.023800 kv commit)
2017-07-08 11:29:33.948726 7fb8e7682700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x56545091e680 kv_submitted
2017-07-08 11:29:33.948728 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_committed_kv txc 0x56545091e680
2017-07-08 11:29:33.948741 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _deferred_queue txc 0x56545091e680 osr 0x56544e271800
2017-07-08 11:29:33.948758 7fb8e7682700 20 bluestore.DeferredBatch(0x56544e2fc880) prepare_write seq 155 0xdc5f0000~4000 crc 9d8983a0
2017-07-08 11:29:33.948784 7fb8e7682700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x56545091eec0 deferred_cleanup
2017-07-08 11:29:33.948787 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish 0x56545091eec0 onodes 0x5654516618c0
2017-07-08 11:29:33.948790 7fb8e7682700 20 bluestore.BufferSpace(0x565450608698 in 0x56544dc495e0) finish_write discard buffer(0x56545177e010 space 0x565450608698 0x0~80000 writing nocache)
2017-07-08 11:29:33.948804 7fb8e7682700 20 bluestore.BufferSpace(0x565450608858 in 0x56544dc495e0) finish_write discard buffer(0x56545177def0 space 0x565450608858 0x0~30000 writing nocache)
2017-07-08 11:29:33.948812 7fb8e7682700 20 bluestore.BufferSpace(0x565450609738 in 0x56544dc495e0) finish_write discard buffer(0x56545177df80 space 0x565450609738 0x0~80000 writing nocache)
2017-07-08 11:29:33.948822 7fb8e7682700 20 bluestore.BufferSpace(0x565450609c08 in 0x56544dc495e0) finish_write discard buffer(0x56545177de60 space 0x565450609c08 0x30000~7000 writing nocache)
2017-07-08 11:29:33.948836 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish  txc 0x56545091eec0 done
2017-07-08 11:29:33.948838 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish  txc 0x56544ddbd440 done
2017-07-08 11:29:33.948840 7fb8e7682700 20 bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish  txc 0x56545091e680 deferred_queued
2017-07-08 11:29:33.948842 7fb8e7682700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_release_alloc 0x56545091eec0 []
2017-07-08 11:29:33.948864 7fb8e7682700 10 bluestore(/var/lib/ceph/osd/ceph-3) _txc_release_alloc 0x56544ddbd440 []
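
When reading a log like the one above, it can help to group the state messages by txc pointer so that each transaction's progress is visible at a glance. The following is a minimal sketch, not a Ceph tool: the function name txc_states and both regular expressions are assumptions derived only from the line format in this excerpt, and the sample lines are trimmed copies of the entries above.

```python
import re
from collections import defaultdict

# Assumed patterns, based only on the log lines quoted in this ticket.
# e.g. "_txc_state_proc txc 0x56545091e680 kv_submitted"
STATE_PROC = re.compile(r"_txc_state_proc txc (0x[0-9a-f]+) (\w+)")
# e.g. "_txc_finish  txc 0x56545091eec0 done"
FINISH = re.compile(r"_txc_finish\s+txc (0x[0-9a-f]+) (\w+)")


def txc_states(lines):
    """Return {txc pointer: [state names in log order]} (hypothetical helper)."""
    states = defaultdict(list)
    for line in lines:
        for pattern in (STATE_PROC, FINISH):
            match = pattern.search(line)
            if match:
                states[match.group(1)].append(match.group(2))
                break
    return dict(states)


# Message portions of the log lines above (timestamps and thread ids trimmed).
sample = [
    "bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x56545091e680 kv_submitted",
    "bluestore(/var/lib/ceph/osd/ceph-3) _txc_state_proc txc 0x56545091eec0 deferred_cleanup",
    "bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish 0x56545091eec0 onodes 0x5654516618c0",
    "bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish  txc 0x56545091eec0 done",
    "bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish  txc 0x56544ddbd440 done",
    "bluestore(/var/lib/ceph/osd/ceph-3) _txc_finish  txc 0x56545091e680 deferred_queued",
]

states = txc_states(sample)
for txc, seen in states.items():
    print(txc, " -> ".join(seen))
```

On this excerpt, txc 0x56545091e680 is the one still in deferred_queued when the log ends, while the other two reach done. The excerpt is too short to say whether that transaction is involved in the crash.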

Actions #2

Updated by Sage Weil over 6 years ago

  • Status changed from Need More Info to Can't reproduce