Project

General

Profile

Bug #21259

bluestore: segv in BlueStore::TwoQCache::_trim

Added by Henrik Korkuc 9 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Target version:
-
Start date:
09/06/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:

Description

Sometimes OSD segfaults:

     0> 2017-09-06 06:12:14.448894 7f9544649700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f9544649700 thread_name:bstore_mempool

Currently cluster is backfilling, but I am not sure if it is related and currently I am unable to reproduce this.

OSD log attached.

ceph-osd.76.log.gz (262 KB) Henrik Korkuc, 09/06/2017 06:25 AM

ceph-osd.18.log.gz (374 KB) Petr Ilyin, 02/16/2018 11:57 PM

crash-twoq-segfault.txt.gz (250 KB) Paul Emmerich, 03/02/2018 12:08 AM


Related issues

Related to RADOS - Bug #22039: bluestore: segv during unmount Resolved 11/03/2017
Related to bluestore - Bug #22977: High CPU load caused by operations on onode_map Need More Info 02/09/2018

History

#1 Updated by Josh Durgin 9 months ago

  • Project changed from Ceph to RADOS
  • Category changed from OSD to Correctness/Safety
  • Priority changed from Normal to Urgent
  • Component(RADOS) BlueStore added

Crash from the logs is:


 ceph version 12.2.0-6-gbcf8970 (bcf89707d6c3c3086f001d08c754095b9e7a85f1) luminous (rc)
 1: (()+0xa33dbc) [0xa1d31dcdbc]
 2: (()+0xf890) [0x7f9554120890]
 3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x327) [0xa1d309c6b7]
 4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x469) [0xa1d306d519]
 5: (BlueStore::MempoolThread::entry()+0x155) [0xa1d30739c5]
 6: (()+0x8064) [0x7f9554119064]
 7: (clone()+0x6d) [0x7f955320d62d]

There's no further bluestore logs:

   -30> 2017-09-06 06:12:14.400572 7f955063c700  5 -- 10.30.0.7:6805/108707 >> 10.30.4.33:6833/51572 conn(0xa252e4e800 :6805 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=4155 cs=7 l=0). rx osd.349 seq 45830 0xa25f5673c0 MOSDPGPush(1.d8s4 25646/15542 [PushOp(1:1b0776d3:::rbd_dat
a.2.6fa882ae8944a.00000000000d9edd:head, version: 6576'21852, data_included: [0~1048576], data_size: 1048576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 3, recovery_info: ObjectRecoveryInfo(1:1b0776d3:::rbd_data.2.6fa882ae8944a.00000000000d9edd:head@6576'21852, size: 4
194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_
complete:false, omap_recovered_to:, omap_complete:true, error:false))]) v3
   -29> 2017-09-06 06:12:14.400607 7f955063c700  1 -- 10.30.0.7:6805/108707 <== osd.349 10.30.4.33:6833/51572 45830 ==== MOSDPGPush(1.d8s4 25646/15542 [PushOp(1:1b0776d3:::rbd_data.2.6fa882ae8944a.00000000000d9edd:head, version: 6576'21852, data_included: [0~1048576], data_size: 104
8576, omap_header_size: 0, omap_entries_size: 0, attrset_size: 3, recovery_info: ObjectRecoveryInfo(1:1b0776d3:::rbd_data.2.6fa882ae8944a.00000000000d9edd:head@6576'21852, size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), after_progress: ObjectRecoveryProgress(!fi
rst, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true, error:false))]) v3 ==== 1049660+0+0 (2554025136 
0 0) 0xa25f5673c0 con 0xa252e4e800
   -28> 2017-09-06 06:12:14.405897 7f953ce3a700  1 -- 10.30.0.7:6805/108707 --> 10.30.24.17:6805/14430 -- MOSDECSubOpReadReply(1.1accs0 25646/15544 ECSubReadReply(tid=349362, attrs_read=0)) v2 -- 0xa20a110000 con 0
   -27> 2017-09-06 06:12:14.407165 7f955063c700  5 -- 10.30.0.7:6820/108707 >> 10.164.4.2:0/81437 conn(0xa1e3ff9000 :6820 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=45 cs=1 l=1). rx osd.363 seq 37194 0xa251ae1c00 osd_ping(ping e25646 stamp 2017-09-06 06:12:14.402077) v4
   -26> 2017-09-06 06:12:14.407199 7f955063c700  1 -- 10.30.0.7:6820/108707 <== osd.363 10.164.4.2:0/81437 37194 ==== osd_ping(ping e25646 stamp 2017-09-06 06:12:14.402077) v4 ==== 2004+0+0 (1125801279 0 0) 0xa251ae1c00 con 0xa1e3ff9000
   -25> 2017-09-06 06:12:14.407214 7f955063c700  1 -- 10.30.0.7:6820/108707 --> 10.164.4.2:0/81437 -- osd_ping(ping_reply e25646 stamp 2017-09-06 06:12:14.402077) v4 -- 0xa251b4c200 con 0
   -24> 2017-09-06 06:12:14.407321 7f9550e3d700  5 -- 10.30.0.7:6807/108707 >> 10.164.4.2:0/81437 conn(0xa2603e5800 :6807 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=82 cs=1 l=1). rx osd.363 seq 37194 0xa20285b800 osd_ping(ping e25646 stamp 2017-09-06 06:12:14.402077) v4
   -23> 2017-09-06 06:12:14.407337 7f9550e3d700  1 -- 10.30.0.7:6807/108707 <== osd.363 10.164.4.2:0/81437 37194 ==== osd_ping(ping e25646 stamp 2017-09-06 06:12:14.402077) v4 ==== 2004+0+0 (1125801279 0 0) 0xa20285b800 con 0xa2603e5800
   -22> 2017-09-06 06:12:14.407346 7f9550e3d700  1 -- 10.30.0.7:6807/108707 --> 10.164.4.2:0/81437 -- osd_ping(ping_reply e25646 stamp 2017-09-06 06:12:14.402077) v4 -- 0xa23e3c8000 con 0
   -21> 2017-09-06 06:12:14.414556 7f955063c700  5 -- 10.30.0.7:6805/108707 >> 10.30.10.5:6837/48854 conn(0xa2527fb000 :6805 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=7035 cs=25 l=0). rx osd.394 seq 22954 0xa240e76a00 pg_backfill(progress 1.1654s2 e 25646/25646 lb 1:2a6c8150
:::rbd_data.2.6fae12ae8944a.00000000001ca50f:26) v3
   -20> 2017-09-06 06:12:14.414590 7f955063c700  1 -- 10.30.0.7:6805/108707 <== osd.394 10.30.10.5:6837/48854 22954 ==== pg_backfill(progress 1.1654s2 e 25646/25646 lb 1:2a6c8150:::rbd_data.2.6fae12ae8944a.00000000001ca50f:26) v3 ==== 979+0+0 (316524789 0 0) 0xa240e76a00 con 0xa2527
fb000
   -19> 2017-09-06 06:12:14.419852 7f955063c700  5 -- 10.30.0.7:6820/108707 >> 10.164.19.2:0/25733 conn(0xa1e78ef000 :6820 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=340 cs=1 l=1). rx osd.367 seq 37551 0xa251ae1c00 osd_ping(ping e25646 stamp 2017-09-06 06:12:14.414651) v4
   -18> 2017-09-06 06:12:14.419875 7f955063c700  1 -- 10.30.0.7:6820/108707 <== osd.367 10.164.19.2:0/25733 37551 ==== osd_ping(ping e25646 stamp 2017-09-06 06:12:14.414651) v4 ==== 2004+0+0 (1640435177 0 0) 0xa251ae1c00 con 0xa1e78ef000
   -17> 2017-09-06 06:12:14.419888 7f955063c700  1 -- 10.30.0.7:6820/108707 --> 10.164.19.2:0/25733 -- osd_ping(ping_reply e25646 stamp 2017-09-06 06:12:14.414651) v4 -- 0xa251b4c200 con 0
   -16> 2017-09-06 06:12:14.419977 7f955163e700  5 -- 10.30.0.7:6807/108707 >> 10.164.19.2:0/25733 conn(0xa207c6b800 :6807 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=340 cs=1 l=1). rx osd.367 seq 37551 0xa251c96a00 osd_ping(ping e25646 stamp 2017-09-06 06:12:14.414651) v4
   -15> 2017-09-06 06:12:14.420005 7f955063c700  5 -- 10.30.0.7:6820/108707 >> 10.30.10.5:0/48854 conn(0xa2522e6800 :6820 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=249 cs=1 l=1). rx osd.394 seq 37522 0xa251ae5400 osd_ping(ping e25646 stamp 2017-09-06 06:12:14.413955) v4
   -14> 2017-09-06 06:12:14.420018 7f955063c700  1 -- 10.30.0.7:6820/108707 <== osd.394 10.30.10.5:0/48854 37522 ==== osd_ping(ping e25646 stamp 2017-09-06 06:12:14.413955) v4 ==== 2004+0+0 (3857141869 0 0) 0xa251ae5400 con 0xa2522e6800
   -13> 2017-09-06 06:12:14.420017 7f955163e700  1 -- 10.30.0.7:6807/108707 <== osd.367 10.164.19.2:0/25733 37551 ==== osd_ping(ping e25646 stamp 2017-09-06 06:12:14.414651) v4 ==== 2004+0+0 (1640435177 0 0) 0xa251c96a00 con 0xa207c6b800
   -12> 2017-09-06 06:12:14.420027 7f955063c700  1 -- 10.30.0.7:6820/108707 --> 10.30.10.5:0/48854 -- osd_ping(ping_reply e25646 stamp 2017-09-06 06:12:14.413955) v4 -- 0xa251b4c200 con 0
   -11> 2017-09-06 06:12:14.420071 7f955163e700  1 -- 10.30.0.7:6807/108707 --> 10.164.19.2:0/25733 -- osd_ping(ping_reply e25646 stamp 2017-09-06 06:12:14.414651) v4 -- 0xa1e3637000 con 0
   -10> 2017-09-06 06:12:14.420181 7f955163e700  5 -- 10.30.0.7:6807/108707 >> 10.30.10.5:0/48854 conn(0xa2522e5000 :6807 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=249 cs=1 l=1). rx osd.394 seq 37522 0xa251b09000 osd_ping(ping e25646 stamp 2017-09-06 06:12:14.413955) v4
    -9> 2017-09-06 06:12:14.420203 7f955163e700  1 -- 10.30.0.7:6807/108707 <== osd.394 10.30.10.5:0/48854 37522 ==== osd_ping(ping e25646 stamp 2017-09-06 06:12:14.413955) v4 ==== 2004+0+0 (3857141869 0 0) 0xa251b09000 con 0xa2522e5000
    -8> 2017-09-06 06:12:14.420221 7f955163e700  1 -- 10.30.0.7:6807/108707 --> 10.30.10.5:0/48854 -- osd_ping(ping_reply e25646 stamp 2017-09-06 06:12:14.413955) v4 -- 0xa1e3637000 con 0
    -7> 2017-09-06 06:12:14.420826 7f953be38700  1 -- 10.30.0.7:6805/108707 --> 10.30.37.34:6845/58057 -- MOSDECSubOpReadReply(1.1e70s0 25646/16413 ECSubReadReply(tid=515210, attrs_read=0)) v2 -- 0xa23dac7700 con 0
    -6> 2017-09-06 06:12:14.423508 7f955163e700  5 -- 10.30.0.7:6807/108707 >> 10.30.12.66:0/81266 conn(0xa252457000 :6807 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=76 cs=1 l=1). rx osd.3 seq 37375 0xa251c96a00 osd_ping(ping e25646 stamp 2017-09-06 06:12:14.418873) v4
    -5> 2017-09-06 06:12:14.423526 7f955163e700  1 -- 10.30.0.7:6807/108707 <== osd.3 10.30.12.66:0/81266 37375 ==== osd_ping(ping e25646 stamp 2017-09-06 06:12:14.418873) v4 ==== 2004+0+0 (350090004 0 0) 0xa251c96a00 con 0xa252457000
    -4> 2017-09-06 06:12:14.423538 7f955163e700  1 -- 10.30.0.7:6807/108707 --> 10.30.12.66:0/81266 -- osd_ping(ping_reply e25646 stamp 2017-09-06 06:12:14.418873) v4 -- 0xa1e3637000 con 0
    -3> 2017-09-06 06:12:14.423568 7f9550e3d700  5 -- 10.30.0.7:6820/108707 >> 10.30.12.66:0/81266 conn(0xa2525a9000 :6820 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=38 cs=1 l=1). rx osd.3 seq 37375 0xa253f8c000 osd_ping(ping e25646 stamp 2017-09-06 06:12:14.418873) v4
    -2> 2017-09-06 06:12:14.423599 7f9550e3d700  1 -- 10.30.0.7:6820/108707 <== osd.3 10.30.12.66:0/81266 37375 ==== osd_ping(ping e25646 stamp 2017-09-06 06:12:14.418873) v4 ==== 2004+0+0 (350090004 0 0) 0xa253f8c000 con 0xa2525a9000
    -1> 2017-09-06 06:12:14.423620 7f9550e3d700  1 -- 10.30.0.7:6820/108707 --> 10.30.12.66:0/81266 -- osd_ping(ping_reply e25646 stamp 2017-09-06 06:12:14.418873) v4 -- 0xa23e3c8000 con 0
     0> 2017-09-06 06:12:14.448894 7f9544649700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f9544649700 thread_name:bstore_mempool

#2 Updated by Mike Lowe 9 months ago

I have also had this segfault on 5/244 osd's in the past 24 hours. It's not been a major problem yet as systemd kicks them back online in relatively short order.

#3 Updated by Henrik Korkuc 9 months ago

I thought that all needed information is between "begin dump of recent events" and "end dump of recent events" so I trimmed log only to these lines.

I posted log for full day of that OSD, it contains one abort and two "FAILED assert(p.second->shared_blob_set.empty())" asserts.

ceph-post-file: 562cbecf-8027-431e-a944-9932a086721b

#4 Updated by Sage Weil 9 months ago

I've pushed a branch, wip-21259 to shaman. Packages should appear at shaman.ceph.com in an hour or so. They add some debug output during bluestore shutdown that should help track this down. Do you mind giving it a try? Thanks!

#5 Updated by Henrik Korkuc 9 months ago

it looks like build failed... https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=trusty,DIST=trusty,MACHINE_SIZE=huge/6101/console

/build/ceph-12.2.0-47-g152806d/src/os/ObjectStore.h: In member function 'void BlueStore::OnodeSpace::dump(int)':
/build/ceph-12.2.0-47-g152806d/src/os/ObjectStore.h:73:16: error: invalid use of non-static data member 'ObjectStore::cct'
   CephContext* cct;
                ^
In file included from /build/ceph-12.2.0-47-g152806d/src/include/Context.h:19:0,
                 from /build/ceph-12.2.0-47-g152806d/src/common/Cond.h:19,
                 from /build/ceph-12.2.0-47-g152806d/src/common/Finisher.h:19,
                 from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.h:36,
                 from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:22:
/build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:36:22: error: from this location
 #define dout_context cct
                      ^
/build/ceph-12.2.0-47-g152806d/src/common/dout.h:51:7: note: in definition of macro 'dout_impl'
   if (cct->_conf->subsys.should_gather(sub, v)) {   \
       ^
/build/ceph-12.2.0-47-g152806d/src/common/debug.h:23:17: note: in expansion of macro 'ldout'
 #define dout(v) ldout((dout_context), v)
                 ^
/build/ceph-12.2.0-47-g152806d/src/common/debug.h:23:24: note: in expansion of macro 'dout_context'
 #define dout(v) ldout((dout_context), v)
                        ^
/build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:1621:5: note: in expansion of macro 'dout'
     dout(lvl) << i.first << " : " << *i.second << dendl;
     ^
In file included from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.h:39:0,
                 from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:22:
/build/ceph-12.2.0-47-g152806d/src/os/ObjectStore.h:73:16: error: invalid use of non-static data member 'ObjectStore::cct'
   CephContext* cct;
                ^
Scanning dependencies of target librbd
In file included from /build/ceph-12.2.0-47-g152806d/src/include/Context.h:19:0,
                 from /build/ceph-12.2.0-47-g152806d/src/common/Cond.h:19,
                 from /build/ceph-12.2.0-47-g152806d/src/common/Finisher.h:19,
                 from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.h:36,
                 from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:22:
/build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:36:22: error: from this location
 #define dout_context cct
                      ^
/build/ceph-12.2.0-47-g152806d/src/common/dout.h:56:37: note: in definition of macro 'dout_impl'
     ceph::logging::Entry *_dout_e = cct->_log->create_entry(v, sub, &_log_exp_length); \
                                     ^
/build/ceph-12.2.0-47-g152806d/src/common/debug.h:23:17: note: in expansion of macro 'ldout'
 #define dout(v) ldout((dout_context), v)
                 ^
/build/ceph-12.2.0-47-g152806d/src/common/debug.h:23:24: note: in expansion of macro 'dout_context'
 #define dout(v) ldout((dout_context), v)
                        ^
/build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:1621:5: note: in expansion of macro 'dout'
     dout(lvl) << i.first << " : " << *i.second << dendl;
     ^
In file included from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.h:39:0,
                 from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:22:
/build/ceph-12.2.0-47-g152806d/src/os/ObjectStore.h:73:16: error: invalid use of non-static data member 'ObjectStore::cct'
   CephContext* cct;
                ^
In file included from /build/ceph-12.2.0-47-g152806d/src/include/Context.h:19:0,
                 from /build/ceph-12.2.0-47-g152806d/src/common/Cond.h:19,
                 from /build/ceph-12.2.0-47-g152806d/src/common/Finisher.h:19,
                 from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.h:36,
                 from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:22:
/build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:36:22: error: from this location
 #define dout_context cct
                      ^
/build/ceph-12.2.0-47-g152806d/src/common/dout.h:61:22: note: in definition of macro 'dout_impl'
     auto _dout_cct = cct;      \
                      ^
/build/ceph-12.2.0-47-g152806d/src/common/debug.h:23:17: note: in expansion of macro 'ldout'
 #define dout(v) ldout((dout_context), v)
                 ^
/build/ceph-12.2.0-47-g152806d/src/common/debug.h:23:24: note: in expansion of macro 'dout_context'
 #define dout(v) ldout((dout_context), v)
                        ^
/build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:1621:5: note: in expansion of macro 'dout'
     dout(lvl) << i.first << " : " << *i.second << dendl;
     ^
make[4]: Leaving directory `/build/ceph-12.2.0-47-g152806d/obj-x86_64-linux-gnu'
make -f src/librbd/CMakeFiles/librbd.dir/build.make src/librbd/CMakeFiles/librbd.dir/build
/build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:1621:26: error: cannot bind 'std::basic_ostream<char>' lvalue to 'std::basic_ostream<char>&&'
     dout(lvl) << i.first << " : " << *i.second << dendl;
                          ^
In file included from /usr/include/c++/4.8/iostream:39:0,
                 from /build/ceph-12.2.0-47-g152806d/src/include/cpp-btree/btree.h:109,
                 from /build/ceph-12.2.0-47-g152806d/src/include/cpp-btree/btree_set.h:27,
                 from /build/ceph-12.2.0-47-g152806d/src/os/bluestore/BlueStore.cc:20:
/usr/include/c++/4.8/ostream:602:5: error:   initializing argument 1 of 'std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&&, const _Tp&) [with _CharT = char; _Traits = std::char_traits<char>; _Tp = BlueStore::Onode]'
     operator<<(basic_ostream<_CharT, _Traits>&& __os, const _Tp& __x)
     ^

#6 Updated by Henrik Korkuc 9 months ago

btw I am running on debian Jessie, it looks like shaman does not build for it, so I am going to build it myself, I also saw multiple fixes pushed to luminous branch, maybe it is worth building them too?

#7 Updated by Sage Weil 8 months ago

Sorry, I missed your earlier message. Repushed the branch; should build now. It also includes the other bluestore fixes queued up for 12.2.1. Can you give it a try and see if you can reproduce the crash on shutdown? Thanks!

#8 Updated by Henrik Korkuc 8 months ago

I upgraded cluster to 12.2.0-178-gba746cd (ba746cd14ddd70a4f24a734f83ff9d276dd327d1) last week (to mitigate aio submission deadlock). I do not experience any segfaults anymore.

#9 Updated by Josh Durgin 8 months ago

  • Status changed from New to Resolved

#10 Updated by Kefu Chai 8 months ago

  • Status changed from Resolved to New

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg40646.html

12.2.1 3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e still has this issue

#11 Updated by Sage Weil 8 months ago

https://github.com/ceph/ceph/pull/18103 is the luminous backport.. will ping you on this ticket once it is merged.

you'll need to enable 'osd debug shutdown = true' in [osd] section of ceph.conf to crank up debugging

#12 Updated by Sage Weil 8 months ago

  • Status changed from New to Need More Info

#13 Updated by Sage Weil 8 months ago

  • Subject changed from Ocasionally Bluestore OSDs segfault to bluestore: segv in BlueStore::TwoQCache::_trim

#14 Updated by Sage Weil 8 months ago

Ok, the patch is merged to teh latest luminous branch, which you can install from https://shaman.ceph.com/.

#15 Updated by Patrick Donnelly 7 months ago

Crash here:

2017-10-24T23:36:30.284 INFO:tasks.ceph.osd.1.smithi001.stderr:2017-10-24 23:36:30.264 34c57700 -1 received  signal: Terminated from  PID: 20208 task name: /usr/bin/python /bin/daemon-helper term valgrind --trace-children=no --child-silent-after-fork=yes --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.1.log --time-stamp=yes --tool=memcheck ceph-osd -f --cluster ceph -i 1  UID: 0
2017-10-24T23:36:30.285 INFO:tasks.ceph.osd.1.smithi001.stderr:2017-10-24 23:36:30.268 34c57700 -1 osd.1 38 *** Got signal Terminated ***
2017-10-24T23:36:30.589 INFO:tasks.ceph.osd.1.smithi001.stderr:2017-10-24 23:36:30.535 34c57700 -1 osd.1 38 shutdown
2017-10-24T23:36:33.463 INFO:tasks.ceph.osd.1.smithi001.stderr:*** Caught signal (Segmentation fault) **
2017-10-24T23:36:33.463 INFO:tasks.ceph.osd.1.smithi001.stderr: in thread 34c57700 thread_name:signal_handler
2017-10-24T23:36:33.516 INFO:tasks.ceph.osd.1.smithi001.stderr: ceph version 13.0.0-2328-gfc31752 (fc317524b8a24874a4a57c192e5da46ea11d593f) mimic (dev)
2017-10-24T23:36:33.517 INFO:tasks.ceph.osd.1.smithi001.stderr: 1: (()+0xa8f831) [0xb97831]
2017-10-24T23:36:33.517 INFO:tasks.ceph.osd.1.smithi001.stderr: 2: (()+0xf5e0) [0xc57c5e0]
2017-10-24T23:36:33.517 INFO:tasks.ceph.osd.1.smithi001.stderr: 3: (operator<<(std::ostream&, bluestore_shared_blob_t const&)+0xc) [0xad0e0c]
2017-10-24T23:36:33.517 INFO:tasks.ceph.osd.1.smithi001.stderr: 4: (operator<<(std::ostream&, BlueStore::SharedBlob const&)+0xba) [0xa367aa]
2017-10-24T23:36:33.517 INFO:tasks.ceph.osd.1.smithi001.stderr: 5: (BlueStore::SharedBlobSet::dump(CephContext*, int)+0x16e) [0xa3695e]
2017-10-24T23:36:33.517 INFO:tasks.ceph.osd.1.smithi001.stderr: 6: (BlueStore::_flush_cache()+0x1ae) [0xa63e8e]
2017-10-24T23:36:33.517 INFO:tasks.ceph.osd.1.smithi001.stderr: 7: (BlueStore::umount()+0x1e6) [0xa644c6]
2017-10-24T23:36:33.517 INFO:tasks.ceph.osd.1.smithi001.stderr: 8: (OSD::shutdown()+0x1685) [0x690125]
2017-10-24T23:36:33.517 INFO:tasks.ceph.osd.1.smithi001.stderr: 9: (OSD::handle_signal(int)+0x11f) [0x69092f]
2017-10-24T23:36:33.518 INFO:tasks.ceph.osd.1.smithi001.stderr: 10: (SignalHandler::entry()+0x1d7) [0xb98b97]
2017-10-24T23:36:33.518 INFO:tasks.ceph.osd.1.smithi001.stderr: 11: (()+0x7e25) [0xc574e25]
2017-10-24T23:36:33.518 INFO:tasks.ceph.osd.1.smithi001.stderr: 12: (clone()+0x6d) [0xd4bb34d]
2017-10-24T23:36:33.558 INFO:tasks.ceph.osd.1.smithi001.stderr:2017-10-24 23:36:33.433 34c57700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _flush_cache stray shared blobs on 3.4s0_head
2017-10-24T23:36:33.559 INFO:tasks.ceph.osd.1.smithi001.stderr:2017-10-24 23:36:33.504 34c57700 -1 *** Caught signal (Segmentation fault) **

From: /ceph/teuthology-archive/pdonnell-2017-10-24_21:18:53-multimds-wip-pdonnell-testing-201710240526-testing-basic-smithi/1771028/teuthology.log

Very reproducible. Same crash in many other jobs.

Sage believes this crash reproduces this bug.

17:08:00            @sage | can probably put it under http://tracker.ceph.com/issues/21259, the bug it was added to track down.
17:08:41            @sage | (the fact that the debug dump si crashing means you're probably reproducing that bug, which is good :)

#16 Updated by Sage Weil 7 months ago

  • Related to Bug #22039: bluestore: segv during unmount added

#17 Updated by Sage Weil 6 months ago

  • Status changed from Need More Info to Duplicate

#18 Updated by Petr Ilyin 3 months ago

Having this issue multiple times on ceph version 12.2.2 with this patch https://github.com/ceph/ceph/pull/18805 already applyed
Same error occurs on different OSD and different nodes.
Using Ceph on Proxmox VE.

*2018-02-16 17:11:10.982422 7f960f443700 -1 *** Caught signal (Segmentation fault) **
  in thread 7f960f443700 thread_name:bstore_mempool

  ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
  1: (()+0xa16664) [0x55fccb517664]
  2: (()+0x110c0) [0x7f961a0190c0]
  3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0xaf0) [0x55fccb3c3eb0]
  4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x4dd) [0x55fccb39467d]
  5: (BlueStore::MempoolThread::entry()+0x15d) [0x55fccb39b49d]
  6: (()+0x7494) [0x7f961a00f494]
  7: (clone()+0x3f) [0x7f9619096aff]
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

 --- begin dump of recent events ---

#19 Updated by Igor Fedotov 3 months ago

There is another one issue reproduction at https://tracker.ceph.com/issues/22977

#20 Updated by Sage Weil 3 months ago

  • Project changed from RADOS to bluestore
  • Category deleted (Correctness/Safety)

#21 Updated by Sage Weil 3 months ago

  • Status changed from Duplicate to Verified

#22 Updated by Sage Weil 3 months ago

  • Related to Bug #22977: High CPU load caused by operations on onode_map added

#23 Updated by Paul Emmerich 3 months ago

I'm also seeing this on a completely unrelated 12.2.4 cluster. Only thing they have in common is: they use erasure coding and RBD and were backfilling during the crash.
It happens more often when increasing max_backfills.

But I've captured a 20/20 log, that might be helpful :)

Here's the crash itself, there's clearly some memory corruption:

  -301> 2018-03-01 23:51:11.260861 7fc2a5d73700 20 bluestore.2QCache(0x5637791caa00) _trim  8#54:b1ebb6f1:::`%f9%03&9V%00%00.41.6ee35c2ae8944a.000000000001a3a7:head# has 97451136 refs; skipping
  -298> 2018-03-01 23:51:11.260869 7fc2a5d73700 20 bluestore.2QCache(0x5637791caa00) _trim considering 0xffffffffffffff30
     0> 2018-03-01 23:51:11.262538 7fc2a5d73700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fc2a5d73700 thread_name:bstore_mempool

 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0xa74234) [0x56376f7a0234]
 2: (()+0x11390) [0x7fc2b08ea390]
 3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x4d6) [0x56376f650476]
 4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x4e4) [0x56376f61f594]
 5: (BlueStore::MempoolThread::entry()+0x155) [0x56376f625dd5]
 6: (()+0x76ba) [0x7fc2b08e06ba]
 7: (clone()+0x6d) [0x7fc2af95741d]

Full log is attached.

#24 Updated by Paul Emmerich 3 months ago

And another slightly different log:

 -1516> 2018-03-02 00:14:22.212311 7f0290f6b700 10 trim shard target 320 M ratio 0.833333 (266 M + 5 ),  current 320 M (320 M + 36864 ), need_to_free 41920  (5061  + 36859 ) -> max 139733 onodes + 5 buffer
 -1515> 2018-03-02 00:14:22.212319 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim onodes 139736 / 139733 buffers 36864 / 5
 -1514> 2018-03-02 00:14:22.212321 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_in -> out buffer(0x559967a9e960 space 0x5598611ca5b8 0x5000~1000 clean)
 -1512> 2018-03-02 00:14:22.212325 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_in -> out buffer(0x55995768f720 space 0x55995931ae08 0x5000~1000 clean)
 -1511> 2018-03-02 00:14:22.212327 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_in -> out buffer(0x5597e1fa8800 space 0x559945ae5048 0x6000~1000 clean)
 -1509> 2018-03-02 00:14:22.212332 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_in -> out buffer(0x55993d842780 space 0x5599bcfd04d8 0x9000~1000 clean)
 -1508> 2018-03-02 00:14:22.212336 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_in -> out buffer(0x55995702eda0 space 0x5599bcfd05b8 0xc000~1000 clean)
 -1506> 2018-03-02 00:14:22.212339 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_in -> out buffer(0x55998d774280 space 0x55980a331668 0x5000~1000 clean)
 -1505> 2018-03-02 00:14:22.212342 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_in -> out buffer(0x55998bdaf180 space 0x5598d6c1ff28 0x0~1000 clean)
 -1504> 2018-03-02 00:14:22.212347 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_in -> out buffer(0x55998d1c45a0 space 0x5598d6c1ff28 0x3000~1000 clean)
 -1503> 2018-03-02 00:14:22.212352 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_in -> out buffer(0x5599ebbcef00 space 0x559983ed27e8 0x2000~1000 clean)
 -1499> 2018-03-02 00:14:22.212361 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim evicted 36864 bytes from warm_in list, done evicting warm_in buffers
 -1496> 2018-03-02 00:14:22.212372 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_out rm buffer(0x559967a9e960 space 0x5598611ca5b8 0x5000~1000 empty)
 -1494> 2018-03-02 00:14:22.212383 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _rm_buffer buffer(0x559967a9e960 space 0x5598611ca5b8 0x5000~1000 empty)
 -1493> 2018-03-02 00:14:22.212390 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_out rm buffer(0x55995768f720 space 0x55995931ae08 0x5000~1000 empty)
 -1490> 2018-03-02 00:14:22.212396 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _rm_buffer buffer(0x55995768f720 space 0x55995931ae08 0x5000~1000 empty)
 -1488> 2018-03-02 00:14:22.212399 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_out rm buffer(0x5597e1fa8800 space 0x559945ae5048 0x6000~1000 empty)
 -1486> 2018-03-02 00:14:22.212410 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _rm_buffer buffer(0x5597e1fa8800 space 0x559945ae5048 0x6000~1000 empty)
 -1485> 2018-03-02 00:14:22.212414 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_out rm buffer(0x55993d842780 space 0x5599bcfd04d8 0x9000~1000 empty)
 -1484> 2018-03-02 00:14:22.212418 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _rm_buffer buffer(0x55993d842780 space 0x5599bcfd04d8 0x9000~1000 empty)
 -1483> 2018-03-02 00:14:22.212421 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_out rm buffer(0x55995702eda0 space 0x5599bcfd05b8 0xc000~1000 empty)
 -1482> 2018-03-02 00:14:22.212424 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _rm_buffer buffer(0x55995702eda0 space 0x5599bcfd05b8 0xc000~1000 empty)
 -1480> 2018-03-02 00:14:22.212426 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_out rm buffer(0x55998d774280 space 0x55980a331668 0x5000~1000 empty)
 -1479> 2018-03-02 00:14:22.212428 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _rm_buffer buffer(0x55998d774280 space 0x55980a331668 0x5000~1000 empty)
 -1478> 2018-03-02 00:14:22.212430 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_out rm buffer(0x55998bdaf180 space 0x5598d6c1ff28 0x0~1000 empty)
 -1477> 2018-03-02 00:14:22.212432 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _rm_buffer buffer(0x55998bdaf180 space 0x5598d6c1ff28 0x0~1000 empty)
 -1476> 2018-03-02 00:14:22.212435 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_out rm buffer(0x55998d1c45a0 space 0x5598d6c1ff28 0x3000~1000 empty)
 -1475> 2018-03-02 00:14:22.212438 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _rm_buffer buffer(0x55998d1c45a0 space 0x5598d6c1ff28 0x3000~1000 empty)
 -1474> 2018-03-02 00:14:22.212440 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim buffer_warm_out rm buffer(0x5599ebbcef00 space 0x559983ed27e8 0x2000~1000 empty)
 -1473> 2018-03-02 00:14:22.212442 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _rm_buffer buffer(0x5599ebbcef00 space 0x559983ed27e8 0x2000~1000 empty)
 -1472> 2018-03-02 00:14:22.212448 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim considering 0x5599630ca000
 -1470> 2018-03-02 00:14:22.212460 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim considering 0x5598e82f0580
 -1468> 2018-03-02 00:14:22.212483 7f0290f6b700 20 bluestore.sharedblob(0x5598273889a0) put 0x5598273889a0 removing self from set 0x5597ce7e62f8
 -1466> 2018-03-02 00:14:22.212488 7f0290f6b700 20 bluestore.BufferSpace(0x5598273889b8 in 0x5597c10ae760) _clear
 -1465> 2018-03-02 00:14:22.212490 7f0290f6b700 10 bluestore.OnodeSpace(0x5597ce7e6368 in 0x5597c10ae760) clear
   -15> 2018-03-02 00:14:22.223445 7f0290f6b700 20 bluestore.2QCache(0x5597c10ae760) _trim considering 0x5599bd587600
     0> 2018-03-02 00:14:22.225070 7f0290f6b700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f0290f6b700 thread_name:bstore_mempool

 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)
 1: (()+0xa74234) [0x5597b64c3234]
 2: (()+0x11390) [0x7f029b2e1390]
 3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x518) [0x5597b63734b8]
 4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x4e4) [0x5597b6342594]
 5: (BlueStore::MempoolThread::entry()+0x155) [0x5597b6348dd5]
 6: (()+0x76ba) [0x7f029b2d76ba]
 7: (clone()+0x6d) [0x7f029a34e41d]

#26 Updated by Sage Weil 2 months ago

  • Status changed from Verified to Need Review

#27 Updated by Sage Weil 2 months ago

  • Backport set to luminous

#28 Updated by Igor Fedotov about 2 months ago

  • Status changed from Need Review to Verified

#29 Updated by Shane Voss about 2 months ago

We have also had this fault a number of times.
This was during a migration to bluestore - so we were backfilling for weeks on end.

#30 Updated by Konstantin Shalygin about 2 months ago

The same issue here - segfaults on 12.2.4 when recovery:

2018-04-05 17:20:33.648507 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 37917 : cluster [INF] osd.0 172.16.16.11:6802/76528 boot
2018-04-05 18:25:04.059670 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 39617 : cluster [INF] osd.13 172.16.16.16:6808/77184 boot
2018-04-05 18:56:00.809524 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 40904 : cluster [INF] osd.35 172.16.16.14:6802/77973 boot
2018-04-05 20:10:08.159086 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 43018 : cluster [INF] osd.1 172.16.16.12:6814/76581 boot
2018-04-05 22:51:50.640285 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 46848 : cluster [INF] osd.34 172.16.16.14:6808/78249 boot
2018-04-05 23:33:32.429983 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 47734 : cluster [INF] osd.43 172.16.16.15:6802/76348 boot
2018-04-06 02:46:31.293435 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 51076 : cluster [INF] osd.52 172.16.16.13:6816/78283 boot
2018-04-06 02:58:07.583917 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 51458 : cluster [INF] osd.1 172.16.16.12:6814/76900 boot
2018-04-06 06:26:51.772896 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 55413 : cluster [INF] osd.13 172.16.16.16:6808/77833 boot
2018-04-06 06:47:53.799341 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 55886 : cluster [INF] osd.10 172.16.16.15:6816/76752 boot
2018-04-06 08:09:48.444658 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 58233 : cluster [INF] osd.12 172.16.16.13:6808/78592 boot
2018-04-06 08:13:11.737694 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 58540 : cluster [INF] osd.6 172.16.16.15:6812/76911 boot
2018-04-06 08:21:58.833487 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 58850 : cluster [INF] osd.41 172.16.16.12:6806/77182 boot
2018-04-06 09:00:44.845335 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 59860 : cluster [INF] osd.23 172.16.16.14:6806/78726 boot
2018-04-06 09:15:19.460898 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 60387 : cluster [INF] osd.14 172.16.16.13:6804/78735 boot
2018-04-06 09:31:53.586265 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 61023 : cluster [INF] osd.45 172.16.16.15:6808/77067 boot
2018-04-06 10:46:09.516441 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 63213 : cluster [INF] osd.1 172.16.16.12:6814/77367 boot
2018-04-06 11:06:44.451730 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 63841 : cluster [INF] osd.10 172.16.16.15:6816/77237 boot
2018-04-06 11:28:15.592588 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 64719 : cluster [INF] osd.22 172.16.16.14:6812/78921 boot
2018-04-06 11:39:21.630589 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 65199 : cluster [INF] osd.52 172.16.16.13:6816/78936 boot
2018-04-06 13:05:01.912893 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 67995 : cluster [INF] osd.51 172.16.16.13:6810/79096 boot
2018-04-06 13:20:31.406992 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 68651 : cluster [INF] osd.16 172.16.16.16:6804/78231 boot
2018-04-06 14:21:30.395650 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 70290 : cluster [INF] osd.13 172.16.16.16:6808/78375 boot
2018-04-06 16:22:24.596451 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 71208 : cluster [INF] osd.36 172.16.16.14:6814/79203 boot
2018-04-06 16:33:04.842059 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 71497 : cluster [INF] osd.46 172.16.16.11:6816/77612 boot
2018-04-06 16:33:33.161423 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 71513 : cluster [INF] osd.44 172.16.16.15:6804/78049 boot
2018-04-06 17:00:27.836423 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 71655 : cluster [INF] osd.49 172.16.16.11:6800/77732 boot
2018-04-06 17:19:06.133746 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 71707 : cluster [INF] osd.10 172.16.16.16:6814/81917 boot
2018-04-06 18:04:11.994488 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 72682 : cluster [INF] osd.11 172.16.16.15:6814/78219 boot
2018-04-06 18:06:35.167190 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 72916 : cluster [INF] osd.3 172.16.16.12:6810/77752 boot
2018-04-06 20:19:07.391961 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 75812 : cluster [INF] osd.14 172.16.16.15:6816/81250 boot
2018-04-06 20:23:20.064143 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 75904 : cluster [INF] osd.15 172.16.16.13:6804/82866 boot
2018-04-06 21:06:06.339924 mon.ceph-mon0 mon.0 172.16.16.2:6789/0 77066 : cluster [INF] osd.12 172.16.16.13:6808/85165 boot

#31 Updated by Sage Weil about 2 months ago

  • Status changed from Verified to Resolved

Igor deduced this was missing backports in luminous. THey're merged now, will be in 12.2.5:

https://github.com/ceph/ceph/pull/20964

#32 Updated by Igor Fedotov about 2 months ago

Sage, to be honest I'm still uncertain if that patch is related to this issue.
I haven't managed to reproduce this exact segv. And even can't explain how it might be caused due to the patch. Can you?
Patch is valid anyway though...

Also available in: Atom PDF