Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
Status: Open
Description
We encountered this fault while using CephFS; it seems similar to http://tracker.ceph.com/issues/17645
```
2018-03-07 17:36:55.753543 7f82b0edc700 0 bad trim to 279'55156 when complete_to is 272'54099 on log((272'54097,355'65157], crt=355'65157)
2018-03-07 17:37:19.439095 7f82af6d9700 0 bad trim to 279'55405 when complete_to is 272'54127 on log((272'54126,355'65406], crt=355'65406)
2018-03-07 17:37:31.075067 7f82b1ede700 0 bad trim to 279'55505 when complete_to is 272'54169 on log((279'55405,355'65506], crt=355'65506)
2018-03-07 17:37:36.870389 7f82b0edc700 0 bad trim to 279'55283 when complete_to is 272'54028 on log((269'54005,355'65284], crt=355'65284)
2018-03-07 17:37:39.831625 7f82afeda700 0 bad trim to 279'55008 when complete_to is 272'53742 on log((272'53741,355'65009], crt=355'65009)
2018-03-07 17:37:45.275853 7f82b26df700 -1 *** Caught signal (Segmentation fault) **
in thread 7f82b26df700 thread_name:tp_osd_tp
ceph version 12.2.2-10 (748f9de018390b7e4a53e94c74a6261333298d09) luminous (stable)
1: (()+0xa51f31) [0x7f82e72f6f31]
2: (()+0xf370) [0x7f82e453a370]
3: (PrimaryLogPG::recover_got(hobject_t, eversion_t)+0x266) [0x7f82e6efd786]
4: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x2a4) [0x7f82e6f0a3b4]
5: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x2e2) [0x7f82e707bf82]
6: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x194) [0x7f82e707c224]
7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2f1) [0x7f82e708ad41]
8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f82e6fad470]
9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5ae) [0x7f82e6f1c60e]
10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x42e) [0x7f82e6d82b8e]
11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f82e701e937]
12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1282) [0x7f82e6db4fd2]
13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f82e733b5b9]
14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f82e733d550]
15: (()+0x7dc5) [0x7f82e4532dc5]
16: (clone()+0x6d) [0x7f82e362876d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
```
Updated by Josh Durgin about 6 years ago
Can you reproduce with osds configured with:
```
debug ms = 1
debug osd = 20
debug filestore = 20
```
Ideally with logs from the primary and secondary osds for the pg that is hitting this crash.
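For reference, these levels can also be raised at runtime via `injectargs` rather than editing ceph.conf and restarting (a general suggestion, not something requested in the original comment):

```shell
# Raise debug levels on all OSDs without a restart
ceph tell osd.* injectargs '--debug_ms 1 --debug_osd 20 --debug_filestore 20'
```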
Updated by Yan Jun about 6 years ago
- File ceph-osd.20.log.1.gz added
Sorry for the late reply, but it's hard to reproduce. We reproduced it once with
```
debug osd = 1/20
```
I've uploaded the OSD's log file here, and will provide more if we find anything useful.