
Bug #23273

segmentation fault in PrimaryLogPG::recover_got()

Added by Yan Jun almost 3 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

We encountered this fault while using CephFS. It seems similar to this issue: http://tracker.ceph.com/issues/17645

```
2018-03-07 17:36:55.753543 7f82b0edc700 0 bad trim to 279'55156 when complete_to is 272'54099 on log((272'54097,355'65157], crt=355'65157)
2018-03-07 17:37:19.439095 7f82af6d9700 0 bad trim to 279'55405 when complete_to is 272'54127 on log((272'54126,355'65406], crt=355'65406)
2018-03-07 17:37:31.075067 7f82b1ede700 0 bad trim to 279'55505 when complete_to is 272'54169 on log((279'55405,355'65506], crt=355'65506)
2018-03-07 17:37:36.870389 7f82b0edc700 0 bad trim to 279'55283 when complete_to is 272'54028 on log((269'54005,355'65284], crt=355'65284)
2018-03-07 17:37:39.831625 7f82afeda700 0 bad trim to 279'55008 when complete_to is 272'53742 on log((272'53741,355'65009], crt=355'65009)
2018-03-07 17:37:45.275853 7f82b26df700 -1 *** Caught signal (Segmentation fault) **
in thread 7f82b26df700 thread_name:tp_osd_tp

ceph version 12.2.2-10 (748f9de018390b7e4a53e94c74a6261333298d09) luminous (stable)
1: (()+0xa51f31) [0x7f82e72f6f31]
2: (()+0xf370) [0x7f82e453a370]
3: (PrimaryLogPG::recover_got(hobject_t, eversion_t)+0x266) [0x7f82e6efd786]
4: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x2a4) [0x7f82e6f0a3b4]
5: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x2e2) [0x7f82e707bf82]
6: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x194) [0x7f82e707c224]
7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2f1) [0x7f82e708ad41]
8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f82e6fad470]
9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5ae) [0x7f82e6f1c60e]
10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x42e) [0x7f82e6d82b8e]
11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f82e701e937]
12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1282) [0x7f82e6db4fd2]
13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f82e733b5b9]
14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f82e733d550]
15: (()+0x7dc5) [0x7f82e4532dc5]
16: (clone()+0x6d) [0x7f82e362876d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
```

ceph-osd.8.log.1.gz (396 KB) Yan Jun, 03/08/2018 07:12 AM

ceph-osd.20.log.1.gz (936 KB) Yan Jun, 04/17/2018 06:13 AM

History

#1 Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to RADOS

#2 Updated by Josh Durgin almost 3 years ago

Can you reproduce with OSDs configured with:

```
debug ms = 1
debug osd = 20
debug filestore = 20
```

Ideally include logs from both the primary and secondary OSDs for the PG that is hitting this crash.
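For reference, the requested settings map onto `ceph.conf` roughly as follows (a sketch: placed in the `[osd]` section on the affected hosts and applied by restarting the OSDs, or injected at runtime with `ceph tell osd.* injectargs`):

```
[osd]
debug ms = 1
debug osd = 20
debug filestore = 20
```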

#3 Updated by Yan Jun almost 3 years ago

Sorry for the late reply, but it's hard to reproduce. We reproduced it once with:

```
debug osd = 1/20
```

I have uploaded the OSD's log file here, and will provide more if we find useful information.
