Bug #23273: segmentation fault in PrimaryLogPG::recover_got()
Status: Open
Description
We encountered this fault while using CephFS; it seems similar to http://tracker.ceph.com/issues/17645
```
2018-03-07 17:36:55.753543 7f82b0edc700 0 bad trim to 279'55156 when complete_to is 272'54099 on log((272'54097,355'65157], crt=355'65157)
2018-03-07 17:37:19.439095 7f82af6d9700 0 bad trim to 279'55405 when complete_to is 272'54127 on log((272'54126,355'65406], crt=355'65406)
2018-03-07 17:37:31.075067 7f82b1ede700 0 bad trim to 279'55505 when complete_to is 272'54169 on log((279'55405,355'65506], crt=355'65506)
2018-03-07 17:37:36.870389 7f82b0edc700 0 bad trim to 279'55283 when complete_to is 272'54028 on log((269'54005,355'65284], crt=355'65284)
2018-03-07 17:37:39.831625 7f82afeda700 0 bad trim to 279'55008 when complete_to is 272'53742 on log((272'53741,355'65009], crt=355'65009)
2018-03-07 17:37:45.275853 7f82b26df700 -1 *** Caught signal (Segmentation fault) **
in thread 7f82b26df700 thread_name:tp_osd_tp
ceph version 12.2.2-10 (748f9de018390b7e4a53e94c74a6261333298d09) luminous (stable)
1: (()+0xa51f31) [0x7f82e72f6f31]
2: (()+0xf370) [0x7f82e453a370]
3: (PrimaryLogPG::recover_got(hobject_t, eversion_t)+0x266) [0x7f82e6efd786]
4: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x2a4) [0x7f82e6f0a3b4]
5: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x2e2) [0x7f82e707bf82]
6: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x194) [0x7f82e707c224]
7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2f1) [0x7f82e708ad41]
8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f82e6fad470]
9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5ae) [0x7f82e6f1c60e]
10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x42e) [0x7f82e6d82b8e]
11: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f82e701e937]
12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1282) [0x7f82e6db4fd2]
13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f82e733b5b9]
14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f82e733d550]
15: (()+0x7dc5) [0x7f82e4532dc5]
16: (clone()+0x6d) [0x7f82e362876d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
```
Updated by Josh Durgin about 6 years ago
Can you reproduce with osds configured with:
```
debug ms = 1
debug osd = 20
debug filestore = 20
```
Ideally with logs from the primary and secondary osds for the pg that is hitting this crash.
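For reference, these levels can also be raised at runtime via `injectargs` rather than editing ceph.conf and restarting (a general suggestion, not something requested in the original comment):

```shell
# Raise debug levels on all OSDs without a restart
ceph tell osd.* injectargs '--debug_ms 1 --debug_osd 20 --debug_filestore 20'
```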
Updated by Yan Jun about 6 years ago
- File ceph-osd.20.log.1.gz added
Sorry for the late reply, but it's hard to reproduce. We reproduced it once with
```
debug osd = 1/20
```
I've uploaded the OSD's log file here, and will provide more if we find anything useful.