Bug #16688
Crash in "FileStore::queue_transactions" in rados run
Status: Closed
Description
Run: http://pulpito.ceph.com/yuriw-2016-07-12_13:07:42-rados-wip-yuri-testing_2016_7_12-distro-basic-smithi/
Jobs: 309938
Logs: http://qa-proxy.ceph.com/teuthology/yuriw-2016-07-12_13:07:42-rados-wip-yuri-testing_2016_7_12-distro-basic-smithi/309938/
in remote/*/ceph-osd.0.log.gz
 in thread 7f718bcb6700
 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
 1: ceph-osd() [0x98207f]
 2: (()+0x10340) [0x7f71a2b9a340]
 3: (gsignal()+0x39) [0x7f71a1241cc9]
 4: (abort()+0x148) [0x7f71a12450d8]
 5: (tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem)+0x204) [0x7f71a2dc7114]
 6: (()+0x14184) [0x7f71a2dbc184]
 7: (operator delete(void*)+0x1e4) [0x7f71a2ddadf4]
 8: (FileStore::queue_transactions(ObjectStore::Sequencer*, std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, std::tr1::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x6fa) [0x86eb6a]
 9: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction*, Context*, Context*, Context*, std::tr1::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x165) [0x665a65]
 10: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::tr1::shared_ptr<DeletingState>, ThreadPool::TPHandle&)+0x479) [0x603c69]
 11: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x1ee) [0x609b0e]
 12: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x10a) [0x6662ba]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xae9) [0xa55ad9]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xa569c0]
 15: (()+0x8182) [0x7f71a2b92182]
 16: (clone()+0x6d) [0x7f71a130547d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by David Zafman over 7 years ago
This was seen in another rados run:
loic-2016-07-18_07:08:24-rados-master---basic-smithi/320873
It was the same 11429.yaml, and the crash happened while the upgrade was still running firefly. The log contains the following message:
2016-07-18 18:24:08.177793 7f138d7ff700 0 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions filestore_blackhole = TRUE, dropping transaction
int FileStore::queue_transactions(Sequencer *posr, list<Transaction*> &tls,
                                  TrackedOpRef osd_op,
                                  ThreadPool::TPHandle *handle)
{
  Context *onreadable;
  Context *ondisk;
  Context *onreadable_sync;
  ObjectStore::Transaction::collect_contexts(
    tls, &onreadable, &ondisk, &onreadable_sync);
  if (g_conf->filestore_blackhole) {
    dout(0) << "queue_transactions filestore_blackhole = TRUE, dropping transaction"
            << dendl;
    delete ondisk;
    delete onreadable;
    delete onreadable_sync;
    return 0;
  }
Updated by Kefu Chai over 7 years ago
- Status changed from New to Need More Info
- Assignee changed from Kefu Chai to Yuri Weinstein
@Yuri, are we still supporting firefly? 0.80.* is retired, I think.
Updated by Nathan Cutler over 7 years ago
@Kefu Chai: This is rados/singleton-nomsgr/{rados.yaml all/11429.yaml}
which might be an important regression test to have around? It installs firefly 0.80.8 here:
https://github.com/ceph/ceph-qa-suite/blob/master/suites/rados/singleton-nomsgr/all/11429.yaml#L41
There's also 13234.yaml,
which installs dumpling!
https://github.com/ceph/ceph-qa-suite/blob/master/suites/rados/singleton-nomsgr/all/13234.yaml#L40
Updated by Kefu Chai over 7 years ago
Thanks for looking into the tests involving retired releases. Yeah, they are upgrade tests:
- 11429.yaml: from firefly to hammer, then to current branch
- 13234.yaml: from dumpling to firefly, then to hammer, and then to current branch.
But the funny thing is that the crash happened in a retired release, and the intent of #11429 was to verify that hammer is able to handle the case where a pg references a non-existent osdmap.
Updated by Kefu Chai over 7 years ago
I tried to reproduce this issue manually, but it seems osd.0 is not slow enough: it finished handling the pg-remove request before "ceph daemon osd.0 config set filestore_blackhole true" returned.
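The reproduction attempt described above could look roughly like the sequence below. This is a sketch, not the exact commands from the ticket: it assumes a running test cluster, and the pool name is illustrative. The race is whether the blackhole flag lands before osd.0 dequeues the pg-remove work item.

```shell
# Illustrative reproduction sketch (assumes a running cluster; the pool
# name "blackhole-test" is made up, not from the ticket).
ceph osd pool create blackhole-test 8

# Flip the blackhole flag on osd.0 via the admin socket; this must win
# the race against the OSD servicing the PG removal.
ceph daemon osd.0 config set filestore_blackhole true

# Deleting the pool triggers PG removal on the OSDs; with the blackhole
# set, FileStore::queue_transactions drops the transaction and, per this
# ticket, firefly's OSD may then abort in operator delete.
ceph osd pool delete blackhole-test blackhole-test \
    --yes-i-really-really-mean-it
```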
Updated by Kefu Chai over 7 years ago
- Status changed from Need More Info to New
- Assignee deleted (Yuri Weinstein)
Resigning from this one for now.
Updated by Samuel Just over 7 years ago
- Status changed from New to Won't Fix
If this is a bug in firefly which is happening infrequently, I think we're going to ignore it.
Updated by Yuri Weinstein over 7 years ago
Just to keep record, also seen in http://qa-proxy.ceph.com/teuthology/yuriw-2016-09-20_14:50:21-rados-wip-yuri-testing2_2016_09_19-distro-basic-smithi/425869/orig.config.yaml
description: rados/singleton-nomsgr/{rados.yaml all/11429.yaml}
http://qa-proxy.ceph.com/teuthology/yuriw-2016-09-20_14:50:21-rados-wip-yuri-testing2_2016_09_19-distro-basic-smithi/425869/teuthology.log

teuthology@teuthology:~$ zgrep "^ ceph version" /a//yuriw-2016-09-20_14:50:21-rados-wip-yuri-testing2_2016_09_19-distro-basic-smithi/425869/remote/smithi104/log/*osd* -b20 -a30

From ceph-osd.0.log.gz:

 6:12.986418 7fa023c33700 15 filestore(/var/lib/ceph/osd/ceph-0) omap_get_values meta/a468ec03/snapmapper/0//-1
 -2> 2016-09-20 15:16:12.986432 7fa023c33700 15 filestore(/var/lib/ceph/osd/ceph-0) omap_get_values meta/a468ec03/snapmapper/0//-1
 -1> 2016-09-20 15:16:12.986448 7fa023c33700 0 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions filestore_blackhole = TRUE, dropping transaction
 0> 2016-09-20 15:16:12.988211 7fa023c33700 -1 *** Caught signal (Aborted) **
 in thread 7fa023c33700
 ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
 1: ceph-osd() [0x98207f]
 2: (()+0x10330) [0x7fa03ab23330]
 3: (gsignal()+0x37) [0x7fa0391c8c37]
 4: (abort()+0x148) [0x7fa0391cc028]
 5: (tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem)+0x204) [0x7fa03ad50114]
 6: (()+0x14184) [0x7fa03ad45184]
 7: (operator delete(void*)+0x1e4) [0x7fa03ad63df4]
 8: (FileStore::queue_transactions(ObjectStore::Sequencer*, std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, std::tr1::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x6fa) [0x86eb6a]
 9: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction*, Context*, Context*, Context*, std::tr1::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x165) [0x665a65]
 10: (remove_dir(CephContext*, ObjectStore*, SnapMapper*, OSDriver*, ObjectStore::Sequencer*, coll_t, std::tr1::shared_ptr<DeletingState>, ThreadPool::TPHandle&)+0x816) [0x604006]
 11: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x1ee) [0x609b0e]
 12: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x10a) [0x6662ba]
 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xae9) [0xa55ad9]
 14: (ThreadPool::WorkThread::entry()+0x10) [0xa569c0]
 15: (()+0x8184) [0x7fa03ab1b184]
 16: (clone()+0x6d) [0x7fa03928c37d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

 --- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer