Bug #11183
Closed
osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant)
Description
http://pulpito.ceph.com/loic-2015-03-19_19:01:57-rados-giant-backports---basic-multi/811322
2015-03-19T17:53:34.458 INFO:tasks.workunit.client.0.plana55.stdout:[ OK ] LibRadosAioEC.StatRemovePP (9035 ms)
2015-03-19T17:53:34.459 INFO:tasks.workunit.client.0.plana55.stdout:[ RUN ] LibRadosAioEC.OmapPP
2015-03-19T17:53:42.471 INFO:tasks.workunit.client.0.plana55.stderr:osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f2a22c0c840 time 2015-03-19 17:53:42.470223
2015-03-19T17:53:42.471 INFO:tasks.workunit.client.0.plana55.stderr:osdc/Objecter.cc: 405: FAILED assert(tick_event == __null)
2015-03-19T17:53:42.472 INFO:tasks.workunit.client.0.plana55.stderr: ceph version 0.87.1-72-gb6bebbb (b6bebbbc99540b76221aeccb3693784b414f607b)
2015-03-19T17:53:42.472 INFO:tasks.workunit.client.0.plana55.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f2a2059f83b]
2015-03-19T17:53:42.472 INFO:tasks.workunit.client.0.plana55.stderr: 2: (Objecter::shutdown()+0xae8) [0x7f2a2051d728]
2015-03-19T17:53:42.473 INFO:tasks.workunit.client.0.plana55.stderr: 3: (librados::RadosClient::shutdown()+0x1c2) [0x7f2a204e90c2]
2015-03-19T17:53:42.473 INFO:tasks.workunit.client.0.plana55.stderr: 4: (librados::Rados::shutdown()+0x28) [0x7f2a204c2cc8]
2015-03-19T17:53:42.473 INFO:tasks.workunit.client.0.plana55.stderr: 5: (destroy_one_pool_pp(std::string const&, librados::Rados&)+0x22) [0x4a5f42]
2015-03-19T17:53:42.473 INFO:tasks.workunit.client.0.plana55.stderr: 6: (LibRadosAioEC_OmapPP_Test::TestBody()+0x93a) [0x42cb7a]
2015-03-19T17:53:42.474 INFO:tasks.workunit.client.0.plana55.stderr: 7: (testing::Test::Run()+0x8a) [0x49c77a]
2015-03-19T17:53:42.474 INFO:tasks.workunit.client.0.plana55.stderr: 8: (testing::internal::TestInfoImpl::Run()+0xd8) [0x49c858]
2015-03-19T17:53:42.474 INFO:tasks.workunit.client.0.plana55.stderr: 9: (testing::TestCase::Run()+0x95) [0x49c8f5]
2015-03-19T17:53:42.475 INFO:tasks.workunit.client.0.plana55.stderr: 10: (testing::internal::UnitTestImpl::RunAllTests()+0x247) [0x49f9b7]
2015-03-19T17:53:42.475 INFO:tasks.workunit.client.0.plana55.stderr: 11: (main()+0x35) [0x42c045]
2015-03-19T17:53:42.475 INFO:tasks.workunit.client.0.plana55.stderr: 12: (__libc_start_main()+0xf5) [0x7f2a1f814ec5]
2015-03-19T17:53:42.475 INFO:tasks.workunit.client.0.plana55.stderr: 13: ceph_test_rados_api_aio() [0x42c0d7]
2015-03-19T17:53:42.476 INFO:tasks.workunit.client.0.plana55.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Zhiqiang Wang about 9 years ago
I think the problem is caused by a locking issue in the following scenario:
The Objecter::timer erases tick_event from its events and schedule queues and then calls Objecter::tick(). At the same time, Objecter::shutdown() is running and holds the rwlock, so Objecter::tick() blocks waiting for the rwlock. Objecter::shutdown() then checks tick_event and tries to cancel it. However, since the event has already been removed from the events queue, cancel_event() returns false, so tick_event is not set to NULL. This leads to the assert failure shown above.
Updated by Zhiqiang Wang about 9 years ago
After digging further into this issue, it looks like this is a giant-specific issue that has already been fixed in current master by removing the following code from Objecter::tick():
if (!initialized.read()) return;
We need to find out which patch this is and backport it.
Updated by Kefu Chai about 9 years ago
The fix was introduced in 8253ead1748fc429bf48f5334ee4460ee865d941.
Updated by Loïc Dachary about 9 years ago
- Status changed from New to Fix Under Review
- Assignee set to Loïc Dachary
- Affected Versions v0.93 - Last Hammer Sprint added
Updated by Loïc Dachary about 9 years ago
This is a giant-specific regression introduced by an incorrect conflict resolution when backporting d790833cb84d6f6349146e4f9abdcdffb4db2ee0.
Updated by Loïc Dachary about 9 years ago
- Status changed from Fix Under Review to In Progress
Updated by Loïc Dachary about 9 years ago
- Status changed from In Progress to Resolved
Updated by Yuri Weinstein about 9 years ago
- Status changed from Resolved to New
- Source changed from other to Q/A
- ceph-qa-suite upgrade/giant-x added
Run: http://pulpito.ceph.com/teuthology-2015-03-25_17:05:01-upgrade:giant-x-hammer-distro-basic-multi/
Job: ['821147']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-25_17:05:01-upgrade:giant-x-hammer-distro-basic-multi/821147/
2015-03-25T21:08:35.410 INFO:tasks.rados.rados.1.plana16.stdout:2432: done (1 left)
2015-03-25T21:08:35.410 INFO:tasks.rados.rados.1.plana16.stdout:2433: read oid 15 snap -1
2015-03-25T21:08:35.410 INFO:tasks.rados.rados.1.plana16.stdout:2431: expect (ObjNum 857 snap 298 seq_num 857)
2015-03-25T21:08:35.814 INFO:tasks.workunit.client.1.plana16.stderr:osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f39e843c840 time 2015-03-25 21:08:35.807226
2015-03-25T21:08:35.814 INFO:tasks.workunit.client.1.plana16.stderr:osdc/Objecter.cc: 405: FAILED assert(tick_event == __null)
2015-03-25T21:08:35.829 INFO:tasks.workunit.client.1.plana16.stderr: ceph version 0.87.1-99-g2ccbc14 (2ccbc14d17b54ea4fd4126cb04a7b83cd64c7f1e)
2015-03-25T21:08:35.829 INFO:tasks.workunit.client.1.plana16.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f39e5dcf83b]
2015-03-25T21:08:35.829 INFO:tasks.workunit.client.1.plana16.stderr: 2: (Objecter::shutdown()+0xae8) [0x7f39e5d4d728]
2015-03-25T21:08:35.830 INFO:tasks.workunit.client.1.plana16.stderr: 3: (librados::RadosClient::shutdown()+0x1c2) [0x7f39e5d190c2]
2015-03-25T21:08:35.830 INFO:tasks.workunit.client.1.plana16.stderr: 4: (librados::Rados::shutdown()+0x28) [0x7f39e5cf2cc8]
2015-03-25T21:08:35.830 INFO:tasks.workunit.client.1.plana16.stderr: 5: (destroy_one_ec_pool_pp(std::string const&, librados::Rados&)+0xdd) [0x46eb8d]
2015-03-25T21:08:35.831 INFO:tasks.workunit.client.1.plana16.stderr: 6: (RadosTestECPP::TearDownTestCase()+0x30) [0x46c8f0]
2015-03-25T21:08:35.831 INFO:tasks.workunit.client.1.plana16.stderr: 7: (testing::TestCase::Run()+0xc5) [0x45b165]
2015-03-25T21:08:35.831 INFO:tasks.workunit.client.1.plana16.stderr: 8: (testing::internal::UnitTestImpl::RunAllTests()+0x247) [0x45e1f7]
2015-03-25T21:08:35.831 INFO:tasks.workunit.client.1.plana16.stderr: 9: (main()+0x35) [0x4250d5]
2015-03-25T21:08:35.832 INFO:tasks.workunit.client.1.plana16.stderr: 10: (__libc_start_main()+0xf5) [0x7f39e5044ec5]
2015-03-25T21:08:35.832 INFO:tasks.workunit.client.1.plana16.stderr: 11: ceph_test_rados_api_io() [0x4252d7]
2015-03-25T21:08:35.832 INFO:tasks.workunit.client.1.plana16.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Yuri Weinstein about 9 years ago
- Subject changed from osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant) to osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant,hammer)
Updated by Yuri Weinstein about 9 years ago
- Subject changed from osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant,hammer) to osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant)
- ceph-qa-suite rados added
- ceph-qa-suite deleted (upgrade/giant-x)
Updated by Loïc Dachary about 9 years ago
- ceph-qa-suite upgrade/giant-x added
- ceph-qa-suite deleted (rados)
<loicd> yuriw: it was resolved after the run started if I'm not mistaken https://github.com/ceph/ceph/pull/4175
<loicd> and http://tracker.ceph.com/issues/11183#note-9
<loicd> yuriw: about 13 hours ago today
<loicd> it would be embarrassing if it shows again tomorrow...
<yuriw> loicd - OK we will see
Updated by Yuri Weinstein about 9 years ago
Loic - the suite passed, see #11189
Updated by Loïc Dachary about 9 years ago
- Status changed from New to Resolved
Thanks for the update, such a relief :-)