Bug #11183

closed

osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant)

Added by Loïc Dachary about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/giant-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/loic-2015-03-19_19:01:57-rados-giant-backports---basic-multi/811322

2015-03-19T17:53:34.458 INFO:tasks.workunit.client.0.plana55.stdout:[       OK ] LibRadosAioEC.StatRemovePP (9035 ms)
2015-03-19T17:53:34.459 INFO:tasks.workunit.client.0.plana55.stdout:[ RUN      ] LibRadosAioEC.OmapPP
2015-03-19T17:53:42.471 INFO:tasks.workunit.client.0.plana55.stderr:osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f2a22c0c840 time 2015-03-19 17:53:42.470223
2015-03-19T17:53:42.471 INFO:tasks.workunit.client.0.plana55.stderr:osdc/Objecter.cc: 405: FAILED assert(tick_event == __null)
2015-03-19T17:53:42.472 INFO:tasks.workunit.client.0.plana55.stderr: ceph version 0.87.1-72-gb6bebbb (b6bebbbc99540b76221aeccb3693784b414f607b)
2015-03-19T17:53:42.472 INFO:tasks.workunit.client.0.plana55.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f2a2059f83b]
2015-03-19T17:53:42.472 INFO:tasks.workunit.client.0.plana55.stderr: 2: (Objecter::shutdown()+0xae8) [0x7f2a2051d728]
2015-03-19T17:53:42.473 INFO:tasks.workunit.client.0.plana55.stderr: 3: (librados::RadosClient::shutdown()+0x1c2) [0x7f2a204e90c2]
2015-03-19T17:53:42.473 INFO:tasks.workunit.client.0.plana55.stderr: 4: (librados::Rados::shutdown()+0x28) [0x7f2a204c2cc8]
2015-03-19T17:53:42.473 INFO:tasks.workunit.client.0.plana55.stderr: 5: (destroy_one_pool_pp(std::string const&, librados::Rados&)+0x22) [0x4a5f42]
2015-03-19T17:53:42.473 INFO:tasks.workunit.client.0.plana55.stderr: 6: (LibRadosAioEC_OmapPP_Test::TestBody()+0x93a) [0x42cb7a]
2015-03-19T17:53:42.474 INFO:tasks.workunit.client.0.plana55.stderr: 7: (testing::Test::Run()+0x8a) [0x49c77a]
2015-03-19T17:53:42.474 INFO:tasks.workunit.client.0.plana55.stderr: 8: (testing::internal::TestInfoImpl::Run()+0xd8) [0x49c858]
2015-03-19T17:53:42.474 INFO:tasks.workunit.client.0.plana55.stderr: 9: (testing::TestCase::Run()+0x95) [0x49c8f5]
2015-03-19T17:53:42.475 INFO:tasks.workunit.client.0.plana55.stderr: 10: (testing::internal::UnitTestImpl::RunAllTests()+0x247) [0x49f9b7]
2015-03-19T17:53:42.475 INFO:tasks.workunit.client.0.plana55.stderr: 11: (main()+0x35) [0x42c045]
2015-03-19T17:53:42.475 INFO:tasks.workunit.client.0.plana55.stderr: 12: (__libc_start_main()+0xf5) [0x7f2a1f814ec5]
2015-03-19T17:53:42.475 INFO:tasks.workunit.client.0.plana55.stderr: 13: ceph_test_rados_api_aio() [0x42c0d7]
2015-03-19T17:53:42.476 INFO:tasks.workunit.client.0.plana55.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #1

Updated by Samuel Just about 9 years ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by Zhiqiang Wang about 9 years ago

I think the problem is caused by a locking issue in the following scenario:
Objecter::timer erases the tick_event from its events and schedule queues, and then calls Objecter::tick(). At the same time, Objecter::shutdown() is running and holds the rwlock, so Objecter::tick() blocks waiting for the rwlock. Inside Objecter::shutdown(), the code checks tick_event and tries to cancel it. However, since the event has already been removed from the events queue, cancel_event returns false, so tick_event is not set to NULL. This leads to the assert failure shown above.
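
To make the interleaving easier to follow, here is a minimal single-threaded sketch that replays it step by step. It is not the Ceph code: ToyTimer, dequeue_for_dispatch and shutdown_clears_tick_event are hypothetical names made up for illustration, and the only behaviour borrowed from the description above is that cancel_event returns false once the event has left the queue.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a timer's pending-event queue (not Ceph's timer class).
struct ToyTimer {
  std::set<int> pending;

  void add_event(int id) {
    assert(id != 0);  // 0 is reserved to mean "no event" in this sketch
    pending.insert(id);
  }

  // Timer-thread step: the event leaves the queue just before its callback runs.
  void dequeue_for_dispatch(int id) { pending.erase(id); }

  // Returns true only if the event was still queued and has now been removed.
  bool cancel_event(int id) { return pending.erase(id) > 0; }
};

// Replays the scenario in order and reports whether the shutdown path
// managed to reset tick_event. timer_fired_first selects the racy ordering.
bool shutdown_clears_tick_event(bool timer_fired_first) {
  const int kNoEvent = 0;
  ToyTimer timer;
  int tick_event = 1;
  timer.add_event(tick_event);

  if (timer_fired_first) {
    // The timer has dequeued the event; its callback would now be blocked
    // on the lock held by the shutdown path, so it has not run yet.
    timer.dequeue_for_dispatch(tick_event);
  }

  // Shutdown path: only clear tick_event when the cancel succeeds.
  if (tick_event != kNoEvent && timer.cancel_event(tick_event)) {
    tick_event = kNoEvent;
  }
  return tick_event == kNoEvent;
}
```

With timer_fired_first set to false the cancel succeeds and tick_event is cleared; with it set to true the cancel fails and tick_event stays set, which is the state the assert in Objecter::shutdown() rejects.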

Actions #3

Updated by Zhiqiang Wang about 9 years ago

After digging more into this issue, it looks like this is a giant-specific issue that has already been fixed in the current master by removing the following code in Objecter::tick():

  if (!initialized.read())
    return;

We need to find out which patch this is and backport it.

Actions #4

Updated by Kefu Chai about 9 years ago

The fix was introduced in 8253ead1748fc429bf48f5334ee4460ee865d941.

Actions #6

Updated by Loïc Dachary about 9 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Loïc Dachary
  • Affected Versions v0.93 - Last Hammer Sprint added
Actions #7

Updated by Loïc Dachary about 9 years ago

This is a giant-specific regression introduced by an incorrect conflict resolution when backporting d790833cb84d6f6349146e4f9abdcdffb4db2ee0.

Actions #8

Updated by Loïc Dachary about 9 years ago

  • Status changed from Fix Under Review to In Progress
Actions #9

Updated by Loïc Dachary about 9 years ago

  • Status changed from In Progress to Resolved
Actions #10

Updated by Yuri Weinstein about 9 years ago

  • Status changed from Resolved to New
  • Source changed from other to Q/A
  • ceph-qa-suite upgrade/giant-x added

Run: http://pulpito.ceph.com/teuthology-2015-03-25_17:05:01-upgrade:giant-x-hammer-distro-basic-multi/
Job: ['821147']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-25_17:05:01-upgrade:giant-x-hammer-distro-basic-multi/821147/

2015-03-25T21:08:35.410 INFO:tasks.rados.rados.1.plana16.stdout:2432: done (1 left)
2015-03-25T21:08:35.410 INFO:tasks.rados.rados.1.plana16.stdout:2433: read oid 15 snap -1
2015-03-25T21:08:35.410 INFO:tasks.rados.rados.1.plana16.stdout:2431:  expect (ObjNum 857 snap 298 seq_num 857)
2015-03-25T21:08:35.814 INFO:tasks.workunit.client.1.plana16.stderr:osdc/Objecter.cc: In function 'void Objecter::shutdown()' thread 7f39e843c840 time 2015-03-25 21:08:35.807226
2015-03-25T21:08:35.814 INFO:tasks.workunit.client.1.plana16.stderr:osdc/Objecter.cc: 405: FAILED assert(tick_event == __null)
2015-03-25T21:08:35.829 INFO:tasks.workunit.client.1.plana16.stderr: ceph version 0.87.1-99-g2ccbc14 (2ccbc14d17b54ea4fd4126cb04a7b83cd64c7f1e)
2015-03-25T21:08:35.829 INFO:tasks.workunit.client.1.plana16.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f39e5dcf83b]
2015-03-25T21:08:35.829 INFO:tasks.workunit.client.1.plana16.stderr: 2: (Objecter::shutdown()+0xae8) [0x7f39e5d4d728]
2015-03-25T21:08:35.830 INFO:tasks.workunit.client.1.plana16.stderr: 3: (librados::RadosClient::shutdown()+0x1c2) [0x7f39e5d190c2]
2015-03-25T21:08:35.830 INFO:tasks.workunit.client.1.plana16.stderr: 4: (librados::Rados::shutdown()+0x28) [0x7f39e5cf2cc8]
2015-03-25T21:08:35.830 INFO:tasks.workunit.client.1.plana16.stderr: 5: (destroy_one_ec_pool_pp(std::string const&, librados::Rados&)+0xdd) [0x46eb8d]
2015-03-25T21:08:35.831 INFO:tasks.workunit.client.1.plana16.stderr: 6: (RadosTestECPP::TearDownTestCase()+0x30) [0x46c8f0]
2015-03-25T21:08:35.831 INFO:tasks.workunit.client.1.plana16.stderr: 7: (testing::TestCase::Run()+0xc5) [0x45b165]
2015-03-25T21:08:35.831 INFO:tasks.workunit.client.1.plana16.stderr: 8: (testing::internal::UnitTestImpl::RunAllTests()+0x247) [0x45e1f7]
2015-03-25T21:08:35.831 INFO:tasks.workunit.client.1.plana16.stderr: 9: (main()+0x35) [0x4250d5]
2015-03-25T21:08:35.832 INFO:tasks.workunit.client.1.plana16.stderr: 10: (__libc_start_main()+0xf5) [0x7f39e5044ec5]
2015-03-25T21:08:35.832 INFO:tasks.workunit.client.1.plana16.stderr: 11: ceph_test_rados_api_io() [0x4252d7]
2015-03-25T21:08:35.832 INFO:tasks.workunit.client.1.plana16.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #11

Updated by Yuri Weinstein about 9 years ago

  • Subject changed from osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant) to osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant,hammer)
Actions #12

Updated by Yuri Weinstein about 9 years ago

  • Subject changed from osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant,hammer) to osdc/Objecter.cc: 405: FAILED assert(tick_event == __null) (giant)
  • ceph-qa-suite rados added
  • ceph-qa-suite deleted (upgrade/giant-x)
Actions #13

Updated by Loïc Dachary about 9 years ago

  • ceph-qa-suite upgrade/giant-x added
  • ceph-qa-suite deleted (rados)
<loicd> yuriw: it was resolved after the run started if I'm not mistaken https://github.com/ceph/ceph/pull/4175 
<loicd> and http://tracker.ceph.com/issues/11183#note-9
<loicd> yuriw: about 13 hours ago today
<loicd> it would be embarrassing if it shows again tomorrow...
<yuriw> loicd - OK we will see
Actions #14

Updated by Yuri Weinstein about 9 years ago

Loic - the suite passed, see #11189

Actions #15

Updated by Loïc Dachary about 9 years ago

  • Status changed from New to Resolved

Thanks for the update, such a relief :-)
