Bug #16306

closed

"osd/ReplicatedPG.cc: 10514: FAILED assert(obc)" in upgrade:hammer-x-infernalis-distro-basic-openstack

Added by Yuri Weinstein almost 8 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
upgrade/hammer-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-06-14_17:20:02-upgrade:hammer-x-infernalis-distro-basic-openstack/
Job: 24044
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-06-14_17:20:02-upgrade:hammer-x-infernalis-distro-basic-openstack/24044/teuthology.log

2016-06-14T19:42:06.387 INFO:tasks.workunit.client.1.target089160.stdout: got ls 1465933317,1465933319,1465933320,1465933322,1465933323,1465933325,1465933326,0
2016-06-14T19:42:07.399 INFO:tasks.workunit.client.1.target089160.stdout: got ls 1465933317,1465933319,1465933320,1465933322,1465933323,1465933325,1465933326,0
2016-06-14T19:42:08.404 INFO:tasks.ceph.osd.5.target089162.stderr:osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 7fbd69236700 time 2016-06-14 19:42:08.424829
2016-06-14T19:42:08.405 INFO:tasks.ceph.osd.5.target089162.stderr:osd/ReplicatedPG.cc: 10514: FAILED assert(obc)
2016-06-14T19:42:08.408 INFO:tasks.ceph.osd.5.target089162.stderr: ceph version 0.94.7-24-g054a90e (054a90edb2812f78426b8cb1dac2e768b2e7fc51)
2016-06-14T19:42:08.409 INFO:tasks.ceph.osd.5.target089162.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb214b]
2016-06-14T19:42:08.409 INFO:tasks.ceph.osd.5.target089162.stderr: 2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x73f) [0x86c41f]
2016-06-14T19:42:08.409 INFO:tasks.ceph.osd.5.target089162.stderr: 3: (ReplicatedPG::hit_set_persist()+0xee7) [0x86d427]
2016-06-14T19:42:08.410 INFO:tasks.ceph.osd.5.target089162.stderr: 4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0xe2e) [0x88eabe]
2016-06-14T19:42:08.410 INFO:tasks.ceph.osd.5.target089162.stderr: 5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x66a) [0x82b67a]
2016-06-14T19:42:08.410 INFO:tasks.ceph.osd.5.target089162.stderr: 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3d5) [0x697ea5]
2016-06-14T19:42:08.410 INFO:tasks.ceph.osd.5.target089162.stderr: 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x338) [0x6983f8]
2016-06-14T19:42:08.411 INFO:tasks.ceph.osd.5.target089162.stderr: 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0xba1c9f]
2016-06-14T19:42:08.411 INFO:tasks.ceph.osd.5.target089162.stderr: 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xba3dc0]
2016-06-14T19:42:08.411 INFO:tasks.ceph.osd.5.target089162.stderr: 10: (()+0x8184) [0x7fbd861ab184]
2016-06-14T19:42:08.411 INFO:tasks.ceph.osd.5.target089162.stderr: 11: (clone()+0x6d) [0x7fbd8471637d]
2016-06-14T19:42:08.412 INFO:tasks.ceph.osd.5.target089162.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #1

Updated by Yuri Weinstein almost 8 years ago

Run: http://pulpito.ceph.com/teuthology-2016-06-16_18:05:02-upgrade:hammer-x-infernalis-distro-basic-vps/
Job: 264591
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-06-16_18:05:02-upgrade:hammer-x-infernalis-distro-basic-vps/264591/teuthology.log

2016-06-16T19:52:20.358 INFO:tasks.workunit.client.1.vpm157.stdout: got ls 1466131931,1466131933,1466131934,1466131936,1466131937,1466131939,1466131940,0
2016-06-16T19:52:21.364 INFO:tasks.workunit.client.1.vpm157.stdout: got ls 1466131931,1466131933,1466131934,1466131936,1466131937,1466131939,1466131940,0
2016-06-16T19:52:22.381 INFO:tasks.ceph.osd.5.vpm105.stderr:osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 7fcabd621700 time 2016-06-17 02:52:22.365989
2016-06-16T19:52:22.381 INFO:tasks.ceph.osd.5.vpm105.stderr:osd/ReplicatedPG.cc: 10514: FAILED assert(obc)
2016-06-16T19:52:22.389 INFO:tasks.ceph.osd.5.vpm105.stderr: ceph version 0.94.7-26-g2e156d7 (2e156d7ad4b9f4ffd6028df3a460b50b30c8b0d3)
2016-06-16T19:52:22.390 INFO:tasks.ceph.osd.5.vpm105.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb214b]
2016-06-16T19:52:22.390 INFO:tasks.ceph.osd.5.vpm105.stderr: 2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x73f) [0x86c41f]
2016-06-16T19:52:22.390 INFO:tasks.ceph.osd.5.vpm105.stderr: 3: (ReplicatedPG::hit_set_persist()+0xee7) [0x86d427]
2016-06-16T19:52:22.391 INFO:tasks.ceph.osd.5.vpm105.stderr: 4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0xe2e) [0x88eabe]
2016-06-16T19:52:22.391 INFO:tasks.ceph.osd.5.vpm105.stderr: 5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x66a) [0x82b67a]
2016-06-16T19:52:22.391 INFO:tasks.ceph.osd.5.vpm105.stderr: 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3d5) [0x697ea5]
2016-06-16T19:52:22.391 INFO:tasks.ceph.osd.5.vpm105.stderr: 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x338) [0x6983f8]
2016-06-16T19:52:22.392 INFO:tasks.ceph.osd.5.vpm105.stderr: 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0xba1c9f]
2016-06-16T19:52:22.392 INFO:tasks.ceph.osd.5.vpm105.stderr: 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xba3dc0]
2016-06-16T19:52:22.392 INFO:tasks.ceph.osd.5.vpm105.stderr: 10: (()+0x8184) [0x7fcada4c7184]
2016-06-16T19:52:22.392 INFO:tasks.ceph.osd.5.vpm105.stderr: 11: (clone()+0x6d) [0x7fcad8a3237d]
2016-06-16T19:52:22.393 INFO:tasks.ceph.osd.5.vpm105.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #3

Updated by Yuri Weinstein almost 8 years ago

  • Priority changed from Normal to Urgent

looks reproducible
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-06-28_17:20:02-upgrade:hammer-x-infernalis-distro-basic-openstack/
Job: 29803
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-06-28_17:20:02-upgrade:hammer-x-infernalis-distro-basic-openstack/29803/teuthology.log

2016-06-28T18:53:02.267 INFO:tasks.ceph.osd.4.target093166.stderr:osd/ReplicatedPG.cc: 10514: FAILED assert(obc)
2016-06-28T18:53:02.267 INFO:tasks.ceph.osd.4.target093166.stderr: ceph version 0.94.7-28-gdac65d0 (dac65d048919f701877de96d3271131853e532ed)
2016-06-28T18:53:02.267 INFO:tasks.ceph.osd.4.target093166.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbb214b]
2016-06-28T18:53:02.268 INFO:tasks.ceph.osd.4.target093166.stderr: 2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x73f) [0x86c41f]
2016-06-28T18:53:02.268 INFO:tasks.ceph.osd.4.target093166.stderr: 3: (ReplicatedPG::hit_set_persist()+0xee7) [0x86d427]
2016-06-28T18:53:02.268 INFO:tasks.ceph.osd.4.target093166.stderr: 4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0xe2e) [0x88eabe]
2016-06-28T18:53:02.268 INFO:tasks.ceph.osd.4.target093166.stderr: 5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x66a) [0x82b67a]
2016-06-28T18:53:02.269 INFO:tasks.ceph.osd.4.target093166.stderr: 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3d5) [0x697ea5]
2016-06-28T18:53:02.269 INFO:tasks.ceph.osd.4.target093166.stderr: 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x338) [0x6983f8]
2016-06-28T18:53:02.269 INFO:tasks.ceph.osd.4.target093166.stderr: 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0xba1c9f]
2016-06-28T18:53:02.269 INFO:tasks.ceph.osd.4.target093166.stderr: 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xba3dc0]
2016-06-28T18:53:02.270 INFO:tasks.ceph.osd.4.target093166.stderr: 10: (()+0x8184) [0x7f248bbf6184]
2016-06-28T18:53:02.270 INFO:tasks.ceph.osd.4.target093166.stderr: 11: (clone()+0x6d) [0x7f248a16137d]
2016-06-28T18:53:02.270 INFO:tasks.ceph.osd.4.target093166.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #4

Updated by Samuel Just almost 8 years ago

  • Assignee set to Kefu Chai
Actions #5

Updated by Kefu Chai almost 8 years ago

teuthology-suite --suite 'upgrade' --suite-branch infernalis --ceph infernalis --machine-type smithi --filter="hammer-x/point-to-point-x/{0-tz-eastern.yaml point-to-point.yaml ubuntu_14.04.yaml}" 

039240418060c9a49298dacc0478772334526dce introduced the fix for naming hitset objects using GMT into hammer. It was cherry-picked from 42f8c5daad16aa849a0b99871d50161673c0c370 in master.

$ git tag --contains 039240418060c9a49298dacc0478772334526dce
v0.94.7

0.94.7 has this fix. For backward compatibility, a boolean "use_gmt_hitset" was added as a property of the pg pool in 0.94.7. Pools created by a pre-0.94.7 cluster do not have this property, and after the upgrade it is "false" for these pools, as we won't assume that the user was using GMT as the local time.
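A minimal sketch of the defaulting behaviour described above, in Python rather than Ceph's C++. `PoolProps` and `load_pool` are hypothetical names, not Ceph APIs, and the second pool carrying the flag set to true is an assumption made only for contrast.

```python
from dataclasses import dataclass


@dataclass
class PoolProps:
    """Hypothetical stand-in for a pg pool's properties (not a Ceph type)."""
    name: str
    use_gmt_hitset: bool


def load_pool(raw: dict) -> PoolProps:
    # A pool written by a pre-0.94.7 cluster has no use_gmt_hitset field,
    # so the missing field is read as False instead of assuming GMT.
    return PoolProps(name=raw["name"],
                     use_gmt_hitset=raw.get("use_gmt_hitset", False))


legacy = load_pool({"name": "cache-legacy"})
# Assumption for illustration: a pool that carries the flag explicitly.
flagged = load_pool({"name": "cache-new", "use_gmt_hitset": True})

print(legacy.use_gmt_hitset)   # False
print(flagged.use_gmt_hitset)  # True
```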

But an upgraded OSD cannot survive in a cluster with legacy pools if its peer OSD is in a different timezone, because use_gmt_hitset is still false. I think this behaviour is expected. We might want to disable this test for this case.
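The mismatch can be illustrated with a small Python sketch. The name format, the `hit_set_name` helper, and the fixed UTC offsets are all assumptions for illustration; they are not Ceph's actual object naming. The point is only that two hosts formatting the same interval in their own local time get different names, while formatting in GMT gives the same name on both.

```python
from datetime import datetime, timedelta, timezone


def hit_set_name(start: int, end: int, local_tz: timezone, use_gmt: bool) -> str:
    """Hypothetical archive-object name built from a hit set's time interval."""
    zone = timezone.utc if use_gmt else local_tz
    fmt = "%Y-%m-%d %H:%M:%S"
    s = datetime.fromtimestamp(start, zone).strftime(fmt)
    e = datetime.fromtimestamp(end, zone).strftime(fmt)
    return f"hit_set_archive_{s}_{e}"


# Fixed offsets standing in for two hosts in different timezones (assumed).
EASTERN = timezone(timedelta(hours=-4))
PACIFIC = timezone(timedelta(hours=-7))

# Sample epoch seconds, used here only as example input.
start, end = 1465933317, 1465933319

legacy_a = hit_set_name(start, end, EASTERN, use_gmt=False)
legacy_b = hit_set_name(start, end, PACIFIC, use_gmt=False)
gmt_a = hit_set_name(start, end, EASTERN, use_gmt=True)
gmt_b = hit_set_name(start, end, PACIFIC, use_gmt=True)

print(legacy_a == legacy_b)  # False: each host derives a different name
print(gmt_a == gmt_b)        # True: GMT naming agrees across hosts
```

With local-time naming, a lookup by name on the peer would find no object, which is consistent with the failed assert(obc) in hit_set_trim seen in the logs above.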

will be on it.

Actions #7

Updated by Kefu Chai almost 8 years ago

  • Status changed from New to Fix Under Review
Actions #8

Updated by Kefu Chai almost 8 years ago

  • Status changed from Fix Under Review to Resolved