Project

General

Profile

Bug #38344

src/osd/PG.h: 1245: FAILED ceph_assert(prev >= 0)

Added by David Zafman 6 months ago. Updated 6 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
02/15/2019
Due date:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

http://pulpito.ceph.com/dzafman-2019-02-14_22:11:34-rados:thrash-master-distro-basic-smithi/3593928

2019-02-15 07:21:18.007 7f1611b42700 20 osd.1 424 share_map osd.6 v1:172.21.15.89:6805/11455 424
2019-02-15 07:21:18.007 7f1611b42700 20 osd.1 424 should_share_map osd.6 v1:172.21.15.89:6805/11455 424
2019-02-15 07:21:18.007 7f1611b42700 10 osd.1 pg_epoch: 424 pg[4.13( v 424'13715 (421'13711,424'13715] lb MIN (bitwise) local-lis/les=423/424 n=0 ec=309/277 lis/c 423/415 les/c/f 424/416/0 422/423/415) [6,4,1]/[6,4,5] r=-1 lpr=423 pi=[309,423)/2 luod=0'0 lua=421'13712 crt=424'13715 lcod 424'13714 active+remapped mbc={}] _handle_message: 0x557c06385c80
2019-02-15 07:21:18.007 7f1611b42700 10 osd.1 pg_epoch: 424 pg[4.13( v 424'13715 (421'13711,424'13715] lb MIN (bitwise) local-lis/les=423/424 n=0 ec=309/277 lis/c 423/415 les/c/f 424/416/0 422/423/415) [6,4,1]/[6,4,5] r=-1 lpr=423 pi=[309,423)/2 luod=0'0 lua=421'13712 crt=424'13715 lcod 424'13714 active+remapped mbc={}] handle_push ObjectRecoveryInfo(4:c829da80:::benchmark_data_smithi002_34250_object155360:head@328'8830, size: 65536, copy_subset: [0~65536], clone_subset: {}, snapset: 0=[]:{})ObjectRecoveryProgress(!first, data_recovered_to:65536, data_complete:true, omap_recovered_to:, omap_complete:true, error:false)
2019-02-15 07:21:18.007 7f1611b42700 10 bluestore(/var/lib/ceph/osd/ceph-1) stat 4.13_head #4:c829da80:::benchmark_data_smithi002_34250_object155360:head#
2019-02-15 07:21:18.059 7f1611b42700 -1 /build/ceph-14.0.1-3734-gb1b7174/src/osd/PG.h: In function 'void PG::add_local_num_bytes(int64_t)' thread 7f1611b42700 time 2019-02-15 07:21:18.011974
/build/ceph-14.0.1-3734-gb1b7174/src/osd/PG.h: 1245: FAILED ceph_assert(prev >= 0)

 ceph version 14.0.1-3734-gb1b7174 (b1b7174cd698cf5e04319837183cdeb537127844) nautilus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x557bd506ed60]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x557bd506ef3b]
 3: (boost::statechart::event<PG::Backfilled, std::allocator<void> >::clone() const+0) [0x557bd537b340]
 4: (ReplicatedBackend::submit_push_data(ObjectRecoveryInfo const&, bool, bool, bool, interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&, ceph::buffer::list, ceph::buffer::list, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::list> > > const&, ObjectStore::Transaction*)+0x5d7) [0x557bd54b7fe7]
 5: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x193) [0x557bd54b8c93]
 6: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x153) [0x557bd54b9033]
 7: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2a8) [0x557bd54c0668]
 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x557bd53c5c37]
 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0x557bd5374d25]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b3) [0x557bd51a08a3]
 11: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x557bd544b032]
 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbf5) [0x557bd51be585]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x557bd57d9e2c]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x557bd57dcf00]
 15: (()+0x76db) [0x7f163928e6db]
 16: (clone()+0x3f) [0x7f163802988f]

History

#1 Updated by David Zafman 6 months ago

Just prior to the crash pg 4.13 ended up with a local_num_bytes of -42624KiB.

2019-02-15 07:21:17.899 7f1615b4a700 10 osd.1 pg_epoch: 424 pg[4.13( v 421'13712 (421'13711,421'13712] lb MIN (bitwise) local-lis/les=423/424 n=0 ec=309/277 lis/c 423/415 les/c/f 424/416/0 422/423/415) [6,4,1]/[6,4,5] r=-1 lpr=423 pi=[309,423)/2 luod=0'0 crt=421'13712 lcod 392'11600 active+remapped mbc={}] state<Started/ReplicaActive/RepNotRecovering>: react primary_num_bytes 4096KiB local -42624KiB pending_adjustments 46720KiB

#2 Updated by Neha Ojha 6 months ago

I have seen a similar failure here: /a/nojha-2019-02-13_14:45:41-rados:thrash-erasure-code-wip-test-revert-2-distro-basic-smithi/3584395/

2019-02-13T16:56:22.708 INFO:tasks.ceph.osd.4.smithi163.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3543-g4c06aeb/rpm/el7/BUILD/ceph-14.0.1-3543-g4c06aeb/src/osd/PG.h: In function 'void PG::add_local_num_bytes(int64_t)' thread 7f9fb6ac9700 time 2019-02-13 16:56:22.720055
2019-02-13T16:56:22.708 INFO:tasks.ceph.osd.4.smithi163.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3543-g4c06aeb/rpm/el7/BUILD/ceph-14.0.1-3543-g4c06aeb/src/osd/PG.h: 1245: FAILED ceph_assert(prev >= 0)
2019-02-13T16:56:22.708 INFO:tasks.ceph.osd.0.smithi061.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3543-g4c06aeb/rpm/el7/BUILD/ceph-14.0.1-3543-g4c06aeb/src/osd/PG.h: In function 'void PG::add_local_num_bytes(int64_t)' thread 7f173f6a7700 time 2019-02-13 16:56:22.716059
2019-02-13T16:56:22.708 INFO:tasks.ceph.osd.0.smithi061.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3543-g4c06aeb/rpm/el7/BUILD/ceph-14.0.1-3543-g4c06aeb/src/osd/PG.h: 1245: FAILED ceph_assert(prev >= 0)
2019-02-13T16:56:22.714 INFO:tasks.ceph.osd.0.smithi061.stderr: ceph version 14.0.1-3543-g4c06aeb (4c06aeb3ee72da65c2ce5f941660a77d659ff5af) nautilus (dev)
2019-02-13T16:56:22.714 INFO:tasks.ceph.osd.0.smithi061.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x557b38faa3e6]
2019-02-13T16:56:22.714 INFO:tasks.ceph.osd.0.smithi061.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x557b38faa5b4]
2019-02-13T16:56:22.714 INFO:tasks.ceph.osd.0.smithi061.stderr: 3: (ReplicatedBackend::get_is_readable_predicate() const+0) [0x557b392b0090]
2019-02-13T16:56:22.714 INFO:tasks.ceph.osd.0.smithi061.stderr: 4: (ECBackend::handle_recovery_push(PushOp const&, RecoveryMessages*)+0xa18) [0x557b39400a58]
2019-02-13T16:56:22.714 INFO:tasks.ceph.osd.0.smithi061.stderr: 5: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x116) [0x557b3941a936]
2019-02-13T16:56:22.714 INFO:tasks.ceph.osd.0.smithi061.stderr: 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x87) [0x557b392f7727]
2019-02-13T16:56:22.715 INFO:tasks.ceph.osd.0.smithi061.stderr: 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x695) [0x557b392a4a25]
2019-02-13T16:56:22.715 INFO:tasks.ceph.osd.0.smithi061.stderr: 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1a9) [0x557b390e5eb9]
2019-02-13T16:56:22.715 INFO:tasks.ceph.osd.0.smithi061.stderr: 9: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x557b3937adb2]
2019-02-13T16:56:22.715 INFO:tasks.ceph.osd.0.smithi061.stderr: 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xa0c) [0x557b39102d4c]
2019-02-13T16:56:22.715 INFO:tasks.ceph.osd.0.smithi061.stderr: 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x557b396f4173]
2019-02-13T16:56:22.715 INFO:tasks.ceph.osd.0.smithi061.stderr: 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x557b396f7210]
2019-02-13T16:56:22.715 INFO:tasks.ceph.osd.0.smithi061.stderr: 13: (()+0x7e25) [0x7f1764ed6e25]
2019-02-13T16:56:22.716 INFO:tasks.ceph.osd.0.smithi061.stderr: 14: (clone()+0x6d) [0x7f1763d9fbad]
2019-02-13T16:56:22.716 INFO:tasks.ceph.osd.0.smithi061.stderr:*** Caught signal (Aborted) **

#3 Updated by David Zafman 6 months ago

  • Pull request ID set to 26465

#4 Updated by David Zafman 6 months ago

  • Status changed from Verified to Need Review

#5 Updated by David Zafman 6 months ago

  • Status changed from Need Review to Resolved

Also available in: Atom PDF