Bug #48712

ceph_assert(is_primary()) in PG::scrub()

Added by Kefu Chai over 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Normal
Category:
Scrub/Repair
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
12/27/2020
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-12-23T12:18:56.924+0000 7f331cda1700 -1 /build/ceph-16.0.0-8560-g2ec13aff/src/osd/PG.cc: In function 'void PG::scrub(epoch_t, ThreadPool::TPHandle&)' thread 7f331cda1700 time 2020-12-23T12:18:56.925527+0000
/build/ceph-16.0.0-8560-g2ec13aff/src/osd/PG.cc: 2071: FAILED ceph_assert(is_primary())

 ceph version 16.0.0-8560-g2ec13aff (2ec13aff05dfff9dc716b3dd3fa03bbff2cb80a7) pacific (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x55da93d9a19b]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55da93d9a376]
 3: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x195) [0x55da93ed3ff5]
 4: (ceph::osd::scheduler::PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1a) [0x55da9408186a]
 5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xcd5) [0x55da93e3f5f5]
 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0x55da944b0f3c]
 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55da944b41f0]
 8: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f334a7e76db]
 9: clone()

/a/kchai-2020-12-23_05:37:18-rados-wip-kefu-testing-2020-12-23-1139-distro-basic-smithi/5732801

see remote/*/log/ceph-osd.7*


Related issues

Duplicated by RADOS - Bug #48775: FAILED ceph_assert(is_primary()) in PG::scrub() Duplicate
Copied to RADOS - Backport #49691: pacific: ceph_assert(is_primary()) in PG::scrub() Rejected

History

#1 Updated by Kefu Chai over 3 years ago

Hi Ronen, do you mind taking a look?

#2 Updated by Ronen Friedman over 3 years ago

  • Category set to Scrub/Repair
  • Status changed from New to In Progress
  • Reviewed set to 12/27/2020

Caused when a PGScrub message is queued by a primary but dequeued only after an interval change.

Specifically, in this run, the scrub is initiated at 12:18:56.820:

2020-12-23T12:18:56.820+0000 7f333dadf700 20 osd.7 op_wq(0) _enqueue OpSchedulerItem(1.1f0 PGScrub(pgid=1.1f0epoch_queued=53) prio 5 cost 52428800 e53)

but osd.7 is no longer the primary when the message is dequeued:

2020-12-23T12:18:56.828+0000 7f331cda1700 20 osd.7 pg_epoch: 54 pg[1.1f0( empty local-lis/les=50/51 n=0 ec=50/16 lis/c=50/50 les/c/f=51/51/0 sis=50) [7,0] r=0 lpr=50 crt=0'0 mlcod 0'0 active+clean+scrubbing [ 1.1f0: ] ] new interval newup [0,5] newacting [0,5]
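
The race described above can be illustrated with a small, self-contained sketch. This is not the change made in the pull request and not Ceph's actual PG class; MiniPG, same_interval_since, start_new_interval() and scrub_is_stale() are hypothetical names used only for illustration. It assumes a dequeued scrub can be recognized as stale by comparing the epoch at which it was queued with the epoch at which the current interval began, and drops it before the primary-only assertion is reached.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;  // simplified stand-in for the type in the backtrace

// Hypothetical, stripped-down stand-in for a placement group.
struct MiniPG {
  bool primary = true;
  epoch_t same_interval_since = 0;  // epoch at which the current interval began

  bool is_primary() const { return primary; }

  // Simulate a map change that starts a new interval.
  void start_new_interval(epoch_t e, bool still_primary) {
    same_interval_since = e;
    primary = still_primary;
  }

  // A queued scrub is stale if it was queued before the current interval began.
  bool scrub_is_stale(epoch_t epoch_queued) const {
    return epoch_queued < same_interval_since;
  }

  // Returns true if the scrub proceeds, false if the request is dropped.
  bool scrub(epoch_t epoch_queued) {
    if (scrub_is_stale(epoch_queued)) {
      return false;  // queued in an earlier interval: ignore instead of asserting
    }
    assert(is_primary());  // only reached for requests from the current interval
    return true;
  }
};
```

With the values from this run (interval since epoch 50, scrub queued at epoch 53, new interval at epoch 54), the request proceeds if dequeued before the interval change and is dropped if dequeued after it, so the assertion is never reached on a non-primary.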

#3 Updated by Ronen Friedman about 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 38730

#4 Updated by Neha Ojha about 3 years ago

  • Duplicated by Bug #48775: FAILED ceph_assert(is_primary()) in PG::scrub() added

#5 Updated by David Zafman about 3 years ago

  • Status changed from Fix Under Review to Resolved

#6 Updated by David Zafman about 3 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to pacific

#7 Updated by Backport Bot about 3 years ago

  • Copied to Backport #49691: pacific: ceph_assert(is_primary()) in PG::scrub() added

#8 Updated by David Zafman about 3 years ago

  • Status changed from Pending Backport to Resolved
  • Backport deleted (pacific)
