Bug #39581

closed

osd/PG.cc: 2523: FAILED ceph_assert(scrub_queued)

Added by David Zafman almost 5 years ago. Updated almost 5 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

dzafman-2019-05-02_19:43:04-rados:thrash-wip-zafman-testing-distro-basic-smithi/3919741

This appears to be PG 2.4, which is in the peering state, dequeuing a scrub and hitting an assert.

2019-05-03 05:04:03.258 7fcedc056700 10 osd.1 pg_epoch: 194 pg[2.4( v 176'374 lc 0'0 (144'329,176'374] local-lis/les=153/154 n=34 ec=19/19 lis/c 153/153 les/c/f 154/154/0 19/194/19) [1,0] r=0 lpr=194 pi=[153,194)/1 crt=176'374 lcod 176'373 mlcod 0'0 peering mbc={}] can_handle_while_inactive: 0x55868b3691e0
2019-05-03 05:04:03.258 7fcedc056700 10 osd.1 194 dequeue_op 0x55868b3691e0 finish
2019-05-03 05:04:03.258 7fcedc056700 20 osd.1 op_wq(4) _process 2.4 to_process <> waiting <> waiting_peering {}
2019-05-03 05:04:03.258 7fcedc056700 20 osd.1 op_wq(4) _process OpQueueItem(2.4 PGScrub(pgid=2.4epoch_queued=194) prio 5 cost 52428800 e194) queued
2019-05-03 05:04:03.258 7fcedc056700 20 osd.1 op_wq(4) _process 2.4 to_process <OpQueueItem(2.4 PGScrub(pgid=2.4epoch_queued=194) prio 5 cost 52428800 e194)> waiting <> waiting_peering {}
2019-05-03 05:04:03.258 7fcedc056700 20 osd.1 op_wq(4) _process OpQueueItem(2.4 PGScrub(pgid=2.4epoch_queued=194) prio 5 cost 52428800 e194) pg 0x558685625000
2019-05-03 05:04:03.261 7fcedc056700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/15.0.0-970-gcf09525/rpm/el7/BUILD/ceph-15.0.0-970-gcf09525/src/osd/PG.cc: In function 'void PG::scrub(epoch_t, ThreadPool::TPHandle&)' thread 7fcedc056700 time 2019-05-03 05:04:03.259417
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/15.0.0-970-gcf09525/rpm/el7/BUILD/ceph-15.0.0-970-gcf09525/src/osd/PG.cc: 2523: FAILED ceph_assert(scrub_queued)

 ceph version 15.0.0-970-gcf09525 (cf09525d8f32cd7baa310426d76132a34b0eafca) octopus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x5586780aa0d9]
 2: (()+0x4c32a1) [0x5586780aa2a1]
 3: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x4ef) [0x5586782ef9ff]
 4: (PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x12) [0x55867846ddd2]
 5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x14fa) [0x55867825dc0a]
 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x558678824356]
 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x558678826480]
 8: (()+0x7e25) [0x7fcf00635e25]
 9: (clone()+0x6d) [0x7fceff4febad]
Actions #1

Updated by Neha Ojha almost 5 years ago

  • Status changed from New to 12
  • Priority changed from Normal to High

/a/nojha-2019-05-07_17:20:56-rados-fix-pg-notify-distro-basic-smithi/3938003/

Actions #2

Updated by Neha Ojha almost 5 years ago

/a/nojha-2019-06-17_20:20:02-rados-wip-ec-below-min-size-2019-06-17-distro-basic-smithi/4043727

Actions #3

Updated by Neha Ojha almost 5 years ago

  • Status changed from 12 to Duplicate
Actions #4

Updated by David Zafman almost 5 years ago

This was probably caused by 40f71cda0ed4fe78dbcbc1ba73f0cc973bf5c415

osd/: move start_peering_interval and callees into PeeringState
