Bug #40451

closed

osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)

Added by Sage Weil almost 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

...
2019-06-19T18:19:46.445+0000 7f7ceaed7700 10 osd.2 pg_epoch: 532 pg[2.4( v 514'363 (0'0,514'363] local-lis/les=490/491 n=85 ec=19/19 lis/c 490/490 les/c/f 491/491/0 486/532/490) [2,1] r=0 lpr=532 pi=[490,532)/1 crt=514'363 lcod 494'361 mlcod 494'361 scrubbing mbc={}] clear_primary_state
2019-06-19T18:19:46.445+0000 7f7ceaed7700 10 osd.2 pg_epoch: 532 pg[2.4( v 514'363 (0'0,514'363] local-lis/les=490/491 n=85 ec=19/19 lis/c 490/490 les/c/f 491/491/0 486/532/490) [2,1] r=0 lpr=532 pi=[490,532)/1 crt=514'363 lcod 494'361 mlcod 0'0 scrubbing mbc={}] release_backoffs [2:20000000::::0,2:28000000::::head)
2019-06-19T18:19:46.445+0000 7f7ceaed7700 20 osd.2 pg_epoch: 532 pg[2.4( v 514'363 (0'0,514'363] local-lis/les=490/491 n=85 ec=19/19 lis/c 490/490 les/c/f 491/491/0 486/532/490) [2,1] r=0 lpr=532 pi=[490,532)/1 crt=514'363 lcod 494'361 mlcod 0'0 scrubbing mbc={}] agent_stop
2019-06-19T18:19:46.445+0000 7f7ceaed7700 10 osd.2 pg_epoch: 532 pg[2.4( v 514'363 (0'0,514'363] local-lis/les=490/491 n=85 ec=19/19 lis/c 490/490 les/c/f 491/491/0 486/532/490) [2,1] r=0 lpr=532 pi=[490,532)/1 crt=514'363 lcod 494'361 mlcod 0'0 scrubbing mbc={}] on_change
2019-06-19T18:19:46.445+0000 7f7ceaed7700 10 osd.2 pg_epoch: 532 pg[2.4( v 514'363 (0'0,514'363] local-lis/les=490/491 n=85 ec=19/19 lis/c 490/490 les/c/f 491/491/0 486/532/490) [2,1] r=0 lpr=532 pi=[490,532)/1 crt=514'363 lcod 494'361 mlcod 0'0 scrubbing mbc={}] cancel_copy_ops
2019-06-19T18:19:46.445+0000 7f7ceaed7700 10 osd.2 pg_epoch: 532 pg[2.4( v 514'363 (0'0,514'363] local-lis/les=490/491 n=85 ec=19/19 lis/c 490/490 les/c/f 491/491/0 486/532/490) [2,1] r=0 lpr=532 pi=[490,532)/1 crt=514'363 lcod 494'361 mlcod 0'0 scrubbing mbc={}] cancel_copy 2:2a602f7f:::smithi08114715-44:head from 2:8e5a1ae1:::smithi08114715-28:head @2 v284
2019-06-19T18:19:46.445+0000 7f7ceaed7700 10 osd.2 pg_epoch: 532 pg[2.4( v 514'363 (0'0,514'363] local-lis/les=490/491 n=85 ec=19/19 lis/c 490/490 les/c/f 491/491/0 486/532/490) [2,1] r=0 lpr=532 pi=[490,532)/1 crt=514'363 lcod 494'361 mlcod 0'0 scrubbing mbc={}] kick_object_context_blocked 2:2a602f7f:::smithi08114715-44:head requeuing 1 requests
2019-06-19T18:19:46.445+0000 7f7ceaed7700 20 osd.2 pg_epoch: 532 pg[2.4( v 514'363 (0'0,514'363] local-lis/les=490/491 n=85 ec=19/19 lis/c 490/490 les/c/f 491/491/0 486/532/490) [2,1] r=0 lpr=532 pi=[490,532)/1 crt=514'363 lcod 494'361 mlcod 0'0 scrubbing mbc={}] requeue_op 0x5564d99238c0
2019-06-19T18:19:46.445+0000 7f7ceaed7700 20 osd.2 op_wq(4) _enqueue_front OpQueueItem(2.4 PGOpItem(op=osd_op(client.4292.0:5853 2.4 2:2a602f7f:::smithi08114715-44:head [stat] snapc 0=[] ondisk+read+rwordered+known_if_redirected e516) v8) prio 63 cost 0 e532)
2019-06-19T18:19:46.445+0000 7f7ceaed7700 10 osd.2 pg_epoch: 532 pg[2.4( v 514'363 (0'0,514'363] local-lis/les=490/491 n=85 ec=19/19 lis/c 490/490 les/c/f 491/491/0 486/532/490) [2,1] r=0 lpr=532 pi=[490,532)/1 crt=514'363 lcod 494'361 mlcod 0'0 scrubbing mbc={}] requeue_scrub: queueing
2019-06-19T18:19:46.445+0000 7f7ceaed7700 20 osd.2 op_wq(4) _enqueue OpQueueItem(2.4 PGScrub(pgid=2.4epoch_queued=532) prio 5 cost 52428800 e532)
...
  -203> 2019-06-19T18:19:46.487+0000 7f7ce6ecf700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/15.0.0-1795-g2851aac/rpm/el7/BUILD/ceph-15.0.0-1795-g2851aac/src/osd/PG.cc: In function 'void PG::scrub(epoch_t, ThreadPool::TPHandle&)' thread 7f7ce6ecf700 time 2019-06-19T18:19:46.483066+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/15.0.0-1795-g2851aac/rpm/el7/BUILD/ceph-15.0.0-1795-g2851aac/src/osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued)

 ceph version 15.0.0-1795-g2851aac (2851aac1b33ac18f9f5295c6e80bb395d621676e) octopus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x5564bc54e78f]
 2: (()+0x4c2957) [0x5564bc54e957]
 3: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x4ef) [0x5564bc792b9f]
 4: (PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x12) [0x5564bc913c22]
 5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1508) [0x5564bc702b08]
 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x5564bcca9546]
 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5564bccab6a0]
 8: (()+0x7dd5) [0x7f7d0f2badd5]

/a/sage-2019-06-19_13:01:10-rados:thrash-wip-sloppy-snaps-distro-basic-smithi/4048437


Related issues: 1 (0 open, 1 closed)

Copied to RADOS - Backport #40537: nautilus: osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued) (Resolved, Prashant D)

#1

Updated by Sage Weil almost 5 years ago

It looks to me like this happened as a side-effect of unblocking the op.

  if (obc->requeue_scrub_on_unblock) {
    obc->requeue_scrub_on_unblock = false;
    requeue_scrub();
  }

A simple fix should be to make requeue_scrub() a no-op if the PG is not active?

#2

Updated by Sage Weil almost 5 years ago

  • Status changed from 12 to Fix Under Review
  • Backport set to nautilus

#3

Updated by Kefu Chai almost 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Pull request ID set to 28660

#4

Updated by Nathan Cutler almost 5 years ago

  • Copied to Backport #40537: nautilus: osd/PG.cc: 2410: FAILED ceph_assert(scrub_queued) added

#5

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved