Bug #51942

closed

src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>())

Added by Neha Ojha over 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 100%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-07-27T19:37:03.699 INFO:tasks.ceph.osd.4.smithi102.stderr:/build/ceph-16.2.5-121-gfa0293ed/src/osd/scrub_machine.cc: In function 'void Scrub::ScrubMachine::assert_not_active() const' thread 7f1a8adb6700 time 2021-07-27T19:37:03.699793+0000
2021-07-27T19:37:03.699 INFO:tasks.ceph.osd.4.smithi102.stderr:/build/ceph-16.2.5-121-gfa0293ed/src/osd/scrub_machine.cc: 55: FAILED ceph_assert(state_cast<const NotActive*>())
2021-07-27T19:37:03.702 INFO:tasks.ceph.osd.6.smithi102.stderr:2021-07-27T19:37:03.700+0000 7f38de066700 -1 received  signal: Hangup from /usr/bin/python3 /usr/bin/daemon-helper kill ceph-osd -f --cluster ceph -i 6  (PID: 18788) UID: 0
2021-07-27T19:37:03.711 INFO:tasks.ceph.osd.4.smithi102.stderr: ceph version 16.2.5-121-gfa0293ed (fa0293ed9fd265d1033b992129d53bf08dbf7b22) pacific (stable)
2021-07-27T19:37:03.712 INFO:tasks.ceph.osd.4.smithi102.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55893cb52c7d]
2021-07-27T19:37:03.712 INFO:tasks.ceph.osd.4.smithi102.stderr: 2: ceph-osd(+0xac5e85) [0x55893cb52e85]
2021-07-27T19:37:03.712 INFO:tasks.ceph.osd.4.smithi102.stderr: 3: ceph-osd(+0xf735c6) [0x55893d0005c6]
2021-07-27T19:37:03.712 INFO:tasks.ceph.osd.4.smithi102.stderr: 4: (PgScrubber::reset_epoch(unsigned int)+0x4a) [0x55893cfe710a]
2021-07-27T19:37:03.713 INFO:tasks.ceph.osd.4.smithi102.stderr: 5: (PgScrubber::initiate_regular_scrub(unsigned int)+0x7e) [0x55893cfe72ce]
2021-07-27T19:37:03.713 INFO:tasks.ceph.osd.4.smithi102.stderr: 6: (PG::forward_scrub_event(void (ScrubPgIF::*)(unsigned int), unsigned int)+0x72) [0x55893cceeea2]
2021-07-27T19:37:03.713 INFO:tasks.ceph.osd.4.smithi102.stderr: 7: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x51) [0x55893ccef2d1]
2021-07-27T19:37:03.713 INFO:tasks.ceph.osd.4.smithi102.stderr: 8: (ceph::osd::scheduler::PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1e) [0x55893ced614e]
2021-07-27T19:37:03.713 INFO:tasks.ceph.osd.4.smithi102.stderr: 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x872) [0x55893cc50692]
2021-07-27T19:37:03.714 INFO:tasks.ceph.osd.4.smithi102.stderr: 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x403) [0x55893d372e73]
2021-07-27T19:37:03.714 INFO:tasks.ceph.osd.4.smithi102.stderr: 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55893d375c94]
2021-07-27T19:37:03.714 INFO:tasks.ceph.osd.4.smithi102.stderr: 12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f1aa6f74609]
2021-07-27T19:37:03.714 INFO:tasks.ceph.osd.4.smithi102.stderr: 13: clone()

/a/yuriw-2021-07-27_17:19:39-rados-wip-yuri-testing-2021-07-27-0830-pacific-distro-basic-smithi/6297040 - no logs


Related issues 2 (0 open, 2 closed)

Has duplicate RADOS - Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>()) (Duplicate)
Copied to RADOS - Backport #53339: pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>()) (Resolved, Ronen Friedman)

Actions #1

Updated by Neha Ojha over 2 years ago

  • Assignee set to Ronen Friedman

rados/thrash/{0-size-min-size-overrides/2-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overrides/{default} 3-scrub-overrides/{max-simultaneous-scrubs-3} backoff/peering_and_degraded ceph clusters/{fixed-2 openstack} crc-failures/bad_map_crc_failure d-balancer/crush-compat mon_election/classic msgr-failures/osd-dispatch-delay msgr/async objectstore/bluestore-comp-snappy rados supported-random-distro$/{ubuntu_latest} thrashers/mapgap thrashosds-health workloads/cache-agent-big}

Note that the test uses the max-simultaneous-scrubs-3 scrub override.

Actions #2

Updated by Neha Ojha over 2 years ago

https://pulpito.ceph.com/nojha-2021-08-03_18:59:59-rados-wip-yuri-testing-2021-07-27-0830-pacific-distro-basic-smithi/6309433/

2021-08-03T20:12:00.045+0000 7fd1593b8700 -1 /build/ceph-16.2.5-121-gfa0293ed/src/osd/scrub_machine.cc: In function 'void Scrub::ScrubMachine::assert_not_active() const' thread 7fd1593b8700 time 2021-08-03T20:12:00.041665+0000
/build/ceph-16.2.5-121-gfa0293ed/src/osd/scrub_machine.cc: 55: FAILED ceph_assert(state_cast<const NotActive*>())

 ceph version 16.2.5-121-gfa0293ed (fa0293ed9fd265d1033b992129d53bf08dbf7b22) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x56459bf77c7d]
 2: ceph-osd(+0xac5e85) [0x56459bf77e85]
 3: ceph-osd(+0xf735c6) [0x56459c4255c6]
 4: (PgScrubber::reset_epoch(unsigned int)+0x4a) [0x56459c40c10a]
 5: (PgScrubber::initiate_regular_scrub(unsigned int)+0x7e) [0x56459c40c2ce]
 6: (PG::forward_scrub_event(void (ScrubPgIF::*)(unsigned int), unsigned int)+0x72) [0x56459c113ea2]
 7: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x51) [0x56459c1142d1]
 8: (ceph::osd::scheduler::PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1e) [0x56459c2fb14e]
 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x872) [0x56459c075692]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x403) [0x56459c797e73]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x56459c79ac94]
 12: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7fd17ad81609]
 13: clone()

Logs temporarily copied to /home/nojha/6309433 in teuthology.

Actions #3

Updated by Sage Weil over 2 years ago

/a/sage-2021-10-28_02:19:01-rados-wip-sage3-testing-2021-10-27-1300-distro-basic-smithi/6464393

with osd logs

Actions #4

Updated by Sage Weil over 2 years ago

and /a/sage-2021-10-28_02:19:01-rados-wip-sage3-testing-2021-10-27-1300-distro-basic-smithi/6464056

with logs

Actions #5

Updated by Neha Ojha over 2 years ago

  • Priority changed from Normal to High

Ronen, let's prioritize this.

Actions #6

Updated by Neha Ojha over 2 years ago

  • Priority changed from High to Urgent

rados/thrash/{0-size-min-size-overrides/2-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overrides/{default} 3-scrub-overrides/{max-simultaneous-scrubs-3} backoff/peering_and_degraded ceph clusters/{fixed-2 openstack} crc-failures/bad_map_crc_failure d-balancer/crush-compat mon_election/classic msgr-failures/osd-dispatch-delay msgr/async objectstore/bluestore-comp-snappy rados supported-random-distro$/{centos_8} thrashers/mapgap thrashosds-health workloads/cache-agent-big}

/a/yuriw-2021-11-17_15:36:30-rados-wip-yuri6-testing-2021-11-16-0926-pacific-distro-basic-smithi/6509819

Actions #7

Updated by Neha Ojha over 2 years ago

  • Status changed from New to Pending Backport
  • Pull request ID set to 42780
Actions #8

Updated by Backport Bot over 2 years ago

  • Copied to Backport #53339: pacific: src/osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>()) added
Actions #9

Updated by Neha Ojha about 2 years ago

  • Has duplicate Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>()) added
Actions #10

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #11

Updated by Konstantin Shalygin over 1 year ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (backport_processed)