Bug #48680 (open): mds: scrubbing stuck "scrub active (0 inodes in the stack)"

Added by Patrick Donnelly over 3 years ago. Updated 10 days ago.

Status: New
Priority: High
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-12-18T02:16:25.467 INFO:teuthology.orchestra.run.smithi193.stdout:{
2020-12-18T02:16:25.467 INFO:teuthology.orchestra.run.smithi193.stdout:    "status": "scrub active (0 inodes in the stack)",
2020-12-18T02:16:25.468 INFO:teuthology.orchestra.run.smithi193.stdout:    "scrubs": {
2020-12-18T02:16:25.468 INFO:teuthology.orchestra.run.smithi193.stdout:        "e366f5ff-325e-460c-b4c7-e6095d429c92": {
2020-12-18T02:16:25.468 INFO:teuthology.orchestra.run.smithi193.stdout:            "path": "/",
2020-12-18T02:16:25.468 INFO:teuthology.orchestra.run.smithi193.stdout:            "tag": "e366f5ff-325e-460c-b4c7-e6095d429c92",
2020-12-18T02:16:25.469 INFO:teuthology.orchestra.run.smithi193.stdout:            "options": "recursive,force" 
2020-12-18T02:16:25.469 INFO:teuthology.orchestra.run.smithi193.stdout:        }
2020-12-18T02:16:25.469 INFO:teuthology.orchestra.run.smithi193.stdout:    }
2020-12-18T02:16:25.469 INFO:teuthology.orchestra.run.smithi193.stdout:}
2020-12-18T02:16:25.470 INFO:tasks.fwd_scrub.fs.[cephfs]:scrub status for tag:e366f5ff-325e-460c-b4c7-e6095d429c92 - {'path': '/', 'tag': 'e366f5ff-325e-460c-b4c7-e6095d429c92', 'options': 'recursive,force'}
2020-12-18T02:16:25.470 ERROR:tasks.fwd_scrub.fs.[cephfs]:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201217.205941/qa/tasks/fwd_scrub.py", line 32, in _run
    self.do_scrub()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201217.205941/qa/tasks/fwd_scrub.py", line 50, in do_scrub
    self.wait_until_scrub_complete(tag)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201217.205941/qa/tasks/fwd_scrub.py", line 55, in wait_until_scrub_complete
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (10) after waiting for 300 seconds
2020-12-18T02:16:25.470 ERROR:tasks.fwd_scrub.fs.[cephfs]:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201217.205941/qa/tasks/fwd_scrub.py", line 100, in _run
    self.do_scrub()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20201217.205941/qa/tasks/fwd_scrub.py", line 144, in do_scrub
    raise RuntimeError('error during scrub thrashing')
RuntimeError: error during scrub thrashing
2020-12-18T02:16:28.197 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.6 is failed for ~280s
2020-12-18T02:16:28.198 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.9 is failed for ~266s
2020-12-18T02:16:28.198 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.5 is failed for ~259s
2020-12-18T02:16:28.198 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.fs.[cephfs] failed
2020-12-18T02:16:28.198 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons

From: /ceph/teuthology-archive/pdonnell-2020-12-17_23:13:08-fs-wip-pdonnell-testing-20201217.205941-distro-basic-smithi/5715913/teuthology.log
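
For context, the fwd_scrub task starts a tagged recursive scrub at "/" and then polls the MDS scrub status until the tag disappears; the MaxWhileTries above means the tag was still listed after the retry budget ran out. A minimal sketch of that polling loop (illustrative only, not the actual qa/tasks/fwd_scrub.py code; the fs name, timeout, and interval are assumptions):

import json
import subprocess
import time

def wait_for_scrub_idle(fs_name="cephfs", timeout=300, interval=30):
    """Poll 'scrub status' on rank 0 until no scrubs remain or the timeout hits.

    Rough approximation of the wait loop in qa/tasks/fwd_scrub.py; the fs name,
    timeout, and interval here are illustrative, not the task's real values.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(
            ["ceph", "tell", f"mds.{fs_name}:0", "scrub", "status"])
        status = json.loads(out)
        # A finished scrub drops its tag from "scrubs"; in the failure above
        # the tag stayed listed with "scrub active (0 inodes in the stack)".
        if not status.get("scrubs"):
            return True
        time.sleep(interval)
    raise RuntimeError("scrub did not complete within %ds" % timeout)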


Related issues (2 open, 0 closed)

Related to CephFS - Bug #48773: qa: scrub does not complete (In Progress) - Kotresh Hiremath Ravishankar

Related to CephFS - Bug #62658: error during scrub thrashing: reached maximum tries (31) after waiting for 900 seconds (Pending Backport) - Milind Changire

Actions #1

Updated by Patrick Donnelly over 3 years ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport set to pacific,octopus,nautilus
Actions #2

Updated by Patrick Donnelly about 3 years ago

  • Related to Bug #48773: qa: scrub does not complete added
Actions #3

Updated by Milind Changire about 2 years ago

The scrape logs point to a crash in an OSD:

2020-12-18T15:32:48.646 INFO:scrape:Crash: Command failed on smithi193 with status 1: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2020-12-18T15:32:48.646 INFO:scrape:ceph version 16.0.0-8467-gd02b6b21 (d02b6b2187aba6b98f4df50520d865a75a745267) pacific (dev)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7fa2fc29cb20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x560c391a61ad]
 5: (KernelDevice::_aio_thread()+0x1254) [0x560c39ccf464]
 6: (KernelDevice::AioCompletionThread::entry()+0x11) [0x560c39cda761]
 7: /lib64/libpthread.so.0(+0x814a) [0x7fa2fc29214a]
 8: clone()
2020-12-18T15:32:48.647 INFO:scrape:1 jobs: ['5715913']
2020-12-18T15:32:48.647 INFO:scrape:suites: ['clusters/1a2s-mds-1c-client-3node', 'conf/{client', 'distro/{rhel_8}', 'fs/workload/{begin', 'mds', 'mon', 'mount/kclient/{mount', 'ms-die-on-skipped}}', 'objectstore-ec/bluestore-ec-root', 'omap_limit/10000', 'osd-asserts', 'osd}', 'overrides/{distro/stock/{k-stock', 'overrides/{frag_enable', 'rhel_8}', 'scrub/yes', 'session_timeout', 'tasks/workunit/suites/iogen}', 'whitelist_health', 'whitelist_wrongly_marked_down}']
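
Crashes like the KernelDevice::_aio_thread abort above are normally also captured by the mgr crash module, which offers another way to pull the report during triage. A hypothetical helper, assuming the module is enabled and that the crash metadata uses the usual "crash_id"/"entity_name" fields (these may vary between releases):

import json
import subprocess

def osd_crash_ids():
    """List crash IDs recorded for OSD daemons by the mgr 'crash' module.

    Hypothetical triage helper; the metadata field names ("crash_id",
    "entity_name") are assumptions and may differ between releases.
    """
    out = subprocess.check_output(["ceph", "crash", "ls", "--format", "json"])
    return [c["crash_id"] for c in json.loads(out)
            if c.get("entity_name", "").startswith("osd.")]

# Full report for one crash (including the backtrace): ceph crash info <crash_id>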

Actions #4

Updated by Venky Shankar about 2 years ago

  • Backport changed from pacific,octopus,nautilus to quincy, pacific
Actions #5

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)
Actions #6

Updated by Milind Changire 10 days ago

This might be due to directory fragmentation being enabled (the frag_enable override), as seen in the job description for the job mentioned in comment #4, and is probably fixed by https://github.com/ceph/ceph/pull/53636
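
The frag_enable override lowers the MDS directory-fragmentation thresholds so directories split aggressively during the workload. A quick way to inspect the relevant balancer options on a running cluster (the option names are the standard MDS settings; whether the QA override touches exactly these values is an assumption):

import subprocess

# Option names below are the standard MDS balancer/fragmentation settings;
# whether the QA frag_enable override tweaks exactly these (and to what
# values) is an assumption.
for opt in ("mds_bal_split_size", "mds_bal_merge_size",
            "mds_bal_split_bits", "mds_bal_fragment_size_max"):
    val = subprocess.check_output(["ceph", "config", "get", "mds", opt])
    print(opt, "=", val.decode().strip())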

Actions #7

Updated by Venky Shankar 10 days ago

  • Related to Bug #62658: error during scrub thrashing: reached maximum tries (31) after waiting for 900 seconds added