Bug #50346

closed

OSD crash FAILED ceph_assert(!is_scrubbing())

Added by 玮文 胡 about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After seeing the PG_NOT_SCRUBBED warning, I set the OSD flag "nodeep-scrub", set the config option osd_max_scrubs to 2, and ran:

for pg in $(ceph health detail | awk '{print $2}' | tail -n +3); do ceph pg scrub $pg; done
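
The loop above takes the second field of every line after the first two, so it relies on PG_NOT_SCRUBBED being the only health check in the output. A more defensive sketch is shown here. It assumes the detail lines have the form "pg <pgid> not scrubbed since <time>", and the function names are hypothetical, not part of Ceph:

```shell
#!/bin/sh
# Print the IDs of PGs reported as overdue for a (shallow) scrub.
# Reads "ceph health detail" output on stdin.
# Assumption: detail lines look like "pg <pgid> not scrubbed since <time>".
# Lines for other health checks, including "not deep-scrubbed", are skipped.
overdue_scrub_pgs() {
    awk '$1 == "pg" && $3 == "not" && $4 == "scrubbed" { print $2 }'
}

# Hypothetical wrapper: request a scrub for each overdue PG.
# Defined here but not called; it needs a reachable cluster.
scrub_overdue_pgs() {
    ceph health detail | overdue_scrub_pgs | while read -r pg; do
        ceph pg scrub "$pg"
    done
}
```

Calling scrub_overdue_pgs on a node with an admin keyring would issue the same "ceph pg scrub" requests as the original loop, limited to PGs whose line matches the assumed pattern.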

My intent was to speed up scrubbing so that the warning would clear. After a few minutes, one OSD crashed. "ceph crash info" shows:

{
    "assert_condition": "!is_scrubbing()",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.0/rpm/el8/BUILD/ceph-16.2.0/src/osd/PG.cc",
    "assert_func": "bool PG::sched_scrub()",
    "assert_line": 1339,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.0/rpm/el8/BUILD/ceph-16.2.0/src/osd/PG.cc: In function 'bool PG::sched_scrub()' thread 7fa63b19c700 time 2021-04-14T06:50:16.690936+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.0/rpm/el8/BUILD/ceph-16.2.0/src/osd/PG.cc: 1339: FAILED ceph_assert(!is_scrubbing())\n",
    "assert_thread_name": "safe_timer",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7fa644996b20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x5641acec9d49]",
        "/usr/bin/ceph-osd(+0x568f12) [0x5641acec9f12]",
        "(PG::sched_scrub()+0x561) [0x5641ad07a201]",
        "(OSD::sched_scrub()+0x8e3) [0x5641acfc4063]",
        "(OSD::tick_without_osd_lock()+0x678) [0x5641acfd59d8]",
        "(Context::complete(int)+0xd) [0x5641ad00917d]",
        "(SafeTimer::timer_thread()+0x1b7) [0x5641ad64e807]",
        "(SafeTimerThread::entry()+0x11) [0x5641ad64fde1]",
        "/lib64/libpthread.so.0(+0x814a) [0x7fa64498c14a]",
        "clone()" 
    ],
    "ceph_version": "16.2.0",
    "crash_id": "2021-04-14T06:50:16.721313Z_ba82e6bc-e025-4c14-9431-8522393cb79d",
    "entity_name": "osd.6",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "dc53b29bcd5e6e90adf9cd40bff50b2b558b52cc78ef2c401896ad21b883bfa5",
    "timestamp": "2021-04-14T06:50:16.721313Z",
    "utsname_hostname": "gpu014",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-56-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#62-Ubuntu SMP Mon Nov 23 19:20:19 UTC 2020" 
}

The OSD was then restarted automatically and seems OK.


Files

osd.6.log.gz (146 KB) 玮文 胡, 04/14/2021 07:16 AM
pg (201 KB) Neha Ojha, 06/24/2021 10:15 PM

Related issues 1 (0 open, 1 closed)

Copied to RADOS - Backport #51371: pacific: OSD crash FAILED ceph_assert(!is_scrubbing()) - Resolved - Ronen Friedman