Bug #54408

open

Ceph can't tolerate removing a pool with a huge number of objects; 90-95% of the OSDs crashed

Added by Ist Gab about 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 1 - critical

Description

Hi,

I removed the old RGW data pool, which held 2B objects, because in a multisite setup removing a user with data purge doesn't seem to work, so I needed to clean up some other way.
Luckily I expected the cluster to crash, so there were no users on it, but I wonder how this can be done smoothly.

I deleted the pool on 23 Feb between 2 and 3pm, and I'd say 90-95% of the OSDs went down around 7:30am on the 24th.
I manually compacted all the OSDs, which brought the cluster back to normal, but I'm curious which operations run in Ceph around 7:30am?

The only way I can think of to prevent this: if this cleanup operation runs at 7:30am, I should do this kind of deletion at around 8am and compact all the OSDs before 7:30am the next day, so that they might not crash?
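For reference, the manual compaction step could be scripted roughly like this. It is only a sketch, assuming the `ceph` CLI is on the PATH with an admin keyring; the helper names `run_ceph` and `compact_all_osds` are made up for illustration. It compacts one OSD at a time so that they are not all busy at once.

```python
import subprocess
from typing import Callable, List


def run_ceph(args: List[str]) -> str:
    """Run the ceph CLI and return its stdout (assumes admin access)."""
    result = subprocess.run(
        ["ceph", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def compact_all_osds(run: Callable[[List[str]], str] = run_ceph) -> List[int]:
    """Compact every OSD sequentially; return the ids that were compacted."""
    # `ceph osd ls` prints one OSD id per line.
    osd_ids = [int(token) for token in run(["osd", "ls"]).split()]
    compacted = []
    for osd_id in osd_ids:
        # `ceph tell osd.N compact` triggers a compaction on that OSD.
        run(["tell", f"osd.{osd_id}", "compact"])
        compacted.append(osd_id)
    return compacted
```

Calling `compact_all_osds()` on an admin node would walk through all OSDs in id order. Whether doing this before the next morning actually avoids the crash is exactly the open question above.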

This is the crash report from one of the OSDs:

{
    "archived": "2022-02-24 02:07:04.239917",
    "backtrace": [
        "(()+0x12b20) [0x7fd428b36b20]",
        "(pthread_kill()+0x35) [0x7fd428b338d5]",
        "(ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, unsigned long)+0x258) [0x556c183d3808]",
        "(ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, unsigned long, unsigned long)+0x262) [0x556c183d3e52]",
        "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b0) [0x556c183f56e0]",
        "(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x556c183f8354]",
        "(()+0x814a) [0x7fd428b2c14a]",
        "(clone()+0x43) [0x7fd42785cdc3]" 
    ],
    "ceph_version": "15.2.14",
    "crash_id": "2022-02-24T01:19:40.185709Z_02e171f1-4948-4b23-9dcc-cffba2004cdd",
    "entity_name": "osd.2",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "07cf80499e65bb162f3a94cabb32a4d5e7fe02c49432355e9714b429f6249112",
    "timestamp": "2022-02-24T01:19:40.185709Z",
    "utsname_hostname": "sh-cephosd-8006",
    "utsname_machine": "x86_64",
    "utsname_release": "4.18.0-305.19.1.el8_4.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Wed Sep 15 15:39:39 UTC 2021" 
}
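To compare many of these reports, a small script can pull out the fields that matter. This is a sketch, assuming the report is available as JSON text in the shape shown above; `summarize_crash` is a hypothetical helper, and treating `HeartbeatMap` frames as a sign of an internal thread timeout is my reading of the backtrace, not something confirmed here.

```python
import json
from datetime import datetime, timezone


def summarize_crash(report_json: str) -> dict:
    """Extract the fields most useful for triage from one crash report."""
    report = json.loads(report_json)
    # The trailing "Z" marks the timestamp as UTC.
    crashed_at = datetime.strptime(
        report["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    # Assumption: HeartbeatMap frames suggest an internal thread timeout.
    heartbeat_frames = [f for f in report["backtrace"] if "HeartbeatMap" in f]
    return {
        "entity": report["entity_name"],
        "ceph_version": report["ceph_version"],
        "crashed_at_utc": crashed_at,
        "heartbeat_frames": len(heartbeat_frames),
    }


# Trimmed copy of the report above.
sample = json.dumps({
    "backtrace": [
        "(pthread_kill()+0x35) [0x7fd428b338d5]",
        "(ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, unsigned long)+0x258) [0x556c183d3808]",
        "(ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, unsigned long, unsigned long)+0x262) [0x556c183d3e52]",
    ],
    "ceph_version": "15.2.14",
    "entity_name": "osd.2",
    "timestamp": "2022-02-24T01:19:40.185709Z",
})
summary = summarize_crash(sample)
print(summary["entity"], summary["crashed_at_utc"].isoformat())
```

Note that the report timestamp is in UTC, so it has to be converted to local time before comparing it with the 7:30am mentioned above.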
