Bug #41228 (closed)

mon: deleting a CephFS and its pools causes MONs to crash

Added by Janek Bevendorff over 4 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Disclaimer: I am not entirely sure whether this is strictly related to CephFS or a general problem that occurs when deleting pools with many objects. Feel free to move this issue to the correct place.

This is the second time this has happened to me. I have not had real data loss yet (still in the messing-around phase), but in production this could be disastrous.

I had a large-ish CephFS with millions of small files and wanted to delete and recreate it. To do so, I failed all MDSs and removed the FS. Unfortunately, after I deleted the pools, the MONs started crashing with these messages (the command sequence is summarized right after the log):

enable_multiple, ever_enabled_multiple: 1,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 5

Filesystem 'cephfs.storage' (5)
fs_name cephfs.storage
epoch   43346
flags   12
created 2019-07-24 14:06:34.245092
modified        2019-07-24 17:35:02.533177
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
min_compat_client       -1 (unspecified)
last_failure    0
last_failure_osd_epoch  172741
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in      0
up      {}
failed  0
damaged 
stopped 1
data_pools      [105]
metadata_pool   104
inline_data     disabled
balancer        
standby_count_wanted    1

2019-07-24 17:35:02.534 7f84989c0700  0 log_channel(audit) log [INF] : from='client.26469240 IP:0/523350059' entity='client.admin' cmd='[{"prefix": "mds fail", "role_or_gid": "HOST"}]': finished
2019-07-24 17:35:02.534 7f84989c0700  0 log_channel(cluster) log [DBG] : fsmap cephfs.storage:0/1, 1 failed
2019-07-24 17:35:03.350 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:03.350 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2224503905' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:03.710 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "fs rm", "fs_name": "cephfs.storage", "yes_i_really_mean_it": true} v 0) v1
2019-07-24 17:35:03.710 7f849c1c7700  0 log_channel(audit) log [INF] : from='client.? IP:0/3380953550' entity='client.admin' cmd=[{"prefix": "fs rm", "fs_name": "cephfs.storage", "yes_i_really_mean_it": true}]: dispatch
2019-07-24 17:35:03.858 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:03.858 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/4223835059' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Health check cleared: FS_WITH_FAILED_MDS (was: 1 filesystem has a failed mds daemon)
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Health check cleared: MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons available)
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Cluster is now healthy
2019-07-24 17:35:04.554 7f84989c0700  0 mon.HOST@0(leader).mds e43347 new map
2019-07-24 17:35:04.554 7f84989c0700  0 mon.HOST@0(leader).mds e43347 print_map
e43347
enable_multiple, ever_enabled_multiple: 1,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: -1

No filesystems configured
2019-07-24 17:35:04.554 7f84989c0700  1 mon.HOST@0(leader).mds e43347 Client subscribed for legacy filesystem but none is configured
2019-07-24 17:35:04.558 7f84989c0700  0 log_channel(audit) log [INF] : from='client.? IP:0/3380953550' entity='client.admin' cmd='[{"prefix": "fs rm", "fs_name": "cephfs.storage", "yes_i_really_mean_it": true}]': finished
2019-07-24 17:35:04.558 7f84989c0700  0 log_channel(cluster) log [DBG] : fsmap 
2019-07-24 17:35:09.390 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:09.390 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3920459854' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:10.750 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:10.750 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469335 IP:0/1112662152' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:12.106 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:12.106 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469365 IP:0/2094380662' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:15.606 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:15.606 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469415 IP:0/3722380370' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:18.303 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:18.303 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3052985649' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:20.287 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:20.287 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/413031953' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:21.027 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:21.027 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3782415477' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:22.403 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:22.403 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/4047846888' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:23.183 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"var": "pg_num", "prefix": "osd pool set", "val": "65536", "pool": "de-weimar.rgw.buckets.data"} v 0) v1
2019-07-24 17:35:23.183 7f849c1c7700  0 log_channel(audit) log [INF] : from='mgr.25314350 ' entity='mgr.HOST' cmd=[{"var": "pg_num", "prefix": "osd pool set", "val": "65536", "pool": "de-weimar.rgw.buckets.data"}]: dispatch
2019-07-24 17:35:25.159 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:25.159 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469545 IP:0/2971064078' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:25.739 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:25.739 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/4173892011' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:27.867 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:27.867 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469585 IP:0/1210535081' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:29.211 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "config set", "who": "mon", "name": "mon_allow_pool_delete", "value": "true"} v 0) v1
2019-07-24 17:35:29.211 7f849c1c7700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd=[{"prefix": "config set", "who": "mon", "name": "mon_allow_pool_delete", "value": "true"}]: dispatch
2019-07-24 17:35:29.231 7f84989c0700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd='[{"prefix": "config set", "who": "mon", "name": "mon_allow_pool_delete", "value": "true"}]': finished
2019-07-24 17:35:30.623 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:30.623 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2415832508' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:31.971 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:31.971 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2158999092' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:32.563 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:32.563 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469670 IP:0/1662692119' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:33.927 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:33.927 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3247185013' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:35.295 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:35.295 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469710 IP:0/3564459348' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch

2019-07-24 17:35:37.375 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:37.375 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469735 IP:0/1504902306' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:38.051 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:38.051 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/1064715093' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:38.735 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:38.735 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/674656464' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:43.531 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:43.531 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2204702323' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:45.527 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:45.527 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469845 IP:0/3819528033' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:46.887 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:46.887 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3837001654' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:48.251 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:48.251 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2498201380' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:50.167 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"pool2": "cephfs.storage.meta", "prefix": "osd pool rm", "pool": "cephfs.storage.meta", "yes_i_really_really_mean_it": true} v 0) v1
2019-07-24 17:35:50.167 7f849c1c7700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd=[{"pool2": "cephfs.storage.meta", "prefix": "osd pool rm", "pool": "cephfs.storage.meta", "yes_i_really_really_mean_it": true}]: dispatch
2019-07-24 17:35:50.875 7f849e9cc700  1 mon.HOST@0(leader).osd e172741 do_prune osdmap full prune enabled
2019-07-24 17:35:50.903 7f84989c0700  1 mon.HOST@0(leader).osd e172742 e172742: 1216 total, 1216 up, 1216 in
2019-07-24 17:35:50.911 7f84989c0700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd='[{"pool2": "cephfs.storage.meta", "prefix": "osd pool rm", "pool": "cephfs.storage.meta", "yes_i_really_really_mean_it": true}]': finished
2019-07-24 17:35:50.911 7f84989c0700  0 log_channel(cluster) log [DBG] : osdmap e172742: 1216 total, 1216 up, 1216 in
2019-07-24 17:35:53.727 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:53.727 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3260840221' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:55.791 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:55.791 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/1627071451' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:58.495 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:58.495 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3948058856' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:59.023 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"pool2": "cephfs.storage.data", "prefix": "osd pool rm", "pool": "cephfs.storage.data", "yes_i_really_really_mean_it": true} v 0) v1
2019-07-24 17:35:59.023 7f849c1c7700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd=[{"pool2": "cephfs.storage.data", "prefix": "osd pool rm", "pool": "cephfs.storage.data", "yes_i_really_really_mean_it": true}]: dispatch
2019-07-24 17:35:59.963 7f849e9cc700  1 mon.HOST@0(leader).osd e172742 do_prune osdmap full prune enabled
2019-07-24 17:35:59.987 7f84989c0700  1 mon.HOST@0(leader).osd e172743 e172743: 1216 total, 1216 up, 1216 in
2019-07-24 17:35:59.991 7f84989c0700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd='[{"pool2": "cephfs.storage.data", "prefix": "osd pool rm", "pool": "cephfs.storage.data", "yes_i_really_really_mean_it": true}]': finished
2019-07-24 17:35:59.991 7f84989c0700  0 log_channel(cluster) log [DBG] : osdmap e172743: 1216 total, 1216 up, 1216 in
2019-07-24 17:36:00.583 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:36:00.583 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26470030 IP:0/2144602860' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:36:01.951 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:36:01.951 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3286028704' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:36:05.355 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:36:05.355 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26444613 IP:0/2957845693' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:36:05.355 7f849c1c7700  1 mon.HOST@0(leader).mds e43347 Client subscribed for legacy filesystem but none is configured
2019-07-24 17:36:05.359 7f849c1c7700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f849c1c7700 thread_name:ms_dispatch

 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
 1: (()+0x12890) [0x7f84a860d890]
 2: (()+0x2b142e) [0x7f84a97e142e]
 3: (PGMapDigest::get_statfs(OSDMap&, boost::optional<long>) const+0xd8) [0x7f84a9b66918]
 4: (MgrStatMonitor::preprocess_statfs(boost::intrusive_ptr<MonOpRequest>)+0x3be) [0x5654367ef28e]
 5: (MgrStatMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x153) [0x5654367ef683]
 6: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x25a) [0x56543670673a]
 7: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x891) [0x56543660fa01]
 8: (Monitor::_ms_dispatch(Message*)+0xdbb) [0x56543661138b]
 9: (Monitor::ms_dispatch(Message*)+0x26) [0x56543663e5d6]
 10: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26) [0x56543663b716]
 11: (DispatchQueue::entry()+0x1a49) [0x7f84a99dfa49]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f84a9a8a88d]
 13: (()+0x76db) [0x7f84a86026db]
 14: (clone()+0x3f) [0x7f84a77e888f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
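
For reference, the removal sequence visible in the audit log above boils down to roughly the following commands (a sketch reconstructed from the log entries; the filesystem and pool names are the ones from my cluster, the MDS name is a placeholder):

# fail the active MDS daemon(s) and remove the filesystem
ceph mds fail <mds-name>
ceph fs rm cephfs.storage --yes-i-really-mean-it

# allow pool deletion, then remove the metadata and data pools
ceph config set mon mon_allow_pool_delete true
ceph osd pool rm cephfs.storage.meta cephfs.storage.meta --yes-i-really-really-mean-it
ceph osd pool rm cephfs.storage.data cephfs.storage.data --yes-i-really-really-mean-it

The segmentation fault above was hit a few seconds after the second pool removal finished.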

From then on, the MONs kept crashing right after they were started, or at the latest once they tried to form a quorum.

During election, I got these messages:

2019-07-20 11:02:24.827 7eff4e6d2700  1 mon.HOST@0(electing) e10 handle_auth_request failed to assign global_id
2019-07-20 11:02:25.479 7eff4e6d2700  1 mon.HOST@0(electing) e10 handle_auth_request failed to assign global_id
2019-07-20 11:02:27.243 7eff4e6d2700  1 mon.HOST@0(electing) e10 handle_auth_request failed to assign global_id

In the beginning, there were also messages about clients being subscribed to a legacy FS (even though I had unmounted all clients), but those disappeared after a while.

The only way I could recover the cluster both times was to stop all services (including the OSDs, to prevent or at least minimize actual data loss), make a backup copy of the /var/lib/ceph/mon directory, and then manipulate the MON map so that a quorum could form with fewer MONs. The first time this happened, I was able to bootstrap again from a single MON and let the others rejoin with a clean slate. The second time, I managed to start three of them without crashes and then reset the other two. To get the OSDs working again, I had to set the noout flag and reboot them a few times.
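
For anyone who needs to do the same, the MON map manipulation amounts to roughly this standard procedure (a sketch only, with example monitor names mon1/mon4/mon5 and the default store path; adapt to your deployment and always work on a backup):

# stop the monitors (and the OSDs, if you want to minimize writes) on all nodes
systemctl stop ceph-mon.target

# back up the monitor store before touching anything
cp -a /var/lib/ceph/mon /var/lib/ceph/mon.bak

# on a surviving monitor, extract the current monmap, drop the monitors
# that should be left out of the quorum, and inject the edited map back
ceph-mon -i mon1 --extract-monmap /tmp/monmap
monmaptool /tmp/monmap --rm mon4
monmaptool /tmp/monmap --rm mon5
ceph-mon -i mon1 --inject-monmap /tmp/monmap

systemctl start ceph-mon@mon1

# once a quorum exists again, keep OSDs from being marked out while they restart
ceph osd set noout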

After this procedure, the cluster appears to be in a healthy state again, though I don't know whether all objects are consistent.
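
One way to at least look for damage would be to deep-scrub everything and check what the scrubs flag (again only a sketch; <pool-name> and <pg-id> are placeholders):

# trigger deep scrubs (this can take a long time on a large cluster);
# per-OSD (osd.N) or per-PG scrubs work as well
ceph osd deep-scrub all

# after the scrubs have finished, look for reported inconsistencies
ceph health detail
rados list-inconsistent-pg <pool-name>
rados list-inconsistent-obj <pg-id>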


Related issues (2: 0 open, 2 closed)

Is duplicate of: mgr - Bug #40011: ceph -s shows wrong number of pools when pool was deleted (Resolved, Daniel Oliveira)

Copied to: CephFS - Bug #47182: mon: deleting a CephFS and its pools causes MONs to crash (Resolved, Patrick Donnelly)
