Project

General

Profile

Bug #41228

mon: deleting a CephFS and its pools causes MONs to crash

Added by Janek Bevendorff 9 days ago. Updated 1 day ago.

Status:
New
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
Start date:
Due date:
% Done:

0%

Source:
Tags:
Backport:
nautilus,mimic
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

Disclaimer: I am not entirely sure if this is strictly related to CephFS or a general problem when deleting pools with many objects. Feel free to move this issue to the correct place.

This is the second time this has happened to me. I have not had real data loss yet (still in the messing-around phase), but in production this could be disastrous.

I had a large-ish CephFS with millions of small files and wanted to delete and recreate it. Thus, I failed all MDSs and removed the FS. Unfortunately, after deleting the pools, the MONs started crashing with these messages:

enable_multiple, ever_enabled_multiple: 1,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table
,9=file layout v2,10=snaprealm v2}
legacy client fscid: 5

Filesystem 'cephfs.storage' (5)
fs_name cephfs.storage
epoch   43346
flags   12
created 2019-07-24 14:06:34.245092
modified        2019-07-24 17:35:02.533177
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
min_compat_client       -1 (unspecified)
last_failure    0
last_failure_osd_epoch  17274
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table
,9=file layout v2,10=snaprealm v2}
legacy client fscid: 5

Filesystem 'cephfs.storage' (5)
fs_name cephfs.storage
epoch   43346
flags   12
created 2019-07-24 14:06:34.245092
modified        2019-07-24 17:35:02.533177
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
min_compat_client       -1 (unspecified)
last_failure    0
last_failure_osd_epoch  172741
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table
,9=file layout v2,10=snaprealm v2}
max_mds 1
in      0
up      {}
failed  0
damaged 
stopped 1
data_pools      [105]
metadata_pool   104
inline_data     disabled
balancer        
standby_count_wanted    1

2019-07-24 17:35:02.534 7f84989c0700  0 log_channel(audit) log [INF] : from='client.26469240 IP:0/523350059' entity='client.admin' cmd='[{"prefix": "mds fail", "role_or_gid": "HOST"}]': finished
2019-07-24 17:35:02.534 7f84989c0700  0 log_channel(cluster) log [DBG] : fsmap cephfs.storage:0/1, 1 failed
2019-07-24 17:35:03.350 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:03.350 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2224503905' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:03.710 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "fs rm", "fs_name": "cephfs.storage", "yes_i_really_mean_it": true} v 0) v1
2019-07-24 17:35:03.710 7f849c1c7700  0 log_channel(audit) log [INF] : from='client.? IP:0/3380953550' entity='client.admin' cmd=[{"prefix": "fs rm", "fs_name": "cephfs.storage", "yes_i_really_mean_it": tru
e}]: dispatch
2019-07-24 17:35:03.858 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:03.858 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/4223835059' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Health check cleared: FS_WITH_FAILED_MDS (was: 1 filesystem has a failed mds daemon)
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Health check cleared: MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons available)
2019-07-24 17:35:04.542 7f849e9cc700  0 log_channel(cluster) log [INF] : Cluster is now healthy
2019-07-24 17:35:04.554 7f84989c0700  0 mon.HOST@0(leader).mds e43347 new map
2019-07-24 17:35:04.554 7f84989c0700  0 mon.HOST@0(leader).mds e43347 print_map
e43347
enable_multiple, ever_enabled_multiple: 1,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table
,9=file layout v2,10=snaprealm v2}
legacy client fscid: -1

No filesystems configured
2019-07-24 17:35:04.554 7f84989c0700  1 mon.HOST@0(leader).mds e43347 Client subscribed for legacy filesystem but none is configured
2019-07-24 17:35:04.558 7f84989c0700  0 log_channel(audit) log [INF] : from='client.? IP:0/3380953550' entity='client.admin' cmd='[{"prefix": "fs rm", "fs_name": "cephfs.storage", "yes_i_really_mean_it": tr
ue}]': finished
2019-07-24 17:35:04.558 7f84989c0700  0 log_channel(cluster) log [DBG] : fsmap 
2019-07-24 17:35:09.390 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:09.390 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3920459854' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:10.750 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:10.750 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469335 IP:0/1112662152' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:12.106 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:12.106 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469365 IP:0/2094380662' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:15.606 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:15.606 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469415 IP:0/3722380370' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:18.303 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:18.303 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3052985649' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:20.287 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:20.287 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/413031953' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:21.027 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:21.027 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3782415477' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:22.403 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:22.403 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/4047846888' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:23.183 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"var": "pg_num", "prefix": "osd pool set", "val": "65536", "pool": "de-weimar.rgw.buckets.data"} v 0) v1
2019-07-24 17:35:23.183 7f849c1c7700  0 log_channel(audit) log [INF] : from='mgr.25314350 ' entity='mgr.HOST' cmd=[{"var": "pg_num", "prefix": "osd pool set", "val": "65536", "pool": "de-weimar.rgw.buckets.dat
a"}]: dispatch
2019-07-24 17:35:25.159 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:25.159 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469545 IP:0/2971064078' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:25.739 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:25.739 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/4173892011' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:27.867 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:27.867 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469585 IP:0/1210535081' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:29.211 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "config set", "who": "mon", "name": "mon_allow_pool_delete", "value": "true"} v 0) v1
2019-07-24 17:35:29.211 7f849c1c7700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd=[{"prefix": "config set", "who": "mon", "name": "mon_allow_pool_delete", "value": "true"}]: dispatch
2019-07-24 17:35:29.231 7f84989c0700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd='[{"prefix": "config set", "who": "mon", "name": "mon_allow_pool_delete", "value": "true"}]': finished
2019-07-24 17:35:30.623 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:30.623 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2415832508' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:31.971 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:31.971 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2158999092' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:32.563 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:32.563 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469670 IP:0/1662692119' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:33.927 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:33.927 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3247185013' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:35.295 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:35.295 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469710 IP:0/3564459348' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch

2019-07-24 17:35:37.375 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:37.375 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469735 IP:0/1504902306' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:38.051 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:38.051 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/1064715093' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:38.735 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:38.735 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/674656464' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:43.531 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:43.531 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2204702323' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:45.527 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:45.527 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26469845 IP:0/3819528033' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:46.887 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:46.887 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3837001654' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:48.251 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:48.251 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/2498201380' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:50.167 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"pool2": "cephfs.storage.meta", "prefix": "osd pool rm", "pool": "cephfs.storage.meta", "yes_i_really_really_mean_it":
 true} v 0) v1
2019-07-24 17:35:50.167 7f849c1c7700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd=[{"pool2": "cephfs.storage.meta", "prefix": "osd pool rm", "pool": "cephfs.storage.meta", "yes_i_reall
y_really_mean_it": true}]: dispatch
2019-07-24 17:35:50.875 7f849e9cc700  1 mon.HOST@0(leader).osd e172741 do_prune osdmap full prune enabled
2019-07-24 17:35:50.903 7f84989c0700  1 mon.HOST@0(leader).osd e172742 e172742: 1216 total, 1216 up, 1216 in
2019-07-24 17:35:50.911 7f84989c0700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd='[{"pool2": "cephfs.storage.meta", "prefix": "osd pool rm", "pool": "cephfs.storage.meta", "yes_i_real
ly_really_mean_it": true}]': finished
2019-07-24 17:35:50.911 7f84989c0700  0 log_channel(cluster) log [DBG] : osdmap e172742: 1216 total, 1216 up, 1216 in
2019-07-24 17:35:53.727 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:35:53.727 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3260840221' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:35:55.791 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:55.791 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/1627071451' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:58.495 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:35:58.495 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3948058856' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:35:59.023 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"pool2": "cephfs.storage.data", "prefix": "osd pool rm", "pool": "cephfs.storage.data", "yes_i_really_really_mean_it":
 true} v 0) v1
2019-07-24 17:35:59.023 7f849c1c7700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd=[{"pool2": "cephfs.storage.data", "prefix": "osd pool rm", "pool": "cephfs.storage.data", "yes_i_reall
y_really_mean_it": true}]: dispatch
2019-07-24 17:35:59.963 7f849e9cc700  1 mon.HOST@0(leader).osd e172742 do_prune osdmap full prune enabled
2019-07-24 17:35:59.987 7f84989c0700  1 mon.HOST@0(leader).osd e172743 e172743: 1216 total, 1216 up, 1216 in
2019-07-24 17:35:59.991 7f84989c0700  0 log_channel(audit) log [INF] : from='client.? ' entity='client.admin' cmd='[{"pool2": "cephfs.storage.data", "prefix": "osd pool rm", "pool": "cephfs.storage.data", "yes_i_real
ly_really_mean_it": true}]': finished
2019-07-24 17:35:59.991 7f84989c0700  0 log_channel(cluster) log [DBG] : osdmap e172743: 1216 total, 1216 up, 1216 in
2019-07-24 17:36:00.583 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:36:00.583 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26470030 IP:0/2144602860' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:36:01.951 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "status"} v 0) v1
2019-07-24 17:36:01.951 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.? IP:0/3286028704' entity='client.admin' cmd=[{"prefix": "status"}]: dispatch
2019-07-24 17:36:05.355 7f849c1c7700  0 mon.HOST@0(leader) e12 handle_command mon_command({"prefix": "health", "detail": "detail"} v 0) v1
2019-07-24 17:36:05.355 7f849c1c7700  0 log_channel(audit) log [DBG] : from='client.26444613 IP:0/2957845693' entity='client.admin' cmd=[{"prefix": "health", "detail": "detail"}]: dispatch
2019-07-24 17:36:05.355 7f849c1c7700  1 mon.HOST@0(leader).mds e43347 Client subscribed for legacy filesystem but none is configured
2019-07-24 17:36:05.359 7f849c1c7700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f849c1c7700 thread_name:ms_dispatch

 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
 1: (()+0x12890) [0x7f84a860d890]
 2: (()+0x2b142e) [0x7f84a97e142e]
 3: (PGMapDigest::get_statfs(OSDMap&, boost::optional<long>) const+0xd8) [0x7f84a9b66918]
 4: (MgrStatMonitor::preprocess_statfs(boost::intrusive_ptr<MonOpRequest>)+0x3be) [0x5654367ef28e]
 5: (MgrStatMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x153) [0x5654367ef683]
 6: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x25a) [0x56543670673a]
 7: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x891) [0x56543660fa01]
 8: (Monitor::_ms_dispatch(Message*)+0xdbb) [0x56543661138b]
 9: (Monitor::ms_dispatch(Message*)+0x26) [0x56543663e5d6]
 10: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26) [0x56543663b716]
 11: (DispatchQueue::entry()+0x1a49) [0x7f84a99dfa49]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f84a9a8a88d]
 13: (()+0x76db) [0x7f84a86026db]
 14: (clone()+0x3f) [0x7f84a77e888f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

From then on the MONs kept crashing right after they were started or at the latest after they tried to form a quorum.

During election, I got these messages:

2019-07-20 11:02:24.827 7eff4e6d2700  1 mon.HOST@0(electing) e10 handle_auth_request failed to assign global_id
2019-07-20 11:02:25.479 7eff4e6d2700  1 mon.HOST@0(electing) e10 handle_auth_request failed to assign global_id
2019-07-20 11:02:27.243 7eff4e6d2700  1 mon.HOST@0(electing) e10 handle_auth_request failed to assign global_id

In the beginning, there were also messages about clients being subscribed to a legacy FS (although I unmounted all clients), but those disappeared after a while.

The only way I could recover the cluster both times was to stop all services (including OSDs to prevent or at least minimize actual data loss), make a backup copy of the /var/lib/ceph/mon folder, and then manipulate the MON map to form a quorum with fewer MONs. The first time this happened, I could bootstrap again from one MON and then let the others rejoin with a clean slate. The second time I managed to start up three of them without crashes and then reset the other two. To get the OSDs working again, I had to set the noout and reboot them a few times.

After this procedure, the cluster appears to be in a healthy state again, though I don't know if all objects are consistent.

History

#1 Updated by Patrick Donnelly 8 days ago

  • Project changed from fs to RADOS
  • Subject changed from Deleting a CephFS and its pools causes MONs to crash to mon: deleting a CephFS and its pools causes MONs to crash
  • Priority changed from Normal to Urgent
  • Target version set to v15.0.0
  • Backport set to nautilus,mimic
  • Component(RADOS) Monitor added

#2 Updated by Greg Farnum 7 days ago

Can you include exactly what commands you ran? Did you still have clients mounted while deleting the FS?

#3 Updated by Greg Farnum 7 days ago

[14:11:31] <@gregsfortytwo> batrick: looking at tracker.ceph.com/issues/41228 and it's got a lot going on but part of it appears to be MDSMonitor issues
[14:11:49] <@gregsfortytwo> notice the "Client subscribed for legacy filesystem but none is configured" outputs
[14:14:13] <batrick> It looked to me like get_statfs is loading the value from boost::optional<> without checking it is valid
[14:14:41] <batrick> it does so in one code path but not another
[14:23:23] <@gregsfortytwo> ah you're right, but that seems like it's a follow-on issue (in that the client is confused and thinks there's still an FS to be looking at to begin with)

Referring to "PGMapDigest::get_statfs" and its "boost::optional<int64_t> data_pool" parameter

#4 Updated by Janek Bevendorff 7 days ago

I think the first time I used the standard mimic workflow of mds fail and once all MDSs are stopped, fs remove. The problems started kicking in once I deleted the actual pools, although I don't know if it would have happened otherwise had I waited longer.

The second time around I set joinable to false on the FS, failed all MDS, then used fs remove. The MONs started crashing half a minute or so after deleting the metadata pool again.

I think at least once I still had a client mounted accidentally, but the MONs kept crashing even after all were definitely unmounted. The "subscribed to legacy FS" messages stopped appearing after a (long) while, but the MONs where still crashing with the same stacktrace and I could only recover by bootstrapping from a degraded quorum.

#5 Updated by Greg Farnum 1 day ago

  • Assignee set to Greg Farnum

Okay so we definitely need to figure out the PGStats fix, and try to dig up the appropriate FS bug as a separate ticket. I'll try and do both of those.

#6 Updated by Greg Farnum 1 day ago

  • Project changed from RADOS to fs
  • Assignee deleted (Greg Farnum)

Okay, the problem is not what I first thought.

The branch where we look at the data pool unconditionally is only taken after we've verified it is valid.

My guess is that the compiler inlines PGMapDigest::get_pool_free_space into PGMapDigest::get_statfs, and the problem arises because
  • the data pool has been deleted in the OSD Map
  • get_statfs looks up the data pool in pg_pool_sum, which still has it because the PGs have not been fully removed from the OSDs yet
  • get_pool_free_space assumes the lookup in the OSDMap succeeds

I'm moving this in to the FS tracker because all this functionality is used solely by them and there are several choices for resolving it, but I'm not sure where the best/easiest break point will be.

Janek, it looks like you can work around this be being more careful to unmount all your CephFS clients before deleting the CephFS. Once the request is in the system it can take some doing to remove it, but as long as they're gone before the pools get removed you'll be okay.

Also available in: Atom PDF