Feature #8690

open

MDS: Allow some kind of recovery when pools are deleted out from underneath us

Added by Dmitry Smirnov almost 10 years ago. Updated almost 8 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
Labels (FS):
Pull request ID:

Description

I once had a secondary (cache) pool attached to a CephFS directory as follows:

ceph mds add_data_pool data_cache
cephfs /mnt/ceph/cached set_layout --pool {NN}

The pool got badly broken, and because of all the uncaught OSD crashes the only viable option was to delete it in order to keep the cluster operational. Before removing the pool I ran the following command:

ceph mds remove_data_pool data_cache

Now that the cluster is healthy, the MDSes are stuck in an endless crash-replay-active loop, staying active for only several seconds before the next crash:

mds/CInode.cc: In function 'virtual void C_Inode_StoredBacktrace::finish(int)' thread 7f4788d5c700 time 2014-06-28 14:08:29.922256
mds/CInode.cc: 1041: FAILED assert(r == 0)

 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: (()+0x3a5725) [0x7f478e2e5725]
 2: (Context::complete(int)+0x9) [0x7f478e0c65c9]
 3: (C_Gather::sub_finish(Context*, int)+0x217) [0x7f478e0c7dc7]
 4: (C_Gather::C_GatherSub::finish(int)+0x12) [0x7f478e0c7f02]
 5: (Context::complete(int)+0x9) [0x7f478e0c65c9]
 6: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xf3e) [0x7f478e34aa5e]
 7: (MDS::handle_core_message(Message*)+0xb3f) [0x7f478e0e8b7f]
 8: (MDS::_dispatch(Message*)+0x32) [0x7f478e0e8d72]
 9: (MDS::ms_dispatch(Message*)+0xab) [0x7f478e0ea75b]
 10: (DispatchQueue::entry()+0x58a) [0x7f478e51fc1a]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f478e43c07d]
 12: (()+0x80ca) [0x7f478d8960ca]
 13: (clone()+0x6d) [0x7f478c20bffd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
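
To illustrate the kind of recovery this request is about, here is a minimal sketch, not Ceph's actual code. It assumes the completion callback receives an errno-style result (0 on success, negative on failure) and that a write to a deleted pool surfaces as -ENOENT; the trace above does not show the actual value of r. The names BacktraceOutcome and classify_backtrace_result are hypothetical.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical outcome type for illustration; not part of Ceph.
enum class BacktraceOutcome {
    Stored,       // the write succeeded
    PoolMissing,  // target pool no longer exists; flag the inode and keep running
    Fatal         // any other error; the caller may still choose to abort
};

// Hypothetical helper: classify the result code handed to a completion
// callback instead of asserting r == 0 unconditionally.
// Assumption: a deleted pool is reported as -ENOENT.
inline BacktraceOutcome classify_backtrace_result(int r) {
    assert(r <= 0);  // errno-style results are zero or negative
    if (r == 0) {
        return BacktraceOutcome::Stored;
    }
    if (r == -ENOENT) {
        return BacktraceOutcome::PoolMissing;
    }
    return BacktraceOutcome::Fatal;
}
```

With a split like this, only the Fatal case would need to take the MDS down; the PoolMissing case could be logged and the affected inode marked as damaged, so that the daemon stays active long enough for the remaining files to be cleaned up.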

See the attached full log (all cluster components are at 0.80.1).

At the moment I'm trying to remove all files that are still visible under "/mnt/ceph/cached", but it is going very slowly because only a few dozen files get removed before the next MDS crash.


Files

ceph-mds.debstor.log.xz (120 KB) Dmitry Smirnov, 06/27/2014 09:35 PM