Bug #16829

ceph-mds crashing constantly

Added by Tomasz Torcz almost 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm using the Ceph packages from Fedora 24: ceph-mds-10.2.2-2.fc24.x86_64

I created a simple CephFS filesystem and stored some data in it. Then I removed the filesystem using "ceph fs rm" and created a new one. Now I cannot access the new filesystem; the ceph-mds process crashes just after startup:

ceph-mds8599: starting mds.dashboardpc at :/0
ceph-mds8599: 2016-07-27 14:34:34.420858 7f27934bb700 1 log_channel(cluster) log [ERR] : replayed stray Session close event for client.14334 3.193.149.4:0/2612743178 from time 2016-06-05 09:23:48.286189, ignoring
ceph-mds8599: 2016-07-27 14:34:39.440266 7f27958ca700 -1 log_channel(cluster) log [ERR] : loaded dup inode 10000000000 [2,head] v18767 at /dane, but inode 10000000000.head v77080 already exists at /tmp2
ceph-mds8599: mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f27960cb700 time 2016-07-27 14:34:39.528175
ceph-mds8599: mds/CDir.cc: 699: FAILED assert(dn->get_linkage()->is_null())
ceph-mds8599: ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
ceph-mds8599: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55f2f9cc72a0]
ceph-mds8599: 2: (CDir::try_remove_dentries_for_stray()+0x150) [0x55f2f9a83220]
ceph-mds8599: 3: (StrayManager::__eval_stray(CDentry*, bool)+0x8e9) [0x55f2f9a02589]
ceph-mds8599: 4: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f2f9a02a12]
ceph-mds8599: 5: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f2f99571ad]
ceph-mds8599: 6: (MDSInternalContextBase::complete(int)+0x20b) [0x55f2f9b0dffb]
ceph-mds8599: 7: (MDSRank::_advance_queues()+0x66b) [0x55f2f98b7a0b]
ceph-mds8599: 8: (MDSRank::ProgressThread::entry()+0x4a) [0x55f2f98b7f0a]
ceph-mds8599: 9: (()+0x75ca) [0x7f279f6bc5ca]
ceph-mds8599: 10: (clone()+0x6d) [0x7f279e0fbead]
ceph-mds8599: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ceph-mds8599: 2016-07-27 14:34:39.531349 7f27960cb700 -1 mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f27960cb700 time 2016-07-27 14:34:39.528175
ceph-mds8599: mds/CDir.cc: 699: FAILED assert(dn->get_linkage()->is_null())
ceph-mds8599: ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
ceph-mds8599: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55f2f9cc72a0]
ceph-mds8599: 2: (CDir::try_remove_dentries_for_stray()+0x150) [0x55f2f9a83220]
ceph-mds8599: 3: (StrayManager::__eval_stray(CDentry*, bool)+0x8e9) [0x55f2f9a02589]
ceph-mds8599: 4: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f2f9a02a12]
ceph-mds8599: 5: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f2f99571ad]
ceph-mds8599: 6: (MDSInternalContextBase::complete(int)+0x20b) [0x55f2f9b0dffb]
ceph-mds8599: 7: (MDSRank::_advance_queues()+0x66b) [0x55f2f98b7a0b]
ceph-mds8599: 8: (MDSRank::ProgressThread::entry()+0x4a) [0x55f2f98b7f0a]
ceph-mds8599: 9: (()+0x75ca) [0x7f279f6bc5ca]
ceph-mds8599: 10: (clone()+0x6d) [0x7f279e0fbead]
ceph-mds8599: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
audit8660: ANOM_ABEND auid=4294967295 uid=167 gid=167 ses=4294967295 subj=system_u:system_r:ceph_t:s0 pid=8660 comm="mds_rank_progr" exe="/usr/bin/ceph-mds" sig=6
ceph-mds8599: -451> 2016-07-27 14:34:34.420858 7f27934bb700 -1 log_channel(cluster) log [ERR] : replayed stray Session close event for client.14334 3.193.149.4:0/2612743178 from time 2016-06-05 09:23:48.286189, ignoring
ceph-mds8599: -329> 2016-07-27 14:34:39.440266 7f27958ca700 -1 log_channel(cluster) log [ERR] : loaded dup inode 10000000000 [2,head] v18767 at /dane, but inode 10000000000.head v77080 already exists at /tmp2
ceph-mds8599: 0> 2016-07-27 14:34:39.531349 7f27960cb700 -1 mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f27960cb700 time 2016-07-27 14:34:39.528175
ceph-mds8599: mds/CDir.cc: 699: FAILED assert(dn->get_linkage()->is_null())
ceph-mds8599: ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
ceph-mds8599: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55f2f9cc72a0]
ceph-mds8599: 2: (CDir::try_remove_dentries_for_stray()+0x150) [0x55f2f9a83220]
ceph-mds8599: 3: (StrayManager::__eval_stray(CDentry*, bool)+0x8e9) [0x55f2f9a02589]
ceph-mds8599: 4: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f2f9a02a12]
ceph-mds8599: 5: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f2f99571ad]
ceph-mds8599: 6: (MDSInternalContextBase::complete(int)+0x20b) [0x55f2f9b0dffb]
ceph-mds8599: 7: (MDSRank::_advance_queues()+0x66b) [0x55f2f98b7a0b]
ceph-mds8599: 8: (MDSRank::ProgressThread::entry()+0x4a) [0x55f2f98b7f0a]
ceph-mds8599: 9: (()+0x75ca) [0x7f279f6bc5ca]
ceph-mds8599: 10: (clone()+0x6d) [0x7f279e0fbead]
ceph-mds8599: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ceph-mds8599: *** Caught signal (Aborted) **
ceph-mds8599: in thread 7f27960cb700 thread_name:mds_rank_progr
ceph-mds8599: ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
ceph-mds8599: 1: (()+0x514a2e) [0x55f2f9bb9a2e]
ceph-mds8599: 2: (()+0x10c30) [0x7f279f6c5c30]
ceph-mds8599: 3: (gsignal()+0x35) [0x7f279e02d6f5]
ceph-mds8599: 4: (abort()+0x16a) [0x7f279e02f2fa]
ceph-mds8599: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55f2f9cc748b]
ceph-mds8599: 6: (CDir::try_remove_dentries_for_stray()+0x150) [0x55f2f9a83220]
ceph-mds8599: 7: (StrayManager::__eval_stray(CDentry*, bool)+0x8e9) [0x55f2f9a02589]
ceph-mds8599: 8: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f2f9a02a12]
ceph-mds8599: 9: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f2f99571ad]
ceph-mds8599: 10: (MDSInternalContextBase::complete(int)+0x20b) [0x55f2f9b0dffb]
ceph-mds8599: 11: (MDSRank::_advance_queues()+0x66b) [0x55f2f98b7a0b]
ceph-mds8599: 12: (MDSRank::ProgressThread::entry()+0x4a) [0x55f2f98b7f0a]
ceph-mds8599: 13: (()+0x75ca) [0x7f279f6bc5ca]
ceph-mds8599: 14: (clone()+0x6d) [0x7f279e0fbead]
ceph-mds8599: 2016-07-27 14:34:39.535832 7f27960cb700 -1 *** Caught signal (Aborted) **
ceph-mds8599: in thread 7f27960cb700 thread_name:mds_rank_progr
ceph-mds8599: ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
ceph-mds8599: 1: (()+0x514a2e) [0x55f2f9bb9a2e]
ceph-mds8599: 2: (()+0x10c30) [0x7f279f6c5c30]
ceph-mds8599: 3: (gsignal()+0x35) [0x7f279e02d6f5]
ceph-mds8599: 4: (abort()+0x16a) [0x7f279e02f2fa]
ceph-mds8599: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55f2f9cc748b]
ceph-mds8599: 6: (CDir::try_remove_dentries_for_stray()+0x150) [0x55f2f9a83220]
ceph-mds8599: 7: (StrayManager::__eval_stray(CDentry*, bool)+0x8e9) [0x55f2f9a02589]
ceph-mds8599: 8: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f2f9a02a12]
ceph-mds8599: 9: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f2f99571ad]
ceph-mds8599: 10: (MDSInternalContextBase::complete(int)+0x20b) [0x55f2f9b0dffb]
ceph-mds8599: 11: (MDSRank::_advance_queues()+0x66b) [0x55f2f98b7a0b]
ceph-mds8599: 12: (MDSRank::ProgressThread::entry()+0x4a) [0x55f2f98b7f0a]
ceph-mds8599: 13: (()+0x75ca) [0x7f279f6bc5ca]
ceph-mds8599: 14: (clone()+0x6d) [0x7f279e0fbead]
ceph-mds8599: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ceph-mds8599: 0> 2016-07-27 14:34:39.535832 7f27960cb700 -1 *** Caught signal (Aborted) **
ceph-mds8599: in thread 7f27960cb700 thread_name:mds_rank_progr
ceph-mds8599: ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
ceph-mds8599: 1: (()+0x514a2e) [0x55f2f9bb9a2e]
ceph-mds8599: 2: (()+0x10c30) [0x7f279f6c5c30]
ceph-mds8599: 3: (gsignal()+0x35) [0x7f279e02d6f5]
ceph-mds8599: 4: (abort()+0x16a) [0x7f279e02f2fa]
ceph-mds8599: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55f2f9cc748b]
ceph-mds8599: 6: (CDir::try_remove_dentries_for_stray()+0x150) [0x55f2f9a83220]
ceph-mds8599: 7: (StrayManager::__eval_stray(CDentry*, bool)+0x8e9) [0x55f2f9a02589]
ceph-mds8599: 8: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f2f9a02a12]
ceph-mds8599: 9: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f2f99571ad]
ceph-mds8599: 10: (MDSInternalContextBase::complete(int)+0x20b) [0x55f2f9b0dffb]
ceph-mds8599: 11: (MDSRank::_advance_queues()+0x66b) [0x55f2f98b7a0b]
ceph-mds8599: 12: (MDSRank::ProgressThread::entry()+0x4a) [0x55f2f98b7f0a]
ceph-mds8599: 13: (()+0x75ca) [0x7f279f6bc5ca]
ceph-mds8599: 14: (clone()+0x6d) [0x7f279e0fbead]
ceph-mds8599: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
systemd1: : Main process exited, code=killed, status=6/ABRT
systemd1: : Unit entered failed state.

# ceph fs ls
name: CoEPH, metadata pool: CoEPH_metadata, data pools: [CoEPH_data_ec ]

# ceph fs dump
dumped fsmap epoch 1081503
e1081503
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table}

Filesystem 'CoEPH' (5)
fs_name CoEPH
epoch 1081503
flags 4
created 2016-06-02 09:47:20.833036
modified 2016-07-25 09:06:15.506987
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 608353
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table}
max_mds 1
in 0,1
up {0=22021782,1=21854109}
failed
damaged
stopped
data_pools 3
metadata_pool 1
inline_data enabled
22021782: 3.193.150.32:6800/19218 'dashboardpc' mds.0.1081499 up:active seq 5
21854109: 3.193.148.63:6800/885 'switcheroo' mds.1.957438 up:active seq 8

# ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    5733G     3885G     1845G        32.18
POOLS:
    NAME                      ID     USED       %USED     MAX AVAIL     OBJECTS
    rbd                       0      142G       4.98      1293G         36578
    CoEPH_metadata            1      66134k     0         862G          94
    CoEPH_data_ec             3      858G       22.47     1725G         224082
    CoEPH_data_replicated     5      81496M     4.16      862G          25780

CoEPH_data_replicated is a caching pool for CoEPH_data_ec.
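
For anyone triaging a similar log, here is a minimal Python sketch that pulls the two key facts out of MDS output like the above: the "loaded dup inode" conflict and the failed assertion. The function name summarize and both regular expressions are my own assumptions modelled on the lines quoted in this report, not anything shipped with Ceph. It also assumes each log record sits on a single line.

```python
import re

# Assumed pattern for the "loaded dup inode" cluster-log error quoted in this report.
DUP_INODE = re.compile(
    r"loaded dup inode (?P<ino>[0-9a-f]+) \[[^\]]*\] v(?P<new_v>\d+) at (?P<new_path>\S+), "
    r"but inode (?P=ino)\.head v(?P<old_v>\d+) already exists at (?P<old_path>\S+)"
)
# Assumed pattern for an assertion line such as "mds/CDir.cc: 699: FAILED assert(...)".
FAILED_ASSERT = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def summarize(log_lines):
    """Return unique dup-inode conflicts and failed assertions, in first-seen order."""
    dups, asserts = [], []
    for line in log_lines:
        m = DUP_INODE.search(line)
        if m:
            item = (m["ino"], m["new_path"], int(m["new_v"]),
                    m["old_path"], int(m["old_v"]))
            if item not in dups:  # the MDS repeats records in its crash dump
                dups.append(item)
        m = FAILED_ASSERT.search(line)
        if m:
            item = (m["file"], int(m["line"]), m["expr"])
            if item not in asserts:
                asserts.append(item)
    return dups, asserts
```

Feeding it the journal output (for example, the lines read from a saved log file) collapses the repeated backtraces into one entry per distinct conflict and one per distinct assertion.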
