Bug #20535 (closed)

mds segmentation fault ceph_lock_state_t::get_overlapping_locks

Added by Webert Lima almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: jewel
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The active MDS crashes, all clients freeze, and the standby (or standby-replay) daemon takes hours to recover.

ceph version 10.2.2-1-gad1a6d7 (ad1a6d77eda05dd1a0190814022e73a56f630117)
1: (()+0x4ec572) [0x563d0e1de572]
2: (()+0x10340) [0x7f7023876340]
3: (ceph_lock_state_t::get_overlapping_locks(ceph_filelock const&, std::list<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> >, std::allocator<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> > > >&, std::list<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> >, std::allocator<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> > > >*)+0x21) [0x563d0e301c51]
4: (ceph_lock_state_t::is_deadlock(ceph_filelock const&, std::list<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> >, std::allocator<std::_Rb_tree_iterator<std::pair<unsigned long const, ceph_filelock> > > >&, ceph_filelock const*, unsigned int)+0x655) [0x563d0e3034b5]
5: (ceph_lock_state_t::add_lock(ceph_filelock&, bool, bool, bool*)+0x3da) [0x563d0e306b8a]
6: (Server::handle_client_file_setlock(std::shared_ptr<MDRequestImpl>&)+0x409) [0x563d0df58d19]
7: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0x9a1) [0x563d0df84e91]
8: (Server::handle_client_request(MClientRequest*)+0x47f) [0x563d0df854ef]
9: (Server::dispatch(Message*)+0x3bb) [0x563d0df8973b]
10: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x563d0df0fc0c]
11: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d0df18d01]
12: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d0df19e55]
13: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d0df01c23]
14: (DispatchQueue::entry()+0x78b) [0x563d0e3c705b]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d0e2b5e0d]
16: (()+0x8182) [0x7f702386e182]
17: (clone()+0x6d) [0x7f7021dc547d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
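
For context, the crashing path is reached whenever a CephFS client asks the MDS to set a POSIX byte-range lock (Server::handle_client_file_setlock in the trace). Below is a minimal sketch of such a client; the mount path /mnt/cephfs/locktest is hypothetical, and this only illustrates the request type, it is not a confirmed reproducer:

/* Sketch of the client-side operation that ends up in
 * Server::handle_client_file_setlock on the MDS: a POSIX byte-range
 * lock taken with fcntl() on a file inside a CephFS mount.
 * The path /mnt/cephfs/locktest is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/cephfs/locktest", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct flock fl = {
        .l_type   = F_WRLCK,   /* exclusive (write) lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,         /* lock bytes 0..99       */
        .l_len    = 100,
    };

    /* F_SETLKW blocks until the lock is granted; each such call becomes
     * a setfilelock request handled by the active MDS. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("fcntl(F_SETLKW)");
        return 1;
    }

    sleep(10);                 /* hold the lock for a while */

    fl.l_type = F_UNLCK;       /* release the range */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}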

Usage scenario:

  • Single Active MDS
  • Single FS
  • Multiple Mountpoints
  • 2 Subtrees
  • Used for Dovecot LMTP/IMAP/POP3
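
Dovecot locks its mail and index files concurrently from several mountpoints, so the MDS regularly sees lock requests that overlap locks already held by other clients; that is what sends add_lock() through is_deadlock() and get_overlapping_locks() in the trace above. A rough sketch of that kind of contention follows, with a hypothetical path and no claim that it reproduces the crash:

/* Sketch (not a confirmed reproducer): two processes contending for
 * overlapping byte ranges on the same CephFS file, similar to the
 * concurrent index locking Dovecot does from several mountpoints.
 * The overlapping request is what makes the MDS walk existing locks
 * during deadlock checking. The path is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int lock_range(int fd, off_t start, off_t len)
{
    struct flock fl = {
        .l_type = F_WRLCK, .l_whence = SEEK_SET,
        .l_start = start, .l_len = len,
    };
    return fcntl(fd, F_SETLKW, &fl);   /* blocking lock request */
}

int main(void)
{
    const char *path = "/mnt/cephfs/dovecot.index.log";
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (fork() == 0) {                 /* child: lock bytes 0..199 */
        int cfd = open(path, O_RDWR);
        if (cfd < 0) _exit(1);
        lock_range(cfd, 0, 200);
        sleep(5);                      /* hold it, then exit (releases lock) */
        _exit(0);
    }

    sleep(1);                          /* let the child take its lock first */
    lock_range(fd, 100, 200);          /* overlaps the child's range, so it blocks */
    wait(NULL);
    return 0;
}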

~# ceph mds dump
dumped fsmap epoch 18434
fs_name cephfs
epoch 18434
flags 0
created 2016-08-01 11:07:47.592124
modified 2017-07-03 10:32:44.426431
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 12530
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in 0
up {0=1042087}
failed
damaged
stopped
data_pools 8,9
metadata_pool 7
inline_data disabled
1042087: 10.0.2.3:6800/28835 'c' mds.0.18417 up:active seq 289417 (standby for rank 0)
1382729: 10.0.2.4:6800/22945 'd' mds.0.0 up:standby-replay seq 379 (standby for rank 0)

~# ceph -v
ceph version 10.2.2-1-gad1a6d7 (ad1a6d77eda05dd1a0190814022e73a56f630117)

~# ceph osd pool ls detail
pool 5 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 3051 lfor 1736 flags hashpspool tiers 6 read_tier 6 write_tier 6 stripe_width 0
removed_snaps [1~3]
pool 6 'rbd_cache' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 12440 flags hashpspool,incomplete_clones tier_of 5 cache_mode writeback hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 1200s x0 decay_rate 0 search_last_n 0 stripe_width 0
removed_snaps [1~3]
pool 7 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 9597 flags hashpspool stripe_width 0
pool 8 'cephfs_data_ssd' replicated size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 9599 flags hashpspool stripe_width 0
pool 9 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 9617 lfor 9610 flags hashpspool crash_replay_interval 45 tiers 10 read_tier 10 write_tier 10 stripe_width 0
pool 10 'cephfs_data_cache' replicated size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 12510 flags hashpspool,incomplete_clones tier_of 9 cache_mode writeback target_bytes 268435456000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 0s x0 decay_rate 0 search_last_n 0 stripe_width 0

~# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data_ssd cephfs_data ]

~# getfattr -d -m ceph.dir.* /srv/dovecot2/mail
getfattr: Removing leading '/' from absolute path names
# file: srv/dovecot2/mail
ceph.dir.entries="1613"
ceph.dir.files="0"
ceph.dir.rbytes="18757341556329"
ceph.dir.rctime="1499371837.09279502014"
ceph.dir.rentries="5048251"
ceph.dir.rfiles="4449946"
ceph.dir.rsubdirs="598305"
ceph.dir.subdirs="1613"

~# getfattr -d -m ceph.dir.* /srv/dovecot2/index
getfattr: Removing leading '/' from absolute path names
# file: srv/dovecot2/index
ceph.dir.entries="1584"
ceph.dir.files="0"
ceph.dir.rbytes="52658042908"
ceph.dir.rctime="1499371833.09531511882"
ceph.dir.rentries="927423"
ceph.dir.rfiles="582449"
ceph.dir.rsubdirs="344974"
ceph.dir.subdirs="1584"
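
The same recursive statistics can also be read programmatically with getxattr(2) rather than getfattr(1); a small sketch using the directory and attribute names from the output above:

/* Sketch: read the ceph.dir.* virtual extended attributes shown above
 * with getxattr(2). Directory path and attribute names are taken from
 * the getfattr output in this report. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(void)
{
    const char *path = "/srv/dovecot2/mail";
    const char *attrs[] = { "ceph.dir.rentries", "ceph.dir.rfiles",
                            "ceph.dir.rsubdirs", "ceph.dir.rbytes" };
    char buf[64];

    for (unsigned i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
        ssize_t n = getxattr(path, attrs[i], buf, sizeof(buf) - 1);
        if (n < 0) {
            perror(attrs[i]);
            continue;
        }
        buf[n] = '\0';
        printf("%s = %s\n", attrs[i], buf);
    }
    return 0;
}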

Files

ceph-mds.tgz (490 KB), added by Webert Lima, 07/06/2017 08:19 PM

Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #20564: jewel: mds segmentation fault ceph_lock_state_t::get_overlapping_locks (Resolved, Patrick Donnelly)